order.dataset
Classes to define datasets.
Class Dataset
- class Dataset(*args, **kwargs)
Bases: order.unique.UniqueObject, order.mixins.CopyMixin, order.mixins.AuxDataMixin, order.mixins.TagMixin, order.mixins.DataSourceMixin, order.mixins.LabelMixin
Dataset definition providing two kinds of information:
(systematic) shift-dependent information, and
shift-independent information.
Shift-independent information is, e.g., whether or not the dataset contains real data, whereas shift-dependent information is, e.g., the number of events in the nominal or a shifted variation. The latter is contained in DatasetInfo objects that are stored in this class and mapped to strings. These info objects can be accessed via get_info() or via items (__getitem__). For convenience, some of the properties of the nominal DatasetInfo object are accessible on this class via forwarding.
Arguments
A dataset is always measured in (real data) or created for (MC) a dedicated campaign, therefore it belongs to a Campaign object. In addition, physics processes can be linked to a dataset, therefore it has Process objects.
When info does not contain a nominal DatasetInfo object (mapped to the key order.shift.Shift.NOMINAL, i.e., "nominal"), all kwargs are used to create one. Otherwise, it should be a dictionary matching the format of the info mapping. label and label_short are forwarded to the LabelMixin, is_data to the DataSourceMixin, tags to the TagMixin, aux to the AuxDataMixin, and name and id to the UniqueObject constructor.
Copy behavior
copy()
The campaign attribute is carried over as a reference, all remaining attributes are copied. Note that the copied dataset is also registered in the campaign.
copy_shallow()
All attributes are copied except for the campaign and the contained processes, which are set to default values instead.
Example
    import order as od

    campaign = od.Campaign("2017B", 1, ...)

    d = od.Dataset(
        name="ttH_bb",
        id=1,
        campaign=campaign,
        keys=["/ttHTobb_M125.../.../..."],
        n_files=123,
        n_events=456789,
        gen_order="nlo",
    )

    d.info.keys()  # -> ["nominal"]
    d["nominal"].n_files  # -> 123
    d.n_files  # -> 123

    # similar to above, but set explicit info objects
    d = od.Dataset(
        name="ttH_bb",
        id=1,
        campaign=campaign,
        info={
            "nominal": {
                "keys": ["/ttHTobb_M125.../.../..."],
                "n_files": 123,
                "n_events": 456789,
                "gen_order": "nlo",
            },
            "scale_up": {
                "keys": ["/ttHTobb_M125_scaleUP.../.../..."],
                "n_files": 100,
                "n_events": 40000,
                "gen_order": "nlo",
            },
        },
    )

    d.info.keys()  # -> ["nominal", "scale_up"]
    d["nominal"].n_files  # -> 123
    d.n_files  # -> 123
    d["scale_up"].n_files  # -> 100
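The way info and the remaining kwargs interact, as described under Arguments, can be illustrated without the library itself. The following is a minimal sketch under stated assumptions, not the implementation in order: InfoSketch and DatasetSketch are hypothetical stand-ins, only a few fields are modelled, and kwargs are simply ignored when a nominal entry is already present (the source does not say what happens to them in that case).

```python
from dataclasses import dataclass, field
from typing import Optional

NOMINAL = "nominal"  # stands in for order.shift.Shift.NOMINAL


@dataclass
class InfoSketch:
    """Hypothetical stand-in for DatasetInfo, reduced to four fields."""
    keys: list = field(default_factory=list)
    n_files: int = -1
    n_events: int = -1
    gen_order: Optional[str] = None


class DatasetSketch:
    """Hypothetical stand-in for Dataset, reduced to the info handling."""

    def __init__(self, name, info=None, **kwargs):
        self.name = name
        self.info = {}
        for shift, value in (info or {}).items():
            self.set_info(shift, value)
        # without a nominal entry, the remaining kwargs define it
        if NOMINAL not in self.info:
            self.set_info(NOMINAL, kwargs)

    def set_info(self, name, info):
        # accept either a ready-made info object or a plain dict of its fields
        if not isinstance(info, InfoSketch):
            info = InfoSketch(**info)
        self.info[name] = info
        return info

    def get_info(self, name):
        return self.info[name]

    def __getitem__(self, name):
        # item access is a shorthand for get_info()
        return self.get_info(name)

    @property
    def n_files(self):
        # forwarded from the nominal info object
        return self.info[NOMINAL].n_files


d = DatasetSketch("ttH_bb", n_files=123, n_events=456789)
print(list(d.info.keys()), d["nominal"].n_files, d.n_files)
```

Forwarding through a read-only property keeps the nominal values in a single place, so d.n_files and d["nominal"].n_files cannot drift apart.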
Members
- campaign
type: Campaign, None
The Campaign object this dataset belongs to. When set, this dataset is also added to the dataset index of the campaign object.
- info
type: dictionary
Mapping of shift names to DatasetInfo instances.
- keys
type: list (read-only)
The dataset keys of the nominal DatasetInfo object.
- n_files
type: integer (read-only)
The number of files of the nominal DatasetInfo object.
- n_events
type: integer (read-only)
The number of events of the nominal DatasetInfo object.
- gen_order
type: string (read-only)
The generator perturbation order of the nominal DatasetInfo object.
- processes
type: UniqueObjectIndex (read-only)
The UniqueObjectIndex of child processes.
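The coupling between the campaign member and the campaign's dataset index can be pictured with a property setter. This is only a sketch: CampaignSketch and DatasetSketch are hypothetical stand-ins, a plain dict replaces the real index, and detaching from a previously set campaign is an assumption that the text above does not confirm.

```python
class CampaignSketch:
    """Hypothetical stand-in for Campaign."""

    def __init__(self, name):
        self.name = name
        self.datasets = {}  # name -> dataset, stands in for the dataset index


class DatasetSketch:
    """Hypothetical stand-in for Dataset, reduced to the campaign link."""

    def __init__(self, name, campaign=None):
        self.name = name
        self._campaign = None
        self.campaign = campaign  # goes through the setter below

    @property
    def campaign(self):
        return self._campaign

    @campaign.setter
    def campaign(self, campaign):
        # assumption: leave the index of a previously set campaign
        if self._campaign is not None:
            self._campaign.datasets.pop(self.name, None)
        self._campaign = campaign
        # setting a campaign also registers the dataset in its index
        if campaign is not None:
            campaign.datasets[self.name] = self


c = CampaignSketch("2017B")
d = DatasetSketch("ttH_bb", campaign=c)
print(c.datasets["ttH_bb"] is d)
```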
Methods:
- copy(*args, **kwargs[, _specs, _skip]): Creates a copy of this instance and returns it.
- set_info(name, info): Sets a DatasetInfo object info for a given name.
- get_info(name): Returns the DatasetInfo object for a given name.
- add_process(*args, **kwargs): Adds a child process to the processes index and returns it.
- clear_processes(): Removes all child processes from the processes index.
- extend_processes(objs): Adds multiple child processes to the processes index and returns the added objects in a list.
- get_leaf_processes(): Returns all child processes from the processes index that have no child processes themselves in a recursive fashion.
- get_process(obj[, deep, default]): Returns a child process given by obj, which might be a name, id, or an instance from the processes index.
- has_process(obj[, deep]): Checks if the processes index contains an obj which might be a name, id, or an instance.
- remove_process(obj[, silent]): Removes a child process given by obj, which might be a name, id, or an instance from the processes index and returns the removed object.
- walk_processes([algo, depth_first, include_self]): Walks through the processes index and, per iteration, yields a child process, its depth relative to this process, and its child processes in a list that can be modified to alter the walking.
Attributes:
- has_processes: Returns True when this process has child processes, False otherwise.
- is_leaf_process: Returns True when this process has no child processes, False otherwise.
- copy(*args, **kwargs, _specs=None, _skip=None)
Creates a copy of this instance and returns it. All args and kwargs are converted to named arguments (based on the init signature) and set as attributes of the created copy. Additional specifications per attribute are taken from copy_specs or _specs if set. _skip can be a sequence of source attribute names that should be skipped.
- set_info(name, info)
Sets a DatasetInfo object info for a given name. Returns the object.
- get_info(name)
Returns the DatasetInfo object for a given name.
- add_process(*args, **kwargs)
Adds a child process to the processes index and returns it. See UniqueObjectIndex.add() for more info.
- extend_processes(objs)
Adds multiple child processes to the processes index and returns the added objects in a list.
- get_leaf_processes()
Returns all child processes from the processes index that have no child processes themselves in a recursive fashion. Possible duplicates due to nested structures are removed.
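The recursive collection of leaves with duplicate removal can be sketched as follows. Node is a hypothetical stand-in whose processes attribute is a plain list rather than a UniqueObjectIndex, and the function is a free-standing illustration, not the library's method.

```python
class Node:
    """Hypothetical stand-in for an object with a processes index."""

    def __init__(self, name, children=None):
        self.name = name
        self.processes = list(children or [])


def get_leaf_processes(obj):
    """Collect all nested children without children of their own, once each."""
    leaves = []

    def collect(node):
        for child in node.processes:
            if child.processes:
                # not a leaf, descend further
                collect(child)
            elif child not in leaves:
                # a leaf that was not reached via another branch before
                leaves.append(child)

    collect(obj)
    return leaves


# the same leaf object is linked from two branches
shared = Node("tt_sl")
a = Node("tt", [shared, Node("tt_dl")])
b = Node("top", [a, shared])
root = Node("dataset", [a, b])
print([p.name for p in get_leaf_processes(root)])
```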
- get_process(obj, deep=True, default=no_default)
Returns a child process given by obj, which might be a name, id, or an instance from the processes index. If deep is True, the lookup is recursive through potentially nested child processes. When no process is found, default is returned when set. Otherwise, an error is raised.
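The combination of deep lookup and an optional default can be illustrated with a sentinel object. This is a sketch under assumptions: Node, _NO_DEFAULT and the ValueError are hypothetical choices made for the example, and checking direct children before descending is one possible lookup order, not necessarily the one the library uses.

```python
_NO_DEFAULT = object()  # sentinel, so that None remains a valid default


class Node:
    """Hypothetical stand-in for a process with a name, an id and children."""

    def __init__(self, name, id, children=None):
        self.name = name
        self.id = id
        self.processes = list(children or [])


def get_process(parent, obj, deep=True, default=_NO_DEFAULT):
    """Find a child by name, id or instance, optionally searching nested children."""

    def matches(p):
        return obj is p or obj == p.name or obj == p.id

    def search(node):
        # direct children first
        for child in node.processes:
            if matches(child):
                return child
        # then nested children, only when requested
        if deep:
            for child in node.processes:
                found = search(child)
                if found is not None:
                    return found
        return None

    found = search(parent)
    if found is not None:
        return found
    if default is not _NO_DEFAULT:
        return default
    raise ValueError(f"no process found for {obj!r}")


leaf = Node("ttH_bb", 2)
root = Node("dataset", 0, [Node("ttH", 1, [leaf])])
print(get_process(root, "ttH_bb").id)
print(get_process(root, "ttH_bb", deep=False, default=None))
```

Using a dedicated sentinel instead of None as the "not set" marker lets callers pass default=None and still get None back rather than an error.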
- has_process(obj, deep=True)
Checks if the processes index contains an obj which might be a name, id, or an instance. If deep is True, the lookup is recursive through potentially nested child processes.
- property has_processes
Returns True when this process has child processes, False otherwise.
- property is_leaf_process
Returns True when this process has no child processes, False otherwise.
- remove_process(obj, silent=False)
Removes a child process given by obj, which might be a name, id, or an instance from the processes index and returns the removed object. Unless silent is True, an error is raised if the object was not found. See UniqueObjectIndex.remove() for more info.
- walk_processes(algo='bfs', depth_first=False, include_self=False)
Walks through the processes index and, per iteration, yields a child process, its depth relative to this process, and its child processes in a list that can be modified to alter the walking.
The traversal order is defined by algo, which allows the following values:
  - "bfs": Breadth-first search.
  - "dfs_preorder": Pre-order depth-first search.
  - "dfs_postorder": Post-order depth-first search.
  - "dfs": Alias for "dfs_preorder".
  - "dfs_pre": Alias for "dfs_preorder".
  - "dfs_post": Alias for "dfs_postorder".
When include_self is True, this process instance is yielded as well with a depth of 0.
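How a yielded, modifiable child list can steer the walk is easiest to see in a small generator. The sketch below is a simplified illustration, not the library's code: Node and walk are hypothetical, only "bfs" and "dfs_preorder" (with the aliases "dfs" and "dfs_pre") are covered, the depth_first argument is left out, and direct children are assumed to start at depth 1.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for an object with a processes index."""

    def __init__(self, name, children=None):
        self.name = name
        self.processes = list(children or [])


def walk(root, algo="bfs", include_self=False):
    """Yield (node, depth, children); edits to children change what is visited next."""
    if algo in ("dfs", "dfs_pre"):
        algo = "dfs_preorder"
    if algo not in ("bfs", "dfs_preorder"):
        raise ValueError(f"algo not covered by this sketch: {algo}")

    if include_self:
        queue = deque([(root, 0)])
    else:
        queue = deque((child, 1) for child in root.processes)

    while queue:
        node, depth = queue.popleft()
        # hand out a copy, so that edits affect the walk but not the tree itself
        children = list(node.processes)
        yield node, depth, children
        items = [(child, depth + 1) for child in children]
        if algo == "bfs":
            queue.extend(items)  # visit after everything already queued
        else:
            queue.extendleft(reversed(items))  # visit before anything else


tree = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b", [Node("b1")])])

for node, depth, children in walk(tree):
    if node.name == "a":
        del children[:]  # prune: do not descend into the children of "a"
    print(depth, node.name)
```

Because the children are expanded only after the caller gets control back, clearing or filtering the yielded list is enough to skip whole subtrees. In a post-order walk the children have already been visited when their parent is yielded, so this kind of pruning would have no effect there.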
Class DatasetInfo
- class DatasetInfo(keys=None, n_files=-1, n_events=-1, gen_order=None, tags=None, aux=None)
Bases: order.mixins.CopyMixin, order.mixins.AuxDataMixin, order.mixins.TagMixin
Container class holding information on particular dataset variations. Instances of this class are typically used in Dataset objects to store shift-dependent information, such as the number of files or events for a particular shift (e.g. nominal, scale_up, etc.).
Arguments
keys denote the identifiers or origins of a dataset. n_files and n_events can be used for further bookkeeping. tags are forwarded to the TagMixin, and aux to the AuxDataMixin.
Copy behavior
All attributes are copied.
Members
- keys
type: list
The dataset keys, e.g. ["/ttHTobb_M125.../.../..."].
- n_files
type: integer
The number of files.
- n_events
type: integer
The number of events.
- gen_order
type: string
The generator perturbation order.