order.dataset

Classes to define datasets.

Class Dataset

class Dataset(*args, **kwargs)[source]

Bases: UniqueObject, CopyMixin, AuxDataMixin, TagMixin, DataSourceMixin, LabelMixin

Dataset definition providing two kinds of information:

  1. (systematic) shift-dependent, and

  2. shift-independent information.

Shift-independent information is, for example, whether or not the dataset contains real data, whereas shift-dependent information is, for example, the number of events in the nominal or a shifted variation. The latter is contained in DatasetInfo objects that are stored in this class and mapped to shift names (strings). These info objects can be accessed via get_info() or via item access (__getitem__). For convenience, some properties of the nominal DatasetInfo object are accessible directly on this class via forwarding.
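The access pattern described above can be pictured with a small stand-in. This is only a sketch under stated assumptions: InfoSketch and DatasetSketch are hypothetical classes written for illustration, they do not import order, and they do not claim to reproduce how Dataset is implemented.

```python
class InfoSketch:
    """Hypothetical stand-in for a per-shift info object."""

    def __init__(self, keys=None, n_files=-1, n_events=-1):
        self.keys = list(keys or [])
        self.n_files = n_files
        self.n_events = n_events


class DatasetSketch:
    """Hypothetical stand-in that maps shift names to info objects."""

    def __init__(self, info):
        # build one info object per shift name
        self.info = {shift: InfoSketch(**attrs) for shift, attrs in info.items()}

    def get_info(self, shift_name):
        return self.info[shift_name]

    def __getitem__(self, shift_name):
        # item access is a shorthand for get_info()
        return self.get_info(shift_name)

    @property
    def n_files(self):
        # forwarding: read the value from the nominal info object
        return self.info["nominal"].n_files


ds = DatasetSketch({
    "nominal": {"n_files": 123, "n_events": 456789},
    "scale_up": {"n_files": 100, "n_events": 40000},
})
print(ds["scale_up"].n_files, ds.n_files)
```

The forwarded property always reads from the "nominal" entry, so values of shifted variations are only reachable through get_info() or item access.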

Arguments

A dataset is always measured in (real data) or created for (MC) a dedicated campaign and therefore belongs to a Campaign object. In addition, physics processes can be linked to a dataset, which is why it holds Process objects.

When info does not contain a nominal DatasetInfo object (mapped to the key order.shift.Shift.NOMINAL, i.e., "nominal"), all kwargs are used to create one. Otherwise, it should be a dictionary matching the format of the info mapping. label and label_short are forwarded to the LabelMixin, is_data to the DataSourceMixin, tags to the TagMixin, aux to the AuxDataMixin, and name and id to the UniqueObject constructor.
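The rule for the nominal entry can be sketched as follows. The helper build_info and the constant NOMINAL are hypothetical names introduced for illustration; the sketch uses plain dictionaries and is not the constructor logic of order itself.

```python
# Stand-in sketch, not the order implementation.
NOMINAL = "nominal"  # value of order.shift.Shift.NOMINAL as stated above


def build_info(info=None, **kwargs):
    """Return a shift-name -> attributes mapping (hypothetical helper)."""
    info = dict(info or {})
    if NOMINAL not in info:
        # no nominal entry was passed: create one from the remaining kwargs
        info[NOMINAL] = dict(kwargs)
    return info


# nominal entry created from loose keyword arguments
from_kwargs = build_info(n_files=123, n_events=456789)

# explicit mapping that already contains a nominal entry
explicit = build_info(info={
    "nominal": {"n_files": 123},
    "scale_up": {"n_files": 100},
})

print(sorted(from_kwargs), sorted(explicit))
```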

Copy behavior

The campaign attribute is carried over as a reference, all remaining attributes are copied. Note that the copied dataset is also registered in the campaign.
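A minimal sketch of this copy rule, assuming simplified stand-in classes (FakeCampaign and FakeDataset are hypothetical and unrelated to the real class hierarchy): the campaign is shared by reference, the info mapping is copied, and the copy shows up in the dataset list of the campaign.

```python
import copy


class FakeCampaign:
    """Hypothetical stand-in for a campaign that tracks its datasets."""

    def __init__(self, name):
        self.name = name
        self.datasets = []


class FakeDataset:
    """Hypothetical stand-in; it only models the copy rule described above."""

    def __init__(self, name, campaign, info):
        self.name = name
        self.info = info
        self.campaign = campaign
        # setting the campaign also registers the dataset there
        campaign.datasets.append(self)

    def copy(self, name):
        # campaign: same reference, info: independent deep copy
        return FakeDataset(name, self.campaign, copy.deepcopy(self.info))


campaign = FakeCampaign("2017B")
d1 = FakeDataset("ttH_bb", campaign, {"nominal": {"n_files": 123}})
d2 = d1.copy(name="ttH_bb_copy")

# changing the copy leaves the original untouched
d2.info["nominal"]["n_files"] = 1
print(d1.info["nominal"]["n_files"], len(campaign.datasets))
```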

Example

import order as od

campaign = od.Campaign("2017B", 1, ...)

d = od.Dataset(
    name="ttH_bb",
    id=1,
    campaign=campaign,
    keys=["/ttHTobb_M125.../.../..."],
    n_files=123,
    n_events=456789,
)

d.info.keys()
# -> ["nominal"]

d["nominal"].n_files
# -> 123

d.n_files
# -> 123

# similar to above, but set explicit info objects
d = od.Dataset(
    name="ttH_bb",
    id=1,
    campaign=campaign,
    info={
        "nominal": {
            "keys": ["/ttHTobb_M125.../.../..."],
            "n_files": 123,
            "n_events": 456789,
        },
        "scale_up": {
            "keys": ["/ttHTobb_M125_scaleUP.../.../..."],
            "n_files": 100,
            "n_events": 40000,
        },
    },
)

d.info.keys()
# -> ["nominal", "scale_up"]

d["nominal"].n_files
# -> 123

d.n_files
# -> 123

d["scale_up"].n_files
# -> 100

Members

campaign
type: Campaign, None

The Campaign object this dataset belongs to. When set, this dataset is also added to the dataset index of the campaign object.

info
type: dictionary

Mapping of shift names to DatasetInfo instances.

keys
type: list
read-only

The dataset keys of the nominal DatasetInfo object.

n_files
type: integer
read-only

The number of files of the nominal DatasetInfo object.

n_events
type: integer
read-only

The number of events of the nominal DatasetInfo object.

processes
type: UniqueObjectIndex
read-only

The UniqueObjectIndex of child processes.

Methods:

copy(*args, **kwargs[, _specs, _skip])

Creates a copy of this instance and returns it.

set_info(shift_name, info)

Sets a DatasetInfo object info for a given shift_name.

get_info(shift_name)

Returns the DatasetInfo object for a given shift_name.

add_process(*args, **kwargs)

Adds a child process to the processes index and returns it.

clear_processes()

Removes all child processes from the processes index.

extend_processes(objs)

Adds multiple child processes to the processes index and returns the added objects in a list.

get_leaf_processes()

Returns all child processes from the processes index that have no child processes themselves in a recursive fashion.

get_process(obj[, deep, default])

Returns a child process given by obj, which might be a name, id, or an instance from the processes index.

has_process(obj[, deep])

Checks if the processes index contains an obj which might be a name, id, or an instance.

remove_process(obj[, silent])

Removes a child process given by obj, which might be a name, id, or an instance from the processes index and returns the removed object.

walk_processes([depth_first, include_self])

Walks through the processes index and per iteration, yields a child process, its depth relative to this process, and its child processes in a list that can be modified to alter the walking.

Attributes:

has_processes

Returns True when this process has child processes, False otherwise.

is_leaf_process

Returns True when this process has no child processes, False otherwise.

copy(*args, **kwargs, _specs=None, _skip=None)[source]

Creates a copy of this instance and returns it. All args and kwargs are converted to named arguments (based on the init signature) and set as attributes of the created copy. Additional specifications per attribute are taken from copy_specs or _specs if set. _skip can be a sequence of source attribute names that should be skipped.

set_info(shift_name, info)[source]

Sets a DatasetInfo object info for a given shift_name. Returns the object.

get_info(shift_name)[source]

Returns the DatasetInfo object for a given shift_name.

add_process(*args, **kwargs)

Adds a child process to the processes index and returns it. See UniqueObjectIndex.add() for more info.

clear_processes()

Removes all child processes from the processes index.

extend_processes(objs)

Adds multiple child processes to the processes index and returns the added objects in a list.

get_leaf_processes()

Returns all child processes from the processes index that have no child processes themselves in a recursive fashion.
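The recursive collection of leaves can be illustrated on a plain tree. LeafNode and collect_leaves are hypothetical names used only for this sketch, which does not rely on order.

```python
class LeafNode:
    """Hypothetical tree node with a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def collect_leaves(node):
    """Return all descendants of node that have no children themselves."""
    leaves = []
    for child in node.children:
        if child.children:
            # inner node: descend further
            leaves.extend(collect_leaves(child))
        else:
            leaves.append(child)
    return leaves


tree = LeafNode("root", [
    LeafNode("tt", [LeafNode("tt_sl"), LeafNode("tt_dl")]),
    LeafNode("st"),
])
leaf_names = [n.name for n in collect_leaves(tree)]
print(leaf_names)
```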

get_process(obj, deep=True, default=no_default)

Returns a child process given by obj, which might be a name, id, or an instance from the processes index. If deep is True, the lookup is recursive. When no process is found, default is returned when set. Otherwise, an error is raised.
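The interplay of deep and default can be sketched with a sentinel object. LookupNode, find_child and _no_default are hypothetical stand-ins; the lookup here is by name only, whereas the method above also accepts ids and instances.

```python
_no_default = object()  # sentinel standing in for no_default


class LookupNode:
    """Hypothetical tree node with a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def find_child(parent, name, deep=True, default=_no_default):
    # direct children are checked first
    for child in parent.children:
        if child.name == name:
            return child
    # with deep=True, continue in the children of the children
    if deep:
        for child in parent.children:
            found = find_child(child, name, deep=True, default=None)
            if found is not None:
                return found
    # nothing found: return the default if one was given, otherwise fail
    if default is not _no_default:
        return default
    raise ValueError(f"no child named {name!r}")


root = LookupNode("root", [LookupNode("tt", [LookupNode("tt_sl")])])
print(find_child(root, "tt_sl").name)
print(find_child(root, "tt_sl", deep=False, default=None))
```

Using a dedicated sentinel rather than None keeps None available as a legitimate default value.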

has_process(obj, deep=True)

Checks if the processes index contains an obj which might be a name, id, or an instance. If deep is True, the lookup is recursive.

property has_processes

Returns True when this process has child processes, False otherwise.

property is_leaf_process

Returns True when this process has no child processes, False otherwise.

remove_process(obj, silent=False)

Removes a child process given by obj, which might be a name, id, or an instance from the processes index and returns the removed object. Unless silent is True, an error is raised if the object was not found. See UniqueObjectIndex.remove() for more info.

walk_processes(depth_first=False, include_self=False)

Walks through the processes index and per iteration, yields a child process, its depth relative to this process, and its child processes in a list that can be modified to alter the walking. When depth_first is True, iterate depth-first instead of the default breadth-first. When include_self is True, also yield this process instance with a depth of 0.
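The walking scheme, including pruning through the yielded list, can be sketched on a plain tree. WalkNode and walk are hypothetical names for illustration and the generator below is an assumption about one way to obtain this behavior, not the code used by order.

```python
from collections import deque


class WalkNode:
    """Hypothetical tree node with a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def walk(root, depth_first=False, include_self=False):
    """Yield (node, depth, children); children may be edited by the caller."""
    pending = deque([(root, 0)])
    while pending:
        # stack behavior for depth-first, queue behavior for breadth-first
        node, depth = pending.pop() if depth_first else pending.popleft()
        children = list(node.children)
        if depth > 0 or include_self:
            yield node, depth, children
        # only children still present in the list are visited later
        nxt = [(child, depth + 1) for child in children]
        pending.extend(reversed(nxt) if depth_first else nxt)


tree = WalkNode("root", [
    WalkNode("a", [WalkNode("a1")]),
    WalkNode("b", [WalkNode("b1")]),
])

breadth = [n.name for n, _, _ in walk(tree)]
depth = [n.name for n, _, _ in walk(tree, depth_first=True)]

# prune everything below "a" by emptying the yielded list in place
pruned = []
for node, _, children in walk(tree):
    pruned.append(node.name)
    if node.name == "a":
        del children[:]

print(breadth, depth, pruned)
```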

Class DatasetInfo

class DatasetInfo(keys=None, n_files=-1, n_events=-1, tags=None, aux=None)[source]

Bases: CopyMixin, AuxDataMixin, TagMixin

Container class holding information on particular dataset variations. Instances of this class are typically used in Dataset objects to store shift-dependent information, such as the number of files or events for a particular shift (e.g. nominal, scale_up, etc.).

Arguments

keys denote the identifiers or origins of a dataset. n_files and n_events can be used for further bookkeeping. tags are forwarded to the TagMixin, and aux to the AuxDataMixin.

Copy behavior

All attributes are copied.

Members

keys
type: list

The dataset keys, e.g. ["/ttHTobb_M125.../.../..."].

n_files
type: integer

The number of files.

n_events
type: integer

The number of events.