Quickstart#

The sections below explain the most important classes, the concept of mixins and some notes on copying objects. Open the intro.ipynb notebook to see these classes in action or run it interactively on colab or binder:

Open in colab Open in binder

order follows a very strict API design and naming scheme.

  1. Methods start with (or at least contain) verbs. E.g. setters and getters are prefixed with set_ and get_, respectively.

  2. Properties implement type checks and, where applicable, conversions! Assigments such as my_object.some_string_attribute = 123 (can) immediately fail. This way, type ambiguities during analysis execution are avoided and detected early.

  3. The code style is PEP 8 compatible (checked via flake8).

UniqueObject and UniqueObjectIndex#

Before diving into the actual physics-related classes, it is necessary to understand their primary underlying base class: UniqueObject.

A UniqueObject essentially has two attributes: a name (string) and an id (integer). Within the same index, implemented by UniqueObjectIndex, both name and id are ensured to be unique. Therefore, the mechanism unambiguously assigns

  1. a human-readable name to objects (e.g.) for expressive use on the command-line, and

  2. a unique identifier to objects (e.g.) for encoding information in data structures such as ROOT trees.

Consider this example implementation of a physics process that is assigned to a dataset, which holds in index of processes.

import order as od

dataset = od.Dataset(name="tt_powheg", id=1)
process = od.Process(name="tt", id=1)

# add process to dataset
dataset.add_process(process)

Now, when creating another process with either the same name or id and trying to add it to the dataset, an exception will be raised.

process_same_name = od.Process(name="tt", id=2)
process_same_id = od.Process(name="dy", id=1)

dataset.add_process(process_same_name)
# -> DuplicateNameException

dataset.add_process(process_same_id)
# -> DuplicateIdException

The UniqueObjectIndex provides various convenience methods to add, extend, remove or lookup objects.

# dataset.processes is an index that stores od.Process objects

dataset.processes.get("tt")
# -> <Process at 0x10340d250, name=tt, id=1>

1 in dataset.processes
# -> True

"tt" in dataset.processes
# -> True

"dy" in dataset.processes
# -> False

dataset.processes.n.tt  # lookup by the name proxy "n"
# -> <Process at 0x10340d250, name=tt, id=1>

len(dataset.processes)
# -> 1

dataset.processes.clear()
len(dataset.processes)
# -> 0

As most classses within order inherit from UniqueObject or have one or more UniqueObjectIndex’s, you will find similar constructs all across the API.

Analysis, Campaign and Config#

An instance of the Analysis class represents the overarching object containing information of a physics analysis.

Varying requirements across data-taking periods, complex sub measurements, or simply different revisions of the same analysis are typical reasons why the structuring of information is quite demanding over the course of an analysis with sometimes unpredictable incidents and deadlines (code ↔︎ time uncertainty is a thing). For this purpose, order introduces two classes: Campaign and Config.

A Campaign describes and contains analysis-independent information, such as detector alignment settings, event simulation settings, recorded / simulated datasets, etc. In general, a pre-configured campaign object could be provided centrally by a working group or collaboration.

A Config object holds analysis-dependent information related to a certain campaign. Thus, a config is unambiguously assigned to both an analysis and a campaign.

Consider, for example, offline triggers that are used to select events specifically in one analysis. By construction, they should not be stored in a campaign object (which is analysis-independent), but they might also change between different data-taking periods and therefore should not be stored in the analysis object itself. Instead, a config object is an ideal place for such information.

import order as od

# the campaign (could be configured externally)
campaign_2018 = od.Campaign("data_taking_2018", 1, ecm=13)

# create the analysis
analysis = od.Analysis("my_analysis", 1)

# add a config for the 2018 campaign
# when no name or id are passed, it has the same as the campaign
cfg = analysis.add_config(campaign_2018)

# add trigger information as auxiliary data
cfg.set_aux("triggers", ["trigger_ee", "trigger_emu", "trigger_mumu"])

An analysis can contain several config objects for the same campaign. Just note that uniqueness rules apply here as well.

See the intro.ipynb notebook for more examples.

Dataset and Process#

Physics processes and simualted / recorded datasets are described by two classes: Process and Dataset.

Besides a name and an id, a Process object has cross sections for different center-of-mass energies, labels and colors for plotting purposes, and information about whether or not it describes real data or MC (it inherits from the DataSourceMixin, see mixin classes). Cross section values are automatically converted to scinum.Number instances, which are able to store multiple uncertainties, provide automatic error propagation and also support NumPy arrays.

Moreover, processes can have subprocesses and order provides convenience methods to work with arbitrarily deep process lookup.

import order as od
from scinum import Number, UP, DOWN

ttH = od.Process(
    name="ttH",
    id=1,
    xsecs={
        13: Number(0.5071, {
            "scale": (0.058j, 0.092j),  # relative scale uncertainty of +5.8/-9.2 %
            "pdf"  : 0.036j,            # relative pdf uncertainty of +-3.6 %
        }),
    },
   label=r"t\bar{t}H",
   color=(255, 0, 0),
)

# print the ttH cross section at 13 TeV with the up-shifted scale uncertainty
print(ttH.get_xsec(13)(UP, "scale"))
# -> 0.5365118

print(ttH.get_xsec(13).__class__)
# -> "scinum.Number"

# add the ttH (H -> bb) subprocess
ttH_bb = ttH.add_process(
    name="ttH_bb",
    id=2,
    xsecs={
        13: ttH.get_xsec(13) * 0.5824,  # branching ratio of H -> bb
    },
    label=ttH.label + r", H \rightarrow b\bar{b}",
)

# again, print the cross section for the up-shfited uncertainty, note the correct propagation
print(ttH_bb.get_xsec(13)(UP, "scale"))
# -> 0.3124645

# print the label in ROOT-style latex
print(ttH_bb.label_root)
# -> "t#bar{t}H, H #rightarrow b#bar{b}"

# check that the subprocess is really contained in the ttH subprocesses
print("ttH_bb" in ttH.processes)
# -> True

Information about datasets is stored in Dataset objects. Standard attributes are name and id, labels, and data/MC information. Optionally, a dataset can be assigned to a Campaign and to one or more Process objects. Let’s extend the above example:

dataset_ttH_bb = od.Dataset(
    name="ttH_bb",
    id=1,
    campaign=campaign_2018,
    processes=[ttH_bb],
    n_files=1000,
)

# the campaign is now aware of this dataset
print(dataset_ttH_bb in campaign_2018.datasets)
# -> True

# and the dataset knows about the ttH_bb process
print(ttH_bb in dataset_ttH_bb.processes)
# -> True

# as a little exercise, get all ttH subprocesses which are contained in the dataset
# this is, of course, only the ttH_bb process itself
print([p for p in ttH.processes if p in dataset_ttH_bb.processes])
# -> ["<Dataset at 0x1169421d0, name=ttH_bb, id=1>"]

# print the number of files
print(datasaet_ttH_bb.n_files)
# -> 1000

The last statement prints the number of files in that dataset. But what happens when systematic variations exist for that dataset? Let’s assume there are two variants that were generated with different top quark masses. Do we create two additional datasets? No. They are stored in the same dataset object.

A dataset stores DatasetInfo objects, containing information that may vary (e.g.) across systematic uncertainties. Examples are the number of files, the number of total events, or arbitrary auxiliary information (see the AuxDataMixin in the mixin classes below). In fact, the number of files n_files above is already stored in a DatasetInfo object, stored as dataset_ttH_bb.info["nominal"]. The attributes n_files and n_events of the nominal info object are forwarded to the dataset object itself. Say the dataset with the up variation of the top mass has 300 files. We can extend the dataset above retrospectively

dataset_ttH_bb.set_info("mtop_up", od.DatasetInfo(
    n_files=300,
    aux=dict(mtop=173.5),
))

or directly in the constructor

dataset_ttH_bb = od.Dataset(
    name="ttH_bb",
    id=1,
    campaign=campaign_2018,
    processes=[ttH_bb],
    info={
        "nominal": dict(n_files=1000, mtop=172.5),
        "mtop_up": dict(n_files=300, mtop=173.5),
    },
)

Note that the dictionaries passed in info are automatically converted to a DatasetInfo objects, and are accessible on the dataset itself via items (__getitem__). Also, the example shows how to use the auxiliary data storage capabilities, that most objects in order provide.

# print the number of files in the dataset with the up-varied top quark mass
print(dataset_ttH_bb["mtop_up"].n_files)
# -> 300

# also, print the respective top quark mass itself
print(dataset_ttH_bb["mtop_up"].get_aux("mtop"))
# -> 173.5

Channel and Category#

The typical nomenclature for distinguishing between phase space regions comprises channels and categories. The distinction between them is often somewhat arbitrary and may vary from analysis to analysis. A channel often refers to a very distinct event / data signature and when combining analyses, multiple of these channels are usually merged. In this scenario, a category describes a sub phase space of events within a channel.

order introduces the Channel and Category classes. However, as the definition above might not apply to all use cases, they can be used quite independently. They have a simple difference: while categories can have selection strings, channels cannot. This distinction might appear marginal but in some cases it turned out to be very helpful.

Categories can be (optionally) assigned to a channel. Likewise, categories are nestable:

import order as od

# create a channel
channel_e = od.Channel(
    name="e",
    id=1,
    title="Single electron",
)

# create a category
category_4j = od.Category(
    name="4j",
    id=1,
    channel=channel_e,
    selection="nJets == 4",
    title="4 jets",
)

# now, the channel knows about the category and vice versa
print(category_4j in channel_e.categories)
# -> True

# print the full category label
print(category_4j.full_label)
# -> "Single electron, 4 jets"

# now, add a subcategory
category_4j2b = category_4j.add_category(
    name="4j2b",
    id=2,
    channel=channel_e,
    selection=od.util.join_root_selection(category_4j.selection, "nBTags == 2"),
    title=category_4j.title + ", 2 b-tags",
)

# print the selection string
print(category_4j2b.selection)
# -> "Single electron, 4 jets"

# print the full category label
print(category_4j2b.full_label)
# -> "Single electron, 4 jets, 2 b-tags"

Shift#

A Shift object can be used to describe a systematic uncertainty. Its name must obey a simple naming scheme: either it is nominal or it has the format <source>_<direction> where source can be an arbitrary string and direction is either "up" or "down". Also, a shift can have a type such as Shift.RATE, Shift.SHAPE, or Shift.RATE_SHAPE to signify exclusive rate- or shape-changing effects, or both. As usual, shifts are unqique objects.

import order as od

pdf_up = od.Shift(name="pdf_up", id=1, type=Shift.SHAPE)

# print some properties
print(pdf_up.name)
# -> "pdf_up"

print(pdf_up.source)
# -> "pdf"

print(pdf_up.direction)
# -> "up"

print(pdf_up.is_down)
# -> False

# "nominal" has some special behavior
nom = od.Shift("nominal")
print(nom.name)
print(nom.source)
print(nom.direction)
# 3 x -> "nominal"

Variable#

A Variable is supposed to provide convenience for plotting purposes. Essentially, it stores a variable expression, additional selection strings (especially useful in conjunction with categories, see above), binning helpers, axis titles, and unit information. Here are some examples:

import order as od

v = od.Variable(
    name="myVar",
    expression="myBranchA * myBranchB",
    selection="myBranchC > 0",
    binning=(20, 0., 10.),
    x_title=r"$\mu p_{T}$",
    unit="GeV",
)

# access and print some attributes
print(v.expression)
# -> "myBranchA * myBranchB"

print(v.n_bins)
# -> 20

print(v.even_binning)
# -> True

print(v.x_title_root)
# -> "#mu p_{T}"

print(v.full_title())
# -> "myVar;$\mu p_{T}$" / GeV;Entries / 0.5 GeV'"

# add further selections
v.add_selection("myBranchD < 0", op="&&")
print(v.selection)
# -> "(myBranchC > 0) && (myBranchD < 0)"

v.add_selection("myBranchE < 5", op="||")
print(v.selection)
# -> "((myBranchC > 0) && (myBranchD < 0)) || (myBranchE < 5)"

# change the binning
v.binning = [0., 1., 5., 7., 9., 10.]
print(v.even_binning)
# -> False

print(v.n_bins)
# -> 5

Variables are unique objects. However, no id was set in the example above. This is because variables make use of the auto id mechanism. The default id in the variable constructor is Variable.AUTO_ID (or simply "+"), which tells order to automatically use an id that was not used before (usually the maximum of the currently used ids plus one).

Mixin classes#

Within order, common functionality is implemented in so-called mixin classes in the order.mixins module. Examples are the handling of auxiliary data, labels, data sources (data or MC), selection strings, etc. Most classes inherit from one or (often) more mixin classes listed below.

Copying objects#

Most classes inherit from the CopyMixin. As a result, instances can be copied via copy(), returning a full, deep copy including relations to other objects, or via copy_shallow() which copies everything but those relations. You can pass keyword arguments to configure / overwrite certain attributes of the copied object instead of copying them from the original one.

The copy mechanism can be demonstrated using Variable’s.

import order as od

jet1_pt = od.Variable(
    name="jet1_pt",
    id=1,  # explict id
    expression="jet_pt[0]",
    unit="GeV",
    binning=[40, 0., 400.],
    x_title=r"Leading jet p_{T}",
    x_title_short=r"jet1 p_{T}",
    log_y=True,
    tags={"jet_variable"},
)

# copy the variable and add a selection for high pt regimes
jet1_pt_high = jet1_pt.copy(
   name="jet1_pt_high",
   id=2,
   selection="jet_pt[0] > 200",
)

# copy the variable with same name and id, but overwrite the tags attribute
jet1_pt_untagged = jet1_pt.copy(tags=[])

print(jet1_pt.has_tag("jet_variable"))
# -> True

print(jet1_pt_untagged.has_tag("jet_variable"))
# -> False

Checkout the API reference of the specific classes to find detailed notes on their copy behavior.