NAME

fccanalysis-script – analysis steering script specification

SYNOPSIS

: *

DESCRIPTION

The analysis script is expected to be a valid Python script containing either part of the analysis or the full analysis. There are two basic modes for running an analysis. The first is the managed mode:

: fccanalysis run analysis_script.py

or

: fccanalysis final analysis_script.py

In this mode the user needs to provide only a minimal number of variables and settings. The RDataFrame is managed for the user and can be controlled by defining several global attributes in the analysis script. The other mode is to run the analysis script as a regular Python script:

: python analysis_script.py

In this mode the user has full control over the RDataFrame, but has to create all the necessary scaffolding.
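A minimal sketch of what such scaffolding could look like, assuming a ROOT installation with PyROOT and an EDM4hep input file whose tree is named events. The column name n_reco, the use of the ReconstructedParticles branch and the command-line layout are illustrative assumptions, not something the framework prescribes:

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path), or None if the count is wrong."""
    if len(argv) != 2:
        return None
    return argv[0], argv[1]


def run(input_path, output_path):
    """Create the RDataFrame by hand, define one column and write it out."""
    import ROOT  # imported here so the module loads without ROOT installed

    ROOT.gROOT.SetBatch(True)
    # Assumption: the input is an EDM4hep file with a tree called "events".
    dframe = ROOT.RDataFrame("events", input_path)
    # Assumption: the file has a ReconstructedParticles collection.
    dframe2 = dframe.Define(
        "n_reco", "static_cast<int>(ReconstructedParticles.size())"
    )
    dframe2.Snapshot("events", output_path, ["n_reco"])


if __name__ == "__main__":
    paths = parse_args(sys.argv[1:])
    if paths is None:
        print("usage: python analysis_script.py <input.root> <output.root>")
    else:
        run(*paths)
```

Everything the managed mode would otherwise handle (locating input files, threading, output naming) has to be added by hand in a script like this.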

It is expected that the whole analysis will be split into several stages, which can be done in one of the two styles:

: anascript_stage1.py -> anascript_stage2.py -> ... -> anascript_stage_final.py -> plots.py

or

: analysis_histmaker.py -> plots.py

The first style requires at least three stages (anascript_stage1.py, anascript_stage_final.py, plots.py), and there is no upper limit on the number of stages. The second style requires only two stages. The first style is named "staged analysis" and the second "histmaker analysis".

staged analysis: The analysis script must contain an Analysis class with the following structure:

class Analysis():
    def __init__(self, cmdline_args):
        ...

    def analyzers(self, dframe):
        dframe2 = (
            dframe
            # define the muon collection
            .Define("muons",
                    "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
            ...
        )
        return dframe2

    def output(self):
        return ["muons", "muon_mass"]
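The structure above implies that every name returned by output() should be a column that analyzers() defines (or one already present in the input). The sketch below checks this without ROOT by using a stand-in frame that only records Define calls. RecordingFrame is a hypothetical helper written for this illustration, and the muon_mass expression using ReconstructedParticle::get_mass is an assumption about how that column might be filled:

```python
class RecordingFrame:
    """Stand-in for an RDataFrame node: records Define calls, computes nothing."""

    def __init__(self, columns=()):
        self.columns = list(columns)

    def Define(self, name, expression):
        # Like RDataFrame, return a new node rather than modifying this one.
        return RecordingFrame(self.columns + [name])


class Analysis():
    def __init__(self, cmdline_args):
        self.cmdline_args = cmdline_args

    def analyzers(self, dframe):
        dframe2 = (
            dframe
            # define the muon collection
            .Define("muons",
                    "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
            # assumed expression for the second output column
            .Define("muon_mass", "ReconstructedParticle::get_mass(muons)")
        )
        return dframe2

    def output(self):
        return ["muons", "muon_mass"]


analysis = Analysis(cmdline_args=None)
frame = analysis.analyzers(RecordingFrame())
# Names requested in output() that analyzers() never defined.
missing = [name for name in analysis.output() if name not in frame.columns]
print(missing)  # []
```

A non-empty list here would point to an output branch that the analyzers never create.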

histmaker analysis: The analysis script must contain a build_graph function with the following structure:

def build_graph(df, dataset):
    results = []
    df = df.Define("weight", "1.0")
    weightsum = df.Sum("weight")
    df = df.Define("muons", "FCCAnalyses::ReconstructedParticle::sel_p(20)(muons_all)")
    ...
    results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
    return results, weightsum
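The name bins_p_mu is not defined in the snippet above. A plausible reading, stated here as an assumption, is a tuple of (number of bins, lower edge, upper edge) that is unpacked into the histogram model handed to Histo1D. The values below are purely illustrative:

```python
# Hypothetical binning: 250 bins between 0 and 250 (units assumed to be GeV).
bins_p_mu = (250, 0, 250)

# The star unpacks the tuple in place, giving the five-element model
# (name, title, number of bins, lower edge, upper edge).
model = ("muons_p_cut0", "", *bins_p_mu)
print(model)  # ('muons_p_cut0', '', 250, 0, 250)
```

Keeping the binning in a named tuple lets several histograms of the same quantity share one definition.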

final and plots stages: These stages require neither the Analysis class nor the build_graph function. They have their own set of attributes; please see the examples in the examples directory, or fccanalysis-final-script(7) and fccanalysis-plots-script(7).

ATTRIBUTES

When running FCCAnalyses in the managed mode, the user can use the following global attributes to control the behavior of the analysis.

process_list (mandatory): Dictionary of process samples to be run over. Each process can have several parameters:
    fraction
        The analysis will run over a reduced number of input files, roughly corresponding to the specified fraction of the total events.
        Default value: 1 (full process sample)
    input_dir
        Specifies a location for the process that differs from the globally set one. This overrides both the prod_tag and input_dir attributes of the analysis script.
        Default value: None
    output
        Specifies the stem for the output file(s). The stem is used as the name of the output directory if there is more than one chunk, or as the file name if there is only one.
        Default value: output.root
    chunks
        The analysis can be split into several output chunks (useful on distributed systems).
        Default value: 1
prod_tag (mandatory): Indicates where to find the input files. There are several ways to resolve this information; one of them uses a YAML file that is searched for in the sub-folders of $FCCDICTSDIR.
output_dir (mandatory): Directory for the output files.
analysis_name (optional): Name of the analysis.
Default value: empty string
n_threads (optional): Number of threads the RDataFrame will use during processing.
Default value: 1
batch_queue (optional): Batch queue name when running on HTCondor.
Default value: "longlunch"
comp_group (optional): Computing account when running on HTCondor.
Default value: "group_u_FCC.local_gen"
output_dir_eos (optional): Output directory on EOS. If specified, files will be copied there once the batch job is done.
Default value: empty string
eos_type (optional): Type of the EOS proxy to be used.
Default value: user
test_file (optional): Location of the test file, provided as a string (str) or a template (string.Template). In the case of a template, two variables are supported: key4hep_os and key4hep_stack.
Default value: empty string
graph (optional): If True, the computational graph of the analysis will be generated.
Default value: False
graph_path (optional): Location where the computational graph of the analysis should be stored. Only paths with .dot and .png extensions are accepted.
Default value: empty string
use_data_source (optional): Controls how physical object collections are provided to the analyzers. If True, the events will be loaded through the EDM4hep RDataSource.
Default value: False
do_weighted (optional): Whether to use weighted or raw events. If True, the events will be weighted with EDM4hep's EventHeader.weight, and all normalisation factors will be calculated from the sum of weights accordingly.
Default value: False
proc_dict: Controls which process dictionary will be used. It can be a simple file name, an absolute path or a URL. In the case of a simple file name, the file is searched for first in the working directory and then at the locations indicated in the $FCCDICTSDIR environment variable.
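As an illustration of how the attributes above might appear together at the top of an analysis script, here is a minimal sketch. All sample names, the production tag and the paths are placeholders invented for this example, and the final substitution only demonstrates how a string.Template resolves (the values that would actually be supplied for key4hep_os and key4hep_stack are not specified here):

```python
from string import Template

# Mandatory attributes. Sample names, tag and paths are placeholders.
process_list = {
    # Run over roughly 10% of the events of this sample.
    "sample_a": {"fraction": 0.1},
    # Full sample, split into 4 output chunks under the stem "sample_b_out".
    "sample_b": {"chunks": 4, "output": "sample_b_out"},
}
prod_tag = "placeholder/production/tag"
output_dir = "outputs/stage1"

# Optional attributes.
n_threads = 4
graph = True
graph_path = "outputs/stage1/graph.dot"  # only .dot and .png are accepted

# Test file given as a template with the two supported variables.
test_file = Template("/placeholder/${key4hep_os}/${key4hep_stack}/test.root")

# Illustration only: resolving the template with made-up values.
resolved = test_file.substitute(key4hep_os="some_os", key4hep_stack="some_stack")
print(resolved)  # /placeholder/some_os/some_stack/test.root
```

Attributes left out of the script fall back to the default values listed above.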

This section is under construction. You are invited to help :)

BUGS

Many

AUTHORS

There are many contributors to the FCCAnalyses framework, but the principal authors are:
Clement Helsens
Valentin Volkl
Gerardo Ganis

FCCANALYSES

Part of the FCCAnalyses framework.

LINKS

FCCAnalyses webpage

FCCAnalyses GitHub repository

FCCSW Forum

CONTACT

FCC-PED-SoftwareAndComputing-Analysis