FCCANALYSIS-SCRIPT(7) fccanalysis-script man page FCCANALYSIS-SCRIPT(7)

fccanalysis-script – analysis steering script specification

*

The analysis script is expected to be a valid Python script containing either part of the analysis or the full analysis. There are two basic ways to run an analysis. The first is the managed mode:

fccanalysis run analysis_script.py

or

fccanalysis final analysis_script.py

where the user needs to provide only a minimal set of variables and settings. In this mode the RDataFrame is managed for the user and can be controlled by defining several global attributes in the analysis script. The other mode is to run the analysis script as a regular Python script:

python analysis_script.py

Here the user has full control over the RDataFrame, but has to create all the necessary scaffolding.
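
As a rough illustration of the scaffolding the regular Python mode implies, the sketch below creates the RDataFrame by hand. It is not taken from the framework: the function name run_standalone, the tree name "events" and the default of four threads are assumptions made for this example, and only standard ROOT calls are used.

```python
def run_standalone(input_path, n_threads=4, tree_name="events"):
    """Count events and sum a unit weight with a hand-built RDataFrame.

    Hypothetical helper: the name, the tree name and the thread count are
    assumptions for illustration, not part of the framework.
    """
    # ROOT is imported lazily so this module loads even where ROOT is absent.
    import ROOT

    # In this mode the user enables multithreading explicitly.
    ROOT.EnableImplicitMT(n_threads)

    # The user also opens the input and builds the dataframe.
    df = ROOT.RDataFrame(tree_name, input_path)
    df = df.Define("weight", "1.0")

    # Book both results before triggering the event loop, so that
    # a single pass over the input fills them together.
    n_events = df.Count()
    weightsum = df.Sum("weight")
    return n_events.GetValue(), weightsum.GetValue()
```

A script built this way would be started with `python analysis_script.py` and would call `run_standalone` on an input file of the user's choice.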

It is expected that the whole analysis will be split into several stages, which can be done in one of two styles:

anascript_stage1.py -> anascript_stage2.py -> ... -> anascript_stage_final.py -> plots.py

or

analysis_histmaker.py -> plots.py

In the case of the first style there are at least three stages required (anascript_stage1.py, anascript_stage_final.py, plots.py) and there is no upper limit on the number of stages. In the case of the second style only two stages are required. The first style is named "staged analysis" and the second "histmaker analysis".

In a staged analysis, the analysis script needs to contain an RDFanalysis class of the following structure:

    class RDFanalysis():
        def analysers(df):
            df2 = (
                df
                # define the muon collection
                .Define("muons", "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
                ...
            )
            return df2

        def output():
            return ["muons", "muon_mass"]

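
The contract between the two methods can be sketched without ROOT. In the example below, RecordingFrame is a hypothetical stand-in for an RDataFrame node, run_stage is a hypothetical driver, and the expression used for "muon_mass" is an assumption. The methods are called on the class itself, which the missing self argument suggests but the text does not state.

```python
class RecordingFrame:
    """Hypothetical stand-in for an RDataFrame node; it only records Define calls."""

    def __init__(self, columns=()):
        self.columns = tuple(columns)

    def Define(self, name, expression):
        # Like RDataFrame.Define, return a new node instead of mutating this one.
        return RecordingFrame(self.columns + ((name, expression),))


class RDFanalysis:
    # No 'self': in this sketch the methods are called on the class itself.
    def analysers(df):
        df2 = (
            df
            # define the muon collection
            .Define("muons", "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
            # assumed expression, added only so that "muon_mass" is defined
            .Define("muon_mass", "ReconstructedParticle::get_mass(muons)")
        )
        return df2

    def output():
        return ["muons", "muon_mass"]


def run_stage(analysis, df):
    """Build the graph, then check that every requested column was defined."""
    df_out = analysis.analysers(df)
    defined = {name for name, _ in df_out.columns}
    missing = [branch for branch in analysis.output() if branch not in defined]
    if missing:
        raise ValueError(f"output() lists undefined columns: {missing}")
    return df_out, analysis.output()


df_out, branches = run_stage(RDFanalysis, RecordingFrame())
print(branches)
```

The check in run_stage reflects the relationship the structure implies: every name returned by output() should be a column that analysers() defined or that already exists in the input.
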

In a histmaker analysis, the analysis script needs to contain a build_graph function of the following structure:

    def build_graph(df, dataset):
        results = []
        df = df.Define("weight", "1.0")
        weightsum = df.Sum("weight")
        df = df.Define("muons", "FCCAnalyses::ReconstructedParticle::sel_p(20)(muons_all)")
        ...
        results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
        return results, weightsum

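
To make the return values concrete, here is a minimal sketch that runs a build_graph function against a fake dataframe. FakeFrame, the binning in bins_p_mu, the dataset name and the expressions used for "muons_all" and "muons_p" are all assumptions for illustration. With a real RDataFrame, Sum and Histo1D return lazy result handles rather than tuples.

```python
class FakeFrame:
    """Hypothetical stand-in for RDataFrame that records what gets booked."""

    def __init__(self, columns=()):
        self.columns = tuple(columns)

    def Define(self, name, expression):
        return FakeFrame(self.columns + (name,))

    def Sum(self, column):
        return ("Sum", column)

    def Histo1D(self, model, column):
        # model is (name, title, nbins, low, high)
        return ("Histo1D", model[0], model[2:], column)


# Assumed binning: (number of bins, lower edge, upper edge)
bins_p_mu = (250, 0.0, 250.0)


def build_graph(df, dataset):
    results = []
    df = df.Define("weight", "1.0")
    weightsum = df.Sum("weight")
    # The next three expressions are assumptions filling in the elided steps.
    df = df.Define("muons_all", "FCCAnalyses::ReconstructedParticle::get(Muon0, ReconstructedParticles)")
    df = df.Define("muons", "FCCAnalyses::ReconstructedParticle::sel_p(20)(muons_all)")
    df = df.Define("muons_p", "FCCAnalyses::ReconstructedParticle::get_p(muons)")
    # The * unpacks the binning into the histogram model tuple.
    results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
    return results, weightsum


results, weightsum = build_graph(FakeFrame(), "hypothetical_dataset")
print(results)
print(weightsum)
```

The function returns two things: the list of booked histograms and the sum of weights.
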

The plotting stage requires neither the RDFanalysis class nor the build_graph function. It has its own set of attributes; please see the examples in the examples directory.

When running FCCAnalyses in the managed mode, the user can use the following global attributes to control the behavior of the analysis.

Dictionary of process samples to be run over. Each process can have several parameters:
    fraction
        The analysis will run over a reduced number of input files, roughly corresponding to the specified fraction of the total events.
        Default value: 1 (full process sample)
    output
        Specifies the stem for the output file(s). The stem is used to create an output directory if there is more than one chunk, or as a filename if there is only one.
        Default value: output.root
    chunks
        The analysis RDataFrame can be split into several chunks.
        Default value: 1
Provides information on where to find the input files. There are several ways to find this information; one of them uses a YAML file that is searched for in the subfolders of $FCCDICTSDIR.
The user can specify the directory for the output files. The output directory can be overridden by specifying an absolute path with the `--output` command-line argument.
Optional name for the analysis
Default value: empty string
Number of threads the RDataFrame will use.
Default value: 4
Run the analysis on the HTCondor batch system.
Default value: False
Batch queue name when running on HTCondor.
Default value: "longlunch"
Computing account when running on HTCondor.
Default value: "group_u_FCC.local_gen"
Output directory on EOS. If specified, files will be copied there once the batch job is done.
Default value: empty string
Type of the EOS proxy to be used.
Default value: empty string
Location of the test file.
Default value: empty string
The computational graph of the analysis will be generated.
Default value: False
Location where the computational graph of the analysis should be stored. Only paths with .dot and .png extensions are accepted.
Default value: empty string
The user can specify how physical object collections are provided to the analyzers. If True, the events are loaded through the EDM4hep RDataSource.
Default value: False
Whether to use weighted or raw events. If True, the events are weighted with EDM4hep's EventHeader.weight and all normalisation factors are calculated from the sum of weights accordingly.
Default value: False
This variable controls which process dictionary will be used. It can be a simple file name, an absolute path, or a URL. In the case of a simple file name, the file is searched for first in the working directory and then at the locations indicated in the $FCCDICTSDIR environment variable.
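
Some of the rules above can be sketched in plain Python: the per-process defaults, the accepted graph path extensions, and the process dictionary lookup order. This is an illustration, not the framework's implementation. The helper names, the sample names and the attribute name processList are assumptions, since the attribute names are not shown in this text. Reading $FCCDICTSDIR as a list separated by the platform path separator is also an assumption.

```python
import os
from urllib.parse import urlparse

# Per-process defaults as listed in the text above.
PROCESS_DEFAULTS = {"fraction": 1, "output": "output.root", "chunks": 1}


def with_process_defaults(process_list):
    """Return a copy of the process dictionary with unset parameters filled in."""
    return {
        name: {**PROCESS_DEFAULTS, **params}
        for name, params in process_list.items()
    }


def is_valid_graph_path(path):
    """Accept only the .dot and .png extensions."""
    return os.path.splitext(path)[1] in (".dot", ".png")


def resolve_proc_dict(name, search_dirs=None):
    """Resolve a process dictionary given as URL, absolute path or file name."""
    # URLs and absolute paths are used as given.
    if urlparse(name).scheme in ("http", "https") or os.path.isabs(name):
        return name
    if search_dirs is None:
        # Assumption: several locations are separated like PATH entries.
        search_dirs = os.environ.get("FCCDICTSDIR", "").split(os.pathsep)
    # Working directory first, then the configured locations.
    for directory in [os.getcwd(), *search_dirs]:
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical sample names; only the parameter keys come from the text.
processList = {
    "sample_a": {"fraction": 0.1},
    "sample_b": {"chunks": 4, "output": "sample_b_stage1"},
}
filled = with_process_defaults(processList)
print(filled["sample_a"])
print(is_valid_graph_path("graph.png"), is_valid_graph_path("graph.pdf"))
```

Returning None for a file that cannot be found leaves the error handling to the caller; the real tool may behave differently.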

This section is under construction. You are invited to help :)

fccanalysis(1), fccanalysis-run(1)

Many

There are many contributors to the FCCAnalyses framework, but the principal authors are:
Clement Helsens
Valentin Volkl
Gerardo Ganis

Part of the FCCAnalyses framework.

FCCAnalyses webpage

FCCAnalyses GitHub repository

FCCSW Forum

FCC-PED-SoftwareAndComputing-Analysis

17 Jan 2024 0.10.0