fccanalysis-script – analysis steering script specification
The analysis script is expected to be a valid Python script
containing either part of the analysis or the full analysis. There are two
basic ways to run an analysis. The first is the managed mode:
- fccanalysis run analysis_script.py
or
- fccanalysis final analysis_script.py
In this mode the user provides only a minimal set of variables and
settings. The RDataFrame is managed for the user and can be
controlled by defining several global attributes in the analysis script. The
other mode is to run the analysis script as a regular Python script:
- python analysis_script.py
Here the user has full control over the RDataFrame, but has to create
all the necessary scaffolding.
The whole analysis is expected to be split into several stages, in one
of two styles:
- anascript_stage1.py -> anascript_stage2.py -> ... -> anascript_stage_final.py -> plots.py
or
- analysis_histmaker.py -> plots.py
The first style requires at least three stages (anascript_stage1.py,
anascript_stage_final.py, plots.py) and has no upper limit on the number of
stages. The second style requires only two stages. The first style is called
"staged analysis" and the second "histmaker analysis".
- staged analysis
  The analysis script needs to contain an RDFanalysis class with the
  following structure:

      class RDFanalysis():
          def analysers(df):
              df2 = (
                  df
                  # define the muon collection
                  .Define("muons", "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
                  ...
              )
              return df2

          def output():
              return ["muons", "muon_mass"]
- histmaker analysis
  The analysis script needs to contain a build_graph function with the
  following structure:

      def build_graph(df, dataset):
          results = []
          df = df.Define("weight", "1.0")
          weightsum = df.Sum("weight")
          df = df.Define("muons", "FCCAnalyses::ReconstructedParticle::sel_p(20)(muons_all)")
          ...
          results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
          return results, weightsum
- plots script
  This stage requires neither the RDFanalysis class nor the build_graph
  function. It has its own set of attributes; see the examples in the
  examples directory.
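
To make the two entry points above more concrete, the following sketch exercises them with a stand-in dataframe, so that it runs without ROOT. Everything not shown in the snippets above is an assumption made for illustration: `RecordingFrame` is a hypothetical stub that only records calls (the real object is a ROOT RDataFrame), the `placeholder_...` expressions are not framework functions, and the binning in `bins_p_mu` is an invented value. The check at the end reflects the presumed contract that the columns listed by `output()` exist in the dataframe returned by `analysers()`.

```python
class RecordingFrame:
    """Hypothetical stand-in for an RDataFrame: it only records calls."""

    def __init__(self, columns=()):
        self.columns = list(columns)

    def Define(self, name, expression):
        # Return a new frame so that calls can be chained.
        return RecordingFrame(self.columns + [name])

    def Sum(self, column):
        return ("Sum", column)

    def Histo1D(self, model, column):
        return ("Histo1D", model, column)


# Staged style: the methods take no self and are called on the class itself.
class RDFanalysis:
    def analysers(df):
        df2 = (
            df
            .Define("muons", "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
            # Placeholder expression, not a framework function.
            .Define("muon_mass", "placeholder_mass(muons)")
        )
        return df2

    def output():
        return ["muons", "muon_mass"]


# Histmaker style. Assumed binning: (number of bins, lower edge, upper edge).
bins_p_mu = (250, 0, 250)


def build_graph(df, dataset):
    results = []
    df = df.Define("weight", "1.0")
    weightsum = df.Sum("weight")
    # Placeholder expression, not a framework function.
    df = df.Define("muons_p", "placeholder_momentum(muons)")
    # The starred tuple expands into (name, title, nbins, low, high).
    results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
    return results, weightsum


staged = RDFanalysis.analysers(RecordingFrame())
missing = [c for c in RDFanalysis.output() if c not in staged.columns]
results, weightsum = build_graph(RecordingFrame(), "hypothetical_dataset")
print("output columns missing from the dataframe:", missing)
print("histogram model:", results[0][1])
```

In a real analysis the stub is not needed: in the managed mode the framework supplies the dataframe, and only the class or the function has to be present in the script.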
When running FCCAnalyses in the managed mode, the user can use the
following global attributes to control the behavior of the analysis.
- processList (mandatory)
  Dictionary of process samples to be run over. Each process can have
  several parameters:
  fraction
    The analysis will run over a reduced number of input files, roughly
    corresponding to the specified fraction of the total events.
    Default value: 1 (full process sample)
  output
    Specifies the stem for the output file(s). The stem is used to create
    an output directory if there is more than one chunk, or as a filename
    if there is only one.
    Default value: output.root
  chunks
    The analysis RDataFrame can be split into several chunks.
    Default value: 1
- prodTag (mandatory)
  Provides information on where to find the input files. There are several
  ways to find this information; one of them uses a YAML file, which is
  searched for in the subfolders of $FCCDICTSDIR.
- outputDir (mandatory)
  Directory for the output files. The output directory can be overridden
  by specifying an absolute path with the `--output` command-line argument.
- analysisName (optional)
  Optional name for the analysis.
  Default value: empty string
- nCPUS (optional)
  Number of threads the RDataFrame will use.
  Default value: 4
- runBatch (optional)
  Run the analysis on the HTCondor batch system.
  Default value: False
- batchQueue (optional)
  Batch queue name when running on HTCondor.
  Default value: "longlunch"
- compGroup (optional)
  Computing account when running on HTCondor.
  Default value: "group_u_FCC.local_gen"
- outputDirEos (optional)
  Output directory on EOS. If specified, files will be copied there once
  the batch job is done.
  Default value: empty string
- eosType (mandatory if outputDirEos is used)
  Type of the EOS proxy to be used.
  Default value: empty string
- testFile (optional)
  Location of the test file.
  Default value: empty string
- graph (optional)
  If True, the computational graph of the analysis will be generated.
  Default value: False
- graphPath (optional)
  Location where the computational graph of the analysis should be stored.
  Only paths with .dot and .png extensions are accepted.
  Default value: empty string
- useDataSource (optional)
  Specifies how physics object collections are provided to the analyzers.
  If True, the events will be loaded through the EDM4hep RDataSource.
  Default value: False
- do_weighted (optional)
  Whether to use weighted or raw events. If True, the events will be
  weighted with EDM4hep's EventHeader.weight and all normalisation factors
  will be calculated from the sum of weights accordingly.
  Default value: False
- procDict
  Controls which process dictionary will be used. It can be either a
  simple file name, an absolute path or a URL. In the case of a simple
  file name, the file is searched for first in the working directory and
  then at the locations indicated in the $FCCDICTSDIR environment variable.
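
As an illustration of how the attributes above fit together, here is a minimal sketch of the global part of a managed-mode analysis script. The sample names, the production tag, and the paths are hypothetical placeholders and need to be replaced with values valid for the actual setup. The final check only mirrors the rule stated for graphPath and is not part of the framework.

```python
import os

# Mandatory attributes.
processList = {
    # Hypothetical sample names.
    # Run over roughly 10% of the events, split into two chunks; with more
    # than one chunk the output stem is used as a directory name.
    "hypothetical_sample_A": {
        "fraction": 0.1,
        "chunks": 2,
        "output": "sample_A_reduced",
    },
    # Run over the full sample; the remaining parameters keep their defaults.
    "hypothetical_sample_B": {"fraction": 1},
}
prodTag = "hypothetical/production/tag"
outputDir = "outputs/stage1"

# Optional attributes.
nCPUS = 4
runBatch = False
graph = True
graphPath = "outputs/stage1/graph.dot"

# Illustrative sanity check: only .dot and .png paths are accepted.
graph_path_ok = os.path.splitext(graphPath)[1] in (".dot", ".png")
print("graphPath extension accepted:", graph_path_ok)
```

Because these are ordinary module-level Python variables, they can also be computed, for example by building processList in a loop over sample names.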
This section is under construction. You are invited to help :)
See also: fccanalysis(1), fccanalysis-run(1)
There are many contributors to the FCCAnalyses framework, but the
principal authors are:
Clement Helsens
Valentin Volkl
Gerardo Ganis
Part of the FCCAnalyses framework.