fccanalysis-script – analysis steering script specification
The analysis script is expected to be a valid Python script
containing either part of or the full analysis. There are two basic ways
to run an analysis. The first is the managed mode:
- fccanalysis run analysis_script.py
or
- fccanalysis final analysis_script.py
where the user needs to provide a set of required attributes and
definitions in their analysis script. In this mode the RDataFrame is managed
for the user, and the analysis behavior can be controlled by providing
additional attributes in the analysis script. The other mode is to run the
analysis script as a regular Python script:
- python analysis_script.py
Here the user has full control over the RDataFrame, but has to create
all necessary scaffolding.
Usually the whole analysis is split into several stages, which can be
done in one of two styles:
- anascript_stage1.py -> anascript_stage2.py -> ... ->
anascript_stage_final.py -> plots.py
or
- analysis_histmaker.py -> plots.py
In the first style at least three stages are required (e.g.
anascript_stage1.py, anascript_stage_final.py, plots.py), and there is
no upper limit on the number of stages. In the second style only two
stages are required. The first style is named "staged style" and the
second "histmaker style".
- Staged style analysis
- The analysis script needs to contain an Analysis class with the
following structure:
    class Analysis():
        def __init__(self, cmdline_args):
            ...

        def analyzers(self, dframe):
            dframe2 = (
                dframe
                # define the muon collection
                .Define("muons",
                        "ReconstructedParticle::get(Muon0, ReconstructedParticles)")
                ...
            )
            return dframe2

        def output(self):
            return ["muons", "muon_mass"]
- Histmaker style analysis
- The analysis script needs to contain a build_graph function with the
following structure:
    def build_graph(df, dataset):
        results = []
        df = df.Define("weight", "1.0")
        weightsum = df.Sum("weight")
        df = df.Define("muons",
                       "FCCAnalyses::ReconstructedParticle::sel_p(20)(muons_all)")
        ...
        results.append(df.Histo1D(("muons_p_cut0", "", *bins_p_mu), "muons_p"))
        return results, weightsum
- Final and Plots stages
- These stages require neither the Analysis class nor the build_graph
function; they have their own set of attributes. Please see the examples
directory of the FCCAnalyses repository or the
fccanalysis-final-script(7) and fccanalysis-plots-script(7) manual pages.
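The distinction between the styles comes down to which top-level definition a script contains. The following is a minimal sketch of a pre-flight check built on that observation, using only the Python standard library. The helper name detect_style and its return labels are hypothetical, and this is not necessarily how fccanalysis itself decides how to treat a script.

```python
import ast


def detect_style(source: str) -> str:
    """Classify a script by its top-level definitions (hypothetical helper)."""
    tree = ast.parse(source)
    # Only top-level statements are inspected; nested definitions are ignored.
    has_analysis_class = any(
        isinstance(node, ast.ClassDef) and node.name == "Analysis"
        for node in tree.body
    )
    has_build_graph = any(
        isinstance(node, ast.FunctionDef) and node.name == "build_graph"
        for node in tree.body
    )
    if has_analysis_class:
        return "staged"
    if has_build_graph:
        return "histmaker"
    # Final and plots scripts need neither definition.
    return "final-or-plots"


# Tiny stand-in scripts, written only for this illustration.
staged_src = "class Analysis():\n    def output(self):\n        return ['muons']\n"
histmaker_src = "def build_graph(df, dataset):\n    return [], None\n"
other_src = "some_attribute = 1\n"

print(detect_style(staged_src))
print(detect_style(histmaker_src))
print(detect_style(other_src))
```

Parsing the source with ast rather than importing the script avoids executing any analysis code, which keeps such a check cheap and free of side effects.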
When running FCCAnalyses in the staged style, the user can add the
following attributes to the Analysis class in order to control the
behavior of the analysis; those marked mandatory are required.
- process_list (mandatory)
- Dictionary of process samples to be run over. Each process can have
several parameters:
  fraction
    The analysis will run over a reduced number of input files, roughly
    corresponding to the specified fraction of the total events.
    Default value: 1 (full process sample)
  input_dir
    Location of the process, if different from the globally set one.
    This overrides both the prod_tag and input_dir attributes of the
    analysis script.
    Default value: None
  output
    Stem for the output file(s). The stem is used to create an output
    directory if there is more than one chunk, or as a filename if there
    is only one.
    Default value: output.root
  chunks
    Number of output chunks the analysis is split into (useful on
    distributed systems).
    Default value: 1
- prod_tag (mandatory)
- Provides information on where to find the input files. There are several
ways to locate this information; one of them uses a YAML file that is
searched for in the subfolders of $FCCDICTSDIR.
- output_dir (mandatory)
- Directory for the output files.
- analysis_name (optional)
- Optional name for the analysis.
Default value: empty string
- n_threads (optional)
- Number of threads the RDataFrame will use during processing. To use all
threads available on the machine, specify -1.
Default value: 1
- include_paths (optional)
- Additional header files to be included and JIT compiled.
Default value: None
- batch_queue (optional)
- Batch queue name when running on HTCondor.
Default value: "longlunch"
- comp_group (optional)
- Computing account when running on HTCondor.
Default value: "group_u_FCC.local_gen"
- output_dir_eos (optional)
- Output directory on EOS. If specified, files will be copied there once the
batch job is done.
Default value: empty string
- eos_type (optional)
- Type of the EOS proxy to be used.
Default value: user
- test_file (optional)
- Location of the test file, provided as a string (str) or a template
(string.Template). In the case of a template, two variables are
supported: key4hep_os and key4hep_stack.
Default value: empty string
- graph (optional)
- If True, the computational graph of the analysis will be generated.
Default value: False
- graph_path (optional)
- Location where the computational graph of the analysis should be stored.
Only paths with .dot and .png extensions are accepted.
Default value: empty string
- use_data_source (optional)
- Controls how physical object collections are provided to the
analyzers. If True, the events will be loaded through the EDM4hep
RDataSource.
Default value: False
- do_weighted (optional)
- Whether to use weighted or raw events. If True, the events will be
weighted with EDM4hep's EventHeader.weight and all normalisation factors
will be calculated from the sum of weights accordingly.
Default value: False
- proc_dict
- This variable controls which process dictionary will be used. It can be
a simple file name, an absolute path, or a URL. In the case of a simple
file name, the file is searched for first in the working directory
and then at the locations indicated in the $FCCDICTSDIR environment
variable.
- n_events_max
- Limits the number of events to be processed. The limit is applied
per sample.
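To illustrate how the attributes described above fit together, the following is a minimal sketch of the attribute part of a staged style Analysis class. The sample name, production tag, directories, and test file path are placeholders invented for this example, not values taken from this page, and the analyzers and output methods are omitted. The final lines only show how a string.Template with the two supported variables expands in the Python standard library; the substituted values are made up, and how fccanalysis fills them in is not shown here.

```python
from string import Template


class Analysis():
    """Attribute part of a staged style analysis (sketch, placeholder values)."""

    def __init__(self, cmdline_args):
        # Mandatory: processes to run over, with optional per-process parameters.
        self.process_list = {
            # Placeholder sample name.
            "example_sample": {
                "fraction": 0.1,          # roughly 10% of the total events
                "chunks": 2,              # split the output into two chunks
                "output": "example_out",  # stem for the output file(s)
            },
        }
        # Mandatory: where to find the input files (placeholder value).
        self.prod_tag = "example/prod/tag"
        # Mandatory: directory for the output files (placeholder value).
        self.output_dir = "outputs/example/stage1"

        # Optional attributes.
        self.analysis_name = "Example analysis"
        self.n_threads = -1  # use all threads available on the machine
        self.graph = True
        # Only .dot and .png extensions are accepted.
        self.graph_path = "outputs/example/graph.png"
        # Template using the two supported variables (placeholder path).
        self.test_file = Template(
            "/placeholder/${key4hep_os}/${key4hep_stack}/test.root"
        )


analysis = Analysis(cmdline_args=None)

# Standard-library expansion of the template with made-up values.
expanded = analysis.test_file.substitute(
    key4hep_os="example_os", key4hep_stack="example_stack"
)
print(expanded)
```

Because chunks is larger than one in this sketch, the output stem would name a directory rather than a single file, as described for the output parameter.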
This section is under construction. You are invited to help :)
fccanalysis(1), fccanalysis-run(1)
There are many contributors to the FCCAnalyses framework, but the
principal authors are:
Clement Helsens
Valentin Volkl
Gerardo Ganis
Part of the FCCAnalyses framework.