3.5. FCC-ee photon/pi0 discrimination using noble liquid calorimeter

Learning Objectives

This tutorial will teach you how to:

produce flat ntuples from full simulation samples using FCCAnalyses
infer state of the art deep learning within FCCAnalyses
produce discrimination plots single photons/pi0

3.5.1. Installation of FCCAnalyses

For this tutorial we will need to use FCCAnalyses. We can either use it from the stack, or if you have it installed locally use it. If you want to install it, you need to clone and install it. Go inside the area that you have setup for the tutorials and get the FCCAnalyses code:

git clone https://github.com/HEP-FCC/FCCAnalyses.git

Go inside the directory and run

source ./setup.sh
mkdir build install
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
make install
cd ..

3.5.2. Build FCCAnalyses skeleton

We start by creating a file named analysis_tutorial_mva.py (or any other name you prefer to use) with the following information.

# Mandatory: RDFanalysis class where the user defines the operations on the TTree
class RDFanalysis():
    #__________________________________________________________
    # Mandatory: analysers function to define the analysers to process, please make sure you return the last dataframe, in this example it is df2
    def analysers(df):
        df2 = (df

              )
        return df2

    #__________________________________________________________
    # Mandatory: output function, please make sure you return the branchlist as a python list
    def output():
        branchList = [

        ]
        return branchList

At the top of the file, we need to define special variables needed to load the DD4Hep geometry, but first we need to get the path where they are located from the python code:

# get the environment variable FCCDETECTORS
from os import getenv
FCCDETECTORS = getenv("FCCDETECTORS")
# define geometryFile and readoutName to load a DD4Hep detector and readout
geometryFile = FCCDETECTORS + "/Detector/DetFCCeeIDEA-LAr/compact/FCCee_DectMaster.xml"
readoutName  = "ECalBarrelPhiEta"

At the top of the file, also add tesFile= to point to where your files are located from the calorimeter full simulation tutorial. If you do not have those files, you can use those two:

testFile='/eos/experiment/fcc/ee/tutorial/pi0GammaLAr2022/edm4hepFormat_smallSampleNotUsedForTraining/output_caloFullSim_10GeV_pdgId_22_noiseFalse.root'
#testFile='/eos/experiment/fcc/ee/tutorial/pi0GammaLAr2022/edm4hepFormat_smallSampleNotUsedForTraining/output_caloFullSim_10GeV_pdgId_111_noiseFalse.root'

3.5.3. Get the highest energetic cluster

Let’s now define new columns to the RootDataFrame object df2 inside the function analysers of the class RDFanalysis. We start by defining the index in the collection CaloClusters which is of type ClusterData of the cluster with maximum energy:

.Define("maxEnergyCluster_index", "std::distance(CaloClusters.energy.begin(), std::max_element(CaloClusters.energy.begin(), CaloClusters.energy.end()))")

Similarly, define the index in the collection CaloClusters of the cluster with minimum energy, the total number of clusters in the event and the cluster energy.

Suggested answer

.Define("minEnergyCluster_index", "std::distance(CaloClusters.energy.begin(), std::min_element(CaloClusters.energy.begin(), CaloClusters.energy.end()))")
.Define("clusters_n", "CaloClusters.energy.size()")
.Define("clusters_energy", "CaloClusters.energy")

Now we need to add all the variables that we have defined to the branchList of the output function of the same RDFanalysis class.

Suggested answer

"maxEnergyCluster_index", "minEnergyCluster_index", "clusters_n", "clusters_energy"

And let’s run on a few events to check that everything is fine. To find the options to configure the code to run on a few (10) events using as input file the test file in the python testFile and to store an output file photons.root using the command:

fccanalysis run --help

Suggested answer

fccanalysis run analysis_tutorial_mva.py --nevents 10 --output photons.root --test

Open the produced root file photons.root and inspect it with Scan for example check that the index we have calculated indeed correspond to the clusters of maximum/minimum energy:

Suggested answer

root -l photons.root
events->Scan("maxEnergyCluster_index:minEnergyCluster_index:clusters_n:clusters_energy")

3.5.4. Get the list of cells from the cluster

We now need to obtain the index of the first and last cells of the maximum energy cluster. For that we need to select the maxEnergyCluster_index in the CaloClusters collection and evaluate hits_begin and hits_end:

Suggested answer

.Define("maxEnergyCluster_firstCell_index", "CaloClusters[maxEnergyCluster_index].hits_begin")
.Define("maxEnergyCluster_lastCell_index",  "CaloClusters[maxEnergyCluster_index].hits_end")

3.5.5. Create sub-collection of cells

Using the positions of the first and last cells in the PositionedCaloClusterCells collection we now need to create a sub-collection from the full collection between the two indices. To achieve this with the ROOT version in this tutorial, we need to declare some extra code after the definition of the analysers function:

ROOT.gInterpreter.Declare("""
template<typename T>
ROOT::VecOps::RVec<T> myRange(ROOT::VecOps::RVec<T>& vec, std::size_t begin, std::size_t end)
{
   ROOT::VecOps::RVec<T> ret;
   ret.reserve(end - begin);
   for (auto i = begin; i < end; ++i)
      ret.push_back(vec[i]);
   return ret;
}
""")

Note, With the root version used for this tutorial (ROOT v6.26) it is not possible to do this inline, while from ROOT v6.28 it will be possible using Take and Range:

Take(vec, Range(id_end - id_begin) + id_begin)

Now you can create the sub-collection maxEnergyCluster_Cells from the input collection PositionedCaloClusterCells from maxEnergyCluster_firstCell_index to maxEnergyCluster_lastCell_index using the newly defined myRange function:

Suggested answer

.Define("maxEnergyCluster_cells", "myRange(PositionedCaloClusterCells, maxEnergyCluster_firstCell_index, maxEnergyCluster_lastCell_index)")

3.5.6. Create collection of cells observables

Using the newly defined collection maxEnergyCluster_cells, create new variables of their energies, phi, theta, layer and number of cells, using functions like here

Suggested answer

.Define("maxEnergyCluster_cells_energy", "CaloNtupleizer::getCaloHit_energy(maxEnergyCluster_cells)" )
.Define("maxEnergyCluster_cells_phi",    "CaloNtupleizer::getCaloHit_phi(maxEnergyCluster_cells)" )
.Define("maxEnergyCluster_cells_theta",  "CaloNtupleizer::getCaloHit_theta(maxEnergyCluster_cells)" )
.Define("maxEnergyCluster_cells_layer" , "CaloNtupleizer::getCaloHit_layer(maxEnergyCluster_cells)" )
.Define("maxEnergyCluster_cells_n" ,     "maxEnergyCluster_cells.size()" )

The last variable to add is the radius of the cell position, you can compute it inline defining first collections maxEnergyCluster_cells_x and maxEnergyCluster_cells_y. Hint you need to use the myRange function PositionedCaloClusterCells and the position attribute of the collection which is of type edm4hep::CalorimeterHitData see here and you need to calculate the radius as usual with x and y.

Suggested answer

.Define("maxEnergyCluster_cells_x", "myRange(PositionedCaloClusterCells.position.x, maxEnergyCluster_firstCell_index, maxEnergyCluster_lastCell_index)")
.Define("maxEnergyCluster_cells_y", "myRange(PositionedCaloClusterCells.position.y, maxEnergyCluster_firstCell_index, maxEnergyCluster_lastCell_index)")
.Define("maxEnergyCluster_cells_radius", "sqrt(pow(maxEnergyCluster_cells_x,2)+pow(maxEnergyCluster_cells_y,2))")

Do not forget to add all the newly defined variables to the output branchList.

Suggested answer

"maxEnergyCluster_cells_energy","maxEnergyCluster_cells_phi","maxEnergyCluster_cells_theta","maxEnergyCluster_cells_layer","maxEnergyCluster_cells_n","maxEnergyCluster_cells_radius"

Let’s give it a try on a few events as last time.

Suggested answer

fccanalysis run analysis_tutorial_mva.py --nevents 10 --output photons.root --test

3.5.7. Adding the MVA inference

Now we add the weaver part at the beginning of the analysers function:

from ROOT import WeaverUtils
from os import getenv
test_inputs_path = getenv('TEST_INPUT_DATA_DIR', '/eos/experiment/fcc/ee/tutorial/PNet_pi0Gamma/v1')
weaver = WeaverUtils.setup_weaver(test_inputs_path + '/fccee_pi_vs_gamma_v1.onnx',
                                      test_inputs_path + '/preprocess_fccee_pi_vs_gamma_v1.json',
                                      ('recocells_e', 'recocells_theta', 'recocells_phi', 'recocells_radius', 'recocells_layer'))

We need to transform our collection of cells observables in a vector as we might want to evaluate several cluster in the same event:

# retrieve all information about jet constituents for each jet in collection
.Define("cells_e",         "Utils::as_vector(maxEnergyCluster_cells_energy)")
.Define("cells_theta",     "Utils::as_vector(maxEnergyCluster_cells_theta)")
.Define("cells_phi",       "Utils::as_vector(maxEnergyCluster_cells_phi)")
.Define("cells_radius",    "Utils::as_vector(maxEnergyCluster_cells_radius)")
.Define("cells_layer",     "Utils::as_vector(maxEnergyCluster_cells_layer)")

Then we evaluate the network:

 .Define("MVAVec", "WeaverUtils::get_weights(cells_e, cells_theta, cells_phi, cells_radius, cells_layer)")

and retrieve the weights

 .Define("Cluster_isPhoton", "WeaverUtils::get_weight(MVAVec, 0)")
 .Define("Cluster_isPi0",    "WeaverUtils::get_weight(MVAVec, 1)")

Finally add the newly defined variable to the branchList in the output function and run over the full statistics. Do not forget to change the testFile when switching pi0, photon

fccanalysis run analysis_tutorial_mva.py --output pi0s.root --test
fccanalysis run analysis_tutorial_mva.py --output photons.root --test

3.5.8. Removing the last layers

In this section we will remove the last layers of the calorimeter and evaluate the same MVA model with less layers. First let’s have a look at one the output file and try to find how many layers we have in the calorimeter. For that plot the number of layers for a few events and look at the histogram

Suggested answer

events->Draw("maxEnergyCluster_cells_layer","","",10)

Now you produce new file removing the last 2 layers of the calorimeter adding a filter on the cells. Need to comment one the definition of the cells collection and add the two lines below

#.Define("maxEnergyCluster_cells", "myRange(PositionedCaloClusterCells, maxEnergyCluster_firstCell_index, maxEnergyCluster_lastCell_index)")
.Define("maxEnergyCluster_cellsFull", "myRange(PositionedCaloClusterCells, maxEnergyCluster_firstCell_index, maxEnergyCluster_lastCell_index)")
.Define("maxEnergyCluster_cells", ROOT.CaloNtupleizer.sel_layers(0, 10),["maxEnergyCluster_cellsFull"])

and we run again (don’t forget to switch the testFile)

Suggested answer

fccanalysis run analysis_tutorial_mva.py --output pi0s_10layers.root --test
fccanalysis run analysis_tutorial_mva.py --output photons_10layers.root --test

3.5.9. Compare the performance

We get the following plotting scripts

wget https://raw.githubusercontent.com/HEP-FCC/fcc-tutorials/master/full-detector-simulations/FCCeeCaloPhotonPi0Discrimination/draw_rocCurve_pi0_gamma_GNN.py
wget https://raw.githubusercontent.com/HEP-FCC/fcc-tutorials/master/full-detector-simulations/FCCeeCaloPhotonPi0Discrimination/rocCurveFacility.py

Edit draw_rocCurve_pi0_gamma_GNN.py and edit the following:

#Where your files are
input_file_path = ""
#name of your files
pi0_file = os.path.join(input_file_path, "pi0s.root")
photon_file = os.path.join(input_file_path, "photons.root")
#output name
output_string_suffix = ""

and run first with the full layers

python draw_rocCurve_pi0_gamma_GNN.py

change the name to use the 10 layers files

#name of your files
pi0_file = os.path.join(input_file_path, "pi0s_10layers.root")
photon_file = os.path.join(input_file_path, "photons_10layers.root")
#output name
output_string_suffix = "_10layers"

and run again with only 10 layers

python draw_rocCurve_pi0_gamma_GNN.py

look at the produced plot and compare the performance.