MC Generators in CMS

1 - Matrix Element Generator

Overview

Teaching: 20 min
Exercises: 40 min
Questions
  • What is a Monte Carlo event generator?

  • Why are we using simulated samples in CMS?

  • How are the simulated samples produced in CMS?

Objectives
  • Running standalone MadGraph with simple Z boson process

  • Producing MadGraph gridpacks using CMS script

  • Analyzing LHE level information

Introduction and first steps

Although quite old, link is great reading material for a general overview of Monte Carlo event generators. Monte Carlo event generators are essential components of almost all experimental analyses, and both theorists and experimentalists use them widely to make predictions and to prepare for future experiments. This is one of the areas where CMS experimentalists and theorists are most closely connected: theorists provide predictions and experimentalists verify them with actual data. Although Monte Carlo event generators are extremely important tools in HEP, they are often used as black boxes whose output we more or less treat as “data”. Our aim is to get a minimal background on how these tools work and to analyze their output using generator-level information.

Samples used by the CMS experiment go through several steps of simulation :

  1. Monte Carlo event generator
  2. Detector simulation
  3. Pileup mixing
  4. Trigger emulation
  5. Object reconstruction

We focus on “1. Monte Carlo event generator” in this tutorial. A Monte Carlo event generator can be further divided into several pieces, as the steps factorize and can be handled through separate calculations :

  1. Parton distribution function (PDF)
  2. Hard scattering (matrix element calculation)
  3. Parton shower & hadronization

First of all, the LHC is a proton-proton collider, hence we need information on how partons (quarks and gluons) are distributed in the proton (PDF). Hard scattering is the part where calculations can be treated perturbatively: the interaction of the incoming partons with the largest momentum transfer (usually the physics process we are interested in). Parton shower & hadronization further describe how the particles involved in the hard scattering evolve, working downwards to lower momentum scales, even to the point where perturbative calculations break down.
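The factorization of the PDF and the hard scattering described above is commonly written as the standard collinear factorization formula. This is a textbook expression added for orientation; the symbols are not taken from the tutorial itself: f_a and f_b are the PDFs of partons a and b, x_1 and x_2 are their momentum fractions, s is the squared proton-proton center-of-mass energy, and mu_F and mu_R are the factorization and renormalization scales.

```latex
\sigma_{pp \to X}
  = \sum_{a,b} \int_0^1 dx_1 \int_0^1 dx_2 \;
    f_a(x_1, \mu_F^2)\, f_b(x_2, \mu_F^2)\;
    \hat{\sigma}_{ab \to X}\!\left(x_1 x_2 s;\ \mu_F^2, \mu_R^2\right)
```

The partonic cross section \hat{\sigma} is the piece that a matrix element generator computes; the parton shower and hadronization then act on the final state X.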

(1) Standalone : DY to ee

In the first exercise, we will run one of the most widely used tools for hard scattering calculations, MadGraph5_aMCatNLO, in short MadGraph link. MadGraph can perform calculations for many different physics processes (both SM and BSM) at LO or NLO in QCD. Because of its easy user interface and its flexibility with UFO models, you can test a wide variety of physics models. We will first see how MadGraph runs interactively in standalone mode, using a simple DYtoee process as an example.

Before we proceed, make sure you have completed the steps described in the “Setup” section. We ended the “Setup” section with the commands below, configuring MadGraph with several settings.

cd ${GENTUTPATH}/standalone-tut/MG5_aMC_v2_9_18/
cp -r ${GENTUTPATH}/generators-cmsdaslpc2024-git/standalone ./
./bin/mg5_aMC standalone/setup.config

Through this, we restrict ourselves to using at most 2 cores with set nb_core 2; otherwise MadGraph will interfere with other people’s running jobs. We also installed ninja and collier, which are tools that MadGraph uses for NLO calculations.

Before going further, go into input/mg5_configuration.txt and change # text_editor = None by removing # and replacing None with your favorite text editor. For example, text_editor = vim.

Now launch MadGraph prompt shell by doing :

./bin/mg5_aMC

Now let’s try with the simplest DYtoee example.

import model sm
generate p p > e+ e-
output standalone-drellyan-mll50

The first line tells MadGraph that you would like to use the UFO model named sm for the calculations. The second line defines which physics process to generate; in this particular example you are asking for the process where two “quarks from the protons” produce a Z/gamma* mediator, which then decays into an electron and a positron. Keep in mind that the calculations are performed on “two quarks” and not on “two protons”. The information that translates “protons -> quarks” comes from the PDF. The last line sets the output directory for the computation results; mll50 indicates the dilepton mass phase space cut that we are about to apply in a few minutes.

Now launch!

launch

After MadGraph has found all the Feynman diagrams that you targeted, it asks you several questions as shown below. Press tab to turn off the timer (otherwise, MadGraph will move on by itself after 60 seconds).

/===========================================================================\
| 1. Choose the shower/hadronization program     shower = Not Avail.        |
| 2. Choose the detector simulation program    detector = Not Avail.        |
| 3. Choose an analysis package (plot/convert) analysis = ExRoot            |
| 4. Decay onshell particles                    madspin = OFF               |
| 5. Add weights to events for new hypp.       reweight = Not Avail.        |
\===========================================================================/

As we did not install any shower or detector simulation program, they are in the Not Avail. state. We will learn later how showering is done in CMSSW and run a brief analysis/histogramming code on the events we produce in this tutorial. ExRootAnalysis (analysis = ExRoot) is installed so that we can later use it to convert LHE files to ROOT files and draw histograms. madspin will be demonstrated later using a top pair process in the third (optional) exercise. reweight is out of scope for this tutorial, although it is quite useful for certain BSM scenarios.

Let’s move on by pressing ENTER.

You can see that MadGraph is asking you several questions as shown below. Again, press tab to turn off the timer (otherwise, MadGraph will move on by itself after 90 seconds).

Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
|  1. param : param_card.dat                                 |
|  2. run   : run_card.dat                                   |
\------------------------------------------------------------/
 you can also
   - enter the path to a valid card or banner.
   - use the 'set' command to modify a parameter directly.
     The set option works only for param_card and run_card.
     Type 'help set' for more information on this command.
   - call an external program (ASperGE/MadWidth/...).
     Type 'help' for the list of available command
 [0, done, 1, param, 2, run, enter path][90s to answer] 

Let’s take a look at the cards and see how the values are set, press 1 and ENTER to investigate the parameter settings.

###################################
## INFORMATION FOR MASS
###################################
Block mass
    5 4.700000e+00 # MB 
    6 1.730000e+02 # MT 
   15 1.777000e+00 # MTA 
   23 9.118800e+01 # MZ 
   25 1.250000e+02 # MH

...

###################################
## INFORMATION FOR DECAY
###################################
DECAY   6 1.491500e+00 # WT 
DECAY  23 2.441404e+00 # WZ 
DECAY  24 2.047600e+00 # WW 
DECAY  25 6.382339e-03 # WH 

Next, press 2 and ENTER to investigate the run settings.

#*********************************************************************
# Number of events and rnd seed                                      *
# Warning: Do not generate more than 1M events in a single run       *
#*********************************************************************
  10000 = nevents ! Number of unweighted events requested
  0   = iseed   ! rnd seed (0=assigned automatically=default))

...

#*********************************************************************
# Collider type and energy                                           *
# lpp: 0=No PDF, 1=proton, -1=antiproton,                            *
#                2=elastic photon of proton/ion beam                 *
#             +/-3=PDF of electron/positron beam                     *
#             +/-4=PDF of muon/antimuon beam                         *
#*********************************************************************
     1        = lpp1    ! beam 1 type
     1        = lpp2    ! beam 2 type
     6500.0     = ebeam1  ! beam 1 total energy in GeV
     6500.0     = ebeam2  ! beam 2 total energy in GeV

...

#*********************************************************************
# Standard Cuts                                                      *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut)                *
#*********************************************************************
 10.0  = ptl       ! minimum pt for the charged leptons
 -1.0  = ptlmax    ! maximum pt for the charged leptons
 {} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
 {}     = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})

...

#*********************************************************************
# Minimum and maximum invariant mass for pairs                       *
#*********************************************************************
 0.0   = mmll    ! min invariant mass of l+l- (same flavour) lepton pair
 -1.0  = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
 {} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
 {'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
                       ! to pairs of particle/antiparticle and not to pairs of the same pdg codes.

...

#*********************************************************************
# maximal pdg code for quark to be considered as a light jet         *
# (otherwise b cuts are applied)                                     *
#*********************************************************************
 4 = maxjetflavor    ! Maximum jet pdg code

Try editing the beam energies (ebeam1 and ebeam2) from 6500 to 6800, as the LHC is now running at a center-of-mass energy of 13.6 TeV (6.8 TeV per beam). When done editing, save the changes and exit the text editor.
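The run card uses a `value = name ! comment` layout, so the same edit can also be scripted. The following is a minimal Python sketch under that assumption; `set_card_value` is a hypothetical helper written for this illustration and is not part of MadGraph.

```python
import re


def set_card_value(card_text, name, value):
    """Return card_text with the value of parameter `name` replaced.

    Assumes run-card lines of the form `  6500.0  = ebeam1  ! comment`.
    """
    pattern = re.compile(
        r"^([ \t]*)(\S+)([ \t]*=[ \t]*" + re.escape(name) + r"\b)",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", card_text
    )
    if count != 1:
        raise KeyError(f"expected exactly one '{name}' entry, found {count}")
    return new_text


# Two lines copied from the run card shown above.
card = """\
     6500.0     = ebeam1  ! beam 1 total energy in GeV
     6500.0     = ebeam2  ! beam 2 total energy in GeV
"""

for beam in ("ebeam1", "ebeam2"):
    card = set_card_value(card, beam, 6800.0)

print(card)
# The center-of-mass energy is the sum of the two beam energies.
print("sqrt(s) =", 6800.0 + 6800.0, "GeV")
```

Within the MadGraph prompt itself, the set command described next is the more convenient route; a script like this is mainly useful when many cards have to be prepared in a batch.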

MadGraph also allows you to change settings interactively by typing commands such as the one below.

set run_card nevents 5000

Take a look at the run card again and check that the number of events to generate (nevents) has changed to 5000. Then change it back to 10000 using the same command and check again.

As shown above, there are several phase space cuts set by default (e.g. 10.0 = ptl). There is a handy command that removes all phase space cuts at once (instead of doing set run_card ptl 0, set run_card ptj 0, … one by one by hand).

set no_parton_cut

Take a look at the card again and see if lepton pt cut (ptl) is changed to 0. Keep in mind that the cuts you give before doing set no_parton_cut will be removed by this command. So don’t forget to do set no_parton_cut before giving the cuts you wish to give.

As mentioned above, mll50 in the output directory name stands for dilepton mass cut at 50GeV.

How should we set the dilepton mass cut?

In the MadGraph LO run card, the dilepton mass variable is named mmll. How do we apply a 50 GeV cut to it?

Solution

set run_card mmll 50

After you have verified that the desired dilepton mass cut is set, start the actual computation by pressing ENTER.

What is the cross section?

Take a close look at what MadGraph logs tell you.

Solution

 === Results Summary for run: run_01 tag: tag_1 ===

    Cross-section :   1584 +- 1.159 pb
    Nb of events :  10000

Type exit to leave the MadGraph shell prompt. We will now take a look at the output LHE file.

less $GENMGPATH/standalone-drellyan-mll50/Events/run_01/unweighted_events.lhe.gz

Scroll down to look at the first event (in order to exit, hit q).

<event>
 5      1 +1.4934000e+03 9.10903200e+01 7.54677100e-03 1.30023300e-01
        2 -1    0    0  501    0 +0.0000000000e+00 +0.0000000000e+00 +1.2430983507e+01 1.2430983507e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
       -2 -1    0    0    0  501 -0.0000000000e+00 -0.0000000000e+00 -1.6687025534e+02 1.6687025534e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
       23  2    1    2    0    0 +0.0000000000e+00 +0.0000000000e+00 -1.5443927183e+02 1.7930123884e+02 9.1090315443e+01 0.0000e+00 0.0000e+00
      -11  1    3    3    0    0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
       11  1    3    3    0    0 +2.3393803385e+01 +7.4187481776e+00 -1.6977396902e+00 2.4600613435e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
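Besides scrolling through the file with less, the event blocks can be counted programmatically. The sketch below uses only the Python standard library and assumes that each event starts on its own line with an `<event>` tag, as in the listing above; `count_lhe_events` is a hypothetical helper, and the tiny demo file is written on the fly so that the snippet runs anywhere.

```python
import gzip
import os
import tempfile


def count_lhe_events(path):
    """Count <event> blocks in a plain or gzipped LHE file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n_events = 0
    with opener(path, "rt") as lhe:
        for line in lhe:
            # Match only opening tags: "<event>" or "<event attr=...>".
            if line.strip().startswith(("<event>", "<event ")):
                n_events += 1
    return n_events


# Demo on a minimal stand-in file (not a real MadGraph output).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.lhe.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(
            '<LesHouchesEvents version="3.0">\n'
            "<event>\n</event>\n"
            "<event>\n</event>\n"
            "</LesHouchesEvents>\n"
        )
    n_demo = count_lhe_events(demo_path)

print("events in demo file:", n_demo)
```

Pointing the function at unweighted_events.lhe.gz from the run above should give the number of events requested in the run card.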

What does each column mean?

Solution

ID, status, mother1, mother2, color, anticolor, px, py, pz, E, mass, lifetime, and spin

-11  1    3    3    0    0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00

This line tells you that a positron (ID) is an outgoing particle (status) with Z as its mother (mother1 and mother2 : 3rd particle is Z which is ID=23) with no color (color and anticolor), …
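To make the column meaning concrete, the particle lines of the event shown above can be parsed with a few lines of Python. This is a minimal sketch using only the standard library; `parse_particle` is a hypothetical helper, and the column order is the one listed in the solution. It rebuilds the dilepton invariant mass and transverse momentum from the two outgoing leptons.

```python
import math

# Particle lines copied from the event shown above.
EVENT_LINES = """\
        2 -1    0    0  501    0 +0.0000000000e+00 +0.0000000000e+00 +1.2430983507e+01 1.2430983507e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
       -2 -1    0    0    0  501 -0.0000000000e+00 -0.0000000000e+00 -1.6687025534e+02 1.6687025534e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
       23  2    1    2    0    0 +0.0000000000e+00 +0.0000000000e+00 -1.5443927183e+02 1.7930123884e+02 9.1090315443e+01 0.0000e+00 0.0000e+00
      -11  1    3    3    0    0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
       11  1    3    3    0    0 +2.3393803385e+01 +7.4187481776e+00 -1.6977396902e+00 2.4600613435e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
"""


def parse_particle(line):
    """Split one LHE particle line into named fields."""
    f = line.split()
    return {
        "pdgid": int(f[0]),
        "status": int(f[1]),
        "mothers": (int(f[2]), int(f[3])),
        "colors": (int(f[4]), int(f[5])),
        "p4": tuple(float(x) for x in f[6:10]),  # (px, py, pz, E) in GeV
        "mass": float(f[10]),
        "lifetime": float(f[11]),
        "spin": float(f[12]),
    }


particles = [parse_particle(line) for line in EVENT_LINES.strip().splitlines()]

# Outgoing (status 1) electrons and positrons.
leptons = [p for p in particles if p["status"] == 1 and abs(p["pdgid"]) == 11]

# Sum the four-momenta component by component.
px, py, pz, energy = (sum(p["p4"][i] for p in leptons) for i in range(4))
mll = math.sqrt(energy**2 - px**2 - py**2 - pz**2)
ptll = math.hypot(px, py)

print(f"m(ll)  = {mll:.4f} GeV")
print(f"pT(ll) = {ptll:.4f} GeV")
```

The reconstructed mass agrees with the mass column of the Z line (ID=23), and the dilepton transverse momentum vanishes because both incoming partons travel along the z-axis.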

(2) Standalone : DY to ll (using particle containers)

Now let’s learn about particle containers, an easier way to deal with multiple particles. Launch the MadGraph shell prompt with ./bin/mg5_aMC as we did above.

import model sm
display multiparticles

Above command will show several predefined particle containers as below.

Multiparticle labels:
p = g u c d s u~ c~ d~ s~
j = g u c d s u~ c~ d~ s~
l+ = e+ mu+
l- = e- mu-
vl = ve vm vt
vl~ = ve~ vm~ vt~
all = g u c d s u~ c~ d~ s~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t b t~ b~ z w+ h w- ta- ta+

One can redefine or newly define the particle containers by doing :

define l+ = e+ mu+ ta+
define l- = e- mu- ta-
define myleptons+ = e+ mu+ ta+
define myleptons- = e- mu- ta-
define lpcdas = e+ mu- u d b~
display multiparticles

Now let’s try making the same DY process events, but this time allowing all lepton flavors. Previously we only generated electron pairs; now we will use particle containers to collect all possible dilepton contributions, including muons and taus.

generate p p > l+ l-
output standalone-drellyan-mll50-inclusive
launch
0
set run_card nevents 5000
set ebeam1 6800
set ebeam2 6800
set no_parton_cut
set mmll 50
set use_syst False
0

What is the cross section?

We added 2 new final states (decays to a muon pair and to a tau pair), each with its own Feynman diagrams. How should the cross section scale from the previous value of 1584pb?

Solution

  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section :   4748 +- 5.361 pb
     Nb of events :  5000
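A quick numerical check of the scaling can be done in Python. This is a minimal sketch that takes the two cross sections from the results summaries shown in this tutorial (4748 +- 5.361 pb and 1584 +- 1.159 pb) and assumes that their uncertainties are independent and that both runs used the same beam energy and cuts.

```python
import math


def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Return a/b and its uncertainty for independent uncertainties."""
    ratio = a / b
    sigma_ratio = ratio * math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, sigma_ratio


# p p > l+ l- (three flavors) divided by p p > e+ e- (one flavor), in pb.
r, sigma_r = ratio_with_uncertainty(4748.0, 5.361, 1584.0, 1.159)
print(f"sigma(ll) / sigma(ee) = {r:.3f} +- {sigma_r:.3f}")
```

The ratio is compatible with 3, as expected when three lepton flavors contribute equally and lepton masses are negligible at these energies.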

Another way to generate multiple Feynman diagrams is by using add process as below.

import model sm
generate p p > e+ e-
add process p p > mu+ mu-
add process p p > ta+ ta-

There is one more cool trick to use MadGraph. Take a look at standalone/drellyan-mll10.config file.

generate p p > e+ e-
output standalone-drellyan-mll10
launch
set nevents 10000
set no_parton_cut
set mmll 10
set use_syst False
0

Nothing has changed except that set mmll 50 from above now became set mmll 10. We loosened the dilepton mass cut in this script.

Try this :

./bin/mg5_aMC standalone/drellyan-mll10.config

MadGraph reads the file line by line and automatically passes each command to the MadGraph prompt shell.

Try this again :

./bin/mg5_aMC standalone/drellyan-mll4.config
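Since these scripts differ only in the mass cut, they can be produced from a single template. The sketch below is a hypothetical Python helper, not part of MadGraph or of the tutorial repository; it reuses the commands of drellyan-mll10.config shown above and assumes that the other scripts follow the same pattern.

```python
import tempfile
from pathlib import Path

# Same commands as drellyan-mll10.config, with the cut as a parameter.
CONFIG_TEMPLATE = """\
generate p p > e+ e-
output standalone-drellyan-mll{mll}
launch
set nevents 10000
set no_parton_cut
set mmll {mll}
set use_syst False
0
"""


def write_config(directory, mll):
    """Write a MadGraph command script for a given dilepton mass cut (GeV)."""
    path = Path(directory) / f"drellyan-mll{mll}.config"
    path.write_text(CONFIG_TEMPLATE.format(mll=mll))
    return path


# Demo in a temporary directory so that no existing file is touched.
with tempfile.TemporaryDirectory() as tmp:
    written = [write_config(tmp, mll) for mll in (4, 10, 50)]
    config_names = [p.name for p in written]
    config_text = written[0].read_text()

print(config_names)
print(config_text)
```

Each generated file could then be passed to ./bin/mg5_aMC in the same way as the scripts above.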

What are the cross sections?

How are the cross sections changing when compared to the case where 50GeV cut was given?

Solution

The cross sections get drastically larger as we loosen the cut (we will see this later in a histogram).

Take a quick look at the plots. We will draw two histograms (transverse momentum and mass of the dilepton system) with the samples we’ve just produced.

cd $GENTUTPATH/CMSSW_12_4_14_patch2/src
cmsenv
mkdir -p $GENPLOTPATH
cd $GENPLOTPATH
cp $GENTUTPATH/generators-cmsdaslpc2024-git/plotter/*.py ./
python3 lhe-root-plotter.py

What can you infer from the plots?

Why is the transverse momentum distribution only populating at 0?

Solution

The transverse momentum peaks at 0 because the summed momentum of the initial quarks lies only along the z-axis. How can the Z boson acquire transverse momentum?

What are the two peaks in the mass distribution?

Solution

Two peaks represent photon and Z boson. What will happen if we remove the cut on dilepton mass (‘set mmll 0’)?

(3) Gridpack : DY to ee (producing gridpacks for CMS sample production)

As we learned above, running standalone MadGraph is not so difficult, and it is often useful for quick tests if you are curious about certain physics processes and their cross sections. However, CMS relies on billions of events for physics analysis (we produce more than 50B events per year). Could we handle all the necessary statistics by running standalone MadGraph interactively? What if the person who first produced 10M events for the Z->ee process decides to leave CMS and we later decide to make 40M more events? Can we ensure that all the physics settings (which PDF set was chosen, how the kinematic cuts were given, etc.) are kept consistent?

To mitigate such issues, CMS has developed a workflow called gridpacks, which is maintained in link. A gridpack is a precompiled library that contains all the executables from MadGraph needed to produce LHE events. It is particularly useful for physics processes with a higher particle multiplicity, as the precompilation greatly reduces the computing time (we have only tried a 2->2 process; think of more complex processes, e.g. pp->eejjjj, which is 2->6 with 4 additional QCD particles denoted by j). Here, instead of running MadGraph interactively, we will only give inputs to the CMS-developed workflow, produce gridpacks, and then see how they are the same as or different from the standalone exercise.

Before we begin, we first need to unset CMSSW environment settings (or open a new terminal) as it might interfere with the scripts in genproductions repository.

eval `scram unsetenv -sh`

Now let’s go into genproductions to try out the gridpack production.

cd $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO
cp -r ${GENTUTPATH}/generators-cmsdaslpc2024-git/gridpack ./

Take a look at the cards in gridpack/drellyan-mll50/ directory. There are two .dat files which are minimal inputs to make gridpacks.

less gridpack/drellyan-mll50/drellyan-mll50_proc_card.dat
less gridpack/drellyan-mll50/drellyan-mll50_run_card.dat

You will notice that proc_card.dat defines the same physics process as the first standalone example. In run_card.dat you will notice a PDF choice that did not show up in the exercise above.

#*********************************************************************
# PDF CHOICE: this automatically fixes also alpha_s and its evol.    *
#*********************************************************************
  'lhapdf'    = pdlabel     ! PDF set                                  
$DEFAULT_PDF_SETS = lhaid
$DEFAULT_PDF_MEMBERS = reweight_PDF     ! if pdlabel=lhapdf, this is the lhapdf number

$DEFAULT_PDF_SETS and $DEFAULT_PDF_MEMBERS are later replaced automatically through Utilities/gridpack_helpers.sh in genproductions. This is a CMS-specific part of the code that keeps the PDF setup consistent among different CMS samples.
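The idea of filling placeholders in a card can be illustrated with Python's string.Template. This is only an analogue of that kind of substitution, not the actual genproductions code, and the numbers below are dummy values chosen for illustration, not real CMS defaults or real LHAPDF IDs.

```python
from string import Template

# Fragment of the run card shown above, with its two placeholders.
RUN_CARD_FRAGMENT = Template("""\
  'lhapdf'    = pdlabel     ! PDF set
$DEFAULT_PDF_SETS = lhaid
$DEFAULT_PDF_MEMBERS = reweight_PDF     ! if pdlabel=lhapdf, this is the lhapdf number
""")

# Dummy values for illustration only; NOT the values CMS actually uses.
placeholders = {
    "DEFAULT_PDF_SETS": "999999",
    "DEFAULT_PDF_MEMBERS": "0",
}

filled = RUN_CARD_FRAGMENT.substitute(placeholders)
print(filled)
```

Keeping the actual values in one central place is what lets every gridpack pick up the same PDF configuration.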

As there are many people running the tutorial at once, let’s restrict the core usage to 2 again and then start the gridpack production.

export NB_CORE=2
./gridpack_generation.sh drellyan-mll50 gridpack/drellyan-mll50/ pdmv

Note that pdmv is only there to restrict the number of cores to the value set with NB_CORE (2 here); normally you just execute ./gridpack_generation.sh <process name> <path to card> without pdmv.

Keep in mind that everything is exactly the same as in the standalone tutorial, except that gridpack_generation.sh replaces all the interactive commands that we were giving to the MadGraph prompt shell.

You can see that MadGraph is downloaded from the web,

/uscms/home/sjeon/nobackup/GENTUTORIAL/gridpack-tut/genproductions/bin/MadGraph5_aMCatNLO
WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
--2024-01-01 17:21:56--  https://cms-project-generators.web.cern.ch/cms-project-generators/MG5_aMC_v2.9.13.tar.gz
Resolving cms-project-generators.web.cern.ch (cms-project-generators.web.cern.ch)... 2001:1458:d00:4e::100:3c0, 188.184.74.207
Connecting to cms-project-generators.web.cern.ch (cms-project-generators.web.cern.ch)|2001:1458:d00:4e::100:3c0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26561088 (25M) [application/gzip]
Saving to: 'MG5_aMC_v2.9.13.tar.gz'

     0K .......... .......... .......... .......... ..........  0%  228K 1m54s
    50K .......... .......... .......... .......... ..........  0%  456K 85s
   100K .......... .......... .......... .......... ..........  0%  181M 57s
   150K .......... .......... .......... .......... ..........  0%  301M 42s

then applying several patches (to mitigate bugs that were discovered after the release),

patching file models/loop_qcd_qed_sm/restrict_lepton_masses_no_lepton_yukawas.dat
patching file models/loop_sm/restrict_ckm_no_b_mass.dat
patching file models/sm/restrict_ckm_lepton_masses.dat
patching file models/sm/restrict_ckm_lepton_masses_no_b_mass.dat
patching file models/sm/restrict_ckm_no_b_mass.dat
patching file models/sm/restrict_lepton_masses_no_b_mass.dat
patching file Template/NLO/SubProcesses/MCmasses_PYTHIA8.inc
patching file madgraph/interface/loop_interface.py
patching file madgraph/various/systematics.py
patching file Template/NLO/Source/make_opts.inc
patching file madgraph/iolibs/export_v4.py
patching file madgraph/iolibs/template_files/pdf_opendata.f
patching file madgraph/iolibs/template_files/pdf_wrap_lhapdf.f

then finding the desired Feynman diagram defined in proc_card.dat,

import model sm
INFO: load particles 
INFO: load vertices 
INFO: Restrict model sm with file MG5_aMC_v2_9_13/models/sm/restrict_default.dat . 
INFO: Run "set stdout_level DEBUG" before import for more information. 
INFO: Change particles name to pass to MG5 convention 
Defined multiparticle p = g u c d s u~ c~ d~ s~
Defined multiparticle j = g u c d s u~ c~ d~ s~
Defined multiparticle l+ = e+ mu+
Defined multiparticle l- = e- mu-
Defined multiparticle vl = ve vm vt
Defined multiparticle vl~ = ve~ vm~ vt~
Defined multiparticle all = g u c d s u~ c~ d~ s~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t b t~ b~ z w+ h w- ta- ta+
generate p p > e+ e-
INFO: Checking for minimal orders which gives processes. 
INFO: Please specify coupling orders to bypass this step. 
INFO: Trying process: g g > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: u u~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 

and finally giving out the computed cross sections.

  === Results Summary for run: pilotrun tag: tag_1 ===

     Cross-section :   1836 +- 6.442 pb
     Nb of events :  0

Why did the cross section change?

You would notice that the cross section has changed from 1584pb in the standalone run to 1836pb. What could be the reasons, although we ran the exact same physics process?

Solution

Most importantly, a different PDF set has been used (check by looking at drellyan-mll50/drellyan-mll50_gridpack/work/gridpack/process/madevent/Cards/run_card.dat). This gives different assumptions about the parton distributions in the proton, hence the difference in the results. Other, minor reasons for the difference could be the use of a different MadGraph release version or perhaps a different random seed.

Now try the same with a different set of gridpack cards named drellyan-mll50-5fs.

./gridpack_generation.sh drellyan-mll50-5fs gridpack/drellyan-mll50-5fs/ pdmv

This adds a new contribution, b b~ > e+ e-, to the calculation, as it uses a different UFO model, sm-no_b_mass. Strictly speaking, we are using the same UFO model but adding a restriction to it (bottom quarks are treated as massless). Take a look at restriction_card_tutorial from slide 24 for more information.

import model sm-no_b_mass
INFO: load particles 
INFO: load vertices 
INFO: Restrict model sm-no_b_mass with file MG5_aMC_v2_9_13/models/sm/restrict_no_b_mass.dat . 
INFO: Run "set stdout_level DEBUG" before import for more information. 
INFO: Change particles name to pass to MG5 convention 
Defined multiparticle p = g u c d s u~ c~ d~ s~
Defined multiparticle j = g u c d s u~ c~ d~ s~
Defined multiparticle l+ = e+ mu+
Defined multiparticle l- = e- mu-
Defined multiparticle vl = ve vm vt
Defined multiparticle vl~ = ve~ vm~ vt~
Pass the definition of 'j' and 'p' to 5 flavour scheme.
Defined multiparticle all = g u c d s b u~ c~ d~ s~ b~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t t~ z w+ h w- ta- ta+
generate p p > e+ e-
INFO: Checking for minimal orders which gives processes. 
INFO: Please specify coupling orders to bypass this step. 
INFO: Trying process: g g > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: u u~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 
INFO: Trying process: u c~ > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: c u~ > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: c c~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 
INFO: Trying process: d d~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 
INFO: Trying process: d s~ > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: d b~ > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: s d~ > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: s s~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 
INFO: Trying process: s b~ > e+ e- WEIGHTED<=4 @1  
INFO: Process u~ u > e+ e- added to mirror process u u~ > e+ e- 
INFO: Process c~ c > e+ e- added to mirror process c c~ > e+ e- 
INFO: Process d~ d > e+ e- added to mirror process d d~ > e+ e- 
INFO: Trying process: d~ b > e+ e- WEIGHTED<=4 @1  
INFO: Process s~ s > e+ e- added to mirror process s s~ > e+ e- 
INFO: Trying process: s~ b > e+ e- WEIGHTED<=4 @1  
INFO: Trying process: b b~ > e+ e- WEIGHTED<=4 @1  
INFO: Process has 2 diagrams 
INFO: Process b~ b > e+ e- added to mirror process b b~ > e+ e- 
5 processes with 10 diagrams generated in 0.032 s
Total: 5 processes with 10 diagrams
output drellyan-mll50-5fs

You should now get a somewhat larger cross section, 1903pb, compared to 1836pb in the previous run.

  === Results Summary for run: pilotrun tag: tag_1 ===

     Cross-section :   1903 +- 6.669 pb
     Nb of events :  0

Why did the cross section change?

Did your proc_card.dat add any new processes?

Solution

We now have 5 quark flavors in the proton, compared to 4 in the previous example, because the bottom quark contributions are added. So we have a new contribution, b b~ > e+ e-.

But why did it not scale up so much?

When we ran p p > e+ e- and p p > l+ l- with standalone MadGraph, the cross section of the latter was roughly 3 times larger. When we compare the 4 (udcs) and 5 (udcsb) flavor schemes of the proton, should the cross section not be 5/4 times larger?

Solution

No. Keep in mind that the proton consists of two up quarks and one down quark, which are the valence quarks. The rest are sea quark contributions, which are smaller in the PDF. Hence, the increase coming from the bottom quark contributions is not so large.

Let’s recall why a gridpack is useful. It is a precompiled library that produces LHE files faster for a given process. CMS uses gridpacks to produce official samples, as this makes it much easier to keep the samples consistent. Now it’s time to find out how we make LHE files from gridpacks. To start, copy the gridpack to a new temporary directory and untar it.

mkdir test
cd test
cp ../drellyan-mll50-5fs_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz ./
tar -xvf drellyan-mll50-5fs_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz

You can see that several files were compressed into one tarball. Among them, runcmsgrid.sh uses the precompiled library to generate events for the LHE file. Inputs are given in the following order: ./runcmsgrid.sh <nevents> <random seed> <ncores>. To make 50 events with random seed 1 using 1 core, execute the command below.

./runcmsgrid.sh 50 1 1

After the run has finished, an LHE file named cmsgrid_final.lhe has been produced. Producing as many additional events as we need this way is much easier than running from scratch with standalone MadGraph.
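When more events are needed, the same gridpack is typically run several times with different random seeds so that the jobs do not repeat the same events. The sketch below only builds the command lines, following the argument order given above; `runcmsgrid_commands` is a hypothetical helper and is not part of genproductions.

```python
def runcmsgrid_commands(n_jobs, events_per_job, first_seed=1, ncores=1):
    """Build one ./runcmsgrid.sh command per job, each with its own seed."""
    if n_jobs < 1 or events_per_job < 1:
        raise ValueError("n_jobs and events_per_job must be positive")
    return [
        ["./runcmsgrid.sh", str(events_per_job), str(first_seed + i), str(ncores)]
        for i in range(n_jobs)
    ]


commands = runcmsgrid_commands(n_jobs=3, events_per_job=50)
for cmd in commands:
    print(" ".join(cmd))

# Each command could be launched with subprocess.run(cmd, check=True, cwd=...)
# inside its own copy of the untarred gridpack. Separate directories are an
# assumption made here because every run writes a file named cmsgrid_final.lhe.
```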

The first step of the generator tutorial is finished. Before moving on to the next step, let’s first launch the commands below, as they take a bit more time to finish.

cd $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO
./gridpack_generation.sh drellyan-mll50-01j gridpack/drellyan-mll50-01j/ pdmv

Quite often, people ask the MadGraph authors questions like: “I am working for the CMS experiment. I tried to make a gridpack using my awesome BSM UFO file with these awesome BSM particle predictions for my analysis, but the script I used in genproductions does not give me functional gridpacks. Please help me.”

Never do this! The setups in genproductions are not the MadGraph authors’ turf!

  1. They have no responsibility to make our gridpack script work.
  2. They have no idea (or perhaps only some idea) of what we do in genproductions.
  3. They do not share the same computing environment.

Please first consult the GEN conveners or experts through CMS talk link. For more constructive iterations and feedback, provide your inputs (the .dat files you used to generate the gridpacks) and all available error logs. Also, as you now know how to run standalone MadGraph, test your gridpack inputs with standalone MadGraph first. If they work in a standalone run but not in a gridpack, it is likely a genproductions issue. If they do not work in standalone either, it is likely an issue in core MadGraph or some mismodeling of the process.

Key Points

  • MadGraph is one of the most widely used generators for hard scattering computations

  • Standalone MadGraph can run interactively on-the-fly or by importing the predefined text scripts

  • Gridpacks are useful for large scale productions with consistency guaranteed

  • LHE level information is not physical and parton shower is needed to describe full physics


2 - Parton Shower Generator

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • Why do we need parton shower?

  • How do we produce NanoGEN samples?

Objectives
  • Perform parton shower with LHE file as an input

  • Perform parton shower with gridpack as an input

  • Analyze generator level information using NanoGEN files

Creating particle level samples from LHE files

As discussed earlier, LHE files by themselves are not enough to describe physical distributions. In order to generate physically sensible events, LHE files need to go through the parton shower. The parton shower, in principle, accounts for higher-order corrections to the hard process (consider q -> q g or e -> e gamma). The dominant contributions to such corrections come from collinear or soft emissions. In CMS, one of the most widely used tools for parton showering is Pythia8 (note, however, that Pythia8 is a multipurpose generator that can also calculate the hard process for certain physics processes). In this exercise, instead of compiling Pythia8 and running it in standalone mode as we did for MadGraph, we will use the Pythia8 that is already compiled in the CMSSW environment.

(1) Running Pythia8 interface in CMSSW

Let’s first check which release version of Pythia8 we will be using.

cd $GENTUTPATH/CMSSW_12_4_14_patch2/src
cmsenv
scram tool info pythia8

You can see that we are using Pythia8.306, which is already compiled in CMSSW_12_4_14_patch2.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Name : pythia8
Version : 306-494ded5c626b685d055d5b022e918c0c
++++++++++++++++++++

INCLUDE=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/include
LIB=pythia8
LIBDIR=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/lib
PYTHIA8DATA=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/share/Pythia8/xmldoc
PYTHIA8_BASE=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c
ROOT_INCLUDE_PATH=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/include
SYSTEM_INCLUDE+=1
USE=root_cxxdefaults cxxcompiler hepmc3 hepmc lhapdf

Now we will start building our parton shower fragment in our own directories in order to produce samples by ourselves.

mkdir -p Configuration/GenProduction/python/
cp $GENTUTPATH/generators-cmsdaslpc2024-git/fragment/*.py Configuration/GenProduction/python/
scram b
mkdir -p $GENSHOWERPATH
cd $GENSHOWERPATH

The cmsDriver.py executable builds the full configuration file from the arguments it is given (data tier, campaign, etc.), using the parton shower fragment that we just built. We will create NanoGEN files, which are flat ntuples that resemble the NanoAOD data tier but only store branches related to generator-level information. This skips the SIM and RECO steps in the middle, which makes it convenient for generator-level studies. For more information, take a look at link.

cmsDriver.py Configuration/GenProduction/python/drellyan-mll50.py \
    --python_filename run_drellyan-mll50.py \
    --eventcontent NANOAOD \
    --datatier NANOAOD \
    --fileout file:drellyan-mll50.root \
    --conditions auto:mc \
    --step LHE,GEN,NANOGEN \
    --no_exec \
    --mc \
    -n 100

You just created run_drellyan-mll50.py, which can be executed with the cmsRun command. Take a look at run_drellyan-mll50.py with less and see how it evolved from Configuration/GenProduction/python/drellyan-mll50.py through cmsDriver.py. Running it as below will produce LHE files, run the parton shower to make GEN samples, and finally convert them to NanoGEN format in one go. Note that we will only test with 100 events (-n 100) due to time constraints.

cmsRun run_drellyan-mll50.py

LHE files are first produced using the gridpack we’ve just produced.

   ______________________________________     
         Running Generic Tarball/Gridpack     
   ______________________________________     
gridpack tarball path = /uscms/home/sjeon/nobackup/GENTUTORIAL/gridpack-tut/genproductions/bin/MadGraph5_aMCatNLO/drellyan-mll50_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 thread count requested = 1
%MSG-MG5 residual/optional arguments = 
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 number of cpus = 1
%MSG-MG5 SCRAM_ARCH version = slc7_amd64_gcc10
%MSG-MG5 CMSSW version = CMSSW_12_4_8
WARNING: Developer's area is created for non-production architecture slc7_amd64_gcc10. Production architecture for this release is el8_amd64_gcc10
**** Following environment variables are going to be unset.
      CMSSW_FULL_RELEASE_BASE

Running MG5_aMC for the 1 time
produced_lhe  0 nevt  100 submitting_event  100  remaining_event  100
run.sh 100 2345670
Now generating 100 events with random seed 2345670 and granularity 1

Reweight with additional PDF sets given for possible systematic sources.

INFO: #***************************************************************************
#
# original cross-section: 1855.0899999999972
#     scale variation: +10.6% -11.6%
#     emission scale variation: + 0% - 0%
#     central scheme variation: +3.05e-09% -17.8%
# PDF variation: +1.32% -1.32%
#
#PDF NNPDF31_nnlo_as_0118_nf_4: 1854.1 +1.32% -1.32%
#PDF NNPDF30_nnlo_nf_4_pdfas: 1816.21 +2.13% -2.13%
#PDF NNPDF40_nnlo_nf_4_pdfas: 1854.4 +0.579% -0.579%
#PDF MSHT20nnlo_nf4: 1827.24 +1.16% -1.61%
#PDF PDF4LHC21_40_pdfas_nf4: 1841.73 +1.59% -1.59%
#PDF ABMP16_4_nnlo: 1833.82 +0.925% -0.925%
# dynamical scheme # 1 : 1749 +11.8% -12.8% # \sum ET
# dynamical scheme # 2 : 1749 +11.8% -12.8% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 1524.71 +14.7% -15.9% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 1855.09 +10.6% -11.6% # \sqrt{\hat s}
# PDF 42930 : 1837.5192582008478
#***************************************************************************

Pythia8 is then launched with the newly created LHE file as input. It first prints out the LHE information, as we saw directly in the LHE file.

 --------  PYTHIA Event Listing  (hard process)  -----------------------------------------------------------------------------------
 
    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m 
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13600.000  13600.000
     1       2212  (p+)               -12     0     0     3     0     0     0      0.000      0.000   6800.000   6800.000      0.938
     2       2212  (p+)               -12     0     0     4     0     0     0      0.000      0.000  -6800.000   6800.000      0.938
     3          2  (u)                -21     1     0     5     6   501     0      0.000      0.000     66.079     66.079      0.000
     4         -2  (ubar)             -21     2     0     5     6     0   501     -0.000     -0.000    -36.939     36.939      0.000
     5        -11  e+                  23     3     4     0     0     0     0     30.176    -10.240    -24.793     40.375      0.001
     6         11  e-                  23     3     4     0     0     0     0    -30.176     10.240     53.933     62.644      0.001
                                   Charge sum:  0.000           Momentum sum:      0.000      0.000     29.140    103.019     98.811

Pythia8 then starts the parton shower on top of the given LHE event. See how much more information gets printed out. Remember that the parton shower evolves downwards from the hard process until a certain energy threshold is reached (q -> q g -> q g g g -> q q q g g -> ...).

 --------  PYTHIA Event Listing  (complete event)  ---------------------------------------------------------------------------------
 
    no         id  name            status     mothers   daughters     colours      p_x        p_y        p_z         e          m 
     0         90  (system)           -11     0     0     0     0     0     0      0.000      0.000      0.000  13600.000  13600.000
     1       2212  (p+)               -12     0     0   265     0     0     0      0.000      0.000   6800.000   6800.000      0.938
     2       2212  (p+)               -12     0     0   266     0     0     0      0.000      0.000  -6800.000   6800.000      0.938
     3          2  (u)                -21     7     7     5     6   501     0      0.000      0.000     66.079     66.079      0.000
     4         -2  (ubar)             -21     8     0     5     6     0   501     -0.000     -0.000    -36.939     36.939      0.000
     5        -11  (e+)               -23     3     4     9     9     0     0     30.176    -10.240    -24.793     40.375      0.001
     6         11  (e-)               -23     3     4    10    10     0     0    -30.176     10.240     53.933     62.644      0.001
     7          2  (u)                -42    12     0     3     3   501     0      0.000      0.000     66.079     66.079      0.000
     8         -2  (ubar)             -41    13    13    11     4     0   502     -0.000     -0.000    -47.025     47.025      0.000
     9        -11  (e+)               -44     5     5    14    14     0     0     25.135    -14.682    -35.225     45.696      0.001
    10         11  (e-)               -44     6     6    15    15     0     0    -30.850      9.646     42.888     53.704      0.001
    11         21  (g)                -43     8     0    16    16   501   502      5.715      5.036     11.392     13.704      0.000
    12          2  (u)                -41   154     0    17     7   503     0      0.000      0.000     78.601     78.601      0.000
    13         -2  (ubar)             -42   155   155     8     8     0   502     -0.000     -0.000    -47.025     47.025      0.000
    14        -11  (e+)               -44     9     9   156   156     0     0     24.576    -15.108    -32.070     43.136      0.001
    15         11  (e-)               -44    10    10   157   157     0     0    -36.010      5.715     44.528     57.551      0.001
    16         21  (g)                -44    11    11   158   158   501   502      4.374      4.014     12.596     13.925      0.000
    17         21  (g)                -43    12     0   159   159   503   501      7.060      5.379      6.521     11.013      0.000
    18         21  (g)                -31    65     0    20    21   505   504      0.000      0.000      2.971      2.971      0.000
    19         21  (g)                -31    66    66    20    21   504   506      0.000      0.000    -25.830     25.830      0.000
    20         21  (g)                -33    18    19    67    67   507   506     -3.816      2.200    -23.877     24.280      0.000
    21         21  (g)                -33    18    19    68    68   505   507      3.816     -2.200      1.018      4.521      0.000

After the information for one event is printed out, all 100 events are processed and the cross section is finally reported.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Overall cross-section summary 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process		xsec_before [pb]		passed	nposw	nnegw	tried	nposw	nnegw 	xsec_match [pb]			accepted [%]	 event_eff [%]
0		1.855e+03 +/- 1.773e+01		100	100	0	100	100	0	1.855e+03 +/- 1.773e+01		100.0 +/- 0.0	100.0 +/- 0.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Total		1.855e+03 +/- 1.773e+01		100	100	0	100	100	0	1.855e+03 +/- 1.773e+01		100.0 +/- 0.0	100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 1.855e+03 +- 1.773e+01 pb
After matching: total cross section = 1.855e+03 +- 1.773e+01 pb
Matching efficiency = 1.0 +/- 0.0   [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (100) / (100) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (100) / (100) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 1.855e+03 +- 1.773e+01 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 5.391e-01 +- 5.179e-03

=============================================
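
The last line of the summary, the equivalent luminosity for 1M events, can be cross-checked by hand from N = sigma x L. The snippet below is a minimal sketch of that arithmetic; the function name and the unit conversion constant are our own and not part of any CMS tool.

```python
# Cross-check of the "equivalent lumi for 1M events" line in the summary.
# N = sigma * L  =>  L = N / sigma
PB_PER_FB = 1000.0  # 1 fb^-1 corresponds to 1000 pb^-1


def equivalent_lumi_fb(n_events, xsec_pb):
    """Integrated luminosity (in fb^-1) that n_events would correspond to."""
    return n_events / xsec_pb / PB_PER_FB


# 1M events at the reported cross section of 1855 pb
lumi = equivalent_lumi_fb(1_000_000, 1855.0)
print(f"{lumi:.3f} fb^-1")
```

The result should agree with the 5.391e-01 quoted in the log up to rounding of the cross section.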

How did the cross section change after parton shower?

MadGraph reported # original cross-section: 1855.0899999999972, 1855pb. After running parton shower with Pythia8, same cross section 1855pb is kept. Parton shower adds more and more vertices, but why does the cross section remain unchanged?

Solution

The parton shower is unitary: the probability to branch (e.g. q -> q g) and the probability not to branch sum to 1. Hence, the cross section is determined by the lowest order input (hard process).
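
The unitarity argument can be illustrated with a toy model. This is only a sketch and not how Pythia8 works internally: emission_rate is a made-up number standing in for the integrated splitting probability, and each event is simply labelled as "branched" or "not branched".

```python
import math
import random


def toy_shower(n_events, xsec_pb, emission_rate=0.7, seed=1):
    """Toy model: each event either branches (q -> q g) or does not.

    emission_rate is a hypothetical value; the no-branching
    probability is taken to be exp(-emission_rate).
    """
    rng = random.Random(seed)
    p_no_branch = math.exp(-emission_rate)
    p_branch = 1.0 - p_no_branch      # unitarity: the two add up to 1
    weight = xsec_pb / n_events       # every event carries the same weight
    n_branched = sum(rng.random() < p_branch for _ in range(n_events))
    n_unbranched = n_events - n_branched
    # Events are only relabelled, never added or dropped,
    # so the summed weight (the cross section) is unchanged.
    total = weight * (n_branched + n_unbranched)
    return n_branched, n_unbranched, total


branched, unbranched, total_xsec = toy_shower(100, 1855.0)
print(branched, unbranched, total_xsec)
```

However the events are split between the two categories, the summed weight stays at the input cross section.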

Previously, we saw the histogram of the dilepton system’s transverse momentum using LHE information, and we claimed that a distribution populated only at 0GeV is not physical. Using the NanoGEN sample, let’s see how the distribution changed after parton shower. Due to time constraints, tutors prepared samples with 40000 events in /tmp/GENTUTORIAL/drellyan-mll50.root for plotting purposes.

cd $GENPLOTPATH
python3 nanogen-plotter.py

How did the distribution change?

Where did the dilepton system acquire transverse momentum from?

Solution

Incoming partons from protons also go through parton shower which is named “initial state radiation (ISR)”.

What is GenDressedLepton?

What happens to leptons during parton shower? Are leptons kept stable during parton shower since they do not participate in strong interactions?

Solution

The parton shower, despite the “parton” in its name, also includes QED showering such as e -> e gamma. Dressed leptons (the GenDressedLepton collection in NanoGEN) are objects formed from a charged lepton and the photons that are close to it.
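
The idea of dressing can be sketched in a few lines of Python. This is an illustration only, not the code that fills GenDressedLepton: the cone size of 0.1, the massless approximation and all function names are assumptions made for this example.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with dphi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def to_cartesian(pt, eta, phi):
    """Massless approximation: returns (px, py, pz, E)."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))


def dress(lepton, photons, cone=0.1):
    """Add the four-momenta of photons within `cone` of the bare lepton.

    lepton and photons are (pt, eta, phi) tuples; cone=0.1 is an assumed value.
    Returns the transverse momentum of the dressed lepton.
    """
    px, py, pz, e = to_cartesian(*lepton)
    for photon in photons:
        if delta_r(lepton[1], lepton[2], photon[1], photon[2]) < cone:
            gx, gy, gz, ge = to_cartesian(*photon)
            px, py, pz, e = px + gx, py + gy, pz + gz, e + ge
    return math.hypot(px, py)


bare = (40.0, 0.5, 1.0)
# One collinear photon (inside the cone) and one far-away photon (outside)
photons = [(5.0, 0.5, 1.0), (3.0, -1.2, 2.5)]
print(dress(bare, photons))
```

Only the collinear photon is absorbed, so the dressed lepton recovers the momentum that the bare lepton lost through QED radiation.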

(2) Jet merging samples

The hard process calculation is better at modeling hard jets and heavy particle decays, while the parton shower is better at describing collinear and soft emissions. For more realistic and reliable modeling of hard jets, for example in DY events, MadGraph can be used as below.

generate p p > e+ e- @0
add process p p > e+ e- j @1

With this syntax, MadGraph produces the DY process with 0 and 1 hard jet in the event. When this sample goes through the parton shower, some portion of the events (denoted with @1) already contains a hard jet, so the sample is better at describing the DY process with a hard jet. However, consider an @0 event that emits QCD radiation from the initial state which then forms a jet that is hard enough. Such phase space inherently suffers from double counting, as a “DY with hard jet” event could come from both @0 and @1. To remove this double counting of phase space contributions, the jet merging technique is used. Jet merging is set up with an artificial cut threshold called the jet merging scale. This scale decides whether an event from @0 or @1 is accepted or not. Finally, only the accepted events from the two processes are merged to form one sample. Very roughly, the jet merging scale can be thought of as the momentum of a jet. If a jet in the event is harder than the threshold, the event is rejected if it comes from @0 and accepted only if it comes from @1. Conversely, if the jet is softer than the threshold, the event is accepted only if it comes from @0 and rejected if it comes from @1.

Now let’s take a look at the gridpack we produced before we started the parton shower exercises.

ls $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO/drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
cd $GENSHOWERPATH
mkdir jet_merging
cd jet_merging
cp $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO/drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz ./
tar -xvf drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz

Take a look at the cards in the InputCards directory. Most notably, run_card.dat has a different setting compared to the other gridpacks we’ve produced.

#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
  1     = ickkw ! 0 no matching, 1 MLM, 2 CKKW matching

This flag tells MadGraph that the LHE files we are going to produce will later go through jet merging in order to avoid double counting.

#*********************************************************************
# Jet measure cuts                                                   *
#*********************************************************************
  10.0  = xqcut ! minimum kt jet measure between partons

When jet merging is turned on, xqcut needs to be set; it presamples the events for efficient jet merging. Remember that some portion of the events will later be discarded and never used. So there is no point in producing events that involve jets with too low an energy scale at the LHE level, since these will likely be removed.

Try producing 100 events using this gridpack as we did before with command ./runcmsgrid.sh 100 1 1.

What is the cross section?

MadGraph reported # original cross-section: 2928.1100000000024, 2928pb which is significantly larger than previous values that were below 2000pb. How can this be explained?

Solution

The cross section reported by MadGraph is the one before we run the parton shower. During parton shower, jet merging will be performed and thus some portion of the events will be discarded. This will be reflected in the overall normalization, and the cross section will be smaller than what we see now.

Before running the parton shower, let’s look at the pythia fragment that should be used for the parton shower with jet merging. Compare $GENTUTPATH/CMSSW_12_4_14_patch2/src/Configuration/GenProduction/python/drellyan-mll50-01j.py and the one we used earlier, $GENTUTPATH/CMSSW_12_4_14_patch2/src/Configuration/GenProduction/python/drellyan-mll50.py. You will notice that a large block of new lines has been added to drellyan-mll50-01j.py.

        processParameters = cms.vstring(
            'JetMatching:setMad = off',
            'JetMatching:scheme = 1',
            'JetMatching:merge = on',
            'JetMatching:jetAlgorithm = 2',
            'JetMatching:etaJetMax = 5.',
            'JetMatching:coneRadius = 1.',
            'JetMatching:slowJetPower = 1',
            'JetMatching:doShowerKt = off',
            'JetMatching:qCut = 19.',
            'JetMatching:nQmatch = 4',
            'JetMatching:nJetMax = 1',
            'TimeShower:mMaxGamma = 4.0'
        ),

Most of these lines can be treated as a template for jet merging samples using MadGraph at LO and Pythia8 (for further information, https://pythia.org/latest-manual/JetMatching.html and http://hep.ucsb.edu/people/cag/Matching.pdf are useful). Here the line JetMatching:qCut = 19. defines the threshold that decides whether an event is accepted or not. Again, although not exact, one can think of this as the threshold on the momentum scale of a jet in the event. If the jet momentum is above 19GeV, the event is accepted only if it is a p p > e+ e- j type of event. If the jet momentum is below 19GeV, the event is accepted only if it is a p p > e+ e- type of event.
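
The rough picture above can be written down as a small decision function. This is a cartoon of the accept/reject logic as described in this tutorial, not the actual matching algorithm implemented in Pythia8; the function name and its arguments are hypothetical.

```python
def keep_event(n_me_jets, hardest_jet_scale, qcut=19.0):
    """Cartoon of the accept/reject decision for a 0+1 jet merged sample.

    n_me_jets         : jets in the matrix element (0 for @0, 1 for @1)
    hardest_jet_scale : momentum scale (GeV) of the hardest jet after the shower
    qcut              : merging scale, 19 GeV as in the fragment
    This is NOT the real jet matching algorithm, only the rough picture.
    """
    has_hard_jet = hardest_jet_scale > qcut
    if n_me_jets == 0:
        # @0 events keep only the region without a jet above the merging scale
        return not has_hard_jet
    if n_me_jets == 1:
        # @1 events keep only the region with a jet above the merging scale
        return has_hard_jet
    raise ValueError("this sketch only handles the 0- and 1-jet samples")


for n_jets in (0, 1):
    for scale in (10.0, 30.0):
        print(n_jets, scale, keep_event(n_jets, scale))
```

Each region of phase space is populated by exactly one of the two processes, which is what removes the double counting.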

Now let’s give -n 1000 as an option to cmsDriver.py. This will first create an LHE file with 1000 events and this will be given as an input for Pythia8.

cmsDriver.py Configuration/GenProduction/python/drellyan-mll50-01j.py \
    --python_filename run_drellyan-mll50-01j.py \
    --eventcontent NANOAOD \
    --datatier NANOAOD \
    --fileout file:drellyan-mll50-01j.root \
    --conditions auto:mc \
    --step LHE,GEN,NANOGEN \
    --no_exec \
    --mc \
    -n 1000
cmsRun run_drellyan-mll50-01j.py

Cross sections before and after jet merging will be reported as below.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Overall cross-section summary 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process		xsec_before [pb]		passed	nposw	nnegw	tried	nposw	nnegw 	xsec_match [pb]			accepted [%]	 event_eff [%]
0		1.833e+03 +/- 1.197e+01		467	467	0	623	623	0	1.374e+03 +/- 3.305e+01		75.0 +/- 1.7	75.0 +/- 1.7
1		1.099e+03 +/- 1.713e+01		159	159	0	377	377	0	4.636e+02 +/- 2.887e+01		42.2 +/- 2.5	42.2 +/- 2.5
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Total		2.932e+03 +/- 2.090e+01		626	626	0	1000	1000	0	1.835e+03 +/- 4.673e+01		62.6 +/- 1.5	62.6 +/- 1.5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 2.932e+03 +- 2.090e+01 pb
After matching: total cross section = 1.835e+03 +- 4.673e+01 pb
Matching efficiency = 0.6 +/- 0.0   [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (626) / (626) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (626) / (626) = 1.000e+00 +- 0.000e+00    [TO BE USED IN MCM]

After filter: final cross section = 1.835e+03 +- 4.673e+01 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 5.449e-01 +- 1.388e-02

=============================================

The first two rows, Process 0 and 1, correspond to the p p > e+ e- and p p > e+ e- j processes, respectively. For 0, 623 events were tried and 467 passed, which means the jet merging procedure accepted 467 out of 623 events from 0. For 1, the jet merging procedure accepted 159 out of 377 events. Note that the sum of tried is 1000, which is the number of LHE level events requested with -n 1000. After jet merging, the rejected events are discarded and the final cross section is reported as After filter: final cross section = 1.835e+03 +- 4.673e+01 pb, i.e. 1835pb, which has dropped from the LHE level value Before matching: total cross section = 2.932e+03 +- 2.090e+01 pb, i.e. 2932pb.
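
The numbers in the summary table can be reproduced by hand: the cross section after matching is the cross section before matching multiplied by passed / tried. The sketch below uses the rounded values printed in the table, so small differences in the last digit are expected; the function name is our own.

```python
def after_matching(xsec_before_pb, passed, tried):
    """Cross section after jet merging and the matching efficiency."""
    efficiency = passed / tried
    return xsec_before_pb * efficiency, efficiency


# (xsec_before [pb], passed, tried) copied from the summary table
rows = {
    "0 (p p > e+ e-)":   (1833.0, 467, 623),
    "1 (p p > e+ e- j)": (1099.0, 159, 377),
    "Total":             (2932.0, 626, 1000),
}
for name, (xsec, passed, tried) in rows.items():
    xsec_match, eff = after_matching(xsec, passed, tried)
    print(f"{name:20s} {xsec_match:8.1f} pb   accepted {100 * eff:5.1f} %")
```

The efficiency of the Total row is the matching efficiency of about 0.626 that the log rounds to 0.6.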

Key Points

  • Pythia8 is the main tool used for parton showering in CMS

  • Events are not physical if they have not gone through parton shower

  • Jet merging is a technique to avoid double counting of jet phase space between ME and PS calculations


3 - Optional (MadSpin and BSM UFO model)

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • What is MadSpin used for?

  • How do we customize the BSM parameters in the UFO model?

Objectives
  • Understand the role of MadSpin.

  • Customize BSM parameters for gridpacks.

Decay of resonant particles in MadGraph

Until now, we ran the physics process defined as

generate p p > e+ e-

What we can also do is below.

generate p p > z, z > e+ e-

The difference between p p > e+ e- and p p > z, z > e+ e- is that the former is the full DY physics process, whereas the latter forces the two quarks to first produce a Z boson and then lets it decay into electrons. The comma is the key: it tells MadGraph where the matrix element calculation should be split. p p > e+ e- is calculated as 2 -> 2, while p p > z, z > e+ e- is calculated as 2 -> 1 followed by 1 -> 2.

cd $GENMGPATH
./bin/mg5 standalone/onshellz.config

Compare events in the new LHE file standalone-onshellz/Events/run_01/unweighted_events.lhe.gz with the old standalone-drellyan-mll4//Events/run_01/unweighted_events.lhe.gz. In every event of the new LHE file, a particle with ID 23 shows up, as an on-shell Z boson is always produced in the matrix element level calculation. In contrast, the old LHE file, which allows an off-shell Z (along with gamma), does not necessarily contain such a particle. One example event is shown below: u = 2 and ubar = -2 quarks produce z = 23, which decays into positron = -11 and electron = 11.

<event>
 5      1 +1.4201000e+03 9.06792500e+01 7.54677100e-03 1.30121500e-01
        2 -1    0    0  501    0 +0.0000000000e+00 +0.0000000000e+00 +1.4078650070e+02 1.4078650070e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
       -2 -1    0    0    0  501 -0.0000000000e+00 -0.0000000000e+00 -1.4601410033e+01 1.4601410033e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
       23  2    1    2    0    0 +0.0000000000e+00 +0.0000000000e+00 +1.2618509067e+02 1.5538791074e+02 9.0679246224e+01 0.0000e+00 0.0000e+00
      -11  1    3    3    0    0 -2.5416006610e+01 +3.5010752474e+01 +8.6334112633e+01 9.6567619754e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
       11  1    3    3    0    0 +2.5416006610e+01 -3.5010752474e+01 +3.9850978037e+01 5.8820290983e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
</event>

Making the histograms makes the situation easier to understand.

cd $GENPLOTPATH
python3 lhe-root-plotter-onshellz.py

From the histograms, you can clearly see that there is no event generated outside the Z boson mass window.
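
The same check can be done by hand for the single event printed earlier. The sketch below recomputes the invariant mass of the e+ e- pair from the four-momenta in that event record; it should reproduce, up to rounding, the mass of about 90.679 GeV listed for the Z (ID 23) in the same event. The helper name is our own.

```python
import math

# (px, py, pz, E) in GeV, copied from the e+ and e- lines of the event record
positron = (-25.416006610, 35.010752474, 86.334112633, 96.567619754)
electron = (25.416006610, -35.010752474, 39.850978037, 58.820290983)


def invariant_mass(*particles):
    """Invariant mass of the system formed by the given four-momenta."""
    px, py, pz, e = (sum(component) for component in zip(*particles))
    # max(...) protects against tiny negative values from rounding
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


m_ll = invariant_mass(positron, electron)
print(f"m(e+ e-) = {m_ll:.3f} GeV")
```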

Using MadSpin

Now we will learn an advanced use case of MadGraph, which is using the MadSpin plugin. MadSpin, as we saw earlier, is one of the modules that runs through the MadGraph interface and handles the decay of resonant particles. We just took a look at the physics process p p > z, z > e+ e-. Here, z is the resonant particle, so when we choose to use MadSpin, the above process can be split into two, where we first do

generate p p > z

and then decay z into the electron pair using MadSpin.

But the question still remains, why is MadSpin in any case useful? The answer lies in NLO calculations in QCD or loop-induced processes. Let’s launch MadGraph prompt shell again.

cd ${GENTUTPATH}/standalone-tut/MG5_aMC_v3_5_2/
./bin/mg5_aMC

Now try making another simple example that is top pair production.

import model sm
generate p p > t t~ [QCD]

You will have noticed that [QCD] has been added to the process definition. This is a flag which tells MadGraph that you wish to do the calculation at NLO in QCD.

Before going further, try concatenating top decays into a W boson and a b quark similar to what we did for Z -> ee example.

generate p p > t t~, t > w+ b [QCD]
generate p p > t t~ [QCD], t > w+ b
exit

You will find neither of these working; instead, MadGraph complains with an error log saying str : Decay processes cannot be perturbed. This means that physics processes including particle decays cannot be defined for NLO calculations. This is where MadSpin becomes necessary: resonant particles that cannot be decayed in the process definition can be decayed using MadSpin. Now let’s get back to the working example to see how it works.

import model sm
generate p p > t t~ [QCD]
output TopPair
launch
shower = PYTHIA8
4
0

Two lines are noticeably added, shower = PYTHIA8 and 4 (which can be replaced with madspin = ON). We are again not going to do the parton shower here. The choice matters nonetheless: depending on which parton shower generator one chooses later, the “counter term” calculation differs, and this is what gives rise to negatively weighted events.

Negative weighted events

We won’t cover what it is in the tutorial but important things to remember are that

  1. Some portion of the events are negatively weighted so one needs to be careful with the normalization.
  2. LHE files at NLO are even more unphysical than LHE files at LO before parton shower.
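
As an illustration of the first point, the sketch below normalizes a sample with the signed sum of weights instead of the raw event count. The sample composition (10 negative events out of 50) and the function names are hypothetical, and unit weights of +1 or -1 are assumed.

```python
def effective_events(weights):
    """Signed event count: positive events count +1, negative ones -1."""
    return sum(1 if w > 0 else -1 for w in weights)


def per_event_scale(xsec_pb, lumi_pb, weights):
    """Factor that scales a signed, unit-weight sample to xsec * lumi."""
    return xsec_pb * lumi_pb / effective_events(weights)


# Hypothetical sample: 50 events, 10 of them negatively weighted
weights = [+1.0] * 40 + [-1.0] * 10
print(effective_events(weights))   # 40 - 10 = 30, not 50
print(per_event_scale(684.7, 1.0, weights))
```

Dividing by the raw count of 50 instead of the signed count of 30 would underestimate the normalization.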

Press tab to turn off the timer. MadGraph again asks if you would like to edit the cards, which now include madspin_card.dat.

/------------------------------------------------------------\
|  1. param   : param_card.dat                               |
|  2. run     : run_card.dat                                 |
|  3. madspin : madspin_card.dat                             |
\------------------------------------------------------------/

If you take a look at the run_card.dat, you might notice that its template is quite different from the one we used for DY at LO. The template for NLO is shown in https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/NLO_run_card.dat and the one for LO is shown in https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/LO_run_card.dat. Although MadGraph shares the same user interface, LO and NLO calculations run on totally different code in the backend, so an NLO type run_card.dat does not work for LO calculations and vice versa.

Now take a look at madspin_card.dat by pressing 3.

# specify the decay for the final state particles
decay t > w+ b, w+ > all all
decay t~ > w- b~, w- > all all
decay w+ > all all
decay w- > all all
decay z > all all

This card lets you define how you want your resonant particles to decay. For example, if you do :

decay t > w+ b, w+ > e+ ve
decay t~ > w- b~, w- > mu- vm~

This forces the top to decay into a positron and the antitop to decay into a muon. Remove the unnecessary decay definitions and add these two lines to make a top pair sample that ends up with a positron and a muon. Before moving on, do set run_card nevents 50 to save time, producing only 50 events.

You will see inclusive top pair production cross section being computed which includes all possible decays for the top quark.

   --------------------------------------------------------------
      Summary:
      Process p p > t t~ [QCD]
      Run at p-p collider (6500.0 + 6500.0 GeV)
      Number of events generated: 50
      Total cross section: 6.847e+02 +- 4.3e+00 pb
   --------------------------------------------------------------

And then you will see MadSpin doing its job, decaying the top quarks to desired channels.

************************************************************
*                                                          *
*           W E L C O M E  to  M A D S P I N               *
*                                                          *
************************************************************

...

INFO: decay channels for t : ( width = 1.4915 GeV ) 
INFO:        BR                 d1  d2 
INFO:    1.000000e+00            b  w+  
INFO:    
INFO:    
INFO: decay channels for w+ : ( width = 2.04793 GeV ) 
INFO:        BR                 d1  d2 
INFO:    3.333610e-01            d~  u  
INFO:    3.333610e-01            s~  c  
INFO:    1.111195e-01            e+  ve  
INFO:    1.111195e-01            mu+  vm  
INFO:    1.110390e-01            ta+  vt  
INFO:    
INFO:    
INFO: decay channels for t~ : ( width = 1.4915 GeV ) 
INFO:        BR                 d1  d2 
INFO:    1.000000e+00            b~  w-  
INFO:    
INFO:    
INFO: decay channels for w- : ( width = 2.04793 GeV ) 
INFO:        BR                 d1  d2 
INFO:    3.333610e-01            d  u~  
INFO:    3.333610e-01            s  c~  
INFO:    1.111195e-01            e-  ve~  
INFO:    1.111195e-01            mu-  vm~  
INFO:    1.110390e-01            ta-  vt~

...

INFO:    Estimating the maximum weight     
INFO:    *****************************     
INFO:      Probing the first 75 events 
INFO:      with 400 phase space points 
INFO:    
INFO: Event 1/75 :  0.068s   
INFO: Event 6/75 :  0.63s   
INFO: Event 11/75 :  1.2s   
INFO: Event 16/75 :  1.8s   
INFO: Event 21/75 :  2.1s   
INFO: Event 26/75 :  3s   
INFO: Event 31/75 :  3.8s   
INFO: Event 36/75 :  4.6s   
INFO: Event 41/75 :  5.7s   
INFO: Event 46/75 :  6.5s   

What is the cross section?

Inclusive cross section was reported to be 684.7pb as we saw above. When considering the decay channels (e+ and mu- final states), what is the proper cross section? What are the branching ratios for w+ > e+ ve and w- > mu- vm~?

Solution

8.5pb (from 684.7 x 11% x 11%)
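
Written out with the branching ratios printed in the MadSpin log (the branching ratio of t > b w+ is 1 there, so it does not change the result), the arithmetic is as follows. The variable names are our own.

```python
inclusive_xsec_pb = 684.7          # from the MadGraph summary
br_w_to_e_nu = 1.111195e-01        # w+ > e+ ve, from the MadSpin log
br_w_to_mu_nu = 1.111195e-01       # w- > mu- vm~, from the MadSpin log

# Cross section for the e+ mu- final state = inclusive xsec x BR x BR
xsec_emu_pb = inclusive_xsec_pb * br_w_to_e_nu * br_w_to_mu_nu
print(f"{xsec_emu_pb:.1f} pb")
```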

How can we make a sample that yields mu+, vm, and this time, two quark jets (hadronically decaying w-)?

Solution

decay t > w+ b, w+ > mu+ vm
decay t~ > w- b~, w- > j j

Interfacing BSM UFO model files

Let’s take a look at how BSM samples for search type analyses get produced. We will pick one simple example, a hypothetical heavy gauge boson called the W’ particle.

import model WEff_UFO
display particles
generate p p > wp+, wp+ > e+ ve
add process p p > wp-, wp- > e- ve~
output WprimeToENu

How can we make the syntax simpler using particle containers?

How can we write generate p p > wp+, wp+ > e+ ve and add process p p > wp-, wp- > e- ve~ in a simpler way?

Solution

define wprime = wp+ wp-
define leptons = e+ e- ve ve~
generate p p > wprime, wprime > leptons leptons

This will find all possible Feynman diagrams with given particle combinations.

As we are missing right-handed interactions for W bosons in the SM, a lot of BSM scenarios predict the W’ boson that is heavier in mass (thus, we couldn’t find it yet) but possesses the ability to interact with right-handed couplings. As we do not know how large the particle’s mass is, we test many different scenarios (BSM parameters), for example, different masses, decay channels, coupling strengths. We will now see how such BSM parameters can be set in MadGraph.

launch
0

And press tab to turn off the timer.

Take a look at the parameter card by hitting 1.

You will see that the parameter settings differ clearly from those of the sm model file we have been using. Here we focus only on the W’ mass, MWp, and the right-handed coupling strength, kR. Keep in mind that the W’ width, wwp, has to change according to the BSM parameters you choose.

###################################
## INFORMATION FOR MASS
###################################
Block mass
    1 5.040000e-03 # MD
    2 2.550000e-03 # MU
    3 1.010000e-01 # MS
    4 1.270000e+00 # MC
    5 4.700000e+00 # MB
    6 1.720000e+02 # MT
   11 5.110000e-04 # Me
   13 1.056600e-01 # MMU
   15 1.777000e+00 # MTA
   23 9.118760e+01 # MZ
   25 1.250000e+02 # MH
   34 1.000000e+03 # MWp

...

###################################
## INFORMATION FOR WPCOUP
###################################
Block wpcoup
    1 0.000000e+00 # kL
    2 1.000000e+00 # kR

...

###################################
## INFORMATION FOR DECAY
###################################
DECAY   6 1.508336e+00 # WT
DECAY  23 2.495200e+00 # WZ
DECAY  24 2.085000e+00 # WW
DECAY  25 4.070000e-03 # WH
DECAY  34 1.000000e+01 # WWp

You can see that the W’ mass is set to 1000 GeV, the right-handed coupling strength to 1.0, and the W’ width to 10 GeV. You can change the BSM parameters, for example the mass to 2000 GeV and the coupling strength to 0.1, as follows.

set param_card mwp 2000
set param_card kr 0.1

However, if you look at the parameter card again, the W’ width wwp is unchanged. You can compute the width interactively with compute_widths wp+. Check the parameter card once more and you will see that the width has changed and that the card now also lists the branching ratios to the different channels.

#      PDG        Width
DECAY  34   6.672601e-01
#  BR             NDA  ID1    ID2   ...
   2.506959e-01   2    2  -1 # 0.1672793598579319
   2.479126e-01   2    6  -5 # 0.16542221070326676
   2.379169e-01   2    4  -3 # 0.15875247762227632
   8.356529e-02   2    12  -11 # 0.05575978661997229
   8.356529e-02   2    14  -13 # 0.05575978638653866
   8.356519e-02   2    16  -15 # 0.05575972059211703
   1.277883e-02   2    2  -3 # 0.008526805639865994
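The decay block above can be cross-checked with a short script. This is a minimal sketch that parses the text of the block as printed (it is not a general param_card reader); it verifies that the branching ratios add up to one and that each partial width, the number after the # sign, is the branching ratio times the total width.

```python
# Decay block copied from the parameter card shown above (comments dropped).
decay_block = """\
DECAY  34   6.672601e-01
   2.506959e-01   2    2  -1
   2.479126e-01   2    6  -5
   2.379169e-01   2    4  -3
   8.356529e-02   2    12  -11
   8.356529e-02   2    14  -13
   8.356519e-02   2    16  -15
   1.277883e-02   2    2  -3
"""

lines = decay_block.strip().splitlines()
total_width = float(lines[0].split()[2])  # total W' width in GeV

# Map the tuple of daughter PDG IDs to the branching ratio.
channels = {}
for line in lines[1:]:
    fields = line.split()
    branching_ratio = float(fields[0])
    n_daughters = int(fields[1])
    daughters = tuple(int(pdg) for pdg in fields[2:2 + n_daughters])
    channels[daughters] = branching_ratio

total_br = sum(channels.values())
partial_widths = {ids: br * total_width for ids, br in channels.items()}

print(f"total width      : {total_width:.4f} GeV")
print(f"sum of BRs       : {total_br:.6f}")
# PDG IDs 12 and -11 are the electron neutrino and the positron.
print(f"BR(wp+ > e+ ve)  : {channels[(12, -11)]:.4f}")
print(f"partial width    : {partial_widths[(12, -11)]:.5f} GeV")
```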

Instead of computing the width interactively, you can use set param_card wwp auto. MadGraph will then calculate the width on the fly while generating events, and the results will be identical.
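Because several mass and coupling points are usually tested, it can be handy to generate the launch commands rather than type them. The sketch below only writes out the same sequence used interactively in this exercise (launch, 0, the set param_card lines, 0) for each point. The function name and the scan values are illustrative, and replaying the resulting text as a MadGraph command file is an assumption to verify on your own setup.

```python
from itertools import product


def make_scan_script(process_dir, masses_gev, couplings_kr):
    """Return MadGraph commands for every (mass, kR) combination."""
    lines = []
    for mass, kr in product(masses_gev, couplings_kr):
        lines += [
            f"launch {process_dir}",
            "0",                              # skip the first prompt
            f"set param_card mwp {mass}",     # W' mass in GeV
            f"set param_card kr {kr}",        # right-handed coupling
            "set param_card wwp auto",        # recompute the width
            "0",                              # start the run
        ]
    return "\n".join(lines) + "\n"


# Illustrative scan points; the output directory name is the one used above.
script = make_scan_script("WprimeToENu", [1000, 2000], [0.1, 1.0])
print(script)
```

Setting wwp to auto for every point keeps the width consistent with the chosen mass and coupling, which is the step that is easiest to forget when scanning by hand.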

Proceed by hitting 0 and see what cross section you get if the W’ boson hypothetically exists and decays to the electron channel, assuming a mass of 2000 GeV and a right-handed coupling of 0.1.

  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section :   0.001016 +- 1.447e-06 pb
     Nb of events :  10000

How can we check the cross section when the mass is 2000 GeV and the right-handed coupling is 1.0?

Solution

Repeat the exercise above but this time

set param_card mwp 2000
set param_card kr 1.0

Most importantly, do not forget to recompute the width by adding:

set param_card wwp auto

Then you will get the following result.

  === Results Summary for run: run_01 tag: tag_1 ===

    Cross-section :   0.1045 +- 0.0001681 pb
    Nb of events :  10000

How much did the cross section increase compared to the scenario with a mass of 2000 GeV and a right-handed coupling of 0.1?

How many interactions did the W’ boson get involved in?

Solution

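The W’ takes part in two interactions: one vertex when it is produced and another when it decays to the electron channel. Raising kR from 0.1 to 1.0 makes the coupling 1.0/0.1 = 10 times larger, and the cross section grows by 10 squared, i.e. about 100 times, consistent with 0.1045 pb compared to 0.001016 pb. (The recomputed width also scales with kR squared, so the branching ratio to the electron channel stays the same and the net scaling of the resonant cross section is kR squared.)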
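The factor can be checked directly from the two results summaries above. A small sketch using only the numbers quoted in this exercise:

```python
# Cross sections from the two runs above, in pb (mass 2000 GeV in both).
xsec_kr_0p1 = 0.001016  # right-handed coupling kR = 0.1
xsec_kr_1p0 = 0.1045    # right-handed coupling kR = 1.0

observed_ratio = xsec_kr_1p0 / xsec_kr_0p1
expected_ratio = (1.0 / 0.1) ** 2  # coupling ratio squared

print(f"observed ratio : {observed_ratio:.1f}")
print(f"expected ratio : {expected_ratio:.0f}")
print(f"difference     : {100 * (observed_ratio / expected_ratio - 1):.1f}%")
```

The observed ratio agrees with the simple scaling to within a few percent.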

Key Points


4 - CMS resources for samples and generators

Overview

Teaching: 5 min
Exercises: 10 min
Questions
  • How do I find centrally produced samples and their status?

  • How do I obtain a cross section to normalize my sample?

Objectives
  • Leverage available tools for efficient analysis work

CMS resources for simulated samples

You can get the configuration of a given sample from McM. For example, if you want the inclusive W+jets sample, start from a DAS query (this requires a valid grid certificate / proxy):

dasgoclient -query="/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM"

Alternatively there’s also a web-based DAS client: https://cmsweb.cern.ch/das/.

/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v4/MINIAODSIM
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRK_TRK_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRKv2_TRKv2_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-100to200_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM

We want the inclusive LO sample with the latest MiniAOD version (MiniAODv2), so we pick /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM. Enter this name under “Output Dataset” in McM, then click on the dataset name (WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8). Under “Select View”, check “Fragment”, then click the expand icon in the “Fragment” column (rightmost) for the request with a Summer20UL18wmLHEGS PrepId. You can also filter the results directly by appending ?dataset_name=WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8&prepid=*Summer20UL18wmLHE* to the requests address, https://cms-pdmv.cern.ch/mcm/requests.
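The selection and the filtered McM address can also be put together in a script. The sketch below assumes that dataset names have the three slash-separated fields seen in the DAS output (primary dataset, processed dataset, data tier) and uses only a handful of the names listed above. It builds the address and does not contact McM.

```python
from urllib.parse import urlencode

# A few of the dataset names returned by the DAS query above.
datasets = [
    "/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM",
    "/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM",
    "/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM",
    "/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM",
]


def split_dataset(name):
    """Split '/primary/processed/tier' into its three fields."""
    _, primary, processed, tier = name.split("/")
    return primary, processed, tier


# Inclusive LO (madgraphMLM) sample, latest MiniAOD version.
wanted_primary = "WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8"
selected = [
    name for name in datasets
    if split_dataset(name)[0] == wanted_primary
    and "MiniAODv2" in split_dataset(name)[1]
]

# Filtered McM requests address; '*' is kept literal as a wildcard.
params = {"dataset_name": wanted_primary, "prepid": "*Summer20UL18wmLHE*"}
mcm_url = "https://cms-pdmv.cern.ch/mcm/requests?" + urlencode(params, safe="*")

print(selected[0])
print(mcm_url)
```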

Status of samples

GrASP is a tool to conveniently track the status of your samples. Just select the campaigns you’re interested in (e.g. Run2 UL or Run3) and type the sample name. You can also tag the samples of your analysis so that they are easier to find and keep track of.

Cross sections

CMSSW analyzer

In the following, we will use a CMSSW analyzer called GenXSecAnalyzer to compute the cross section of samples. The analyzer takes a list of EDM files as input (i.e., no NanoAOD or NanoGEN). Make sure you are in a CMSSW environment:

cd ${CDGPATH}/CMSSW_10_6_19/src/
cmsenv

You can then use the prepared configurations to obtain the cross section for a sample of your liking, e.g. /TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM:

dasgoclient -query="file dataset=/TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM | grep file.name" > myfiles.txt
cmsRun $CDGPATH/gen-cmsdas-2023/configs/xsec_ana.py inputFiles="myfiles.txt" maxEvents=100000

In this example we restrict the maximum number of events to 100k. This gives a large enough sample for a reliable result without running too long. The full sample has 10.5M events; you can verify this number in DAS with dasgoclient -query="summary dataset=DATASET".

The inputFiles option takes a range of options:

Questions:

xsec DB

A central database is kept with approved x-secs for centrally produced samples, XSDB.

The CMS Generator group’s Cross Section Database Tool (XSDB) is a tool for storing and looking up information related to a specific MC sample within CMS. It is designed to complement DAS and McM, and DAS provides a direct link to the XSDB entry of a specific sample. Anyone with a CERN login can view XSDB and query it for sample information. Further actions are restricted by e-group permissions: there are user, approver and admin e-groups. XSDB users are CMS members with permission to insert and modify XSDB documents. Approvers have the same privileges as users, but are primarily tasked with approving records submitted by users. Admins are responsible for maintaining and improving the tool for future use.

A large amount of information can be stored in the database for each sample: cross section value, cross section uncertainty, hadronization tool, matrix element generator, sample contact, cuts used, DAS primary dataset name, and MCM prename, among other metadata. This information can then be used to help with an analysis. In this exercise, we will simply search XSDB for a sample, look at some of the information stored there, and get familiar with the tool.

We would like to search for a sample within XSDB. We’ll look for an EXO sample used in the Contact Interaction qqbar to dimuon channel in the search for compositeness. The sample can be found in DAS with the dataset name: /CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

You can search for a substring of the item you are looking for. Wildcards are supported, and any contiguous string is accepted by the XSDB query. XSDB also supports boolean queries: to find our original sample we could use the query process_name=cito2mu && total_uncertainty=21.42. You can also query for your favorite MC sample. The XSDB twiki can be found here: XSDB twiki.
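Boolean queries of this form are easy to assemble from a set of conditions. The helper below is hypothetical and only produces the query text, using the key=value and && syntax of the example above; it does not talk to XSDB.

```python
def build_xsdb_query(conditions):
    """Join field=value conditions with '&&' (hypothetical helper)."""
    return " && ".join(f"{field}={value}" for field, value in conditions.items())


# Fields and values taken from the example query in the text.
query = build_xsdb_query({
    "process_name": "cito2mu",
    "total_uncertainty": 21.42,
})
print(query)
```

The resulting string can be pasted into the XSDB search field.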

Key Points

  • DAS can be used to find samples, their files, the number of events in a sample, etc.

  • McM is used for sample generation management, and can be used by the user to obtain additional information about their samples, e.g. the root gridpack, fragments etc.

  • McM is also a good place to look for example cmsDriver commands

  • Different sources for x-secs exist within CMS: a CMSSW analyzer and a database