1 - Matrix Element Generator
Overview
Teaching: 20 min
Exercises: 40 min
Questions
What is a Monte Carlo event generator?
Why are we using simulated samples in CMS?
How are the simulated samples produced in CMS?
Objectives
Running standalone MadGraph with simple Z boson process
Producing MadGraph gridpacks using CMS script
Analyzing LHE level information
Introduction and first steps
Although quite old, link is great reading material for a general overview of Monte Carlo event generators. Monte Carlo event generators are essential components of almost all experimental analyses and are also widely used by theorists and experimentalists to make predictions and prepare for future experiments. It is one of the topics where CMS experimentalists and theorists are most closely connected: theorists give us predictions and experimentalists verify them with actual data. Although Monte Carlo event generators are extremely important tools in HEP, they are often used as black boxes whose output we treat more or less as “data”. Our aim is to get the minimal background on how these tools work and to analyze their output using generator-level information.
Samples that are used by the CMS experiment go through several steps of simulation :
- Monte Carlo event generator
- Detector simulation
- Pileup mixing
- Trigger emulation
- Object reconstruction
We focus on “1. Monte Carlo event generator” in this tutorial. A Monte Carlo event generator can be further divided into several pieces, as each step can be factorized and handled through a separate calculation :
- Parton distribution function (PDF)
- Hard scattering (matrix element calculation)
- Parton shower & hadronization
First of all, the LHC is a proton-proton collider, hence we need information on how partons (quarks and gluons) are distributed in the proton (PDF). Hard scattering is the part where calculations can be treated perturbatively: the interaction of incoming partons with the largest momentum transfer (usually the physics process we are interested in). Parton shower & hadronization further describe how the particles involved in the hard scattering evolve, working downwards to lower momentum scales, even to the point where perturbative calculations break down.
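The factorization described above can be illustrated with a toy Monte Carlo integration. This is only a sketch under loud assumptions, not how MadGraph works internally: `toy_pdf` and `toy_partonic_xsec` are made-up functions (no real PDF set, no real matrix element) and the result is in arbitrary units. It only shows the structure of the calculation: sample the momentum fractions x1 and x2, weight by the parton densities, and average the partonic cross section.

```python
import random


def toy_pdf(x):
    # Toy parton density, normalised to 1 on [0, 1]. NOT a real PDF set.
    return 2.0 * (1.0 - x)


def toy_partonic_xsec(s_hat, m_min=50.0):
    # Toy hard-scattering cross section in arbitrary units:
    # 1/s_hat fall-off above an invariant-mass threshold, zero below it.
    return 1.0 / s_hat if s_hat > m_min ** 2 else 0.0


def hadronic_xsec(sqrt_s, n_points=100_000, seed=1, partonic=toy_partonic_xsec):
    """Monte Carlo estimate of the integral of f(x1) f(x2) sigma_hat(x1 x2 s)."""
    rng = random.Random(seed)
    s = sqrt_s ** 2
    total = 0.0
    for _ in range(n_points):
        # Momentum fractions carried by the two incoming partons.
        x1, x2 = rng.random(), rng.random()
        total += toy_pdf(x1) * toy_pdf(x2) * partonic(x1 * x2 * s)
    return total / n_points


print(hadronic_xsec(13600.0))
```

Replacing `toy_partonic_xsec` with a constant function returns approximately 1, which is a quick check that the toy densities are normalised.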
(1) Standalone : DY to ee
In the first exercise, we will run one of the most widely used tools for hard scattering calculations: MadGraph5_aMCatNLO, in short MadGraph (link).
MadGraph can perform the calculations for many different physics processes (both SM and BSM) at LO or NLO in QCD.
Because of its easy user interface and flexibility with UFO models, you can test a wide variety of physics models.
We will first see how MadGraph runs interactively in standalone mode, using a simple DYtoee process as an example.
Before we proceed, make sure you have completed the steps described in the “Setup” section. We ended the “Setup” section with the commands below, configuring MadGraph with several settings.
cd ${GENTUTPATH}/standalone-tut/MG5_aMC_v2_9_18/
cp -r ${GENTUTPATH}/generators-cmsdaslpc2024-git/standalone ./
./bin/mg5_aMC standalone/setup.config
Through this, we restrict ourselves to using at most 2 cores with set nb_core 2.
Otherwise MadGraph will interfere with other people’s running jobs.
We also installed ninja and collier, which are tools that MadGraph uses for NLO calculations.
Before going further, open input/mg5_configuration.txt and change # text_editor = None by removing the # and replacing None with your favorite text editor. For example, text_editor = vim.
Now launch the MadGraph prompt shell by doing :
./bin/mg5_aMC
Now let’s try the simplest DYtoee example.
import model sm
generate p p > e+ e-
output standalone-drellyan-mll50
The first line tells MadGraph that you would like to use the UFO model named sm for calculations.
The second line defines which physics process to generate. In this particular example you are asking for the process where two quarks from the protons produce a Z/gamma* mediator, which then decays into an electron and a positron.
Keep in mind that the calculations are performed on “two quarks” and not “two protons”.
The information which translates “protons -> quarks” actually comes from the PDF.
The last line sets the output directory for the computation results; mll50 indicates the dilepton mass cut that we are about to apply in a few minutes.
Now launch!
launch
After MadGraph has found all the Feynman diagrams for the process you requested, it asks you several questions, as shown below.
Press tab to turn off the timer (otherwise, MadGraph will move on by itself after 60 seconds).
/===========================================================================\
| 1. Choose the shower/hadronization program shower = Not Avail. |
| 2. Choose the detector simulation program detector = Not Avail. |
| 3. Choose an analysis package (plot/convert) analysis = ExRoot |
| 4. Decay onshell particles madspin = OFF |
| 5. Add weights to events for new hypp. reweight = Not Avail. |
\===========================================================================/
As we did not install any shower or detector programs, they are in the Not Avail. state.
We will learn later how showering is done under CMSSW and run brief analysis/histogramming code on the events we produce in this tutorial.
ExRootAnalysis (analysis = ExRoot) is installed so that we can later convert LHE files to ROOT files and draw histograms.
madspin will be demonstrated later using the top pair process in the third (optional) exercise.
reweight is out of scope for this tutorial, although it is quite useful for certain BSM scenarios.
Let’s move on by pressing ENTER.
MadGraph now asks which cards you want to edit, as shown below.
Again, press tab to turn off the timer (otherwise, MadGraph will move on by itself after 90 seconds).
Do you want to edit a card (press enter to bypass editing)?
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
\------------------------------------------------------------/
you can also
- enter the path to a valid card or banner.
- use the 'set' command to modify a parameter directly.
The set option works only for param_card and run_card.
Type 'help set' for more information on this command.
- call an external program (ASperGE/MadWidth/...).
Type 'help' for the list of available command
[0, done, 1, param, 2, run, enter path][90s to answer]
Let’s take a look at the cards and see how the values are set. Press 1 and ENTER to inspect the parameter settings.
###################################
## INFORMATION FOR MASS
###################################
Block mass
5 4.700000e+00 # MB
6 1.730000e+02 # MT
15 1.777000e+00 # MTA
23 9.118800e+01 # MZ
25 1.250000e+02 # MH
...
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.491500e+00 # WT
DECAY 23 2.441404e+00 # WZ
DECAY 24 2.047600e+00 # WW
DECAY 25 6.382339e-03 # WH
Next, press 2 and ENTER to inspect the run settings.
#*********************************************************************
# Number of events and rnd seed *
# Warning: Do not generate more than 1M events in a single run *
#*********************************************************************
10000 = nevents ! Number of unweighted events requested
0 = iseed ! rnd seed (0=assigned automatically=default))
...
#*********************************************************************
# Collider type and energy *
# lpp: 0=No PDF, 1=proton, -1=antiproton, *
# 2=elastic photon of proton/ion beam *
# +/-3=PDF of electron/positron beam *
# +/-4=PDF of muon/antimuon beam *
#*********************************************************************
1 = lpp1 ! beam 1 type
1 = lpp2 ! beam 2 type
6500.0 = ebeam1 ! beam 1 total energy in GeV
6500.0 = ebeam2 ! beam 2 total energy in GeV
...
#*********************************************************************
# Standard Cuts *
#*********************************************************************
# Minimum and maximum pt's (for max, -1 means no cut) *
#*********************************************************************
10.0 = ptl ! minimum pt for the charged leptons
-1.0 = ptlmax ! maximum pt for the charged leptons
{} = pt_min_pdg ! pt cut for other particles (use pdg code). Applied on particle and anti-particle
{} = pt_max_pdg ! pt cut for other particles (syntax e.g. {6: 100, 25: 50})
...
#*********************************************************************
# Minimum and maximum invariant mass for pairs *
#*********************************************************************
0.0 = mmll ! min invariant mass of l+l- (same flavour) lepton pair
-1.0 = mmllmax ! max invariant mass of l+l- (same flavour) lepton pair
{} = mxx_min_pdg ! min invariant mass of a pair of particles X/X~ (e.g. {6:250})
{'default': False} = mxx_only_part_antipart ! if True the invariant mass is applied only
! to pairs of particle/antiparticle and not to pairs of the same pdg codes.
...
#*********************************************************************
# maximal pdg code for quark to be considered as a light jet *
# (otherwise b cuts are applied) *
#*********************************************************************
4 = maxjetflavor ! Maximum jet pdg code
Try editing the beam energies (ebeam1 and ebeam2) from 6500 to 6800, as the LHC is now running at a center-of-mass energy of 13.6 TeV (6.8 TeV per beam).
When done editing, save the changes and exit the text editor.
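The relation between the per-beam energies in the run card and the collision energy can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes head-on beams with the proton mass neglected, so that s = 4 * ebeam1 * ebeam2.

```python
import math


def centre_of_mass_energy(ebeam1, ebeam2):
    """sqrt(s) in GeV for two head-on beams, neglecting the beam particle mass."""
    return 2.0 * math.sqrt(ebeam1 * ebeam2)


# Default run card values versus the edited values.
print(centre_of_mass_energy(6500.0, 6500.0))  # 13000.0 GeV
print(centre_of_mass_energy(6800.0, 6800.0))  # 13600.0 GeV
```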
MadGraph also allows you to change settings interactively by typing commands such as the one below.
set run_card nevents 5000
Take a look at the run card again and check that the number of events to generate (nevents) has changed to 5000.
Then change it back to 10000 using the same command and check again.
As shown above, several phase space cuts are set by default (e.g. 10.0 = ptl).
There is a handy command that removes all phase space cuts at once (instead of doing set run_card ptl 0, set run_card ptj 0, … one by one by hand).
set no_parton_cut
Take a look at the card again and check that the lepton pt cut (ptl) has changed to 0.
Keep in mind that any cuts you set before set no_parton_cut will be removed by this command.
So don’t forget to run set no_parton_cut before setting the cuts you want.
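Why the order matters can be seen in a toy model where a Python dict stands in for the run card. This is only an illustration under assumptions: `apply_commands` is a hypothetical helper, it knows only the two cuts shown in the excerpt above, and it treats set no_parton_cut as "reset every cut in the dict to zero", which is a simplification of what MadGraph really does.

```python
def apply_commands(commands, card):
    """Apply 'set' commands in order to a copy of a toy run card (a dict)."""
    card = dict(card)
    for command in commands:
        if command.strip() == "set no_parton_cut":
            # Toy behaviour: reset every cut in the card to zero.
            card = {name: 0.0 for name in card}
        else:
            # Accept both 'set mmll 50' and 'set run_card mmll 50'.
            *_, name, value = command.split()
            card[name] = float(value)
    return card


defaults = {"ptl": 10.0, "mmll": 0.0}  # values from the run card excerpt above
wrong = apply_commands(["set mmll 50", "set no_parton_cut"], defaults)
right = apply_commands(["set no_parton_cut", "set mmll 50"], defaults)
print(wrong)  # the mmll cut is lost
print(right)  # the mmll cut survives
```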
As mentioned above, mll50 in the output directory name stands for a dilepton mass cut at 50 GeV.
How should we set the dilepton mass cut?
In the MadGraph LO run card, the name of the dilepton mass variable is mmll. How should we set a 50 GeV cut on this variable?
Solution
set run_card mmll 50
After you have verified that the desired dilepton mass cut is set, let’s start the actual computation by pressing ENTER.
What is the cross section?
Take a close look at what the MadGraph log tells you.
Solution
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 1584 +- 1.159 pb
Nb of events : 10000
Type exit to leave the MadGraph shell prompt.
We will now take a look at the output LHE file.
less $GENMGPATH/standalone-drellyan-mll50/Events/run_01/unweighted_events.lhe.gz
Scroll down to look at the first event (to exit, hit q).
<event>
5 1 +1.4934000e+03 9.10903200e+01 7.54677100e-03 1.30023300e-01
2 -1 0 0 501 0 +0.0000000000e+00 +0.0000000000e+00 +1.2430983507e+01 1.2430983507e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
-2 -1 0 0 0 501 -0.0000000000e+00 -0.0000000000e+00 -1.6687025534e+02 1.6687025534e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
23 2 1 2 0 0 +0.0000000000e+00 +0.0000000000e+00 -1.5443927183e+02 1.7930123884e+02 9.1090315443e+01 0.0000e+00 0.0000e+00
-11 1 3 3 0 0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
11 1 3 3 0 0 +2.3393803385e+01 +7.4187481776e+00 -1.6977396902e+00 2.4600613435e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
What does each column mean?
Solution
ID, status, mother1, mother2, color, anticolor, px, py, pz, E, mass, lifetime, and spin
-11 1 3 3 0 0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
This line tells you that a positron (ID) is an outgoing particle (status) with the Z as its mother (mother1 and mother2: the 3rd particle is the Z, which has ID=23) and with no color (color and anticolor), …
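The column layout can be put to work with a short parser. The sketch below is illustrative only: `Particle`, `parse_particle`, and `invariant_mass` are hypothetical names, and the two input lines are the positron and electron from the event shown above. It rebuilds the dilepton invariant mass and the transverse momentum components of the pair.

```python
import math
from collections import namedtuple

# Field names follow the column order listed in the solution above.
Particle = namedtuple(
    "Particle",
    "pdg_id status mother1 mother2 color anticolor px py pz energy mass lifetime spin",
)


def parse_particle(line):
    """Turn one particle line of an LHE <event> block into a Particle."""
    fields = line.split()
    return Particle(*[int(v) for v in fields[:6]], *[float(v) for v in fields[6:]])


def invariant_mass(particles):
    """Invariant mass of the summed four-momenta, in GeV."""
    e = sum(p.energy for p in particles)
    px = sum(p.px for p in particles)
    py = sum(p.py for p in particles)
    pz = sum(p.pz for p in particles)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


positron = parse_particle(
    "-11 1 3 3 0 0 -2.3393803385e+01 -7.4187481776e+00 -1.5274153214e+02 "
    "1.5470062541e+02 0.0000000000e+00 0.0000e+00 1.0000e+00"
)
electron = parse_particle(
    "11 1 3 3 0 0 +2.3393803385e+01 +7.4187481776e+00 -1.6977396902e+00 "
    "2.4600613435e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00"
)

mll = invariant_mass([positron, electron])
print(f"m(ee) = {mll:.4f} GeV")
print("pair px, py =", positron.px + electron.px, positron.py + electron.py)
```

The reconstructed mass should agree with the mass column of the Z line (ID=23) in the same event, and the transverse components of the pair cancel.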
(2) Standalone : DY to ll (using particle containers)
Now let’s learn about particle containers, an easier way to deal with multiple particles.
Launch the MadGraph shell prompt with ./bin/mg5_aMC as we did above.
import model sm
display multiparticles
The above command will show several predefined particle containers, as below.
Multiparticle labels:
p = g u c d s u~ c~ d~ s~
j = g u c d s u~ c~ d~ s~
l+ = e+ mu+
l- = e- mu-
vl = ve vm vt
vl~ = ve~ vm~ vt~
all = g u c d s u~ c~ d~ s~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t b t~ b~ z w+ h w- ta- ta+
One can redefine existing particle containers or define new ones by doing :
define l+ = e+ mu+ ta+
define l- = e- mu- ta-
define myleptons+ = e+ mu+ ta+
define myleptons- = e- mu- ta-
define lpcdas = e+ mu- u d b~
display multiparticles
Now let’s try generating the same DY process, but this time allowing all lepton flavors. Previously we only produced electron pairs; now we will use particle containers to collect all possible dilepton contributions, including muons and taus.
generate p p > l+ l-
output standalone-drellyan-mll50-inclusive
launch
0
set run_card nevents 5000
set ebeam1 6800
set ebeam2 6800
set no_parton_cut
set mmll 50
set use_syst False
0
What is the cross section?
We added new Feynman diagrams for two more final states (muon pair and tau pair). How should the cross section change from the previous value of 1584pb?
Solution
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 4748 +- 5.361 pb
Nb of events : 5000
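A quick arithmetic check makes the scaling explicit. The sketch below assumes that the three lepton flavors contribute equally (lepton masses are negligible for a 50 GeV mass cut) and that the two runs are statistically independent; the numbers are the e+ e- result quoted earlier (1584 +- 1.159 pb) and the inclusive result above.

```python
import math

sigma_ee, err_ee = 1584.0, 1.159   # p p > e+ e-, in pb
sigma_ll, err_ll = 4748.0, 5.361   # p p > l+ l- with l = e, mu, ta, in pb

# Naive expectation: three equal lepton flavors.
expected = 3.0 * sigma_ee
expected_err = 3.0 * err_ee

# Difference in units of the combined Monte Carlo integration uncertainty.
combined_err = math.hypot(err_ll, expected_err)
pull = (sigma_ll - expected) / combined_err

print(f"expected {expected:.0f} pb, got {sigma_ll:.0f} pb, pull = {pull:.2f}")
```

A pull well within one unit means the inclusive cross section is compatible with three times the electron-only value.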
Another way to generate multiple Feynman diagrams is to use add process, as below.
import model sm
generate p p > e+ e-
add process p p > mu+ mu-
add process p p > ta+ ta-
There is one more cool trick for using MadGraph.
Take a look at the standalone/drellyan-mll10.config file.
generate p p > e+ e-
output standalone-drellyan-mll10
launch
set nevents 10000
set no_parton_cut
set mmll 10
set use_syst False
0
Nothing has changed except that set mmll 50 from above has now become set mmll 10.
We loosened the dilepton mass cut in this script.
Try this :
./bin/mg5_aMC standalone/drellyan-mll10.config
MadGraph reads the file line by line and automatically passes each command to the MadGraph prompt shell.
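Because these scripts are plain text, they are easy to generate for a scan over cut values. The following sketch is a hypothetical helper (`make_config` is not part of MadGraph or of the tutorial repository) that reproduces the structure of the file shown above for an arbitrary dilepton mass cut.

```python
def make_config(mll_cut, nevents=10000):
    """Return the text of a MadGraph command script for a given mmll cut."""
    lines = [
        "generate p p > e+ e-",
        f"output standalone-drellyan-mll{mll_cut}",
        "launch",
        f"set nevents {nevents}",
        "set no_parton_cut",      # must come before the cuts we want to keep
        f"set mmll {mll_cut}",
        "set use_syst False",
        "0",
    ]
    return "\n".join(lines) + "\n"


# Print one script per cut value; writing them to files is left to the reader.
for cut in (4, 10, 50):
    print(f"# drellyan-mll{cut}.config")
    print(make_config(cut))
```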
Try this again :
./bin/mg5_aMC standalone/drellyan-mll4.config
What are the cross sections?
How do the cross sections change compared to the case where the 50 GeV cut was applied?
Solution
Cross sections get larger as we loosen the cuts drastically (we will later see this through a histogram).
Take a quick look at the plots. We will draw two histograms (transverse momentum and mass of the dilepton system) with the samples we’ve just produced.
cd $GENTUTPATH/CMSSW_12_4_14_patch2/src
cmsenv
mkdir -p $GENPLOTPATH
cd $GENPLOTPATH
cp $GENTUTPATH/generators-cmsdaslpc2024-git/plotter/*.py ./
python3 lhe-root-plotter.py
What can you infer from the plots?
Why is the transverse momentum distribution populated only at 0?
Solution
The transverse momentum peaks at 0 because the summed momentum of the initial quarks lies only along the z-axis. How does the Z boson acquire transverse momentum?
What are the two peaks in the mass distribution?
Solution
The two peaks represent the photon and the Z boson. What will happen if we remove the cut on the dilepton mass (set mmll 0)?
(3) Gridpack : DY to ee (producing gridpacks for CMS sample production)
As we learned above, running standalone MadGraph is not so difficult.
And it is often useful for quick tests if you are curious about certain physics processes and their cross sections.
However, CMS relies on billions of events (we produce more than 50B events per year) for physics analysis.
Could we handle all the necessary statistics by interactively running standalone MadGraph?
What if the person who first produced 10M events for the Z->ee process decides to leave CMS and we decide to make 40M more events?
Can we ensure that all the physics settings (which PDF set was chosen, how kinematic cuts are given, etc.) are kept consistent?
To mitigate such issues, CMS has developed a workflow called gridpacks, which is maintained in link.
A gridpack is a precompiled library that contains all the necessary executables from MadGraph to produce LHE events.
It is particularly useful for physics processes with a higher multiplicity of particles (we’ve only tried a 2->2 physics process; think of more complex processes, e.g. pp->eejjjj, which is 2->6 with 4 additional QCD particles denoted by j), as the precompilation greatly reduces the computing time.
Here, instead of running MadGraph interactively, we will only give inputs to the CMS-developed workflow to produce gridpacks, and then see how they are the same as or different from the standalone exercise.
Before we begin, we first need to unset the CMSSW environment settings (or open a new terminal), as they might interfere with the scripts in the genproductions repository.
eval `scram unsetenv -sh`
Now let’s go into genproductions to try out the gridpack production.
cd $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO
cp -r ${GENTUTPATH}/generators-cmsdaslpc2024-git/gridpack ./
Take a look at the cards in the gridpack/drellyan-mll50/ directory.
There are two .dat files, which are the minimal inputs needed to make gridpacks.
less gridpack/drellyan-mll50/drellyan-mll50_proc_card.dat
less gridpack/drellyan-mll50/drellyan-mll50_run_card.dat
You will notice that proc_card.dat defines the same physics process as the first standalone exercise.
In run_card.dat you will notice that a PDF choice, which did not show up in the exercise above, now appears.
#*********************************************************************
# PDF CHOICE: this automatically fixes also alpha_s and its evol. *
#*********************************************************************
'lhapdf' = pdlabel ! PDF set
$DEFAULT_PDF_SETS = lhaid
$DEFAULT_PDF_MEMBERS = reweight_PDF ! if pdlabel=lhapdf, this is the lhapdf number
$DEFAULT_PDF_SETS and $DEFAULT_PDF_MEMBERS are filled in automatically later through Utilities/gridpack_helpers.sh in genproductions.
This is a CMS-specific part of the code that keeps the PDF setup consistent among different CMS samples.
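The idea of placeholder substitution can be sketched in Python with `string.Template`. This is not the code in gridpack_helpers.sh (which is a shell script); it only mimics the effect. The values "306000" and "1" are illustrative assumptions chosen for this example, and the real values depend on the campaign setup in genproductions.

```python
from string import Template

run_card_excerpt = """\
'lhapdf' = pdlabel ! PDF set
$DEFAULT_PDF_SETS = lhaid
$DEFAULT_PDF_MEMBERS = reweight_PDF ! if pdlabel=lhapdf, this is the lhapdf number
"""

# Hypothetical values, for illustration only.
pdf_settings = {
    "DEFAULT_PDF_SETS": "306000",
    "DEFAULT_PDF_MEMBERS": "1",
}

# safe_substitute leaves any unknown $NAME untouched instead of raising.
filled = Template(run_card_excerpt).safe_substitute(pdf_settings)
print(filled)
```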
As there are many people running the tutorial at once, let’s restrict the core usage to 2 again and then start the gridpack production.
export NB_CORE=2
./gridpack_generation.sh drellyan-mll50 gridpack/drellyan-mll50/ pdmv
Note that pdmv is only there to restrict the number of cores to the 2 set with NB_CORE; normally you just execute ./gridpack_generation.sh <process name> <path to card> without pdmv.
Keep in mind that everything is exactly the same as in the standalone tutorial, except that gridpack_generation.sh replaces all the interactive commands that we were giving to the MadGraph prompt shell.
You can see that MadGraph is downloaded from the web,
/uscms/home/sjeon/nobackup/GENTUTORIAL/gridpack-tut/genproductions/bin/MadGraph5_aMCatNLO
WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
--2024-01-01 17:21:56-- https://cms-project-generators.web.cern.ch/cms-project-generators/MG5_aMC_v2.9.13.tar.gz
Resolving cms-project-generators.web.cern.ch (cms-project-generators.web.cern.ch)... 2001:1458:d00:4e::100:3c0, 188.184.74.207
Connecting to cms-project-generators.web.cern.ch (cms-project-generators.web.cern.ch)|2001:1458:d00:4e::100:3c0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26561088 (25M) [application/gzip]
Saving to: 'MG5_aMC_v2.9.13.tar.gz'
0K .......... .......... .......... .......... .......... 0% 228K 1m54s
50K .......... .......... .......... .......... .......... 0% 456K 85s
100K .......... .......... .......... .......... .......... 0% 181M 57s
150K .......... .......... .......... .......... .......... 0% 301M 42s
then several patches are applied (to fix bugs discovered after the release),
patching file models/loop_qcd_qed_sm/restrict_lepton_masses_no_lepton_yukawas.dat
patching file models/loop_sm/restrict_ckm_no_b_mass.dat
patching file models/sm/restrict_ckm_lepton_masses.dat
patching file models/sm/restrict_ckm_lepton_masses_no_b_mass.dat
patching file models/sm/restrict_ckm_no_b_mass.dat
patching file models/sm/restrict_lepton_masses_no_b_mass.dat
patching file Template/NLO/SubProcesses/MCmasses_PYTHIA8.inc
patching file madgraph/interface/loop_interface.py
patching file madgraph/various/systematics.py
patching file Template/NLO/Source/make_opts.inc
patching file madgraph/iolibs/export_v4.py
patching file madgraph/iolibs/template_files/pdf_opendata.f
patching file madgraph/iolibs/template_files/pdf_wrap_lhapdf.f
then the desired Feynman diagrams defined in proc_card.dat are found,
import model sm
INFO: load particles
INFO: load vertices
INFO: Restrict model sm with file MG5_aMC_v2_9_13/models/sm/restrict_default.dat .
INFO: Run "set stdout_level DEBUG" before import for more information.
INFO: Change particles name to pass to MG5 convention
Defined multiparticle p = g u c d s u~ c~ d~ s~
Defined multiparticle j = g u c d s u~ c~ d~ s~
Defined multiparticle l+ = e+ mu+
Defined multiparticle l- = e- mu-
Defined multiparticle vl = ve vm vt
Defined multiparticle vl~ = ve~ vm~ vt~
Defined multiparticle all = g u c d s u~ c~ d~ s~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t b t~ b~ z w+ h w- ta- ta+
generate p p > e+ e-
INFO: Checking for minimal orders which gives processes.
INFO: Please specify coupling orders to bypass this step.
INFO: Trying process: g g > e+ e- WEIGHTED<=4 @1
INFO: Trying process: u u~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
and finally the computed cross section is printed.
=== Results Summary for run: pilotrun tag: tag_1 ===
Cross-section : 1836 +- 6.442 pb
Nb of events : 0
Why did the cross section change?
You will notice that the cross section has changed from 1493pb to 1836pb. What could be the reasons, even though we ran the exact same physics process?
Solution
Most importantly, a different PDF set has been used (check by looking at drellyan-mll50/drellyan-mll50_gridpack/work/gridpack/process/madevent/Cards/run_card.dat). This gives totally different assumptions about the parton distributions in the proton, hence the difference in the results. Other minor reasons for the difference could be the use of a different MadGraph release version or perhaps a different random seed.
Now try the same with a different set of gridpack cards named drellyan-mll50-5fs.
./gridpack_generation.sh drellyan-mll50-5fs gridpack/drellyan-mll50-5fs/ pdmv
This adds a new contribution, b b~ > e+ e-, to the calculation, as it uses a different UFO model, sm-no_b_mass.
Strictly speaking, we are using the same UFO model but adding a restriction (bottom quarks are treated as massless) to the model.
Take a look at restriction_card_tutorial from slide 24 for more information.
import model sm-no_b_mass
INFO: load particles
INFO: load vertices
INFO: Restrict model sm-no_b_mass with file MG5_aMC_v2_9_13/models/sm/restrict_no_b_mass.dat .
INFO: Run "set stdout_level DEBUG" before import for more information.
INFO: Change particles name to pass to MG5 convention
Defined multiparticle p = g u c d s u~ c~ d~ s~
Defined multiparticle j = g u c d s u~ c~ d~ s~
Defined multiparticle l+ = e+ mu+
Defined multiparticle l- = e- mu-
Defined multiparticle vl = ve vm vt
Defined multiparticle vl~ = ve~ vm~ vt~
Pass the definition of 'j' and 'p' to 5 flavour scheme.
Defined multiparticle all = g u c d s b u~ c~ d~ s~ b~ a ve vm vt e- mu- ve~ vm~ vt~ e+ mu+ t t~ z w+ h w- ta- ta+
generate p p > e+ e-
INFO: Checking for minimal orders which gives processes.
INFO: Please specify coupling orders to bypass this step.
INFO: Trying process: g g > e+ e- WEIGHTED<=4 @1
INFO: Trying process: u u~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
INFO: Trying process: u c~ > e+ e- WEIGHTED<=4 @1
INFO: Trying process: c u~ > e+ e- WEIGHTED<=4 @1
INFO: Trying process: c c~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
INFO: Trying process: d d~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
INFO: Trying process: d s~ > e+ e- WEIGHTED<=4 @1
INFO: Trying process: d b~ > e+ e- WEIGHTED<=4 @1
INFO: Trying process: s d~ > e+ e- WEIGHTED<=4 @1
INFO: Trying process: s s~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
INFO: Trying process: s b~ > e+ e- WEIGHTED<=4 @1
INFO: Process u~ u > e+ e- added to mirror process u u~ > e+ e-
INFO: Process c~ c > e+ e- added to mirror process c c~ > e+ e-
INFO: Process d~ d > e+ e- added to mirror process d d~ > e+ e-
INFO: Trying process: d~ b > e+ e- WEIGHTED<=4 @1
INFO: Process s~ s > e+ e- added to mirror process s s~ > e+ e-
INFO: Trying process: s~ b > e+ e- WEIGHTED<=4 @1
INFO: Trying process: b b~ > e+ e- WEIGHTED<=4 @1
INFO: Process has 2 diagrams
INFO: Process b~ b > e+ e- added to mirror process b b~ > e+ e-
5 processes with 10 diagrams generated in 0.032 s
Total: 5 processes with 10 diagrams
output drellyan-mll50-5fs
Now you should get the following cross section, 1903pb, which is somewhat larger than the 1836pb of the previous run.
=== Results Summary for run: pilotrun tag: tag_1 ===
Cross-section : 1903 +- 6.669 pb
Nb of events : 0
Why did the cross section change?
Did your proc_card.dat add any new processes?
Solution
We now have 5 quark flavors in the proton instead of the 4 in the previous example, as the bottom quark contributions are added. So we have a new contribution, b b~ > e+ e-.
But why did it not scale up so much?
When we ran p p > e+ e- and p p > l+ l- with standalone MadGraph, the cross section was roughly 3 times larger. When we compare the 4 (udcs) and 5 (udcsb) flavor schemes of the proton, should it not be 5/4 times larger?
Solution
No. Keep in mind that the proton consists of two up quarks and one down quark, which are the valence quarks. The rest are sea quark contributions, which are smaller in the PDF. Hence, the increase coming from the bottom quark contributions is not so large.
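The size of the effect can be read off directly from the two pilot-run cross sections quoted above. The short sketch below compares the observed ratio with the naive 5/4 guess; it only uses the numbers printed in the logs.

```python
sigma_4fs = 1836.0  # pb, gridpack run with p = g u c d s (and antiquarks)
sigma_5fs = 1903.0  # pb, gridpack run with b quarks added to the proton

observed_ratio = sigma_5fs / sigma_4fs
naive_ratio = 5.0 / 4.0

print(f"observed increase: {100.0 * (observed_ratio - 1.0):.1f}%")
print(f"naive 5/4 guess:   {100.0 * (naive_ratio - 1.0):.1f}%")
```

The observed increase of a few percent is far below the 25% that equal flavor contributions would give, in line with the bottom quark being a small sea contribution.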
Let’s recall why a gridpack is useful. It is a precompiled library that makes producing LHE files for the given process faster. CMS uses gridpacks to produce official samples as it is much easier to keep the samples consistent. Now it’s time to find out how we make LHE files from gridpacks. To start with, copy the gridpack to a new temporary directory and untar it.
mkdir test
cd test
cp ../drellyan-mll50-5fs_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz ./
tar -xvf drellyan-mll50-5fs_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
You can see that several files were compressed into one tarball.
Among them, runcmsgrid.sh uses the precompiled library to generate events and write the LHE file.
Inputs are given in the following order: ./runcmsgrid.sh <nevents> <random seed> <ncores>.
To make 50 events with random seed 1 using 1 core, execute the command below.
./runcmsgrid.sh 50 1 1
After the run has finished, an LHE file named cmsgrid_final.lhe has been produced.
This makes it much easier to produce as many additional events as we need than running from scratch with standalone MadGraph.
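Producing more statistics then amounts to rerunning the same script with different random seeds. The sketch below shows one way this could be scripted; `runcmsgrid_command` and `produce_batches` are hypothetical helpers, and the renaming step assumes that every run writes cmsgrid_final.lhe into the unpacked gridpack directory, as observed above.

```python
import shutil
import subprocess
from pathlib import Path


def runcmsgrid_command(nevents, seed, ncores=1):
    """Argument list for ./runcmsgrid.sh <nevents> <random seed> <ncores>."""
    return ["./runcmsgrid.sh", str(nevents), str(seed), str(ncores)]


def produce_batches(gridpack_dir, nevents, seeds, ncores=1):
    """Run the unpacked gridpack once per seed, keeping each LHE file."""
    gridpack_dir = Path(gridpack_dir)
    outputs = []
    for seed in seeds:
        subprocess.run(
            runcmsgrid_command(nevents, seed, ncores),
            cwd=gridpack_dir,
            check=True,
        )
        # Rename the output so that the next run does not overwrite it
        # (assumption: each run writes cmsgrid_final.lhe in gridpack_dir).
        target = gridpack_dir / f"cmsgrid_seed{seed}.lhe"
        shutil.move(str(gridpack_dir / "cmsgrid_final.lhe"), str(target))
        outputs.append(target)
    return outputs


# Example (not executed here): three batches of 50 events in ./test
# produce_batches("test", 50, seeds=[1, 2, 3])
print(runcmsgrid_command(50, 1, 1))
```

Using a distinct seed per batch matters: rerunning with the same seed would reproduce the same events rather than add statistics.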
The first step of the generator tutorial is finished. Before moving on to the next step, let’s first run the command below, as it takes a bit more time to finish.
cd $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO
./gridpack_generation.sh drellyan-mll50-01j gridpack/drellyan-mll50-01j/ pdmv
Important note : When asking questions to the MadGraph authors on Launchpad link
Quite often, there are people asking questions such as “I am working for the CMS experiment. I tried to make a gridpack using my awesome BSM UFO file with these awesome BSM particle predictions to use for my analysis. But the script I used in genproductions does not give me functional gridpacks. Please help me.”
Never do this! The setup in genproductions is not the MadGraph authors’ turf!
- They have no responsibility to make our gridpack script work.
- They also have no idea (or perhaps only some idea) of what we do in genproductions.
- They do not share the same computing environment.
Please first consult with GEN conveners or experts through CMS Talk link. For more constructive iterations and feedback, provide your inputs (.dat files you used to generate gridpacks) and all available error logs. Also, as you now know how to run standalone MadGraph, test your gridpack inputs with standalone MadGraph first. If it works in a standalone run but not in a gridpack, it is likely a genproductions issue. If it does not work in standalone either, it is likely a core MadGraph issue or some mismodeling of the process.
Key Points
MadGraph is one of the most widely used generators for hard scattering computations
Standalone MadGraph can run interactively on the fly or by importing predefined text scripts
Gridpacks are useful for large scale productions with consistency guaranteed
LHE level information is not physical, and a parton shower is needed to describe the full physics
2 - Parton Shower Generator
Overview
Teaching: 10 min
Exercises: 20 min
Questions
Why do we need a parton shower?
How do we produce NanoGEN samples?
Objectives
Perform parton shower with LHE file as an input
Perform parton shower with gridpack as an input
Analyze generator level information using NanoGEN files
Creating particle level samples from LHE files
As discussed earlier, LHE files by themselves are not enough to describe physical distributions.
In order to generate physically sensible events, LHE files need to go through the parton shower.
The parton shower, in principle, is responsible for higher order corrections to the hard process (consider q -> q g or e -> e gamma).
The dominant contributions to such corrections come from collinear or soft emissions.
In CMS, one of the most widely used tools for parton showering is Pythia8 (however, do note that Pythia8 is a multipurpose generator that is also able to calculate the hard process for certain physics processes).
In this exercise, instead of compiling Pythia8 and running it in standalone mode as we did for MadGraph, we will use the Pythia8 that is already compiled in the CMSSW environment.
(1) Running Pythia8 interface in CMSSW
Let’s first check which release version of Pythia8 we will be using.
cd $GENTUTPATH/CMSSW_12_4_14_patch2/src
cmsenv
scram tool info pythia8
You can see that we are using Pythia8 version 8.306, which is already compiled in CMSSW_12_4_14_patch2.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Name : pythia8
Version : 306-494ded5c626b685d055d5b022e918c0c
++++++++++++++++++++
INCLUDE=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/include
LIB=pythia8
LIBDIR=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/lib
PYTHIA8DATA=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/share/Pythia8/xmldoc
PYTHIA8_BASE=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c
ROOT_INCLUDE_PATH=/cvmfs/cms.cern.ch/slc7_amd64_gcc10/external/pythia8/306-494ded5c626b685d055d5b022e918c0c/include
SYSTEM_INCLUDE+=1
USE=root_cxxdefaults cxxcompiler hepmc3 hepmc lhapdf
Now we will start building our parton shower fragment in our own directories in order to produce samples by ourselves.
mkdir -p Configuration/GenProduction/python/
cp $GENTUTPATH/generators-cmsdaslpc2024-git/fragment/*.py Configuration/GenProduction/python/
scram b
mkdir -p $GENSHOWERPATH
cd $GENSHOWERPATH
The cmsDriver.py executable makes the full configuration file from the optional arguments it is given (data tier, campaign, etc.), using the parton shower fragment that we just built.
We will create NanoGEN files, which are flat ntuples that resemble the NanoAOD data tier but store only branches with generator-level information.
This skips the SIM and RECO steps in the middle, which makes it convenient for generator-level studies.
For more information, take a look at link.
cmsDriver.py Configuration/GenProduction/python/drellyan-mll50.py \
--python_filename run_drellyan-mll50.py \
--eventcontent NANOAOD \
--datatier NANOAOD \
--fileout file:drellyan-mll50.root \
--conditions auto:mc \
--step LHE,GEN,NANOGEN \
--no_exec \
--mc \
-n 100
You just created run_drellyan-mll50.py, which can be executed with the cmsRun command.
Open run_drellyan-mll50.py with less and see how it evolved from Configuration/GenProduction/python/drellyan-mll50.py through cmsDriver.py.
Running the command below will produce LHE files, run the parton shower to make GEN samples, and finally convert them to the NanoGEN format in one go.
Note that we will only test with 100 events (-n 100) due to time constraints.
cmsRun run_drellyan-mll50.py
First, LHE files are generated from the gridpack we just produced.
______________________________________
Running Generic Tarball/Gridpack
______________________________________
gridpack tarball path = /uscms/home/sjeon/nobackup/GENTUTORIAL/gridpack-tut/genproductions/bin/MadGraph5_aMCatNLO/drellyan-mll50_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 thread count requested = 1
%MSG-MG5 residual/optional arguments =
%MSG-MG5 number of events requested = 100
%MSG-MG5 random seed used for the run = 234567
%MSG-MG5 number of cpus = 1
%MSG-MG5 SCRAM_ARCH version = slc7_amd64_gcc10
%MSG-MG5 CMSSW version = CMSSW_12_4_8
WARNING: Developer's area is created for non-production architecture slc7_amd64_gcc10. Production architecture for this release is el8_amd64_gcc10
**** Following environment variables are going to be unset.
CMSSW_FULL_RELEASE_BASE
Running MG5_aMC for the 1 time
produced_lhe 0 nevt 100 submitting_event 100 remaining_event 100
run.sh 100 2345670
Now generating 100 events with random seed 2345670 and granularity 1
Reweight with additional PDF sets given for possible systematic sources.
INFO: #***************************************************************************
#
# original cross-section: 1855.0899999999972
# scale variation: +10.6% -11.6%
# emission scale variation: + 0% - 0%
# central scheme variation: +3.05e-09% -17.8%
# PDF variation: +1.32% -1.32%
#
#PDF NNPDF31_nnlo_as_0118_nf_4: 1854.1 +1.32% -1.32%
#PDF NNPDF30_nnlo_nf_4_pdfas: 1816.21 +2.13% -2.13%
#PDF NNPDF40_nnlo_nf_4_pdfas: 1854.4 +0.579% -0.579%
#PDF MSHT20nnlo_nf4: 1827.24 +1.16% -1.61%
#PDF PDF4LHC21_40_pdfas_nf4: 1841.73 +1.59% -1.59%
#PDF ABMP16_4_nnlo: 1833.82 +0.925% -0.925%
# dynamical scheme # 1 : 1749 +11.8% -12.8% # \sum ET
# dynamical scheme # 2 : 1749 +11.8% -12.8% # \sum\sqrt{m^2+pt^2}
# dynamical scheme # 3 : 1524.71 +14.7% -15.9% # 0.5 \sum\sqrt{m^2+pt^2}
# dynamical scheme # 4 : 1855.09 +10.6% -11.6% # \sqrt{\hat s}
# PDF 42930 : 1837.5192582008478
#***************************************************************************
Pythia8 is then launched with the newly created LHE file as input. It first prints the hard-process information, the same content we saw directly in the LHE file.
-------- PYTHIA Event Listing (hard process) -----------------------------------------------------------------------------------
no id name status mothers daughters colours p_x p_y p_z e m
0 90 (system) -11 0 0 0 0 0 0 0.000 0.000 0.000 13600.000 13600.000
1 2212 (p+) -12 0 0 3 0 0 0 0.000 0.000 6800.000 6800.000 0.938
2 2212 (p+) -12 0 0 4 0 0 0 0.000 0.000 -6800.000 6800.000 0.938
3 2 (u) -21 1 0 5 6 501 0 0.000 0.000 66.079 66.079 0.000
4 -2 (ubar) -21 2 0 5 6 0 501 -0.000 -0.000 -36.939 36.939 0.000
5 -11 e+ 23 3 4 0 0 0 0 30.176 -10.240 -24.793 40.375 0.001
6 11 e- 23 3 4 0 0 0 0 -30.176 10.240 53.933 62.644 0.001
Charge sum: 0.000 Momentum sum: 0.000 0.000 29.140 103.019 98.811
Pythia8 then starts the parton shower on top of the given LHE event.
See how much more information gets printed out.
Remember that the parton shower evolves downwards from the hard process, branching again and again until a certain energy threshold is reached (q -> q g -> q g g g -> q q q g g -> ...).
-------- PYTHIA Event Listing (complete event) ---------------------------------------------------------------------------------
no id name status mothers daughters colours p_x p_y p_z e m
0 90 (system) -11 0 0 0 0 0 0 0.000 0.000 0.000 13600.000 13600.000
1 2212 (p+) -12 0 0 265 0 0 0 0.000 0.000 6800.000 6800.000 0.938
2 2212 (p+) -12 0 0 266 0 0 0 0.000 0.000 -6800.000 6800.000 0.938
3 2 (u) -21 7 7 5 6 501 0 0.000 0.000 66.079 66.079 0.000
4 -2 (ubar) -21 8 0 5 6 0 501 -0.000 -0.000 -36.939 36.939 0.000
5 -11 (e+) -23 3 4 9 9 0 0 30.176 -10.240 -24.793 40.375 0.001
6 11 (e-) -23 3 4 10 10 0 0 -30.176 10.240 53.933 62.644 0.001
7 2 (u) -42 12 0 3 3 501 0 0.000 0.000 66.079 66.079 0.000
8 -2 (ubar) -41 13 13 11 4 0 502 -0.000 -0.000 -47.025 47.025 0.000
9 -11 (e+) -44 5 5 14 14 0 0 25.135 -14.682 -35.225 45.696 0.001
10 11 (e-) -44 6 6 15 15 0 0 -30.850 9.646 42.888 53.704 0.001
11 21 (g) -43 8 0 16 16 501 502 5.715 5.036 11.392 13.704 0.000
12 2 (u) -41 154 0 17 7 503 0 0.000 0.000 78.601 78.601 0.000
13 -2 (ubar) -42 155 155 8 8 0 502 -0.000 -0.000 -47.025 47.025 0.000
14 -11 (e+) -44 9 9 156 156 0 0 24.576 -15.108 -32.070 43.136 0.001
15 11 (e-) -44 10 10 157 157 0 0 -36.010 5.715 44.528 57.551 0.001
16 21 (g) -44 11 11 158 158 501 502 4.374 4.014 12.596 13.925 0.000
17 21 (g) -43 12 0 159 159 503 501 7.060 5.379 6.521 11.013 0.000
18 21 (g) -31 65 0 20 21 505 504 0.000 0.000 2.971 2.971 0.000
19 21 (g) -31 66 66 20 21 504 506 0.000 0.000 -25.830 25.830 0.000
20 21 (g) -33 18 19 67 67 507 506 -3.816 2.200 -23.877 24.280 0.000
21 21 (g) -33 18 19 68 68 505 507 3.816 -2.200 1.018 4.521 0.000
After the information for one event is printed, the 100 events get processed and the cross section is finally reported.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Overall cross-section summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process xsec_before [pb] passed nposw nnegw tried nposw nnegw xsec_match [pb] accepted [%] event_eff [%]
0 1.855e+03 +/- 1.773e+01 100 100 0 100 100 0 1.855e+03 +/- 1.773e+01 100.0 +/- 0.0 100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1.855e+03 +/- 1.773e+01 100 100 0 100 100 0 1.855e+03 +/- 1.773e+01 100.0 +/- 0.0 100.0 +/- 0.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 1.855e+03 +- 1.773e+01 pb
After matching: total cross section = 1.855e+03 +- 1.773e+01 pb
Matching efficiency = 1.0 +/- 0.0 [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (100) / (100) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (100) / (100) = 1.000e+00 +- 0.000e+00 [TO BE USED IN MCM]
After filter: final cross section = 1.855e+03 +- 1.773e+01 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 5.391e-01 +- 5.179e-03
=============================================
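The last line of the summary, the "equivalent lumi for 1M events", follows from the relation L = N / sigma. The short sketch below reproduces it from the reported cross section; the function and constant names are illustrative only and not part of any CMS tool.

```python
# Equivalent integrated luminosity of a simulated sample: L = N / sigma.
INV_PB_PER_INV_FB = 1000.0  # 1 fb^-1 = 1000 pb^-1

def equivalent_lumi_fb(n_events, xsec_pb):
    """Return the integrated luminosity in fb^-1 that n_events correspond to."""
    lumi_inv_pb = n_events / xsec_pb          # in pb^-1
    return lumi_inv_pb / INV_PB_PER_INV_FB    # in fb^-1

# 1M events at the final cross section of 1.855e+03 pb quoted in the summary
lumi = equivalent_lumi_fb(1_000_000, 1.855e3)
print(f"{lumi:.4f} fb^-1")
```

The result can be compared with the 5.391e-01 printed by CMSSW; small differences in the last digit come from rounding of the quoted cross section.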
How did the cross section change after parton shower?
MadGraph reported # original cross-section: 1855.0899999999972, i.e. 1855 pb. After running the parton shower with Pythia8, the same cross section of 1855 pb is kept. The parton shower adds more and more vertices, so why does the cross section remain unchanged?
Solution
The parton shower is unitary: the probabilities to branch (e.g. q -> q g) and not to branch sum to 1. Hence, the cross section is determined by the lowest-order input (the hard process).
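The unitarity argument can be illustrated with a toy model. The sketch below is not how Pythia8 works internally; it only assumes that at every step an event either branches or stops, with a made-up branching probability p_branch, and shows that the summed event weight stays equal to the input cross section.

```python
import random

def toy_shower(n_events, xsec_pb, p_branch=0.3, max_emissions=5, seed=1):
    """Toy unitary shower: at each step an event either branches or stops.

    Returns {number of emissions: summed event weight in pb}.
    p_branch and max_emissions are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    weight = xsec_pb / n_events  # each unweighted event carries sigma / N
    xsec_by_multiplicity = {}
    for _ in range(n_events):
        n_emissions = 0
        # P(branch) + P(no branch) = 1, so the event is kept in either case
        while n_emissions < max_emissions and rng.random() < p_branch:
            n_emissions += 1
        previous = xsec_by_multiplicity.get(n_emissions, 0.0)
        xsec_by_multiplicity[n_emissions] = previous + weight
    return xsec_by_multiplicity

result = toy_shower(n_events=10000, xsec_pb=1855.0)
total = sum(result.values())
print(sorted(result.items()))
print(f"total = {total:.1f} pb")
```

The events are redistributed over different emission multiplicities, but none of them is added or removed, so the total stays at the hard-process value.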
Previously, we saw the histogram of the dilepton system’s transverse momentum using LHE information.
We claimed that the distribution, populated only at 0 GeV, was not physical.
Now let’s use the NanoGEN sample to see how the distribution changed after the parton shower.
Due to time constraints, the tutors prepared a sample with 40000 events in /tmp/GENTUTORIAL/drellyan-mll50.root for plotting purposes.
cd $GENPLOTPATH
python3 nanogen-plotter.py
How did the distribution change?
Where did the dilepton system acquire transverse momentum from?
Solution
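The incoming partons from the protons also go through the parton shower, which is called “initial state radiation (ISR)”. The dilepton system recoils against these emissions and thereby acquires transverse momentum.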
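This can be checked by hand with the numbers from the complete event listing printed earlier. The sketch below takes the transverse components of the leptons at an intermediate stage of the shower (entries 14 and 15) and of the two gluons emitted from the initial state (entries 16 and 17); the variable names are illustrative.

```python
import math

# (px, py) in GeV, copied from the PYTHIA complete event listing
e_plus = (24.576, -15.108)    # entry 14
e_minus = (-36.010, 5.715)    # entry 15
gluons = [(4.374, 4.014),     # entry 16
          (7.060, 5.379)]     # entry 17

dilepton_px = e_plus[0] + e_minus[0]
dilepton_py = e_plus[1] + e_minus[1]
dilepton_pt = math.hypot(dilepton_px, dilepton_py)

recoil_px = sum(g[0] for g in gluons)
recoil_py = sum(g[1] for g in gluons)

print(f"dilepton pT = {dilepton_pt:.2f} GeV")
print(f"px balance = {dilepton_px + recoil_px:.3f} GeV")
print(f"py balance = {dilepton_py + recoil_py:.3f} GeV")
```

At the LHE level the two leptons were exactly back to back in the transverse plane. At this stage the dilepton system already carries a nonzero transverse momentum, balanced by the two radiated gluons.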
What is GenDressedLepton?
What happens to leptons during the parton shower? Are leptons kept stable during the parton shower because they do not participate in strong interactions?
Solution
The parton shower, despite the name “parton”, also includes QED showering such as e -> e gamma. A dressed lepton (the GenDressedLepton collection in NanoGEN) is an object formed from the charged lepton and the photons that are close to it.
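A minimal sketch of the dressing idea is given below. It assumes a cone of Delta R < 0.1 around the lepton and massless particles; both are assumptions made for illustration, and the exact definition used for GenDressedLepton should be checked in the NanoAOD documentation. The input numbers are hypothetical.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with dphi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def to_cartesian(pt, eta, phi):
    """Massless approximation: return (px, py, pz, E)."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(eta), pt * math.cosh(eta))

def dressed_pt(lepton, photons, cone=0.1):
    """Add all photons within `cone` to the lepton; inputs are (pt, eta, phi)."""
    total = list(to_cartesian(*lepton))
    for photon in photons:
        if delta_r(lepton[1], lepton[2], photon[1], photon[2]) < cone:
            for i, component in enumerate(to_cartesian(*photon)):
                total[i] += component
    return math.hypot(total[0], total[1])

# Hypothetical example: a 40 GeV electron, one collinear and one far-away photon
bare = (40.0, 0.5, 1.0)
photons = [(5.0, 0.5, 1.0), (10.0, -1.0, 2.5)]
print(f"bare pT = {bare[0]:.1f} GeV, dressed pT = {dressed_pt(bare, photons):.1f} GeV")
```

Only the collinear photon is picked up, so the dressed lepton recovers the momentum the electron lost to QED radiation, while the far-away photon is left alone.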
(2) Jet merging samples
The hard process calculation is better at modeling hard jets and heavy particle decays, while the parton shower is good at describing collinear and soft emissions. For more realistic and reliable modeling of hard jets, for example in DY events, MadGraph can be used as below.
generate p p > e+ e- @0
add process p p > e+ e- j @1
With this syntax, MadGraph produces the DY process with 0 and 1 hard jet in the event.
When this sample goes through the parton shower, some portion of the events (denoted with @1) already contains a hard jet, so the sample is better at describing the DY process with a hard jet.
However, consider an @0 event in which initial state radiation produces QCD particles that form a sufficiently hard jet.
This region of phase space is double counted, because a “DY with hard jet” event could come from both @0 and @1.
To remove this double counting, the jet merging technique is used.
Jet merging is set up with an artificial cut threshold called the jet merging scale.
This scale decides whether an event from @0 or @1 is accepted or not.
Finally, only the accepted events from the two processes are merged to form one sample.
Very roughly, the jet merging scale can be thought of as a threshold on the momentum of a jet.
If a jet in the event is harder than the threshold, the event is rejected if it comes from @0 and accepted only if it comes from @1.
On the other hand, if the jets in the event are softer than the threshold, the event is accepted only if it comes from @0 and rejected if it comes from @1.
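The accept/reject rule just described can be written down as a small function. This is only a cartoon of the idea for a 0+1 jet sample, not the actual MLM algorithm implemented in Pythia8 (which matches clustered jets to matrix element partons); the function name, arguments and the scale value are illustrative.

```python
def keep_event(n_me_jets, hardest_shower_jet_pt, merging_scale):
    """Cartoon of the jet merging decision for a 0+1 jet sample.

    n_me_jets: jets in the matrix element (0 for @0, 1 for @1)
    hardest_shower_jet_pt: pT in GeV of the hardest jet after the shower
                           (0.0 if there is no jet)
    merging_scale: threshold in GeV
    """
    has_hard_jet = hardest_shower_jet_pt > merging_scale
    if n_me_jets == 0:
        return not has_hard_jet   # @0 populates the region below the scale
    return has_hard_jet           # @1 populates the region above the scale

scale = 19.0  # GeV, illustrative value
for tag, jet_pt in [(0, 8.0), (0, 35.0), (1, 8.0), (1, 35.0)]:
    decision = keep_event(tag, jet_pt, scale)
    print(f"@{tag}, hardest jet {jet_pt:5.1f} GeV -> keep: {decision}")
```

Each region of jet momentum is filled by exactly one of the two processes, which is what removes the double counting.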
Now let’s take a look at the gridpack we produced before we started the parton shower exercises.
ls $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO/drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
cd $GENSHOWERPATH
mkdir jet_merging
cd jet_merging
cp $GENGRIDPACKPATH/bin/MadGraph5_aMCatNLO/drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz ./
tar -xvf drellyan-mll50-01j_slc7_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz
Take a look at the cards in the InputCards directory.
Most notably, run_card.dat has a different setting compared to the other gridpacks we’ve produced.
#*********************************************************************
# Matching - Warning! ickkw > 1 is still beta
#*********************************************************************
1 = ickkw ! 0 no matching, 1 MLM, 2 CKKW matching
This flag tells MadGraph that the LHE files we are going to produce will later go through jet merging in order to avoid double counting.
#*********************************************************************
# Jet measure cuts *
#*********************************************************************
10.0 = xqcut ! minimum kt jet measure between partons
When jet merging is turned on, xqcut needs to be set; it preselects events so that jet merging is efficient.
Remember that some portion of the events will later be discarded and never used.
So there is no point in producing events with jets at too low an energy scale at the LHE level, since these will likely be removed.
Try producing 100 events using this gridpack as we did before, with the command ./runcmsgrid.sh 100 1 1.
What is the cross section?
MadGraph reported # original cross-section: 2928.1100000000024, i.e. 2928 pb, which is significantly larger than the previous values that were below 2000 pb. How can this be explained?
Solution
The cross section reported by MadGraph is the one before the parton shower. During the parton shower, jet merging is performed and some portion of the events is discarded. This is reflected in the overall normalization, so the final cross section will be smaller than what we see now.
Before running the parton shower, let’s look at the Pythia8 fragment that should be used for the parton shower with jet merging.
Compare $GENTUTPATH/CMSSW_12_4_14_patch2/src/Configuration/GenProduction/python/drellyan-mll50-01j.py with the one we used earlier, $GENTUTPATH/CMSSW_12_4_14_patch2/src/Configuration/GenProduction/python/drellyan-mll50.py.
You will notice that a large block of new lines has been added to drellyan-mll50-01j.py.
processParameters = cms.vstring(
'JetMatching:setMad = off',
'JetMatching:scheme = 1',
'JetMatching:merge = on',
'JetMatching:jetAlgorithm = 2',
'JetMatching:etaJetMax = 5.',
'JetMatching:coneRadius = 1.',
'JetMatching:slowJetPower = 1',
'JetMatching:doShowerKt = off',
'JetMatching:qCut = 19.',
'JetMatching:nQmatch = 4',
'JetMatching:nJetMax = 1',
'TimeShower:mMaxGamma = 4.0'
),
Most of these lines can be treated as a template for jet merging samples using MadGraph at LO and Pythia8 (for further information, see https://pythia.org/latest-manual/JetMatching.html and http://hep.ucsb.edu/people/cag/Matching.pdf).
The line JetMatching:qCut = 19. defines the threshold that decides whether an event is accepted or not.
Again, although not exact, one can think of this as a threshold on the momentum scale of a jet in the event.
If the momentum of a jet in the event is above 19 GeV, the event is accepted only if it is a p p > e+ e- j type of event.
If the jet momentum is below 19 GeV, the event is accepted only if it is a p p > e+ e- type of event.
Now let’s give -n 1000 as an option to cmsDriver.py.
This will first create an LHE file with 1000 events, which is then given as input to Pythia8.
cmsDriver.py Configuration/GenProduction/python/drellyan-mll50-01j.py \
--python_filename run_drellyan-mll50-01j.py \
--eventcontent NANOAOD \
--datatier NANOAOD \
--fileout file:drellyan-mll50-01j.root \
--conditions auto:mc \
--step LHE,GEN,NANOGEN \
--no_exec \
--mc \
-n 1000
cmsRun run_drellyan-mll50-01j.py
Cross sections before and after jet merging will be reported as below.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Overall cross-section summary
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Process xsec_before [pb] passed nposw nnegw tried nposw nnegw xsec_match [pb] accepted [%] event_eff [%]
0 1.833e+03 +/- 1.197e+01 467 467 0 623 623 0 1.374e+03 +/- 3.305e+01 75.0 +/- 1.7 75.0 +/- 1.7
1 1.099e+03 +/- 1.713e+01 159 159 0 377 377 0 4.636e+02 +/- 2.887e+01 42.2 +/- 2.5 42.2 +/- 2.5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 2.932e+03 +/- 2.090e+01 626 626 0 1000 1000 0 1.835e+03 +/- 4.673e+01 62.6 +/- 1.5 62.6 +/- 1.5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Before matching: total cross section = 2.932e+03 +- 2.090e+01 pb
After matching: total cross section = 1.835e+03 +- 4.673e+01 pb
Matching efficiency = 0.6 +/- 0.0 [TO BE USED IN MCM]
Filter efficiency (taking into account weights)= (626) / (626) = 1.000e+00 +- 0.000e+00
Filter efficiency (event-level)= (626) / (626) = 1.000e+00 +- 0.000e+00 [TO BE USED IN MCM]
After filter: final cross section = 1.835e+03 +- 4.673e+01 pb
After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00
After filter: final equivalent lumi for 1M events (1/fb) = 5.449e-01 +- 1.388e-02
=============================================
In the first two rows, the Process labels 0 and 1 stand for the p p > e+ e- and p p > e+ e- j processes, respectively.
For 0, 623 events were tried and 467 passed, which means the jet merging procedure accepted 467 out of 623 events from 0.
For 1, the jet merging procedure accepted 159 out of 377 events.
Note that the sum of tried is 1000, which is the number given with -n 1000 at the LHE level.
After jet merging has discarded events, the final cross section is reported as After filter: final cross section = 1.835e+03 +- 4.673e+01 pb, i.e. 1835 pb, which has dropped from the LHE-level value Before matching: total cross section = 2.932e+03 +- 2.090e+01 pb, i.e. 2932 pb.
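The drop in cross section can be reproduced from the numbers in the summary table: the matched cross section is the cross section before matching multiplied by the fraction of events that survive. The variable names below are illustrative.

```python
# Numbers copied from the "Overall cross-section summary" of the jet merging run
xsec_before_pb = 2.932e3   # total cross section before matching
tried = 1000               # LHE events handed to Pythia8
passed = 467 + 159         # events accepted from process 0 and process 1

matching_efficiency = passed / tried
xsec_after_pb = xsec_before_pb * matching_efficiency

print(f"matching efficiency   = {matching_efficiency:.3f}")
print(f"matched cross section = {xsec_after_pb:.0f} pb")
```

This agrees with the 62.6% acceptance and the 1.835e+03 pb printed in the Total row.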
Key Points
Pythia8 is the main tool used for parton showering in CMS
Events are not physical if they have not gone through the parton shower
Jet merging is a technique to avoid double counting of jet phase space between ME and PS calculations
3 - Optional (MadSpin and BSM UFO model)
Overview
Teaching: 10 min
Exercises: 20 min
Questions
What is MadSpin used for?
How do we customize the BSM parameters in the UFO model?
Objectives
Understand the role of MadSpin.
Customize BSM parameters for gridpacks.
Decay of resonant particles in MadGraph
Until now, we ran the physics process defined as
generate p p > e+ e-
What we can also do is below.
generate p p > z, z > e+ e-
The difference between p p > e+ e- and p p > z, z > e+ e- is that the former is the full DY physics process, whereas the latter forces the two quarks to first produce a Z boson and then lets it decay into electrons.
The comma separating the two parts is the key: it tells MadGraph where the matrix element calculation should be split.
p p > e+ e- is calculated as 2 -> 2, while p p > z, z > e+ e- is calculated as 2 -> 1 followed by 1 -> 2.
cd $GENMGPATH
./bin/mg5 standalone/onshellz.config
Compare events in the new LHE file standalone-onshellz/Events/run_01/unweighted_events.lhe.gz with the old standalone-drellyan-mll4//Events/run_01/unweighted_events.lhe.gz.
In every event in the new LHE file, a particle with ID 23 shows up, since an on-shell Z boson is always produced in the matrix element calculation.
In contrast, the old LHE file, which allows an off-shell Z (along with gamma), does not necessarily contain such a particle.
One example event is shown below: u = 2 and ubar = -2 quarks produce z = 23, which decays into positron = -11 and electron = 11.
<event>
5 1 +1.4201000e+03 9.06792500e+01 7.54677100e-03 1.30121500e-01
2 -1 0 0 501 0 +0.0000000000e+00 +0.0000000000e+00 +1.4078650070e+02 1.4078650070e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
-2 -1 0 0 0 501 -0.0000000000e+00 -0.0000000000e+00 -1.4601410033e+01 1.4601410033e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
23 2 1 2 0 0 +0.0000000000e+00 +0.0000000000e+00 +1.2618509067e+02 1.5538791074e+02 9.0679246224e+01 0.0000e+00 0.0000e+00
-11 1 3 3 0 0 -2.5416006610e+01 +3.5010752474e+01 +8.6334112633e+01 9.6567619754e+01 0.0000000000e+00 0.0000e+00 1.0000e+00
11 1 3 3 0 0 +2.5416006610e+01 -3.5010752474e+01 +3.9850978037e+01 5.8820290983e+01 0.0000000000e+00 0.0000e+00 -1.0000e+00
</event>
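As a quick cross-check of this event, the invariant mass of the electron pair can be computed from the two lepton four-momenta and compared with the mass column of the Z boson line (9.0679246224e+01). The variable names are illustrative.

```python
import math

# (px, py, pz, E) in GeV, copied from the LHE event above
positron = (-2.5416006610e+01, +3.5010752474e+01, +8.6334112633e+01, 9.6567619754e+01)
electron = (+2.5416006610e+01, -3.5010752474e+01, +3.9850978037e+01, 5.8820290983e+01)

# Sum the four-momenta component by component
px, py, pz, energy = (a + b for a, b in zip(positron, electron))

# m^2 = E^2 - |p|^2
mass = math.sqrt(energy**2 - px**2 - py**2 - pz**2)
print(f"m(ee) = {mass:.3f} GeV")
```

The summed pz and E also match the values written on the Z boson line, as expected for a 1 -> 2 decay.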
Making histograms makes the situation clearer.
cd $GENPLOTPATH
python3 lhe-root-plotter-onshellz.py
From the histograms, you can clearly see that there is no event generated outside the Z boson mass window.
Using MadSpin
Now we will learn an advanced use case of MadGraph: the MadSpin plugin.
MadSpin, as we saw earlier, is one of the modules that run through the MadGraph interface; it handles the decay of resonant particles.
We just took a look at the physics process p p > z, z > e+ e-.
Here, z is the resonant particle, so when we choose to use MadSpin, the above process can be split into two, where we first do
generate p p > z
and then decay z into the electron pair using MadSpin.
But the question remains: why is MadSpin useful at all? The answer lies in NLO calculations in QCD and in loop-induced processes. Let’s launch the MadGraph prompt again.
cd ${GENTUTPATH}/standalone-tut/MG5_aMC_v3_5_2/
./bin/mg5_aMC
Now try another simple example, top pair production.
import model sm
generate p p > t t~ [QCD]
You will easily notice that [QCD] has been added to the process definition.
This flag tells MadGraph that you wish to do the calculation at NLO in QCD.
Before going further, try attaching top decays into a W boson and a b quark, similar to what we did for the Z -> ee example.
generate p p > t t~, t > w+ b [QCD]
generate p p > t t~ [QCD], t > w+ b
exit
You will find that neither of these works; instead MadGraph complains with an error message saying str : Decay processes cannot be perturbed.
This means that physics processes that include particle decays cannot be calculated at NLO in this way.
This is where MadSpin becomes necessary: resonant particles that cannot be decayed in the process definition can be decayed using MadSpin.
Now let’s get back to the working example to see how it works.
import model sm
generate p p > t t~ [QCD]
output TopPair
launch
shower = PYTHIA8
4
0
Two lines are notably new: shower = PYTHIA8 and 4 (which can be replaced with madspin = ON).
Again, we are not going to run the parton shower here.
The shower still has to be specified because the “counter term” calculation, which gives rise to negatively weighted events, depends on which parton shower generator is used later.
Negatively weighted events
We won’t cover them in detail in this tutorial, but the important things to remember are:
- Some portion of the events are negatively weighted so one needs to be careful with the normalization.
- LHE files at NLO are even more unphysical than LHE files at LO before parton shower.
Press tab to turn off the timer.
MadGraph again asks if you would like to edit the cards, which now include madspin_card.dat.
/------------------------------------------------------------\
| 1. param : param_card.dat |
| 2. run : run_card.dat |
| 3. madspin : madspin_card.dat |
\------------------------------------------------------------/
If you take a look at run_card.dat, you might notice that its template is quite different from the one we used for DY at LO.
The NLO template is available at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/NLO_run_card.dat and the LO template at https://github.com/cms-PdmV/GridpackFiles/blob/master/Campaigns/Run3Summer22/MadGraph5_aMCatNLO/Templates/LO_run_card.dat.
Although MadGraph shares the same user interface, LO and NLO calculations run on totally different code in the backend.
So an NLO-type run_card.dat does not work for LO calculations and vice versa.
Now take a look at madspin_card.dat by pressing 3.
# specify the decay for the final state particles
decay t > w+ b, w+ > all all
decay t~ > w- b~, w- > all all
decay w+ > all all
decay w- > all all
decay z > all all
This card lets you define how you want your resonant particles to decay. For example, if you do :
decay t > w+ b, w+ > e+ ve
decay t~ > w- b~, w- > mu- vm~
This forces the top to decay into a final state with a positron and the antitop into a final state with a muon.
Remove the unnecessary decay definitions and add these two lines to make a top pair sample that yields a positron and a muon.
Before moving on, run set run_card nevents 50 to save time by producing only 50 events.
You will see the inclusive top pair production cross section being computed, which includes all possible decays of the top quark.
--------------------------------------------------------------
Summary:
Process p p > t t~ [QCD]
Run at p-p collider (6500.0 + 6500.0 GeV)
Number of events generated: 50
Total cross section: 6.847e+02 +- 4.3e+00 pb
--------------------------------------------------------------
Then you will see MadSpin doing its job, decaying the top quarks into the desired channels.
************************************************************
* *
* W E L C O M E to M A D S P I N *
* *
************************************************************
...
INFO: decay channels for t : ( width = 1.4915 GeV )
INFO: BR d1 d2
INFO: 1.000000e+00 b w+
INFO:
INFO:
INFO: decay channels for w+ : ( width = 2.04793 GeV )
INFO: BR d1 d2
INFO: 3.333610e-01 d~ u
INFO: 3.333610e-01 s~ c
INFO: 1.111195e-01 e+ ve
INFO: 1.111195e-01 mu+ vm
INFO: 1.110390e-01 ta+ vt
INFO:
INFO:
INFO: decay channels for t~ : ( width = 1.4915 GeV )
INFO: BR d1 d2
INFO: 1.000000e+00 b~ w-
INFO:
INFO:
INFO: decay channels for w- : ( width = 2.04793 GeV )
INFO: BR d1 d2
INFO: 3.333610e-01 d u~
INFO: 3.333610e-01 s c~
INFO: 1.111195e-01 e- ve~
INFO: 1.111195e-01 mu- vm~
INFO: 1.110390e-01 ta- vt~
...
INFO: Estimating the maximum weight
INFO: *****************************
INFO: Probing the first 75 events
INFO: with 400 phase space points
INFO:
INFO: Event 1/75 : 0.068s
INFO: Event 6/75 : 0.63s
INFO: Event 11/75 : 1.2s
INFO: Event 16/75 : 1.8s
INFO: Event 21/75 : 2.1s
INFO: Event 26/75 : 3s
INFO: Event 31/75 : 3.8s
INFO: Event 36/75 : 4.6s
INFO: Event 41/75 : 5.7s
INFO: Event 46/75 : 6.5s
What is the cross section?
The inclusive cross section was reported to be 684.7 pb as we saw above. When considering the decay channels (e+ and mu- final states), what is the proper cross section? What are the branching ratios for w+ > e+ ve and w- > mu- vm~?
Solution
8.5 pb (from 684.7 x 11% x 11%)
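The same estimate can be made with the exact branching ratios printed in the MadSpin log instead of the rounded 11%. The variable names are illustrative.

```python
inclusive_xsec_pb = 684.7       # p p > t t~ [QCD], from the run summary
br_t_to_w_b = 1.000000e+00      # t > w+ b and t~ > w- b~, from the MadSpin log
br_w_to_e_nu = 1.111195e-01     # w+ > e+ ve, from the MadSpin log
br_w_to_mu_nu = 1.111195e-01    # w- > mu- vm~, from the MadSpin log

# One factor of BR for each decay in the chain
xsec_pb = inclusive_xsec_pb * br_t_to_w_b**2 * br_w_to_e_nu * br_w_to_mu_nu
print(f"sigma x BR = {xsec_pb:.2f} pb")
```

The result rounds to the 8.5 pb quoted in the solution.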
How can we make a sample that yields mu+, vm, and this time, two quark jets (a hadronically decaying w-)?
Solution
decay t > w+ b, w+ > mu+ vm
decay t~ > w- b~, w- > j j
Interfacing BSM UFO model files
Let’s take a look at how BSM samples for search-type analyses get produced. We will pick one simple example, a hypothetical heavy gauge boson called the W’ particle.
import model WEff_UFO
display particles
generate p p > wp+, wp+ > e+ ve
add process p p > wp-, wp- > e- ve~
output WprimeToENu
How can we make the syntax simpler using particle containers?
How can we write generate p p > wp+, wp+ > e+ ve and add process p p > wp-, wp- > e- ve~ in a simpler way?
Solution
define wprime = wp+ wp-
define leptons = e+ e- ve ve~
generate p p > wprime, wprime > leptons leptons
This will find all possible Feynman diagrams with the given particle combinations.
As the SM has no right-handed interactions for W bosons, many BSM scenarios predict a W’ boson that is heavier (which would explain why we have not found it yet) and that couples to right-handed fermions. Since we do not know the particle’s mass, we test many different scenarios (BSM parameters), for example different masses, decay channels, and coupling strengths. We will now see how such BSM parameters can be set in MadGraph.
launch
0
Then press tab to turn off the timer.
Take a look at the parameter card by hitting 1.
You will see a clear difference in the parameter settings compared to the sm model file we’ve been using.
Here, we will focus only on the W’ mass MWp and the right-handed coupling strength kR.
In addition, keep in mind that the W’ width wwp should change according to how you choose your BSM parameters.
###################################
## INFORMATION FOR MASS
###################################
Block mass
1 5.040000e-03 # MD
2 2.550000e-03 # MU
3 1.010000e-01 # MS
4 1.270000e+00 # MC
5 4.700000e+00 # MB
6 1.720000e+02 # MT
11 5.110000e-04 # Me
13 1.056600e-01 # MMU
15 1.777000e+00 # MTA
23 9.118760e+01 # MZ
25 1.250000e+02 # MH
34 1.000000e+03 # MWp
...
###################################
## INFORMATION FOR WPCOUP
###################################
Block wpcoup
1 0.000000e+00 # kL
2 1.000000e+00 # kR
...
###################################
## INFORMATION FOR DECAY
###################################
DECAY 6 1.508336e+00 # WT
DECAY 23 2.495200e+00 # WZ
DECAY 24 2.085000e+00 # WW
DECAY 25 4.070000e-03 # WH
DECAY 34 1.000000e+01 # WWp
You can see that the W’ mass is set to 1000 GeV, the right-handed coupling strength to 1.0, and the W’ width to 10 GeV. You can change the BSM parameters, for example the mass to 2000 GeV and the coupling strength to 0.1, as below.
set param_card mwp 2000
set param_card kr 0.1
However, if you take a look at the parameter card again, the W’ width wwp is unchanged.
You can interactively compute the width by running compute_widths wp+.
Check the parameter card again: the width has changed, and the card now also lists the branching ratios to the different channels.
# PDG Width
DECAY 34 6.672601e-01
# BR NDA ID1 ID2 ...
2.506959e-01 2 2 -1 # 0.1672793598579319
2.479126e-01 2 6 -5 # 0.16542221070326676
2.379169e-01 2 4 -3 # 0.15875247762227632
8.356529e-02 2 12 -11 # 0.05575978661997229
8.356529e-02 2 14 -13 # 0.05575978638653866
8.356519e-02 2 16 -15 # 0.05575972059211703
1.277883e-02 2 2 -3 # 0.008526805639865994
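The numbers in this block can be cross-checked against each other. The sketch below assumes that the value in the trailing comment of each line is the partial width of that channel in GeV (an interpretation that the arithmetic supports, but an assumption nonetheless), and checks that the partial widths add up to the total width and that each branching ratio equals partial width over total width.

```python
total_width = 6.672601e-01  # GeV, from the "DECAY 34" line

# (branching ratio, value in the trailing comment) for each listed channel
channels = [
    (2.506959e-01, 0.1672793598579319),
    (2.479126e-01, 0.16542221070326676),
    (2.379169e-01, 0.15875247762227632),
    (8.356529e-02, 0.05575978661997229),
    (8.356529e-02, 0.05575978638653866),
    (8.356519e-02, 0.05575972059211703),
    (1.277883e-02, 0.008526805639865994),
]

partial_sum = sum(partial for _, partial in channels)
br_sum = sum(br for br, _ in channels)
print(f"sum of partial widths = {partial_sum:.7f} GeV")
print(f"sum of branching ratios = {br_sum:.6f}")

# Each branching ratio should equal partial width / total width
for br, partial in channels:
    print(f"BR = {br:.6e}, partial / total = {partial / total_width:.6e}")
```

If the W’ mass or kR is changed, all of these numbers change, which is why the width has to be recomputed before generating events.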
Instead of computing the width interactively, you can do set param_card wwp auto.
Then MadGraph will calculate the width on the fly while generating events (the results will be identical).
Proceed by hitting 0 and see what cross section you get if the W’ boson hypothetically exists and decays to the electron channel, assuming a mass of 2000 GeV and a right-handed coupling of 0.1.
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 0.001016 +- 1.447e-06 pb
Nb of events : 10000
How can we check the cross section when the mass is 2000 GeV and the right-handed coupling is 1.0?
Solution
Repeat the exercise above but this time
set param_card mwp 2000
set param_card kr 1.0
And most importantly, do not forget to compute the width by adding:
set param_card wwp auto
Then you will get the following result.
=== Results Summary for run: run_01 tag: tag_1 ===
Cross-section : 0.1045 +- 0.0001681 pb
Nb of events : 10000
How much did the cross section increase compared to the scenario with a mass of 2000 GeV and a right-handed coupling of 0.1?
How many interactions did the W’ boson get involved in?
Solution
The W’ boson is involved in two interactions: one vertex when it is produced and another vertex when it decays to the electron channel. The coupling ratio (1.0/0.1) = 10 therefore enters squared, which results in a roughly 100 times larger cross section.
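The expected factor can be compared with the two cross sections reported by MadGraph for kR = 0.1 and kR = 1.0. The variable names are illustrative.

```python
xsec_kr_01_pb = 0.001016   # mwp = 2000, kr = 0.1
xsec_kr_10_pb = 0.1045     # mwp = 2000, kr = 1.0

observed_ratio = xsec_kr_10_pb / xsec_kr_01_pb
expected_ratio = (1.0 / 0.1) ** 2   # coupling ratio, squared

print(f"observed ratio = {observed_ratio:.1f}")
print(f"expected ratio = {expected_ratio:.1f}")
```

The observed ratio is close to, but not exactly, 100; the simple scaling argument is only approximate.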
Key Points
4 - CMS resources for samples and generators
Overview
Teaching: 5 min
Exercises: 10 min
Questions
How do I find centrally produced samples and their status?
How do I obtain a cross section to normalize my sample?
Objectives
Leverage available tools for efficient analysis work
CMS resources for simulated samples
How to find samples and related information
The configuration of a given sample can be obtained from McM. For example, if you want the inclusive W+jets sample, start from a DAS query (requires a valid grid certificate / proxy):
dasgoclient -query="/WJetsToLNu*/RunIISummer20UL18*/MINIAODSIM"
Alternatively there’s also a web-based DAS client: https://cmsweb.cern.ch/das/.
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v4/MINIAODSIM
/WJetsToLNu_012JetsNLO_34JetsLO_EWNLOcorr_13TeV-sherpa/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM
/WJetsToLNu_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_1J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_2J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-100To200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-1200To2500_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-200To400_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-2500ToInf_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-400To600_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-600To800_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-70To100_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1_ext1-v1/MINIAODSIM
/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-250To400_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-400To600_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Pt-600ToInf_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRK_TRK_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-PUForTRKv2_TRKv2_106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-100to200_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM
/WJetsToLNu_Wpt-200toInf_BPSFilter_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM
We want the inclusive LO sample with the latest MiniAOD version (MiniAODv2), hence we pick /WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM.
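The selection made here can also be done programmatically once the DAS output is long. The following is a minimal sketch, assuming dataset names of the form /primary/processed/tier as in the listing; the helper names `split_dataset` and `select_datasets` are hypothetical and not part of any CMS tool.

```python
def split_dataset(name):
    """Split '/primary/processed/tier' into its three components."""
    primary, processed, tier = name.strip("/").split("/")
    return primary, processed, tier


def select_datasets(names, primary, campaign_suffix):
    """Keep datasets with the given primary name whose campaign part
    (the text before the first '-') ends with campaign_suffix."""
    selected = []
    for name in names:
        this_primary, processed, _tier = split_dataset(name)
        campaign = processed.split("-")[0]
        if this_primary == primary and campaign.endswith(campaign_suffix):
            selected.append(name)
    return selected


# A few entries copied from the DAS listing
das_output = [
    "/WJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v2/MINIAODSIM",
    "/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v1/MINIAODSIM",
    "/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM",
]

picked = select_datasets(
    das_output,
    primary="WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8",
    campaign_suffix="MiniAODv2",
)
print(picked)
```

Matching on the exact primary dataset name keeps only the inclusive madgraphMLM sample, and the suffix check separates MiniAODv2 from the older MiniAOD version.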
Plug this name into “Output Dataset” in McM, then click on the dataset name (WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8).
In “Select View” check “Fragment” and click on the expand icon under “Fragment” (rightmost column) for the request with a Summer20UL18wmLHEGS PrepId.
You can also filter the results directly by appending
?dataset_name=WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8&prepid=*Summer20UL18wmLHE*
to the requests address, https://cms-pdmv.cern.ch/mcm/requests.
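If you look up many samples, the filtered McM address can be assembled with a few lines of Python. This is a minimal sketch using only the standard library; the function name `mcm_requests_link` is hypothetical, and the two query parameters are the ones quoted in the text.

```python
from urllib.parse import urlencode

MCM_REQUESTS_URL = "https://cms-pdmv.cern.ch/mcm/requests"


def mcm_requests_link(dataset_name, prepid_pattern):
    """Build the McM requests address filtered by dataset name and PrepId."""
    # Keep '*' unescaped so that the wildcard stays readable in the address
    query = urlencode(
        {"dataset_name": dataset_name, "prepid": prepid_pattern},
        safe="*",
    )
    return f"{MCM_REQUESTS_URL}?{query}"


link = mcm_requests_link(
    "WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8",
    "*Summer20UL18wmLHE*",
)
print(link)
```

Open the printed address in a browser where you are logged in with your CERN account.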
Status of samples
GrASP is a tool to conveniently track the status of your samples. Just select the campaigns you’re interested in (e.g. Run2 UL or Run3) and type the sample name. You can also tag the samples of your analysis so that they are easier to find and keep track of.
Cross sections
CMSSW analyzer
In the following, we will use a CMSSW analyzer called GenXSecAnalyzer to compute the cross section of samples. The analyzer takes a list of EDM files as input (i.e., NanoAOD and NanoGEN files cannot be used). Make sure you are in a CMSSW environment:
cd ${CDGPATH}/CMSSW_10_6_19/src/
cmsenv
You can then use the prepared configurations to obtain the cross section for a sample of your liking, e.g. /TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM:
dasgoclient -query="file dataset=/TTWJetsToLNu_TuneCP5_13TeV-amcatnloFXFX-madspin-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1-v1/MINIAODSIM | grep file.name" > myfiles.txt
cmsRun $CDGPATH/gen-cmsdas-2023/configs/xsec_ana.py inputFiles="myfiles.txt" maxEvents=100000
In this example we restrict the maximum number of events to 100k.
This gives us a large enough sample for a reliable result without running too long (the sample has 10.5M events; you can use DAS to verify this number with dasgoclient -query="summary dataset=DATASET").
The inputFiles option accepts several kinds of values:
- A single file in your local area:
inputFiles="file:mylocalfile.root"
- A single published file:
inputFiles="/store/mc/..."
- Multiple files:
inputFiles="/store/mc/file1,/store/mc/file2"
- A text file containing one filepath per line:
inputFiles="myfilelist.txt"
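To make the four cases concrete, here is a minimal sketch of how such an option string could be turned into a list of files. It is not the implementation used in xsec_ana.py, which is not shown here; the helper name `expand_input_files` and the rule "a value ending in .txt is a file list" are assumptions made for illustration.

```python
import os
import tempfile


def expand_input_files(value):
    """Return a list of file paths from an inputFiles-style string.

    Hypothetical helper: a value ending in '.txt' is read as a list with
    one path per line, anything else is split on commas.
    """
    if value.endswith(".txt"):
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [item.strip() for item in value.split(",") if item.strip()]


# Single local file and several published files
print(expand_input_files("file:mylocalfile.root"))
print(expand_input_files("/store/mc/file1,/store/mc/file2"))

# Text file with one path per line (written to a temporary location here)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("/store/mc/file1\n/store/mc/file2\n")
listed = expand_input_files(tmp.name)
os.remove(tmp.name)
print(listed)
```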
Questions:
- What is the total cross section for your chosen sample? What is the relative uncertainty in this cross section that you obtained?
- Are there different processes listed in the summary? What could those different processes be?
- Does your sample have negative weights? If yes, what is the fraction of events with negative weights?
- The printout also mentions the equivalent luminosity. Do you understand what is meant by that?
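When working through these questions, it can help to write the quantities down explicitly. The sketch below uses common textbook definitions, which are assumptions here and may differ in detail from what GenXSecAnalyzer prints: the equivalent luminosity is taken as the number of events divided by the cross section, and for weights of +1 and -1 the effective number of events is reduced by (1 - 2f)^2, where f is the fraction of negative weights. All input numbers are placeholders, not results for any real sample.

```python
def equivalent_luminosity(n_events, xsec_pb):
    """Integrated luminosity (in pb^-1) at which n_events events of a
    process with cross section xsec_pb (in pb) would be expected."""
    return n_events / xsec_pb


def negative_weight_fraction(n_positive, n_negative):
    """Fraction of events that carry a negative weight."""
    return n_negative / (n_positive + n_negative)


def effective_event_fraction(neg_fraction):
    """N_eff / N for weights of +1 and -1: (1 - 2 f)^2."""
    return (1.0 - 2.0 * neg_fraction) ** 2


# Placeholder inputs, chosen only to illustrate the arithmetic
lumi = equivalent_luminosity(n_events=100000, xsec_pb=0.5)
frac = negative_weight_fraction(n_positive=80000, n_negative=20000)
eff = effective_event_fraction(frac)
print(f"Equivalent luminosity: {lumi:.0f} pb^-1")
print(f"Negative weight fraction: {frac:.2f}")
print(f"Effective event fraction: {eff:.2f}")
```

Replace the placeholder numbers with the values from your own printout and compare the results with what the analyzer reports.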
xsec DB
A central database is kept with approved x-secs for centrally produced samples, XSDB.
The CMS Generator group’s Cross Section Database Tool (XSDB) is a tool for storing and looking up information related to a specific MC sample within CMS. It is designed to complement DAS and McM, with a direct link from DAS to a specific sample being available. Anyone with a CERN login can view the XSDB and perform queries for sample information. However, further action is restricted by e-group permissions. There are user, approver, and admin e-groups. XSDB users are CMS members who have permission to insert and modify documents in XSDB. Approvers have the same privileges as users, but are primarily tasked with approving records submitted by users. The admins are responsible for maintaining and improving the tool for future use.
A large amount of information can be stored in the database for each sample. This includes: cross section value, cross section uncertainty, hadronization tool, matrix element generator, sample contact, cuts used, DAS primary dataset name, and McM prename, among other metadata. This information can then be used to help with analysis. In this exercise, we will simply search XSDB for a sample, look at some of the information stored there, and get familiar with the tool.
We would like to search for a sample within XSDB. We’ll look for an EXO sample used in the Contact Interaction qqbar to dimuon channel in the search for compositeness.
The sample can be found in DAS with the dataset name: /CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- Query XSDB by entering
DAS=CITo2Mu_M1300_CUETP8M1_Lam10TeVDesRR_13TeV-pythia8
in the query field and either pressing Enter or clicking “Search”
- Take a minute to explore the items stored for the sample
- You can also choose which metadata are displayed by checking or unchecking the appropriate boxes between the search bar and the displayed results
- If we would like to see all of the Contact Interaction samples available, we can search:
process_name=cito2mu*
- Take some time to look through the samples and the pagination at the bottom of the results page
- Repeat this exercise for
process_name=ttbar
This will show a typical search for SM background samples.
It is possible to search for a substring of the item that one would like to look for.
Wildcards are supported, but they are not required: as long as the string is contiguous, it will be accepted by the XSDB query.
XSDB also supports boolean queries.
If we want to query the database for our original sample we could use the following: process_name=cito2mu && total_uncertainty=21.42
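A boolean query like the one above is just a set of key=value conditions joined with &&. The following minimal sketch assembles such a string for pasting into the XSDB search field; the helper name `xsdb_query` is hypothetical, and the field names are the ones used in this exercise.

```python
def xsdb_query(**conditions):
    """Join key=value conditions with ' && ' in the order they are given."""
    return " && ".join(f"{key}={value}" for key, value in conditions.items())


query = xsdb_query(process_name="cito2mu", total_uncertainty="21.42")
print(query)
```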
You can also query for your favorite MC sample.
Further documentation can be found on the XSDB twiki.
Key Points
DAS can be used to find samples and their files, the number of events in a sample, etc.
McM is used for sample generation management, and users can also consult it for additional information about their samples, e.g. the root gridpack, fragments, etc.
McM is also a good place to look for example cmsDriver commands
Different sources for cross sections exist within CMS: a CMSSW analyzer and a database