An analysis demo: search for the Higgs boson
In this demo we will search for the Higgs boson in its decay to two W bosons (H→WW),
using data recorded by the CMS experiment at the Large Hadron Collider.
The Git repository of the code we will run is:
https://github.com/kalanand/HiggsSearchDemo
There are three basic steps to the analysis.
Step 1: Run over the entire data, filter interesting events, store their contents in Ntuple format
The goal of this step is to identify events with
two W bosons and store all the information needed for the physics analysis.
The event should contain an electron or muon and large missing energy from an
escaped neutrino (one W decaying to a lepton and a neutrino), plus at least two
high-energy particle jets (the other W decaying to quarks).
To save time, we will process only the muon data in this demonstration.
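The preselection described above can be sketched as a simple per-event filter. This is a minimal illustration only: the Event container, its field names, and every threshold value are hypothetical assumptions, not the cuts used in the demo repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical thresholds (GeV), chosen only for illustration.
MIN_LEPTON_PT = 25.0
MIN_MET = 25.0      # missing transverse energy from the escaped neutrino
MIN_JET_PT = 30.0
MIN_JETS = 2


@dataclass
class Event:
    """Hypothetical per-event summary: transverse momenta (GeV) and missing energy."""
    muon_pts: list[float] = field(default_factory=list)
    electron_pts: list[float] = field(default_factory=list)
    met: float = 0.0
    jet_pts: list[float] = field(default_factory=list)


def passes_ww_preselection(ev: Event, muon_only: bool = True) -> bool:
    """True if the event has a lepton, large missing energy, and at least two jets.

    muon_only=True mirrors the demo, which processes only muon data.
    """
    lepton_pts = ev.muon_pts if muon_only else ev.muon_pts + ev.electron_pts
    n_leptons = sum(pt > MIN_LEPTON_PT for pt in lepton_pts)
    n_jets = sum(pt > MIN_JET_PT for pt in ev.jet_pts)
    return n_leptons >= 1 and ev.met > MIN_MET and n_jets >= MIN_JETS
```

In the real workflow a filter like this runs inside each batch job, and only events that pass are written to the Ntuple.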
The above requirements reduce the data size from ~50 petabytes to ~100 GB, i.e., by a
factor of 10^5.
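As a quick check of the quoted sizes (assuming decimal units, 1 PB = 10^15 bytes and 1 GB = 10^9 bytes), the ratio works out to 5×10^5, so the factor of 10^5 is best read as an order-of-magnitude figure:

```python
import math

PB = 10**15  # bytes (decimal petabyte)
GB = 10**9   # bytes (decimal gigabyte)

raw_size = 50 * PB       # approximate size of the full dataset
ntuple_size = 100 * GB   # approximate size after Step 1

factor = raw_size / ntuple_size
print(f"reduction factor = {factor:.1e} (log10 = {math.log10(factor):.1f})")
# prints: reduction factor = 5.0e+05 (log10 = 5.7)
```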
We run this step as a large number of parallel batch jobs on the LHC Computing
Grid or on Condor batch machines at Fermilab. Copy the
submitmu.txt file, which contains
the dataset names and configuration details, into a work area at
Fermilab and submit the batch jobs:
condor_submit submitmu.txt
If everything goes well, you will see something like the following on the screen:
cmslpc25>condor_submit submitmu.txt
Submitting job(s)............................................
44 job(s) submitted to cluster 246668.
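The contents of submitmu.txt are not reproduced here. As a rough sketch of what an HTCondor submit description for this kind of job can look like, the hypothetical helper below builds one with a separate queue statement per job. The function name, the output and log file names, and the per-job arguments are assumptions for illustration; only the submit keywords themselves (universe, executable, arguments, output, error, log, should_transfer_files, when_to_transfer_output, queue) are standard HTCondor syntax.

```python
from __future__ import annotations


def make_submit_description(executable: str, job_args: list[str]) -> str:
    """Build an HTCondor submit description that queues one job per argument string."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # $(Cluster) and $(Process) are expanded by HTCondor at submit time.
        "output     = job_$(Cluster)_$(Process).out",
        "error      = job_$(Cluster)_$(Process).err",
        "log        = job_$(Cluster).log",
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    for args in job_args:
        # Each "queue" statement submits one job using the most recent "arguments" value.
        lines.append(f"arguments  = {args}")
        lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical per-job inputs; the real dataset names live in submitmu.txt.
    print(make_submit_description("batchit.sh", ["input_list_0.txt", "input_list_1.txt"]))
```

The returned text would be written to a file and passed to condor_submit, in the same way as submitmu.txt.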
Job status can be monitored by running the command:
condor_q -sub <username>
Here is a snapshot of my running jobs:
cmslpc25>condor_q -sub kalanand
-- Submitter: kalanand@fnal.gov : <131.225.190.171:38138> : cmslpc25.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
246669.0 kalanand 4/19 14:33 0+00:01:13 R 0 0.0 batchit.sh /uscms/
246668.29 kalanand 4/19 14:18 0+00:10:38 R 0 0.0 batchit.sh /uscms/
246668.32 kalanand 4/19 14:18 0+00:10:37 R 0 0.0 batchit.sh /uscms/
246668.33 kalanand 4/19 14:18 0+00:10:37 R 0 0.0 batchit.sh /uscms/
246668.35 kalanand 4/19 14:18 0+00:10:36 R 0 0.0 batchit.sh /uscms/
246668.36 kalanand 4/19 14:18 0+00:10:36 R 0 0.0 batchit.sh /uscms/
246668.37 kalanand 4/19 14:18 0+00:10:36 R 0 0.0 batchit.sh /uscms/
246668.38 kalanand 4/19 14:18 0+00:10:36 R 0 0.0 batchit.sh /uscms/
246668.39 kalanand 4/19 14:18 0+00:10:36 R 0 0.0 batchit.sh /uscms/
246668.40 kalanand 4/19 14:18 0+00:10:35 R 0 0.0 batchit.sh /uscms/
246668.41 kalanand 4/19 14:18 0+00:10:35 R 0 0.0 batchit.sh /uscms/
246668.42 kalanand 4/19 14:18 0+00:10:35 R 0 0.0 batchit.sh /uscms/
246668.43 kalanand 4/19 14:18 0+00:10:35 R 0 0.0 batchit.sh /uscms/
....
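With dozens of jobs, it can help to tally the status column (ST) programmatically. A minimal sketch, assuming the per-job table layout shown above (ID, OWNER, two SUBMITTED fields, RUN_TIME, ST, ...). Recent HTCondor versions print a batched summary by default and need the -nobatch option to show one row per job, so the parsing may need adjusting; the function names are hypothetical.

```python
import re
import subprocess
from collections import Counter

# A job ID looks like "<cluster>.<process>", e.g. 246668.29.
JOB_ID = re.compile(r"\d+\.\d+")


def count_job_states(condor_q_text: str) -> Counter:
    """Tally the ST column (R = running, I = idle, H = held, ...) of condor_q output."""
    counts: Counter = Counter()
    for line in condor_q_text.splitlines():
        cols = line.split()
        # Job rows: ID, OWNER, SUBMITTED (date and time), RUN_TIME, ST, ...
        if len(cols) >= 6 and JOB_ID.fullmatch(cols[0]):
            counts[cols[5]] += 1
    return counts


def query_condor(user: str) -> str:
    """Run the same condor_q command as above and return its text output."""
    result = subprocess.run(["condor_q", "-sub", user],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with HTCondor installed, count_job_states(query_condor("kalanand")) would return a Counter keyed by status letter.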
The jobs take several hours to complete.
For the purpose of this demo we will simply submit the jobs and move to the next step
(I already have the output stored).
Step 2: Select signal-like events (i.e., "map reduce")
In this step we reduce the data size by a further factor of 10^3,
keeping only signal-like events.
We apply tight quality criteria specified in skim.py and
drop information not needed for further analysis.
python skim.py
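The actual cuts are defined in skim.py and are not reproduced here. Conceptually, the skim combines a filter (keep only events passing tight cuts) with a projection (keep only the fields needed downstream). The sketch below illustrates that pattern; the field names and cut values are hypothetical assumptions.

```python
from typing import Iterable, Iterator

# Hypothetical names of the fields kept after the skim.
KEEP_FIELDS = ("muon_pt", "muon_eta", "met", "jet1_pt", "jet2_pt")


def tight_selection(ev: dict) -> bool:
    """Hypothetical tight cuts (momenta in GeV); the real ones are in skim.py."""
    return (ev["muon_pt"] > 25.0
            and abs(ev["muon_eta"]) < 2.1
            and ev["met"] > 30.0
            and ev["jet2_pt"] > 30.0)


def skim(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only events passing tight_selection, reduced to KEEP_FIELDS."""
    for ev in events:
        if tight_selection(ev):
            yield {k: ev[k] for k in KEEP_FIELDS}
```

Because skim is a generator, it can stream through a large input without holding every event in memory.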
Step 3: Perform Higgs search (i.e., "data analytics")
Now we analyze the remaining events for any hint of a Higgs signal.
We plot the invariant mass of the two W bosons. If the Higgs
boson decays to WW, we expect a peak in this distribution
at the Higgs boson mass.
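One common way to build this mass, sketched here under stated assumptions, is m_WW^2 = (E_l + E_nu + E_j1 + E_j2)^2 - |p_l + p_nu + p_j1 + p_j2|^2 in natural units. The neutrino's momentum along the beam is not measured, so a standard trick is to solve for it by requiring the lepton plus neutrino system to have the W mass. Whether analyze.py does exactly this is an assumption; the class and function names below are hypothetical.

```python
import math
from dataclasses import dataclass

M_W = 80.4  # GeV, approximate W boson mass used as the constraint


@dataclass(frozen=True)
class P4:
    """Four-momentum (px, py, pz, E) in GeV, natural units (c = 1)."""
    px: float
    py: float
    pz: float
    e: float

    def __add__(self, other: "P4") -> "P4":
        return P4(self.px + other.px, self.py + other.py,
                  self.pz + other.pz, self.e + other.e)

    @property
    def mass(self) -> float:
        m2 = self.e**2 - self.px**2 - self.py**2 - self.pz**2
        return math.sqrt(max(m2, 0.0))  # clamp small negative values from rounding


def neutrino_pz(lep: P4, met_x: float, met_y: float) -> float:
    """Solve (lep + nu)^2 = M_W^2 for the neutrino p_z, treating the lepton as massless."""
    pt2_lep = lep.px**2 + lep.py**2
    mu = M_W**2 / 2 + lep.px * met_x + lep.py * met_y
    a = mu * lep.pz / pt2_lep
    disc = a**2 - (lep.e**2 * (met_x**2 + met_y**2) - mu**2) / pt2_lep
    if disc < 0:
        return a  # no real solution: keep the real part
    root = math.sqrt(disc)
    return min(a - root, a + root, key=abs)  # convention: smaller |p_z|


def m_ww(lep: P4, met_x: float, met_y: float, jet1: P4, jet2: P4) -> float:
    """Invariant mass of lepton + neutrino + two jets."""
    pz = neutrino_pz(lep, met_x, met_y)
    nu = P4(met_x, met_y, pz, math.sqrt(met_x**2 + met_y**2 + pz**2))
    return (lep + nu + jet1 + jet2).mass
```

Keeping the real part when the quadratic has no real solution, and choosing the smaller |p_z| otherwise, are common conventions rather than the only options.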
Run the analyzer script:
python analyze.py
This should produce two figures. The left figure
shows the distribution in data compared with the expected
background from simulation, together with the size of the
Higgs signal expected for a few hypothetical Higgs mass values.
The right figure shows the data after background subtraction.
We find that the data is consistent with the background-only hypothesis.
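A minimal sketch of background subtraction with a simple consistency check, assuming the data and the simulated background are histograms with identical binning given as plain lists of counts. Only the Poisson uncertainty on the data is included; a real analysis would also account for uncertainties on the simulated background. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def subtract_background(data: Sequence[float],
                        background: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Per bin, return (data - background, uncertainty, pull).

    The uncertainty is sqrt(N_data) (Poisson, data only); empty bins use 1 as a crude floor.
    """
    if len(data) != len(background):
        raise ValueError("histograms must have identical binning")
    result = []
    for n, b in zip(data, background):
        sigma = math.sqrt(max(n, 1.0))
        diff = n - b
        result.append((diff, sigma, diff / sigma))
    return result


def chi2_background_only(data: Sequence[float], background: Sequence[float]) -> float:
    """Sum of squared pulls; roughly equal to the number of bins if background-only holds."""
    return sum(pull**2 for _, _, pull in subtract_background(data, background))
```

A localized cluster of large positive pulls, rather than a large overall chi-square, is what a mass peak would look like in the subtracted distribution.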
Given the expected magnitude of the Higgs signal, we can
exclude a Higgs boson with mass in the range 200–600 GeV at 95% confidence level. This is what we actually did
with the early LHC data, before the Higgs boson was eventually discovered at a mass of about 125 GeV (announced on July 4, 2012).
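For intuition about how an exclusion at 95% confidence works, here is a simplified single-bin counting sketch: compute a classical upper limit on the signal yield from the observed count and the expected background, then exclude a mass hypothesis if its expected signal exceeds that limit. This is a swapped-in toy method, not the statistical procedure used by CMS, whose Higgs searches relied on more elaborate methods (such as CLs with systematic uncertainties); all names are hypothetical.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for a Poisson distribution with mean mu, summed in log space."""
    if mu == 0:
        return 1.0
    return sum(math.exp(k * math.log(mu) - mu - math.lgamma(k + 1)) for k in range(n + 1))


def upper_limit(n_obs: int, background: float, cl: float = 0.95) -> float:
    """Classical upper limit on the signal s: solve P(N <= n_obs | s + background) = 1 - cl."""
    alpha = 1.0 - cl
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi + background) > alpha:  # bracket the solution
        hi *= 2.0
    for _ in range(100):  # bisection; the CDF decreases as s grows
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def is_excluded(expected_signal: float, n_obs: int, background: float) -> bool:
    """A mass hypothesis is excluded if it predicts more signal than the upper limit allows."""
    return expected_signal > upper_limit(n_obs, background)
```

With zero observed events and zero background the limit is about 3.0 events. When the observed count falls well below the expected background, this classical limit can become unphysically tight, which is one motivation for the CLs method.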