An analysis demo: search for the Higgs boson

In this demo we will search for the Higgs boson in the decay H→WW, using data from the CMS experiment at the Large Hadron Collider. The Git repository of the code we will run is:

There are three basic steps to the analysis.

Step 1: Run over the entire dataset, filter interesting events, and store their contents in Ntuple format

The goal of this step is to identify events with two W bosons and store all the information needed for physics analysis. Each selected event should contain an electron or muon, large missing energy due to an escaped neutrino, and at least two high-energy particle jets. To save time, we will process only muon data in this demonstration.
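
The selection described above can be sketched as a simple per-event filter. This is only an illustration, not the code from the repository: the Event record, the function name passes_step1, and all threshold values are assumptions made up for this example, and the real selection uses the full CMS event content.

```python
from dataclasses import dataclass
from typing import List

# Placeholder thresholds in GeV, chosen only for illustration (assumptions,
# not the values used in the actual analysis).
MIN_MUON_PT = 25.0
MIN_MISSING_ET = 25.0
MIN_JET_PT = 30.0
MIN_JETS = 2


@dataclass
class Event:
    """Hypothetical, heavily simplified event record."""
    muon_pts: List[float]   # transverse momenta of muon candidates
    missing_et: float       # missing transverse energy
    jet_pts: List[float]    # transverse momenta of jets


def passes_step1(event: Event) -> bool:
    """True if the event has a muon, large missing energy and >= 2 hard jets."""
    has_muon = any(pt > MIN_MUON_PT for pt in event.muon_pts)
    has_missing_energy = event.missing_et > MIN_MISSING_ET
    n_jets = sum(1 for pt in event.jet_pts if pt > MIN_JET_PT)
    return has_muon and has_missing_energy and n_jets >= MIN_JETS


# Toy events, made up for this example.
toy_events = [
    Event(muon_pts=[40.0], missing_et=50.0, jet_pts=[60.0, 45.0]),  # kept
    Event(muon_pts=[], missing_et=50.0, jet_pts=[60.0, 45.0]),      # no muon
    Event(muon_pts=[40.0], missing_et=5.0, jet_pts=[60.0, 45.0]),   # low missing energy
    Event(muon_pts=[40.0], missing_et=50.0, jet_pts=[60.0, 10.0]),  # only one hard jet
]
selected = [e for e in toy_events if passes_step1(e)]
print(len(selected), "of", len(toy_events), "toy events pass")
```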

The above requirements reduce the data size from ~50 petabytes to ~100 GB, i.e., by a factor of order 10^5. We run this step as a large number of parallel batch jobs on the LHC Computing Grid or on Condor batch machines at Fermilab. Copy the submitmu.txt file, which contains the dataset names and configuration details, to a work area at Fermilab and submit the batch jobs:
	  condor_submit submitmu.txt
If everything goes well, you will see something like the following on the screen:
	  cmslpc25>condor_submit submitmu.txt
	  Submitting job(s)............................................
	  44 job(s) submitted to cluster 246668.
Job status can be monitored by running the command:
	  condor_q -sub <username>
Here is a snapshot of my running jobs:
	  cmslpc25>condor_q -sub kalanand

	  -- Submitter: : <> :
		ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
		246669.0   kalanand        4/19 14:33   0+00:01:13 R  0   0.0 /uscms/
		246668.29  kalanand        4/19 14:18   0+00:10:38 R  0   0.0 /uscms/
		246668.32  kalanand        4/19 14:18   0+00:10:37 R  0   0.0 /uscms/
		246668.33  kalanand        4/19 14:18   0+00:10:37 R  0   0.0 /uscms/
		246668.35  kalanand        4/19 14:18   0+00:10:36 R  0   0.0 /uscms/
		246668.36  kalanand        4/19 14:18   0+00:10:36 R  0   0.0 /uscms/
		246668.37  kalanand        4/19 14:18   0+00:10:36 R  0   0.0 /uscms/
		246668.38  kalanand        4/19 14:18   0+00:10:36 R  0   0.0 /uscms/
		246668.39  kalanand        4/19 14:18   0+00:10:36 R  0   0.0 /uscms/
		246668.40  kalanand        4/19 14:18   0+00:10:35 R  0   0.0 /uscms/
		246668.41  kalanand        4/19 14:18   0+00:10:35 R  0   0.0 /uscms/
		246668.42  kalanand        4/19 14:18   0+00:10:35 R  0   0.0 /uscms/
		246668.43  kalanand        4/19 14:18   0+00:10:35 R  0   0.0 /uscms/
The jobs take several hours to complete. For the purpose of this demo we will simply submit the jobs and move to the next step (I already have the output stored).
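
While the jobs run, a snapshot like the one above can be tallied with a few lines of Python. This is a minimal sketch that assumes the column layout shown in the snapshot (ID, OWNER, SUBMITTED date and time, RUN_TIME, ST, ...); other Condor versions may print a different layout. The function name count_job_states is hypothetical, and the sample text is copied from the snapshot rather than read from a live queue.

```python
from collections import Counter


def count_job_states(condor_q_text: str) -> Counter:
    """Count jobs per state (the ST column) in condor_q-style text output."""
    states = Counter()
    for line in condor_q_text.splitlines():
        fields = line.split()
        # Job rows start with a "cluster.process" ID such as 246668.29;
        # the header, the "-- Submitter" line and blank lines are skipped.
        if len(fields) >= 6 and fields[0].replace(".", "", 1).isdigit():
            # Fields: ID, OWNER, date, time, RUN_TIME, ST, ...
            states[fields[5]] += 1
    return states


# Three rows copied from the snapshot above.
SAMPLE = """\
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
246669.0   kalanand        4/19 14:33   0+00:01:13 R  0   0.0 /uscms/
246668.29  kalanand        4/19 14:18   0+00:10:38 R  0   0.0 /uscms/
246668.32  kalanand        4/19 14:18   0+00:10:37 R  0   0.0 /uscms/
"""

for state, n_jobs in count_job_states(SAMPLE).items():
    print(state, n_jobs)
```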

Step 2: Select signal-like events (i.e., "map reduce")

In this step we perform a further 10^3 reduction in data size and select signal-like events. We apply tight quality criteria specified in and drop information not needed for further analysis.
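
To get a feel for the numbers, the data sizes quoted in Steps 1 and 2 can be chained together. This is back-of-the-envelope arithmetic only: it takes the approximate figures from the text (~50 petabytes, ~100 GB, and a further factor of about 10^3) at face value and assumes decimal units.

```python
# Decimal (SI) units, in bytes.
MB = 10**6
GB = 10**9
PB = 10**15

raw_size = 50 * PB            # full dataset before Step 1 (approximate)
after_step1 = 100 * GB        # Ntuples written by Step 1 (approximate)
step2_reduction = 10**3       # further reduction quoted for Step 2

step1_reduction = raw_size // after_step1
after_step2 = after_step1 // step2_reduction
total_reduction = raw_size // after_step2

print("Step 1 reduction factor:", step1_reduction)        # of order 10^5
print("Size after Step 2 (MB):", after_step2 // MB)
print("Overall reduction factor:", total_reduction)
```

With these inputs the sample left for Step 3 is of order 100 MB, small enough to analyze interactively.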

Step 3: Perform Higgs search (i.e., "data analytics")

Now we analyze the remaining events for any hint of a Higgs signal. We plot the invariant mass of the two W bosons. If the Higgs boson decays to WW, we would expect a peak in this distribution at the Higgs boson mass.
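
The invariant mass of the WW system follows from the summed four-momenta, m^2 = (E1 + E2)^2 - |p1 + p2|^2, in units where c = 1. The sketch below assumes the two W four-momenta are already available as (E, px, py, pz) tuples in GeV; how each W is reconstructed from the lepton, missing energy, and jets is not shown, and the function name invariant_mass is hypothetical.

```python
import math


def invariant_mass(p4_a, p4_b):
    """Invariant mass of two particles given as (E, px, py, pz) in GeV."""
    energy = p4_a[0] + p4_b[0]
    px = p4_a[1] + p4_b[1]
    py = p4_a[2] + p4_b[2]
    pz = p4_a[3] + p4_b[3]
    mass_squared = energy**2 - (px**2 + py**2 + pz**2)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(mass_squared, 0.0))


# Toy example: two back-to-back W bosons, each with E = 100 GeV and
# |p| = 60 GeV, i.e. each with mass sqrt(100^2 - 60^2) = 80 GeV.
w_plus = (100.0, 0.0, 0.0, 60.0)
w_minus = (100.0, 0.0, 0.0, -60.0)
print(invariant_mass(w_plus, w_minus))
```

Because the momenta cancel in this toy configuration, the WW mass is just the summed energy, 200 GeV.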

Run the analyzer script:
This should produce two figures. In the left figure we plot the distribution in data and compare it to the expected background from simulation. The figure also shows the magnitude of the Higgs signal for a few hypothetical Higgs mass values. In the figure on the right, we plot the data after background subtraction. We find that the data are consistent with the background-only hypothesis. Given the expected magnitude of the Higgs signal, we can exclude a Higgs boson mass in the range 200 to 600 GeV with 95% confidence. This is what we actually did with the early LHC data, before the discovery of the Higgs boson at a mass of 125 GeV was announced on July 4, 2012.

Figures: Higgs mass (left); Higgs mass, background subtracted (right)
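
The background subtraction in the right-hand figure amounts to a bin-by-bin difference between data and expected background. The sketch below illustrates the idea with toy bin contents made up for this example (they are not results from the analysis), uses sqrt(N) as a simple statistical uncertainty on each data bin, and ignores the uncertainty on the background. The actual 95% confidence exclusion requires a proper statistical procedure that is not reproduced here.

```python
import math


def subtract_background(data_counts, background_counts):
    """Return (data - background, sqrt(data)) for each histogram bin."""
    if len(data_counts) != len(background_counts):
        raise ValueError("histograms must have the same number of bins")
    return [
        (n_data - n_bkg, math.sqrt(n_data))
        for n_data, n_bkg in zip(data_counts, background_counts)
    ]


# Toy histograms, made up for illustration only.
toy_data = [100, 81, 64]
toy_background = [96.0, 85.0, 60.0]

residuals = subtract_background(toy_data, toy_background)
for excess, error in residuals:
    print(f"excess = {excess:+.1f} +/- {error:.1f}")

# Crude compatibility check: every bin within two standard deviations of zero.
consistent = all(abs(excess) <= 2 * error for excess, error in residuals)
print("consistent with background only:", consistent)
```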

Last modified: Sat April 19 14:11:44 CST 2014