Kalanand's December 2008 Log


December 1st

Publishing privately produced data to the CMS database (DBS)

There are six steps to the process.

STEP 1: Collect information from the files you want to publish
You need to know the name, file size, and number of entries of each ROOT file. First, make text files (like ~kalanand/files_PublishToDB/files_*.txt) containing the names of the files to be published in DBS. Then run "entries.csh", which produces output text files like "info_list_1.txt".
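A minimal Python sketch of the STEP 1 bookkeeping. It assumes each files_*.txt lists one ROOT file path per line. The helper names collect_info and write_info, the "name size entries" output line, and the count_entries callback are all hypothetical, and this is not how entries.csh works internally. Counting entries needs ROOT, so the count is passed in as a function. With PyROOT, that function could open the file and return the entry count of its event tree.

```python
import os
from typing import Callable, List, Tuple


def collect_info(list_path: str,
                 count_entries: Callable[[str], int]) -> List[Tuple[str, int, int]]:
    """Return (file name, size in bytes, number of entries) for each listed file."""
    info = []
    with open(list_path) as listing:
        for line in listing:
            path = line.strip()
            if not path or path.startswith("#"):
                continue  # skip blank lines and comment lines
            info.append((os.path.basename(path), os.path.getsize(path), count_entries(path)))
    return info


def write_info(info: List[Tuple[str, int, int]], out_path: str) -> None:
    """Write one 'name size entries' line per file (hypothetical format)."""
    with open(out_path, "w") as out:
        for name, size, entries in info:
            out.write(f"{name} {size} {entries}\n")
```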

STEP 2: Set up the DBS area (work in a bash shell)
export CVSROOT=:pserver:anonymous@cmscvs.cern.ch:/cvs_server/repositories/CMSSW
export CVS_RSH=ssh
cvs -d `echo $CVSROOT | awk -F@ '{print $1":98passwd\@"$2}'` login
cvs co -r DBS_2_0_2 DBS/Clients/Python
cd DBS/Clients/Python
source setup.sh

STEP 3: Set up env/config variables
emacs dbs.config &

==> Change the DBS version and host URL.
==> The following combinations seem to work fine:


[rather old]

STEP 3 (continued): Set up env/config variables
cd UserExamples

==> You need to modify "dbsInsertEverything.py".
    It is better to copy directly from my (working) sample file.
Make the following changes:
1. The field "primary = DbsPrimaryDataset (Name = ...., Type = ....)" defines the primary name of the dataset. The conventional naming scheme is /primary_name/process_name/type, e.g., /Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO

2. In the field "contents=open( ).read()", set the absolute path of an example configuration script.

3. Set the correct CMSSW version in "algo = DbsAlgorithm ( ..., ApplicationVersion= "CMSSW_1_6_12", ...)".

4. Set the correct process name in the field "proc = DbsProcessedDataset (..., Name="CMSSW_1_6_12_Fastsim", ...)".

5. Append the correct summary file names:

6. Set the complete dataset name in the field "block = DbsFileBlock ( Name= ..., ...)".
This name must match the primary and process names from items 1 and 4; otherwise you will get an error message.

7. Complete the "LogicalFileName".

***** Important: On the first try, prepend the word "test" to the dataset names in items 1 and 6 (e.g., /test1_Madgraph_ttbarjets_FNAL/...), because once a dataset is inserted, its name cannot be removed if something goes wrong.
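The consistency rule in items 1, 4 and 6 can be checked before running the insertion script. Below is a small sketch with hypothetical helper names (dataset_path, block_matches). It assumes the block name is either the dataset path itself or the path followed by "#" and a suffix. That is an assumption about the block naming convention, not something stated above.

```python
def dataset_path(primary: str, processed: str, tier: str, test_tag: str = "") -> str:
    """Build /primary/processed/tier, optionally prefixing the primary name with a test tag."""
    if test_tag:
        primary = f"{test_tag}_{primary}"  # e.g. "test1_Madgraph_ttbarjets_FNAL"
    for part in (primary, processed, tier):
        if not part or "/" in part:
            raise ValueError(f"bad dataset name component: {part!r}")
    return f"/{primary}/{processed}/{tier}"


def block_matches(block_name: str, dataset: str) -> bool:
    """True if the block name is the dataset path, optionally followed by '#suffix' (assumed convention)."""
    return block_name == dataset or block_name.startswith(dataset + "#")


ds = dataset_path("Madgraph_ttbarjets_FNAL", "CMSSW_1_6_12_Fastsim", "RECO", test_tag="test1")
print(ds)  # /test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO
```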

STEP 4: Publish in the local DB at Fermilab
python dbsInsertEverything.py

==> Check whether the dataset made it into DBS:

    dbs search --query="find dataset where dataset = /test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO"
The usual options are: find dataset, find block, find files
The old way of querying:
   dbs lsd --path=/test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO
   dbs lsb --path=/test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO
   dbs lsf --path=/test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO

STEP 5: Test by transferring the dataset from one local server to another
python dbsMigrateWithParents.py   
For example:
python dbsMigrateWithParents.py http://cmssrv48.fnal.gov:8383/DBS/servlet/DBSServlet http://cmssrv46.fnal.gov:8080/DBS111LFNOPT/servlet/DBSServlet /test1_Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO

STEP 6: Get a proxy and FINALLY publish in the global DBS
voms-proxy-init -voms cms

python dbsMigrateWithParents.py <host_server> <destination> <dataset>
For example:
python dbsMigrateWithParents.py http://cmssrv46.fnal.gov:8080/DBS111LFNOPT/servlet/DBSServlet https://cmsdbsprod.cern.ch:8443/cms_dbs_prod_global_writer/servlet/DBSServlet /Madgraph_ttbarjets_FNAL/CMSSW_1_6_12_Fastsim/RECO
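Below is a hypothetical Python wrapper for the migration commands in STEPS 5 and 6. It only assembles the argument list and, by default, prints it instead of running it (a dry run). The names migrate_command and migrate, and the dataset-path check, are illustrative assumptions. The URLs are the ones used above. The real write to global DBS still needs a valid proxy from voms-proxy-init.

```python
import subprocess
from typing import List

LOCAL_DBS = "http://cmssrv46.fnal.gov:8080/DBS111LFNOPT/servlet/DBSServlet"
GLOBAL_DBS = "https://cmsdbsprod.cern.ch:8443/cms_dbs_prod_global_writer/servlet/DBSServlet"


def migrate_command(source: str, destination: str, dataset: str) -> List[str]:
    """Argument list for: python dbsMigrateWithParents.py <host_server> <destination> <dataset>."""
    if not dataset.startswith("/") or dataset.count("/") != 3:
        raise ValueError(f"expected /primary/processed/tier, got {dataset!r}")
    return ["python", "dbsMigrateWithParents.py", source, destination, dataset]


def migrate(source: str, destination: str, dataset: str, dry_run: bool = True) -> int:
    """Print the command (dry run) or actually run it and return its exit status."""
    cmd = migrate_command(source, destination, dataset)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)
```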

December 5th

Copy multiple files from dCache to local area

Here is the shell script to do this: cpFromDcache.csh

#!/bin/csh
# copy data files from FNAL dCache resilient to local area
set destdir = /uscms/home/kalanand/cms/jet/CMSSW_3_1_4/src/JetMETCorrections/ZJet/test

cd /pnfs/cms/WAX/resilient/kalanand/ZeeJet_OctX_7TeV/

foreach datafile ( `ls *.root` )
    echo "submitting: dccp $datafile $destdir/$datafile"
    dccp $datafile $destdir/$datafile
    echo ""
end

December 22nd

Statistics refresher

Bayes' theorem:

P(H|D) = P(D|H) * P(H) / P(D), where

P(H)   = Prior      = The probability that the hypothesis is true before collecting the data
P(D)   = Marginal   = The probability of collecting this data, averaged over all possible hypotheses (weighted by their priors)
P(D|H) = Likelihood = The probability of collecting this data if the hypothesis is true
P(H|D) = Posterior  = The updated probability that the hypothesis is true given the data
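Here is a short Python sketch of Bayes' theorem over a discrete set of hypotheses, using exact fractions. The function name posterior is hypothetical. P(D) is computed as the sum of P(D|H) * P(H) over all hypotheses, matching the definition of the marginal above.

```python
from fractions import Fraction
from typing import Dict, Hashable


def posterior(prior: Dict[Hashable, Fraction],
              likelihood: Dict[Hashable, Fraction]) -> Dict[Hashable, Fraction]:
    """P(H|D) = P(D|H) * P(H) / P(D), with P(D) = sum over H of P(D|H) * P(H)."""
    joint = {h: likelihood[h] * p for h, p in prior.items()}
    marginal = sum(joint.values())  # P(D)
    return {h: j / marginal for h, j in joint.items()}
```

Take the cookie question further down as an example. Priors of 1/2 for each bowl and likelihoods of 30/40 and 20/40 for a plain cookie give a posterior of 3/5 for bowl #1.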

Remember: The mean (u) and standard deviation (s) of a binomial distribution with n trials and success probability p are np and √(npq), respectively, where q = 1 - p.
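As a quick numerical check of these two formulas, the sketch below sums directly over the binomial distribution. The function names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(n: int, p: float) -> list:
    """P(X = k) for k = 0..n."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


def mean_and_sd(n: int, p: float) -> tuple:
    """Mean and standard deviation computed from the pmf, to compare with np and sqrt(npq)."""
    pmf = binomial_pmf(n, p)
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, sqrt(var)
```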

Probability of throwing dice, theory: Throwing dice is more complicated than tossing coins because a die has more than two outcomes. A single die can land six ways, each equally likely if the die is fair, so the probability of getting one particular value is 1/6. The probability of getting either of two values is 2/6 = 1/3, and so on.

When rolling two dice, distinguish between them in some way: a first one and a second one, a left and a right, a red and a green, etc. Let (a,b) denote a possible outcome of rolling the two dice, with a the number on top of the first die and b the number on top of the second die. Each of a and b can be any integer from 1 through 6. Here is a listing of all 36 joint possibilities for (a,b):

(1,1) 	(1,2) 	(1,3) 	(1,4) 	(1,5) 	(1,6)
(2,1) 	(2,2) 	(2,3) 	(2,4) 	(2,5) 	(2,6)
(3,1) 	(3,2) 	(3,3) 	(3,4) 	(3,5) 	(3,6)
(4,1) 	(4,2) 	(4,3) 	(4,4) 	(4,5) 	(4,6)
(5,1) 	(5,2) 	(5,3) 	(5,4) 	(5,5) 	(5,6)
(6,1) 	(6,2) 	(6,3) 	(6,4) 	(6,5) 	(6,6) 

Some sample questions

Q: Two fair six-sided dice are rolled. What is the probability that the sum of the two dice is seven?
A: Outcomes: (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1). Probability = 6/36 = 1/6. The table below shows the probabilities for all possible totals.

Q: Two fair six-sided dice are rolled. What is the probability that the numbers on the dice are different?
A: 1 - 6/36 = 5/6 (subtract the 6 doubles, (1,1) through (6,6))

Probability of two dice totals:

Total on dice   Pairs of dice                   Probability
2               1+1                             1/36 = 3%
3               1+2, 2+1                        2/36 = 6%
4               1+3, 2+2, 3+1                   3/36 = 8%
5               1+4, 2+3, 3+2, 4+1              4/36 = 11%
6               1+5, 2+4, 3+3, 4+2, 5+1         5/36 = 14%
7               1+6, 2+5, 3+4, 4+3, 5+2, 6+1    6/36 = 17%
8               2+6, 3+5, 4+4, 5+3, 6+2         5/36 = 14%
9               3+6, 4+5, 5+4, 6+3              4/36 = 11%
10              4+6, 5+5, 6+4                   3/36 = 8%
11              5+6, 6+5                        2/36 = 6%
12              6+6                             1/36 = 3%
An easy way to remember the numerator in this chart:
If the sum is 7 or less, then subtract one from the sum. If the sum is greater than 7, then subtract that value from 13.

Q: A pair of dice is rolled. Find the probability that the sum is greater than 9.
A: 3/36 + 2/36 + 1/36 = 1/6

Q: A pair of dice is rolled. Find the probability that the sum is a multiple of 4.
A: 3/36 + 5/36 + 1/36 = 1/4

There is an interesting pattern in problems like the one above. If you are looking for the probability that the sum is divisible by a given number less than 7, you get:

If the sum is divisible by:
        2 -> the probability is 1/2.
        3 -> the probability is 1/3.
        4 -> the probability is 1/4.
       *5 -> the probability is 7/36.
        6 -> the probability is 1/6.
Note: 5 is the only abnormal one, making this easy to memorize.
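The two-dice table, the numerator rule, and the divisibility list can all be checked by brute force over the 36 equally likely ordered outcomes. Here is a minimal sketch with illustrative function names:

```python
from fractions import Fraction
from itertools import product

ROLLS = list(product(range(1, 7), repeat=2))  # all 36 equally likely ordered pairs (a, b)


def prob(event) -> Fraction:
    """Probability of an event, given as a predicate on the pair (a, b)."""
    return Fraction(sum(1 for r in ROLLS if event(r)), len(ROLLS))


def total_prob(s: int) -> Fraction:
    """P(a + b == s)."""
    return prob(lambda r: r[0] + r[1] == s)


def divisible_prob(d: int) -> Fraction:
    """P(a + b is divisible by d)."""
    return prob(lambda r: (r[0] + r[1]) % d == 0)
```

For example, prob(lambda r: r[0] + r[1] > 9) returns Fraction(1, 6), matching the answer above.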

Other useful observations:
Probability that at least one die shows a particular number: 11/36
Neither die shows a particular number: 25/36
Both dice show the same number: 6/36
Both dice show the same particular number (say, 6): 1/36
At least one die doesn't show a particular number: 35/36
At least one die shows either of two particular numbers: 1 - 4/6 * 4/6 = 20/36
Neither die shows either of two particular numbers: 4/6 * 4/6 = 16/36
The dice show two particular numbers (e.g., one is 2 and the other is 3): 2/36

More than two dice:
As the number of dice increases, things get more complicated. The easiest case is the probability that all n dice show one particular number: (1/6)^n = 1/(6^n).

Q: Conditional probability with two dice. A pair of dice is thrown. If it is known that at least one die shows a 4, what is the probability that (a) the other die shows a 5, (b) the total of both dice is greater than 7?
A: (a) 11 of the 36 outcomes contain a 4, and two of them, (4,5) and (5,4), also contain a 5. Hence 2/11. (b) Allowed combinations: 44, 45, 46, 54, 64. Hence 5/11.

Q: A pair of dice is thrown. At least one of them shows a 6. What is the probability that the other is also a 6?
A: 1/11
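Both conditional answers above read "one die shows a 4" as "at least one die shows a 4". Under that reading, they can be checked by keeping only the outcomes where the condition holds and counting. Here is a sketch with illustrative function names:

```python
from fractions import Fraction
from itertools import product


def conditional(event, given) -> Fraction:
    """P(event | given), counting the 36 equally likely ordered outcomes (a, b)."""
    kept = [r for r in product(range(1, 7), repeat=2) if given(r)]
    return Fraction(sum(1 for r in kept if event(r)), len(kept))


def shows(face: int):
    """Predicate: at least one die shows this face."""
    return lambda r: face in r
```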

More nerdy questions from mathforum.org:
A fair die is thrown n times. The probability that there is an even number of sixes is 1/2 * [1+(2/3)^n]
Note: 0 is considered an even number.
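The closed form comes from averaging (q + p)^n and (q - p)^n, with p = 1/6 and q = 5/6. The odd-k terms of the binomial expansion cancel, which leaves the probability of an even number of sixes, and q - p = 2/3. Here is a short exact check in Python, with illustrative function names:

```python
from fractions import Fraction
from math import comb


def p_even_sixes(n: int) -> Fraction:
    """Sum the binomial probabilities of k = 0, 2, 4, ... sixes in n throws."""
    p, q = Fraction(1, 6), Fraction(5, 6)
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(0, n + 1, 2))


def closed_form(n: int) -> Fraction:
    """1/2 * [1 + (2/3)^n]."""
    return Fraction(1, 2) * (1 + Fraction(2, 3) ** n)
```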

Q: Tossing a Coin and Rolling a Die: If you toss a coin and roll a die, what is the probability of obtaining: a) heads and a five b) heads or a five c) tails or a two?
A: (a) 1/2 * 1/6 = 1/12 (b) 1 - (1-1/2)(1-1/6) = 7/12 (c) 7/12

Q: The house rolls 10 dice, one die at a time. What are the odds of the following: a.) A player rolling the same numbers in the same order as the house.
b.) A player rolling the same numbers as the house without regard to the order in which they were rolled.
c.) A player rolling 9 of 10 numbers the same as the house without regard to the order in which they were rolled.

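A: (a) 1/6^10
(b) This is equivalent to distributing 10 balls into 6 cells (one cell per face value). The number of balls in a cell is the number of times that face came up among the 10 dice, and anywhere from 0 to 5 cells can be empty. The number of distinct arrangements is C(15, 10) = 15! / (10! * 5!) = 3003, which would give 1/3003 if every arrangement were equally likely. They are not: all ten dice showing 1 has probability 1/6^10, while a more even spread of faces is far more likely. So the exact probability is the sum over the 3003 arrangements of P(arrangement)^2, where P(arrangement) = 10! / (k1! k2! … k6!) / 6^10 for face counts k1, …, k6. This sum is strictly larger than 1/3003.
(c) Using the same equal-likelihood counting: [N(0 empty cells) + N(1 empty cells) + … + N(5 empty cells)] / 3003 = (30 + 25 + … + 5) / 3003 = 105 / 3003 = 5 / 143. The caveat from (b) applies here as well.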
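Below is a sketch of the exact calculation for (b), assuming the player and the house each roll 10 fair dice independently. It enumerates all 3003 possible face counts (k1, …, k6), computes the multinomial probability of each, and sums the squares. Function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def compositions(n: int, k: int):
    """Yield every k-tuple of non-negative integers summing to n (face counts)."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multiset_prob(counts, faces: int = 6) -> Fraction:
    """Probability that sum(counts) fair dice show exactly these face counts."""
    n = sum(counts)
    ways = factorial(n)
    for c in counts:
        ways //= factorial(c)  # multinomial coefficient n! / (k1! ... k6!)
    return Fraction(ways, faces ** n)


def prob_same_multiset(n: int = 10, faces: int = 6) -> Fraction:
    """P(player's unordered result equals the house's), summing P(arrangement)^2."""
    return sum(multiset_prob(c, faces) ** 2 for c in compositions(n, faces))


print(float(prob_same_multiset()))
```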

Probability questions:

Q: Mr. Smith has two children. You already know that one of them is a girl; what's the probability that Mr. Smith has one boy and one girl?
A: 2/3 (the equally likely outcomes consistent with the information are BG, GB, GG)

Q: A bakery makes a batch of 200 cookies in which 2000 chocolate chips were used. What is the probability that a cookie picked at random from the batch will contain at least 13 chocolate chips?
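A: The number of chips in a cookie is approximately Poisson distributed: P(n chips) = e^(-x) * (x^n) / n!, where x = average number of chips per cookie = 2000/200 = 10. So the probability = e^(-10) * [10^13/13! + 10^14/14! + …] = 1 - e^(-10) * [10^0/0! + 10^1/1! + … + 10^12/12!] ≈ 0.21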
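Here is a direct evaluation of the Poisson tail, as a sketch. The function name is illustrative. It sums the first k terms and subtracts them from 1, rather than summing the infinite tail.

```python
from math import exp, factorial


def poisson_tail(mean: float, k: int) -> float:
    """P(N >= k) for N ~ Poisson(mean), computed as 1 - P(N <= k - 1)."""
    return 1.0 - sum(exp(-mean) * mean**n / factorial(n) for n in range(k))


print(round(poisson_tail(10, 13), 3))
```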

Q: Suppose that you toss a balanced coin 100 times. What is the approximate probability that you observe less than or equal to 40 heads?
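A: Sum_{k=0 to 40} C(100,k) * p^k * q^(100-k) = (1/2)^100 * Sum_{k=0 to 40} C(100, k) ≈ 0.028. The normal approximation (mean np = 50, standard deviation √(npq) = 5, with continuity correction) gives P(Z ≤ (40.5 - 50)/5 = -1.9) ≈ 0.029, so about 0.03 either way.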
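This small sketch compares the exact binomial sum with the normal approximation. The function names are illustrative, and the normal CDF is written using math.erf.

```python
from fractions import Fraction
from math import comb, erf, sqrt


def binom_cdf_exact(n: int, k: int) -> Fraction:
    """P(X <= k) for X ~ Binomial(n, 1/2), summed exactly."""
    return Fraction(sum(comb(n, j) for j in range(k + 1)), 2 ** n)


def binom_cdf_normal(n: int, k: int, p: float = 0.5) -> float:
    """Normal approximation with continuity correction: Phi((k + 0.5 - np) / sqrt(npq))."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


print(float(binom_cdf_exact(100, 40)), binom_cdf_normal(100, 40))
```

The same exact sum also covers the 10-flip question below: 1 - binom_cdf_exact(10, 3) = 848/1024 = 53/64 ≈ 82.8%.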

Q: Five cards are drawn without replacement from a standard deck of playing cards. Next, five cards are drawn without replacement from a second standard deck of playing cards. And finally, five cards are drawn without replacement from a third standard deck of playing cards. What is the probability that the Ace of Spades will appear once? Twice? Three times?
A: The chance that the Ace of Spades is chosen when five cards are drawn from one pack is (1C1 * 51C4) / (52C5) = 5/52 = 0.09615
Chance that it is not selected is 1 - 0.09615 = 0.903846

So we have three trials, each with a probability of success of 0.096 and a probability of failure of 0.904. This is a binomial probability problem, and we have:

Probability of one success = 3C1 * 0.096 * 0.904^2 = 0.24
Probability of two successes = 3C2 * 0.096^2 * 0.904 = 0.025
Probability of three successes = 3C3 * 0.096^3 = 0.001

Q: Suppose used car salesmen tell the truth 2/5 of the time, and 1/3 of the trees in a forest are oak. If 4 used car salesmen say that a certain tree in the forest is oak, what is the probability that the tree is indeed oak?
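A: Assume the four salesmen answer independently.
If the tree is oak (prior 1/3), all four tell the truth: (1/3) * C(4,4) * (2/5)^4 = (1/3) * 16/625
If it is not oak (prior 2/3), all four lie: (2/3) * C(4,0) * (3/5)^4 = (2/3) * 81/625

Therefore, P(oak | all four say oak) = 16 / (16 + 2*81) = 16/178 ≈ 0.09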

Q: Bob and Kyle each have $18. They flip a fair coin repeatedly. If the coin comes up tails, Bob pays Kyle $1. If the coin comes up heads, Kyle pays Bob $1. The game is finished when either one of them has no money left. What is the expected number of coin flips this game will last?
A: Track Bob's net winnings n. They start at 0 and change by +1 or -1 on each flip, and the game ends when n reaches -S or +S (here S = 18). Let x[n] be the expected number of remaining flips when Bob's net winnings are n. We are trying to compute x[0]. The boundary values are x[-S] = x[S] = 0, and for -S < n < S each flip adds one step:

x[n] = 1 + (x[n-1] + x[n+1]) / 2, whose solution is x[n] = S^2 - n^2
So, x[0] = 18^2 = 324 coin flips.
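The sketch below solves the recurrence for x[n] exactly, using fractions, instead of relying on the closed form. It marches x[n+1] = 2x[n] - x[n-1] - 2 upward from x[-S] = 0 and picks the unknown starting value x[-S+1] so that x[S] = 0. The function name is illustrative.

```python
from fractions import Fraction


def expected_durations(S: int) -> dict:
    """Solve x[n] = 1 + (x[n-1] + x[n+1]) / 2 for -S < n < S with x[-S] = x[S] = 0."""
    def shoot(a):
        # start from x[-S] = 0, x[-S+1] = a and march the recurrence up to x[S]
        xs = [Fraction(0), Fraction(a)]
        while len(xs) < 2 * S + 1:
            xs.append(2 * xs[-1] - xs[-2] - 2)
        return xs

    end0, end1 = shoot(0)[-1], shoot(1)[-1]
    a = -end0 / (end1 - end0)  # x[S] is linear in a; choose a so that x[S] = 0
    xs = shoot(a)
    return {n: xs[n + S] for n in range(-S, S + 1)}


print(expected_durations(18)[0])  # 324
```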

Q: Sue makes 70% of the free throws she attempts. She shoots three free throws in her warmup before a game. What is the probability that Sue makes two or more of the three free throws?
A: 3C2 * (0.70)^2 * (0.30)^1 + 3C3 * (0.70)^3 = 0.441 + 0.343 = 0.784

Q: There are seven different car keys in a box, including mine. I'm going to randomly remove one key at a time, and then try to start my car. If the key doesn't start the car, I'll discard the key. We know that the probability is one in seven that I'll retrieve my own car key on the first try. But what is the probability that I'll get my key on the second try?
A: It's still 1/7. Getting your key on the second try requires two things:

The first key is not your key AND the second key is. => (1 - 1/7) * (1/6) = (6/7) * (1/6) = 1/7

Q: Law of Large Numbers and the Gambler's Fallacy. Should you get the same total, on average, when you make three throws of three dice each as when you throw nine dice at once?
A: The sum, the average, and the probabilities of getting a particular value on one die or more aren't affected by whether the dice are rolled one at a time, in groups of 3, or all 9 at once. These are called "independent events," and the order in which they happen doesn't affect the outcome.

Q: A car has 2 headlights with an average lifespan of 2500 hours. Each lamp's probability of failure can be modeled by f(t) = (1/u) e^(-t/u), with mean value u = 2500. What is the probability that both lamps will fail within 2500 hours?
A: The probability that a given lamp fails within 2500 hours is 1 - e^(-2500/2500) = 1 - e^(-1) = 0.632 (the same exponential law as radioactive decay). Assuming that the failures are independent events, the probability that BOTH fail within 2500 hours = 0.632^2 = 0.40

Q: What is the minimum number of people we would need to assemble in a group such that the probability that at least one person in the group has the same birthday as you is greater than 50%?
A: Find the minimum n such that (364/365)^n < 0.5, i.e., n > ln 2 / ln(365/364) ≈ 252.7 => 253 people !!!!!
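Here is a quick check of the 253 in Python, both by direct search and from the logarithm inequality. The function names are illustrative. The closed form assumes the ratio is not an exact integer, which holds here.

```python
from math import ceil, log


def min_group_size(target: float = 0.5, days: int = 365) -> int:
    """Smallest n with P(at least one of n people shares your birthday) > target."""
    n = 0
    while 1 - ((days - 1) / days) ** n <= target:
        n += 1
    return n


def min_group_size_closed_form(target: float = 0.5, days: int = 365) -> int:
    """Same answer from n > ln(1 - target) / ln((days - 1) / days)."""
    return ceil(log(1 - target) / log((days - 1) / days))
```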

Q: Having thrown a die 10 times, what is the probability of rolling all six numbers? Order does not matter.
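A: T(10,6)/6^10, where T(10,6) is 10! times the coefficient of x^10 in the expansion of [e^x - 1]^6. Equivalently, by inclusion-exclusion, T(10,6) = Sum_{k=0 to 6} (-1)^k C(6,k) (6-k)^10 = 16,435,440. Plugging in the numbers, the probability is 16,435,440 / 60,466,176 ≈ 0.27 !!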
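The count T(10,6) can also be computed with inclusion-exclusion instead of expanding the generating function. Here is a sketch with illustrative function names:

```python
from fractions import Fraction
from math import comb


def surjections(n: int, k: int) -> int:
    """Number of length-n sequences over k symbols that use every symbol (inclusion-exclusion)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))


def p_all_faces(throws: int, faces: int = 6) -> Fraction:
    """Probability that every face appears at least once in the given number of throws."""
    return Fraction(surjections(throws, faces), faces ** throws)


print(surjections(10, 6), float(p_all_faces(10)))
```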

Q: What is the probability of having a (9-digit) Social Security Number made up of exactly two distinct digits (say, only 1s and 2s)?
A: 10C2 * (2^9 - 2)/ 10^9 = 22950/ 10^9 ~ 1/44000. Note that I needed to subtract 2 from 2^9 because I need to exclude numbers made exclusively using a single digit.

Q: If you flip a coin 10 times, what is the probability of getting at least 4 heads?
A: [10C4 + 10C5 + 10C6 + 10C7+10C8 + 10C9 + 10C10] (1/2)^10 = 1 - [1+10C1 + 10C2 + 10C3]/2^10 = 1 - 176/1024 = 82.8%

Q: In a box there are nine fair coins and one two-headed coin. One coin is chosen at random and tossed twice. Given that heads show both times, what is the probability that the coin is the two-headed one? What if it comes up heads for three tosses in a row?
A: Before performing the experiment, here are the three possible outcomes:

10 coins
  +-- choose fair coin (9/10)
  |     +-- flip two heads (1/4)
  |     +-- flip anything else (3/4)
  +-- choose two-headed coin (1/10)
        +-- flip two heads (1/1)
The top row has probability (9/10)*(1/4) = 9/40
The middle row has probability (9/10)*(3/4) = 27/40
The last one has probability (1/10)*1 = 1/10

After observing two heads, the middle branch is ruled out. In units of 1/40, the two remaining branches have weights 9 and 4, so the posterior probability of the coin being fair is 9/(9+4) = 9/13 and of it being two-headed is 4/13.

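If three tosses in a row come up heads, the fair-coin branch has probability (9/10)*(1/8) = 9/80 and the two-headed branch 1/10 = 8/80, so the above probabilities become 9/(9+8) = 9/17 and 8/17, respectively.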

Q: How can you use a 6-sided Die to Generate a Random Number from 1 to 7 ?
A: Roll the die twice, keeping track of the first and second rolls. Of the 36 ordered outcomes, discard (6,6): if you get (6,6), re-roll twice again until you get something other than (6,6). The remaining 35 outcomes are equally likely, so divide them into 7 groups of 5, one group for each of the 7 numbers.
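Here is a sketch of the rejection scheme described above. The function names are illustrative, and so is the grouping of the 35 kept pairs (consecutive blocks of 5 in row-major order). Any split into 7 groups of 5 works equally well.

```python
import random
from typing import Optional


def pair_to_seven(a: int, b: int) -> Optional[int]:
    """Map two die rolls to 1..7; return None for (6, 6), which must be re-rolled."""
    index = 6 * (a - 1) + (b - 1)  # 0..35, one value per ordered pair
    if index == 35:                # the pair (6, 6)
        return None
    return index // 5 + 1          # 35 kept outcomes -> 7 groups of 5


def roll_one_to_seven(rng: random.Random) -> int:
    """Keep rolling two dice until the pair is not (6, 6)."""
    while True:
        result = pair_to_seven(rng.randint(1, 6), rng.randint(1, 6))
        if result is not None:
            return result
```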

Q: Of the nine members of the board of trustees of a college, five agree with the president on a certain issue. The president selects three trustees at random and asks their opinions. What is the probability that at least two of them will agree with him?
A: (5C3 + 5C2 * 4) / 9C3 = (10 + 4 * 10) / 84 = 50/84 = 25/42 ~ 60%

Q: Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?
A: By Bayes' theorem, P(bowl #1 | plain) = (1/2)(30/40) / [(1/2)(30/40) + (1/2)(20/40)] = 30/(30+20) = 0.6. Note that the probability was 0.5 before Fred picked up the cookie.

December 24th

Copy multiple files from local area to dCache/store

Here is the shell script to do this: cpLocalFilesToStore.csh
#!/bin/csh
# copy files from the local scratch area to dCache /store/user via SRM
setenv DIR 00_09
set srcdir = /uscmst1b_scratch/lpc1/3DayLifetime/kalanand/$DIR
set destdir = "srm://cmssrm.fnal.gov:8443/srm/managerv2?SFN=/11/store/user/kalanand/$DIR"
cd $srcdir/
foreach datafile ( `ls *` )
    echo "submitting: file://localhost/$srcdir/$datafile $destdir/$datafile"
    srmcp -2 -debug=true "file://localhost/$srcdir/$datafile" "$destdir/$datafile"
    echo ""
end


Last modified: Wed Dec 24 21:17:29 CST 2008