Kalanand's November 2013 Log

November 4th

Level-1 Calo trigger upgrade resources

L1T Offline Development twiki
https://github.com/cms-l1t-offline/cmssw/tree/l1t-devel/L1Trigger/L1TCalorimeter
UCT2015 tWiki
uwcms/UCT2015 GitHub
HLT config browser
CMS Technical Design Report for the Level-1 Trigger Upgrade - CERN Document Server
L1Trigger Upgrade Perf Studies twiki
uhalQuickTutorial cactus
Stage 1 Layer 2 Technical Workshop (20 November 2013)
SWGuideL1TYellow twiki
MP7 twiki
SLHC Fall2012 Updates tWiki
L1 Trigger CDR part 1 follow-up meeting on CT upgrade (31 October 2012)
L1TUpgrade Performance TF twiki
FNAL CMS Trigger Upgrade twiki
Cristian Vega twiki
FNAL Trigger Study Group - Home
L1Trigger Upgrade UCT Work twiki
Backup Material
21st USCMS CD1 TEAM Weekly Meeting (10 May 2013)
LPC Trigger Event (7 August 2013)
SWGuide L1TOffline Dev
Review of the Stage 1 Calorimeter Trigger (January 23, 2014)
cactus
/trunk/boards/mp7/base_fw/mp7_690es cactus
CERN e-groups subscription
CMSSW software - Asana
Laura Dodd presentation

November 11th

AWK refresher

Get AWK version installed on my machine

awk --version | head -1

Output:
awk version 20070501

AWK syntax for search

awk '/search pattern1/ {Actions} /search pattern2/ {Actions}' file

Let's start with a simple input file

cat employee.txt

100  Thomas  Manager    Sales       $65,000
200  Jason   Developer  Technology  $65,500
300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500
500  Randy   DBA        Technology  $66,000

Print everything from this input file

awk '{print;}' employee.txt

100  Thomas  Manager    Sales       $65,000
200  Jason   Developer  Technology  $65,500
300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500
500  Randy   DBA        Technology  $66,000

Search for pattern and then print the matching lines


awk '/Thomas/ {print;} /Emily/ {print;}' employee.txt

100  Thomas  Manager    Sales       $65,000
400  Emily   Manager    Marketing   $99,500

Find employees with employee id greater than 200

awk '$1 >200' employee.txt

300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500
500  Randy   DBA        Technology  $66,000

Print list of employees in the Technology department

awk '$4 ~/Technology/' employee.txt

200  Jason   Developer  Technology  $65,500
300  John    Sysadmin   Technology  $77,000
500  Randy   DBA        Technology  $66,000

Print only specific fields

awk '{print $2,$5;}' employee.txt

Thomas $65,000
Jason $65,500
John $77,000
Emily $99,500
Randy $66,000

NF is a built-in variable that holds the total number of fields in a record, so $NF is the last field. The query above can therefore be written as

awk '{print $2,$NF;}' employee.txt

Suppose we have a series of actions, e.g., we want to print a header on the first line followed by the result of the query on subsequent lines, and finally a termination message on the last line. Here is the syntax to accomplish this

awk 'BEGIN {Action} Actions END {Action}'

Print a headline, then specific fields from the input file, and finally an exit message

awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";} {print $2,"\t",$3,"\t",$4,"\t",$NF;} END{print "Report Generated\n--------------";}' employee.txt

Name	Designation	Department	Salary
Thomas 	 Manager 	 Sales 	 $65,000
Jason 	 Developer 	 Technology 	 $65,500
John 	 Sysadmin 	 Technology 	 $77,000
Emily 	 Manager 	 Marketing 	 $99,500
Randy 	 DBA 	 Technology 	 $66,000
Report Generated
--------------

Another example

awk 'BEGIN {count=0;} $4 ~ /Technology/ {count++;} END { print "Number of employees in Technology Dept =",count;}' employee.txt

Number of employees in Technology Dept = 3

AWK built-in variables

FS       :   Input Field Separator 
OFS      :   Output Field Separator
RS       :   Record Separator
ORS      :   Output Record Separator 
NR       :   Number of Records
NF       :   Number of Fields in a record
FILENAME :   Name of the current input file
FNR      :   Number of Records relative to the current input file

Examples

awk -F ':' '{print $1;}' inputfilename
awk 'BEGIN{FS=":";} {print $1;}' inputfilename
awk 'BEGIN{OFS="=";} {print $2,$NF;}' employee.txt
awk 'BEGIN {RS="\n\n";FS="\n";} {print $1,$2;}' employee.txt
awk 'BEGIN{ORS="\n\n";} {print $1,$2;}' employee.txt
awk 'BEGIN{OFS="=";} {print $1,$2;}END {print NR, "records are processed";}' employee.txt
awk '{print NR,"->",NF}' employee.txt
awk '{print FILENAME}' employee.txt
awk '{print FILENAME, FNR;}' employee.txt

AWK built-in functions

Numeric functions:

int(x)  :  the integer part of x, truncated toward zero (like a cast to int in C++). 
sqrt(x) :  positive square root of x
exp(x)  :  exponential of x (e ^ x), reports an error if x is out of range 
log(x)  :  natural logarithm of x, if x is positive; otherwise, reports an error
sin(x)  :  the sine of x, with x in radians 
cos(x)  :  the cosine of x, with x in radians  
atan2(y, x): the arctangent of y / x in radians  
rand()  :  a random number n with 0 <= n < 1. The value can be 0 but is never 1.
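
Since int() truncates toward zero rather than rounding, rounding to the nearest integer needs an explicit step. A small sketch with illustrative values (the int(x + 0.5) idiom assumes x is non-negative):

```shell
# int() drops the fractional part; adding 0.5 first rounds non-negative values
rounded=$(awk 'BEGIN { print int(3.7), int(-3.7), int(3.7 + 0.5) }')
echo "$rounded"
```

This should print 3 -3 4.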

string manipulation:

 
index(in, find) : position of the first occurrence of "find" in string "in"; 0 if not found. 
length(string)  : number of characters in the string
match(string, regexp): position of the first match of regexp in the given string (0 if none); also sets RSTART and RLENGTH 
split(string, array [, fieldsep]): split the string into pieces, store in array; returns the number of pieces 
sprintf(format, expression1,...): same as in C++
sub(regexp, replacement [, target]): change the first occurrence of regex 
gsub(regexp, replacement [, target]): same as sub, but global substitution 
substr(string, start [, length]):  return a substring 
tolower(string): same as in C++
toupper(string): same as in C++
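
A short sketch exercising a few of the string functions listed above. The sample string is arbitrary, and the shell variable name out is only for illustration:

```shell
# substr, toupper, match and gsub applied to one sample string
out=$(awk 'BEGIN {
    s = "I am at Fermilab this week"
    print substr(s, 9, 8)              # 8 characters starting at position 9
    print toupper(substr(s, 9, 8))
    pos = match(s, /lab/)              # match() also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = gsub(/e/, "E", s)              # gsub() returns the number of replacements
    print n, s
}')
echo "$out"
```

Positions in awk strings start at 1, which is why substr(s, 9, 8) picks out "Fermilab" and match() reports 14 for "lab".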

Input/Output and time functions (systime and strftime are gawk extensions, not part of POSIX awk):

close(filename)
system(command)
systime()
strftime([format [, timestamp]])

strftime supports the following date format specifications:

%a  The locale's abbreviated weekday name. 
%A  The locale's full weekday name. 
%b  The locale's abbreviated month name. 
%B  The locale's full month name. 
%c  The locale's "appropriate" date and time representation. 
%d  The day of the month as a decimal number (01--31). 
%H  The hour (24-hour clock) as a decimal number (00--23). 
%I  The hour (12-hour clock) as a decimal number (01--12). 
%j  The day of the year as a decimal number (001--366). 
%m  The month as a decimal number (01--12). 
%M  The minute as a decimal number (00--59). 
%p  AM/PM designations associated with a 12-hour clock. 
%S  The second as a decimal number (00--61). 
%U  The week number of the year (the first Sunday as the first day of week one) as a decimal number (00--53). 
%w  The weekday as a decimal number (0--6). Sunday is day zero. 
%W  The week number of the year (the first Monday as the first day of week one) as a decimal number (00--53). 
%x  The locale's "appropriate" date representation. 
%X  The locale's "appropriate" time representation. 
%y  The year without century as a decimal number (00--99). 
%Y  The year with century as a decimal number (e.g., 1995). 
%Z  The time zone name or abbreviation.
%%  A literal `%'.

Examples:

awk 'BEGIN{print int(3.534); print int(4); print int(-5.223); print int(-5); }'
3
4
-5
-5

awk 'BEGIN{print log(12); print log(0); print log(1); print log(-1); }'
2.48491
-inf
0
nan


awk 'BEGIN{ print sqrt(16); print sqrt(0); print sqrt(-12); }'
4
0
nan


awk 'BEGIN{ print exp(123434346); print exp(0); print exp(-12); }'
inf
1
6.14421e-06


awk 'BEGIN { print sin(3.1415926);  print sin(atan2(0,-1)); print sin(90); }'
5.35898e-08
1.22465e-16
0.893997


awk 'BEGIN { print cos(3.1415926);  print cos(atan2(0,-1)); print cos(90); }'
-1
-1
-0.448074

The following example generates 1000 random numbers between 0 and 100 and shows how often each number occurred. The script file is rand.awk.

awk -f  rand.awk

0 Occured 12 times
1 Occured 7 times
2 Occured 4 times
3 Occured 8 times
4 Occured 7 times
5 Occured 11 times
6 Occured 7 times
7 Occured 12 times
8 Occured 5 times
9 Occured 12 times
10 Occured 13 times
11 Occured 8 times
12 Occured 12 times
13 Occured 10 times
14 Occured 6 times
15 Occured 11 times
16 Occured 14 times
17 Occured 9 times
18 Occured 15 times
19 Occured 12 times
20 Occured 10 times
21 Occured 9 times
22 Occured 13 times
23 Occured 9 times
24 Occured 11 times
25 Occured 8 times
26 Occured 8 times
27 Occured 8 times
28 Occured 11 times
29 Occured 14 times
30 Occured 11 times
31 Occured 7 times
32 Occured 11 times
33 Occured 10 times
34 Occured 11 times
35 Occured 8 times
36 Occured 10 times
37 Occured 5 times
38 Occured 10 times
39 Occured 10 times
40 Occured 9 times
41 Occured 5 times
42 Occured 2 times
43 Occured 12 times
44 Occured 8 times
45 Occured 8 times
46 Occured 10 times
47 Occured 13 times
48 Occured 11 times
49 Occured 11 times
50 Occured 4 times
51 Occured 12 times
52 Occured 13 times
53 Occured 13 times
54 Occured 3 times
55 Occured 9 times
56 Occured 6 times
57 Occured 12 times
58 Occured 11 times
59 Occured 16 times
60 Occured 11 times
61 Occured 11 times
62 Occured 13 times
63 Occured 14 times
64 Occured 15 times
65 Occured 16 times
66 Occured 11 times
67 Occured 11 times
68 Occured 15 times
69 Occured 7 times
70 Occured 8 times
71 Occured 10 times
72 Occured 6 times
73 Occured 10 times
74 Occured 12 times
75 Occured 9 times
76 Occured 13 times
77 Occured 13 times
78 Occured 10 times
79 Occured 8 times
80 Occured 11 times
81 Occured 11 times
82 Occured 10 times
83 Occured 12 times
84 Occured 9 times
85 Occured 6 times
86 Occured 12 times
87 Occured 10 times
88 Occured 17 times
89 Occured 7 times
90 Occured 11 times
91 Occured 12 times
92 Occured 16 times
93 Occured 14 times
94 Occured 11 times
95 Occured 8 times
96 Occured 5 times
97 Occured 4 times
98 Occured 6 times
99 Occured 8 times
100 Occured  times
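
The rand.awk script itself is not reproduced in this log. A minimal sketch of what such a script could look like, written inline here as a hypothetical shell function rand_histogram and assuming the numbers are drawn with int(rand() * 100):

```shell
# Hypothetical stand-in for rand.awk: tally 1000 draws of int(rand() * 100)
rand_histogram() {
    awk 'BEGIN {
        for (i = 0; i < 1000; i++)
            count[int(rand() * 100)]++
        for (n = 0; n <= 100; n++)
            print n, "occurred", count[n] + 0, "times"   # + 0 prints 0 for unseen values
    }'
}
rand_histogram | tail -2
total=$(rand_histogram | awk '{ sum += $3 } END { print sum }')
echo "total draws: $total"
```

Under that assumption the drawn values run from 0 to 99 only, so 100 can never occur. That would be consistent with the missing count on the last line of the output above.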

The following example generates 5 random numbers between 5 and 50 using srand. The script file is srand.awk.

awk -f  srand.awk

7
13
14
49
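
The srand.awk script is not shown in this log either. A possible sketch, assuming integers in the inclusive range 5..50 and a seed taken from the current time (the variable name draws is only for illustration):

```shell
# Hypothetical stand-in for srand.awk: five integers in the range 5..50
draws=$(awk 'BEGIN {
    srand()                                   # seed from the current time
    for (i = 0; i < 5; i++)
        print int(5 + rand() * 46)            # 46 possible values: 5, 6, ..., 50
}')
echo "$draws"
```

Without the srand() call most awk implementations produce the same sequence on every run.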

More examples

awk 'BEGIN { print index("peanut", "an") }'    
3 

awk 'BEGIN { print length("abcde")}'   
5

awk 'BEGIN { print length(15 * 35)}'   
3 # because 15*35 = "525"

awk 'BEGIN { print match("I am at Fermilab this week", /lab/)}'
14

awk 'BEGIN { print split("cul-de-sac", a, "-")}'
3

awk 'BEGIN {str = "daabaaa"; sub(/a*/, "c&c", str); print str;}'
ccdaabaaa


awk 'BEGIN {str = "The candidate came."; sub(/candidate/, "& and his wife", str); print str; }'
The candidate and his wife came.

Count the total number of fields in a file

awk '{ total += NF}; END {print total}' employee.txt

25

Print the even-numbered lines

awk 'NR % 2 == 0' employee.txt

200  Jason   Developer  Technology  $65,500
400  Emily   Manager    Marketing   $99,500

Find the employee with the highest employee ID

awk '$1 > maxid { maxid=$1; maxline=$0 } END { print maxline }' employee.txt

500  Randy   DBA        Technology  $66,000

Examples of conditional statement: if, if else

awk '{ if($1>200) print; }' employee.txt

300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500
500  Randy   DBA        Technology  $66,000

 
awk '{ if($1>200) print $2, $3; else print $3, $4}' employee.txt 

Manager Sales
Developer Technology
John Sysadmin
Emily Manager
Randy DBA


awk '{ if($1<200) print $2; else if ($1<400) print $3, $4; else print $5}' employee.txt 

Thomas
Developer Technology
Sysadmin Technology
$99,500
$66,000

Concatenate every 3 lines of input with a comma.

awk 'ORS=NR%3?",":"\n"' employee.txt

100  Thomas  Manager    Sales       $65,000,200  Jason   Developer  Technology  $65,500,300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500,500  Randy   DBA        Technology  $66,000,
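
The one-liner works because the assignment to ORS is itself the pattern. Its value (a comma or a newline) is a non-empty string and therefore true, so the default action prints the record with the freshly set ORS. A more explicit sketch of the same idea, using made-up input:

```shell
# Print each record followed by "," or, after every third record, a newline
joined=$(printf '%s\n' a b c d e | awk '{ printf "%s%s", $0, (NR % 3 ? "," : "\n") }
END { if (NR % 3) print "" }')
echo "$joined"
```

As in the original, the last group keeps a trailing comma when the number of lines is not a multiple of 3.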

Loops: while, do while, for, break, continue, exit

awk 'BEGIN { while (count++<50) string=string "x"; print string }'

300  John    Sysadmin   Technology  $77,000
400  Emily   Manager    Marketing   $99,500
500  Randy   DBA        Technology  $66,000

awk 'BEGIN{count=1; do print count, "I am bored, printing this 100 times"; while(count++<100)}'

1 I am bored, printing this 100 times
2 I am bored, printing this 100 times
....
100 I am bored, printing this 100 times

awk '{ for (i = 1; i <= NF; i++) total = total+$i; print i,NF,total }; END { print total }' employee.txt

6 5 100
6 5 300
6 5 600
6 5 1000
6 5 1500
1500

Cool stuff: Print the fields in reverse order on every line

awk 'BEGIN{ORS="";}{ for (i=NF; i>0; i--) print $i," "; print "\n"; }' employee.txt

$65,000  Sales  Manager  Thomas  100  
$65,500  Technology  Developer  Jason  200  
$77,000  Technology  Sysadmin  John  300  
$99,500  Marketing  Manager  Emily  400  
$66,000  Technology  DBA  Randy  500

More examples

awk 'BEGIN{ x=1; while(1) {print "Break after 10 iterations"; if ( x==10 ) break; x++;} }'

Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations
Break after 10 iterations

awk 'BEGIN{ x=1; while(x<=20) { if(x>5 && x<=15){ x++; continue;} print "Value of x",x;x++;} }'

Value of x 1
Value of x 2
Value of x 3
Value of x 4
Value of x 5
Value of x 16
Value of x 17
Value of x 18
Value of x 19
Value of x 20

awk 'BEGIN{ x=1; while(x<=10) {if(x==5){exit;} print "Value of x",x;x++;} }'

Value of x 1
Value of x 2
Value of x 3
Value of x 4

Use of arrays

Let's start with a file which has duplicate lines

cat duplicates.txt

foo
bar
foo
baz
bar

Remove duplicates

awk '!($0 in array) { array[$0]; print }' duplicates.txt
 
foo
bar
baz

Reverse the order of lines in the above file

awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }' duplicates.txt

bar
baz
foo
bar
foo

List all words and their frequency

awk 'BEGIN {print "Word\tCount";} {word[$0]++;} END{ for (var in word) print var,"\t",word[var]; }' duplicates.txt

Word	Count
baz 	 1
foo 	 2
bar 	 2

Generate tables in an html file. The script file generateHtml.awk can be found here.

awk -f generateHtml.awk employee.txt >> table.html
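
The generateHtml.awk script is not reproduced here. A minimal sketch of how an awk program can wrap each record in a table row, assuming one cell per field and no escaping of HTML special characters:

```shell
# Hypothetical stand-in for generateHtml.awk: one <tr> per line, one <td> per field
html=$(printf '%s\n' '100 Thomas Manager Sales $65,000' '200 Jason Developer Technology $65,500' |
awk 'BEGIN { print "<table>" }
{
    printf "<tr>"
    for (i = 1; i <= NF; i++)
        printf "<td>%s</td>", $i
    print "</tr>"
}
END { print "</table>" }')
echo "$html"
```

The BEGIN and END blocks supply the opening and closing tags, and the main rule runs once per input line.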

User defined function

Here are some simple examples of UDF

awk '
# Function to obtain a random non-negative integer less than n
function randint(n) { return int(n * rand()) }

# Function to roll a simulated die.
function roll(n) { return 1 + int(rand() * n) }

# Roll 3 six-sided dice and
# print total number of points.
# (BEGIN block, so no input file is needed)
BEGIN {
      printf("%d points\n",
             roll(6)+roll(6)+roll(6))
}'

AWK: online resources

November 12th

Sed refresher

The syntax to substitute a string on each line is: sed 's/regex/replacement/'.

echo "Hello" | sed 's/Hell/Heaven/'

Heaveno

Using & as the matched string

echo "123 abc" | sed 's/[0-9]*/& &/'

123 123 abc


echo "Kalanand Mishra" | sed 's/Kal*/(&)/'

(Kal)anand Mishra

echo kalanand mishra | sed 's/[^ ]*/(&)/'

(kalanand) mishra

Using \1, \2, ... to keep part of the pattern
A pair of escaped parentheses, \( and \), remembers the substring matched by the part of the regular expression they enclose. "\1" is the first remembered pattern, "\2" is the second, etc. Sed has up to nine remembered patterns.

echo abcd123 | sed 's/\([a-z]*\).*/\1/'

abcd

echo kalanand mishra | sed 's/\([a-z]*\).*/\1/'

kalanand

Similarly, we can switch the two words around

echo kalanand mishra | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'

mishra kalanand

Removing duplicate words

echo kalanand kalanand mishra | sed 's/\([a-z]*\) \1/\1/'

kalanand mishra

Use "/g" option for global replacement

echo kalanand mishra | sed 's/[^ ]*/(&)/g'

(kalanand) (mishra)

Keep the first occurrence of the word but delete the second:

echo Mishra, Mishra Kalanand | sed 's/[a-zA-Z]* //2'

Mishra, Kalanand


echo Mishra, Mishra Kalanand | sed 's/[a-zA-Z]* /DELETED /2'

Mishra, DELETED Kalanand


echo Mishra, Mishra Kalanand | sed 's/[a-zA-Z]* //2' | sed 's/[a-zA-Z]*, //1'

Kalanand

Multiple commands with the -e option

echo Kalanand Mishra | sed -e 's/a/A/' -e 's/h/H/'

KAlanand MisHra

Filenames on the command line
Any argument to sed that is not an option or a script is treated as an input filename. This next example counts the lines in two files that don't begin with a "#" (blank lines are not counted either):

sed 's/^#.*//' employee.txt duplicates.txt | grep -v '^$' | wc -l

10

The "-n" option suppresses the automatic printing of each line, so nothing is printed unless a "p" flag or command asks for it

echo kalanand mishra | sed -n 's/[^ ]*/(&)/g'

# Nothing is printed
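
To get output back while "-n" is in effect, add the p flag to the substitution or use the p command. A short sketch (the variable names printed and middle are only for illustration):

```shell
# -n suppresses the automatic print; the p flag prints lines where a substitution happened
printed=$(echo kalanand mishra | sed -n 's/[^ ]*/(&)/gp')
echo "$printed"
# With -n, the p command can also select a range of lines
middle=$(printf 'one\ntwo\nthree\nfour\n' | sed -n '2,3p')
echo "$middle"
```

The first command should print (kalanand) (mishra), and the second only the lines two and three.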

Execute sed commands from a script file

sed -f sedscript <filename>

where sedscript could look like this:

# This script is called 'sedscript'
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g

Example:

echo kalanand mishra | sed -f sedscript

kAlAnAnd mIshrA

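Passing arguments into a sed script
Close the single quotes around the shell variable so that the shell expands it before sed sees the script. Inside a shell script the variable would typically be a positional parameter such as $1; below, a shell variable named word stands in for it.

word=kalanand; echo kalanand mishra | sed 's/'"$word"'/(&)/'

(kalanand) mishra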

Restricting to a line number or range

echo "123\n112233\n112222233333" | sed '3 s/[0-9][0-9]*//'

123
112233
#empty line


echo "123\n112233\n112222233333" | sed '2,3 s/[0-9][0-9]*//'

123
#empty line
#empty line

The "$" is one of those conventions that mean "last". So the above query could be written as

echo "123\n112233\n112222233333" | sed '2,$ s/[0-9][0-9]*//'
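
The echo "...\n..." form used in these examples relies on echo interpreting backslash escapes, which depends on the shell. A sketch of the same "2,$" address with printf, using made-up input so the effect on each line stays visible:

```shell
# printf is used instead of echo so that \n is handled the same way in every shell
stripped=$(printf 'a1\nb22\nc333\n' | sed '2,$ s/[0-9][0-9]*//')
echo "$stripped"
```

Line 1 is outside the address range and keeps its digit, while lines 2 through the last lose theirs, giving a1, b and c.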

Patterns
Many UNIX utilities like vi and more use a slash to search for a regular expression. Sed uses the same convention, provided you terminate the expression with a slash. To delete the first number on all lines that start with a "#," use: sed '/^#/ s/[0-9][0-9]*//'

echo "#0123\n#3456\n789" | sed '/^#/ s/[0-9][0-9]*//'

#
#
789

Transform with y
To change the letters "a" through "f" into their upper case form, use:

echo kalanand mishra | sed 'y/abcdef/ABCDEF/'

kAlAnAnD mishrA

More practical examples

Eliminate Comments Using sed

sed 's/#.*//' file

echo "I am going\n#home\nto dinner" | sed 's/#.*//'

I am going

to dinner

Eliminate Comments and Empty Lines Using sed

sed 's/#.*//;/^$/d'  file 

echo "I am going\n#home\nto dinner\n\nback to work again" | sed 's/#.*//;/^$/d'

I am going
to dinner
back to work again

Eliminate HTML Tags from file Using sed

sed 's/<[^>]*>//g' index.html

Delete Last X Number of Characters From Each Line (X = 3 in this example)

echo "kalanand\nmishra" | sed 's/...$//'

kalan
mis

Substitute Only When the Line Matches with the Pattern

# If the line contains "-", everything from the "-" to the end of the line is replaced with the empty string.
echo "This is\n-Kalanand\nMishra" | sed '/\-/s/\-.*//g'

This is

Mishra

Changing the PATH

sed 's|myfolder/mysubfolder/myfile|myotherfolder/myotherfile|' config
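
The command above uses "|" instead of "/" as the delimiter of the s command, which avoids escaping every slash in a path. A sketch with made-up directory names, showing both forms side by side:

```shell
# Alternative delimiter: no escaping needed for the slashes in the path
newpath=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')
echo "$newpath"
# Same substitution with the default delimiter needs every slash escaped
samepath=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')
echo "$samepath"
```

Both commands should print /opt/bin.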

Double-space a file

sed G file

Double-space a file which already has blank lines in it. Output file should contain no more than one blank line between lines of text.

sed '/^$/d;G' file

Triple space a file

sed 'G;G'

Undo double-spacing (assumes even-numbered lines are always blank)

sed 'n;d'

Insert a blank line above every line which matches "regex"

sed '/regex/{x;p;x;}'

Insert a blank line below every line which matches "regex"

sed '/regex/G'

Insert a blank line above and below every line which matches "regex"

sed '/regex/{x;p;x;G;}'

Number each line of a file (simple left alignment). Using a tab instead of space will preserve margins.

sed = filename | sed 'N;s/\n/\t/'

Count lines (emulates "wc -l")

sed -n '$='

Substitute (find and replace) "foo" with "bar" on each line

sed 's/foo/bar/'              # replaces only 1st instance in a line
sed 's/foo/bar/4'             # replaces only 4th instance in a line
sed 's/foo/bar/g'             # replaces ALL instances in a line
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replace the next-to-last case
sed 's/\(.*\)foo/\1bar/'            # replace only the last case

Substitute "foo" with "bar" EXCEPT for lines which contain "baz"

sed '/baz/!s/foo/bar/g'

Substitute "foo" with "bar" ONLY for lines which contain "baz"

sed '/baz/s/foo/bar/g'

Change "scarlet" or "ruby" or "puce" to "red"

sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'

Join pairs of lines side-by-side (like "paste")

sed '$!N;s/\n/ /'

If a line ends with a backslash, append the next line to it

sed -e :a -e '/\\$/N; s/\\\n//; ta'

Delete every 8th line

sed 'n;n;n;n;n;n;n;d;'

Delete ALL blank lines from a file (same as "grep '.' ")

sed '/^$/d'                           # method 1
sed '/./!d'                           # method 2

A use case: extract version number from string (only version, without other numbers). For example, I have: Chromium 12.0.742.112 Ubuntu 11.04. I want: 12.0.742.112 instead of: 12.0.742.11211.04.

sed 's/[^0-9.]*\([0-9.]*\).*/\1/'
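
Applied to the sample string from the use case, the expression can be checked like this (the variable name version is only for illustration):

```shell
# [^0-9.]* skips the leading non-numeric text, \([0-9.]*\) captures the first
# run of digits and dots, and .* swallows the rest of the line
version=$(echo 'Chromium 12.0.742.112 Ubuntu 11.04' | sed 's/[^0-9.]*\([0-9.]*\).*/\1/')
echo "$version"
```

Only the first run of digits and dots is kept, so the later "11.04" is discarded and the result is 12.0.742.112.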

Sed: online resources

November 13th

Other GNU text utilities: sort

This utility sorts the lines within a file or files. A variety of options exist to allow sorting on fields or character positions within the file, and to modify the comparison operation (numeric, date, case-insensitive, etc).

sort --help

This prints the following usage summary (GNU sort):

Usage: sort [OPTION]... [FILE]...
Write sorted concatenation of all FILE(s) to standard output.

Mandatory arguments to long options are mandatory for short options too.
Ordering options:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < `JAN' < ... < `DEC'
  -n, --numeric-sort          compare according to string numerical value
  -r, --reverse               reverse the result of comparisons

Other options:

  -c, --check               check whether input is sorted; do not sort
  -k, --key=POS1[,POS2]     start a key at POS1, end it at POS2 (origin 1)
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                              multiple options specify multiple directories
  -u, --unique              with -c, check for strict ordering;
                              without -c, output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

POS is F[.C][OPTS], where F is the field number and C the character position
in the field.  OPTS is one or more single-letter ordering options, which
override global ordering options for that key.  If no key is given, use the
entire line as the key.

SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

With no FILE, or when FILE is -, read standard input.

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

Simple sort

echo "John\nJack\nZeck\nJake\nJoseph\nJoe" | sort

Jack
Jake
Joe
John
Joseph
Zeck

Reverse sort (-r option):

echo "John\nJack\nZeck\nJake\nJoseph\nJoe" | sort -r

Zeck
Joseph
John
Joe
Jake
Jack

Other useful options:
-f ignore case
-s stable sort

Sorting numbers

echo "5\n4\n12\n1\n3\n56" | sort

1
12
3
4
5
56

So, the simple sort of numbers is not what we really want. But we can use "-n" option to get the expected result.

echo "5\n4\n12\n1\n3\n56" | sort -n

1
3
4
5
12
56

And, if our lines happen to have some leading blanks, we can easily ignore those and still sort correctly (using the -b flag):

echo "A\na\nb\n     B\n    C\n  E\n D\n C\n" | sort

     B
    C
  E
 C
 D
A
a
b

echo "A\na\nb\n     B\n    C\n  E\n D\n C\n" | sort -b

A
     B
    C
 C
 D
  E
a
b

We may want to use unique (-u) option to remove duplicates from sort result

echo "A\na\nb\n     B\n    C\n  E\n D\n C\n" | sort -b -u

A
     B
    C
 D
  E
a
b

Sorting by column number in a multi-column file

ls -1l | sort -k5

drwx------+  2 kalanand  staff   68 Feb  4  2008 Mail
drwxr-xr-x+  2 kalanand  staff   68 Feb  4  2008 nt_files
drwxr-xr-x+  2 kalanand  staff   68 Feb  4  2008 private
drwxr-xr-x+ 18 kalanand  staff  612 Nov 25 13:27 public_html

Note: The column separator is, by default, any blank character. We can change it using the "-t" option (e.g., -t:). We can also sort on a range of columns:

ls -1l | sort -k5,9 -n
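
A small sketch combining "-t" and "-k" on made-up colon-separated data. The key is restricted to field 2 alone, once compared as text and once numerically:

```shell
# Field 2 compared as text: "1" < "10" < "2"
as_text=$(printf 'b:2\na:10\nc:1\n' | sort -t: -k2,2)
echo "$as_text"
# Field 2 compared numerically (n appended to the key): 1 < 2 < 10
as_number=$(printf 'b:2\na:10\nc:1\n' | sort -t: -k2,2n)
echo "$as_number"
```

Writing -k2,2 rather than -k2 ends the key at field 2. Without the end position the key runs to the end of the line.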

Other GNU text utilities: uniq

The utility uniq removes adjacent lines which are identical to each other--or if some switches are used, close enough to count as identical (you may skip fields, character positions, or compare as case-insensitive). The most typical use of uniq is in the expression sort list_of_things | uniq , producing a list with just one of each item (one per line).

Examples:
Without uniq

echo "John\nJack\nZeck\nJake\nJoseph\nJoe\nJake" | sort

Jack
Jake
Jake
Joe
John
Joseph
Zeck

With uniq

echo "John\nJack\nZeck\nJake\nJoseph\nJoe\nJake" | sort | uniq

Jack
Jake
Joe
John
Joseph
Zeck

Identify duplicates (-d)

echo "John\nJack\nZeck\nJake\nJoseph\nJoe\nJake" | sort | uniq -d

Jake

Count occurrences (-c)

echo "John\nJack\nZeck\nJake\nJoseph\nJoe\nJake" | sort | uniq -c

   1 Jack
   2 Jake
   1 Joe
   1 John
   1 Joseph
   1 Zeck

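Skip fields in comparisons (-f). With -f 1 the first field is ignored and only the rest of the line is compared; every number below is different, so no line is dropped.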

echo "John 100\nJack 551\nZeck 185\nJake 411\nJoseph 56\nJoe 21\nJake 29" | sort | uniq -f 1

Jack 551
Jake 29
Jake 411
Joe 21
John 100
Joseph 56
Zeck 185
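
Because uniq only compares adjacent lines, it misses duplicates in unsorted input. A quick sketch with made-up names:

```shell
# Without sort, only the two adjacent "Jake" lines at the end are merged
unsorted=$(printf 'Jake\nJoe\nJake\nJake\n' | uniq)
echo "$unsorted"
# After sort, all identical lines are adjacent and collapse into one
sorted=$(printf 'Jake\nJoe\nJake\nJake\n' | sort | uniq)
echo "$sorted"
```

The first pipeline still prints Jake twice (Jake, Joe, Jake), while the second prints each name once.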

Split and join large files using command-line

To split a large file into smaller ones:

split [OPTION] [INPUT [PREFIX]]

Description: Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT is -, read standard input.

Options:

       -a, --suffix-length=N
	      use suffixes of length N (default 2)

       -b, --bytes=SIZE
	      put SIZE bytes per output file

       -C, --line-bytes=SIZE
	      put at most SIZE bytes of lines per output file

       -d, --numeric-suffixes
	      use numeric suffixes instead of alphabetic

       -l, --lines=NUMBER
	      put NUMBER lines per output file

Use case: Let's say I have to send a personal video file of size 100MB to my friends through Gmail but Gmail has a maximum file upload limit of size 20MB. But I can split it into 5 smaller files of size 20MB and then upload the small files.

split -b 20m <Largefilename> <Smallfilename>

where 20m is the size of each output file in MB; to split in KB, use k instead of m. The split files have names <Smallfilename>x, where x = "aa", "ab", "ac", etc.

To merge several small files into a single large file:
We can join the part files using the following command

cat Smallfilename* >  Largefilename
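
A round-trip sketch of split followed by cat. It splits by lines (-l) rather than by bytes to keep the example tiny, and all file names and the temporary directory are made up for illustration:

```shell
# Create a 10-line file, split it into pieces of 4 lines, then join the pieces again
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/large.txt"
split -l 4 "$workdir/large.txt" "$workdir/small_"
pieces=$(ls "$workdir"/small_* | wc -l | tr -d ' ')
cat "$workdir"/small_* > "$workdir/rejoined.txt"
# cmp -s is silent and succeeds only if the two files are byte-for-byte identical
if cmp -s "$workdir/large.txt" "$workdir/rejoined.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "$pieces pieces, round trip: $roundtrip"
rm -rf "$workdir"
```

The shell expands small_* in alphabetical order (small_aa, small_ab, ...), which is the order in which split wrote the pieces, so cat reassembles them correctly.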

Some real life commandline examples

wc -w *.txt | sort -nr | head -4

find . -name '*.txt' -print | xargs grep '[A-Z][A-Z]' 

awk '{if($1>300) print $1,$2,$3,$4}' employee.txt | sort -n | uniq | head -5

awk '{print $1,$2,$3,$4}' employee.txt  | sort -u | wc -l

find . -name employee.txt | xargs grep '^[1-5]00 '  | awk '{print $1, " ", $2, " ", $3, " ", $4, " ", $5}' | sort -u | wc -l

Last modified: Wed Mar 26 11:51:06 CST 2014