
Awk Introduction Tutorial – 7 Awk Print Examples

Linux Awk Tutorial - Introduction and Awk Examples

This is the first article in our new awk tutorial series. We’ll be posting several articles on awk in the upcoming weeks that will cover all features of awk with practical examples.

In this article, let us review the fundamental awk working methodology along with 7 practical awk print examples.

Note: Make sure you review our earlier Sed Tutorial Series.

Awk Introduction and Printing Operations

Awk is a programming language that makes it easy to manipulate structured data and generate formatted reports. Awk is named after its authors: Aho, Weinberger, and Kernighan.

Awk is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions.

Some of the key features of Awk are:

  • Awk views a text file as records and fields.
  • Like most programming languages, Awk has variables, conditionals and loops.
  • Awk has arithmetic and string operators.
  • Awk can generate formatted reports

Awk reads from a file or from its standard input, and outputs to its standard output. Awk does not get along with non-text files.
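As a small sketch of the standard-input case: the same kind of program works when data is piped in and no file name is given. The two-line input here is made up purely for illustration.

```shell
# Hypothetical input piped to awk on standard input (no file argument).
from_stdin=$(printf 'alpha 1\nbeta 2\n' | awk '{ print $1 }')

# Show what awk wrote to its standard output.
printf '%s\n' "$from_stdin"
```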

Syntax:

awk '/search pattern1/ {Actions}
     /search pattern2/ {Actions}' file

In the above awk syntax:

  • search pattern is a regular expression.
  • Actions – statement(s) to be performed.
  • several patterns and actions are possible in Awk.
  • file – Input file.
  • Single quotes around the program prevent the shell from interpreting any of its special characters.

Awk Working Methodology

  1. Awk reads the input files one line at a time.
  2. For each line, it tests the given patterns in order; when a pattern matches, it performs the corresponding action.
  3. If no pattern matches, no action is performed.
  4. In the above syntax, either the search pattern or the action is optional, but not both.
  5. If the search pattern is not given, Awk performs the given actions for every line of the input.
  6. If the action is not given, Awk prints every line that matches the given pattern, which is the default action.
  7. Empty braces without any action do nothing. They do not perform the default printing operation.
  8. Statements within an action are separated by semicolons or newlines.
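The rules above can be tried out with a short sketch. The three-line input and the variable names are hypothetical and exist only for this illustration.

```shell
# Hypothetical three-line input, used only for this sketch.
input='apple 1
banana 2
cherry 3'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/an/')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Pattern with empty braces: lines match, but nothing is printed.
empty_action=$(printf '%s\n' "$input" | awk '/an/ {}')

# Two statements in one action, separated by a semicolon.
two_statements=$(printf '%s\n' "$input" | awk '/cherry/ { n = $2 * 2; print $1, n }')

printf 'pattern only:   %s\n' "$pattern_only"
printf 'action only:    %s\n' "$(printf '%s' "$action_only" | tr '\n' ' ')"
printf 'empty action:   [%s]\n' "$empty_action"
printf 'two statements: %s\n' "$two_statements"
```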

Let us create an employee.txt file with the following content, which is used in the
examples below.

$ cat employee.txt
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000

Awk Example 1. Default behavior of Awk

With no pattern and a bare print action, Awk prints every line from the file.

$ awk '{print;}' employee.txt
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000

In the above example no pattern is given, so the action applies to all the lines.
The print action without any argument prints the whole line by default, so it prints every
line of the file. Actions have to be enclosed within braces.

Awk Example 2. Print the lines which match the pattern.

$ awk '/Thomas/
> /Nisha/' employee.txt
100  Thomas  Manager    Sales       $5,000
400  Nisha   Manager    Marketing   $9,500

The above example prints all the lines that match either ‘Thomas’ or ‘Nisha’. It has two patterns. Awk accepts any number of patterns, but each set (a pattern and its corresponding action) has to be separated from the next by a newline or a semicolon.
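As a hedged sketch of two equivalent ways to write this, the block below recreates the sample file and compares separate rules with a single regular expression that uses alternation. It also shows one difference worth knowing: with separate rules, a line that matches more than one rule is printed once per rule.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# Two rules on one line, separated by a semicolon instead of a newline.
two_rules=$(awk '/Thomas/; /Nisha/' employee.txt)

# One rule whose regular expression matches either name.
one_regex=$(awk '/Thomas|Nisha/' employee.txt)

# The Thomas line matches both rules here, so it is printed twice.
printed_twice=$(awk '/Manager/; /Sales/' employee.txt)

printf '%s\n' "$two_rules"
printf '%s\n' "$printed_twice" | awk 'END { print NR, "lines from the overlapping rules" }'
```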

Awk Example 3. Print only specific field.

Awk has a number of built-in variables. For each record (i.e. line), it splits the record on whitespace by default and stores the pieces in the $n variables. If the line has 4 words, they are stored in $1, $2, $3 and $4. $0 represents the whole line. NF is a built-in variable that holds the total number of fields in a record.

$ awk '{print $2,$5;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000

$ awk '{print $2,$NF;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000

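In the above example $2 and $5 represent Name and Salary respectively. We can also get the Salary using $NF, where $NF represents the last field. In the print statement, ‘,’ separates the items to print; Awk writes them with the output field separator (OFS, a single space by default) between them.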
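A minimal sketch of how print treats its arguments, using the first record of the sample file. NR (the current record number) and OFS (the output field separator) are standard Awk built-ins that this article has not introduced; the shell variable names are only for illustration.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# NR == 1 restricts each program to the first record.

# Items separated by a comma are printed with OFS (a space) between them.
with_comma=$(awk 'NR == 1 { print $2, $5 }' employee.txt)

# Items written side by side are concatenated with nothing in between.
side_by_side=$(awk 'NR == 1 { print $2 $5 }' employee.txt)

# Changing OFS changes what the comma produces.
custom_ofs=$(awk 'BEGIN { OFS = " -> " } NR == 1 { print $2, $5 }' employee.txt)

# NF holds the number of fields in the record.
field_count=$(awk 'NR == 1 { print NF }' employee.txt)

printf '%s\n' "$with_comma" "$side_by_side" "$custom_ofs" "$field_count"
```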

Awk Example 4. Initialization and Final Action

Awk has two important patterns, specified by the keywords BEGIN and END.

Syntax:

BEGIN { Actions }
{ ACTION } # Action for every line in a file
END { Actions }

# is for comments in Awk

Actions specified in the BEGIN section are executed before Awk starts reading lines from the input.
END actions are performed after Awk has finished reading and processing the lines from the input.

$ awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";}
> {print $2,"\t",$3,"\t",$4,"\t",$NF;}
> END{print "Report Generated\n--------------";
> }' employee.txt
Name	Designation	Department	Salary
Thomas 	 Manager 	 Sales 	 $5,000
Jason 	 Developer 	 Technology 	 $5,500
Sanjay 	 Sysadmin 	 Technology 	 $7,000
Nisha 	 Manager 	 Marketing 	 $9,500
Randy 	 DBA 	 Technology 	 $6,000
Report Generated
--------------

The above example prints a header line and a final line for the report.
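Tab-separated columns drift out of alignment when values differ in length. As an alternative sketch (not the approach used in the example above), Awk's printf statement can pad each column to a fixed width. The widths 8, 12 and 11 are assumptions chosen to fit this sample data.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# %-8s left-justifies a string in a column 8 characters wide, and so on.
report=$(awk 'BEGIN { fmt = "%-8s %-12s %-11s %s\n"
                      printf fmt, "Name", "Designation", "Department", "Salary" }
              { printf fmt, $2, $3, $4, $NF }
              END { print "Report Generated" }' employee.txt)

printf '%s\n' "$report"
```

Unlike print, printf does not add a newline on its own, which is why the format string ends in \n.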

Awk Example 5. Find the employees who have an employee id greater than 200

$ awk '$1 >200' employee.txt
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000

In the above example, the first field ($1) is the employee id. If $1 is greater than 200, Awk performs the default print action and prints the whole line.
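Comparison patterns can also be combined with && (and) and || (or). The following is a small sketch on the same sample data; the particular conditions are made up for illustration.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# Both conditions must hold: ids strictly between 200 and 500.
id_range=$(awk '$1 > 200 && $1 < 500 { print $1, $2 }' employee.txt)

# Either condition may hold: id 100, or the Marketing department.
either=$(awk '$1 == 100 || $4 == "Marketing" { print $2 }' employee.txt)

printf '%s\n' "$id_range" "$either"
```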

Awk Example 6. Print the list of employees in Technology department

The department name is the fourth field, so we need to check whether $4 matches the string “Technology” and, if it does, print the line.

$ awk '$4 ~/Technology/' employee.txt
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
500  Randy   DBA        Technology  $6,000

The ~ operator compares a field against a regular expression. If it matches, the default action (print the whole line) is performed.
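To make the difference between ~ and an equality test concrete, here is a short sketch. The operators ==, ~ and !~ are standard Awk; the shell variable names are only for illustration.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# ~ succeeds when the regular expression matches anywhere in the field.
regex_partial=$(awk '$4 ~ /Tech/ { print $2 }' employee.txt)

# == compares the whole field, so a partial string does not match.
exact_partial=$(awk '$4 == "Tech" { print $2 }' employee.txt)

# == with the full department name matches the same three lines as the regex.
exact_full=$(awk '$4 == "Technology" { print $2 }' employee.txt)

# !~ selects the lines where the regular expression does NOT match.
not_tech=$(awk '$4 !~ /Technology/ { print $2 }' employee.txt)

printf '%s\n' "$regex_partial" "[$exact_partial]" "$not_tech"
```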

Awk Example 7. Print number of employees in Technology department

The example below checks whether the department is Technology and, if so, increments the count variable in the action. The variable is initialized to zero in the BEGIN section.

$ awk 'BEGIN { count=0;}
$4 ~ /Technology/ { count++; }
END { print "Number of employees in Technology Dept =",count;}' employee.txt
Number of employees in Technology Dept = 3

Then at the end of the process, just print the value of count which gives you the number of employees in Technology department.
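A side note in sketch form: Awk variables start out empty (and count as zero in arithmetic), so the BEGIN initialization is optional. Adding 0 in the END block is a common idiom that makes the result print as 0 rather than as an empty string when nothing matched. The Finance department below is a made-up value that does not appear in the sample file.

```shell
# Recreate the sample file from this article so the block runs on its own.
cat > employee.txt <<'EOF'
100  Thomas  Manager    Sales       $5,000
200  Jason   Developer  Technology  $5,500
300  Sanjay  Sysadmin   Technology  $7,000
400  Nisha   Manager    Marketing   $9,500
500  Randy   DBA        Technology  $6,000
EOF

# Same count as above, without a BEGIN block.
tech_count=$(awk '$4 ~ /Technology/ { count++ }
END { print "Number of employees in Technology Dept =", count + 0 }' employee.txt)

# No line matches, so count is never set; count + 0 still prints 0.
finance_count=$(awk '$4 ~ /Finance/ { count++ } END { print count + 0 }' employee.txt)

printf '%s\n' "$tech_count" "$finance_count"
```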



{ 108 comments }

  • Steve Mills January 6, 2010, 3:38 am

    I have only just started reading these articles. So far I think they are well written and the explanations are clearly done with an awareness as to how they might possibly be misunderstood and hence extra layers of detail are presented where that might happen. For example, pointing out that the tilde (~) is used to compare with regular expressions when the reader might have otherwise expected an equals sign – without the explanation the reader might have decided that the tilde represented the same thing as an equals sign.

    I shall be reading more.

    Thanks for posting these articles.

    Kind Regards
    Steve

  • Daniel Reimann January 6, 2010, 6:02 am

    Thank you for the post here on awk. I use it frequently, but it is always good to have some updates and reminders. Happy New Year.

  • Lawrence January 7, 2010, 4:34 am

    awk is awesome! thanks for your sharing.

    Best Regards,
    Lawrence

  • Knusper January 9, 2010, 5:15 pm

    Hi… Good article – now I know what awk is, and what I could use it for – well written…. I follow you now on twitter!

  • Harsh January 10, 2010, 10:08 pm

    Thanks for posting a tutorial on awk with illustrated examples.
    I Will be expecting other articles on awk :)

  • Ramesh Natarajan January 14, 2010, 9:23 pm

    @Steve,

    Yeah. ~ can be a little confusing in this context if it is not explained properly. Thanks for your comment.

    @Daniel,

    Yeah. Most other readers of the blog are in a similar position to yours. So, we are here to give constant updates and reminders of the functionality that they already know.

    @Lawrence, Harsh,

    Thanks for the very nice comment. I’m glad you liked this article.

    @Knusper,

    Thanks for following us on twitter.

  • thalafan March 21, 2010, 10:24 am

    Nandraka Ulladhu!!!

    I guess example 2 can be done without a new line, like below? Pattern as regex.

    ~/bin$ awk '/Jason|Randy/' employee
    200 Jason Developer Technology $5,500
    500 Randy DBA Technology $6,000

    And also, what does the ; stand for? End of pattern?

  • Andreia Amaral April 7, 2010, 5:14 am

    Hi,

    I want to use an if else statement like this:
    if $10>10 print $0 > filename1
    else print $0> filename2

    but it’s not working it is not creating filename1 or filename2, how can I do this?
    thanks?

  • Ashu Agrawal August 6, 2010, 10:31 am

    Grt post.Thanks for making me understand the awk working

  • avinash October 1, 2010, 7:30 am

    hi, this is avinash….
    suppose u have a emp table like this:
    id name designation salary
    1 avi manager 1000
    2 ash manager 1500
    3 nash manager 2000
    4 varma trainee 500
    5 chow trainee 600
    6 hemanth trainee 800

    using awk command, i hav to print manager total salary and trainee total salary….
    i need a program….. can any one plz post it

  • vikas October 13, 2010, 1:34 am

    Hi…..@Avinash…..
    u can try this one…….
    awk 'BEGIN {man_sal=0;trainee_sal=0;}
    $3 ~/manager/ {man_sal+=$NF}
    $3 ~/trainee/ {trainee_sal+=$NF}
    END {print "Total salaries of managers and trainees are =",man_sal,trainee_sal}' in_file.name

  • siva October 15, 2010, 1:44 am

    Hello forum members,

    Thanks for AWK tutorials ,it was very help ful to me.

  • avinash October 19, 2010, 5:12 am

    @ vikas
    thanks you

  • wish October 21, 2010, 3:36 am

    hi all,
    if i have a issue file like:

    101 add open vish iuo

    if exit and login again i should get the increment of the first field like

    102 add open vish iuo

  • mounika October 27, 2010, 10:40 pm

    its simply superb to understand

    its is very useful for the beginning learners and its is very help in exams time also

    so guys read and enjoy wd the unix

  • Lynda Zhang November 17, 2010, 12:36 pm

    This is very help. How about if I want to print the salary seperated by commas, e.g. 2,000 instead of 2000

  • Ikem December 30, 2010, 8:22 pm

    You’ve made a little typo:

    > Number of employees in _Tehcnology_ Dept = 3

  • sudha February 2, 2011, 12:38 pm

    vary vary healpul to every one

  • eben March 27, 2011, 11:28 pm

    Its very useful for beginers like me…………

  • kalandar April 7, 2011, 8:58 am

    Hi,
    I found this article to be very useful. Anybody who wants to know what an awk is , this will give a fair idea. Looking forward to similar articles on other topics of unix.

    Thanks :)

  • Bhagyaraj April 24, 2011, 1:38 am

    Hi,

    Good,
    Please try teach in Youtube to try differntly.
    It will be more success.

    Keep it up,
    I need to take an exam on awk, let me see how much I can successed.

  • kernelkid June 10, 2011, 6:48 am

    very simple and easy to understand, thanks a lot, it help me a lot

  • liju June 14, 2011, 3:05 am

    good simple article :)

  • Marija June 30, 2011, 9:03 am

    I have read few geekstuff articles until now, explanations provided are the best I have ever seen so far! Great job :) Thanks a lot :)

  • Muslim July 19, 2011, 12:02 pm

    hi,

    i have the question that how to using print command “awk” to sort or transpose this data from many coloums to 2 columns only
    #input file
    NAME JAN FEB MARCH APRIL MAY JUNE JULY
    ——- —– —— ———- ——– —— ——- ——
    BEN 5,000 6,000 7,000 8,000 6,500 7,500 9,000
    YONG 4,000 5,500 6,000 5,800 7,000 8,000 8,5000

    # output should be as below.
    BEN 5,000
    BEN 6,000
    BEN 7,000
    BEN 8,000
    BEN 6,500
    BEN 7,500
    BEN 9,000
    YONG 4,000
    YONG 5,500
    YONG 6,000
    YONG 5,800
    YONG 7,000
    YONG 8,000
    YONG 8,5000
    Anyone can help.thanks
    @muss

  • nails carmody August 9, 2011, 1:08 pm

    I know it’s late, but …

    #!/bin/bash

    awk ' {
    # skip the first two lines (header and separator)
    if (NR == 1 || NR == 2)
    next

    for(i=2; i<=NF; i++)
    printf("%s %s\n", $1, $i)
    } ' datafile.txt

    Nice site! I learned some new things about sed I didn't know.

  • Sudhanshu August 29, 2011, 1:00 am

    The article is simply awesome!!!

  • Kalim January 25, 2012, 9:03 am

    Very gud information for beginners. Thanks

  • Len Richards February 15, 2012, 3:46 pm

    We have duplicate titles in our library database. I am able to find the duplicates but can’t figure out how to print the line above that matches the regular expression and that line also has the same expression. Each set of duplicates has a space between: example. I would match on SERSOL-EBK

    2012 Joe ate my dog|SERSOL-EBK|122456
    2012 Joe ate my dog|SERSOL-EBK|122459

    2011 Joe ate my toaster|SERSOL-EBK|122433
    2011 Joe ate my toaster|SERSOL-EBK|125567

  • Neha March 1, 2012, 12:37 am

    hi..
    i want to capture these data through awk.

    596583.46875(E) 4924298.34375(N)
    geology@PERMANENT in PERMANENT (9)sand

    604960.78125(E) 4922837.53125(N)
    geology@PERMANENT in PERMANENT (6)shale

    596911.40625(E) 4920512.15625(N)
    geology@PERMANENT in PERMANENT (4)sandstone

    the output should be :
    insert into mytable values(596583.46875,4924298.34375,geology@PERMANENT,shale);
    insert into mytable values(604960.78125,4922837.53125,geology@PERMANENT,shale);

    any help would be grateful.

    sorry i need to put shale within single codes so that i can insert into my table.insert into mytable values(596583.46875,4924298.34375,geology@PERMANENT,’shale’);

    Thank u

  • subhojit777 June 30, 2012, 12:55 am

    Great article. Thanks you :)

  • Mimi July 6, 2012, 2:20 pm

    Thank you for this site – you really saved me a lot of time. This is easy to follow and really helped me understand awk.

  • sudarshan July 17, 2012, 7:38 am

    Hi All,

    I have a log file with the contents as below.

    09:30:51 [00082] TIMER iatesystems.hub.logging.TimerLog: MPI_MxmRunMatch:
    09:30:52 [00082] TIMER iatesystems.hub.logging.TimerLog: MPI_MxmRunMatch:
    09:30:53 [00082] TIMER iatesystems.hub.logging.TimerLog: MPI_MxmRunMatch:

    This will have entities based on the timestamp shown above (i.e. 09:30:51, 09:30:52, etc).

    Now my requirement is, I need to get the number of entries per an hour.
    For example if the first record is with timestamp as 09:30:51.
    Till 10:30:51 timestamp’s record I need to get the number of records.

    Can you please help me to get this?

    Thanks,
    Sudarshan

  • Steven August 13, 2012, 8:54 am

    Neha try this
    suppose your data filename=log.txt
    insert statements be put in insert.sql
    ————————————————————-
    awk 'BEGIN {str="insert into mytable values(";}
    {if ((NR % 3)==1)
    { # coordinate line: keep the numbers, drop the trailing (E) and (N)
    instr=str""substr($1,1,12)","substr($2,1,13);
    }
    else if ((NR % 3)==2)
    { # geology line: $4 looks like (9)sand, keep only the rock name
    rock=$4; sub(/^\([0-9]+\)/,"",rock);
    # \047 is the octal escape for a single quote
    instr=instr","$1",\047"rock"\047);";
    print instr >"insert.sql";
    }
    # every third line is blank, so there is nothing to do for it
    }' log.txt

  • Prabhu September 11, 2012, 5:29 am

    Comma delimited file in below format

    Name, Address,Start_Date,End_date

    But for few records there is extra comma in the Address field
    As
    Mark, 110A , Stuart Mark, 01/01/2012,06/07/2012

    Please help me as how to remove this extra comma, Out of 1 million record am having this issue in 10000 records

  • nails September 11, 2012, 1:36 pm

    Prabhu:
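    If the number of fields is greater than 4, assume there is an extra comma that splits field 2 into field 2 and field 3. Then merge fields 2 and 3 back into field 2 and shift the remaining fields left:

    awk ' BEGIN { FS=OFS="," }
    {
    if(NF > 4) {
    $2=$2 " " $3 # merge the two halves of the address
    for(i=3; i<NF; i++) $i=$(i+1) # shift the later fields left
    NF-- # drop the duplicated last field (works in gawk and mawk)
    }
    print $0
    } ' datafile.txt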

  • Prabhu September 18, 2012, 3:38 am

    Thanks Nalis, Appreciate your help

  • Kiran Indrala November 7, 2012, 11:33 am

    It is really very helpfull for beginers.i really appreciate you for sharing this info.

    Thanks,
    Kiran Indrala

  • R.S.Wadhwa December 5, 2012, 4:46 am

    Hello,
    I have the following output:
    [Server, Modul, Version, timestamp]
    Serversv10 ;admin ;1.0.2 ;2012.12.03-14:07:39
    Serversv10 ;admin ;0.1.2 ;2012.12.03-14:07:39
    Serversv10 ;admin ;0.0.1 ;2012.12.04-09:29:45
    Serversv10 ;admin ;0.0.12 ;2012.12.03-10:23:31
    Serversv10 ;admin ;0.0.2 ;2012.12.03-14:07:39
    Serversv10 ;admin ;1.0.1 ;2012.12.04-09:29:45
    Serversv10 ;admin ;0.0.8 ;2012.12.04-09:34:39
    Serversv10 ;project ;0.0.1 ;2012.12.03-14:08:37
    Serversv11 ;admin ;0.0.6 ;2012.12.03-10:23:31
    Serversv10 ;project ;0.0.1 ;2012.12.04-08:22:09
    Serversv11 ;admin ;0.0.6 ;2012.12.03-13:58:58
    Serversv11 ;admin ;0.0.6 ;2012.12.03-11:39:15
    Serversv11 ;admin ;0.0.7 ;2012.12.03-14:07:39
    Serversv11 ;basis ;0.0.1 ;2012.12.03-11:39:59
    Serversv11 ;project ;0.0.1 ;2012.12.03-12:08:48
    Serversv11 ;project ;0.0.1 ;2012.12.03-14:08:37
    Serversv12 ;admin ;0.0.6 ;2012.12.03-10:23:31
    Serversv12 ;admin ;0.0.6 ;2012.12.03-11:47:35
    Serversv11 ;project ;0.0.1 ;2012.12.03-12:08:31
    Serversv12 ;admin ;0.0.7 ;2012.12.03-14:07:39
    Serversv12 ;basis ;0.0.1 ;2012.12.03-12:06:45
    Serversv12 ;config ;0.0.1 ;2012.12.03-12:04:57
    Serversv12 ;admin ;0.0.6 ;2012.12.03-13:58:58

    and want to sort it as follows
    [Server, Modul, Version, timestamp]
    Serversv10 ;admin ;1.0.2 ;2012.12.03-14:07:39
    Serversv10 ;admin ;0.1.2 ;2012.12.03-14:07:39
    Serversv10 ;admin ;0.0.12 ;2012.12.03-10:23:31
    Serversv10 ;admin ;0.0.8 ;2012.12.04-09:34:39
    Serversv10 ;admin ;0.0.2 ;2012.12.03-14:07:39
    Serversv10 ;project ;1.0.1 ;2012.12.03-14:08:37
    Serversv10 ;project ;0.0.1 ;2012.12.03-14:08:37
    the rest with the same order
    Can you please help?

  • nails December 5, 2012, 11:28 am

    R.S.

    Sorry, but I do not see a consistent sort order for your output. Please clarify exactly what you want.

    Anyway, the Unix/Linux sort command might do what you want. Assuming the field separator is a semicolon, this command sorts by the first field, then the second, and then the fourth (the time stamp):

    sort -t ";" -k 1,1 -k 2,2 -k 4,4 file.txt

  • Amit December 6, 2012, 6:23 am

    I am having files in below format

    2012-04-30 00:00:05,266 1335692570491, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 12486685 / 2012-04-30 00:00:05.196
    2012-04-30 00:00:05,313 1335692570492, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 12486685 / 2012-04-30 00:00:05.260
    2012-04-30 00:00:12,740 1335692570493, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 83022172 / 2012-04-30 00:00:12.687
    2012-04-30 00:00:12,868 1335692570494, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 83022172 / 2012-04-30 00:00:12.822
    2012-04-30 23:59:15,450 1335692590664, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1437954504 / 2012-04-30 23:59:15.404
    2012-04-30 23:59:15,645 1335692590665, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 200178220 / 2012-04-30 23:59:15.600
    2012-04-30 23:59:17,177 1335692590666, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1437954504 / 2012-04-30 23:59:17.128
    2012-04-30 23:59:18,574 1335692590667, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 200178220 / 2012-04-30 23:59:18.513
    2012-04-30 23:59:21,322 1335692590668, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 200178220 / 2012-04-30 23:59:21.274
    2012-04-30 23:59:34,467 1335692590669, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 228289416 / 2012-04-30 23:59:34.410
    2012-04-30 23:59:34,493 1335692590670, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 228289416 / 2012-04-30 23:59:34.434
    2012-04-30 23:59:40,094 1335692590671, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1195793760 / 2012-04-30 23:59:40.054
    2012-04-30 23:59:40,168 1335692590672, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1195793760 / 2012-04-30 23:59:40.127
    2012-04-30 23:59:40,560 1335692590673, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CHECK_REQ / 1195793760 / 2012-04-30 23:59:40.523
    2012-04-30 23:59:41,289 1335692590674, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1195793760 / 2012-04-30 23:59:41.011
    2012-04-30 23:59:46,895 1335692590675, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1174256481 / 2012-04-30 23:59:46.827
    2012-04-30 23:59:46,914 1335692590676, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1174256481 / 2012-04-30 23:59:46.841
    2012-04-30 23:59:56,228 1335692590677, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_LIST_REQ / 298656228 / 2012-04-30 23:59:56.167
    2012-04-30 23:59:58,399 1335692590678, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_LIST_REQ / 1181525888 / 2012-04-30 23:59:58.318
    2012-04-30 23:59:58,499 1335692590679, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 298656228 / 2012-04-30 23:59:58.436
    2012-04-30 23:59:58,661 1335692590681, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 298656228 / 2012-04-30 23:59:58.605
    2012-04-30 23:59:58,663 1335692590680, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 298656228 / 2012-04-30 23:59:58.600
    2012-04-30 23:59:58,706 1335692590682, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 298656228 / 2012-04-30 23:59:58.644
    2012-04-30 23:59:58,971 1335692590683, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CHECK_REQ / 298656228 / 2012-04-30 23:59:58.902
    2012-04-30 23:59:59,058 1335692590684, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_LIST_REQ / 1220865349 / 2012-04-30 23:59:59.010
    2012-04-30 23:59:59,794 1335692590685, Request received from client :: Transaction / PRS_ID / Timestamp :: Amit_CARD_DATA_REQ / 1220865349 / 2012-04-30 23:59:59.741

    now i want the output in below format

    2012-04-30 | 2012-04-30 23:59:59 | 1335692590685 | Amit_CARD_DATA_REQ | 1220865349

    1st field is extracted from the timestamp filed 2012-04-30 23:59:59


    In the above the adjacent row will be present as a single row….While copying it in the website its getting split into 2 rows

    Please help me on this

    thanks,
    Amit

  • Badri December 9, 2012, 2:31 am

    Nice and simple article.
    example 5 indicates print, angular brackets and ; are optional.. is that correct?

  • Kokanee January 7, 2013, 12:52 pm

    Best AWK introduction I’ve come across. Thanks!!

    >K<

  • priya February 17, 2013, 10:30 am

    i want awk program to display employee details

  • Linda March 18, 2013, 9:37 pm

    Hi All,

    I have a question on how to read n+ line in a file and substitute the data in a template file.Below is the example

    **NAME**

    ANNE 80

    **Template.txt**

    $NAME’s mark is $MARK.

    I managed to read the data, substitute into the template and append to RESULT as below.

    NAME=`awk '{print$1}' NAME` ; MARK=`awk '{print$2}' NAME`

    sed -e "s|NAME|$NAME|" -e "s|MARK|$MARK|" Template.txt >> RESULT

    The RESULT will be

    ANNE’s mark is 80.

    But when the are few rows of data, i dont know how to read the second line and the rest. I have 300 rows of data.

    **NAME**

    ANNE 80

    SHAWN 30

    NINA 50

    Can anyone help me since im newbiew in scripting.

    Thanks in advance :)

  • Eeshani April 18, 2013, 11:26 pm

    Hi all,

    I have a doubt
    I want to use a variable in the matching of regular expression field
    i.e i am reading a string from the user and I want to match it in awk against the whole file

    read var;
    cat int_out.txt | awk /var instead of reg exp/ { print $1};

    Can any1 help me regarding this

    thanks

  • Nails Carmody April 19, 2013, 11:30 am

    There are a number of ways of embedding a shell variable in an awk script. This link describes them:

    http://tek-tips.com/faqs.cfm?fid=1281

    Post a more descriptive example if you require more help. Thanks.

    Nails.

  • arun k May 20, 2013, 4:16 pm

    way of explanation is really nice …. thanks a lot

  • sisila May 21, 2013, 12:03 am

    thank you!

  • Soumya June 7, 2013, 11:12 pm

    Worth reading and understandable, thanks a lot

  • mantu July 17, 2013, 5:55 am

    i want to add zero in between character . how i ll do dat ?

  • Aravind July 22, 2013, 6:51 am

    How do I read a ( ` ) separated file using the awk command?? Can I read each column by column of the (`) delimited file??

  • Nails Carmody July 23, 2013, 1:49 pm

    Aravind:

    Does your data file look like this (back tic delimited):

    one`two`three
    four`five`six

    awk has an internal field separator variable (FS) which lets you change the field delimiter:

    #!/bin/bash

    awk ' BEGIN { FS="`" }
    {
    print $2 # print the second field
    } ' datafile.txt

  • Anshul July 24, 2013, 8:52 am

    Hi,

    below are the 2 lines from my log file:-
    2013-05-14 12:40:06,524 (http-172.24.90.20-5555-17) [ LoginLogger.java:55 :INFO ] [Login ID:superadmin] [User ID:SU001] [Network ID:NG] [User Name:superadmin] [User Type:OPERATOR] [Category Code:SUADM] [Log Type:LOGIN] [Login Time:14/05/13 12:40:06] [Logout Time: ] [IP Address:172.16.100.11] [Browser Type:Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C)] [Other Information:Successfully Login]

    2013-05-14 13:04:43,797 (ContainerBackgroundProcessor[StandardEngine[Catalina]]) [ LoginLogger.java:55 :INFO ] [Login ID:netadm] [User ID:PT130306.1715.012115] [Network ID:NG] [User Name:Network Admin] [User Type:OPERATOR] [Category Code:NWADM] [Log Type:LOGOUT] [Login Time:14/05/13 11:26:03] [Logout Time:14/05/13 13:04:43] [IP Address:172.24.180.35] [Browser Type:Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3)] [Other Information:Logged Out successfuly]\

    All lines are not same in my log file, Now my requirement is that i want only 5 fields from each line, and they are not on same place in all lines.

    I want following fields in csv file Login ID,User Name,Log Type, Login Time, IP Address.

    Thanks for your help.

  • Nails Carmody July 25, 2013, 12:30 am

    Anshul:

    Since the fields are not in order, I use a for loop to look at each field, match each of the 5 fields you are interested in, and then print out the 5 variables for each line processed:

    awk ' BEGIN { FS="[" ; myarr[1]=0}
    {
    loginid=""
    username=""
    logtype=""
    logintime=""
    ipaddress=""
    for(i=1; i<=NF; i++)
    {
    if($i ~ /Login ID/)
    {
    sub("Login ID:","",$i) # delete the field identifier
    sub("]","",$i) # delete the ]
    sub(/[ \t]+$/, "", $i) # delete trailing whitespace
    loginid=$i
    }

    if($i ~ /User Name/)
    {
    sub("User Name:","",$i) # delete the field identifier
    sub("]","",$i) # delete the ]
    sub(/[ \t]+$/, "", $i) # delete trailing whitespace
    username=$i
    }

    if($i ~ /Log Type/)
    {
    sub("Log Type:","",$i) # delete the field identifier
    sub("]","",$i) # delete the ]
    sub(/[ \t]+$/, "", $i) # delete trailing whitespace
    logtype=$i
    }

    if($i ~ /Login Time/)
    {
    sub("Login Time:","",$i)
    sub("]","",$i) # delete the ]
    sub(/[ \t]+$/, "", $i) # delete trailing whitespace
    logintime=$i
    }

    if($i ~ /IP Address/)
    {
    sub("IP Address:","",$i) # delete the field identifier
    sub("]","",$i) # delete the ]
    sub(/[ \t]+$/, "", $i) # delete trailing whitespace
    ipaddress=$i
    }
    }

    printf("%s,%s,%s,%s,%s\n", loginid, username, logtype, logintime, ipaddress)

    } ' logfile.txt

  • Anshul July 25, 2013, 3:33 am

    Nails:-

    Thanks a lot for your wonderful scripts, it worked in one go. I am very oblidged for your kind help.

  • Nails Carmody July 25, 2013, 4:23 pm

    Anshul:

    Thank you for the kind words. Much appreciated!

    BTW, let me point out that by mistake I left an unused array declaration in the script:

    myarr[1]=0

    That can be deleted.

  • maria August 12, 2013, 11:28 pm

    I have a log file and would like to count the number of the same value of first column,
    for example count = 4 for -8.7, count = 4 for -7.8 … I am able to grep -v manually ; however, I would like to have the output as

    cnt value …

    4 -8.7
    4 -7.8
    1 -6.9
    1 -2.5
    2 -1.4

    =========

    input file

    -8.7 check
    -8.7 check
    -8.7 check
    -8.7 check
    -7.8 check
    -7.8 check
    -7.8 check
    -7.8 check
    -6.9 check
    -2.5 check
    -1.4 check
    -1.4 check

  • maria August 12, 2013, 11:53 pm

    I would like to have the two line table from

    BEN $5,000
    BEN $6,000
    BEN $7,000
    BEN $8,000
    BEN $6,500
    BEN $7,500
    BEN $9,000
    YONG $4,000
    YONG $5,500
    YONG $6,000
    YONG $5,800
    YONG $7,000
    YONG $8,000
    YONG $8,5000

    to

    BEN 5,000 6,000 7,000 8,000 6,500 7,500 9,000
    YONG 4,000 5,500 6,000 5,800 7,000 8,000 8,5000

    Thanks as I am learning awk now

  • maria August 13, 2013, 1:37 am

    I am able to have the counting script

    BEGIN { print "cnt ","value" }
    {
    Slack[$1]++;
    }
    END {
    for (var in Slack) {
    print Slack[var]," ", var
    }
    }

    and the output file looks like

    cnt value
    1
    2 1.0
    3 7.8
    4 6.9
    1 1.1
    4 0.2
    2 0.4
    4 6.0
    2 -0.0
    1 -6.9
    4 -7.8
    4 -8.7
    2 7.3
    4 6.4
    2 -0.5
    2 -1.4
    10 -0.7
    1 -2.5
    ==========

    my questions:
    1) why I have the first line of “1” and no value? how to remove it?
    2) how to do the print-out with the sorted column, for example, sort 1st column, sort 2nd column

    Thx

  • nails August 16, 2013, 3:13 pm

    First, you are probably getting the first line printed as "1" because the data file contains a line that is empty or holds only whitespace (spaces or tabs). Skip such lines by putting something like this at the top of the main action block of your awk script:

    if (/^[ \t]*$/)
    next

    Second, GNU awk (gawk) has a built-in sort function called asort. See the GNU awk User's Guide:

    http://www.gnu.org/software/gawk/manual/gawk.html
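
    As a hedged sketch of both points: the pipeline below skips blank lines, tallies the first column, and hands the result to the external sort command, so it does not depend on gawk's asort. The sample rows are made up for illustration and only mimic the posted log.

    ```shell
    # Illustrative input only: a few rows shaped like the posted log, plus one blank line.
    result=$(printf '%s\n' '-8.7 check' '-8.7 check' '' '-7.8 check' '-1.4 check' '-1.4 check' |
        awk '
            /^[ \t]*$/ { next }        # ignore empty or whitespace-only lines
            { count[$1]++ }            # tally each distinct first-column value
            END { for (v in count) print count[v], v }
        ' |
        sort -k2,2n)                   # order numerically by the value column
    echo "$result"
    ```

    To order by the count column instead, sort -k1,1n would be the analogous call.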

  • Oche J Ejembi August 30, 2013, 1:17 pm

    Thanks for this. BEST AWK tutorial on the World Wide Web.

    It seemed like an assembly language to me before I read this and after reading other tutorials, but yours has explained it in simple language and with easy examples to follow. Thanks!

  • Deiveegaraja Andaver September 19, 2013, 4:43 am

    Thanks for the Great article..
    Here after I will tell to my friends, I know AWK utility.. Thanks again for the great article..

  • Raji October 2, 2013, 8:27 am

    its very useful for our assignments thanks…………:-)

  • akhel October 10, 2013, 4:36 am

    thank you its very helpful me

  • Sujay October 26, 2013, 12:54 pm

    Hi all,
    I was doing some string manipulation in my script and wanted to try using awk. However, I have been stuck with string compare. A simplified form of my conundrum is given below:

    This works:
    $ echo '"status":"Completed"' | awk 'BEGIN{FS=":"} /"status"/ {print $2}'
    "Completed" <– prints the correct word

    This doesn't:
    $ echo '"status":"Completed"' | awk 'BEGIN{FS=":"} /"status"/ {print ($2 == "Completed") ? "Yes" : "No"}'
    No <– Shouldn't this output be Yes given $2 is "Completed"?

    In the second one above, I'd have expected to see the output as "Yes" (since the second element is "Completed"), but it's been obstinately printing "No".

    What am I doing wrong?

    Btw, what I really need to do is set a variable if the string compare is successful. The print-test was just to see if it would work at all, and it looks like I need to rethink.

    Any help is highly appreciated.

  • Ravi November 20, 2013, 3:49 pm

    this is my script

    #!/bin/bash
    KEEP=30 # days
    SOURCE=/apps/tibco/domain/xxxxxxxx/xxxxxxxxx
    DST=/apps/tibco/domain/xxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxx

    for item in `find $SOURCE -type f -mtime +$KEEP | sed 's#.*/##"; do
    DDATE=`stat --format="%y" "$SOURCE/$item" | awk '{print $1}'
    if [ -z "$DDATE" ]; then
    continue
    fi

    # Make directory if needed
    if [ -d "$DST/$DDATE" ]; then
    mkdir "$DST/$DDATE"
    fi

    mv "$SOURCE/$item" "$DST/$DDATE/$item"
    gzip "$DST/$DDATE/$item"
    done
    =========================================
    Errors
    ./archive.sh: line 9: syntax error near unexpected token `|'
    ./archive.sh: line 9: ` DDATE=`stat --format="%y" "$SOURCE/$item" | awk '{print $1}'`'

    I have only one ` at the end of the line.

  • nails November 21, 2013, 5:21 pm

    I see several problems with your script:

    First, your command substitution isn't correct. I chose to use the bash method of command substitution, $(...), instead of `...`:

    for item in $(find $SOURCE -type f -mtime +$KEEP | sed 's#.*/##')
    do

    Second, I think you are mixing the quotation marks around the sed command.

    Also, are you sure that your syntax for the stat command is correct? I thought the format keyword is used with the -c option.

  • shilpa December 31, 2013, 6:36 am

    Write an AWK command to list all the files except the one which is having name as “sample”

    Write an AWK command to concatenate owner and group name of all the files in a directory.

  • Philip Warner January 3, 2014, 5:20 am

    Try these programs, then type nawk -f circle (or gawk -f circle).
    Create file circle:
    BEGIN { # circle nawk -f circle
    do { loopCount++
    system( "nawk -f s.circle" )
    printf("Another one? ")
    getline
    if ( $1 ~ /[Yy]/ ) { continue }
    exit
    } while ( loopCount )
    }
    END { # Display loop count
    printf("Ran circle %d times.\n", loopCount)
    }
    Create file s.circle:
    BEGIN { # s.circle
    do { printf("Type the radius ? ")
    getline
    r = $1 + 0
    if ( r > 0 ) { exit }
    printf("Radius must be positive\n")
    } while ( 1 )
    }
    #
    # Main
    #
    END { #
    tau = 710 / 113
    circumference = c = tau * r
    printf("The circumference is %f and the area is %f\n", c, c * r / 2 )
    }

  • Aashish Joshi January 7, 2014, 2:40 am

    Hi,

    Good to learn.

  • Aashish Joshi January 7, 2014, 2:45 am

    Can you please provide the same kind of explanation for sed also?

  • kk Menon January 8, 2014, 10:09 pm

    i need to concatenate 1st 4 characters of previous line to the next few lines.
    How should I achieve this?

    Thanks

  • Philip Warner January 10, 2014, 4:03 am

    Item 66, the close round bracket is in the wrong place, it should be:
    nawk '/status/ {print ($2 == "Completed" ? "Yes" : "No" ) }'
    and this works (produces “Yes”).
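
    One more thing that may be worth checking, assuming the input really is the quoted string from the original question: with FS=":" the second field keeps its literal double quotes, so it is "Completed" including the quote characters. A minimal sketch that compares against the quoted form, with the whole conditional inside the print parentheses:

    ```shell
    # The field still carries its double quotes, so the comparison string includes them too.
    result=$(echo '"status":"Completed"' |
        awk 'BEGIN { FS = ":" } /"status"/ { print (($2 == "\"Completed\"") ? "Yes" : "No") }')
    echo "$result"
    ```

    Stripping the quotes from $2 first (for example with gsub) and comparing against the bare word would be an equally reasonable choice.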

  • cgs January 17, 2014, 10:47 am

    how do i convert the following

    mira 23 43 56 abcd
    mira 32 55 abcd xyz

    raju 12 4a 5/7 4:4 abcd
    raju 23 4r 3:4 ab
    ——————————————————–
    to

    Mira 23 43 56 abcd 32 55 abcd xyz

    Raju 12 4a 5/7 4:4 abcd 23 4r 3:4 ab

    ————————————————————-

    awk and print $1 can capture columns, but what about lines with same “name” in first columns to be grouped with all values in columns.

  • nails January 19, 2014, 2:06 am

    How about something like this? It’s a break point report that wipes out the first field until the first field changes:

    #!/bin/bash

    awk ' BEGIN { prevrec="" ; fr=0 }
    {
    if(NF == 0)
    next # skip blank lines

    if(prevrec == "" || prevrec != $1)
    {
    prevrec=$1
    if(fr == 0)
    {
    printf "\n%s", $0
    fr=0
    }
    else
    {
    $1="" # wipe out the first field
    printf " %s", $0
    fr=1
    }
    }
    else
    {
    $1=""
    printf "%s", $0
    }

    } END { printf "\n" } ' mytxt.txt
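
    A different way to get the same grouping, offered only as a sketch: collect the remainder of each line in an array keyed by the first field, and print the groups in first-seen order at the end. Unlike a break-point report, this does not need lines with the same name to be adjacent. The array names rows and order are my own, and the input is the sample from the question (the capitalization of the names in the requested output is not handled).

    ```shell
    result=$(printf '%s\n' \
        'mira 23 43 56 abcd' \
        'mira 32 55 abcd xyz' \
        '' \
        'raju 12 4a 5/7 4:4 abcd' \
        'raju 23 4r 3:4 ab' |
        awk '
            NF == 0 { next }                          # skip blank lines
            {
                key = $1
                if (!(key in rows)) order[++n] = key  # remember first-seen order of names
                rest = $0
                sub(/^[^ \t]+[ \t]+/, "", rest)       # drop the name and the blanks after it
                rows[key] = rows[key] " " rest        # append the remaining columns
            }
            END { for (i = 1; i <= n; i++) print order[i] rows[order[i]] }
        ')
    echo "$result"
    ```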

  • Nikhil February 18, 2014, 3:46 am

    Thanks for the post, it helped me to give a start.
    Can you provide me some of the basic questions, i can give a try..

    Thanks in Advance,
    Nikhil

  • kalyan chakravarthi G February 25, 2014, 1:37 am

    Thanks for your info. I grew up in Unix by reading your posts. Nice explanation. Good work.

  • Chenna March 6, 2014, 1:29 am

    Hi,
    This output :
    c1t50060E80166CED71d334s2
    c1t50060E80166CED71d335s2
    c1t50060E80166CED71d336s2

    I just want to remove the last "s2" from every line.
    Like this:
    c1t50060E80166CED71d334
    c1t50060E80166CED71d335
    c1t50060E80166CED71d336

  • nails March 6, 2014, 3:00 pm

    # Using sed:

    sed 's/s2$//' test.txt

    # this works on my Solaris 9 box using nawk:
    awk ' { sub("s2$", ""); print } ' test.txt

  • Chenna March 7, 2014, 11:34 pm

    thanks a lot. its working fine

  • nishi April 16, 2014, 5:15 am

    Hi

    I have following file content and array of ids :

    20140320 00:08:23.846 INFO [WebContainer : 84] – anything in line
    20140320 00:08:23.846 Test [translate : 55] – Virtual and lab lab anything
    20140320 00:08:23.846 Data [anything : 60] – anything in line
    20140320 00:08:23.847 anyting [anything : 5] – anything in line
    20140320 00:08:23.846 INFO [WebContainer : 84] – anything in line
    20140320 00:08:23.846 Test [translate : 55] – Virtual and lab lab anything
    20140320 00:08:23.846 Data [anything : 60] – anything in line
    20140320 00:08:23.847 anyting [anything : 5] – anything in line

    Suppose the array holds the elements 84 and 55; then only the lines that have one of those values inside the brackets should be output:

    20140320 00:08:23.846 INFO [WebContainer : 84] – anything in line
    20140320 00:08:23.846 Test [translate : 55] – Virtual and lab lab anything
    20140320 00:08:23.846 INFO [WebContainer : 84] – anything in line
    20140320 00:08:23.846 Test [translate : 55] – Virtual and lab lab anything

    not sure how many elements are in array, but based on all the elements, lines should direct to output file

    Please help me on this

  • nails April 16, 2014, 10:48 am

    One way is to read each line and use cut to save everything between the brackets. Then use the set command to perform the parsing. The last field can be determined with the eval command, and finally perform the comparison:

    #!/bin/bash

    while read line
    do
    orgline="$line"
    tt=$(echo $line|cut -d'[' -f2 | cut -d']' -f1)
    set -- $( echo "$tt")
    fvalue=$(eval echo "\$$#")
    if [[ $fvalue -eq 55 || $fvalue = 84 ]]
    then
    echo "$orgline"
    fi
    fi

    done < datafile.txt

  • Nishi April 17, 2014, 11:02 pm

    Thanks nails,

    I wanted to implement this using awk, as there are more than 10000 lines and this loop takes a lot of time.

    here no of threadid is also not fixed, I just gave two entry only it may be more.

    Please suggest me

  • nails April 18, 2014, 10:36 am

    First, depending on the programming implementation, don't expect awk to be faster. As awk is an external command, an extra process is spawned.

    I don’t understand:
    “no of threadid is also not fixed”

    Please post a better example of your data.

    nails

  • nishi April 22, 2014, 10:03 pm

    thanks Nails,

    The thread IDs come from another file as an array, such as (55, 84) with two elements. So by "no of threadid" I mean that the number of elements is not fixed.
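
    For a variable-length list of IDs kept in a separate file, one possible awk approach is the two-file idiom: load the IDs into an array while reading the first file, then test each log line against it. This is only a sketch; the file names ids.txt and app.log, the one-ID-per-line layout, and the array name want are assumptions, not something given in the thread.

    ```shell
    tmp=$(mktemp -d)
    # Hypothetical files: ids.txt holds one thread ID per line, app.log is the log.
    printf '%s\n' 84 55 > "$tmp/ids.txt"
    printf '%s\n' \
        '20140320 00:08:23.846 INFO [WebContainer : 84] - anything in line' \
        '20140320 00:08:23.846 Test [translate : 55] - Virtual and lab lab anything' \
        '20140320 00:08:23.846 Data [anything : 60] - anything in line' \
        '20140320 00:08:23.847 anyting [anything : 5] - anything in line' > "$tmp/app.log"

    result=$(awk '
        NR == FNR { want[$1]; next }       # first file: remember each wanted ID
        match($0, /: *[0-9]+\]/) {         # log line: locate the ID just before the ]
            id = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", id)         # strip everything but the digits
            if (id in want) print
        }
    ' "$tmp/ids.txt" "$tmp/app.log")
    echo "$result"
    rm -rf "$tmp"
    ```

    The whole log is handled by a single awk process, whatever the number of IDs. Note that the NR == FNR test assumes the ID file is not empty.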

  • krishna April 25, 2014, 1:00 am

    hi ,
    I have two files as below
    file 1
    machine1 shutdown time 21:30:00
    machine2 shutdown time 21:32:07
    file 2:
    machine1 started time 22:31:05
    machine2 started time 21:30:25

    now , i need a program, if shutdown time is less than started then print output as
    “machine1/machine2 running as expected”
    if shutdown time is greater than started time then print output as
    “machine1/machine2 improper restart ”

    thanks for help !

  • Tanujib Patra May 8, 2014, 11:28 pm

    Hi Help,
    I have an INPUT file shown below, I want to convert into the following output file as stated under the OUTPUT FILE.
    Can anybody help me in awk scripts.

    INPUT FILE
    640 0.0 1480 1313.77 1500.09 1694.76 1536.66 1933.32 1583.61
    2419.99 1699.16 2780.7 1872.48 3156.68 2020.53 4295.82 2596.38
    5996.92 3417.41
    960 0.0 1480 1500 1496.46 1758.46 1538.27 2061.54 1568.46
    2381.54 1635.82 2741.54 1772.87 3283.08 2030.7 4295.82 2596.38
    5998.46 3403.47
    ………………………………………………………………………………………………………………
    …………………………………..
    OUTPUT FILE:
    640 0.0 1480
    640 1313.77 1500.09
    640 1694.76 1536.66
    640 1933.32 1583.61
    640 2419.99 1699.16
    640 2780.7 1872.48
    640 3156.68 2020.53
    640 4295.82 2596.38
    640 5996.92 3417.41
    960 0.0 1480
    960 1500 1496.46
    960 1758.46 1538.27
    960 2061.54 1568.46
    960 2381.54 1635.82
    960 2741.54 1772.87
    960 3283.08 2030.7
    960 4295.82 2596.38
    960 5998.46 3403.47
    ………………………………………
    ………………………..

    Thanks in Advance
    Best Regards
    Tanujib

  • fanyi June 13, 2014, 11:25 pm

    How to (swap) column four to column two:
    100 Thomas Manager Sales $5,000
    200 Jason Developer Technology $5,500
    300 Sanjay Sysadmin Technology $7,000
    400 Nisha Manager Marketing $9,500
    500 Randy DBA Technology $6,000

  • Lela June 19, 2014, 1:33 pm

    This is excellent. Thank you,.

  • Harsh June 21, 2014, 1:04 am

    Amit, I know a little late but here is the command u were looking for:
    awk '{print $1" | "$19" "substr($20,1,8) " | "substr($3,1,length($3)-1) " | " $15" |"$17}' file1.txt

  • More August 1, 2014, 6:03 am

    Hi Friends,
    I did MCA 3 years back (due some problems i didnt do course)
    can i get job in linux
    i learning linux now
    if any one guide me pls

  • Sudarshan August 19, 2014, 2:16 am

    Any idea on how to parse |+ as delimiter together. I have a file which looks like below

    123|+MN|+2014
    456|+DE|+2015

    It is |+ separated file. But not able to process |+ as delimiter via awk…I want to parse and print third column i.e. which contains data points 2014 and 2015

    any help would be really appreciated.

  • Stephan November 10, 2014, 5:05 am

    Great article, thank you!

    Recently used AWK to get creation date of user by running:
    sudo passwd -S sys | tail -1 | awk '{print $3}'

    Had to go through 50 servers one by one to get this info. I know with WMI in Windows I can create batch script to get system info.
    Is it possible to run this script in Linux across more than one server with script or some kind of management tool?

  • Sujith November 23, 2014, 1:21 am

    Great article. Many Thanks :)

  • Vignesh November 24, 2014, 4:18 am

    Hi Tanujib,

    It is pretty late,but below command will give you the result

    awk '{ for (i=2;i<=NF;i=i+2) {print $1,$i,$(i+1)}}'
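
    That one-liner expects each record to sit on a single line. If the records really are wrapped across several lines as in the posted input, a variation like the one below might work. It rests on an assumption read off the sample only: a line with an odd number of fields starts a new record (key plus pairs), and a line with an even number of fields continues the previous one. The input here is a shortened, made-up excerpt.

    ```shell
    result=$(printf '%s\n' \
        '640 0.0 1480 1313.77 1500.09' \
        '2419.99 1699.16' \
        '960 0.0 1480 1500 1496.46' |
        awk '
            NF % 2 == 1 { key = $1; start = 2 }   # odd field count: new record, first field is the key
            NF % 2 == 0 { start = 1 }             # even field count: continuation of the current record
            { for (i = start; i < NF; i += 2) print key, $i, $(i + 1) }
        ')
    echo "$result"
    ```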

  • Vignesh November 24, 2014, 4:31 am

    Hi Sudarshan,

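    Since | and + are both special characters in a regular expression, each must be escaped; putting them in bracket expressions is the simplest way. The command below will give you the desired result.

    awk -F'[|][+]' '{print $3}' <file>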
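
    A quick way to check a two-character delimiter such as |+ is to feed the sample lines through a pipe. The sketch below writes each character as its own bracket expression, so neither | nor + is read as a regular expression operator:

    ```shell
    # The two sample lines from the question; [|][+] matches a literal "|+" pair.
    result=$(printf '%s\n' '123|+MN|+2014' '456|+DE|+2015' |
        awk -F'[|][+]' '{ print $3 }')
    echo "$result"
    ```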

  • Vignesh November 24, 2014, 4:35 am

    Hi fanyi,

    Just swap while printing

    awk '{print $1,$4,$3,$2,$5}' <file>

  • ramana December 12, 2014, 10:59 am

    Hi ,

    i have employee table like below.

    100 Thomas Manager Sales $5,000
    200 Jason Developer Technology $5,500
    300 Sanjay Sysadmin Technology $7,000
    400 Nisha Manager Marketing $9,500
    500 Randy DBA Technology $6,000

    i need data where $1=300 and $3=Sales (that is employee number 100 and dept is sales)

  • Abhishek December 23, 2014, 2:19 am

    i want to compare two .csv files columnwise in unix using shell scripting

    file1
    METASRCID BMVTRID MersionID COUNTRY Curr
    MET_CCD V14121011081 RECENT US USD
    MET_CCD V141210110810 RECENT US USD
    MET_CCD V141210110811 RECENT GB GBP
    MET_CCD V141210110812 RECENT IE GBP
    MET_CCD V141210110813 RECENT GB GBP
    MET_CCD V141210110814 RECENT AU AUD
    MET_CCD V141210110815 RECENT HK HKD
    MET_CCD V141210110816 RECENT SG SGD

    file2
    METASRCID BMVTRID MersionID COUNTRY Curr
    MET_CCD V14121011081 RECENT US USD
    MET_CCD V141210110810 RECENT US USD
    MET_CCD V141210110811 RECENT GB GBP
    MET_CCD V141210110812 RECENT IE GBP
    MET_CCD V141210110813 RECENT US GBP
    MET_CCD V141210110814 RECENT AU AUD
    MET_CCD V141210110818 RECENT HK HKD
    MET_CCD V141210110816 RECENT SG SGD

    output
    msg: Files are not same
    and changes in particular cell should get highlighted

    MET_CCD V141210110813 RECENT “US” GBP
    MET_CCD “V141210110818″ RECENT HK HKD

  • Vignesh January 15, 2015, 10:50 pm

    Hi Ramana,

    Below would work for you.

    awk '($1==100 && $4=="Sales")' <file>

  • Vignesh January 15, 2015, 11:51 pm

    Something like this would help
    grep -Fxvf file1 file2
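
    grep -Fxvf prints the whole lines of file2 that have no exact match in file1. If the changed cell itself should be marked, a sketch along the following lines may help. It assumes both files hold the same rows in the same order, and file1 and file2 here are small made-up excerpts of the posted data.

    ```shell
    tmp=$(mktemp -d)
    printf '%s\n' \
        'MET_CCD V141210110813 RECENT GB GBP' \
        'MET_CCD V141210110815 RECENT HK HKD' \
        'MET_CCD V141210110816 RECENT SG SGD' > "$tmp/file1"
    printf '%s\n' \
        'MET_CCD V141210110813 RECENT US GBP' \
        'MET_CCD V141210110818 RECENT HK HKD' \
        'MET_CCD V141210110816 RECENT SG SGD' > "$tmp/file2"

    result=$(awk '
        NR == FNR { line[FNR] = $0; next }        # keep every row of the first file
        {
            split(line[FNR], old)                 # same-numbered row from the first file
            changed = 0
            for (i = 1; i <= NF; i++)
                if (($i "") != (old[i] "")) {     # compare as strings, never as numbers
                    $i = "\"" $i "\""             # wrap the differing cell in quotes
                    changed = 1
                }
            if (changed) print                    # show only rows that differ
        }
    ' "$tmp/file1" "$tmp/file2")
    echo "$result"
    rm -rf "$tmp"
    ```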

  • surendra January 17, 2015, 11:38 am

    I have two csv files
    error.csv
    ———————————————–
    “Legacy ID”, “Message”
    12222,”Pass”
    12345,”Fail”

    success.csv
    ———————————
    “ID”,”Legacy ID”
    1,12345
    2,45678

    I want to merge these two files and create one file as below
    “ID”,”Legacy ID”,”Message”
    ”,12222,”Pass”
    ”,12345,”Fail”
    1,12345,”
    2,45678,”

    I am new to linux, can you please help

  • Vignesh January 18, 2015, 10:42 pm

    Hi Surendra,

    Below would work

    awk 'NR==FNR{print "\""","$0;next} {print $0",""\""}' error.csv success.csv

    First print statement would print for first file and next print statement would print for the other file

  • lily January 20, 2015, 11:00 am

    how can I remove the duplicates
    for example
    col1 col2 col3
    a x 10.3
    a y 15.7
    a x 14.3
    b x 10.3
    b y 15.7
    b x 14.3
    I would like to have as
    a x 10.3
    a y 15.7
    b x 10.3
    b y 15.7
    meaning i want to have unique entries to col1 and col2.

  • Vignesh January 20, 2015, 10:31 pm

    Hi lily,

    Below would work
    awk '{k=$1$2;if(!a[k])a[k]=$0} END { for(k in a) {print a[k]} }' <file>
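
    An order-preserving alternative, offered as a sketch: the common "seen" idiom prints a line only the first time its key appears, so the output keeps the input order (a for-in loop in END visits keys in no guaranteed order). Using $1, $2 as a two-part subscript also keeps keys such as "ab" "c" and "a" "bc" apart. The array name seen is just a convention.

    ```shell
    # Rows from the question, without the header line.
    result=$(printf '%s\n' 'a x 10.3' 'a y 15.7' 'a x 14.3' 'b x 10.3' 'b y 15.7' 'b x 14.3' |
        awk '!seen[$1, $2]++')     # true only for the first line with a given col1/col2 pair
    echo "$result"
    ```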

  • Rakesh April 13, 2015, 12:59 am

    Nice article for someone new to awk

  • Arun M June 3, 2015, 9:18 pm

    INPUT:
    ========
    ilapp_cl1:stg_00 ilapp_ex1:stg_00_r 0:10:34
    ilapp_cl1:test_00 ilapp_ex1:test_00_r 0:10:34
    netezza_cl1:prod_00b netezza_ex1:prod_00b_r 0:20:33
    oracore11g_cl1:ORPP2 oracore11g_ex1:ORPP2_r 9:5:37
    oracore11g_cl1:ORPP2_arch oracore11g_ex1:ORPP2_arch_r 9:5:37
    oracore11g_cl1:ORPP2_redo oracore11g_ex1:ORPP2_redo_r 9:5:37
    vmcore_cl1:clc2_ds_1000 vmcore_ex1:clc2_dsr_1000 12:15:33
    vmcore_cl1:clc2_ds_1001 vmcore_ex1:clc2_dsr_1001 12:15:34
    vmcore_cl1:clc2_ds_1002 vmcore_ex1:clc2_dsr_1002 12:15:34

    OUTPUT:
    ===========

    Want to list the output of 3rd column first field is greater than 10 as I pasted below

    vmcore_cl1:clc2_ds_1000 vmcore_ex1:clc2_dsr_1000 12:15:33
    vmcore_cl1:clc2_ds_1001 vmcore_ex1:clc2_dsr_1001 12:15:34
    vmcore_cl1:clc2_ds_1002 vmcore_ex1:clc2_dsr_1002 12:15:34
    vmcore_cl1:clc2_ds_1003 vmcore_ex1:clc2_dsr_1003 12:15:34
