AWK Arrays Explained with 5 Practical Examples

by Sasikala on March 10, 2010

The awk programming language supports arrays. As part of our ongoing awk examples series, we have already covered awk user-defined variables and awk built-in variables. Arrays are an extension of variables: an array is a variable that holds more than one value. Like variables, arrays have names. In some programming languages, arrays have to be declared so that memory can be allocated for them, and array indexes are typically integers, such as array[1], array[2], and so on.

Awk Associative Array

Awk supports only associative arrays. Associative arrays are like traditional arrays, except that they use strings as their indexes rather than numbers. You can mimic a traditional array by using numeric strings as indexes. Awk arrays do not need to be declared; an element is created the first time it is referenced.

Syntax:

arrayname[string]=value

In the above awk syntax:

  • arrayname is the name of the array.
  • string is the index of the element.
  • value is the value assigned to that element of the array.
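
As a minimal sketch of the syntax above (the array name color and its values are made up for illustration), the following assigns one element with a string index and one with a numeric index, then reads both back. Because awk stores every index as a string, color[1] and color["1"] refer to the same element.

```shell
result=$(awk 'BEGIN {
    # String index
    color["apple"] = "red"
    # Numeric index: awk converts the number 1 to the string "1"
    color[1] = "first"
    print color["apple"]
    # Same element as color[1]
    print color["1"]
}')
echo "$result"
```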

Accessing elements of the AWK array

To access a particular element of an array, refer to it by its index: arrayname[index] gives you the value assigned at that index.

To access all the elements of an array, use a loop that goes through all of its indexes, as shown below.

Syntax:

for (var in arrayname)
actions

In the above awk syntax:

  • var is any variable name
  • in is a keyword
  • arrayname is the name of the array.
  • actions is the list of statements to be performed. If you want to perform more than one action, enclose them within braces.

This loop executes the actions once for each index that has been used in the array, with the variable var set to that index. The order in which the indexes are visited is not specified.
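
Here is a small sketch of the loop, using a hypothetical array called port. Since awk does not promise any particular traversal order, the output is piped through sort to make it predictable.

```shell
result=$(awk 'BEGIN {
    port["ssh"] = 22
    port["http"] = 80
    port["https"] = 443
    # name takes the value of each index in turn, in no particular order
    for (name in port)
        print name, port[name]
}' | sort)
echo "$result"
```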

Removing an element from the AWK array

To remove the element at a particular index of an array, use the awk delete statement. Once you have deleted an element from an awk array, you can no longer obtain its value.

Syntax:

delete arrayname[index];

The loop below removes all elements from an array. Traditional awk has no single statement for removing every element, so you go through the loop and delete each element with the awk delete statement. Many implementations, including gawk, also accept delete array without an index to clear the whole array.

for (var in array)
     delete array[var]
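
The sketch below (with a made-up array named stock) deletes one element and uses the in operator to confirm it is gone. It then clears the array with split("", stock), a widely used idiom that is not part of the original article: split removes all existing elements of its target array before storing the pieces, and an empty string yields no pieces.

```shell
result=$(awk 'BEGIN {
    stock["pen"] = 5
    stock["ink"] = 2
    delete stock["pen"]
    # "in" tests for an index without creating the element
    if (!("pen" in stock))
        print "pen removed"
    # Splitting an empty string empties the array
    split("", stock)
    n = 0
    for (k in stock)
        n++
    print n, "elements left"
}')
echo "$result"
```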

5 Practical Awk Array Examples

All the examples below use the Iplogs.txt file shown here. This sample text file contains a list of the requests that IP addresses made to the gateway server, in the following format:

[date] [time] [ip-address] [number-of-websites-accessed]
$ cat Iplogs.txt
180607 093423	123.12.23.122 133
180607 121234	125.25.45.221 153
190607 084849   202.178.23.4 44
190607 084859   164.78.22.64 12
200607 012312	202.188.3.2 13
210607 084849   202.178.23.4 34
210607 121435	202.178.23.4 32
210607 132423	202.188.3.2 167

Example 1. List all unique IP addresses and the number of times each one accessed the gateway

$ awk '{
> Ip[$3]++;
> }
> END{
> for (var in Ip)
> print var, "access", Ip[var]," times"
> }
> ' Iplogs.txt
125.25.45.221 access 1  times
123.12.23.122 access 1  times
164.78.22.64 access 1  times
202.188.3.2 access 2  times
202.178.23.4 access 3  times

In the above script:

  • The third field ($3) is an IP address. It is used as the index of an array called Ip.
  • For each line, the script increments the value stored under the corresponding IP address index.
  • In the END section, the indexes of the array are the unique IP addresses, and the corresponding values are their occurrence counts.
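
Because the order of a for (var in Ip) loop is unspecified, the lines may come out in a different order on another awk implementation. One possible variation, sketched here and not part of the original script, prints the count first and pipes the output through sort -rn so the busiest address comes first. The sample file is recreated with single spaces between fields so the block runs on its own.

```shell
cat > Iplogs.txt <<'EOF'
180607 093423 123.12.23.122 133
180607 121234 125.25.45.221 153
190607 084849 202.178.23.4 44
190607 084859 164.78.22.64 12
200607 012312 202.188.3.2 13
210607 084849 202.178.23.4 34
210607 121435 202.178.23.4 32
210607 132423 202.188.3.2 167
EOF

# Count requests per IP, then sort numerically in descending order
result=$(awk '{ Ip[$3]++ } END { for (var in Ip) print Ip[var], var }' Iplogs.txt | sort -rn)
echo "$result"
```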

Example 2. List all the IP addresses and calculate how many sites each accessed

The last field in Iplogs.txt is the number of sites each IP address accessed on a particular date and time. The script below generates a report listing each IP address, how many times it made a request to the gateway, and the total number of sites it accessed.

$ cat ex2.awk
BEGIN {
print "IP Address\tAccess Count\tNumber of sites";
}
{
Ip[$3]++;
count[$3]+=$NF;
}
END{
for (var in Ip)
	print var,"\t",Ip[var],"\t\t",count[var];
}

$ awk -f ex2.awk Iplogs.txt
IP Address	Access Count	Number of sites
125.25.45.221 	 1 		 153
123.12.23.122 	 1 		 133
164.78.22.64 	 1 		 12
202.188.3.2 	 2 		 180
202.178.23.4 	 3 		 110

In the above example:

  • The script uses two arrays. Both use the same index: the IP address (third field).
  • The first array, named “Ip”, holds each unique IP address and its occurrence count. The second array, named “count”, also uses the IP address as its index; each time an IP address appears, the last field (number of sites) is added to its value.
  • The END section goes through all the IP addresses and prints each address, its access count from the array Ip, and its number of sites from the array count.
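
Tab-separated columns can drift out of line when the values differ in width. As an alternative sketch (the column widths are arbitrary choices, not taken from the article), printf with fixed field widths keeps the report aligned.

```shell
cat > Iplogs.txt <<'EOF'
180607 093423 123.12.23.122 133
180607 121234 125.25.45.221 153
190607 084849 202.178.23.4 44
190607 084859 164.78.22.64 12
200607 012312 202.188.3.2 13
210607 084849 202.178.23.4 34
210607 121435 202.178.23.4 32
210607 132423 202.188.3.2 167
EOF

result=$(awk '
{ Ip[$3]++; count[$3] += $NF }
END {
    # %-15s left-aligns the address; %5d and %6d right-align the numbers
    for (var in Ip)
        printf "%-15s %5d %6d\n", var, Ip[var], count[var]
}' Iplogs.txt | sort)
echo "$result"
```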

Example 3. Identify maximum access day

$ cat ex3.awk
{
date[$1]++;
}
END{
for (count in date)
{
	if ( max < date[count] ) {
		max = date[count];
		maxdate = count;
	}

}
print "Maximum access is on", maxdate;
}

$ awk -f ex3.awk Iplogs.txt
Maximum access is on 210607

In this example:

  • The array named “date” uses the date as its index and the occurrence count as its value.
  • max is a variable that holds the highest count seen so far; it is used to find the date with the maximum count.
  • maxdate is a variable that holds the date for which the count is maximum.
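
A compact variation of the same idea, offered only as a sketch, also prints the winning count so the result is easy to check against the data. Note that if two dates tie for the maximum, which one is reported depends on the unspecified traversal order.

```shell
cat > Iplogs.txt <<'EOF'
180607 093423 123.12.23.122 133
180607 121234 125.25.45.221 153
190607 084849 202.178.23.4 44
190607 084859 164.78.22.64 12
200607 012312 202.188.3.2 13
210607 084849 202.178.23.4 34
210607 121435 202.178.23.4 32
210607 132423 202.188.3.2 167
EOF

result=$(awk '
{ date[$1]++ }
END {
    # Keep the date whose count is the largest seen so far
    for (d in date)
        if (date[d] > max) { max = date[d]; maxdate = d }
    print maxdate, max
}' Iplogs.txt)
echo "$result"
```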

Example 4. Reverse the order of lines in a file

$ awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }' Iplogs.txt
210607 132423	202.188.3.2 167
210607 121435	202.178.23.4 32
210607 084849   202.178.23.4 34
200607 012312	202.188.3.2 13
190607 084859   164.78.22.64 12
190607 084849   202.178.23.4 44
180607 121234	125.25.45.221 153
180607 093423	123.12.23.122 133

In this example:

  • It starts by recording all the lines in the array ‘a’, indexed by a counter that starts at 0.
  • When the program has finished processing all lines, awk executes the END { } block.
  • The END block loops over the elements of the array ‘a’ from the last index down to 0 and prints the recorded lines in reverse order.
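
The same technique can be sketched with the built-in record counter NR as the index instead of a separate counter. The three-line input here is made up for brevity.

```shell
result=$(printf 'one\ntwo\nthree\n' | awk '
{ line[NR] = $0 }                 # store each line under its record number
END {
    # In END, NR holds the number of the last record read
    for (i = NR; i >= 1; i--)
        print line[i]
}')
echo "$result"
```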

Example 5. Remove duplicate and nonconsecutive lines using awk

$ cat > temp
foo
bar
foo
baz
bar

$ awk '!($0 in array) { array[$0]; print }' temp
foo
bar
baz

In this example:

  • Awk reads every line from the file “temp” and uses the “in” operator to check whether the current line already exists as an index in the array named “array”.
  • If it does not exist, the script stores the line as a new index and prints it.
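
A shorter idiom that is commonly used for the same job is sketched below; the array name seen is only a convention. The expression seen[$0]++ returns the count before incrementing, so it is 0 (false) the first time a line appears, and the negation makes awk print the line only on that first occurrence.

```shell
result=$(printf 'foo\nbar\nfoo\nbaz\nbar\n' | awk '!seen[$0]++')
echo "$result"
```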


{ 9 comments }

1 aldoem March 10, 2010 at 2:46 am

Very gut…
Multiarray example:

#!/bin/bash
Matrix=("one" "two" "three")
one=("1.1" "1.2" "1.3")
two=("2.1" "2.2" "2.3")
three=("3.1" "3.2" "3.3")

for i in ${Matrix[@]}
do
one_Item=$(eval echo "\${$i[0]}")
two_Item=$(eval echo "\${$i[1]}")
three_Item=$(eval echo "\${$i[2]}")
done

2 Eric Pulvino March 10, 2010 at 8:53 am

Looking at several of these examples here I would refer to these arrays more as MAPS. The difference in my mind is that the indexes are not necessarily numerically consecutive in the awk cases…. they are more like the keys used in C++ style maps which can take any value as needed to suit the data set.

Interesting functionality though, I will be sure to use it at some point I’m sure.

3 porosec June 6, 2010 at 1:30 am

I think solve example 3 more effective is

awk 'max < $1 { max = $1 } END { print max }' Iplogs.txt

4 jagadeeshwaran k June 11, 2012 at 7:20 am

really interesting . i have learned many things regarding arrays in awk.

5 kaique June 28, 2012 at 12:30 am

..i need more examples of an array!

6 archana August 10, 2012 at 1:25 am

suer super super!!!!!!!!

7 Aishwarya April 30, 2013 at 2:26 am

another approach for Example 4. Reverse the order of lines in a file
awk 'BEGIN{s=""} {s=$0"\n"s;} END{printf("%s",s);}' Iplogs.txt

8 Jan June 11, 2013 at 7:36 am

Please help. I need to add ” 0|” before “QESTMD” or all the 4th segments – how do I do this with awk?

data:

0||08276101|QESTMD||10|1|257.05|787736015|7877360B|10062013|0|DISP|||
0||08276101|QESTMD||10|2|85.86|744549019|7445490B|10062013|0|DISP|||
0||08276401|PHARMD||100|1|7.49|894672004|8946720E|10062013|0|DISP|||
0||08276402|TRANNS||20|0|0|759694001|7596940B|10062013|0|DISP|||

output shoud be:

0||08276101| 0|QESTMD||10|1|257.05|787736015|7877360B|10062013|0|DISP|||
0||08276101| 0|QESTMD||10|2|85.86|744549019|7445490B|10062013|0|DISP|||
0||08276401| 0|PHARMD||100|1|7.49|894672004|8946720E|10062013|0|DISP|||
0||08276402| 0|TRANNS||20|0|0|759694001|7596940B|10062013|0|DISP|||

9 Manish July 30, 2013 at 1:19 am

a1 is the file contain the question data.
awk 'BEGIN {FS = "|";OFS="|" }; {TT="0|"$4;$4=TT;print $0}' a1
