Advanced Regular Expressions in Grep Command with 10 Examples – Part II

by Sasikala on January 17, 2011

In our previous article, Part I, we reviewed basic regular expressions with practical examples.

But we can do much more with regular expressions. You can often accomplish a complex task with a single regular expression instead of writing several lines of code.

When a regex is applied to a string, the regex engine starts at the first character of the string and tries every possible permutation of the regular expression at that position. Only if all possibilities have been tried and have failed does the engine move on to the second character of the text.

At every position the permutations are tried in exactly the same order. As a result, a regex-directed engine always returns the leftmost match. POSIX tools such as grep go one step further and return the longest of the matches that start at that leftmost position.
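A quick way to see the leftmost-longest rule in grep is to offer it competing alternatives with -o, which prints only the matched text. This is a small sketch assuming GNU or BSD grep (-o is not in POSIX, but both support it). A backtracking engine such as Perl's would instead stop at ab, the first alternative that succeeds.

```shell
#!/bin/sh
# Three alternatives compete on the string "xabcd":
#  - "bcd" starts further right, so the leftmost rule rules it out;
#  - "ab" and "abc" both start at the leftmost possible position,
#    and POSIX leftmost-longest matching picks the longer one.
echo xabcd | grep -oE 'ab|abc|bcd'    # prints: abc
```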

In this article, let us review some advanced regular expressions with examples.

Example 1. OR Operation (|)

The pipe character (|) specifies that either of two whole subexpressions may occur at a given position: “subexpression1|subexpression2” matches either subexpression1 or subexpression2. In basic grep the pipe must be escaped as \| (a GNU extension); with grep -E it is written unescaped.

The following example removes three kinds of comment lines from a file using OR in a grep command.

First, create a sample file called “comments”.

$ cat comments
This file shows the comment character in various programming/scripting languages
### Perl / shell scripting
If the Line starts with single hash symbol,
then its a comment in Perl and shell scripting.
' VB Scripting comment
The line should start with a single quote to comment in VB scripting.
// C programming single line comment.
Double slashes in the beginning of the line for single line comment in C.

The file “comments” contains Perl/shell, VBScript and C comment lines. The following grep command prints the lines that do not start with #, a single quote (') or double forward slashes (//).

$ grep -v "^#\|^'\|^//" comments
This file shows the comment character in various programming/scripting languages
If the Line starts with single hash symbol,
then its a comment in Perl and shell scripting.
The line should start with a single quote to comment in VB scripting.
Double slashes in the beginning of the line for single line comment in C.
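To try this without typing the file by hand, the sketch below recreates “comments” and runs the command in both basic and extended syntax. It assumes GNU grep (for \| in basic mode) and writes a file named comments in the current directory.

```shell
#!/bin/sh
# Recreate the sample "comments" file (contents copied from the listing above).
cat > comments <<'EOF'
This file shows the comment character in various programming/scripting languages
### Perl / shell scripting
If the Line starts with single hash symbol,
then its a comment in Perl and shell scripting.
' VB Scripting comment
The line should start with a single quote to comment in VB scripting.
// C programming single line comment.
Double slashes in the beginning of the line for single line comment in C.
EOF

# Basic syntax: \| is the GNU spelling of alternation.
grep -v "^#\|^'\|^//" comments

# Extended syntax: | needs no backslash, and one ^ covers all three branches.
grep -vE "^(#|'|//)" comments
```

Both commands print the same five non-comment lines.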

Example 2. Character class expression

As we saw in example 9 of the previous regex article, a list of characters inside square brackets matches any one of those characters. grep also supports special named character classes that denote common sets of characters. A few of them are listed here; refer to the grep man page for the full list.

[:digit:] 	Only the digits 0 to 9.
[:alnum:] 	Any alphanumeric character: 0 to 9, A to Z or a to z.
[:alpha:] 	Any alphabetic character: A to Z or a to z.
[:blank:] 	Space and TAB characters only.
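Here is a small sketch, using an invented test line, that shows each class from the table at work. It assumes GNU or BSD grep for -o. In the C locale these classes cover exactly the ASCII ranges listed; in other locales [:alpha:] and [:alnum:] may also match non-ASCII letters.

```shell
#!/bin/sh
# One test line with letters, digits, a space and a TAB.
line=$(printf 'abc123 \tXYZ')

# -o prints every matching run on a line of its own.
echo "$line" | grep -oE '[[:digit:]]+'   # 123
echo "$line" | grep -oE '[[:alpha:]]+'   # abc and XYZ
echo "$line" | grep -oE '[[:alnum:]]+'   # abc123 and XYZ
# [[:blank:]] matches the space and the TAB; -c counts matching lines.
echo "$line" | grep -c '[[:blank:]]'     # 1
```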

These class names are only valid inside a bracket expression, so they are written with double brackets, as in [[:digit:]]. Now let us grep the log lines of the ntpd daemon, matching its process ID with the appropriate character class expression.

$ grep -e "ntpd\[[[:digit:]]\+\]" /var/log/messages.4
Oct 28 11:42:20 gstuff1 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Oct 28 11:42:20 gstuff1 ntpd[2241]: synchronized to 15.11.13.123, stratum 3
Oct 28 12:33:31 gstuff1 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Oct 28 12:50:46 gstuff1 ntpd[2241]: synchronized to 15.11.13.123, stratum 3
Oct 29 07:55:29 gstuff1 ntpd[2241]: time reset -0.180737 s
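To pull out just the process IDs rather than whole log lines, one option is to chain two grep -o calls and remove duplicates with sort -u. This sketch assumes GNU grep (for \+ and -o) and uses a small stand-in file, messages.sample (a hypothetical name), holding two of the log lines shown above instead of the real /var/log/messages.4.

```shell
#!/bin/sh
# Stand-in for /var/log/messages.4: two lines copied from the output above.
cat > messages.sample <<'EOF'
Oct 28 11:42:20 gstuff1 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Oct 29 07:55:29 gstuff1 ntpd[2241]: time reset -0.180737 s
EOF

# 1st grep -o keeps only "ntpd[NNNN]" (a plain ] outside brackets is literal),
# 2nd grep -o keeps only the digits, sort -u drops duplicate PIDs.
grep -o 'ntpd\[[[:digit:]]\+]' messages.sample |
    grep -o '[[:digit:]]\+' |
    sort -u
# prints: 2241
```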

Example 3. M to N occurrences ({m,n})

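A regular expression followed by {m,n} indicates that the preceding item is matched at least m times, but not more than n times. In basic grep the braces must be escaped, as \{m,n\}. The values of m and n must be non-negative, m must not be greater than n, and POSIX only guarantees support for counts up to 255 (RE_DUP_MAX).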

The following example prints a line only if it is a number in the range 0 to 99999.

$ cat  number
12
12345
123456
19816282

$ grep  "^[0-9]\{1,5\}$" number
12
12345

The file “number” contains a list of numbers. The grep command above matches only the numbers that have 1 to 5 digits, that is, from 0 up to 99999.
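The same interval can be written in extended syntax, where the braces lose their backslashes. This sketch recreates the “number” file (overwriting any file of that name in the current directory) and runs both forms; each prints 12 and 12345.

```shell
#!/bin/sh
# Recreate the sample "number" file used in Examples 3 to 5.
printf '%s\n' 12 12345 123456 19816282 > number

# Basic syntax: the interval braces are escaped.
grep '^[0-9]\{1,5\}$' number
# Extended syntax (-E): the same interval without backslashes.
grep -E '^[0-9]{1,5}$' number
```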

Note: For basic grep command examples, read 15 Practical Grep Command Examples.

Example 4. Exactly M occurrences ({m})

A regular expression followed by {m} matches exactly m occurrences of the preceding expression. The following grep command displays only the number that has exactly 5 digits.

$ grep  "^[0-9]\{5\}$" number
12345
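The ^ and $ anchors can also be replaced by grep's -x option, which only accepts a match that spans the whole line. A minimal sketch, recreating the same “number” file in the current directory:

```shell
#!/bin/sh
printf '%s\n' 12 12345 123456 19816282 > number

# -x: the pattern must match the entire line, so no ^ or $ is needed.
grep -x '[0-9]\{5\}' number      # prints: 12345
```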

Example 5. M or more occurrences ({m,})

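A regular expression followed by {m,} matches m or more occurrences of the preceding expression. The following grep command displays the numbers that have 5 or more digits. Because the pattern is not anchored with ^ and $, it actually matches any line that contains a run of at least 5 digits.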

$ grep "[0-9]\{5,\}" number
12345
123456
19816282
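The difference between the unanchored and anchored forms only shows up when digits are mixed with other text. The input below, including the line abc12345xyz, is invented for illustration.

```shell
#!/bin/sh
# Hypothetical input: the last line hides a 5-digit run inside text.
nums() { printf '%s\n' 1234 12345 123456 abc12345xyz; }

# Unanchored: any line containing 5 or more consecutive digits.
nums | grep '[0-9]\{5,\}'      # 12345, 123456, abc12345xyz
# Anchored: the whole line must consist of 5 or more digits.
nums | grep '^[0-9]\{5,\}$'    # 12345, 123456
```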

Note: You can use the bzgrep command to search for a string or a pattern (regular expression) in bzip2-compressed files.

Example 6. Word boundary (\b)

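\b matches a word boundary: the empty position between a word character (a letter, digit or underscore) and a non-word character, or the start or end of the line. It matches no characters itself, so \bxx matches xx at the beginning of a word and xx\b matches it at the end. Thus \bthe\b finds the word “the” but not “thet” or “they”, while \bthe also finds “they”. \b is a GNU extension.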

# grep -i "\bthe\b" comments
This file shows the comment character in various programming/scripting languages
If the Line starts with single hash symbol,
The line should start with a single quote to comment in VB scripting.
Double slashes in the beginning of the line for single line comment in C.
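A sketch on three invented words makes the effect of each boundary visible. It assumes GNU grep for \b; the -w option (standard in GNU and BSD grep) asks for whole-word matches without writing \b at all.

```shell
#!/bin/sh
# Three sample words, one per line.
words() { printf '%s\n' they the bathe; }

words | grep '\bthe\b'   # the
words | grep '\bthe'     # they, the  ("bathe" has no boundary before "the")
words | grep -w 'the'    # the
```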

Example 7. Back references (\n)

Grouping an expression for later reuse is available in grep through back-references: \1 refers to whatever the first \( \) group matched. For example, \([0-9]\)\1 matches a two-digit number whose digits are the same, such as 11, 22 or 33.

# grep -e '^\(abc\)\1$'
abc
abcabc
abcabc

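No file is given in the above grep command, so it reads its input from STDIN. When it reads the line “abc” there is no match and nothing is printed; the line “abcabc” matches the expression, so grep prints it. In basic regular expressions, special characters such as the grouping parentheses must be escaped, as \( and \). If you prefer extended regular expressions, use grep -E (egrep is the traditional equivalent), where parentheses are written unescaped. Note that the -e option only introduces a pattern; it does not switch grep to extended syntax.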
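The same session can be replayed non-interactively by piping the input lines in with printf instead of typing them. This sketch also checks the two-digit example and shows the extended form; back-references are not part of POSIX extended regular expressions, so the -E line assumes GNU grep.

```shell
#!/bin/sh
# Replay the STDIN session: only "abcabc" is printed.
printf '%s\n' abc abcabc abcab | grep '^\(abc\)\1$'   # abcabc

# Two-digit numbers whose digits are the same (11, 22, 33, ...).
printf '%s\n' 11 12 33 | grep -x '\([0-9]\)\1'        # 11, 33

# Extended syntax: unescaped parentheses, \1 still refers to group 1.
printf '%s\n' abc abcabc | grep -E '^(abc)\1$'        # abcabc
```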

Note: You can also use the zgrep command to search inside gzip-compressed files.

Example 8. Match the pattern “Object Oriented”

So far we have seen several grep techniques. Now let us combine them to match “object oriented” in various formats.

$ grep "OO\|\([oO]bject\( \|-\)[oO]riented\)"

The above grep command matches “OO”, “object oriented”, “Object-oriented” and similar variants. Since no file name is given, it reads its input from STDIN.
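The sketch below tests the pattern on invented sample lines, in extended syntax and with the ( |-) group shortened to the bracket expression [ -]. Making the whole expression case-insensitive with -i is shorter still, but it has a side effect: “OO” then also matches the “oo” in words such as “good”.

```shell
#!/bin/sh
# Hypothetical sample lines; only the first three are meant to match.
sample() {
    printf '%s\n' 'OO design' 'object oriented' 'Object-oriented' \
        'objective' 'a good book'
}

# Explicit case handling: prints the first three lines.
sample | grep -E 'OO|[oO]bject[ -][oO]riented'
# -i version: also prints "a good book", because "oo" now matches OO.
sample | grep -iE 'OO|object[ -]oriented'
```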

Example 9. Print the line “vowel singlecharacter samevowel”

The following grep command prints all lines containing a vowel (a, e, i, o or u), followed by any single character, followed by the same vowel again. Thus it finds “eve” in evening and “ada” in adam, but nothing in vera.

$ cat input
evening
adam
vera

$ grep "\([aeiou]\).\1" input
evening
adam
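Adding -o shows exactly which part of each line satisfied the pattern. A short sketch, assuming GNU grep (for -o, and for \1 in the -E form):

```shell
#!/bin/sh
# -o prints only the matched part of each line.
printf '%s\n' evening adam vera | grep -o '\([aeiou]\).\1'   # eve, ada
# The same pattern in extended syntax.
printf '%s\n' evening adam vera | grep -oE '([aeiou]).\1'    # eve, ada
```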

Example 10. Valid IP address

The following grep command matches valid IPv4 addresses in dotted-decimal notation.

$ cat input
15.12.141.121
255.255.255
255.255.255.255
256.125.124.124

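$ grep -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' input
15.12.141.121
255.255.255.255

The regular expression above is built from one octet pattern with three alternatives. The octet followed by a dot must occur three times, and the fourth octet, without the dot, is written out once more at the end. The \. must sit outside the alternation group: if it is placed inside the last alternative, only that alternative requires a dot, and addresses such as 192.118.200.1 or 255.255.255.0 are rejected.

  1. If it starts with 25, the next digit must be 0 to 5 (250 to 255).
  2. If it starts with 2, the next digit can be 0 to 4, followed by any digit (200 to 249).
  3. Otherwise, an optional 0 or 1, a digit, then an optional digit (0 to 199). This branch also accepts leading zeros such as 01 or 007.
  4. Then a literal dot (\.).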
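Keeping the octet pattern in a shell variable makes the expression easier to read and reuse. In this sketch the variable name octet is chosen for illustration, the test addresses are partly invented, and \b assumes GNU grep. It remains a purely textual check: leading zeros are still accepted and reserved ranges are not considered.

```shell
#!/bin/sh
# One octet: 0 to 255 (leading zeros such as 01 are still accepted).
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Whole-line validation: ^ and $ reject a fifth octet or trailing digits.
printf '%s\n' 15.12.141.121 255.255.255 255.255.255.255 \
    256.125.124.124 192.118.200.1 1.2.3.4.5 |
    grep -E "^($octet\.){3}$octet\$"
# prints 15.12.141.121, 255.255.255.255 and 192.118.200.1

# Extraction from free text: \b instead of ^ and $, -o for the match only.
echo 'gateway 10.6.41.1 is up' | grep -oE "\b($octet\.){3}$octet\b"
# prints 10.6.41.1
```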

For the 1st part of this article, read Regular Expressions in Grep Command with 10 Examples – Part I



16 comments

1 Felix Frank January 17, 2011 at 8:17 am

Nice article.

I have two minor gripes:
1. The remarks about (b)zgrep are helpful. Why are they placed so randomly?
2. The example of “object oriented” would be a nice spot to note that grep -i can make your whole expression case-insensitive (and save some gratuitous character classing at the cost of probably unnecessary accuracy).

2 Jidifi January 17, 2011 at 11:04 am

In example 10,
[01]?[0-9][0-9] is doubtful, since
00 -> 09, 000 -> 009, 010 -> 099 are not valid IP
but
0 -> 9, 10 -> 99 are valid IP.

3 dpminusa January 17, 2011 at 10:17 pm

May be an oops oops or typo in your IP example. Using test data does not work with expression.

Would this be a better expression:

RE1='\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'

4 bitstream January 18, 2011 at 7:38 am

‘-w’ saves you from using ‘\b’.

5 Jidifi January 18, 2011 at 8:00 am

To find all ipv4 addresses in ‘yourfile’, I suggest (long but looks like correct):

grep -e "\([^0-9\.]\|^\)\(\([1-9][0-9]\?\|1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\.\)\{3\}\([1-9][0-9]\?\|1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\([^0-9\.]\|$\)" yourfile

Note:
grep -e "\b\(\([1-9][0-9]\?\|1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\.\)\{3\}\([1-9][0-9]\?\|1[0-9][0-9]\|2[0-4][0-9]\|25[0-5]\)\b" yourfile
failed in some cases : 158.231.45.56.77 for instance

6 dpminusa January 18, 2011 at 9:29 pm

RE1='\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b'

This is optimized and simplified a bit. Is there a construct for using the first group as a reference in the second group?

7 dpminusa January 18, 2011 at 9:37 pm

REG='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
RE1="\b$REG(\.$REG){3}\b"

This seems to work. Better form to test and tweak with as well. This is the best I can come up with, without more research and maybe more obscure features. If you look through perl.org, and the full scope of extended regex, it gets pretty esoteric.

8 roham September 10, 2011 at 2:16 am

use "-o" to print only the parts that match the pattern and skip the other parts of the line.

grep -o -e "your pattern" filepath

9 Jidifi September 14, 2011 at 4:35 pm

As already said, example 10 is wrong:
test with the IP 204204204204 for example.

10 malati October 24, 2011 at 4:33 am

Hi,

can anybody please help me to grep the numbers from -15 to -20? I'm trying to grep the processes whose nice value ranges from -15 to -20. The command I'm using is ps -eo "%n %p" | grep \-1[5-9], it greps processes having nice value from -15 to -19 but I want -15 to -20.

11 Glen Neff December 7, 2011 at 8:53 am

This fails to work for a lot of valid IP addresses:

dhcpadm@thebrain-/home/dhcpadm$ echo 192.118.200.1 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$ echo 192.118.200.10 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$ echo 10.6.8.15 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
10.6.8.15
dhcpadm@thebrain-/home/dhcpadm$ echo 10.111.120.5 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
10.111.120.5
dhcpadm@thebrain-/home/dhcpadm$ echo 192.118.200.1 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$ echo 192.168.200.1 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$ echo 255.255.255.0 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$ echo 10.6.41.0 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
10.6.41.0
dhcpadm@thebrain-/home/dhcpadm$ echo 172.31.255.2 | egrep ‘\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)’
dhcpadm@thebrain-/home/dhcpadm$

12 Kushagra April 15, 2012 at 9:40 am

grep '^\(25[0-5]\.\|2[0-4][0-9]\.\|1[0-9][0-9]\.\|[0-9][0-9]\.\|[0-9]\.\)\{3\}\(25[0-5]\|2[0-4][0-9]\|1[0-9][0-9]\|[0-9][0-9]\|[0-9]\)'

13 arpit May 2, 2012 at 1:01 pm

it also matches
123.13.2.3.243
or 123.13.2.259
or 123.13.2.3.assff
it should be as
grep '^\(25[0-5]\.\|2[0-4][0-9]\.\|1[0-9][0-9]\.\|[0-9][0-9]\.\|[0-9]\.\)\{3\}\(25[0-5]\|2[0-4][0-9]\|1[0-9][0-9]\|[0-9][0-9]\|[0-9]\)$'

14 Kushagra May 3, 2012 at 10:05 pm

grep -E '^(25[0-5].|2[0-4][0-9].|1[0-9][0-9].|[0-9][0-9].|[0-9].){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9][0-9]|[0-9])$'

15 Kushagra Jaiswal May 3, 2012 at 10:12 pm

use -E option then there no need to use ‘\’
grep '^(25[0-5].|2[0-4][0-9].|1[0-9][0-9].|[0-9][0-9].|[0-9].){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9][0-9]|[0-9])$'
or use egrep
egrep '^(25[0-5].|2[0-4][0-9].|1[0-9][0-9].|[0-9][0-9].|[0-9].){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9][0-9]|[0-9])$'

16 sam July 30, 2012 at 4:17 pm

print "Please enter the file URL:";
$PnpLogPath = <STDIN>;

print "Please enter the Device URI: ";
$DeviceURI = <STDIN>;

open(HANDLE, $PnpLogPath);
@LogInput = <HANDLE>;

@subarray = grep /\Q$DeviceURI\E/, @LogInput;
foreach $subarray(@subarray)
{
print $subarray."\n";
}
I have a logfile say logfile.txt which contains info. I am passing name of file from command line and also the word which I want to find through command line and print those lines which contains that word. But somehow grep is not returning the line which contains it. HElp me out here.

The word which I am trying to find is –https://sn1.notify.Sunday,.net/unthrottSeaTacJFKInternational/01.00/AAHdC–PWgJuTrmAX1A3jeoyAgAAAAADAQAX1AQUZm52OkJCMjg1QTQ

The logFile is :

2012-07-25 18:31:30,788 access INFO request accepted. Request: , URI:https://sn1.notify.Sunday,.net/unthrottSeaTacJFKInternational/01.00/AAHdC–PWgJuTrmAX1A3jeoyAgAAAAADAQAX1AQUZm52OkJCMjg1QTQ**, Payload:

2012-07-25 18:31:30,859 root INFO insert req into server queue 2012-07-25 18:31:30,861 root DEBUG start consume requests from server queue [sn1.notify.Sunday,.net]

2012-07-25 18:31:30,862 root INFO try sending more requests. requests in queue: 688

2012-07-25 18:31:31,331 access INFO request accepted. Request: , URI:https://sn1.notify.Sunday,.net/unthrottSeaTacJFKInternational/01.00/AAFVBNQ4MAySQK-rDr0-CmOvAgAAAAADAQrDr0QUZm52OkJCMjg1QTQ, Payload:
