Filtering Input in Linux

The grep command is a text filter that searches its input and prints the lines that contain a match for a given pattern.

grep [OPTIONS] PATTERN [FILE]

For example, the passwd file we previously copied into the Documents directory (with cp /etc/passwd .) contains the details of special system accounts and user accounts on the system. This file can be very large; however, the grep command can be used to pick out the information about a specific user, such as the sysadmin user. Use sysadmin as the pattern argument and passwd as the file argument:
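
The command can be sketched as follows. So that it runs anywhere, this sketch builds a small stand-in passwd file in a scratch directory; the three entries are hypothetical and only illustrate the format, so a real /etc/passwd will look different.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for the copied passwd file (illustrative entries only).
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sysadmin:x:1001:1001:System Administrator:/home/sysadmin:/bin/bash
EOF

# PATTERN is sysadmin, FILE is passwd; only lines containing the pattern are printed.
grep sysadmin passwd
# prints: sysadmin:x:1001:1001:System Administrator:/home/sysadmin:/bin/bash
```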

The example above uses a simple search term as the pattern; however, grep can interpret much more complex search patterns.

Regular Expressions

Regular expressions have two common forms: basic and extended. Most commands that use regular expressions can interpret basic regular expressions. However, extended regular expressions are not available for all commands and a command option is typically required for them to work correctly.
The following table summarizes basic regular expression characters:

Basic Regex Character(s)    Meaning
.                           Any one single character
[ ]                         Any one specified character
[^ ]                        Not the one specified character
*                           Zero or more of the previous character
^                           If first character in the pattern, then pattern must be at beginning of the line to match, otherwise just a literal ^
$                           If last character in the pattern, then pattern must be at the end of the line to match, otherwise just a literal $

The following table summarizes the extended regular expressions, which must be used with either the egrep command or the -E option with the grep command:

Extended Regex Character(s)    Meaning
+                              One or more of the previous pattern
{ }                            Specify minimum, maximum or exact matches of the previous pattern
|                              Alternation, a logical "or"
( )                            Used to create groups
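
The extended characters in the table can be tried out with the -E option, as in the sketch below. The words.txt file and its contents are hypothetical, chosen only so that each pattern has something to match.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical word list, one word per line.
printf '%s\n' gd god good goood cat dog hotdog > words.txt

# + : g, then one or more o, then d
grep -E 'go+d' words.txt          # prints: god, good, goood

# { } : g, then exactly two o, then d
grep -E 'go{2}d' words.txt        # prints: good

# | : lines containing either cat or dog
grep -E 'cat|dog' words.txt       # prints: cat, dog, hotdog

# ( ) : group the alternation so that ^ and $ apply to both branches
grep -E '^(cat|dog)$' words.txt   # prints: cat, dog
```

Without -E, grep would treat characters such as + and | in these patterns as literal text.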

Basic Patterns

Regular expressions are patterns that only certain commands are able to interpret. Regular expressions can be expanded to match certain sequences of characters in text. The examples displayed on this page will make use of regular expressions to demonstrate their power when used with the grep command. In addition, these examples provide a very visual demonstration of how regular expressions work: the text that matches will be displayed in red.

The simplest of all regular expressions use only literal characters, like the example from the previous page:
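
As a minimal sketch, the literal pattern below is quoted, which is a good habit once patterns start to contain special characters. The passwd entries are hypothetical stand-ins. Because colored output does not survive in plain text, the sketch also uses the -o option (available in GNU and BSD grep) to print only the text that matched.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for the passwd file.
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1001:1001:System Administrator:/home/sysadmin:/bin/bash
EOF

# A literal pattern matches exactly the characters typed.
grep 'sysadmin' passwd

# -o prints each matched piece on its own line instead of the whole line.
# The pattern occurs twice on the sysadmin line (user name and home directory).
grep -o 'sysadmin' passwd
```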

Anchor Characters

Anchor characters are one of the ways regular expressions can be used to narrow down search results. For example, the pattern root appears many times in the /etc/passwd file:

The first anchor character ^ is used to ensure that a pattern appears at the beginning of the line. For example, to find all lines in /etc/passwd that start with root use the pattern ^root. Note that ^ must be the first character in the pattern to be effective.
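
The effect of the ^ anchor can be seen by running both patterns against the same file. The sketch below uses a hypothetical three-line stand-in for /etc/passwd in which root appears on two lines but starts only one of them.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for /etc/passwd.
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
operator:x:1000:37::/root:/bin/sh
sysadmin:x:1001:1001:System Administrator:/home/sysadmin:/bin/bash
EOF

# Unanchored: matches root anywhere in the line (the root and operator lines).
grep 'root' passwd

# Anchored: matches only lines that begin with root (the root line).
grep '^root' passwd
```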

For the next example, first examine the alpha-first.txt file. The cat command can be used to print the contents of a file:

The second anchor character $ can be used to ensure a pattern appears at the end of the line, thereby effectively reducing the search results. To find the lines that end with an r in the alpha-first.txt file, use the pattern r$:
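
A sketch of both steps follows. The contents given to alpha-first.txt here are an assumption made for illustration; the real file may differ.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical contents for alpha-first.txt.
cat > alpha-first.txt <<'EOF'
A is for Animal
B is for Bear
C is for Cat
D is for Dog
E is for Elephant
F is for Flower
EOF

# Print the whole file first.
cat alpha-first.txt

# Keep only the lines whose last character is r.
grep 'r$' alpha-first.txt
# prints:
# B is for Bear
# F is for Flower
```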

Again, the position of this character is important: the $ must be the last character in the pattern in order to be effective as an anchor.

Match a Single Character With .

The following examples will use the red.txt file. One of the most useful expressions is the period . character. It will match any character except for the newline character. The pattern r..f would find any line that contains the letter r followed by exactly two characters (which can be any character except a newline) and then the letter f:
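
A sketch of that search is shown below. The word list written to red.txt is a hypothetical stand-in, so the real file may contain different words.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical contents for red.txt, one word per line.
printf '%s\n' red reef rot reeed rd rod roof reed root reel read > red.txt

# r, then any two characters, then f
grep 'r..f' red.txt
# prints:
# reef
# roof
```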

The line does not have to be an exact match; it simply must contain the pattern, as seen here when r..t is searched for in the /etc/passwd file:
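
In the sketch below, a hypothetical stand-in for /etc/passwd is used. Both matching lines are much longer than the four-character pattern; they match because the text root appears somewhere inside them.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for /etc/passwd.
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
operator:x:1000:37::/root:/bin/sh
sysadmin:x:1001:1001:System Administrator:/home/sysadmin:/bin/bash
EOF

# r, any two characters, t: prints the root and operator lines in full.
grep 'r..t' passwd
```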

Match a Single Character With []

The square brackets [ ] match a single character from the list or range of possible characters contained within the brackets. For example, to find all the lines in the profile.txt file that contain a digit, use the pattern [0123456789] or [0-9]:
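
The two spellings can be compared with a sketch like the following. The lines written to profile.txt are hypothetical and stand in for the real file.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical contents for profile.txt.
cat > profile.txt <<'EOF'
Hello my name is Joe.
I am 37 years old.
3121991
My favorite food is avocados.
I have 2 dogs.
123456789101112
EOF

# The explicit list and the range select the same four lines.
grep '[0123456789]' profile.txt
grep '[0-9]' profile.txt
```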

When most other regular expression characters are placed inside of square brackets, they are treated as literal characters (the exceptions are a leading ^, which negates the list, and -, which denotes a range). For example, the . normally matches any one character, but when placed inside square brackets it matches only itself. Try this yourself.
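
One way to try it is sketched below, again with hypothetical contents for profile.txt. The unbracketed period matches every non-empty line, while the bracketed one matches only lines that contain a literal period.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical contents for profile.txt.
cat > profile.txt <<'EOF'
Hello my name is Joe.
I am 37 years old.
3121991
My favorite food is avocados.
I have 2 dogs.
123456789101112
EOF

# . matches any character, so all six lines are printed.
grep '.' profile.txt

# [.] matches only a literal period, so the two number-only lines are left out.
grep '[.]' profile.txt
```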

Match a Repeated Character or Pattern With *

The regular expression character * is used to match zero or more occurrences of a character or pattern preceding it. For example e* would match zero or more occurrences of the letter e. It is also possible to match zero or more occurrences of a list of characters by utilizing the square brackets. The pattern [oe]* used in the following example will match zero or more occurrences of the o character or the e character:
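
A sketch of [oe]* in use follows, with a hypothetical word list in red.txt. The pattern is wrapped in r and d here (an addition for illustration) because a starred item on its own also matches zero occurrences, and therefore matches every line.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical contents for red.txt, one word per line.
printf '%s\n' red reef rot reeed rd rod roof reed root reel read > red.txt

# e* alone matches "zero e characters", so every line is printed.
grep 'e*' red.txt

# r, then zero or more o or e characters, then d
grep 'r[oe]*d' red.txt
# prints:
# red
# reeed
# rd
# rod
# reed
```

Note that rd matches because zero occurrences are allowed, and read does not because the a is neither o nor e.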

Standard Input

If a file name is not given, the grep command will read from standard input, which normally comes from the keyboard with input provided by the user who runs the command. This provides an interactive experience with grep where the user types in the input and grep filters as it goes. Feel free to try it out; just press Ctrl-D when you’re ready to return to the prompt.
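
An interactive session cannot be reproduced on the page, but the same behavior can be sketched by supplying standard input from a pipe or a here-document instead of the keyboard. The input lines below are made up for illustration.

```shell
# Interactive form: run "grep sysadmin" with no file, type some lines,
# and press Ctrl-D to end the input.

# Non-interactive equivalent: standard input comes from a pipe.
printf '%s\n' 'root logged in' 'sysadmin logged in' 'guest logged out' | grep sysadmin
# prints: sysadmin logged in

# Standard input can also come from a here-document.
grep '^guest' <<'EOF'
root logged in
sysadmin logged in
guest logged out
EOF
# prints: guest logged out
```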
