Advanced Filters

In Linux, filters are programs that read text from one or more files or from standard input, select or transform the parts of interest, and write the result to standard output. Admins and developers frequently use filters to scan and filter logs and to edit files on remote servers through the CLI.

grep

The grep program searches a file or files for lines that have a certain pattern. The syntax is:

grep "pattern" file(s)

The name grep derives from the ed (a Unix line editor) command g/re/p, which means globally search for a regular expression and print all lines containing it. A regular expression is plain text (a word, for example), special characters used for pattern matching, or a combination of both.

When we learn more about regular expressions, we can use them to specify complex patterns of text. The simplest use of grep is to look for a pattern consisting of a single word.

grep can be used in a pipe so that only those lines of the input containing a given string are sent to the standard output. But let’s start with an example that reads from files: searching all files in the working directory for a word, say Linux. We’ll use the wildcard * to quickly give grep all filenames in the directory.

grep "Linux" *
ch01:Linux is a flexible and powerful operating system
ch01:When the Linux designers started work, little did
ch05:What can we do with Linux?

When grep searches multiple files, it shows the filename where it finds each matching line of text. Alternatively, if we don’t give grep a filename to read, it reads its standard input; that’s the way all filter programs work:

ls -l | grep "Aug"
-rw-rw-rw-   1 john  doc           11008 Aug  6 14:10 ch02
-rw-rw-rw-   1 john  doc           8515 Aug  6 15:30 ch07
-rw-rw-r--   1 john  doc           2488 Aug 15 10:51 intro
-rw-rw-r--   1 carol doc           1605 Aug 23 07:35 macros

First, the example runs ls -l to list our directory. The standard output of ls -l is piped to grep, which only outputs lines that contain the string Aug (that is, files that were last modified in August). Because the standard output of grep isn’t redirected, those lines go to the terminal screen.

Options with grep

Let us modify the search. The following table lists some of grep’s options.

Option   Description
-v       Print all lines that do not match pattern.
-n       Print the matched line and its line number.
-l       Print only the names of files with matching lines (lowercase letter “L”).
-c       Print only the count of matching lines.
-i       Match either upper- or lowercase.
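As a rough illustration of the options in the table, the sketch below runs each one against a small sample file. The file name notes.txt and its three lines are made up for this example; they are not part of the listings above.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'Linux is flexible\nlinux is free\nUnix came first\n' > "$tmpdir/notes.txt"

# -c: count the lines that contain "Linux" (case-sensitive).
count=$(grep -c "Linux" "$tmpdir/notes.txt")

# -i combined with -c: ignore case, so "Linux" and "linux" both count.
count_nocase=$(grep -ic "Linux" "$tmpdir/notes.txt")

# -n: prefix each matching line with its line number.
numbered=$(grep -n "Unix" "$tmpdir/notes.txt")

# -v: print the lines that do NOT match the pattern.
inverted=$(grep -v -i "linux" "$tmpdir/notes.txt")

# -l: print only the names of files that contain a match.
names=$(cd "$tmpdir" && grep -l "free" *)

printf '%s\n' "$count" "$count_nocase" "$numbered" "$inverted" "$names"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Options can be combined, as -ic is here, which is often handy when scanning logs whose capitalization is inconsistent.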

Note that the regular expression for zero or more characters, .*, is different from the corresponding filename wildcard *. We can’t cover regular expressions in enough depth here. As a rule of thumb, remember that the first argument to grep is a regular expression. Other arguments, if any, are filenames that can use wildcards.

Next, let’s use a regular expression that tells grep to find lines with carol, followed by zero or more other characters (abbreviated in a regular expression as .*), then followed by Aug:

ls -l | grep "carol.*Aug"
-rw-rw-r-- 1 carol doc 1605 Aug 23 07:35 macros
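To make the difference between the regular expression .* and the filename wildcard * more concrete, here is a small sketch. The two input lines are made-up data shaped like the ls -l output above, stored in a variable so the example does not depend on any real directory.

```shell
# Two sample lines shaped like ls -l output (made-up data).
listing='-rw-rw-r-- 1 john doc 2488 Aug 15 10:51 intro
-rw-rw-r-- 1 carol doc 1605 Aug 23 07:35 macros'

# "carol.*Aug": carol, then any run of characters, then Aug.
regex_match=$(printf '%s\n' "$listing" | grep "carol.*Aug")

# "carol*Aug": in a regular expression, * repeats the preceding "l",
# so this asks for "caro", zero or more "l", then "Aug" right after.
# No line has that shape, so the count is 0 (|| true ignores grep's
# non-zero exit status when nothing matches).
wildcard_style=$(printf '%s\n' "$listing" | grep -c "carol*Aug" || true)

printf '%s\n' "$regex_match" "$wildcard_style"
```

Quoting the pattern also keeps the shell from expanding * as a filename wildcard before grep sees it.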

sed

sed (which stands for Stream EDitor) is a simple but powerful computer program used to apply various pre-specified textual transformations to a sequential stream of text data.

It reads input files line by line, edits each line according to rules specified in its simple language (the sed script), and then outputs the line.
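A minimal sketch of this read, edit, output cycle, assuming three made-up input lines and the substitute command s/old/new/ (a standard sed command that is not otherwise covered in this section):

```shell
# Feed three made-up lines to sed on standard input.
# s/cat/dog/ replaces the first "cat" on each line with "dog";
# lines without a match pass through unchanged.
result=$(printf 'one cat\ntwo cats\nno pets\n' | sed 's/cat/dog/')

printf '%s\n' "$result"
```

Every input line appears in the output, edited or not, because sed prints each line after applying the script to it.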

While originally created as a Unix utility by Lee E. McMahon of Bell Labs from 1973 to 1974, sed is now available for virtually every operating system that supports a command line.

Functions

sed is often thought of as a non-interactive text editor. It differs from conventional text editors in that the processing of the two inputs is inverted. Instead of iterating once through a list of edit commands applying each one to the whole text file in memory, sed iterates once through the text file applying the whole list of edit commands to each line.

Because only one line at a time is in memory, sed can process text files with an arbitrarily large number of lines. Some implementations of sed can only process lines of limited length. sed’s command set is modeled after the ed editor, and most commands work similarly in this inverted paradigm.

For example, the command 25d means if this is line 25, then delete (don’t output) it, rather than go to line 25 and delete it as it does in ed. The notable exceptions are the copy and move commands, which span a range of lines and thus don’t have straightforward equivalents in sed.
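The same idea can be tried on a smaller input. The sketch below uses 2d on three made-up lines rather than 25d, so the effect is easy to see:

```shell
# For each input line, sed checks "is this line 2?" and, if so,
# deletes it (does not output it). All other lines are printed.
kept=$(printf 'line1\nline2\nline3\n' | sed '2d')

printf '%s\n' "$kept"
```

The input itself is never modified; sed only decides, line by line, what reaches standard output.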

Instead, sed introduces an extra buffer called the hold space, and additional commands to manipulate it. The ed command to copy line 25 after line 76 (25t76), for example, would be coded as two separate commands in sed (25h; 76G): the first stores line 25 in the hold space, and the second appends the held line after line 76 when that line is reached. (Lowercase g would instead replace line 76 with the held line.)
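Here is a small sketch of the hold space on four made-up lines, using line numbers 1 and 3 in place of 25 and 76. It contrasts the two retrieval commands: G appends the held line after the current line, while g overwrites the current line with it.

```shell
# 1h stores line 1 in the hold space.
# 3G appends the held line after line 3, so line 3 is kept.
appended=$(printf 'a\nb\nc\nd\n' | sed -e '1h' -e '3G')

# 3g replaces line 3 with the held line, so "c" is lost.
replaced=$(printf 'a\nb\nc\nd\n' | sed -e '1h' -e '3g')

printf '%s\n' "$appended"
printf '%s\n' "--"
printf '%s\n' "$replaced"
```

Passing each command with its own -e avoids relying on how a particular sed implementation treats semicolons between commands.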

awk

The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985 a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became generally available with Unix System V Release 3.1.

awk is a programming language designed to search files for patterns and perform actions on the lines that match. awk programs are generally quite small, and are interpreted. This makes it a good language for prototyping.

Input lines to awk: When awk scans an input line, it breaks it down into a number of fields. By default, fields are separated by spaces or tab characters. Fields are numbered beginning at one, and the dollar symbol ($) is used to represent a field.

For instance, the following line in a file:

I like money.

has three fields. They are:

$1 I
$2 like
$3 money.

Field zero ($0) refers to the entire line. awk scans lines from one or more files or from standard input.
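The field numbering can be checked with a short sketch that feeds the sample line to awk on standard input. NF is awk's built-in variable holding the number of fields on the current line; the shell variable names are chosen just for this example.

```shell
line='I like money.'

# NF is the number of fields awk found on the line.
nfields=$(printf '%s\n' "$line" | awk '{ print NF }')

# $1 and $3 are the first and third fields; $0 is the whole line.
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
third=$(printf '%s\n' "$line" | awk '{ print $3 }')
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')

printf '%s\n' "$nfields" "$first" "$third" "$whole"
```

The awk programs are in single quotes so that the shell does not try to expand $1, $3, and $0 itself.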
