Pipes And Filters
This lesson introduces the way we can construct powerful Linux command lines by combining Linux commands. Linux commands are powerful on their own, but when we combine them, we can accomplish complex tasks with ease. We combine Linux commands by using pipes and filters.
Using a Pipe
The symbol | is the Linux pipe symbol used on the command line. It means that the standard output of the command to the left of the pipe is sent as standard input to the command to the right of the pipe. This works much like the > symbol, which redirects the standard output of a command to a file. The pipe is different, however, because it passes the output of a command to another command, not to a file.
Here is an example:
$ cat apple.txt
core
worm seed
jewel
$ cat apple.txt | wc
3 4 21
In this example, at the first shell prompt, we display the contents of the file apple.txt. At the next shell prompt, we again use the cat command on apple.txt, but this time the output goes not to the screen but through a pipe to the wc (word count) command. The wc command then does its job and counts the lines, words, and characters of what it received as input.
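The difference between redirecting to a file and piping to a command can be tried out with a short sketch. It recreates the three-line apple.txt in a temporary directory; the file name copy.txt and the variable names are made up for this illustration.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)

# Recreate the three-line apple.txt used in the example.
printf 'core\nworm seed\njewel\n' > "$workdir/apple.txt"

# Redirection: > sends the output of cat to a file (copy.txt is a made-up name).
cat "$workdir/apple.txt" > "$workdir/copy.txt"
copy_lines=$(wc -l < "$workdir/copy.txt")

# Pipe: | sends the output of cat to another command instead of a file.
piped_lines=$(cat "$workdir/apple.txt" | wc -l)

echo "lines in copy.txt: $copy_lines"
echo "lines counted through the pipe: $piped_lines"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Both counts are the same, because wc receives identical data either way; only the route the data takes differs.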
We can combine many commands with pipes on a single command line. Here’s an example where we count the characters, words, and lines of the apple.txt file, then mail the results to nobody@december.com with the subject line The count.
cat apple.txt | wc | mail -s "The count" nobody@december.com
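The mail example needs a configured mail system, so here is a minimal sketch of a three-stage pipeline that runs anywhere. It assumes the same three-line sample text as apple.txt and uses tr and wc -l, both of which are described later in this lesson; the variable names are illustrative.

```shell
# Same three-line sample text as apple.txt.
text=$(printf 'core\nworm seed\njewel')

# Stage 1: printf prints the text.
# Stage 2: tr turns every space into a newline, so each word is on its own line.
# Stage 3: wc -l counts those lines, which gives the number of words.
word_count=$(printf '%s\n' "$text" | tr ' ' '\n' | wc -l)

echo "words: $word_count"
```

Each command only sees the output of the command before it, so stages can be added or removed without changing the others.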
Using Simple Filters
In Linux and Linux-like operating systems, a filter is a program that gets most of its data from standard input (the main input stream) and writes its main results to standard output (the main output stream). Linux filters are often used as elements of pipelines. The pipe operator ("|") on a command line signifies that the main output of the command to the left is passed as main input to the command on the right.
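To make the definition concrete, here is a small sketch of a home-made filter written as a shell function. The name shout is hypothetical; it simply wraps tr so that whatever arrives on standard input leaves on standard output in upper case.

```shell
# shout is a hypothetical filter: it reads standard input, converts
# lowercase letters to uppercase with tr, and writes to standard output.
shout() {
    tr 'a-z' 'A-Z'
}

# Because shout only uses standard input and output, it can sit in the
# middle of a pipeline between any two other commands.
result=$(printf 'core\nworm seed\n' | shout | head -n 1)

echo "$result"
```

The function never opens a file itself, which is what lets it be combined freely with other filters.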
List of Linux filter programs
Head
Head is a program on Linux and Linux-like systems used to display the first few lines of a text file or piped data. The command-syntax is:
head [options] <file_name>
By default, head will print the first 10 lines of its input to the standard output. The number of lines printed may be changed with a command line option.
The following example shows the first 20 lines of filename:
head -n 20 filename
This displays the first 5 lines of all files starting with foo:
head -n 5 foo*
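A quick way to try head without preparing a file is to feed it generated input. This sketch assumes the seq command (part of GNU coreutils) is available to print the numbers 1 to 25, one per line; the variable names are illustrative.

```shell
# With no option, head keeps the first 10 lines of its input.
default_count=$(seq 1 25 | head | wc -l)

# -n 3 keeps only the first 3 lines.
first_three=$(seq 1 25 | head -n 3)

echo "lines kept by default: $default_count"
echo "$first_three"
```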
Tail
Tail is a program on Linux and Linux-like systems used to display the last few lines of a text file or piped data. The command-syntax is:
tail [options] <file_name>
By default, tail will print the last 10 lines of its input to the standard output. With command line options the number of lines printed and the printing units (lines, blocks or bytes) may be changed. The following example shows the last 20 lines of filename:
tail -n 20 filename
This example shows all lines of filename starting from line 2, that is, everything after the first line:
tail -n +2 filename
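The two ways of counting can be compared on generated input. This sketch assumes seq (GNU coreutils) is available and uses the -n spelling of the options: -n 2 counts from the end of the input, while -n +2 starts output at line 2.

```shell
# Last 2 lines of the numbers 1 to 5.
last_two=$(seq 1 5 | tail -n 2)

# Everything from line 2 onward (only the first line is skipped).
from_second=$(seq 1 5 | tail -n +2)

echo "last two:"
echo "$last_two"
echo "from line 2:"
echo "$from_second"
```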
File monitoring
Tail has a special command line option -f (follow) that allows a file to be monitored. Instead of displaying the last few lines and exiting, tail displays the lines and then monitors the file. As new lines are added to the file by another process, tail updates the display. This is particularly useful for monitoring log files.
The following command will display the last 10 lines of messages and append new lines to the display as new lines are added to messages:
tail -f /var/adm/messages
To interrupt tail while it is monitoring, press Ctrl-C.
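Following a file can be tried safely on a temporary file instead of a real system log. In this sketch a background job plays the part of the other process, and the timeout command (assumed to be available from GNU coreutils) stands in for pressing Ctrl-C after three seconds. The log file and variable names are made up.

```shell
# A temporary file stands in for a real log file.
log=$(mktemp)
echo "first entry" > "$log"

# A background job acts as "another process" that appends a line after 1 second.
( sleep 1; echo "second entry" >> "$log" ) &

# tail -f prints the existing lines, then keeps watching for new ones.
# timeout ends tail after 3 seconds; interactively we would press Ctrl-C.
followed=$(timeout 3 tail -f "$log" || true)

# Wait for the background job and clean up.
wait
rm -f "$log"

echo "$followed"
```

The captured output contains both entries, even though the second one did not exist when tail started.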
More
more is better, isn’t it? Better than what? Better than the cat command. cat dumps its arguments to standard output, which is the terminal (unless we redirect it with > or >>). But what if we are working on our dissertation and would like to read it page by page? We would use a command like:
more dissertation.txt
This will generate a nice page-by-page display of our masterpiece. To get more details, check out the man page by typing at a command prompt:
man more
Here we cover only the most important features of more (i.e. the features that we use). There are three important things we should know:
Typing q while examining a file quits more.
Typing /SEARCHSTRING while examining a file searches for SEARCHSTRING.
more is a great example of a filter.
Less
less is the counterpart of the more command. Both less and more display the contents of a file one screen at a time, waiting for us to press the Spacebar between screens. This lets us read text without it scrolling quickly off the screen. The less utility is generally more flexible and powerful than more, but more is available on all Linux systems while less may not be.
The less command is a pager that allows us to move forward and backward (instead of only forward, as the more pager behaves on some systems) when output is displayed one screen at a time.
To read the contents of a file named textfile in the current directory, enter:
less textfile
The less utility is often used for reading the output of other commands. For example, to read the output of the ls command one screen at a time, enter:
ls -la | less
In both examples, we could substitute more for less with similar results. To exit either less or more, press q.
Wc
In Linux, to get the line, word, or character count of a document, use the wc command. At the Linux shell prompt, enter:
wc filename
Replace filename with the file or files for which we want information. For each file, wc will output three numbers. The first is the line count, the second is the word count, and the third is the character count.
Syntax:
wc [options] <file_name>
To count the characters in a file, use the -c option. Strictly speaking, -c counts bytes, which for plain ASCII text is the same as the number of characters. Here it counts the number of characters in the file abc.txt:
wc -c abc.txt
For example, to find out how many bytes are in the .login file, we could enter:
wc -c .login
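We may also pipe standard output into wc to determine the size of a stream. For example, to find out how many files are in a directory, list them one per line with the -1 (digit one) option and count the lines. The long listing option -l (letter l) would add a "total" line and inflate the count by one:
/bin/ls -1 | wc -l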
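The individual counts can be checked one option at a time. This sketch reuses the three-line sample text from the pipe example and a temporary directory with three empty files; the file names a, b, c and the variable names are made up.

```shell
# Same three-line sample text as apple.txt.
sample=$(printf 'core\nworm seed\njewel')

lines=$(printf '%s\n' "$sample" | wc -l)   # -l counts lines
words=$(printf '%s\n' "$sample" | wc -w)   # -w counts words
bytes=$(printf '%s\n' "$sample" | wc -c)   # -c counts bytes

# Count the entries of a directory: ls -1 prints one name per line.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
entries=$(ls -1 "$dir" | wc -l)
rm -rf "$dir"

echo "lines=$lines words=$words bytes=$bytes entries=$entries"
```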
Sort
The sort filter arranges the lines of its input in ASCII order. As the name suggests, the sort command can be used for sorting the contents of a file. While sorting, the sort command starts its comparison with the first character of each line. If the first characters of two lines are the same, the second characters are compared, and so on. In ASCII order, spaces and tabs sort first, followed by most punctuation marks, then digits, uppercase letters, and lowercase letters.
Syntax:
sort [options] <file_name>
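The character-by-character comparison is easiest to see on a few mixed lines. This is a small sketch; it sets LC_ALL=C so that sort uses plain ASCII order regardless of the system's language settings, and it also shows the -n (numeric) and -r (reverse) options. The sample data and variable names are made up.

```shell
# ASCII order: digits come before uppercase, uppercase before lowercase.
# "10" sorts before "9" because only the first characters, 1 and 9, are compared.
ascii_sorted=$(printf 'pear\nApple\n10\n9\n' | LC_ALL=C sort)

# -n compares whole numbers by value instead of character by character.
numeric_sorted=$(printf '10\n9\n100\n' | sort -n)

# -r reverses the order.
reversed=$(printf 'b\na\nc\n' | LC_ALL=C sort -r)

echo "$ascii_sorted"
echo "$numeric_sorted"
echo "$reversed"
```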
Cut
The cut filter is useful when a file has to be queried to display selected fields. It cuts, or picks out, a given number of characters or fields from each line of the specified file. By default, cut assumes that a tab separates the fields.
Syntax:
cut [options] <file_name>
The -c option:
The -c option cuts the specified character columns from each line of a file. For example:
cut -c 2,4 filename
As a result, the second and fourth characters (columns) of each line in the given file are displayed.
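Both ways of cutting, by character column and by field, can be tried on short strings. This sketch uses made-up sample text; -f selects fields and -d sets a field separator other than the default tab.

```shell
# Characters 2 and 4 of the line.
chars=$(echo 'abcdef' | cut -c 2,4)

# A range of characters: 2 through 4.
range=$(echo 'abcdef' | cut -c 2-4)

# Second field of a tab-separated line (tab is the default separator).
field=$(printf 'apple\tred\tsweet\n' | cut -f 2)

# First and third fields of a colon-separated line.
custom=$(echo 'root:x:0:0' | cut -d ':' -f 1,3)

echo "$chars $range $field $custom"
```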
Tr
Tr (abbreviated from translate or transliterate) is a command in Linux-like operating systems. When executed, the program reads from the standard input and writes to the standard output. It takes as parameters two sets of characters, and replaces occurrences of the characters in the first set with the corresponding characters from the second set. The following commands, for instance, shift each letter of the input back by one position in the alphabet.
echo "ibm 9000" >computer.txt
tr a-z za-y <computer.txt
hal 9000
Note: tr does not accept file names as arguments, so whenever we use tr on a file we have to supply the file through the input redirection operator (<) or through a pipe.
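Since tr only reads standard input, a pipe works just as well as input redirection. This sketch repeats the letter shift through a pipe and adds two other common uses, case conversion and deletion with -d; the variable names are illustrative.

```shell
# The same shift as above, fed through a pipe instead of a file.
shifted=$(echo 'ibm 9000' | tr 'a-z' 'za-y')

# Convert lowercase letters to uppercase.
upper=$(echo 'hal 9000' | tr 'a-z' 'A-Z')

# -d deletes every character in the set (here, the space).
squashed=$(echo 'hal 9000' | tr -d ' ')

echo "$shifted"
echo "$upper"
echo "$squashed"
```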
Uniq
Uniq is a Linux utility which, when fed a text file, outputs the file with adjacent identical lines collapsed to one. It is a kind of filter program. Typically it is used after sort. It can also output only the duplicate lines (with the -d option), or add the number of occurrences of each line (with the -c option).
An example: To see the list of lines in a file, sorted by the number of times each occurs:
sort file | uniq -c | sort -n
Using uniq like this is common when building pipelines in shell scripts.
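The counting pipeline can be tried on a handful of made-up lines. In this sketch a final sed step, which is not part of the pipeline above, strips the leading padding that uniq -c puts before each count, so the report is easier to read.

```shell
# sort groups identical lines together, uniq -c counts each group,
# sort -n orders the groups by count, and sed removes leading spaces.
report=$(printf 'pear\napple\npear\nfig\napple\npear\n' \
    | sort | uniq -c | sort -n | sed 's/^ *//')

echo "$report"
```

The first sort matters: uniq only collapses lines that are next to each other, so unsorted input would be counted in several separate groups.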
Switches
-u
Print only lines which are not repeated in the original file.
-d
Print one copy only of each repeated line in the input file.
-c
Generate an output report in default style except that each line is preceded by a count of the number of times it occurred. If this option is specified, the -u and -d options are ignored if either or both are also present.
-i
Ignore case differences when comparing lines.
-s
Skips a number of characters in a line.
-w
Specifies the number of characters to compare in lines after any characters and fields have been skipped.
--help
Displays a help message.
--version
Displays version number on stdout and exits.
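A few of the switches can be compared side by side on the same input. This is a minimal sketch with made-up sample lines, assuming a uniq that supports -i (as GNU uniq does); the input is sorted first so that identical lines are adjacent.

```shell
# Sorted sample input: apple, apple, fig, pear, pear, pear.
sorted=$(printf 'pear\napple\npear\nfig\napple\npear\n' | sort)

# -d prints one copy of each line that is repeated.
repeated=$(printf '%s\n' "$sorted" | uniq -d)

# -u prints only the lines that are not repeated.
single=$(printf '%s\n' "$sorted" | uniq -u)

# -i ignores case, so the three spellings collapse to the first one.
folded=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)

echo "repeated: $repeated"
echo "single: $single"
echo "folded: $folded"
```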