Filters
A filter is a Linux command that performs some manipulation of the text of a file. A filter reads from a file, or else from its standard input (stdin), and writes to its standard output (stdout). A pipe (written |) connects the stdout of one process to the stdin of another process.
Pipeline And Text Manipulation
Linux commands alone are powerful, but when you combine them together, you can accomplish complex tasks with ease. The way you combine Linux commands is through using pipes and filters.
Pipeline
Linux allows you to connect processes, by letting the standard output of one process feed into the standard input of another process. That mechanism is called a pipe. Connecting simple processes in a pipeline allows you to perform complex tasks without writing complex programs.
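For example, the following pipeline (a simple sketch; both commands are covered below) feeds the output of ls into wc to count the entries in the current directory:
ls | wc -l # count the entries in the current directory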
Text Manipulation
Many of the tasks a Systems Administrator will perform involve the manipulation of textual information. Some examples include manipulating system log files to generate reports and modifying shell programs. Manipulating textual information is something which UNIX is quite good at and provides a number of tools which make tasks like this quite simple, once you understand how to use the tools.
Filters
Filters can be classified into:
- Simple filters
- Advanced filters
Simple filters
Filters like more, less, head, tail, wc (word count), tr (translate), tee, cut, sort, uniq, and cmp come under this category. We will discuss these in detail.
more
more is a filter for paging through text one screenful at a time.
Example:
ls -l | more
The above command shows the long listing of files one screen at a time. Pressing the space bar scrolls forward one screen, pressing the Return key scrolls forward one line, and pressing q quits.
less
The command less lists the output (e.g., specified files) on the terminal screen by screen like the command more, but in addition allows backward movement in the file (press b to go back one full screen) as well as forward movement. You can also move a set number of lines instead of a whole page.
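For example, to page through a long directory listing (a sketch; any long output works the same way):
ls -l /usr/bin | less # press space to go forward, b to go back, q to quit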
Note:
The more and less filters work in much the same way, but less works only on Linux, whereas more works in all flavors of UNIX including Linux.
head
head is a program on Linux and other Unix-like systems used to display the first few lines of a text file or piped data.
Syntax:
head [options] <file_name>
Example: The following example shows the first 20 lines of filename
head -n 20 filename
Note:
By default, head displays the first 10 lines if no option is given.
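head also works on piped data. For instance (a quick sketch):
ps aux | head -n 5 # show the column header and the first four processes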
tail
tail is a program on Linux and other Unix-like systems used to display the last few lines of a text file or piped data.
Syntax:
tail [options] <file_name>
Example: The following example shows the last 15 lines of filename
tail -n 15 filename
Note:
By default, tail displays the last 10 lines if no option is given.
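Like head, tail accepts piped data. For instance (a quick sketch):
ls -lt | tail -n 5 # show the five least recently modified entries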
wc
wc counts the number of lines, words, and bytes in the files specified by its file arguments. The program reads either standard input or a list of files and generates one or more of the following statistics:
- Number of bytes.
- Number of words.
- Number of lines (specifically, the number of newline characters).
Examples:
wc *.txt # counts the lines, words, bytes in all txt files
wc -l /etc/passwd # count the number of users in your system
wc -l <filename> # print the line count
wc -c <filename> # print the byte count
wc -m <filename> # print the character count
wc -L <filename> # print the length of longest line
wc -w <filename> # print the word count
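wc is often placed at the end of a pipeline. For example (a sketch):
who | wc -l # count the users currently logged in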
tr
tr (abbreviated from translate or transliterate) is a command in Unix-like operating systems.
When executed, the program reads from the standard input and writes to the standard output. It takes as parameters two sets of characters, and replaces occurrences of the characters in the first set with the corresponding elements from the other set.
Syntax:
tr "[set1]" "[set2]" < filename
Example: Creates file2 as a copy of file1, with all uppercase letters translated to the corresponding lowercase ones
tr "A-Z" "a-z" < file1 > file2
cut
cut is a Linux command-line utility used to extract sections from each line of input, usually from a file. Extraction of line segments can typically be done by bytes (-b), characters (-c), or fields (-f) separated by a delimiter (-d; the tab character by default).
Examples: Assuming a file named file containing the lines
foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:theta:iota:kappa:lambda:mu
- To output the fourth through tenth characters of each line,
cut -c 4-10 file
# output
:bar:ba
:two:th
ha:beta
- To output the fifth field through the end of each line, using the colon character as the field delimiter
cut -d : -f 5- file
# output
quux
five:six:seven
epsilon:zeta:eta:theta:iota:kappa:lambda:mu
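A practical use of cut is pulling columns out of system files. For example (a sketch that assumes the standard /etc/passwd layout, where field 1 is the user name and field 7 is the login shell):
cut -d : -f 1,7 /etc/passwd # print each user name and login shell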
sort
sort is a standard Linux command-line program that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is used as the default field separator. The -r flag will reverse the sort order.
Examples:
- Sort the current directory by file size
ls -s | sort -n
# output
96 Nov1.txt
128 _arch_backup.lst
128 _arch_backup.lst.tmp
1708 NMON
- Sort a file in alpha order
cat phonebook
# output
Smith, Brett 555-4321
Doe, John 555-1234
Doe, Jane 555-3214
Avery, Cory 555-4321
Fogarty, Suzie 555-2314
sort phonebook
# output
Avery, Cory 555-4321
Doe, Jane 555-3214
Doe, John 555-1234
Fogarty, Suzie 555-2314
Smith, Brett 555-4321
- Sort by number
du /bin/* | sort -n
# output
4 /bin/domainname
24 /bin/ls
102 /bin/sh
304 /bin/csh
The -n option makes the program sort according to numerical value.
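To reverse any of these orderings, add the -r flag mentioned earlier. For example, sorting the phonebook file from above in reverse alphabetical order:
sort -r phonebook
# output
Smith, Brett 555-4321
Fogarty, Suzie 555-2314
Doe, John 555-1234
Doe, Jane 555-3214
Avery, Cory 555-4321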
uniq
uniq is a Linux utility which, when fed a text file, outputs the file with adjacent identical lines collapsed to one. It is a kind of filter program. Typically, it is used after sort. It can also output only the duplicate lines (with the -d option), or add the number of occurrences of each line (with the -c option).
Example: suppose you have a file called foo that looks like this:
davel
davel
davel
jiffy
jones
jiffy
mark
mark
mark
chuck
bonnie
chuck
You could run uniq on it:
$ uniq foo
# output
davel
jiffy
jones
jiffy
mark
chuck
bonnie
chuck
Notice that there are still two jiffy lines and two chuck lines. This is because the duplicates were not adjacent. To get a truly unique list you have to make sure the stream is sorted:
sort foo | uniq
# output
bonnie
chuck
davel
jiffy
jones
mark
That gives you a truly unique list. However, it's also a useless use of uniq, since sort(1) has an argument, -u, that performs this very common operation:
sort -u foo
# output
bonnie
chuck
davel
jiffy
jones
mark
uniq has other arguments that let it do more interesting mutilations on its input:
- -d tells uniq to eliminate all lines with only a single occurrence (delete unique lines), and print just one copy of repeated lines
$ sort foo | uniq -d
# output
chuck
davel
jiffy
mark
- -u tells uniq to eliminate all duplicated lines and show only those which appear once (only the unique lines)
$ sort foo | uniq -u
# output
bonnie
jones
- -c tells uniq to count the occurrences of each line
sort foo | uniq -c
# output
1 bonnie
2 chuck
3 davel
2 jiffy
1 jones
3 mark
You will often pipe the output of uniq -c to sort -n (sort in numeric order) to get the list in order of frequency:
$ sort foo | uniq -c | sort -n
# output
1 bonnie
1 jones
2 chuck
2 jiffy
3 davel
3 mark
paste
paste is a Linux command line utility used to join files horizontally (parallel merging) by outputting lines consisting of the sequentially corresponding lines of each file specified, separated by tabs, to the standard output. It is effectively the horizontal equivalent of the cat utility, which joins files vertically. For example, to paste several columns of data together into the file www from the files who, where, and when, run:
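paste who where when > www # merge the three files line by line into www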
If the files contain,
who | where | when |
---|---|---|
Sam | Detroit | January 3 |
Dave | Edgewood | February 4 |
Sue | Tampa | March 19 |
This creates the file named www, containing:
Sam Detroit January 3
Dave Edgewood February 4
Sue Tampa March 19