A filter is a Linux command that does some manipulation of the text of a file. A filter reads a file, or else its standard input (stdin), and writes to its standard output (stdout). A pipe (written |) connects the stdout of one process to the stdin of another process.
Pipeline And Text Manipulation
Linux commands alone are powerful, but when you combine them together, you can accomplish complex tasks with ease. The way you combine Linux commands is through using pipes and filters.
Linux allows you to connect processes, by letting the standard output of one process feed into the standard input of another process. That mechanism is called a pipe. Connecting simple processes in a pipeline allows you to perform complex tasks without writing complex programs.
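For example, a very short pipeline can answer a question that neither command answers on its own. Here the output of who (one line per logged-in user) is fed into wc -l, which counts the lines:

who | wc -l     # count the number of users currently logged in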
Many of the tasks a Systems Administrator performs involve the manipulation of textual information. Some examples include manipulating system log files to generate reports and modifying shell programs. Manipulating textual information is something UNIX is quite good at, and it provides a number of tools that make tasks like this quite simple once you understand how to use them.
Filters can be classified into
- Simple filters
- Advanced filters
Simple filters include more, less, head, tail, wc (word count), tr (translate), tee, cut, sort, uniq, and cmp. We will discuss these in detail.
more is a filter for paging through text one screenful at a time.
ls -l | more
The above command shows a long listing of the files one screenful at a time. Pressing the space bar advances one screenful, pressing the return key advances one line, and pressing q quits.
less displays its output (e.g., the specified files) on the terminal screen by screen like the command more, but in addition it allows backward movement in the file (press b to go back one full screen) as well as forward movement. You can also move a set number of lines instead of a whole page.
The less filter works in much the same way as more, but less is available only on Linux systems, whereas more works in all flavors of UNIX, including Linux.
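For example, less can be given a file name directly or placed at the end of a pipeline (the pipeline below is only an illustration):

ps aux | less         # page through the full process list
less /etc/services    # page through a file; press b to go back, q to quit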
head is a program on Unix and Unix-like systems used to display the first few lines of a text file or piped data.
head [options] <file_name>
Example: The following example shows the first 20 lines of filename
head -n 20 filename
head displays the first 10 lines if no option is given.
tail is a program on Unix and Unix-like systems used to display the last few lines of a text file or piped data.
tail [options] <file_name>
Example: The following example shows the last 15 lines of filename
tail -n 15 filename
tail displays the last 10 lines if no option is given.
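Because head and tail both read standard input, they can be combined in a pipeline to extract a range of lines. For instance, the following (using the same illustrative filename) prints lines 11 through 15:

head -n 15 filename | tail -n 5    # take the first 15 lines, then keep only the last 5 of them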
wc counts the number of lines, words, and bytes in the files specified by the File parameter.
The program reads either standard input or a list of files and generates one or more of the following statistics -
- Number of bytes.
- Number of words.
- Number of lines (specifically, the number of newline characters).
wc *.txt           # counts the lines, words, and bytes in all txt files
wc -l /etc/passwd  # count the number of users on your system
wc -l <filename>   # print the line count
wc -c <filename>   # print the byte count
wc -m <filename>   # print the character count
wc -L <filename>   # print the length of the longest line
wc -w <filename>   # print the word count
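wc is also commonly placed at the end of a pipeline. For example, the following counts the entries in the /etc directory:

ls /etc | wc -l    # count how many entries /etc contains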
tr (abbreviated from translate or transliterate) is a command in Linux-like operating systems.
When executed, the program reads from the standard input and writes to the standard output. It takes as parameters two sets of characters, and replaces occurrences of the characters in the first set with the corresponding elements from the other set.
tr "[set1]" "[set2]" < filename
Example: Creates file2 as a copy of file1, with all uppercase letters translated to the corresponding lowercase ones
tr "A-Z" "a-z" < file1 > file2
cut is a Linux command line utility which is used to extract sections from each line of input, usually from a file. Extraction of line segments can be done by bytes (-b), characters (-c), or fields (-f) separated by a delimiter (-d; the tab character by default).
Examples: Assuming a file named file containing the lines
foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
- To output the fourth through tenth characters of each line,
cut -c 4-10 file
# output:
:bar:ba
:two:th
ha:beta
- To output the fifth field through the end of the line of each line using the colon character as the field delimiter
cut -d : -f 5- file
# output:
quux
five:six:seven
epsilon:zeta:eta:teta:iota:kappa:lambda:mu
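A common practical use of cut is to pull a single field out of a colon-delimited system file. For example, to list just the user names from /etc/passwd:

cut -d : -f 1 /etc/passwd    # print only the first field (the user name) of each line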
sort is a standard Linux command line program that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is used as the default field separator. The -r flag will reverse the sort order.
- Sort the current directory by file size
ls -s | sort -n
# output:
  96 Nov1.txt
 128 _arch_backup.lst
 128 _arch_backup.lst.tmp
1708 NMON
- Sort a file in alpha order
cat phonebook
# output:
Smith, Brett     555-4321
Doe, John        555-1234
Doe, Jane        555-3214
Avery, Cory      555-4321
Fogarty, Suzie   555-2314
sort phonebook
# output:
Avery, Cory      555-4321
Doe, Jane        555-3214
Doe, John        555-1234
Fogarty, Suzie   555-2314
Smith, Brett     555-4321
- Sort by number
du /bin/* | sort -n
# output:
  4 /bin/domainname
 24 /bin/ls
102 /bin/sh
304 /bin/csh
The -n option makes the program sort according to numerical value.
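As noted above, the -r flag reverses the sort order, so the same examples can be turned around. For instance:

sort -r phonebook       # reverse alphabetical order
du /bin/* | sort -nr    # numeric sort, largest entries first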
uniq is a Linux utility which, when fed a text file, outputs the file with adjacent identical lines collapsed to one. It is a kind of filter program. Typically, it is used after sort. It can also output only the duplicate lines (with the -d option), or add the number of occurrences of each line (with the -c option).
Examples: suppose you have a file called foo that looks like this:
davel
davel
davel
jiffy
jones
jiffy
mark
mark
mark
chuck
bonnie
chuck
You could run uniq on it
$ uniq foo
# output:
davel
jiffy
jones
jiffy
mark
chuck
bonnie
chuck
Notice that there are still two jiffy lines and two chuck lines. This is because the duplicates were not adjacent. To get a true unique list you have to make sure the stream is sorted:
sort foo | uniq
# output:
bonnie
chuck
davel
jiffy
jones
mark
That gives you a truly unique list. However, it is also a useless use of uniq, since sort(1) has an option, -u, to do this very common operation.
sort -u foo
# output:
bonnie
chuck
davel
jiffy
jones
mark
uniq has other arguments that let it do more interesting mutilations on its input
- -d tells uniq to eliminate all lines with only a single occurrence (delete unique lines), and print just one copy of repeated lines
$ sort foo | uniq -d
# output:
chuck
davel
jiffy
mark
- -u tells uniq to eliminate all duplicated lines and show only those which appear once (only the unique lines)
$ sort foo | uniq -u
# output:
bonnie
jones
- -c tells uniq to count the occurrences of each line
sort foo | uniq -c
# output:
      1 bonnie
      2 chuck
      3 davel
      2 jiffy
      1 jones
      3 mark
You will often pipe the output of uniq -c to sort -n (sort in numeric order) to get the list in order of frequency:
$ sort foo | uniq -c | sort -n
# output:
      1 bonnie
      1 jones
      2 chuck
      2 jiffy
      3 davel
      3 mark
paste is a Linux command line utility which is used to join files horizontally (parallel merging) by outputting lines consisting of the sequentially corresponding lines of each file specified, separated by tabs, to the standard output. It is effectively the horizontal equivalent of the cat utility, which operates on the vertical plane of two or more files. For example, to paste several columns of data together into the file www from the files who, where, and when:

paste who where when > www
If who contains the names Sam, Dave, and Sue, where contains Detroit, Edgewood, and Tampa, and when contains January 3, February 4, and March 19 (one entry per line in each file), this creates the file named www containing
Sam     Detroit     January 3
Dave    Edgewood    February 4
Sue     Tampa       March 19