Filters

A filter is a Linux command that performs some manipulation of the text of a file. A filter reads from a file or, failing that, from its standard input (stdin), and writes to its standard output (stdout). A pipe (written |) connects the stdout of one process to the stdin of another process.
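
For example, wc can count lines either from a named file or from whatever arrives on its standard input through a pipe (/etc/passwd is used here only as a convenient example file):

wc -l /etc/passwd        # reads the named file
cat /etc/passwd | wc -l  # reads its standard input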

Pipeline And Text Manipulation

Linux commands are powerful on their own, but when you combine them, you can accomplish complex tasks with ease. You combine Linux commands using pipes and filters.

Pipeline

Linux allows you to connect processes by letting the standard output of one process feed into the standard input of another process. That mechanism is called a pipe. Connecting simple processes in a pipeline allows you to perform complex tasks without writing complex programs.
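
For example, the following pipeline connects three simple commands to count how many entries in the current directory are themselves directories (a small sketch; the result depends on your directory):

ls -l | grep "^d" | wc -l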

Text Manipulation

Many of the tasks a Systems Administrator will perform involve the manipulation of textual information. Some examples include manipulating system log files to generate reports and modifying shell programs. Manipulating textual information is something UNIX is quite good at: it provides a number of tools that make tasks like this quite simple, once you understand how to use them.

Filters

Filters can be classified into

  • Simple filters
  • Advanced filters

Simple filters

Filters like more, less, head, tail, wc (word count), tr (translate), tee, cut, sort, uniq, paste, and cmp come under this category. We will discuss most of these in detail.

more

more is a filter for paging through text one screenful at a time.

Example:

ls -l | more

The above command shows the long listing of files one screen at a time. Pressing the space bar advances the display one screen, pressing Return advances it one line, and pressing q quits.

less

The command less displays its input (e.g., the specified files) on the terminal one screen at a time, like the command more, but in addition allows backward movement in the file (press b to go back one full screen) as well as forward movement. You can also move a set number of lines instead of a whole page.
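
Example (any command with long output works here; /etc is simply a directory with many entries):

ls -l /etc | less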

Note:

more and less work in much the same way. The difference is availability: more is found on all flavors of UNIX including Linux, whereas less is a newer utility that may not be installed by default on some older UNIX systems.

head

head is a program on Linux and other Unix-like systems used to display the first few lines of a text file or piped data.

Syntax:

head [options] <file_name>

Example: The following example shows the first 20 lines of filename

head -n 20 filename

Note:

By default, head displays the first 10 lines if no option is given.
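
head also works on piped data; for example, the following lists only the five most recently modified entries in the current directory (a sketch; the output depends on your directory):

ls -t | head -n 5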

tail

tail is a program on Linux and other Unix-like systems used to display the last few lines of a text file or piped data.

Syntax:

tail [options] <file_name>

Example: The following example shows the last 15 lines of filename

tail -n 15 filename

Note:

By default, tail displays the last 10 lines if no option is given.
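
head and tail combine naturally to extract a range of lines; for example, this prints lines 16 through 20 of filename (assuming the file has at least 20 lines):

head -n 20 filename | tail -n 5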

wc

wc counts the number of lines, words, and bytes in the files given as arguments.

The program reads either standard input or a list of files and generates one or more of the following statistics:

  • Number of bytes.
  • Number of words.
  • Number of lines (specifically, the number of newline characters).

Examples:

wc *.txt          # counts the lines, words, and bytes in all .txt files
wc -l /etc/passwd # count the number of users on your system
wc -l <filename>  # print the line count
wc -c <filename>  # print the byte count
wc -m <filename>  # print the character count
wc -L <filename>  # print the length of the longest line
wc -w <filename>  # print the word count
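
wc also appears frequently at the end of a pipeline; for example, to count the entries in the current directory (a sketch; the number depends on your directory):

ls | wc -l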

tr

tr (abbreviated from translate or transliterate) is a command on Linux and other Unix-like operating systems.

When executed, the program reads from the standard input and writes to the standard output. It takes as parameters two sets of characters, and replaces occurrences of the characters in the first set with the corresponding elements from the other set.

Syntax:

tr "[set1]" "[set2]" < filename

Example: Creates file2 as a copy of file1, with all uppercase letters translated to the corresponding lowercase ones

tr "A-Z" "a-z" < file1 > file2

cut

cut is a Linux command line utility used to extract sections from each line of its input, usually from a file. Extraction of line segments can be done by bytes (-b), characters (-c), or fields (-f) separated by a delimiter (-d, the tab character by default).

Examples: Assuming a file named file containing the lines

foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
  • To output the fourth through tenth characters of each line,
cut -c 4-10 file
# output
:bar:ba
:two:th
ha:beta
  • To output the fifth field through the end of each line, using the colon character as the field delimiter
cut -d : -f 5- file
# output
quux
five:six:seven
epsilon:zeta:eta:teta:iota:kappa:lambda:mu
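
A common practical use of cut is extracting a single field from a colon-delimited system file; for example, to list just the usernames from /etc/passwd:

cut -d : -f 1 /etc/passwd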

sort

sort is a standard Linux command line program that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key, and blank space is used as the default field separator. The -r flag reverses the sort order.

Examples:

  • Sort the current directory by file size
ls -s | sort -n
# output
96 Nov1.txt
128 _arch_backup.lst
128 _arch_backup.lst.tmp
1708 NMON
  • Sort a file in alpha order
cat phonebook
# output
Smith, Brett     555-4321
Doe, John        555-1234
Doe, Jane        555-3214
Avery, Cory      555-4321
Fogarty, Suzie   555-2314
sort phonebook
# output
Avery, Cory      555-4321
Doe, Jane        555-3214
Doe, John        555-1234
Fogarty, Suzie   555-2314
  • Sort by number
du /bin/* | sort -n
# output
4       /bin/domainname
24      /bin/ls
102     /bin/sh
304     /bin/csh

The -n option makes the program sort according to numerical value.
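
Sort keys and reverse order can be combined with the examples above; for instance, using the same phonebook file (a sketch of the -k and -r options):

sort -k 2 phonebook   # sort by the second whitespace-separated field (the first name)
sort -r phonebook     # reverse the default alphabetical order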

uniq

uniq is a Linux utility which, when fed a text file, outputs the file with adjacent identical lines collapsed to one. It is a kind of filter program. Typically, it is used after sort. It can also output only the duplicate lines (with the -d option), or add the number of occurrences of each line (with the -c option).

Example: if you have a file called foo that looks like this,

davel
davel
davel
jiffy
jones
jiffy
mark
mark
mark
chuck
bonnie
chuck

You could run uniq on it

$ uniq foo
# output
davel
jiffy
jones
jiffy
mark
chuck
bonnie
chuck

Notice that there are still two jiffy lines and two chuck lines. This is because the duplicates were not adjacent. To get a truly unique list, you have to make sure the stream is sorted:

sort foo | uniq
# output
bonnie
chuck
davel
jiffy
jones
mark

That gives you a truly unique list. However, it's also a useless use of uniq, since sort(1) has an argument, -u, to do this very common operation.

sort -u foo
# output
bonnie
chuck
davel
jiffy
jones
mark

uniq has other arguments that let it do more interesting mutilations on its input

  • -d tells uniq to eliminate all lines with only a single occurrence (delete unique lines), and print just one copy of repeated lines
$ sort foo | uniq -d
# output
chuck
davel
jiffy
mark
  • -u tells uniq to eliminate all duplicated lines and show only those which appear once (only the unique lines)
$ sort foo | uniq -u
# output
bonnie
jones
  • -c tells uniq to count the occurrences of each line
sort foo | uniq -c
# output
1 bonnie
2 chuck
3 davel
2 jiffy
1 jones
3 mark

You will often pipe the output of uniq -c to sort -n (sort in numeric order) to get the list in order of frequency:

$ sort foo | uniq -c | sort -n
# output
1 bonnie
1 jones
2 chuck
2 jiffy
3 davel
3 mark

paste

paste is a Linux command line utility used to join files horizontally (parallel merging). It outputs lines consisting of the sequentially corresponding lines of each file specified, separated by tabs, to the standard output. It is effectively the horizontal equivalent of the cat command, which concatenates two or more files vertically.

Example: paste several columns of data together into the file www from the files who, where, and when.

If the three files contain (shown side by side, one column per file)

who            where           when
Sam            Detroit         January 3
Dave           Edgewood        February 4
Sue            Tampa           March 19

then the command

paste who where when > www

creates the file named www containing

Sam            Detroit         January 3
Dave           Edgewood        February 4
Sue            Tampa           March 19
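
The tab separator can be changed with paste's -d option; for instance, a comma-separated version of the same merge (assuming the same who, where, and when files) would be:

paste -d ',' who where when
# output
Sam,Detroit,January 3
Dave,Edgewood,February 4
Sue,Tampa,March 19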
