The solution is to use data pipes (aka pipelines). A pipe redirects the first program's standard output to the second program's standard input, and it is denoted by a vertical bar (|):

$ first | second
For instance, suppose that first generates some system statistics, such as system uptime, CPU use, number of users logged in, and so on. This output might be lengthy, so you want to trim it a bit. You might therefore use second, which could be a script or command that echoes from its standard input only the information in which you're interested. (The grep command, described in “Using grep,” is often used in this role.)
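As a minimal sketch of this pattern, the following defines a hypothetical shell function named first and uses grep in the role of second. The statistics are invented for illustration and are not real system output.

```shell
# Hypothetical stand-in for "first": emit a few lines of made-up statistics.
first() {
    printf '%s\n' 'uptime: 12 days' 'cpu use: 7%' 'users logged in: 3'
}

# grep plays the role of "second": it passes through only the matching line.
first | grep 'users'
# prints: users logged in: 3
```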
Pipes can be used in sequences of arbitrary length:
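For example, the following sketch strings several commands together. The word list and the particular choice of sort, uniq, and head are illustrative assumptions.

```shell
# Find the most frequent word in a short list.
# sort groups identical lines, uniq -c counts each group,
# sort -rn orders the counts from highest to lowest, and head keeps the top line.
printf '%s\n' pipe tee pipe xargs pipe tee | sort | uniq -c | sort -rn | head -n 1
```

The last line of output reports a count of 3 for the word pipe; the exact amount of leading whitespace depends on the uniq implementation.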
Another redirection tool often used with pipes is the tee command. This command copies its standard input to standard output and to as many files as you specify. Typically, tee is used in conjunction with data pipes so that a program's output can be both stored and viewed immediately. For instance, to view and store the output of the echo $PATH command, you might type this:
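A sketch of that command follows. The scratch directory is an assumption added here so that the example does not leave path.txt in your current directory.

```shell
# Work in a scratch directory (an illustrative precaution, not part of the technique).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Display the value of PATH on standard output and save a copy in path.txt.
echo $PATH | tee path.txt
```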
The results of the command are not only displayed on STDOUT but also written to the path.txt file by the tee command. Ordinarily, tee overwrites any files whose names you specify. If you want to append data to these files, pass the -a option to tee.
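The difference between overwriting and appending can be sketched as follows. The file name log.txt and the scratch directory are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo 'first run' | tee log.txt       # creates log.txt
echo 'second run' | tee log.txt      # overwrites: log.txt holds only "second run"
echo 'third run' | tee -a log.txt    # appends: log.txt now holds two lines
```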
Generating Command Lines
Sometimes you'll find yourself needing to conduct an unusual operation on your Linux server. For instance, suppose you want to remove every file in a directory tree that belongs to a certain user. With a large directory tree, this task can be daunting!
The usual file-deletion command, rm (described in more detail in Chapter 4), doesn't provide an option to search for and delete every file that matches a specific criterion. One command that can do the search portion is find (also described in more detail in Chapter 4). This command lists all of the files that match the criteria you provide. If you could use the output of find to build a series of rm command lines, the task would be solved. This is precisely the purpose of the xargs command.
The xargs command builds a command from its standard input. The basic syntax for this command is as follows:

xargs [options] [command [initial-arguments]]
The command is the command you want to execute, and initial-arguments is a list of arguments you want to pass to the command. The options are xargs options; they aren't passed to command. When you run xargs, it reads words from standard input and appends them to the argument list for command. By default, it places as many words as will fit on each command line, so command may run only once for a short list and several times for a very long one; the -n 1 option forces one run per word. You can supply several initial arguments; enclose in quotation marks any argument that contains spaces or other characters that are special to the shell.
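A harmless way to see how xargs assembles command lines is to use echo as the command. In this sketch, the word list and the got: label are illustrative; -n is a standard xargs option that limits the number of input words used per command line.

```shell
# One echo invocation receives all three words as arguments.
printf 'one two three\n' | xargs echo got:
# prints: got: one two three

# With -n 1, xargs runs echo once per input word, producing three lines.
printf 'one two three\n' | xargs -n 1 echo got:
```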
For instance, consider the task of deleting several files that belong to a particular user. You can do this by piping the output of find to xargs, which then calls rm:

# find / -user Christine | xargs -d "\n" rm
The first part of this command (find / -user Christine) finds all of the files in the root directory tree (/) and its subdirectories that belong to user Christine. (Since you are looking through the entire directory tree, you need superuser privileges for this to work properly.) This list is then piped to xargs, which appends the filenames to the argument list of rm and runs rm as many times as needed to process them all. Problems can arise if filenames contain spaces because by default xargs uses both spaces and newlines as item delimiters. The -d "\n" option (a GNU xargs extension) tells xargs to use only newlines as delimiters, thus avoiding this problem in this context. (The find command separates each found filename with a newline.)
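To try the technique safely, the following sketch confines the search to a scratch directory and matches files owned by the current user instead of searching / as root. The directory, the file names, and the use of id -u to supply the owner are assumptions made for illustration; the -d option requires GNU xargs.

```shell
# Build a scratch directory holding two files, one with a space in its name.
sandbox=$(mktemp -d)
touch "$sandbox/report.txt" "$sandbox/old notes.txt"

# List the regular files owned by the current user, one per line, and let
# xargs hand them to rm. -d "\n" keeps "old notes.txt" as a single argument.
find "$sandbox" -type f -user "$(id -u)" | xargs -d "\n" rm

ls -A "$sandbox"    # prints nothing: both files are gone
```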
It is important to exercise caution when using the rm command with superuser privileges. This is especially true when piping the files to delete into the rm command. You could easily delete the wrong files unintentionally.
A tool that's similar to xargs in many ways is the backtick (`), which is a character to the left of the 1 key on most keyboards. The backtick is not the same as the single quote character ('), which is located to the right of the semicolon (;) on most keyboards.
Text within backticks is treated as a separate command whose results are substituted on the command line. For instance, to delete those user files, you can type the following command:
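A sketch of the backtick form follows. It is limited to a scratch directory and to files owned by the current user; both are assumptions made so that the example is safe to run, and the file names are placeholders.

```shell
# Scratch directory with two files whose names contain no spaces.
sandbox=`mktemp -d`
touch "$sandbox/a.txt" "$sandbox/b.txt"
uid=`id -u`

# The shell runs find first, then substitutes its output into the rm command line.
rm `find "$sandbox" -type f -user "$uid"`

ls -A "$sandbox"    # prints nothing: both files are gone
```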
The backtick solution works fine in some cases, but it breaks down in more complex situations. The reason is that the output of the backtick-contained command is substituted into a single command line as if it had been typed at the shell. The shell splits that output on spaces, tabs, and newlines, so filenames that contain spaces are broken apart, and a very long list of files can exceed the maximum command-line length. By contrast, when you use xargs, it runs the command you specify (rm in these examples) as many times as needed to stay within that limit. What's more, you can't pass options such as -d "\n" to a backtick. Thus these two examples will work the same in many cases, but not in all of them.
Use of the backtick is falling out of favor because backticks are so often confused with single quotation marks. In most modern shells, including Bash, you can use $() instead. For instance, the preceding backtick example would be changed to the following:
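A sketch of the $() form follows, using the same assumptions as before: a scratch directory, placeholder file names, and files owned by the current user.

```shell
# Scratch directory with two files whose names contain no spaces.
sandbox=$(mktemp -d)
touch "$sandbox/a.txt" "$sandbox/b.txt"

# Same operation as the backtick form; $() can also be nested without escaping.
rm $(find "$sandbox" -type f -user "$(id -u)")

ls -A "$sandbox"    # prints nothing: both files are gone
```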
This command works just as well, and it is much easier to read and understand.
Processing Text Using Filters
In keeping with Linux's philosophy of providing small tools that can be tied together via pipes and redirection to accomplish more complex tasks, many simple commands to manipulate text are available. These commands accomplish tasks of various types, such as combining files, transforming the data in files, formatting text, displaying text, and summarizing data.
Many of the following descriptions include input-file specifications. In most cases, you can omit these input-file specifications, in which case the utility reads from standard input instead.
File-Combining Commands
The first text-filtering commands are those used to combine two or more files into one file. Three important commands in this category are cat, join, and paste, which combine files end to end, based on fields in the files, or by merging on a line-by-line basis, respectively.
Combining Files with cat
The cat command's name is short for concatenate, and this tool does just that: It links together an arbitrary number of files end to end and sends the result to standard output. By combining cat with output redirection, you can quickly combine two files into one:
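For example, in the following sketch the file names first.txt, second.txt, and combined.txt are placeholders, and the scratch directory is an added precaution.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line from first\n' > first.txt
printf 'line from second\n' > second.txt

# Concatenate the two files end to end into a third file.
cat first.txt second.txt > combined.txt

# Display the result: the line from first.txt, then the line from second.txt.
cat combined.txt
```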
Although cat is officially a tool for combining files, it's also commonly used to display the contents of a short file to STDOUT. If you type only one filename as an argument, cat displays that file. This is a great way to review short files; but for long files, you're better off using a full-fledged pager command, such as more or less.
You