
Common Linux Text Search


1. Overview

Searching text is a very common operation in Linux. For example, we may want to find the files that contain specific text, or the lines within a file that contain it.

In this tutorial, we'll go through some examples together and learn how to perform common text searches in Linux using the grep command-line utility.

2. The grep Command

The grep command searches one or more input files for lines containing a match to a specified pattern.

Its name comes from the ed command g/re/p (globally search a regular expression and print).

By default, grep outputs the matching lines. The grep command has several variants and is installed by default on almost every Unix-like system. In this tutorial, we'll focus on the most widely used variant, GNU grep.

3. Common Usage of grep

Now let's see some practical examples of how grep helps us search text. All examples in this section were run with GNU grep version 3.3.

Let's create a text file named input.txt to help us explore the grep command's results:

Linux is a great system.
Learning linux is very interesting.

This Linux system has 17 users.
The uptime of this linux system: 77 hours.

File report
There are 100 directories under */*.
There are 250 files under */opt*. 
There are 300 files under */home/root*.
There are 20 mountpoints.

3.1. Basic String Search

To see how simple it is to perform a basic text search using grep, let's search our file for lines containing the string “linux”:

$ grep 'linux' input.txt
Learning linux is very interesting.
The uptime of this linux system: 77 hours.

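Quoting the search string is a good practice. Whether to use single or double quotes depends on whether we want the shell to expand the expression before it starts the grep process: single quotes pass the text to grep unchanged, while double quotes let the shell expand variables and command substitutions first.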
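
To make the difference concrete, here is a minimal sketch. The scratch file and the variable name word are only illustrations and not part of the original example set. The -F option (fixed-string search, covered in section 4.2) is used so that only the shell's quoting differs between the two commands:

```shell
# Scratch file for the demo; the variable name "word" is just an illustration.
tmp=$(mktemp)
printf '%s\n' 'Learning linux is very interesting.' \
              'The literal text $word appears here.' > "$tmp"

word='linux'

# Double quotes: the shell replaces $word with "linux" before grep starts.
expanded=$(grep -F "$word" "$tmp")

# Single quotes: grep receives the five characters $word unchanged.
literal=$(grep -F '$word' "$tmp")

printf 'double quotes: %s\n' "$expanded"
printf 'single quotes: %s\n' "$literal"

rm -f "$tmp"
```

The double-quoted search should print the “Learning linux” line, while the single-quoted search should print the line that contains the literal text $word.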

3.2. Case-Insensitive Search

The basic string search with grep is pretty simple. What if we want to search for lines containing “linux” or “Linux”, that is, do a case-insensitive search? grep's -i option can help us with that:

$ grep -i 'linux' input.txt
Linux is a great system.
Learning linux is very interesting.
This Linux system has 17 users.
The uptime of this linux system: 77 hours.

We can see that all lines containing linux or Linux are listed.

3.3. Whole-Word Search

We can use the -w option to tell grep to treat the pattern as a whole word.

For example, let's find lines in our input file that contain “is” as a whole word:

$ grep -w 'is' input.txt 
Linux is a great system.
Learning linux is very interesting.

Note that the lines containing the word “this” – but not the word “is” – were not included in the result.

4. Advanced grep Usage

4.1. Regular Expressions

If we've understood the meaning of grep's name, it's not hard to imagine that regular expressions (regex) and grep are good friends. GNU grep understands three different versions of regular expression syntax:

  • BRE (Basic Regular Expressions)
  • ERE (Extended Regular Expressions)
  • PCRE (Perl Compatible Regular Expressions)

In GNU grep, there is no difference in functionality between the basic and extended syntaxes. However, PCRE gives additional functionality and is more powerful than both BRE and ERE.

By default, grep will use BRE. In BRE, the meta-characters ?, +, {, |, (, and ) lose their special meanings. We can use the backslash-escaped versions \?, \+, \{, \|, \(, and \) to make them have special meanings.

With the -E option, grep will work with ERE syntax. In ERE, the meta-characters we mentioned above have special meanings. If we backslash-escape them, they lose their special meanings.

Finally, the -P option will tell grep to do pattern matching with PCRE syntax.
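
As a small sketch of the BRE/ERE difference, the following commands search a scratch file (an assumption for this demo, reusing three lines from input.txt) for lines containing “users” or “report”. Note that \| in BRE is a GNU extension, so this assumes GNU grep:

```shell
# Scratch file with three lines borrowed from input.txt.
tmp=$(mktemp)
printf '%s\n' 'This Linux system has 17 users.' \
              'There are 20 mountpoints.' \
              'File report' > "$tmp"

# BRE (default): alternation needs the backslash-escaped \| (GNU extension).
bre=$(grep 'users\|report' "$tmp")

# ERE (-E): the bare | already means alternation.
ere=$(grep -E 'users|report' "$tmp")

# BRE with a bare |: the pattern is the literal text "users|report",
# which occurs nowhere in the file, so nothing is printed.
plain=$(grep 'users|report' "$tmp")

printf 'BRE with \\|:\n%s\n' "$bre"
printf 'ERE with |:\n%s\n' "$ere"
printf 'BRE with bare |: [%s]\n' "$plain"

rm -f "$tmp"
```

The first two commands should select the same two lines, and the third should select nothing.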

4.2. Fixed String Search

We've learned that grep will do a BRE search by default. So the patterns “linux” and “is” that we gave in the previous examples are regexes as well. They don't contain any characters with special meaning, so they match the literal text “linux” and “is”.

If the text we want to search for contains any characters with special meaning in regex (for example, “.” or “*”), we have to either escape those characters or use the -F option to tell grep to do a fixed-string search.

For example, we may want to search for lines containing “*/opt*”:

$ grep -F '*/opt*' input.txt 
There are 250 files under */opt*.

Let's do the same without using the -F option:

$ grep '\*/opt\*' input.txt
There are 250 files under */opt*.

4.3. Inverting the Search

We can use grep's -v option to search for lines that don't match a certain pattern. Let's see an example that finds all lines that don't contain any digits:

$ grep -v '[0-9]' input.txt 
Linux is a great system.
Learning linux is very interesting.


File report

[0-9] in the above example is a regex that matches on a single numerical digit.

If we switch to PCRE with the -P option, we can use \d to match a numerical digit and get the same result:

$ grep -vP '\d' input.txt
Linux is a great system.
Learning linux is very interesting.


File report

In the outputs of the above two commands, we see that empty lines are also printed, because blank lines don't contain any digits either.
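
If we'd rather leave the blank lines out, one possible approach is to give grep a second pattern with -e. This is a sketch on a scratch file (an assumption for this demo, using a few lines from input.txt). With -v and several -e patterns, a line is printed only if it matches none of them:

```shell
# Scratch file: two lines without digits, one blank line, one line with digits.
tmp=$(mktemp)
printf '%s\n' 'Linux is a great system.' '' \
              'This Linux system has 17 users.' 'File report' > "$tmp"

# -e '[0-9]' drops lines containing a digit; -e '^$' drops empty lines.
result=$(grep -v -e '[0-9]' -e '^$' "$tmp")

printf '%s\n' "$result"

rm -f "$tmp"
```

Only the two non-empty lines without digits should remain.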

4.4. Print Only the Matched Parts

As we can see, grep prints each line that matches a pattern. However, sometimes only the matched parts are of interest to us. We can use the -o option to tell grep to print only the matched parts of a matching line.

For example, we may want to find all strings that look like directories:

$ grep -o '/[^/*]*' input.txt
/
/opt
/home
/root

5. Other grep Tricks

5.1. Print Additional Context Lines Before or After Match

Sometimes we want to see lines before or after our matching lines in the result. grep has three options to handle additional context lines: -B (before a match), -A (after a match), and -C (before and after a match).

Now, let's search for the text “report” and print the three lines after the matching line:

$ grep -A3 'report' input.txt 
File report
There are 100 directories under */*.
There are 250 files under */opt*. 
There are 300 files under */home/root*.

The context line control options can be handy when we want to check several continuous lines but only know one line among them matching some pattern.
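
The -B and -C options work the same way as -A. As a brief sketch on a scratch copy of the “File report” block (the temporary file is an assumption for this demo), we can print one line before a match, or one line on each side of it:

```shell
# Scratch file holding the "File report" block from input.txt.
tmp=$(mktemp)
printf '%s\n' 'File report' \
              'There are 100 directories under */*.' \
              'There are 250 files under */opt*.' \
              'There are 300 files under */home/root*.' \
              'There are 20 mountpoints.' > "$tmp"

# -B1: the matching line plus one line before it.
before=$(grep -B1 'opt' "$tmp")

# -C1: the matching line plus one line before and one line after it.
around=$(grep -C1 'opt' "$tmp")

printf -- '-B1:\n%s\n\n' "$before"
printf -- '-C1:\n%s\n' "$around"

rm -f "$tmp"
```

Here, -B1 should print the “100 directories” and “250 files” lines, and -C1 should additionally print the “300 files” line.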

YAML, for instance, is widely used for application configuration files. Instead of viewing the entire configuration file, we might only need to see part of it. To see the datasource configuration in a YAML file, we can make use of grep's -A option:

$ grep -A5 'datasource' src/main/resources/application.yml
datasource:
  driverClassName: ${DATABASE_DRIVER}
  url: ${DATABASE_URL}
  username: ${DATABASE_USERNAME}
  password: ${DATABASE_PASSWORD}

5.2. Count the Matching Lines

The -c option suppresses grep's normal output and prints only the count of matching lines instead. For example, let's find out how many lines contain “*”:

$ grep -Fc '*' input.txt
3

grep is a line-based search utility. The -c option will output the count of matched lines instead of the count of pattern occurrences. That's why the above command outputs three instead of six.
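
If we do want the number of occurrences, one common approach is to combine the -o option from section 4.4 with wc -l, since -o prints every match on its own line. This is a sketch on a scratch file (an assumption for this demo) that contains the three lines with “*” from input.txt:

```shell
# Scratch file with the three lines of input.txt that contain "*".
tmp=$(mktemp)
printf '%s\n' 'There are 100 directories under */*.' \
              'There are 250 files under */opt*.' \
              'There are 300 files under */home/root*.' \
              'There are 20 mountpoints.' > "$tmp"

# -c counts matching lines.
lines=$(grep -Fc '*' "$tmp")

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -Fo '*' "$tmp" | wc -l)

printf 'matching lines: %s\n' "$lines"
printf 'occurrences:    %s\n' "$occurrences"

rm -f "$tmp"
```

Each of the three matching lines contains two asterisks, so the line count is three and the occurrence count is six.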

5.3. Recursively Search a Directory

In addition to files, grep accepts a directory as input as well. A common problem is to search in a directory recursively and find all files that contain some pattern.

Let's search in the /var/log directory recursively to find all files that contain “boot”. Here, we'll use the -l option to suppress the matching lines and let grep print only the names of the files that contain a match:

$ grep -Rl 'boot' /var/log
/var/log/lxdm.log
/var/log/pacman.log
/var/log/Xorg.0.log
/var/log/nginx/access.log
/var/log/nginx/error.log
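
A recursive search can also be limited to certain file names. The sketch below uses GNU grep's --include option on a throwaway directory tree. The directory layout and file contents are made up for this demo:

```shell
# Throwaway directory tree; names and contents are hypothetical.
dir=$(mktemp -d)
mkdir "$dir/nginx"
printf 'system boot completed\n' > "$dir/app.log"
printf 'reboot requested\n'      > "$dir/nginx/error.log"
printf 'boot notes\n'            > "$dir/notes.txt"

# All files below $dir that contain "boot" (sorted for stable output).
all=$(grep -Rl 'boot' "$dir" | sort)

# Only files whose base name matches the glob *.log.
logs_only=$(grep -Rl --include='*.log' 'boot' "$dir" | sort)

printf 'all matches:\n%s\n\n' "$all"
printf 'log files only:\n%s\n' "$logs_only"

rm -rf "$dir"
```

The first search should list all three files, while the second should skip notes.txt.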

6. Conclusion

In this article, we’ve learned how to use the grep command to do simple text searches and how to control the output. grep finds text efficiently and quickly and is a great tool to have in our arsenal of Linux commands.

