This article aims to find the fastest way to search for a word in a folder containing many nested folders and a large number of files.

I ran the experiment below on a folder containing 8,918 folders and 48,170 files. The goal is to try out different ways of searching for a string in these folders and measure the performance of each.

grep

grep can search the current directory recursively with -r. In the command below, -r searches recursively, -i ignores case, -n displays line numbers, and -F treats the search term as a fixed string rather than a regular expression:

grep -rinF search_term .
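A minimal, self-contained sketch of what these flags do, assuming a throwaway directory tree created with mktemp (the file names and contents are made up for illustration). The last two searches show why -F matters: without it, a . in the pattern is a regular-expression wildcard that also matches a space.

```shell
# Build a small, hypothetical tree in a temporary folder.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/web"
printf 'header\n<a class="Logo sc-cxo-logo">\n' > "$demo_dir/src/web/index.html"
printf '.logo.sc-cxo-logo { }\n' > "$demo_dir/src/web/style.css"
cd "$demo_dir" || exit 1

# -r recurse, -i ignore case (so "Logo" matches), -n print line numbers,
# -F compare the pattern literally.
fixed_space=$(grep -rinF "logo sc-cxo-logo" .)

# With -F, "." is a literal dot, so only style.css matches.
fixed_dot=$(grep -rinF "logo.sc-cxo-logo" .)

# Without -F, "." matches any character (including the space),
# so both files match.
regex_dot_count=$(grep -rin "logo.sc-cxo-logo" . | wc -l | tr -d ' ')

echo "$fixed_space"
echo "$fixed_dot"
echo "regex matches: $regex_dot_count"
```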

Running the search on the test folder with time:

$ time grep -rinF "logo sc-cxo-logo" .
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
...
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} -rinF  .  
116.12s user 8.37s system 38% cpu 5:22.50 total

The search took 5 minutes 22 seconds to complete.

Note: older versions of grep may not support recursive search (-r or -R), and the option is not required by the POSIX grep specification, so some systems may lack it. When it is not available, we can combine find with grep.

grep + find

Consider the following command:

grep search_term `find . -type f`

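This command first lists all files with find and then passes them all to grep as arguments. It works when the folder has relatively few files, but with a large number of files it fails with argument list too long: grep, because the combined length of the arguments exceeds the operating system's limit. It also breaks on file names that contain spaces, since the shell splits the output of the backticks on whitespace.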
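The limit comes from the operating system (ARG_MAX, which covers the arguments and environment passed to a new process). The sketch below, assuming it is run from the folder being searched, prints that limit next to a rough lower bound on the size of the argument list the backtick command would build.

```shell
# OS limit, in bytes, on the combined size of the arguments and environment
# that can be passed to a new process.
arg_max=$(getconf ARG_MAX)

# Rough lower bound on what `find . -type f` would add to grep's argument
# list: every path plus one byte per path (the newline stands in for the
# NUL terminator each argument carries).
arg_bytes=$(find . -type f | wc -c | tr -d ' ')

echo "ARG_MAX:         $arg_max bytes"
echo "Paths from find: $arg_bytes bytes"
```

If the second number approaches the first, the backtick form will fail on that folder.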

Running the search on the test folder with time:

$ time grep "logo sc-cxo-logo" `find . -type f`

zsh: argument list too long: grep

grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} "logo sc-cxo-logo"   
0.42s user 0.04s system 99% cpu 0.459 total

grep + find + exec {} \;

To avoid the argument list too long error, we can use find's -exec action:

find . -type f -exec grep -n search_term {} \; -print

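For each file that find locates, it runs grep once on that file ({} is replaced with the file's path). The -print action then prints the path, but only when grep exits successfully, that is, when it found a match in that file.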
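This also explains the shape of the output below: a line number first, then the file name. A minimal sketch on a hypothetical two-file tree shows that -print fires only for the file where grep matched, and that grep, given a single file, prints no file name of its own.

```shell
# Hypothetical two-file tree; only b.txt contains the search term.
demo_dir=$(mktemp -d)
printf 'nothing here\n' > "$demo_dir/a.txt"
printf 'line one\nlogo sc-cxo-logo\n' > "$demo_dir/b.txt"

# One grep per file. grep receives a single path, so it prints only
# "line:text". -print runs only if grep exits 0 (a match was found),
# because find joins -exec and -print with an implicit -a (AND).
result=$(find "$demo_dir" -type f -exec grep -n "logo sc-cxo-logo" {} \; -print)
echo "$result"
```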

Running the search on the test folder with time:

$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print
Binary file ./build/docroot.tar matches
./build/docroot.tar
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">

...

find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print  
124.81s user 139.21s system 51% cpu 8:34.44 total

This works, but it took 8 minutes 34 seconds, the slowest result so far, because find starts a separate grep process for every file. Can we improve on this?

grep + find + exec {} +

Alternatively, we can use:

find . -type f -exec grep -n search_term {} +

This is the same as the previous command, but it ends with + instead of \;. With +, find passes as many paths as fit on one command line to each grep invocation ({} is replaced with that batch of paths), so grep is started far fewer times.
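A quick way to see the difference between \; and + is to swap grep for a tiny sh -c script that prints how many paths it received per invocation. This is an illustrative sketch on a hypothetical five-file tree, not part of the original experiment.

```shell
# Hypothetical tree with five small files.
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf 'file %s\n' "$i" > "$demo_dir/f$i.txt"
done

# With \; the command runs once per file, so sh sees one path each time.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} \;)

# With + find batches the paths, so one sh invocation sees all five.
batched=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "one process per file:"; echo "$per_file"
echo "batched:"; echo "$batched"
```

One side effect of batching: grep prints file names only when it receives more than one file, so if a batch happens to contain a single path, its matches appear without a name. Adding -H (supported by GNU and BSD grep, though not required by POSIX) forces the name to be printed: find . -type f -exec grep -Hn search_term {} +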

$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} +
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">

...

find . -type f -exec grep -n "logo sc-cxo-logo" {} +  
82.80s user 7.51s system 26% cpu 5:35.59 total

This took 5 minutes 35 seconds, similar to grep -r. Is there another way?

grep + find + xargs

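The same batching can also be done by piping find's output to xargs:

find . -type f -print0 | xargs -0 grep -n search_term

Note: -print0 and -0 are required if folder or file names contain spaces (or other whitespace, quotes, or newlines). -print0 makes find separate paths with a NUL character, and -0 makes xargs split its input on NUL instead of on whitespace.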
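A minimal sketch of why -print0 and -0 matter, using a hypothetical file named my page.html: without them, xargs splits the name at the space and grep is handed two paths that do not exist.

```shell
# Hypothetical file whose name contains a space.
demo_dir=$(mktemp -d)
printf 'logo sc-cxo-logo\n' > "$demo_dir/my page.html"

# Newline-separated paths: xargs splits "my page.html" at the space,
# so grep looks for two nonexistent files (errors are discarded here).
unsafe=$(find "$demo_dir" -type f | xargs grep -l "logo sc-cxo-logo" 2>/dev/null)

# NUL-separated paths: the file name reaches grep intact.
safe=$(find "$demo_dir" -type f -print0 | xargs -0 grep -l "logo sc-cxo-logo")

echo "without -print0/-0: '$unsafe'"
echo "with -print0/-0:    '$safe'"
```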

Running the search on the test folder with time:

$ time find . -type f -print0 | xargs -0 grep -n "logo sc-cxo-logo"
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">

...

xargs -0 grep -n "logo sc-cxo-logo"  
82.92s user 7.36s system 32% cpu 4:39.14 total

This took 4 minutes 39 seconds, somewhat faster than grep -r and both find -exec variants. Can we do better? Yes, by using third-party tools.

ack

ack is a grep-like source code search tool.
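ack searches directories recursively by default. The sketch below shows how grep's -i and -F map onto ack's -i and -Q (also spelled --literal). Note that in ack, -n means --no-recurse rather than "show line numbers", so it is left out. The demo tree is hypothetical, and the path . is passed explicitly so ack never falls back to reading standard input.

```shell
# Hypothetical one-file tree.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/web"
printf 'header\n<a class="Logo sc-cxo-logo">\n' > "$demo_dir/web/index.html"
cd "$demo_dir" || exit 1

if command -v ack >/dev/null 2>&1; then
  # -i ignores case; -Q treats the pattern as a fixed string (like grep -i -F).
  ack_out=$(ack -i -Q "logo sc-cxo-logo" .)
  echo "$ack_out"
else
  echo "ack is not installed; skipping" >&2
fi
```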

Running the search on the test folder with time:

$ time ack "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">

...

ack "logo sc-cxo-logo" *  
6.14s user 6.61s system 9% cpu 2:09.14 total

This took 2 minutes 9 seconds, a significant improvement over the previous options. Can we do better?

rg

ripgrep (rg) is a line-oriented search tool that recursively searches the current directory for a regex pattern.
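ripgrep's defaults differ from grep -r in a way that matters for this comparison: it skips hidden files, binary files, and anything matched by ignore files such as .gitignore (inside a Git repository) or .ignore. The sketch below builds a hypothetical tree in which a .ignore file excludes a bin folder, then runs rg with grep-like flags (-F fixed string, -i ignore case, -n line numbers), first with the defaults and then with --no-ignore --hidden, which turns off the ignore rules and the hidden-file filter.

```shell
# Hypothetical tree: the same match in web/ and in bin/.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/web" "$demo_dir/bin"
printf '<a class="logo sc-cxo-logo">\n' > "$demo_dir/web/index.html"
printf '<a class="logo sc-cxo-logo">\n' > "$demo_dir/bin/index.html"
# Hypothetical ignore rule; ripgrep honours .ignore files even outside Git.
printf 'bin\n' > "$demo_dir/.ignore"
cd "$demo_dir" || exit 1

if command -v rg >/dev/null 2>&1; then
  # Default behaviour: bin/ is skipped because of the .ignore rule.
  rg_default=$(rg -F -i -n "logo sc-cxo-logo" .)
  # Closer to grep -r: ignore rules off, hidden files included.
  rg_all=$(rg --no-ignore --hidden -F -i -n "logo sc-cxo-logo" .)
  printf 'default:\n%s\n\nno-ignore:\n%s\n' "$rg_default" "$rg_all"
else
  echo "rg is not installed; skipping" >&2
fi
```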

Running the search on the test folder with time:

$ time rg 'logo sc-cxo-logo'
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:       <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:       class="logo sc-cxo-logo" title="Back to homepage">

...

rg 'logo sc-cxo-logo'  
1.23s user 4.83s system 8% cpu 1:13.77 total

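This took 1 minute 13 seconds, faster than all the options above. Note that by default ripgrep skips hidden files, binary files, and paths listed in ignore files such as .gitignore, which is presumably why folders like bin were excluded here; it therefore searched fewer files than grep -r. Can we do better?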

ag

The Silver Searcher (ag) is a code-searching tool similar to ack, with a focus on speed.
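Like ripgrep, ag applies its own default filters, for example skipping hidden files that grep -r would search. A minimal sketch on a hypothetical tree with a hidden .cache folder, using -i (ignore case) and -Q (--literal, fixed string) to mirror grep -i -F, and -u (--unrestricted) to also search hidden, ignored, and binary files:

```shell
# Hypothetical tree: one visible match and one inside a hidden folder.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/web" "$demo_dir/.cache"
printf '<a class="Logo sc-cxo-logo">\n' > "$demo_dir/web/index.html"
printf '<a class="Logo sc-cxo-logo">\n' > "$demo_dir/.cache/index.html"
cd "$demo_dir" || exit 1

if command -v ag >/dev/null 2>&1; then
  # -i ignore case, -Q literal pattern: hidden .cache/ is skipped.
  ag_default=$(ag -i -Q "logo sc-cxo-logo" .)
  # -u: search everything, including hidden and ignored files.
  ag_all=$(ag -u -i -Q "logo sc-cxo-logo" .)
  printf 'default:\n%s\n\nunrestricted:\n%s\n' "$ag_default" "$ag_all"
else
  echo "ag is not installed; skipping" >&2
fi
```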

$ time ag "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">

ag "logo sc-cxo-logo" *  
0.88s user 8.13s system 16% cpu 53.761 total

This took 54 seconds, another significant improvement. In short, a search that took over 5 minutes with grep -r finished in under a minute with these tools.

Conclusion

Below is a summary of the results:

Command                       Execution time
grep -r                       5 min 22 s
grep + find (backticks)       failed (argument list too long)
grep + find + exec {} \;      8 min 34 s
grep + find + exec {} +       5 min 35 s
grep + find + xargs           4 min 39 s
ack                           2 min 9 s
rg                            1 min 13 s
ag                            54 s
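To repeat a comparison like this on another folder, here is a minimal bash sketch of a timing harness. It is an illustration under assumptions, not the script behind the numbers above: SEARCH_DIR and SEARCH_TERM are hypothetical environment variables, each command runs once, and tools that are not installed are skipped. The time output above shows low CPU utilization (8% to 38%), which suggests the runs were largely waiting on disk I/O, so results can depend on whether the files are already in the operating system's cache; running the harness twice and comparing the second pass gives a fairer picture.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: folder to search and the string to look for.
search_dir=${SEARCH_DIR:-.}
term=${SEARCH_TERM:-"logo sc-cxo-logo"}

# Make bash's `time` keyword print only the elapsed wall-clock seconds.
TIMEFORMAT='%R'

# bench LABEL COMMAND [ARGS...]: run COMMAND inside search_dir, discard its
# output, and print LABEL with the elapsed time (or "skipped").
bench() {
  local label=$1
  shift
  if ! command -v "$1" >/dev/null 2>&1; then
    printf '%-26s %s\n' "$label" "skipped (not installed)"
    return 0
  fi
  local secs
  # `time` reports on the shell's stderr, which the braces redirect into $().
  secs=$( { time (cd "$search_dir" && "$@" >/dev/null 2>&1); } 2>&1 )
  printf '%-26s %ss\n' "$label" "$secs"
}

bench "grep -r"                grep -rinF "$term" .
bench "find -exec grep {} \\;" find . -type f -exec grep -n "$term" {} \;
bench "find -exec grep {} +"   find . -type f -exec grep -n "$term" {} +
bench "find | xargs grep"      sh -c 'find . -type f -print0 | xargs -0 grep -n "$1"' sh "$term"
bench "ack"                    ack -i -Q "$term" .
bench "rg"                     rg -F -i -n "$term" .
bench "ag"                     ag -i -Q "$term" .
```

For a like-for-like file set, consider adding --no-ignore --hidden to the rg line and -u to the ag line.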

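We compared several commands and tools for searching a folder tree for a string. On this folder, the third-party tools (ack, rg, and ag) were substantially faster than the grep-based commands, with ag the fastest at 54 seconds compared with 5 minutes 22 seconds for grep -r.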

– RC