<p align="center"> <img height="100" src="doc/images/logo.png"/> </p>

Highlights

<p align="center"> <img src="doc/images/ignore_case_ascii.png"/> </p>

Performance

The following tests compare the performance of hypergrep against ag, ugrep, and ripgrep.

System Details

| Type | Value |
| --- | --- |
| Processor | 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz |
| Instruction Set Extensions | Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2, Intel® AVX-512 |
| Installed RAM | 32.0 GB (31.9 GB usable) |
| SSD | ADATA SX8200PNP |
| OS | Ubuntu 20.04 LTS |
| C++ Compiler | g++ (Ubuntu 11.1.0-1ubuntu1-20.04) 11.1.0 |

Vcpkg Installed Libraries

vcpkg commit: 662dbb5

| Library | Version |
| --- | --- |
| argparse | 2.9 |
| concurrentqueue | 1.0.3 |
| fmt | 10.0.0 |
| hyperscan | 5.4.2 |
| libgit2 | 1.6.4 |

Single Large File Search: OpenSubtitles.raw.en.txt

The following searches are performed on a single large file cached in memory (~13GB, OpenSubtitles.raw.en.gz).

| Regex | Line Count | ag | ugrep | ripgrep | hypergrep |
| --- | --- | --- | --- | --- | --- |
| Count number of times Holmes did something<br/>`hgrep -c 'Holmes did \w'` | 27 | n/a | 1.820 | 1.022 | 0.696 |
| Literal with Regex Suffix<br/>`hgrep -nw 'Sherlock [A-Z]\w+' en.txt` | 7882 | n/a | 1.812 | 1.509 | 0.803 |
| Simple Literal<br/>`hgrep -nw 'Sherlock Holmes' en.txt` | 7653 | 15.764 | 1.888 | 1.524 | 0.658 |
| Simple Literal (case insensitive)<br/>`hgrep -inw 'Sherlock Holmes' en.txt` | 7871 | 15.599 | 6.945 | 2.162 | 0.650 |
| Alternation of Literals<br/>`hgrep -n 'Sherlock Holmes\|John Watson\|Irene Adler\|Inspector Lestrade\|Professor Moriarty' en.txt` | 10078 | n/a | 6.886 | 1.836 | 0.689 |
| Alternation of Literals (case insensitive)<br/>`hgrep -in 'Sherlock Holmes\|John Watson\|Irene Adler\|Inspector Lestrade\|Professor Moriarty' en.txt` | 10333 | n/a | 7.029 | 3.940 | 0.770 |
| Words surrounding a literal string<br/>`hgrep -n '\w+[\x20]+Holmes[\x20]+\w+' en.txt` | 5020 | n/a | 6m 11s | 1.523 | 0.638 |
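
To read a row as a relative speedup, divide a competitor's time by hypergrep's time. The snippet below is a small sketch of that arithmetic using the "Simple Literal" row; it assumes all times in a row share the same unit, and `speedup` is a hypothetical helper name, not part of hypergrep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of a competitor's time to hypergrep's time,
# rounded to two decimals. Both arguments must use the same unit.
speedup() {
  awk -v other="$1" -v hyper="$2" 'BEGIN { printf "%.2f\n", other / hyper }'
}

# "Simple Literal" row: ag, ugrep, and ripgrep relative to hypergrep (0.658).
echo "ag:      $(speedup 15.764 0.658)x"
echo "ugrep:   $(speedup 1.888 0.658)x"
echo "ripgrep: $(speedup 1.524 0.658)x"
```

Rows marked n/a or reported in a different format (such as 6m 11s) would need to be converted or skipped before being passed to the helper.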

Git Repository Search: torvalds/linux

The following searches are performed on the entire Linux kernel source tree (after running make defconfig && make -j8). The commit used is f1fcb.

| Regex | Line Count | ag | ugrep | ripgrep | hypergrep |
| --- | --- | --- | --- | --- | --- |
| Simple Literal<br/>`hgrep -nw 'PM_RESUME'` | 9 | 2.807 | 0.316 | 0.147 | 0.140 |
| Simple Literal (case insensitive)<br/>`hgrep -niw 'PM_RESUME'` | 39 | 2.904 | 0.435 | 0.149 | 0.141 |
| Regex with Literal Suffix<br/>`hgrep -nw '[A-Z]+_SUSPEND'` | 536 | 3.080 | 1.452 | 0.148 | 0.143 |
| Alternation of four literals<br/>`hgrep -nw '(ERR_SYS\|PME_TURN_OFF\|LINK_REQ_RST\|CFG_BME_EVT)'` | 16 | 3.085 | 0.410 | 0.153 | 0.146 |
| Unicode Greek<br/>`hgrep -n '\p{Greek}'` | 111 | 3.762 | 0.484 | 0.345 | 0.146 |

Git Repository Search: apple/swift

The following searches are performed on the entire Apple Swift source tree. The commit used is 3865b.

| Regex | Line Count | ag | ugrep | ripgrep | hypergrep |
| --- | --- | --- | --- | --- | --- |
| Function/Struct/Enum declaration followed by a valid identifier and opening parenthesis<br/>`hgrep -n '(func\|struct\|enum)\s+[A-Za-z_][A-Za-z0-9_]*\s*\('` | 59026 | 1.148 | 0.954 | 0.154 | 0.090 |
| Words starting with alphabetic characters followed by at least 2 digits<br/>`hgrep -nw '[A-Za-z]+\d{2,}'` | 127858 | 1.169 | 1.238 | 0.156 | 0.095 |
| Words starting with an uppercase letter, followed by alphanumeric characters and/or underscores<br/>`hgrep -nw '[A-Z][a-zA-Z0-9_]*'` | 2012372 | 3.131 | 2.598 | 0.550 | 0.482 |
| Guard let statement followed by valid identifier<br/>`hgrep -n 'guard\s+let\s+[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*\w+'` | 839 | 0.828 | 0.174 | 0.054 | 0.047 |

Directory Search: /usr

The following searches are performed on the /usr directory.

| Regex | Line Count | ag | ugrep | ripgrep | hypergrep |
| --- | --- | --- | --- | --- | --- |
| Any HTTPS or FTP URL<br/>`hgrep "(https?\|ftp)://[^\s/$.?#].[^\s]*"` | 13682 | 4.597 | 2.894 | 0.305 | 0.171 |
| Any IPv4 IP address<br/>`hgrep -w "(?:\d{1,3}\.){3}\d{1,3}"` | 12643 | 4.727 | 2.340 | 0.324 | 0.166 |
| Any E-mail address<br/>`hgrep -w "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"` | 47509 | 5.477 | 37.209 | 0.494 | 0.220 |
| Any valid date MM/DD/YYYY<br/>`hgrep "(0[1-9]\|1[0-2])/(0[1-9]\|[12]\d\|3[01])/(19\|20)\d{2}"` | 116 | 4.239 | 1.827 | 0.251 | 0.163 |
| Count the number of HEX values<br/>`hgrep -cw "(?:0x)?[0-9A-Fa-f]+"` | 68042 | 5.765 | 28.691 | 1.439 | 0.611 |
| Search any C/C++ for a literal<br/>`hgrep --filter "\.(c\|cpp\|h\|hpp)$" test` | 7355 | n/a | 0.505 | 0.118 | 0.079 |

Build

Install Dependencies with vcpkg

git clone https://github.com/microsoft/vcpkg
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install concurrentqueue fmt argparse libgit2 hyperscan

Build hypergrep using cmake and vcpkg

Clone the repository

git clone https://github.com/p-ranav/hypergrep
cd hypergrep

If cmake is older than 3.19

mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=<path_to_vcpkg>/scripts/buildsystems/vcpkg.cmake ..
make

If cmake is 3.19 or newer

Use the release preset:

export VCPKG_ROOT=<path_to_vcpkg>
cmake -B build -S . --preset release
cmake --build build
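
The choice between the two build flows above can be scripted. The following is a minimal sketch and not part of the hypergrep repository: `version_ge` and `choose_build_flow` are hypothetical helper names, the comparison relies on GNU `sort -V`, and it assumes that cmake 3.19 itself is new enough for the preset flow (presets were introduced in cmake 3.19).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for picking a build flow from the cmake version.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the version sort provided by GNU `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "preset" for cmake >= 3.19 (assumed boundary), "toolchain" otherwise.
choose_build_flow() {
  if version_ge "$1" "3.19"; then
    echo "preset"
  else
    echo "toolchain"
  fi
}

# Report which flow applies to the locally installed cmake, if there is one.
if command -v cmake >/dev/null 2>&1; then
  cmake_version="$(cmake --version | head -n 1 | awk '{print $3}')"
  echo "cmake ${cmake_version}: use the $(choose_build_flow "${cmake_version}") flow"
fi
```

Here "toolchain" refers to the `-DCMAKE_TOOLCHAIN_FILE=...` invocation and "preset" to the `--preset release` invocation.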

Binary Portability

To build a portable x86_64 binary, invoke cmake with the -DBUILD_PORTABLE=on option. This compiles with -march=x86-64 -mtune=generic and links with -static-libgcc -static-libstdc++, so the C++ standard library and the GCC runtime are linked statically into the binary, reducing its dependencies on the target system.
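
As an illustration, the option can be passed alongside the release preset. This is a sketch under assumptions: combining -DBUILD_PORTABLE=on with `--preset release` is not confirmed by the text above, and the `run` and `portable_build` helpers are hypothetical names. By default the script only prints the commands; set `RUN=1` and run it from the repository root (with VCPKG_ROOT exported) to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: print each command, execute it only if RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Assumed combination of the release preset with the portability option.
portable_build() {
  run cmake -B build -S . --preset release -DBUILD_PORTABLE=on
  run cmake --build build
}

portable_build
```

With a cmake older than 3.19, the same option could presumably be appended to the `-DCMAKE_TOOLCHAIN_FILE=...` invocation instead.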