
Docfd

TUI multiline fuzzy document finder

Think interactive grep for text files, PDFs, DOCXs, etc., but word/token-based instead of regex- and line-based, so you can search across lines easily.

Docfd aims to provide good UX via integration with common text editors and PDF viewers, so you can jump directly to a search result with a single key press.


Demo: navigating a repo and editing the command history.

Demo: quick search in non-interactive mode.

Demo: navigating a PDF and opening it in a PDF viewer at the location closest to the selected search result, via PDF viewer integration.

Features

<details>

Text editor integration

Docfd uses the text editor specified by $VISUAL (this is checked first) or $EDITOR.
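The lookup order can be illustrated with a minimal Python sketch. It is not Docfd's own (OCaml) code: the function name resolve_editor is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_editor(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the editor command, checking $VISUAL before $EDITOR.

    Hypothetical helper; empty values are treated as unset (an assumption).
    """
    for var in ("VISUAL", "EDITOR"):
        value = env.get(var, "").strip()
        if value:
            return value
    return None  # neither variable is set


if __name__ == "__main__":
    print(resolve_editor({"VISUAL": "nvim", "EDITOR": "nano"}))  # nvim wins
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.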

Docfd opens the file to the first line of the search result for the following editors:

PDF viewer integration

Docfd guesses the default PDF viewer based on the output of xdg-mime query default application/pdf. It then invokes the viewer either directly or via flatpak, depending on where the desktop file is first found in the list of directories specified by $XDG_DATA_DIRS.
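The lookup described above might look roughly like the following Python sketch. The helper names are hypothetical. Deciding "via flatpak" by checking for a flatpak path component (as in /var/lib/flatpak/exports/share) is an assumption, not necessarily Docfd's rule. The fallback value for an unset $XDG_DATA_DIRS comes from the XDG Base Directory specification.

```python
import os
import subprocess
from typing import Optional, Tuple


def default_pdf_desktop_file() -> str:
    """Ask xdg-mime which desktop file is registered for PDFs."""
    out = subprocess.run(
        ["xdg-mime", "query", "default", "application/pdf"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def xdg_data_dirs() -> str:
    # XDG Base Directory spec default when the variable is unset or empty.
    return os.environ.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"


def locate_desktop_file(name: str, data_dirs: str) -> Optional[Tuple[str, bool]]:
    """Return (path, via_flatpak) for the first directory holding the file.

    Assumption: a data directory with a "flatpak" path component is a
    flatpak export directory, so the viewer would be launched via flatpak.
    """
    for d in data_dirs.split(":"):
        if not d:
            continue
        candidate = os.path.join(d, "applications", name)
        if os.path.isfile(candidate):
            via_flatpak = "flatpak" in os.path.normpath(d).split(os.sep)
            return candidate, via_flatpak
    return None
```

Only the first match counts, so the order of $XDG_DATA_DIRS decides between a system package and a flatpak install of the same viewer.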

Docfd opens the file to the first page of the search result and starts a text search for the most distinctive word of the matched phrase within that page for the following viewers:

Docfd opens the file to the first page of the search result for the following viewers:

</details>

Installation

Statically linked binaries for Linux and macOS are available via GitHub releases.

Docfd is also packaged on the following platforms for Linux:

The only way to use Docfd on Windows right now is via WSL.

Notes for packagers: Outside of the OCaml toolchain for building (if you are packaging from source), Docfd also requires the following external tools at run time for full functionality:

Launching

Read from piped stdin

command | docfd

No paths should be supplied as arguments in this case. If any paths are specified, then stdin is ignored.

Handling a large collection of files

In this case, the default cache soft limit might not be enough, or you might want to keep a stable cache for this collection of files.

The following script template may be handy in this situation for creating a collection-specific cache:

#!/usr/bin/env bash

# Use a cache dedicated to this collection, with a higher cache soft limit
docfd --cache-dir /large/collection/.cache --cache-soft-limit 20000 /large/collection

Scan for files

docfd [PATH]...

The list of paths can contain directories. Each directory in the list is scanned recursively for files with the following extensions by default:

You can change the file extensions to use via --exts and --single-line-exts, or add onto the list of extensions via --add-exts and --single-line-add-exts.

If the PATH list is empty, Docfd defaults to scanning the current directory (.) unless any of the following is used: --paths-from, --glob, --single-line-glob.
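The path and extension rules above can be sketched in Python as follows. The helper names are hypothetical, and the default extension set is passed in rather than guessed. Three behaviours are assumptions: --exts and --add-exts combining as "replace, then extend", extensions being compared case-insensitively, and explicit file paths being kept as-is. The single-line variants are not modelled.

```python
import os
from typing import Iterable, List, Optional, Set


def effective_paths(paths: List[str], uses_other_sources: bool) -> List[str]:
    """Fall back to "." when no PATH is given and no other source is used.

    uses_other_sources stands for --paths-from, --glob or --single-line-glob.
    """
    if not paths and not uses_other_sources:
        return ["."]
    return paths


def effective_exts(default: Set[str], exts: Optional[Iterable[str]] = None,
                   add_exts: Iterable[str] = ()) -> Set[str]:
    """--exts replaces the default set; --add-exts extends it (assumed order)."""
    base = set(exts) if exts is not None else set(default)
    return base | set(add_exts)


def scan(paths: Iterable[str], exts: Set[str]) -> List[str]:
    """Recursively collect files whose extension is in exts."""
    found = []
    for p in paths:
        if os.path.isfile(p):
            found.append(p)  # assumption: explicit files are kept as-is
            continue
        for root, _dirs, files in os.walk(p):
            for f in files:
                ext = os.path.splitext(f)[1].lstrip(".").lower()
                if ext in exts:
                    found.append(os.path.join(root, f))
    return sorted(found)
```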

Scan for files then select with fzf

docfd [PATH]... ?

The ? can be in any position in the path list. If any of the paths is ?, Docfd lets you select among the discovered files via fzf.
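A minimal Python sketch of how the ? marker could be separated from the real paths and how the discovered files could then be handed to fzf. Both helper names are hypothetical. Whether Docfd passes --multi to fzf is an assumption. fzf draws its UI on the terminal and prints the selection to stdout, so only stdout is captured.

```python
import subprocess
from typing import List, Tuple


def split_fzf_marker(args: List[str]) -> Tuple[List[str], bool]:
    """Drop every "?" from the path list and report whether one was present."""
    paths = [a for a in args if a != "?"]
    return paths, len(paths) != len(args)


def select_with_fzf(files: List[str]) -> List[str]:
    """Let the user pick files interactively (assumes fzf is on PATH)."""
    proc = subprocess.run(
        ["fzf", "--multi"],
        input="\n".join(files),
        stdout=subprocess.PIPE,
        text=True,
    )
    return [line for line in proc.stdout.splitlines() if line]
```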

Use list of paths from file

docfd [PATH]... --paths-from paths.txt

The final list of paths is then the concatenation of the PATH arguments and the paths listed in paths.txt, which contains one path per line.
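The concatenation can be sketched in Python as below. The function name is hypothetical, and skipping blank lines in the paths file is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import List


def collect_paths(cli_paths: List[str], paths_from: str) -> List[str]:
    """Concatenate PATH arguments with the paths listed in paths_from.

    The file holds one path per line; skipping blank lines is an assumption.
    """
    lines = Path(paths_from).read_text().splitlines()
    return cli_paths + [line for line in lines if line.strip()]
```

Lines are not stripped, so paths containing spaces survive intact.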

Globbing

docfd --glob 'relative/path/glob' --glob '/absolute/path/glob'

Relative globs are resolved starting from the current working directory.
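A minimal Python sketch of the resolution rule: relative patterns are anchored at the current working directory, and absolute ones are used unchanged. Python's glob module stands in for Docfd's own matcher here, and the helper names are hypothetical.

```python
import glob
import os
from typing import List


def resolve_glob(pattern: str, cwd: str) -> str:
    """Anchor a relative glob at cwd; absolute globs are left untouched."""
    return pattern if os.path.isabs(pattern) else os.path.join(cwd, pattern)


def expand(patterns: List[str]) -> List[str]:
    """Expand every pattern (stand-in matcher: Python's glob module)."""
    cwd = os.getcwd()
    return sorted(p for pat in patterns for p in glob.glob(resolve_glob(pat, cwd)))
```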

File collection rules


File globbing

The glob syntax follows the common file globbing syntax.
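To make "common file globbing syntax" concrete, here is a small Python sketch that translates *, ? and [...] (with ! for negation) into a regular expression. It is not Docfd's matcher, and Docfd's additional markers are not modelled. Keeping * and ? from crossing / follows the usual shell convention and is an assumption here.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate *, ? and [...] into a regex whose wildcards never cross "/"."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append("[^/]*")
        elif c == "?":
            out.append("[^/]")
        elif c == "[":
            j = pattern.find("]", i + 1)
            body = pattern[i + 1:j] if j != -1 else ""
            negate = body.startswith("!")
            members = body[1:] if negate else body
            if j == -1 or not members:
                out.append(re.escape(c))  # no usable class: "[" is literal
            else:
                escaped = "".join("\\" + ch if ch in "\\^[]" else ch for ch in members)
                out.append("[" + ("^" if negate else "") + escaped + "]")
                i = j  # skip past the closing "]"
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out) + r"\Z")
```

For example, notes/*.md matches notes/a.md but not notes/sub/a.md.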

Additional markers:

Searching

The search field takes a search expression as input. A search expression is one of:

To use a literal ?, (, ), or |, place a backslash (\) in front of the character.
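When building search expressions programmatically, the escaping rule could be applied with a helper like this Python sketch. The function name is hypothetical. Only the four characters named above are escaped. Whether a literal backslash also needs escaping is not stated here, so it is left alone.

```python
import re


def escape_literal(text: str) -> str:
    """Prefix each of ? ( ) | with a backslash so it is searched literally."""
    return re.sub(r"([?()|])", r"\\\1", text)
```

For example, escape_literal("(a|b)") yields \(a\|b\).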

A search phrase is a sequence of tokens where a token is one of:

Tokens that are not separated by spaces, operators, or parentheses are treated specially; we call these linked tokens. For example, 12, :, and 30 are linked in 12:30, but not in 12 : 30. Linked tokens have a much stricter search distance by default: in 12:30, Docfd searches for : only up to a few tokens away from 12, and so on. This allows the user to signal an intention of reduced fuzziness.

To link spaces to tokens, use ~. For example, to search for "John Smith" ("John" and "Smith" separated by some number of spaces), use John~Smith to establish the linkage.
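The grouping into linked tokens can be sketched in Python. This is a simplification: the assumed tokenizer treats runs of word characters as one token and every other non-space character as its own token, and operators and parentheses are not handled. ~ is kept as a token standing for a run of spaces. The function name is hypothetical.

```python
import re
from typing import List

TOKEN = re.compile(r"\w+|[^\w\s]")  # assumed tokenizer, not Docfd's


def linked_groups(query: str) -> List[List[str]]:
    """Split a query into groups of linked tokens.

    Whitespace separates groups; within a group, tokens are linked.
    "~" stays in the group as a placeholder for spaces in the document.
    """
    return [TOKEN.findall(chunk) for chunk in query.split()]
```

Here 12:30 yields one group of three linked tokens, while 12 : 30 yields three separate groups.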

For ', ^, and $ to be considered annotation markers, there cannot be a space between the marker and the token, e.g. ^abc means "prefix match abc", but ^ abc means "fuzzy match ^ and fuzzy match abc".
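A minimal Python sketch of the "no space" rule, modelling only the ^ case from the example above. Where ' and $ attach and what they mean is defined in the token list earlier, and they are deliberately left out here. The function name and the "prefix"/"fuzzy" labels are hypothetical.

```python
from typing import List, Tuple


def split_prefix_marker(query: str) -> List[Tuple[str, str]]:
    """Classify each space-separated chunk as a prefix or fuzzy match.

    ^ only annotates a token it touches; a lone ^ is fuzzy matched itself.
    """
    out = []
    for chunk in query.split():
        if chunk.startswith("^") and len(chunk) > 1:
            out.append(("prefix", chunk[1:]))
        else:
            out.append(("fuzzy", chunk))
    return out
```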

Annotated linked tokens are also treated specially:

However, the search restriction is even stricter than for normal linked tokens: the next matching token must immediately follow the current match. For example, ^12:3 will not match 12 : 30 but will match 12:30.
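The adjacency requirement can be illustrated with a Python sketch in which whitespace runs are kept as document tokens, so 12 : 30 has spaces between its parts and 12:30 does not. Two points are assumptions chosen to reproduce the ^12:3 example, not a description of Docfd's matcher: all query tokens except the last must match exactly, and the last is prefix matched. The function names are hypothetical.

```python
import re
from typing import List

DOC_TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize_doc(text: str) -> List[str]:
    """Tokenize document text, keeping whitespace runs as their own tokens."""
    return DOC_TOKEN.findall(text)


def strict_prefix_match(doc: List[str], query: List[str]) -> bool:
    """True if the query tokens occur at consecutive positions in doc."""
    if not query:
        return False
    n = len(query)
    for start in range(len(doc) - n + 1):
        window = doc[start:start + n]
        # Every token but the last must be equal; the last is a prefix match.
        if window[:-1] == query[:-1] and window[-1].startswith(query[-1]):
            return True
    return False
```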

Search is asynchronous, specifically:

<details>

Optional operator handling specifics

For a phrase with the optional operator, such as ?word0 word1 ..., the first word is grouped implicitly, i.e. it is treated as (?word0) word1 ....
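The implicit grouping amounts to a simple rewrite, sketched here in Python. The function name is hypothetical, and the sketch also normalizes runs of whitespace to single spaces.

```python
def group_leading_optional(phrase: str) -> str:
    """Rewrite "?word0 word1 ..." as "(?word0) word1 ..."."""
    words = phrase.split()
    if words and words[0].startswith("?") and len(words[0]) > 1:
        words[0] = "(" + words[0] + ")"
    return " ".join(words)
```

An escaped \?word0 starts with a backslash, so it is left untouched.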

Search phrase and search procedure

Document content and user input in the search field are tokenized/segmented in the same way, based on:

A search phrase is a list of these tokens.

The search procedure is a DFS through the document index, where the search range for each word is a fixed, configured range surrounding the previous word (when applicable).
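A minimal Python sketch of such a DFS over a positional index. Several parts are assumptions: token matching is plain equality (the real matching cases are listed below), max_dist is a hypothetical stand-in for the configured range, positions are not reused within one match, and ranking is left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each token to the sorted list of positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


def dfs_search(index: Dict[str, List[int]], phrase: List[str],
               max_dist: int) -> List[Tuple[int, ...]]:
    """Return every position tuple matching phrase, one word at a time."""
    results: List[Tuple[int, ...]] = []
    path: List[int] = []

    def go(i: int, prev: Optional[int]) -> None:
        if i == len(phrase):
            results.append(tuple(path))
            return
        for pos in index.get(phrase[i], []):
            if prev is not None and abs(pos - prev) > max_dist:
                continue  # outside the range around the previous word
            if pos in path:
                continue  # never reuse a position within one match
            path.append(pos)
            go(i + 1, pos)
            path.pop()

    go(0, None)
    return results
```

For the tokens of "the quick brown fox jumps over the lazy dog" with max_dist = 3, the phrase the fox yields (0, 3) and (6, 3), since both occurrences of the lie within three tokens of fox.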

A token in the index matches a token in the search phrase if they fall into one of the following cases:

Search results are then ranked using a heuristic.

</details>

UI

The default TUI is divided into four sections:

The file path filter bar consists of the file path filter status indicator and the file path filter field. The file path filter status indicator shows one of the following values:

The search bar consists of the search status indicator and the search field. The search status indicator shows one of the following values:

Controls

<details>

Docfd operates in modes; the initial mode is navigation mode.

Navigation mode

Search mode

Filter mode

Clear mode

Copy mode

Copy paths mode

Narrow mode

Drop mode

Reload mode

</details>

Limitations

Acknowledgement