Docfd

TUI multiline fuzzy document finder

Think interactive grep for text files, PDFs, DOCXs, etc., but word/token based instead of regex and line based, so you can easily search across lines.

Docfd aims to provide good UX via integration with common text editors and PDF viewers, so you can jump directly to a search result with a single key press.

Features

Multithreaded indexing and searching
Multiline fuzzy search of multiple files
Content view pane that shows the snippet surrounding the selected search result
Text editor and PDF viewer integration
Editable command history - rewrite/plan your actions in a text editor
Search scope narrowing - limit scope of next search based on current search results
Clipboard integration

Installation

Statically linked binaries for Linux and macOS are available via GitHub releases.

Docfd is also packaged on the following platforms for Linux:

opam
AUR (as docfd-bin) by kseistrup
Nix (as docfd) by chewblacka

The only way to use Docfd on Windows right now is via WSL.

Notes for packagers: Besides the OCaml toolchain needed for building (if you are packaging from source), Docfd also requires the following external tools at run time for full functionality:

pdftotext from poppler-utils for PDF support
pandoc for support of .epub, .odt, .docx, .fb2, .ipynb, .html, and .htm files
fzf for file selection menu
wl-clipboard for clipboard support on Wayland
xclip for clipboard support on X11
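
For packagers who want to verify a target environment, the list above can be turned into a quick PATH check. This is only a sketch, not part of Docfd: missing_tools is a hypothetical helper, and the executable name wl-copy is an assumption (it is the copy command shipped by the wl-clipboard package; this README does not say which executable Docfd actually invokes).

```python
import shutil

# Executable name -> feature it enables, taken from the packager notes above.
# Assumption: "wl-copy" stands in for the wl-clipboard package.
RUNTIME_TOOLS = {
    "pdftotext": "PDF support",
    "pandoc": ".epub, .odt, .docx, .fb2, .ipynb, .html, .htm support",
    "fzf": "file selection menu",
    "wl-copy": "clipboard support on Wayland",
    "xclip": "clipboard support on X11",
}


def missing_tools(tools=RUNTIME_TOOLS, which=shutil.which):
    """Return {executable: feature} for every executable not found on PATH."""
    return {name: feature for name, feature in tools.items() if which(name) is None}


if __name__ == "__main__":
    for name, feature in missing_tools().items():
        print(f"missing {name}: {feature} will be unavailable")
```

Only one of the two clipboard tools is needed, depending on whether the session runs on Wayland or X11.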

Basic usage

The typical usage of Docfd is to either cd into the directory of interest and launch docfd directly, or specify the paths as arguments:

docfd [PATH]...

The list of paths can contain directories. Each directory in the list is scanned recursively for files with the following extensions by default:

For multiline search mode:
- .txt, .md, .pdf, .epub, .odt, .docx, .fb2, .ipynb, .html, .htm
For single line search mode:
- .log, .csv, .tsv

You can change the file extensions to use via --exts and --single-line-exts, or add onto the list of extensions via --add-exts and --single-line-add-exts.

If the list of PATHs is empty, then Docfd defaults to scanning the current directory . unless any of the following is used: --paths-from, --glob, --single-line-glob.
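
The default file collection rules described above can be approximated as follows. This is an illustrative sketch, not Docfd's implementation (Docfd is written in OCaml): collect is a hypothetical helper, extension matching here is literal and case-sensitive, explicitly listed files are filtered by extension just like scanned ones, and the --paths-from, --glob, and --single-line-glob exceptions are not modelled. All of these are assumptions.

```python
from pathlib import Path

# Default extensions as listed in this README.
MULTILINE_EXTS = {".txt", ".md", ".pdf", ".epub", ".odt", ".docx",
                  ".fb2", ".ipynb", ".html", ".htm"}
SINGLE_LINE_EXTS = {".log", ".csv", ".tsv"}


def collect(paths, multiline_exts=MULTILINE_EXTS, single_line_exts=SINGLE_LINE_EXTS):
    """Hypothetical helper: split the files under `paths` by search mode."""
    # An empty path list falls back to the current directory.
    roots = [Path(p) for p in paths] or [Path(".")]
    multiline, single_line = [], []
    for root in roots:
        # Directories are scanned recursively; a plain file is taken as-is.
        candidates = root.rglob("*") if root.is_dir() else [root]
        for f in candidates:
            if not f.is_file():
                continue
            if f.suffix in multiline_exts:
                multiline.append(f)
            elif f.suffix in single_line_exts:
                single_line.append(f)
    return sorted(multiline), sorted(single_line)
```

Passing a different set for multiline_exts or single_line_exts mirrors what --exts and --single-line-exts do, while passing the union of the defaults and extra extensions mirrors --add-exts and --single-line-add-exts.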

Documentation

See the GitHub Wiki for more examples/cookbook and technical details.

Limitations

Docfd cannot handle large collections of documents yet
- Docfd right now loads the entire content index into memory, which severely limits how many documents it can handle at a time.
- Content index redesign is ongoing to remove this limitation.
File auto-reloading is not supported for PDF files, as PDF viewers are invoked in the background via the shell. It is possible to support this properly in the ways listed below, but doing so requires a lot of engineering for potentially very little gain:
- Docfd waits for PDF viewer to terminate fully before resuming, but this prohibits viewing multiple search results simultaneously in different PDF viewer instances.
- Docfd manages the launched PDF viewers completely, but these viewers are closed when Docfd terminates.
- Docfd invokes the PDF viewers via shell so they stay open when Docfd terminates. Docfd instead periodically checks if they are still running via the PDF viewers' process IDs, but this requires handling forks.
- Outside of tracking whether the PDF viewer instances interacting with the files are still running, Docfd also needs to set up file update handling either via inotify or via checking file modification times periodically.
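
To make the last point concrete, periodic modification-time checking could look roughly like the sketch below. It is not something Docfd does today; changed_files is a hypothetical helper that the caller would invoke on a timer, and it covers only the polling variant, not inotify.

```python
import os


def changed_files(paths, last_seen):
    """Return the paths modified since the previous poll.

    `last_seen` maps path -> st_mtime_ns from the previous poll and is
    updated in place. The first sighting of a path only sets a baseline.
    """
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            last_seen.pop(path, None)  # file disappeared; forget it
            continue
        if last_seen.get(path) != mtime:
            if path in last_seen:  # a known file whose mtime moved
                changed.append(path)
            last_seen[path] = mtime
    return changed
```

Even with such a check in place, the process tracking problems described above remain, which is where most of the engineering cost lies.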

Acknowledgement

Big thanks to @lunacookies and @jthvai for the many UI/UX discussions and suggestions
Demo gifs and some screenshots are made using vhs.
ripgrep-all was used as reference for text extraction software choices
Marc Coquand (author of Stitch) for discussions and inspiration of results narrowing functionality
Part of the search syntax was copied from fzf
Command history editing workflow was inspired by Git interactive rebase workflow, e.g. git rebase -i
