further clarifications in README.md
Elias authored and Elias committed Feb 20, 2024
1 parent 9f56909 commit ad7bdd1
Showing 2 changed files with 20 additions and 4 deletions.
22 changes: 19 additions & 3 deletions README.md
@@ -54,13 +54,13 @@ This tells `rosinenpicker` to look in `/path/to/documents` for PDF files contain
Now of course it is not very useful to just extract the term "apple pie" out of documents. But you can do much more. Instead of "apple pie" you can enter a regular expression, e.g. "\d{8}" to extract numbers consisting of exactly eight digits. But there's more: if you enter an expression along with "@@@" (which stands for "variable string"), only a match to "@@@" is returned. E.g. "Name: @@@" will return whatever follows "Name:"!
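
As a rough illustration of the matching behaviour described above, here is a minimal Python sketch. It is not `rosinenpicker`'s actual implementation: the helper `extract` and the choice to translate "@@@" into a regex capture group are assumptions made purely for demonstration.

```python
import re
from typing import Optional

VARIABLE = "@@@"  # placeholder for "variable string", as described above


def extract(term: str, text: str) -> Optional[str]:
    """Return the first match of `term` in `text`, or None.

    Hypothetical helper: if the term contains "@@@", only the text standing
    in for "@@@" is returned; otherwise the whole regex match is returned.
    """
    if VARIABLE in term:
        before, after = term.split(VARIABLE, 1)
        # Assumption: capture lazily when something follows "@@@",
        # otherwise capture greedily up to the end of the line.
        capture = "(.+?)" if after else "(.+)"
        match = re.search(before + capture + after, text)
        return match.group(1) if match else None
    match = re.search(term, text)
    return match.group(0) if match else None


sample = "Invoice 20240220\nName: Jane Doe\nTotal: 12.50"
print(extract(r"\d{8}", sample))      # the eight-digit number
print(extract("Name: @@@", sample))   # whatever follows "Name: "
```

The sample text is invented; the point is only that a plain regular expression returns the whole match, while a term with "@@@" returns just the variable part.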

#### Even further fine-grained control
You can even add more fine-grained control by appending characters after the string '===' (three equal signs):
You *can* (i.e. you don't have to) even add more fine-grained control by appending characters after the string '===' (three equal signs):
- `m` (**m**ultiline) will allow multiline pattern matching (default: off)
- `l` (**l**inebreak to space) will replace linebreaks with space (only applies for multiline matching, default: off)
- `c(x)` (**c**rop length to x) will crop the length of the returned string to x (default: off)
- `?` (optional term) will mark the term as optional (default: off, i.e. without the question mark the term is assumed to be required); the optional key `move_to_directory` (see [sample configuration file](configs/config.yml)) will ignore these terms.

So e.g. the term `start@@@finish===mc(100)l?` will search for text between pattern "start" and "finish" over multiple lines, replace line breaks with space, crop the returned text to 100 characters, and mark the term as optional (i.e. not required).
You can use any one of the above options or combine several; their order does not matter, as long as each option is given by its flag above. For example, the term `start@@@finish===mc(100)l?` would search for text between the patterns "start" and "finish" over multiple lines, replace line breaks with spaces, crop the returned text to 100 characters, and mark the term as optional (i.e. not required); it could equally have been written as `start@@@finish===lc(100)?m` (i.e. flag order is up to you)!
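
To make the flag syntax concrete, the following Python sketch splits a term at '===' and reads the flags in any order. It is an illustration only, not the parser `rosinenpicker` uses: the names `TermOptions` and `parse_term` are hypothetical, and the sketch assumes the search pattern itself does not contain '==='.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermOptions:
    """Hypothetical container for a term and its optional flags."""
    pattern: str
    multiline: bool = False           # flag `m`
    linebreak_to_space: bool = False  # flag `l`
    crop_length: Optional[int] = None # flag `c(x)`
    optional: bool = False            # flag `?`


def parse_term(term: str) -> TermOptions:
    # Everything before the first '===' is the pattern, the rest are flags.
    pattern, sep, flags = term.partition("===")
    opts = TermOptions(pattern=pattern)
    if not sep:
        return opts  # no flags given: all defaults stay off
    # Pull out `c(x)` first so its letters are not mistaken for other flags.
    crop = re.search(r"c\((\d+)\)", flags)
    if crop:
        opts.crop_length = int(crop.group(1))
        flags = flags.replace(crop.group(0), "")
    opts.multiline = "m" in flags
    opts.linebreak_to_space = "l" in flags
    opts.optional = "?" in flags
    return opts


# Flag order does not matter: both spellings yield the same options.
print(parse_term("start@@@finish===mc(100)l?") == parse_term("start@@@finish===lc(100)?m"))
```

Handling `c(x)` before the single-character flags keeps the parsing order-independent, which mirrors the rule that flags may appear in any order.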

## Using `rosinenpicker`

@@ -74,7 +74,23 @@ For a list of commands and options, type:
rosinenpicker -h
```

This command displays all you need to know to navigate `rosinenpicker`.
This command displays all you need to know to navigate `rosinenpicker`:

```
usage: rosinenpicker [-h] [-c CONFIG] [-d DATABASE] [-v] [-r READOUT]
A package for picking the juciest text morsels out of a pile of documents.
options:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
Path to configuration YAML file.
-d DATABASE, --database DATABASE
Path to SQLite database file.
-v, --version Print version and exit.
-r READOUT, --readout READOUT
Only read contents of file and exit.
```

## Conclusion

2 changes: 1 addition & 1 deletion src/rosinenpicker/start.py
@@ -1,4 +1,4 @@
__version__ = '0.1.7'
__version__ = '0.1.8'
import yaml
import re
import os