This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Commit

Merge pull request #4 from xbelonogov/fix_readmemd
fixed readme
kalaidin authored Jul 22, 2019
2 parents e84cb42 + 94eebf5 commit 16fcf1d
Showing 2 changed files with 4 additions and 3 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -78,7 +78,8 @@ Trains BPE model and saves to file.
* `vocab_size`: int, number of tokens in the final vocabulary
* `coverage`: float, fraction of characters covered by the model. Must be in the range [0, 1]. A good value to use is about 0.9999.
* `n_threads`: int, number of parallel threads used to run. If
-  equal to -1 then maximum number of threads available will be used.
+  equal to -1 then minimum of the number of available threads and 8
+  will be used (see [benchmark](benchmark.md#number-of-threads)).
* `pad_id`: int, reserved id for padding
* `unk_id`: int, reserved id for unknown symbols
* `bos_id`: int, reserved id for begin of sentence token
@@ -234,7 +235,8 @@ Options:

Apply BPE encoding for a corpus of sentences. Use `stdin` for input and `stdout` for output.

-By default, encoding works in parallel using `n_threads` threads.
+By default, encoding works in parallel using `n_threads` threads. Number of threads is limited by
+8 (see [benchmark](benchmark.md#number-of-threads)).

With the `--stream` option, `--n_threads` will be ignored and all sentences will be processed one by one.
Each sentence will be tokenized and written to the `stdout` before the next sentence is read.
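
For reviewers, the `n_threads` rule introduced by the README changes above can be sketched as follows. This is only an illustration, not the library's code: `resolve_n_threads` and `MAX_THREADS` are hypothetical names, and passing explicit positive values through unchanged is an assumption, since the README text only describes the `-1` case in detail.

```python
import os
from typing import Optional

# Cap taken from the README wording ("minimum of the number of available threads and 8").
MAX_THREADS = 8


def resolve_n_threads(n_threads: int, available: Optional[int] = None) -> int:
    """Hypothetical helper mirroring the documented n_threads rule.

    n_threads == -1 means "pick automatically": use the number of available
    threads, but never more than MAX_THREADS.
    Any other value is returned unchanged (an assumption of this sketch).
    """
    if available is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        available = os.cpu_count() or 1
    if n_threads == -1:
        return min(available, MAX_THREADS)
    return n_threads
```

With this sketch, a machine reporting 32 available threads would resolve `-1` to 8, while a machine reporting 4 would resolve it to 4.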
1 change: 0 additions & 1 deletion youtokentome/youtokentome.py
@@ -49,7 +49,6 @@ def encode(
     ) -> Union[List[List[int]], List[List[str]]]:
         if not isinstance(output_type, OutputType):
             raise TypeError(
-                # f"parameter output_type must be youtokentome.OutputType, not {type(output_type)}"
                 "parameter output_type must be youtokentome.OutputType, not %s}" % str(type(output_type))
             )

