A simple and efficient token count program written in Rust!
English | 简体中文 | 繁體中文 | 日本語 | 한국어 | Deutsch
This Rust implementation of the classic wc (word count) command-line tool lets you count lines, words, characters, and even tokens in text files or from standard input. It's fast, reliable, and supports Unicode!
- Count lines
- Count words
- Count characters (including multi-byte Unicode characters)
- Count tokens using various tokenizer models
- Process multiple files
- Read from standard input
- Supports various languages (English, Korean, Japanese, and more!)
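To make the line, word, and character counts concrete, here is a minimal sketch in plain Rust. It is not taken from the tc source: the `Counts` struct and the `count` function are hypothetical names, and the counting rules (a line is a newline character, a word is a whitespace-separated run, a character is a Unicode scalar value) are assumptions modeled on wc.

```rust
use std::io::{self, BufRead};

/// Hypothetical container for the totals of one input (not tc's real type).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
}

/// Count lines, words, and characters from any buffered reader,
/// such as an open file, locked standard input, or a byte slice.
/// Assumption: the input is valid UTF-8; `read_line` returns an error otherwise.
pub fn count<R: BufRead>(mut reader: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut line = String::new();
    loop {
        line.clear();
        // `read_line` keeps the trailing '\n', so newlines count as characters.
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        // Like wc, count newline characters rather than text segments.
        if line.ends_with('\n') {
            counts.lines += 1;
        }
        // Words are runs of non-whitespace separated by Unicode whitespace.
        counts.words += line.split_whitespace().count();
        // `chars()` iterates Unicode scalar values, so multi-byte text counts per character.
        counts.chars += line.chars().count();
    }
    Ok(counts)
}
```

Under these assumptions, the input `Hello, World!` followed by a newline gives 1 line, 2 words, and 14 characters, and the Korean text `안녕하세요 세계` followed by a newline gives 1 line, 2 words, and 9 characters even though it occupies 23 bytes.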
There are two ways to install tc:

Build from source:

- Make sure you have Rust installed on your system. If not, get it from rust-lang.org.
- Clone this repository:
    git clone https://github.com/guuzaa/tc.git
    cd tc
- Build the project:
    cargo build --release
- The executable will be available at target/release/tc.

Download a prebuilt release:

- Go to the Releases page of the tc repository.
- Download the latest release for your operating system and architecture.
- Extract the downloaded archive.
- Move the tc executable to a directory in your system's PATH (e.g., /usr/local/bin on Unix-like systems).
- You can now use tc from anywhere in your terminal!
- -l, --lines: Show line count
- -w, --words: Show word count
- -c, --chars: Show character count
- -t, --tokens: Show token count
- --model <MODEL>: Choose tokenizer model (default: gpt3)

Available models:

- gpt3: r50k_base
- edit: p50k_edit
- code: p50k_base
- chatgpt: cl100k_base
- gpt4o: o200k_base
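The model list above is a lookup from a `--model` value to an encoding name. A minimal sketch of that lookup follows; the function `encoding_for_model` and the constant `DEFAULT_MODEL` are hypothetical names for illustration, and tc's actual implementation may be organized differently.

```rust
/// Default `--model` value, as stated in the option list.
pub const DEFAULT_MODEL: &str = "gpt3";

/// Hypothetical helper: map a `--model` value to its encoding name.
/// Returns `None` for a model name that is not in the list.
pub fn encoding_for_model(model: &str) -> Option<&'static str> {
    match model {
        "gpt3" => Some("r50k_base"),
        "edit" => Some("p50k_edit"),
        "code" => Some("p50k_base"),
        "chatgpt" => Some("cl100k_base"),
        "gpt4o" => Some("o200k_base"),
        _ => None,
    }
}
```

Returning an `Option` keeps the unknown-model case explicit, so a caller can report an error instead of silently falling back to the default.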
If no options are specified, all counts (lines, words, characters, and tokens) will be shown.
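The default rule can be sketched as a small piece of flag logic. The `Selection` struct and its `resolve` method are hypothetical and only illustrate the behavior described here; they are not tc's real argument handling.

```rust
/// Hypothetical set of count flags parsed from the command line.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Selection {
    pub lines: bool,
    pub words: bool,
    pub chars: bool,
    pub tokens: bool,
}

impl Selection {
    /// If no flag was given, enable every count; otherwise keep the flags as passed.
    pub fn resolve(self) -> Selection {
        if self == Selection::default() {
            Selection { lines: true, words: true, chars: true, tokens: true }
        } else {
            self
        }
    }
}
```

With this sketch, a bare `tc example.txt` corresponds to an all-false `Selection` that resolves to all four counts, while `tc -w` keeps only `words` set.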
- Count lines, words, characters, and tokens in a file:
    tc example.txt
- Count only words in multiple files:
    tc -w file1.txt file2.txt file3.txt
- Count lines and characters from standard input:
    echo "Hello, World!" | tc -lc
- Count tokens using the ChatGPT tokenizer:
    tc -t --model chatgpt example.txt
- Count everything in files with different languages:
    tc english.txt korean.txt japanese.txt
Contributions are welcome! Feel free to submit issues or pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
- The Rust community for their amazing tools and support
- The original Unix wc command for inspiration
- The editor Cursor
Happy counting!