A simple and efficient token count program written in Rust! 🚀
This Rust implementation of the classic wc (word count) command-line tool lets you count lines, words, characters, and even tokens in text files or from standard input. It's fast, reliable, and supports Unicode! 🌍✨
- Count lines 📏
- Count words 🔤
- Count characters (including multi-byte Unicode characters) 🔡
- Count tokens using various tokenizer models 🔢
- Process multiple files 📚
- Read from standard input 🖥️
- Supports various languages (English, Korean, Japanese, and more!) 🌐
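To make the difference between bytes and characters concrete, here is a minimal sketch of the three basic counts using only the Rust standard library. It is not taken from tc's source: the `Counts` struct and `count` function are hypothetical names, and the exact rules tc applies (what counts as a line, a word, or a character) are assumptions here.

```rust
/// Counts gathered from one input. Token counts are left out because they
/// need a tokenizer. The struct name and fields are illustrative only.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Counts {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
}

/// Count lines, words, and characters in `text`.
pub fn count(text: &str) -> Counts {
    Counts {
        // Number of newline bytes (assumed convention, as in classic `wc -l`).
        lines: text.bytes().filter(|&b| b == b'\n').count(),
        // Maximal runs of non-whitespace, split on Unicode whitespace.
        words: text.split_whitespace().count(),
        // Unicode scalar values, not bytes, so multi-byte text counts correctly.
        chars: text.chars().count(),
    }
}
```

With this sketch, `count("안녕하세요 세계\n")` reports 9 characters even though the string occupies 23 bytes in UTF-8. Note that `chars()` counts Unicode scalar values, so a character built from several combining code points would be counted more than once.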
There are two ways to install tc:

Build from source:
- Make sure you have Rust installed on your system. If not, get it from rust-lang.org 🦀
- Clone this repository:
    git clone https://github.com/guuzaa/tc.git
    cd tc
- Build the project:
    cargo build --release
- The executable will be available at target/release/tc

Download a prebuilt release:
- Go to the Releases page of the tc repository.
- Download the latest release for your operating system and architecture.
- Extract the downloaded archive.
- Move the tc executable to a directory in your system's PATH (e.g., /usr/local/bin on Unix-like systems).
- You can now use tc from anywhere in your terminal!
- -l, --lines: Show line count 📏
- -w, --words: Show word count 🔤
- -c, --chars: Show character count 🔡
- -t, --tokens: Show token count 🔢
- --model <MODEL>: Choose tokenizer model (default: gpt3)

Available models:
- gpt3: r50k_base
- edit: p50k_edit
- code: p50k_base
- chatgpt: cl100k_base
- gpt4o: o200k_base
If no options are specified, all counts (lines, words, characters, and tokens) will be shown.
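The option rules above can be sketched in a few lines of Rust. This is an illustration under assumptions, not tc's actual code: `Selection`, `resolve`, and `encoding_for` are hypothetical names, and the only facts used are the model-to-encoding pairs and the "no flags means all counts" rule stated in this README.

```rust
/// Which counts to display. Names are illustrative, not taken from tc's source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Selection {
    pub lines: bool,
    pub words: bool,
    pub chars: bool,
    pub tokens: bool,
}

impl Selection {
    /// If no flag was given, show every count; otherwise keep the flags as given.
    pub fn resolve(self) -> Selection {
        if self.lines || self.words || self.chars || self.tokens {
            self
        } else {
            Selection { lines: true, words: true, chars: true, tokens: true }
        }
    }
}

/// Map a `--model` value to the encoding listed for it in this README.
/// Returns `None` for a name that is not in the list.
pub fn encoding_for(model: &str) -> Option<&'static str> {
    match model {
        "gpt3" => Some("r50k_base"),
        "edit" => Some("p50k_edit"),
        "code" => Some("p50k_base"),
        "chatgpt" => Some("cl100k_base"),
        "gpt4o" => Some("o200k_base"),
        _ => None,
    }
}
```

Under this sketch, running with `-lc` keeps only the line and character counts, while running with no flags turns all four on. How tc reacts to an unknown model name is not documented here, so the `None` case is left to the caller.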
- Count lines, words, characters, and tokens in a file:
    tc example.txt
- Count only words in multiple files:
    tc -w file1.txt file2.txt file3.txt
- Count lines and characters from standard input:
    echo "Hello, World!" | tc -lc
- Count tokens using the ChatGPT tokenizer:
    tc -t --model chatgpt example.txt
- Count everything in files with different languages:
    tc english.txt korean.txt japanese.txt
Contributions are welcome! Feel free to submit issues or pull requests. 🎉
This project is licensed under the MIT License. See the LICENSE file for details. 📄
- The Rust community for their amazing tools and support 🦀❤️
- The original Unix wc command for inspiration 🖥️
- The editor Cursor 🤖
Happy counting! 🎉📊🚀