📊 Token Count (tc) 🦀

A simple and efficient token count program written in Rust! 🚀

English | 简体中文 | 繁體中文 | 日本語 | 한국어 | Deutsch

📝 Description

This Rust implementation of the classic wc (word count) command-line tool allows you to count lines, words, characters, and even tokens in text files or from standard input. It's fast, reliable, and supports Unicode! 🌍✨

🎯 Features

  • Count lines 📏
  • Count words 🔤
  • Count characters (including multi-byte Unicode characters) 🔡
  • Count tokens using various tokenizer models 🔢
  • Process multiple files 📚
  • Read from standard input 🖥️
  • Supports various languages (English, Korean, Japanese, and more!) 🌍
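
To make the difference between characters and bytes concrete, here is a minimal sketch of line, word, and character counting using only the Rust standard library. It is not taken from tc's source: the `Counts` struct and `count_text` function are hypothetical names, and the line-counting convention (`str::lines`, which also counts a final line without a trailing newline) is an assumption that may differ from what tc does. Token counting is left out because it needs an external tokenizer.

```rust
/// Hypothetical result type; `bytes` is included only to contrast with `chars`.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    chars: usize,
    bytes: usize,
}

/// Counts lines, whitespace-separated words, Unicode scalar values, and bytes.
/// Assumption: a final line without a trailing newline still counts as a line.
fn count_text(text: &str) -> Counts {
    Counts {
        lines: text.lines().count(),
        words: text.split_whitespace().count(),
        // `chars()` iterates Unicode scalar values, so a 3-byte Hangul
        // syllable counts as one character.
        chars: text.chars().count(),
        // `len()` is the UTF-8 byte length, not the character count.
        bytes: text.len(),
    }
}
```

With this sketch, `count_text("안녕하세요 세계\n")` reports 9 characters but 23 bytes, which is the distinction the "multi-byte Unicode characters" feature refers to.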

🛠️ Installation

There are two ways to install tc:

Option 1: Install from source

  1. Make sure you have Rust installed on your system. If not, get it from rust-lang.org 🦀

  2. Clone this repository:

    git clone https://github.com/guuzaa/tc.git
    cd tc
    
  3. Build the project:

    cargo build --release
    
  4. The executable will be available at target/release/tc

Option 2: Install pre-built binaries

  1. Go to the Releases page of the tc repository.

  2. Download the latest release for your operating system and architecture.

  3. Extract the downloaded archive.

  4. Move the tc executable to a directory in your system's PATH (e.g., /usr/local/bin on Unix-like systems).

  5. You can now use tc from anywhere in your terminal!

🚀 Usage

Options:

  • -l, --lines: Show line count 📏
  • -w, --words: Show word count 🔤
  • -c, --chars: Show character count 🔡
  • -t, --tokens: Show token count 🔢
  • --model <MODEL>: Choose tokenizer model (default: gpt3)

Available models:

  • gpt3: r50k_base
  • edit: p50k_edit
  • code: p50k_base
  • chatgpt: cl100k_base
  • gpt4o: o200k_base
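
The alias table above can be read as a simple lookup. The following is a sketch of that mapping, not tc's actual implementation: `encoding_for` is a hypothetical helper, and how tc handles an unknown alias is not documented here, so the sketch simply returns `None`.

```rust
/// Hypothetical helper: maps a `--model` alias to the encoding name
/// from the "Available models" list. Unknown aliases yield `None`.
fn encoding_for(model: &str) -> Option<&'static str> {
    match model {
        "gpt3" => Some("r50k_base"),
        "edit" => Some("p50k_edit"),
        "code" => Some("p50k_base"),
        "chatgpt" => Some("cl100k_base"),
        "gpt4o" => Some("o200k_base"),
        _ => None,
    }
}
```

Under this reading, `--model chatgpt` selects the cl100k_base encoding, and omitting `--model` is equivalent to `--model gpt3`, that is, r50k_base.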

If no options are specified, all counts (lines, words, characters, and tokens) will be shown.
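
One way to picture the "no options means everything" rule is the sketch below. The `Selection` struct and `resolve_selection` function are hypothetical names chosen for illustration; tc's real argument handling may be structured differently.

```rust
/// Hypothetical record of which counts were requested on the command line.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selection {
    lines: bool,
    words: bool,
    chars: bool,
    tokens: bool,
}

/// If no flag was given, enable every count; otherwise keep the request as-is.
fn resolve_selection(requested: Selection) -> Selection {
    let nothing_requested =
        !(requested.lines || requested.words || requested.chars || requested.tokens);
    if nothing_requested {
        Selection { lines: true, words: true, chars: true, tokens: true }
    } else {
        requested
    }
}
```

So a plain `tc example.txt` behaves like `tc -lwct example.txt` in this model, while `tc -w example.txt` reports only the word count.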

Examples:

  1. Count lines, words, characters, and tokens in a file (no options means all counts):

    tc example.txt
    
  2. Count only words in multiple files:

    tc -w file1.txt file2.txt file3.txt
    
  3. Count lines and characters from standard input:

    echo "Hello, World!" | tc -lc
    
  4. Count tokens using the ChatGPT tokenizer:

    tc -t --model chatgpt example.txt
    
  5. Count everything in files with different languages:

    tc english.txt korean.txt japanese.txt
    

🤝 Contributing

Contributions are welcome! Feel free to submit issues or pull requests. 🎉

📜 License

This project is licensed under the MIT License. See the LICENSE file for details. 📄

🙏 Acknowledgements

  • The Rust community for their amazing tools and support 🦀❤️
  • The original Unix wc command for inspiration 🖥️
  • The editor Cursor 🤖

Happy counting! 🎉📊🚀