CoffeeTalk is a tool that transforms codebases into fine-tuned language models. Its training configurations are tailored to different hardware profiles, so it runs on memory-constrained machines as well as high-performance setups.
Developers can use CoffeeTalk to improve code understanding, automate code documentation, and generate contextual code explanations. It can also serve as a backend for AI systems that need deep code comprehension, helping with code reviews, refactoring, and debugging.
- Create a virtual environment: `python -m venv venv`
- Activate the virtual environment (on Mac/Linux): `source venv/bin/activate`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the script: `python src/main.py`. For verbose output, run `python src/main.py -v`. Note: if `TARGET_REPO_PATH` is not set, you will be prompted for it; set it as an environment variable or pass it on the command line, e.g. `TARGET_REPO_PATH="/path/to/repo" python src/main.py`. A complete example session is shown after this list.
- Deactivate the virtual environment: `deactivate`
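Putting it all together, a complete session might look like the following sketch. The repository path is a placeholder; substitute the codebase you want to fine-tune on.

```bash
# Set up an isolated environment and install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Point CoffeeTalk at a target repository and run it verbosely
TARGET_REPO_PATH="/path/to/repo" python src/main.py -v

# Clean up when finished
deactivate
```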
```mermaid
graph TD
    Start[Target Code / Repository] --> CoffeeTalk
    subgraph CoffeeTalk["CoffeeTalk ☕"]
        LD[Language Detection] --> ECS
        ECS{{Extract Code Snippets}} --> CS
        PTM_1[Pre-trained Model] --> GenerateTD
        CS[Code Snippets] --> GenerateTD
        HP_1[Hardware Profile] --> GenerateTD
        GenerateTD{{Generate Training Data}} --> TD
        PTM_2[Pre-trained Model] --> TrainData
        TD[Training Data] --> TrainData
        HP_2[Hardware Profile] --> TrainData
        TrainData{{Train Data}} --> FTM
    end
    FTM[Fine-tuned Model] --> End[Inference by User/AI]
```
CoffeeTalk supports different hardware profiles to optimize training for various systems. The profiles are defined in `src/hardware_profiles.py` and selected by setting the `HARDWARE_PROFILE` environment variable. Available profiles include:

- `apple_silicon`: Optimized for Apple Silicon Macs with limited memory.
- `cuda_gpu`: For systems with CUDA-enabled GPUs.
- `cpu`: For CPU-only systems.
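For example, you can set the variable inline to select a profile for a single run, or export it for the whole shell session:

```bash
# Use the CUDA profile for this run only
HARDWARE_PROFILE="cuda_gpu" python src/main.py

# Or export a profile for every subsequent run in this shell
export HARDWARE_PROFILE="apple_silicon"
python src/main.py
```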
You can choose the model used for training by setting the `TRAINING_MODEL` environment variable. If it is not set, the script defaults to `distilgpt2`, a small, memory-efficient model.
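For example, the sketch below pins the model explicitly to the documented default and combines it with a hardware profile; whether other model identifiers are accepted is not stated here, so treat alternatives as an assumption to verify.

```bash
# Pin the training model explicitly (distilgpt2 is also the default)
# and select the CPU profile for this run
TRAINING_MODEL="distilgpt2" HARDWARE_PROFILE="cpu" python src/main.py
```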
CoffeeTalk uses pytest for testing. After installing the dependencies, you can run the tests with any of these commands (the flags can also be combined, as shown after the list):

- Run all tests: `pytest`
- Run with verbose output: `pytest -v`
- Run with a coverage report: `pytest --cov=src`
- Run a specific test file: `pytest tests/test_language_extractors.py`
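For example, a verbose run of a single test file with a coverage report might look like this (assuming the pytest-cov plugin, which provides `--cov`, is installed via the requirements):

```bash
# Verbose run of one test file, reporting coverage for the src package
pytest -v --cov=src tests/test_language_extractors.py
```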
Planned improvements:

- Add more tests
- Specify versions in requirements.txt