Codebase AI

Overview

The Semantic Search Engine for Code Repositories is an AI-powered tool that helps developers find relevant code snippets, functions, or entire libraries from natural language queries. It combines large language models (LLMs) with TiDB Serverless Vector Search so that users can locate specific code patterns, structures, or algorithms within a codebase.

Watch our demo on YouTube

Features

  • Natural Language Querying: Search for code using plain English queries like "find the piece of code that initializes the BST" or "locate the function that performs quicksort."
  • Vector Search: Utilizes TiDB Serverless's Vector Search capabilities to identify and retrieve semantically similar code snippets.
  • File and Line Number Retrieval: Provides the exact file path and line number where the relevant code appears, along with a code snippet for context.
  • Contextual Understanding: Employs LLMs to understand the context and intent behind queries, making the search highly accurate and intuitive.
  • Code Reuse Encouragement: Facilitates code reuse by making it easy to find existing solutions, reducing redundancy in development.
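
To illustrate the idea behind the Vector Search and File and Line Number Retrieval features, here is a minimal sketch of ranking stored snippets against a query embedding. It is not the project's implementation: in the real app the comparison is done by TiDB Serverless, the embeddings come from a model, and the use of cosine distance is an assumption. The `Snippet` class, the file paths, and the three-dimensional toy vectors are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Snippet:
    """A stored code chunk plus the location metadata shown to the user."""
    file_path: str               # hypothetical path inside the indexed repo
    line_number: int             # line where the snippet starts
    code: str
    embedding: Sequence[float]   # toy vector; real embeddings come from a model


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def search(query_embedding: Sequence[float],
           snippets: List[Snippet],
           top_k: int = 3) -> List[Snippet]:
    """Return the top_k snippets closest to the query embedding."""
    ranked = sorted(snippets,
                    key=lambda s: cosine_distance(query_embedding, s.embedding))
    return ranked[:top_k]


# Toy index: assumed data for illustration only.
index = [
    Snippet("trees/bst.py", 12, "class BST: ...", [0.9, 0.1, 0.0]),
    Snippet("sorting/quicksort.py", 3, "def quicksort(arr): ...", [0.1, 0.9, 0.2]),
    Snippet("utils/io.py", 40, "def read_file(path): ...", [0.0, 0.2, 0.9]),
]

# Pretend this vector is the embedding of "locate the function that performs quicksort".
query = [0.2, 0.8, 0.1]

results = search(query, index, top_k=2)
best = results[0]
print(f"{best.file_path}:{best.line_number}")  # sorting/quicksort.py:3
```

Each result carries its file path and starting line, which is the information the tool reports alongside the snippet.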

App Architecture

[Figure: app architecture diagram]

Prerequisites 🛠️

Before you begin, ensure you have the following installed on your machine:

  • Python 3.8+
  • Git
  • Virtual Environment (optional but recommended)

You'll also need:

  • A TiDB Cloud account with a Serverless instance set up.

Installation ⚙️

  1. Clone the repository
    git clone https://github.com/jackabald/TiDB-Hack-NL-repo-search.git
    cd TiDB-Hack-NL-repo-search
  2. Set up a virtual environment (optional but recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install dependencies
    pip install -r requirements.txt
  4. Install and configure Ollama, then pull the model that the app reaches through the Ollama server
    ollama pull deepseek-coder
  5. Create .streamlit/secrets.toml by copying example.secrets.toml, then replace the placeholder values
    TIDB_URL="<your-tidb-pymysql>"
    GITHUB_TOKEN="<your-github-token>"
    JINA_API_KEY="<your-jina-api-key>"
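
A forgotten placeholder in secrets.toml tends to surface later as a confusing connection or authentication error. The sketch below is one way to catch that early. It is not part of the project: `find_unset_secrets` is a hypothetical helper, and the only thing taken from the README is the three key names.

```python
from typing import List, Mapping

# Key names taken from the secrets.toml template in the installation steps.
REQUIRED_KEYS = ("TIDB_URL", "GITHUB_TOKEN", "JINA_API_KEY")


def find_unset_secrets(secrets: Mapping[str, str]) -> List[str]:
    """Return required keys that are missing, empty, or still a <placeholder>."""
    unset = []
    for key in REQUIRED_KEYS:
        value = str(secrets[key]).strip() if key in secrets else ""
        # Template values look like "<your-github-token>".
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(key)
    return unset


# Example with made-up values: one placeholder left in, one key missing.
example = {
    "TIDB_URL": "<your-tidb-pymysql>",
    "GITHUB_TOKEN": "ghp_example",
}
unset = find_unset_secrets(example)
print(unset)  # ['TIDB_URL', 'JINA_API_KEY']
```

Inside a Streamlit app, the same function could presumably be called with `st.secrets`, which supports key lookup like a mapping; how the project actually reads its secrets is not shown here.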

Contributing 🤝

Contributions to this project are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request on the project's GitHub repository.

Getting Started with TiDB Vectors

Looking to use TiDB vectors in your own application and wondering where to start? The official docs are the best place to begin.

License 📝

This project is licensed under the Apache License. Feel free to use, modify, and distribute the code as per the terms of the license.