# Knowledge Base Project

This repository contains a scalable knowledge base system that integrates web scraping, data cleaning, categorization, and vector similarity search for machine learning (ML) and artificial intelligence (AI) content. The project uses Flask, Scrapy, Milvus, and Redis to provide an end-to-end solution for collecting, storing, and querying data.

## Features
- Web Scraping: Automated article collection from predefined sources using Scrapy.
- Data Cleaning: Clean and normalize text data for better analysis.
- Categorization: Classify content into predefined ML/AI categories using zero-shot classification models (see the sketch after this list).
- Vector Similarity Search: Store and retrieve documents using vector embeddings with Milvus.
- Interactive Web Interface: Query and interact with the knowledge base via a user-friendly web app.
- Microservices Architecture: Deployed using Docker and Docker Compose for seamless scaling.
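To make the categorization step concrete, here is a minimal sketch of zero-shot classification using the Hugging Face `transformers` pipeline; the model name and label set are illustrative assumptions, not necessarily what `app/organizer/` uses.

```python
from transformers import pipeline

# Hypothetical category labels; the organizer module defines the real set.
CATEGORIES = [
    "machine learning",
    "deep learning",
    "natural language processing",
    "computer vision",
]

# facebook/bart-large-mnli is a common choice for zero-shot classification.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Transformers have become the dominant architecture for language modeling.",
    candidate_labels=CATEGORIES,
)
print(result["labels"][0])  # the highest-scoring category
```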
## Project Structure

```
artpedro-knowledge_base_project/
├── app/                    # Core application code
│   ├── cleaner/            # Text cleaning utilities
│   ├── embeddings/         # Vector embedding generation
│   ├── milvus_handler/     # Milvus database interaction
│   ├── organizer/          # Content categorization
│   ├── scraper/            # Web scraping using Scrapy
│   ├── worker/             # Asynchronous task processing
│   ├── templates/          # HTML templates for Flask
│   ├── retrieval.py        # Query and retrieval logic
│   └── routes.py           # API routes for Flask
├── tests/                  # Unit tests for all components
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker setup for the Flask app
├── docker-compose.yml      # Multi-container setup
└── run.py                  # Flask application entry point
```
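To illustrate how the `embeddings/` and `milvus_handler/` components shown above fit together, the sketch below runs a vector similarity search with `pymilvus`. The collection name, vector field, dimensionality, and search parameters are assumptions; see `app/milvus_handler/` for the actual implementation.

```python
from pymilvus import Collection, connections

# Connection values mirror the .env defaults used later in this README.
connections.connect(host="standalone", port="19530")

collection = Collection("articles")  # hypothetical collection name
collection.load()                    # load the collection into memory for search

# Placeholder query vector; in the app, the embeddings module produces this.
query_embedding = [0.0] * 768        # assumed 768-dimensional embeddings

hits = collection.search(
    data=[query_embedding],
    anns_field="embedding",          # assumed vector field name
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["text"],
)
for hit in hits[0]:
    print(hit.distance, hit.entity.get("text"))
```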
## Prerequisites

- Python 3.9+
- Docker & Docker Compose
- Redis & Milvus installed (or use the provided `docker-compose.yml`)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/username/knowledge_base_project.git
   cd knowledge_base_project
   ```

2. Set environment variables: create a `.env` file in the project root to define your environment variables (a sketch of how the app can load them follows this list):

   ```
   MILVUS_HOST=standalone
   MILVUS_PORT=19530
   REDIS_HOST=redis
   REDIS_PORT=6379
   OPENAI_API_KEY=your_openai_api_key
   ```

3. Install the Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Start the services: run the app and its dependencies using Docker Compose:

   ```bash
   docker-compose up --build
   ```

5. Access the application: visit the web interface at http://localhost:5000.
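As referenced in step 2, here is a minimal sketch of how the application can load these variables, assuming `python-dotenv` is among the dependencies (the repo's actual configuration code may differ):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads key=value pairs from the .env file in the project root

MILVUS_HOST = os.getenv("MILVUS_HOST", "standalone")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
REDIS_HOST = os.getenv("REDIS_HOST", "redis")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # set in .env; do not commit it
```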
## Usage

- Home Page: Displays key functionalities of the application.
- Health Check: Verify the application's health and dependencies.
- Query: Ask questions or search the knowledge base.
- Scraper Trigger: Start the web scraping process for new data.
## API Endpoints

- Health Check: `GET /health` verifies the health of the application.
- Trigger Scraper: `POST /scrape` triggers a scraping job with an optional `url` field.
- Query: `POST /query` queries the knowledge base with a JSON payload: `{"query": "Your question"}` (a client example follows this list).
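For example, the endpoints above can be exercised from Python with `requests`, assuming the stack is running locally on port 5000 as described in the installation steps (the response shapes shown are assumptions):

```python
import requests

BASE_URL = "http://localhost:5000"

# Verify the application and its dependencies are up.
health = requests.get(f"{BASE_URL}/health")
print(health.status_code, health.text)

# Trigger a scraping job; the url field is optional per the endpoint above.
requests.post(f"{BASE_URL}/scrape", json={"url": "https://example.com/article"})

# Query the knowledge base.
resp = requests.post(f"{BASE_URL}/query", json={"query": "Your question"})
print(resp.json())
```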
## Testing

Run the unit tests using `unittest`:

```bash
python -m unittest discover -s tests
```
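Tests placed under `tests/` are picked up automatically by the discover command. A hypothetical example of the pattern (the function under test is a stand-in, not the repo's actual cleaner API):

```python
# tests/test_example.py
import unittest


def normalize_whitespace(text: str) -> str:
    """Stand-in for a cleaner utility from app/cleaner/."""
    return " ".join(text.split())


class TestNormalizeWhitespace(unittest.TestCase):
    def test_collapses_runs_of_spaces(self):
        self.assertEqual(normalize_whitespace("  hello   world "), "hello world")


if __name__ == "__main__":
    unittest.main()
```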
## Deployment

Use the provided `docker-compose.yml` to deploy the application along with its dependencies:

```bash
docker-compose up -d
```

For production-grade deployment, you can extend the Docker configuration for Kubernetes using Helm charts or other orchestration tools.
## Contributing

1. Fork the repository.
2. Create a new branch for your feature: `git checkout -b feature-name`
3. Commit your changes: `git commit -m "Description of your feature"`
4. Push your branch and open a Pull Request.
## License

This project is licensed under the MIT License. See the `LICENSE` file for details.