Parvus - Quantum-Inspired Data Compression

Parvus is a data compression and similarity search system that uses quantum-inspired techniques for efficient data storage and retrieval. The name comes from the Latin word for "small", reflecting the system's primary purpose: reducing data size while preserving semantic meaning.

Features

Quantum-Inspired Compression: Utilizes advanced dimensionality reduction techniques
Semantic Search: Performs similarity searches on compressed data
GPU Acceleration: Supports GPU-accelerated processing for improved performance
Interactive GUI: Streamlit-based UI for easy data manipulation
Flexible Input: Supports both JSON and CSV file formats

Installation

From PyPI

# Basic installation
pip install parvus

# With GPU support
pip install "parvus[gpu]"

# For development
pip install "parvus[dev]"

From Source

Clone the repository:

git clone https://github.com/yourusername/parvus.git
cd parvus

Install dependencies:

pip install -r requirements.txt

For GPU acceleration (recommended):

Install the CUDA toolkit (if using an NVIDIA GPU)
Install the GPU-enabled packages:

conda install -c pytorch faiss-gpu

Usage

Command Line Interface

The package provides a command-line interface for basic operations:

Compressing Data

python -m parvus --input data.json --output compressed_output

The input JSON file should have the following format:

{
    "messages": [
        {
            "id": "1",
            "content": "Your text content here"
        }
    ]
}
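
If your source text is not yet in this layout, a small script can produce it. The following is a minimal sketch, not part of Parvus itself: the helper names build_parvus_input, validate_parvus_input and write_parvus_input are hypothetical, and the sketch assumes that ids are strings, as in the example above.

```python
import json


def build_parvus_input(texts):
    """Wrap raw strings in the {"messages": [{"id", "content"}]} layout."""
    return {
        "messages": [
            {"id": str(number), "content": text}
            for number, text in enumerate(texts, start=1)
        ]
    }


def validate_parvus_input(payload):
    """Return True if payload has a "messages" list of id/content string pairs."""
    messages = payload.get("messages") if isinstance(payload, dict) else None
    if not isinstance(messages, list):
        return False
    return all(
        isinstance(message, dict)
        and isinstance(message.get("id"), str)  # assumption: ids are strings
        and isinstance(message.get("content"), str)
        for message in messages
    )


def write_parvus_input(path, texts):
    """Build the payload, check it, and write it as UTF-8 JSON."""
    payload = build_parvus_input(texts)
    if not validate_parvus_input(payload):
        raise ValueError("payload does not match the expected layout")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=4, ensure_ascii=False)
```

Calling write_parvus_input("data.json", ["First message", "Second message"]) would write a file that can be passed to --input.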

Searching Compressed Data

python -m parvus --load compressed_output --query "your search query"
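
The compress and search commands can also be driven from a script. This is a minimal sketch that only uses the flags shown above (--input, --output, --load, --query); the helper names are hypothetical, and it assumes parvus is installed in the interpreter that runs the script.

```python
import subprocess
import sys


def compress_command(input_path, output_path):
    """Argument list for: python -m parvus --input ... --output ..."""
    return [sys.executable, "-m", "parvus",
            "--input", input_path, "--output", output_path]


def search_command(store_path, query):
    """Argument list for: python -m parvus --load ... --query ..."""
    return [sys.executable, "-m", "parvus",
            "--load", store_path, "--query", query]


def run(command):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)
```

For example, run(compress_command("data.json", "compressed_output")) followed by run(search_command("compressed_output", "your search query")). Passing arguments as a list avoids shell quoting problems with queries that contain spaces.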

Starting the API Server

python -m parvus --server

Starting the GUI

python -m parvus --gui
# or
python -m streamlit run gui.py

Using the Python API

from parvus import ParvusCompressor

# Initialize the system
compressor = ParvusCompressor()

# Load and compress data
embeddings = compressor.load_data_from_json('your_data.json')
compressor.compress(embeddings)

# Perform queries
results, distances = compressor.query("Your search query")

System Requirements

Python 3.8+
RAM: 16GB recommended
GPU: NVIDIA GPU with CUDA support (optional)
Storage: Depends on dataset size

Project Structure

parvus/
├── parvus.py              # Core compression engine
├── gui.py                 # Streamlit-based interface
├── server.py              # Server endpoints and API
├── requirements.txt       # Project dependencies
├── README.md             # Project documentation
├── CONTRIBUTING.md       # Contribution guidelines
├── data/                 # Sample and test data
│   ├── large_chat_history.json
│   ├── sample_data.csv
│   ├── sample_data.npy
│   └── test_data.json
├── models/               # Saved model states
│   ├── compressed_data.pkl
│   └── faiss_index.bin
├── tests/                # Test files and artifacts
│   ├── test_compressed.pkl
│   └── test_index.bin
└── archive/             # Archived/deprecated files

Essential Components

Core Files:
- parvus.py: Main compression engine implementing quantum-inspired algorithms
- gui.py: Interactive web interface built with Streamlit
- server.py: Server endpoints and API for integration
- requirements.txt: Project dependencies

Data Directory:
- Sample data files
- Test datasets
- JSON and CSV examples

Models Directory:
- Saved compression states
- FAISS indices
- Serialized model data

Tests Directory:
- Test artifacts
- Compressed test data
- Test indices

Architecture

The system consists of three main components:

Core Compression Engine (parvus.py)
- Handles data compression and decompression
- Manages similarity search operations
- Provides GPU acceleration when available

Interactive Interface (gui.py)
- Web-based user interface
- File upload and management
- Query interface
- Results visualization

Server API (server.py)
- RESTful API endpoints
- Data management
- Remote compression operations
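
To make the compress-then-search flow of the core engine more concrete, here is a conceptual sketch in plain Python. It is not Parvus's implementation: it substitutes a Gaussian random projection for the quantum-inspired reduction and a brute-force cosine search for the FAISS index, and all function names are hypothetical.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian matrix mapping in_dim-dimensional vectors to out_dim dimensions."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vector):
    """Compress one vector by multiplying it with the projection matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, compressed, top_k=1):
    """Indices of the top_k compressed vectors most similar to the query."""
    ranked = sorted(
        range(len(compressed)),
        key=lambda index: cosine_similarity(query, compressed[index]),
        reverse=True,
    )
    return ranked[:top_k]
```

The point of the sketch is that the query is projected with the same matrix as the stored vectors, so the search runs entirely in the reduced space and the original vectors are not needed at query time.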

Performance

Performance metrics on a typical dataset:

Compression Ratio: ~5x (varies by data)
Query Time: <100ms (with GPU)
Memory Usage: Proportional to dataset size
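
To check the compression ratio on your own data, compare the size before and after compression. The sketch below shows the arithmetic only; the vector counts and dimensions in the usage note are hypothetical and chosen for illustration, not measurements of Parvus.

```python
def embedding_bytes(n_vectors, dim, bytes_per_value=4):
    """Size in bytes of n_vectors dense vectors (float32 by default)."""
    return n_vectors * dim * bytes_per_value


def compression_ratio(original_bytes, compressed_bytes):
    """Original size divided by compressed size; 5.0 means five times smaller."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return original_bytes / compressed_bytes
```

For instance, 10,000 float32 vectors reduced from 640 to 128 dimensions shrink from 25,600,000 to 5,120,000 bytes, so compression_ratio(embedding_bytes(10000, 640), embedding_bytes(10000, 128)) returns 5.0. Real ratios also depend on index overhead and stored metadata.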

Development

Creating a New Release

To publish a new version to PyPI:

Update the version number in:
- setup.py
- parvus/__init__.py
Create and push a new tag:

git tag v0.1.x
git push origin v0.1.x

Go to GitHub and create a new release using the tag. The GitHub Action will automatically build and publish to PyPI.
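
Before tagging, it can help to confirm that both files carry the version the tag will name. The following is a minimal sketch under two assumptions that this README does not confirm: setup.py contains a literal version="..." argument, and parvus/__init__.py defines __version__ = "...". The helper names are hypothetical.

```python
import re

# Matches version="0.1.2" as well as __version__ = '0.1.2'.
VERSION_PATTERN = re.compile(r"""version(?:__)?\s*=\s*['"]([^'"]+)['"]""")


def extract_version(source_text):
    """Return the first version string found in source_text, or None."""
    match = VERSION_PATTERN.search(source_text)
    return match.group(1) if match else None


def versions_match_tag(tag, *source_texts):
    """True if every source declares the version named by tag (leading 'v' ignored)."""
    expected = tag[1:] if tag.startswith("v") else tag
    return all(extract_version(text) == expected for text in source_texts)
```

A typical check reads both files with open(...).read() and calls versions_match_tag with the tag you are about to push, aborting the release if it returns False.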

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Support

For support, please open an issue in the GitHub repository or contact the maintainers.
