QText is a microservice framework for building RAG pipelines or semantic search engines on top of Postgres. It provides a simple API to add, query, and highlight text in your existing database.
The main features include:
- Full-text search with the Postgres GIN index.
- Vector and sparse vector search with pgvecto.rs.
- Reranking with a cross-encoder model, the Cohere reranking API, or other methods.
- Semantic highlight.
Besides these, qtext also provides a dashboard to visualize the full-text search, vector search, sparse vector search, and reranking results.
- Simple: easy to deploy and use.
- Customizable: can be integrated into your existing databases.
- Extensible: can be extended with new features.
To start all the services with Docker Compose:

```shell
docker compose -f docker/compose.yaml up -d server
```
Some of the dependent services can be opted out of:

- `emb`: used to generate embeddings for queries and documents
- `sparse`: used to generate sparse embeddings for queries and documents (this requires a HuggingFace token that has signed the agreement for `prithivida/Splade_PP_en_v1`)
- `highlight`: used to provide the semantic highlight feature
- `encoder`: rerank with a cross-encoder model; you can choose other methods or other online services
For the client example, check:
- test.py: simple demo.
- test_cohere_wiki.py: a Wikipedia dataset with Cohere embedding.
We provide a simple sync/async client. You can also refer to the OpenAPI specification and build your own client.
- `/api/namespace` POST: create a new namespace and configure the index
- `/api/doc` POST: add a new doc
- `/api/query` POST: query the docs
- `/api/highlight` POST: semantic highlight
- `/metrics` GET: OpenMetrics endpoint
Check the OpenAPI documentation for more information (this requires the qtext service to be running).
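The endpoints above can be wrapped in a tiny sync client. Here is a minimal sketch using only the standard library; the base URL, port, and JSON field names such as `name`, `namespace`, and `text` are assumptions, so check the OpenAPI page for the real request schemas:

```python
import json
import urllib.request


class QTextClient:
    """Minimal sync client sketch for the qtext HTTP API.

    Endpoint paths come from this README; the JSON field names are
    assumptions -- consult the service's OpenAPI page for the real schema.
    """

    def __init__(self, base_url: str = "http://localhost:8000") -> None:
        self.base_url = base_url.rstrip("/")

    def _post(self, path: str, payload: dict) -> dict:
        # POST a JSON payload and decode the JSON response.
        req = urllib.request.Request(
            f"{self.base_url}{path}",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def create_namespace(self, name: str) -> dict:
        return self._post("/api/namespace", {"name": name})

    def add_doc(self, namespace: str, text: str) -> dict:
        return self._post("/api/doc", {"namespace": namespace, "text": text})

    def query(self, namespace: str, query: str) -> dict:
        return self._post("/api/query", {"namespace": namespace, "query": query})

    def highlight(self, namespace: str, query: str, text: str) -> dict:
        return self._post(
            "/api/highlight", {"namespace": namespace, "query": query, "text": text}
        )
```

The provided sync/async client should be preferred in practice; this sketch is only to show the request flow.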
We provide a simple terminal UI powered by Textual for you to interact with the service.
```shell
pip install textual
# need to run the qtext service first
python tui/main.py $QTEXT_PORT
```
Check config.py for more details. The service reads `$HOME/.config/qtext/config.json` if that file exists.
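As an illustration of the shape such a file might take: the only key confirmed by this README is `vector_store.schema` (see the section on existing tables below), and everything else, including whether the schema is referenced by import path, is an assumption, so consult config.py for the actual fields:

```json
{
  "vector_store": {
    "schema": "my_module.MyDocRecord"
  }
}
```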
This project has most of the components you need for RAG except for the final LLM generation step. You can send the retrieved and reranked docs to any LLM provider to get the final result.
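For that final generation step, a minimal sketch of stuffing the retrieved and reranked docs into a prompt for whatever LLM provider you use (the template and function name here are purely illustrative):

```python
def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved + reranked docs into a grounding prompt.

    The template is illustrative; adapt it to your LLM provider's API.
    """
    # Number the docs so the LLM can cite them as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```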
Note
If you already have a table in Postgres, you are responsible for the text-indexing and vector-indexing parts:
- Define a `dataclass` that includes the necessary columns as class attributes
- Annotate `primary_key`, `text_index`, `vector_index`, and `sparse_index` with metadata (not all of them are required, only the necessary ones)
  - Attributes without a default value or default factory are treated as required when you add new docs
- Implement the `to_record` and `from_record` methods to be used in the reranking stage
- Change the `config.vector_store.schema` to the class you have defined
Check the schema.py for more details.
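The steps above might look roughly like this; the class name, metadata key names, and the `to_record`/`from_record` signatures are assumptions based on this README, so follow the real conventions in schema.py:

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    # Metadata keys below are illustrative; schema.py defines the real ones.
    id: int = field(metadata={"primary_key": True})
    text: str = field(metadata={"text_index": True})
    # Has a default factory, so it is optional when adding new docs.
    emb: list = field(default_factory=list, metadata={"vector_index": True})

    def to_record(self) -> tuple:
        # Column order is an assumption; match your table definition.
        return (self.id, self.text, self.emb)

    @classmethod
    def from_record(cls, record: tuple) -> "ChunkRecord":
        return cls(*record)
```

Here `id` and `text` have no default, so they are required when adding new docs, while `emb` is optional thanks to its default factory.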