kstohr/topic-vector-search

Thinking of Topic Modeling as Search

Use vector search to search for documents related to topics derived from a topic model.

Presentation

Using a topic model in production typically runs into two hurdles. First, topics change continually, so document tags become stale as soon as they are created. Second, while unsupervised topic models do a good job of clustering documents into topics, creating robust, human-interpretable labels for those topics is challenging. Framing topic modeling as a search problem helps overcome both challenges and makes it easier to use supervised or unsupervised topic models in real-time applications.
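As a rough illustration of the idea, a topic can be treated as a query vector and documents ranked by their similarity to it at search time, so nothing has to be tagged in advance. The sketch below is not the code in this repository: the function names and the toy 3-dimensional vectors are made up for illustration, and a real setup would use embeddings from a model and a vector database rather than an in-memory dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_topic(topic_vector, doc_vectors, k=2):
    """Return the k documents most similar to the topic vector."""
    scored = [
        (doc_id, cosine_similarity(topic_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
doc_vectors = {
    "doc-sports": [0.9, 0.1, 0.0],
    "doc-finance": [0.1, 0.9, 0.1],
    "doc-sports-business": [0.6, 0.5, 0.0],
}

# Hypothetical embedding of a "sports" topic representation.
topic_vector = [1.0, 0.2, 0.0]

results = search_by_topic(topic_vector, doc_vectors, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

Because the ranking is computed when the query runs, a new or updated topic only needs a new query vector; the documents themselves are not re-tagged.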

Requirements

Installation

After cloning the repository, change directory to the project root and follow these steps:

  1. Run Docker Compose. This installs a local instance of the OpenSearch database.
docker compose up
  2. Install Poetry, then install the dependencies and launch the Poetry environment.
pipx install poetry
poetry install
poetry shell
  3. Create a .env file with your OpenAI credentials (or adapt it to the service of your choice).
OPENAI_API_KEY="sk-some-key"
OPENAI_ORGANIZATION="org-some-org-or-none"
OPENAI_PROJECT="proj_some-project-or-none"
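For reference, a .env file in this format is just a list of KEY="value" lines. How this project actually loads the file is not shown here, so the parser below is only a minimal standard-library sketch of the format; `parse_env_text` is a hypothetical helper, not a function from this repository.

```python
def parse_env_text(text):
    """Parse KEY="value" lines into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Placeholder values copied from the example above, not real credentials.
sample = '''OPENAI_API_KEY="sk-some-key"
OPENAI_ORGANIZATION="org-some-org-or-none"
OPENAI_PROJECT="proj_some-project-or-none"
'''

settings = parse_env_text(sample)
print(sorted(settings))
```

In practice you would read the text from the .env file in the project root and pass the values to whichever client you use.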

Comments

This demonstration depends on OpenSearch, a fork of Elasticsearch that AWS's managed service now supports in place of Elasticsearch. Equivalent vector search functionality exists in Elasticsearch, but the code to run the search is slightly different, so you will need to adapt this example.
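To give a sense of the difference, the sketch below builds a k-nearest-neighbor request body in each dialect as plain Python dictionaries. It follows the query shapes in the public OpenSearch k-NN and Elasticsearch 8.x kNN search documentation as I understand them, so check them against the version you run. The field name `embedding`, the helper names, and the `num_candidates` default are assumptions for illustration and are not taken from this repository.

```python
def opensearch_knn_body(field, query_vector, k=5):
    """OpenSearch k-NN plugin: the knn clause sits inside "query",
    keyed by the vector field name."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {"vector": list(query_vector), "k": k},
            }
        },
    }


def elasticsearch_knn_body(field, query_vector, k=5, num_candidates=50):
    """Elasticsearch 8.x: knn is a top-level search option that names
    the field and takes query_vector and num_candidates."""
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }


# Hypothetical topic embedding, shortened to three dimensions.
topic_vector = [0.12, -0.03, 0.55]

os_body = opensearch_knn_body("embedding", topic_vector, k=3)
es_body = elasticsearch_knn_body("embedding", topic_vector, k=3)
print(os_body)
print(es_body)
```

Either dictionary would be passed as the request body of a search call against an index whose mapping defines the vector field; the index mappings also differ between the two systems and are not covered here.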

References and Credits

Thanks go to Maarten Grootendorst for his work and excellent documentation on BERTopic, as well as to colleagues at Unified and to coding partner Ray 'Urgent' McLendon for his interest and input.

OpenSearch Vector Search
https://opensearch.org/docs/latest/search-plugins/vector-search/

Text Embeddings
https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/

BERTopic - Package for topic modeling by Maarten Grootendorst
https://github.com/MaartenGr/BERTopic

Comparing Clustering Algorithms (HDBSCAN)
https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

c-TF-IDF
https://www.maartengrootendorst.com/blog/ctfidf/

Vector Search with Text Search
https://towardsdatascience.com/text-search-vs-vector-search-better-together-3bd48eb6132a
https://machine-mind-ml.medium.com/enhancing-llm-performance-with-vector-search-and-vector-databases-1f20eb1cc650
