This is a very simple way to map your text data using Atlas from Nomic, through a command-line interface built with the click library.
You have to create a Nomic account to get your Nomic API key.
<< Atlas enables you to:
- Store, update and organize multi-million point datasets of unstructured text, images and embeddings.
- Visually interact with your datasets from a web browser.
- Run semantic search and vector operations over your datasets.

Use Atlas to:
- Visualize, interact, collaborate and share large datasets of text and embeddings.
- Collaboratively clean, tag and label your datasets.
- Build high-availability apps powered by semantic search.
- Understand and debug the latent space of your AI model trains. >>
To install the necessary dependencies, run the following commands:
python -m venv mymapenv
source mymapenv/bin/activate
pip install --upgrade pip
pip install text2mapviewer
Log in to (or create) your Nomic account:
nomic login
If you already have your account and API token:
nomic login [YOUR_API_TOKEN_NOMIC_HERE]
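If you prefer to authenticate from Python rather than from the CLI, the nomic client also exposes a login helper. A minimal sketch, with a placeholder token:

import nomic

# Authenticate the Nomic client with your API token (placeholder value).
nomic.login("YOUR_API_TOKEN_NOMIC_HERE")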
# Use the project object exposed by the text2mapviewer library
from text2mapviewer.examples.map_embedding import project
print(project)
python scr/text2mapviewer/examples/map_embedding_click.py --num_embeddings 10000 --embedding_dim 256
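Judging by its options, map_embedding_click.py presumably generates num_embeddings random vectors of dimension embedding_dim and maps them with Atlas. The sketch below is only an illustration of what such a script might look like, using the older atlas.map_embeddings call; the actual example shipped with the package may differ:

import click
import numpy as np
from nomic import atlas

@click.command()
@click.option("--num_embeddings", default=10000, help="Number of embedding vectors to map.")
@click.option("--embedding_dim", default=256, help="Dimensionality of each embedding vector.")
def main(num_embeddings, embedding_dim):
    # Random embeddings stand in for real model outputs in this sketch.
    embeddings = np.random.rand(num_embeddings, embedding_dim)
    # Upload the embeddings and create an Atlas map (the API name may vary by nomic version).
    project = atlas.map_embeddings(embeddings=embeddings)
    print(project)

if __name__ == "__main__":
    main()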
This project supports a variety of transformer models, including models from the Hugging Face Model Hub and sentence-transformers. Below are some examples:
- Hugging Face model: 'prajjwal1/bert-mini'
- Hugging Face model: 'Sahajtomar/french_semantic' (a French model for semantic-search embeddings)
- Sentence-Transformers model: 'sentence-transformers/all-MiniLM-L6-v2', among others
Please ensure that the model you choose is compatible with the project requirements and adjust the --transformer_model_name option accordingly.
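As a quick check that a chosen model name actually produces embeddings, you can load it directly with sentence-transformers (a minimal sketch; the sentences are placeholder text):

from sentence_transformers import SentenceTransformer

# Load one of the supported models by name.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode a couple of placeholder sentences into embedding vectors.
embeddings = model.encode(["Bonjour tout le monde", "Hello world"])
print(embeddings.shape)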
Install the project requirements, then run the main script:
pip install -r requirements.txt
python main.py --transformer-model-name MODEL_NAME --cache_dir CACHE_DIR --batch-size BATCH_SIZE --file-path FILE_PATH
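For example, assuming a CSV of texts at data/my_texts.csv (a hypothetical path) and the MiniLM model listed above, an invocation might look like:

python main.py --transformer-model-name sentence-transformers/all-MiniLM-L6-v2 --cache_dir ./transformers_cache --batch-size 32 --file-path data/my_texts.csv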
NOTE: for CACHE_DIR, you can set it up like this:
export TRANSFORMERS_CACHE=/path_to_your/transformers_cache
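If you would rather not rely on the environment variable, Hugging Face's from_pretrained also accepts a cache_dir argument; presumably main.py forwards its --cache_dir option in a similar way. A sketch under that assumption:

from transformers import AutoModel, AutoTokenizer

# Download (or reuse) the model files under an explicit cache directory.
tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-mini", cache_dir="/path_to_your/transformers_cache")
model = AutoModel.from_pretrained("prajjwal1/bert-mini", cache_dir="/path_to_your/transformers_cache")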
Feedback is welcome.