MTB trail recommendation app that uses HuggingFace and the PineCone VectorDB: it embeds text descriptions of trail data and uploads them to a backend PineCone DB that supports natural-language (NLP) search.
- For basic search we can run PineConeSearch.py
- Update PineConeSearch.py with your trail criteria
- $ python PineConeSearch.py
- First, run the mtb-project-crawler project to crawl and scrape MTB data and store the trail URLs in a JSON Lines file.
* cd MTBCrawlerScraper/
* Run the crawler using:
$ scrapy crawl mtbproject --logfile mtbproject.log -o mtbproject.jl:jsonlines
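
Once the crawl finishes, the output file can be sanity-checked with a short script. This is a minimal sketch, assuming each line of mtbproject.jl is a JSON object carrying a `url` field; that field name is an assumption and should be adjusted to whatever the spider actually emits. `load_trail_urls` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def load_trail_urls(path, url_field="url"):
    """Read a JSON Lines file and return the de-duplicated URLs it contains.

    `url_field` is a hypothetical field name; change it to match the
    items the spider really writes.
    """
    urls = []
    seen = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            url = record.get(url_field)
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    jl_path = Path("mtbproject.jl")
    if jl_path.exists():
        print(f"{len(load_trail_urls(jl_path))} unique trail URLs")
```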
- Run MTBTrailController.py to store the data in MongoDB
- Uses a processing pool to split the URLs and process them in chunks
- Calls MTBTrailUrlParser.py & MTBTrailParser.py for creating Routes and Descriptions
- Calls MTBTrailMongoDB.py to load the route/description tuples to Atlas
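
The chunk-and-pool pattern described above can be sketched as follows. This is an illustration only, assuming Python's standard `multiprocessing.Pool`; `parse_chunk` is a placeholder standing in for the real URL and trail parsers, and the chunk size and worker count are arbitrary.

```python
from multiprocessing import Pool


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def parse_chunk(urls):
    """Placeholder worker: the real project would fetch and parse each URL."""
    return [(url, f"description for {url}") for url in urls]


def process_urls(urls, chunk_size=50, workers=4):
    """Parse URL chunks in parallel and flatten the per-chunk results."""
    with Pool(processes=workers) as pool:
        results = pool.map(parse_chunk, chunked(urls, chunk_size))
    return [pair for chunk in results for pair in chunk]


if __name__ == "__main__":
    sample = [f"https://example.com/trail/{n}" for n in range(5)]
    pairs = process_urls(sample, chunk_size=2, workers=2)
    print(len(pairs), "route/description tuples")
```

Handing each worker a chunk rather than a single URL keeps inter-process overhead low; the flattened tuples are what would then be passed on for loading into MongoDB.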
- Next, create the pkl data from MongoDB for use by PineCone
- Run MTBTrailPickleCreator.py, which creates the route and description pkl files
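
A minimal sketch of the pickling step, assuming the data has already been read from MongoDB into a list of (route, description) tuples. The file names `routes.pkl` and `descriptions.pkl` and both helper functions are hypothetical; the MongoDB query itself is left out.

```python
import pickle
from pathlib import Path


def write_pickles(pairs, out_dir, route_name="routes.pkl", desc_name="descriptions.pkl"):
    """Split (route, description) tuples and pickle each list to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    routes = [route for route, _ in pairs]
    descriptions = [desc for _, desc in pairs]
    for name, data in ((route_name, routes), (desc_name, descriptions)):
        with (out / name).open("wb") as handle:
            pickle.dump(data, handle)
    return out / route_name, out / desc_name


def read_pickle(path):
    """Load one pickled list back into memory."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.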
* cd PineCone/
- Run PineConeEmbeddingCreator.py to create the data to insert into the PineCone VectorDB
- Creates a new mtb_route_dataset.pkl file with the updated data
- Run PineConeCreateServerless.py to create the serverless index if it does not already exist
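
The create-if-missing check might look like the sketch below. It assumes a client object that behaves like `pinecone.Pinecone`, exposing `list_indexes().names()` and `create_index(...)`; `ensure_index` is a hypothetical helper, and the index name, dimension, cloud, and region in the usage comment are placeholders rather than the project's actual settings.

```python
def ensure_index(client, name, dimension, spec, metric="cosine"):
    """Create the index only when it is not already listed.

    `client` is expected to behave like `pinecone.Pinecone`.
    Returns True when a new index was created, False otherwise.
    """
    if name in client.list_indexes().names():
        return False
    client.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True


# Usage with the real SDK (needs an API key; all values are placeholders):
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key="YOUR_API_KEY")
# ensure_index(pc, "mtb-trails", 384, ServerlessSpec(cloud="aws", region="us-east-1"))
```

The dimension must match the output size of the embedding model, so it is worth deriving it from the model rather than hard-coding it.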
- Run PineConeDatasetUpload.py, which takes the text data, embeds it, and uploads it to PineCone
- Reads the mtb_route_dataset.pkl file in app/pkl_data
- Uses PineCone upsert to upload batches of vector data to PineCone
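
The embed-then-upsert loop can be sketched as below. This is a minimal illustration, not the project's code: `index` is assumed to expose `upsert(vectors=...)` as a `pinecone` index does, `embed` is any callable that maps a list of strings to a list of vectors (for example a HuggingFace model wrapper), and the `text` metadata key and batch size are assumptions.

```python
def batched(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upload_in_batches(index, ids, texts, embed, batch_size=100):
    """Embed texts and upsert id/values/metadata records batch by batch.

    Returns the number of records sent to the index.
    """
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    total = 0
    rows = list(zip(ids, texts))
    for batch in batched(rows, batch_size):
        vectors = embed([text for _, text in batch])
        payload = [
            {"id": row_id, "values": list(vector), "metadata": {"text": text}}
            for (row_id, text), vector in zip(batch, vectors)
        ]
        index.upsert(vectors=payload)  # one request per batch
        total += len(payload)
    return total
```

Batching keeps each request small enough to stay within the service's per-request limits and makes a failed upload cheaper to retry.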
- Run PineConeSearch.py, which searches for trails using NLP
- Embeds the query to support natural-language search
- Uses a conditional filter to search on metadata
- Uses the MapBox API to show the trail route data
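
Combining an embedded query with a metadata filter might look like this sketch. It assumes an index object exposing `query(vector=..., top_k=..., filter=..., include_metadata=...)` as the `pinecone` SDK does, plus an `embed` callable as in the upload step. The metadata fields `difficulty` and `length_miles` and both function names are hypothetical.

```python
def build_filter(difficulty=None, max_length_miles=None):
    """Build a metadata filter from optional criteria.

    Uses the `$eq` and `$lte` operators; the field names are hypothetical.
    """
    clauses = {}
    if difficulty is not None:
        clauses["difficulty"] = {"$eq": difficulty}
    if max_length_miles is not None:
        clauses["length_miles"] = {"$lte": max_length_miles}
    return clauses


def search_trails(index, embed, query, top_k=5, **criteria):
    """Embed the query text and run a filtered similarity search."""
    vector = embed([query])[0]
    kwargs = {"vector": list(vector), "top_k": top_k, "include_metadata": True}
    metadata_filter = build_filter(**criteria)
    if metadata_filter:
        kwargs["filter"] = metadata_filter  # omit the filter when no criteria are given
    return index.query(**kwargs)


# Example call (index and embed come from your own setup):
# search_trails(index, embed, "flowy singletrack with views", difficulty="Intermediate")
```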
- PineConeSearchLoader.py - contains the PineCone connections
- MTBLoadDataset.py - loads the dataset for the MTB data
- See the app/README.md file for all GCloud commands
- Follow all subsequent directions on setting up and mapping domains