Cuisine Predictor is a Python-based tool that wraps ML models built on scikit-learn's LinearSVC and KNeighborsClassifier to predict the cuisine of a dish, along with similar dishes, from the Yummly catalog.
The project's Python code follows the PEP 8 style guide. The project depends on the following packages:
- scikit-learn - Python module for machine learning built on top of SciPy
- pandas - Flexible and powerful data analysis and manipulation library for Python
- NLTK - Platform for building Python programs that work with human language data
- pytest - Testing framework that supports complex functional testing
- pytest-cov - Coverage plugin for pytest
- autopep8 - Tool that automatically formats Python code to conform to the PEP 8 style guide
- JupyterLab - Browser-based computational environment for Python
- Matplotlib - Comprehensive library for creating static, animated, and interactive visualizations in Python
- seaborn - Python data visualization library based on Matplotlib
Follow the commands below to set up and run the tool.
- Clone this repository and move into the folder.
$ git clone https://github.com/Biswas-N/cs5293sp22-project2.git
$ cd cs5293sp22-project2
- Install dependencies using Pipenv.
$ pipenv install
- Run the utility tool.
$ make
Note: The project includes a Makefile with commonly used commands. Running make executes the following command:
$ pipenv run python project2.py --N 5 --ingredient "chili powder" --ingredient "crushed red pepper flakes" --ingredient "garlic powder" --ingredient "sea salt" --ingredient "ground cumin" --ingredient "onion powder" --ingredient "dried oregano" --ingredient paprika

Note on models: If pre-fitted models do not exist in the models folder, the tool fits new models on the assets/yummly.json data and saves them to the models folder using joblib. As a result, the first run may take noticeably longer than subsequent runs.
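The two notes above describe a command line with one --N flag and a repeated --ingredient flag, plus a "load the fitted model if it exists, otherwise fit and save it" step. A minimal sketch of that flow follows, assuming project2.py parses its flags with argparse (the README does not say how it does). The names parse_args, load_or_fit, and the file name cuisine_model.joblib are hypothetical, not taken from the project.

```python
import argparse
import os
import tempfile

import joblib


def parse_args(argv=None):
    """Parse the flags used by the make target: --N once, --ingredient repeatedly."""
    parser = argparse.ArgumentParser(description="Predict a cuisine from ingredients.")
    parser.add_argument("--N", type=int, required=True,
                        help="number of similar dishes to report")
    # action="append" gathers every --ingredient occurrence into a single list
    parser.add_argument("--ingredient", action="append", required=True,
                        help="one ingredient; repeat the flag for more")
    return parser.parse_args(argv)


def load_or_fit(path, fit_fn):
    """Return the model cached at path, fitting and caching it on a miss."""
    if os.path.exists(path):
        return joblib.load(path)
    # Cache miss (e.g. the first run): fit once, then persist for later runs
    model = fit_fn()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    joblib.dump(model, path)
    return model


if __name__ == "__main__":
    args = parse_args(["--N", "5", "--ingredient", "paprika",
                       "--ingredient", "sea salt"])
    print(args.N, args.ingredient)  # 5 ['paprika', 'sea salt']

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; the real tool's files in models/ may differ
        path = os.path.join(tmp, "models", "cuisine_model.joblib")
        load_or_fit(path, lambda: {"fitted": True})  # fits and dumps
        load_or_fit(path, lambda: {"fitted": True})  # loads from disk
```

Passing the fitting step in as a callable keeps the caching logic independent of which model is being cached, so the same helper could serve both the LinearSVC and the KNeighborsClassifier.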
Documentation on the code structure, model building, and prediction algorithm can be found here.
This utility is tested using pytest.
Documentation about the tests can be found here. Follow the commands below to run the tests on your local system.
- Install dev dependencies.
$ pipenv install --dev
- Run tests using the Makefile.
$ make test
- Run test coverage.
$ make cov
- This tool assumes that yummly.json is present in the assets folder. If the data file is missing, the tool may fail.
- Similarly, the tool assumes that each JSON object in yummly.json has three keys: id, cuisine, and ingredients (a list of ingredients). If the data in yummly.json does not match this structure, the tool may fail.
- The tool is built on LinearSVC and KNeighborsClassifier, trained on the provided yummly.json data, so its accuracy depends on the quality of that data and on the statistical techniques these models use. I used the best pre-processing and model-fitting approaches I could within the given time constraints, but there is always more to do, so the tool may produce less accurate results in some cases.
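The data assumptions and model choices listed above can be illustrated with a minimal sketch. It is not the project's actual code: the TF-IDF features, joining multi-word ingredients with underscores, the cosine metric, and the helper names validate_recipes, load_recipes, to_text, fit_models, and predict are all assumptions. The real tool also lists NLTK among its dependencies and may pre-process ingredients differently. The sketch checks each record for the id, cuisine, and ingredients keys, fits a LinearSVC to predict the cuisine, and uses KNeighborsClassifier.kneighbors to find the closest dishes.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

REQUIRED_KEYS = {"id", "cuisine", "ingredients"}


def validate_recipes(recipes):
    """Raise ValueError if any record lacks id/cuisine/ingredients."""
    for i, recipe in enumerate(recipes):
        if not isinstance(recipe, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = REQUIRED_KEYS.difference(recipe)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        if not isinstance(recipe["ingredients"], list):
            raise ValueError(f"record {i}: 'ingredients' must be a list")
    return recipes


def load_recipes(path="assets/yummly.json"):
    """Load the Yummly catalog and validate its structure up front."""
    with open(path, encoding="utf-8") as f:
        return validate_recipes(json.load(f))


def to_text(ingredients):
    # Join multi-word ingredients with "_" so "sea salt" stays a single token
    return " ".join(ing.lower().replace(" ", "_") for ing in ingredients)


def fit_models(recipes, n_neighbors=5):
    """Fit a TF-IDF vectorizer, a LinearSVC for the cuisine and a k-NN model."""
    docs = [to_text(r["ingredients"]) for r in recipes]
    cuisines = [r["cuisine"] for r in recipes]
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)
    svc = LinearSVC().fit(X, cuisines)
    # Cosine distance suits sparse TF-IDF rows; scikit-learn searches by brute force
    knn = KNeighborsClassifier(n_neighbors=n_neighbors, metric="cosine")
    knn.fit(X, cuisines)
    return vectorizer, svc, knn


def predict(vectorizer, svc, knn, recipes, ingredients, n):
    """Return the predicted cuisine and the n closest dishes as (id, similarity)."""
    x = vectorizer.transform([to_text(ingredients)])
    cuisine = svc.predict(x)[0]
    # kneighbors returns row indices into the training data, i.e. into recipes
    distances, indices = knn.kneighbors(x, n_neighbors=n)
    similar = [(recipes[j]["id"], 1.0 - d)  # cosine similarity = 1 - distance
               for d, j in zip(distances[0], indices[0])]
    return cuisine, similar
```

For example, calling fit_models(load_recipes()) and passing the result to predict with ["paprika", "sea salt"] and n=5 would return one cuisine label and five (id, similarity) pairs. Because validation runs before fitting, a malformed yummly.json fails with a clear message instead of an error deep inside scikit-learn.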