Skip to content

Estimating Glycemic Impact of Cooking Recipes via Online Crowdsourcing and Machine Learning (DPH’ 19)

Notifications You must be signed in to change notification settings

LARC-CMU-SMU/dph19-glycemic-impact

Repository files navigation

Overview

This is the github repo for
Lee, H., Achananuparp, P., Liu, Y., Lim, E. P., & Varshney, L. R. (2019). Estimating Glycemic Impact of Cooking Recipes via Online Crowdsourcing and Machine Learning. Short paper accepted at 9th International Digital Public Health Conference (DPH’ 19), ACM arxiv

Slides

Our slides and scripts presented in DPH'19 could be found at here

Project Structure

By default, the project assumes the following directory structure:

├── data                                    # Files that we save
│   ├── dic_20191203.pickle                 # The textual description of recipe, AMT annotation, and nutritional properties
│   ├── Recipe54k-trained embeddings        # Some pickle files
│   ├── combined.csv                        # 1000 recipes with crowdsourcing annotations
│   └── ... 
│ 
├── RQ1                                     # How we conduct data preprocessing and crowdsourcing to answer the Research Question 1
│   └── ... 
│ 
├── RQ2                                     # Models trained to answer the Research Question 2 and saved to csv/ and pickle/
│   └── RQ2_original                        # How we prepare the results on paper, it is not well-structured and requires dic_20190819.pickle
│   │   └── ...
│   └── RQ2_reproducible                    # We selectively reproduce the best models in our study and re-organize the notebooks. It requires dic_20191203.pickle
│   │   └── Best models in RQ2.ipynb        # Best non-nutritional model: NB-BoW + LR
│   │                                       # Best overall model:         Pre-trained GloVe + Nutritional information + LGBM
│   │                                       # Second best overall model:  NB-BoW + Nutritional information + LR
│   │                                       # Nutrition-only model:       Nutritional information + LR
├── pickle     
├── csv     
└── ...

Dataset

We crawled the 55k recipes from http://allrecipes.com and had 1000 recipes labelled by the AMT workers. However, we were not allowed to release the 55k dataset (i.e. The dic_20190819.pickle we used in most of the notebooks) As a result, we released the dic_20191203.pickle file instead, which should be enough for reproducing more of our experiments except training the embeddings. It contains 990 recipes instead of 55k recipes.

Recipe54k-trained embeddings

This work involves a lot of embeddings trained on Recipe54k. As a result, we share the trained embeddings

Python Version

We use python version 3.6.6 in this work

About

Estimating Glycemic Impact of Cooking Recipes via Online Crowdsourcing and Machine Learning (DPH’ 19)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published