GitHub - LARC-CMU-SMU/dph19-glycemic-impact: Estimating Glycemic Impact of Cooking Recipes via Online Crowdsourcing and Machine Learning (DPH’ 19)

Overview

This is the github repo for
Lee, H., Achananuparp, P., Liu, Y., Lim, E. P., & Varshney, L. R. (2019). Estimating Glycemic Impact of Cooking Recipes via Online Crowdsourcing and Machine Learning. Short paper accepted at 9th International Digital Public Health Conference (DPH’ 19), ACM arxiv

Slides

Our slides and scripts presented in DPH'19 could be found at here

Project Structure

By default, the project assumes the following directory structure:

├── data                                    # Files that we save
│   ├── dic_20191203.pickle                 # The textual description of recipe, AMT annotation, and nutritional properties
│   ├── Recipe54k-trained embeddings        # Some pickle files
│   ├── combined.csv                        # 1000 recipes with crowdsourcing annotations
│   └── ... 
│ 
├── RQ1                                     # How we conduct data preprocessing and crowdsourcing to answer the Research Question 1
│   └── ... 
│ 
├── RQ2                                     # Models trained to answer the Research Question 2 and saved to csv/ and pickle/
│   └── RQ2_original                        # How we prepare the results on paper, it is not well-structured and requires dic_20190819.pickle
│   │   └── ...
│   └── RQ2_reproducible                    # We selectively reproduce the best models in our study and re-organize the notebooks. It requires dic_20191203.pickle
│   │   └── Best models in RQ2.ipynb        # Best non-nutritional model: NB-BoW + LR
│   │                                       # Best overall model:         Pre-trained GloVe + Nutritional information + LGBM
│   │                                       # Second best overall model:  NB-BoW + Nutritional information + LR
│   │                                       # Nutrition-only model:       Nutritional information + LR
├── pickle     
├── csv     
└── ...

Dataset

We crawled the 55k recipes from http://allrecipes.com and had 1000 recipes labelled by the AMT workers. However, we were not allowed to release the 55k dataset (i.e. The dic_20190819.pickle we used in most of the notebooks) As a result, we released the dic_20191203.pickle file instead, which should be enough for reproducing more of our experiments except training the embeddings. It contains 990 recipes instead of 55k recipes.

Recipe54k-trained embeddings

This work involves a lot of embeddings trained on Recipe54k. As a result, we share the trained embeddings

Python Version

We use python version 3.6.6 in this work

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.ipynb_checkpoints		.ipynb_checkpoints
RQ1		RQ1
RQ2_original		RQ2_original
RQ2_reproducible		RQ2_reproducible
data		data
david_kwene_dph		david_kwene_dph
models		models
pickle		pickle
skip_thoughts		skip_thoughts
utils		utils
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Slides

Project Structure

Dataset

Recipe54k-trained embeddings

Python Version

About

Releases

Packages

Languages

LARC-CMU-SMU/dph19-glycemic-impact

Folders and files

Latest commit

History

Repository files navigation

Overview

Slides

Project Structure

Dataset

Recipe54k-trained embeddings

Python Version

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages