Coffee ratings

The data this week comes from Coffee Quality Database courtesy of Buzzfeed Data Scientist James LeDoux. The original data can be found on James' github. The data was re-posted to Kaggle.

"These data were collected from the Coffee Quality Institute's review pages in January 2018."

Thrillist has an article on the top coffee-producing countries.

Yorgos Askalidis analyzed this data as well.

There is data for both Arabica and Robusta beans, across many countries and professionally rated on a 0-100 scale. All sorts of scoring/ratings for things like acidity, sweetness, fragrance, balance, etc - may be useful for either separating into visualizations/categories or for modeling/recommenders.

Wikipedia on Coffee Beans:

The two most economically important varieties of coffee plant are the Arabica and the Robusta; ~60% of the coffee produced worldwide is Arabica and ~40% is Robusta. Arabica beans consist of 0.8–1.4% caffeine and Robusta beans consist of 1.7–4% caffeine.

Wiki on Cupping

Coffee cupping, or coffee tasting, is the practice of observing the tastes and aromas of brewed coffee. It is a professional practice but can be done informally by anyone or by professionals known as "Q Graders". A standard coffee cupping procedure involves deeply sniffing the coffee, then loudly slurping the coffee so it spreads to the back of the tongue. The coffee taster attempts to measure aspects of the coffee's taste, specifically the body (the texture or mouthfeel, such as oiliness), sweetness, acidity (a sharp and tangy feeling, like when biting into an orange), flavour (the characters in the cup), and aftertaste. Since coffee beans embody telltale flavours from the region where they were grown, cuppers may attempt to identify the coffee's origin.

Importantly - there is the concept of ethical or Fair Trade coffee - we'll be covering more of the production numbers of Coffee in a future dataset.

Based on the simple idea that the products bought and sold every day are connected to the livelihoods of others, fair trade is a way to make a conscious choice for a better world.

Fair Trade Coffee definition from Wikipedia:

Fair trade coffee is coffee that is certified as having been produced to fair trade standards by fair trade organizations, which create trading partnerships that are based on dialogue, transparency and respect, with the goal of achieving greater equity in international trade. These partnerships contribute to sustainable development by offering better trading conditions to coffee bean farmers. Fair trade organizations support producers and sustainable environmental farming practices and prohibit child labor or forced labor.

If you're looking to buy some coffee - check out this list of 12 Black-Owned Coffee Brands.

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2020-07-07')
tuesdata <- tidytuesdayR::tt_load(2020, week = 28)

coffee_ratings <- tuesdata$coffee_ratings

# Or read in the data manually

coffee_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv')

Data Dictionary

`coffee_ratings.csv`

Note full description/examples at: Coffee Quality Institute

variable	class	description
total_cup_points	double	Total rating/points (0 - 100 scale)
species	character	Species of coffee bean (arabica or robusta)
owner	character	Owner of the farm
country_of_origin	character	Where the bean came from
farm_name	character	Name of the farm
lot_number	character	Lot number of the beans tested
mill	character	Mill where the beans were processed
ico_number	character	International Coffee Organization number
company	character	Company name
altitude	character	Altitude - this is a messy column - I've left it for some cleaning
region	character	Region where bean came from
producer	character	Producer of the roasted bean
number_of_bags	double	Number of bags tested
bag_weight	character	Bag weight tested
in_country_partner	character	Partner for the country
harvest_year	character	When the beans were harvested (year)
grading_date	character	When the beans were graded
owner_1	character	Who owns the beans
variety	character	Variety of the beans
processing_method	character	Method for processing
aroma	double	Aroma grade
flavor	double	Flavor grade
aftertaste	double	Aftertaste grade
acidity	double	Acidity grade
body	double	Body grade
balance	double	Balance grade
uniformity	double	Uniformity grade
clean_cup	double	Clean cup grade
sweetness	double	Sweetness grade
cupper_points	double	Cupper Points
moisture	double	Moisture Grade
category_one_defects	double	Category one defects (count)
quakers	double	quakers
color	character	Color of bean
category_two_defects	double	Category two defects (count)
expiration	character	Expiration date of the beans
certification_body	character	Who certified it
certification_address	character	Certification body address
certification_contact	character	Certification contact
unit_of_measurement	character	Unit of measurement
altitude_low_meters	double	Altitude low meters
altitude_high_meters	double	Altitude high meters
altitude_mean_meters	double	Altitude mean meters

Cleaning Script

library(tidyverse)

raw_arabica <- read_csv("https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv") %>% 
  janitor::clean_names()

raw_robusta <- read_csv("https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv",
                        col_types = cols(
                          X1 = col_double(),
                          Species = col_character(),
                          Owner = col_character(),
                          Country.of.Origin = col_character(),
                          Farm.Name = col_character(),
                          Lot.Number = col_character(),
                          Mill = col_character(),
                          ICO.Number = col_character(),
                          Company = col_character(),
                          Altitude = col_character(),
                          Region = col_character(),
                          Producer = col_character(),
                          Number.of.Bags = col_double(),
                          Bag.Weight = col_character(),
                          In.Country.Partner = col_character(),
                          Harvest.Year = col_character(),
                          Grading.Date = col_character(),
                          Owner.1 = col_character(),
                          Variety = col_character(),
                          Processing.Method = col_character(),
                          Fragrance...Aroma = col_double(),
                          Flavor = col_double(),
                          Aftertaste = col_double(),
                          Salt...Acid = col_double(),
                          Balance = col_double(),
                          Uniform.Cup = col_double(),
                          Clean.Cup = col_double(),
                          Bitter...Sweet = col_double(),
                          Cupper.Points = col_double(),
                          Total.Cup.Points = col_double(),
                          Moisture = col_double(),
                          Category.One.Defects = col_double(),
                          Quakers = col_double(),
                          Color = col_character(),
                          Category.Two.Defects = col_double(),
                          Expiration = col_character(),
                          Certification.Body = col_character(),
                          Certification.Address = col_character(),
                          Certification.Contact = col_character(),
                          unit_of_measurement = col_character(),
                          altitude_low_meters = col_double(),
                          altitude_high_meters = col_double(),
                          altitude_mean_meters = col_double()
                        )) %>% 
  janitor::clean_names() %>% 
  rename(acidity = salt_acid, sweetness = bitter_sweet,
         aroma = fragrance_aroma, body = mouthfeel,uniformity = uniform_cup)


all_ratings <- bind_rows(raw_arabica, raw_robusta) %>% 
  select(-x1) %>% 
  select(total_cup_points, species, everything())

all_ratings %>% 
  skimr::skim()

all_ratings %>% 
  write_csv("2020/2020-07-07/coffee_ratings.csv")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Coffee ratings

Get the data here

Data Dictionary

`coffee_ratings.csv`

Cleaning Script

Files

readme.md

Latest commit

History

readme.md

File metadata and controls

Coffee ratings

Get the data here

Data Dictionary

coffee_ratings.csv

Cleaning Script

`coffee_ratings.csv`