Bob's Burgers (#768)
* Bob's Burgers

Closes #767.

* Remove large original dataset.

* Newline at end of cleaning.

* Accept submission

---------

Co-authored-by: jonthegeek <[email protected]>
jonthegeek authored Nov 18, 2024
1 parent 5c99ca5 commit 696ff62
Showing 8 changed files with 414 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -78,6 +78,7 @@ If you are using TidyTuesday to teach data-related skills, [please let us know](
| 44 | `2024-10-29` | [Monster Movies](data/2024/2024-10-29/readme.md) | [IMDb non-commercial datasets](https://developer.imdb.com/non-commercial-datasets/) | [Why Do People Like Horror Films? A Statistical Analysis](https://www.statsignificant.com/p/why-do-people-like-horror-films-a) |
| 45 | `2024-11-05` | [Democracy and Dictatorship](data/2024/2024-11-05/readme.md) | [democracyData R Package](https://xmarquez.github.io/democracyData/index.html) | [Regime types and regime change: A new dataset on democracy, coups, and political institutions](https://link.springer.com/article/10.1007/s11558-019-09345-1) |
| 46 | `2024-11-12` | [ISO Country Codes](data/2024/2024-11-12/readme.md) | [ISOcodes R Package](https://cran.r-project.org/package=ISOcodes) | [ISO 3166 on Wikipedia](https://en.wikipedia.org/wiki/ISO_3166) |
| 47 | `2024-11-19` | [Bob's Burgers Episodes](data/2024/2024-11-19/readme.md) | [bobsburgersR R Package](https://github.com/poncest/bobsburgersR) | [Bob’s Burgers Episode Fingerprints by Season](https://stevenponce.netlify.app/projects/standalone_visualizations/sa_2024-11-11.html) |

***

Binary file added data/2024/2024-11-19/bobs_seasons.png
Binary file added data/2024/2024-11-19/bobsburgersR.png
273 changes: 273 additions & 0 deletions data/2024/2024-11-19/episode_metrics.csv

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions data/2024/2024-11-19/meta.yaml
@@ -0,0 +1,30 @@
title: "Bob's Burgers Episodes"
article:
  title: "Bob’s Burgers Episode Fingerprints by Season"
  url: "https://stevenponce.netlify.app/projects/standalone_visualizations/sa_2024-11-11.html"
data_source:
  title: "bobsburgersR R Package"
  url: "https://github.com/poncest/bobsburgersR"
images:
  # Please include at least one image, and up to three images
  - file: "bobs_seasons.png"
    alt: >
      A series of radar charts showing dialogue patterns across 14 seasons of
      Bob’s Burgers. Each season chart displays metrics including Dialogue
      Density, Average Length, Sentiment Variance, Unique Words, Question Ratio,
      and Exclamation Ratio. Light purple polygons represent individual episodes,
      while dark purple lines show season averages, revealing how dialogue
      patterns evolved throughout the series.
  - file: "bobsburgersR.png"
    alt: >
      Hex logo for the bobsburgersR R package, featuring a cartoon burger in the
      style of the show, and the letters "bob's burgers R" in the font used by
      the show.
credit:
  # We want to thank you for curating this dataset! If you do not want a
  # particular type of credit, please delete the related line.
  post: "Jon Harmon"
  linkedin: "https://www.linkedin.com/in/jonthegeek"
  bluesky: "https://bsky.app/profile/jonthegeek.com"
  mastodon: "@fosstodon.org@jonthegeek"
  github: "https://github.com/jonthegeek"
108 changes: 108 additions & 0 deletions data/2024/2024-11-19/readme.md
@@ -0,0 +1,108 @@
# Bob's Burgers Episodes

This week we're exploring Bob's Burgers dialogue! Thank you to [Steven Ponce](https://github.com/poncest) for [the data](https://github.com/poncest/bobsburgersR), and for a [blog post demonstrating how to visualize the data](https://stevenponce.netlify.app/projects/standalone_visualizations/sa_2024-11-11.html)!

See the [{bobsburgersR} R Package](https://github.com/poncest/bobsburgersR) for the original transcript data, as well as additional information about each episode!

- How have dialogue metrics changed over the seasons?
- Can you find any patterns not shown in Steven Ponce's original visualization?

Thank you to [Jon Harmon](https://github.com/jonthegeek) for curating this week's dataset.

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-11-19')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 47)

episode_metrics <- tuesdata$episode_metrics

# Option 2: Read directly from GitHub

episode_metrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-11-19/episode_metrics.csv')
```
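
As a starting point for the first question above, the sketch below averages two of the metrics within each season. It uses a small made-up tibble (`toy_metrics`, a hypothetical name with invented values) that has the same column names as `episode_metrics`, so it runs without downloading anything; swapping in the real `episode_metrics` should work the same way, assuming dplyr 1.1.0 or later for `.by`.

```r
# Hypothetical stand-in rows with the same columns as episode_metrics.csv.
# The values are invented for illustration only.
toy_metrics <- tibble::tibble(
  season = c(1L, 1L, 2L, 2L),
  episode = c(1L, 2L, 1L, 2L),
  question_ratio = c(0.20, 0.30, 0.10, 0.20),
  exclamation_ratio = c(0.10, 0.20, 0.30, 0.40)
)

# Average each ratio within a season, giving one row per season.
season_summary <- toy_metrics |>
  dplyr::summarize(
    dplyr::across(c(question_ratio, exclamation_ratio), mean),
    .by = season
  )

season_summary
```

The resulting one-row-per-season table is a convenient input for a line chart or a small-multiples plot.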

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `episode_metrics.csv`

|variable |class |description |
|:------------------|:-------|:-------------------------------------|
|season |integer |The season of Bob's Burgers to which the episode belongs. |
|episode |integer |The episode number within the specific season of Bob's Burgers. |
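|dialogue_density |double |The number of non-blank lines of dialogue in this episode divided by the highest line number in the episode (the proportion of lines that contain dialogue). |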
|avg_length |double |The average number of characters (technically codepoints, see `?stringr::str_length`) per line of dialogue. |
|sentiment_variance |double |The variance in the numeric AFINN sentiment of words in this episode. See `?textdata::lexicon_afinn` for further information. |
|unique_words |integer |The number of unique lowercase words in this episode. |
|question_ratio |double |The proportion of lines of dialogue that contain at least one question mark ("?"). |
|exclamation_ratio |double |The proportion of lines of dialogue that contain at least one exclamation point ("!"). |
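
To make the line-level definitions above concrete, here is a minimal sketch that computes `avg_length`, `question_ratio`, and `exclamation_ratio` for four made-up lines. The vector `toy_dialogue` is hypothetical and is not taken from the transcripts; the function calls mirror the ones used in the cleaning script.

```r
# Made-up lines standing in for one episode's dialogue (not real transcript data).
toy_dialogue <- c(
  "Is that a burger?",
  "Yes! It is!",
  "Okay.",
  "What?"
)

# Mean number of characters (codepoints) per line.
avg_length <- mean(stringr::str_length(toy_dialogue))

# Proportion of lines containing at least one "?" or at least one "!".
# A line with several marks still counts only once.
question_ratio <- mean(stringr::str_detect(toy_dialogue, "\\?"))
exclamation_ratio <- mean(stringr::str_detect(toy_dialogue, "!"))

c(
  avg_length = avg_length,
  question_ratio = question_ratio,
  exclamation_ratio = exclamation_ratio
)
```

Because `str_detect()` returns one logical value per line, taking the mean gives the share of lines that match, not the number of punctuation marks.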

### Cleaning Script

```r
# Mostly clean data provided by Steven Ponce (@poncest) via the {bobsburgersR} R
# package. Cleaning based on
# https://stevenponce.netlify.app/projects/standalone_visualizations/sa_2024-11-11.html

# pak::pak("poncest/bobsburgersR")
# pak::pak("tidytext")
# pak::pak("textdata")
library(bobsburgersR)
library(tidyverse)
library(tidytext)

transcript_data <-
  bobsburgersR::transcript_data |>
  dplyr::mutate(
    dplyr::across(
      c(season, episode),
      as.integer
    )
  )

# Calculate metrics. You will have to acknowledge downloading of afinn data if
# you have not used it before.
episode_metrics <-
  transcript_data |>
  dplyr::filter(!is.na(dialogue)) |>
  dplyr::summarize(
    # Basic dialogue metrics
    dialogue_density = dplyr::n() / max(line),
    avg_length = mean(stringr::str_length(dialogue)),

    # Sentiment analysis - AFINN Sentiment Lexicon
    sentiment_variance = dialogue |>
      tibble::tibble(text = _) |>
      tidytext::unnest_tokens(word, text) |>
      dplyr::inner_join(tidytext::get_sentiments("afinn"), by = "word") |>
      dplyr::pull(value) |>
      var(na.rm = TRUE),

    # Word and punctuation metrics
    unique_words = dialogue |>
      # Using boundary() instead of "\\s+" as in the blog results in differences
      # in unique word counts, since punctuation doesn't get grouped with the
      # word it touches. See ?stringr::boundary for details. I also converted
      # all text to lowercase before counting.
      stringr::str_split(stringr::boundary("word")) |>
      unlist() |>
      tolower() |>
      dplyr::n_distinct(),
    question_ratio = mean(stringr::str_detect(dialogue, "\\?")),
    exclamation_ratio = mean(stringr::str_detect(dialogue, "!")),
    .by = c(season, episode)
  )

# You may wish to see the blog post for further preparation steps!
```
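
The comment in the cleaning script about `boundary()` versus `"\\s+"` can be illustrated with a small sketch. The string `toy_line` is invented for this example. Splitting on whitespace leaves punctuation attached to each token, while splitting on word boundaries drops it, so the two approaches can give different unique-word counts.

```r
# A made-up line in which the same word appears with different punctuation.
toy_line <- "Burger? Burger! burger."

# Splitting on whitespace keeps punctuation attached to each token.
by_space <- stringr::str_split(toy_line, "\\s+") |>
  unlist() |>
  tolower()

# Splitting on word boundaries keeps only the words themselves.
by_word <- stringr::str_split(toy_line, stringr::boundary("word")) |>
  unlist() |>
  tolower()

# Count the distinct lowercase tokens under each approach.
n_space <- dplyr::n_distinct(by_space)
n_word <- dplyr::n_distinct(by_word)

c(whitespace_split = n_space, word_boundary_split = n_word)
```

On this toy input the whitespace split treats "burger?", "burger!", and "burger." as three different tokens, whereas the word-boundary split reduces them to a single word.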
1 change: 1 addition & 0 deletions data/2024/readme.md
@@ -50,3 +50,4 @@ Archive of datasets and articles from the 2024 series of `#TidyTuesday` events.
| 44 | `2024-10-29` | [Monster Movies](2024-10-29/readme.md) | [IMDb non-commercial datasets](https://developer.imdb.com/non-commercial-datasets/) | [Why Do People Like Horror Films? A Statistical Analysis](https://www.statsignificant.com/p/why-do-people-like-horror-films-a) |
| 45 | `2024-11-05` | [Democracy and Dictatorship](2024-11-05/readme.md) | [democracyData R Package](https://xmarquez.github.io/democracyData/index.html) | [Regime types and regime change: A new dataset on democracy, coups, and political institutions](https://link.springer.com/article/10.1007/s11558-019-09345-1) |
| 46 | `2024-11-12` | [ISO Country Codes](2024-11-12/readme.md) | [ISOcodes R Package](https://cran.r-project.org/package=ISOcodes) | [ISO 3166 on Wikipedia](https://en.wikipedia.org/wiki/ISO_3166) |
| 47 | `2024-11-19` | [Bob's Burgers Episodes](2024-11-19/readme.md) | [bobsburgersR R Package](https://github.com/poncest/bobsburgersR) | [Bob’s Burgers Episode Fingerprints by Season](https://stevenponce.netlify.app/projects/standalone_visualizations/sa_2024-11-11.html) |
1 change: 1 addition & 0 deletions static/tt_data_type.csv
@@ -1,4 +1,5 @@
Week,Date,year,data_files,data_type,delim
47,2024-11-19,2024,episode_metrics.csv,csv,","
46,2024-11-12,2024,countries.csv,csv,","
46,2024-11-12,2024,country_subdivisions.csv,csv,","
46,2024-11-12,2024,former_countries.csv,csv,","