Task 2 - Exploratory analysis

The first step in building a predictive model for text is understanding the distribution of the words, tokens, and phrases in the text and the relationships between them. The goal of this task is to understand the basic relationships you observe in the data and to prepare to build your first linguistic models.

Tasks to accomplish

Exploratory analysis - perform a thorough exploratory analysis of the data to understand the distribution of words and the relationships between words in the corpora.
Understand frequencies of words and word pairs - build figures and tables to understand variation in the frequencies of words and word pairs in the data.
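
As a starting point for the frequency tables described above, here is a minimal sketch in Python that counts words, word pairs, and word triples. The `tokenize` and `ngram_counts` helpers and the sample sentence are illustrative assumptions, not part of the course materials, and the tokenizer is deliberately crude (lowercase letters and apostrophes only).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into tokens made of letters and apostrophes."""
    # Simplifying assumption: digits, punctuation, and non-ASCII letters are dropped.
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens; keys are tuples of length n."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample text standing in for a line of the corpus.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

# The most frequent entries are a natural first table or bar chart.
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

On a real corpus the same counters can be sorted by frequency and plotted on a log scale to see how quickly the counts fall off.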

Questions to consider

Some words are more frequent than others - what are the distributions of word frequencies?
What are the frequencies of 2-grams and 3-grams in the dataset?
How many unique words do you need in a frequency-sorted dictionary to cover 50% of all word instances in the language? 90%?
How do you evaluate how many of the words come from foreign languages?
Can you think of a way to increase the coverage, for example by identifying words that may not be in the corpora, or by using a smaller number of words in the dictionary to cover the same number of phrases?
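
The coverage question can be explored with a cumulative sum over the frequency-sorted word counts. The sketch below is one possible approach, not a prescribed method: `words_needed_for_coverage` is a hypothetical helper, and the toy sentence stands in for the real corpus.

```python
from collections import Counter

def words_needed_for_coverage(counts, target):
    """Return how many of the most frequent words are needed so that their
    combined count reaches `target` (a fraction between 0 and 1) of all word instances."""
    total = sum(counts.values())
    running = 0
    # most_common() yields (word, count) pairs from most to least frequent.
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running >= target * total:
            return rank
    return len(counts)

# Hypothetical toy data; a real analysis would use counts from the full corpus.
counts = Counter("the cat sat on the mat and the cat slept".split())

half = words_needed_for_coverage(counts, 0.5)
most = words_needed_for_coverage(counts, 0.9)
print(half, most)
```

Because a few very common words account for a large share of all instances, the number needed for 90% coverage is typically much larger than the number needed for 50%.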

Introductory Video

Exploratory Analysis
