Finishing draft of Lesson 04
mattahrens committed Oct 7, 2023
1 parent 59453b4 commit a0b96f3
Showing 1 changed file with 33 additions and 1 deletion.
34 changes: 33 additions & 1 deletion docs/04-Loading-data-in-python.md
@@ -48,5 +48,37 @@ Let's go through this code line by line to understand what is happening. The fi

## Practice: Loading a dataset in Python using Google Colab

Now we're ready to move on from creating a dataset to loading one. In this practice, we will load a dataset containing book reviews. Before loading the dataset into dataframes, we first have to download it by running this code in a new cell in your notebook:

```
!wget https://cdn.freecodecamp.org/project-data/books/book-crossings.zip
!unzip book-crossings.zip
```
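The `!` prefix runs a shell command from a notebook cell, so the two lines above rely on `wget` and `unzip` being installed, as they are on Google Colab. Where they are not available, a rough Python-only equivalent using the standard library might look like the sketch below. The function name `download_and_extract` is made up for illustration and is not part of the lesson.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir="."):
    """Download a zip archive from url and extract it into dest_dir.

    Hypothetical helper for illustration; returns the extracted file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Save the archive under the last part of the URL, e.g. book-crossings.zip
    zip_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(zip_path))
    # Unpack every file in the archive next to the downloaded zip
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Calling `download_and_extract("https://cdn.freecodecamp.org/project-data/books/book-crossings.zip")` in a cell should leave the same files in the working directory as the `wget` and `unzip` commands.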

The `book-crossings.zip` file that you downloaded contains three comma-separated values (CSV) files, each of which contains a dataset:
- **BX-Book-Ratings.csv**: contains ratings of books
- **BX-Books.csv**: contains information about the books
- **BX-Users.csv**: contains information about the users who provided ratings of books
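Before loading anything, it can help to confirm that the three files listed above actually ended up in the working directory. The sketch below is one way to do that; the helper `missing_files` and the constant `EXPECTED_FILES` are names invented for this example.

```python
from pathlib import Path

# File names taken from the list above
EXPECTED_FILES = ["BX-Book-Ratings.csv", "BX-Books.csv", "BX-Users.csv"]


def missing_files(folder=".", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in folder."""
    return [name for name in expected if not (Path(folder) / name).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", missing)
else:
    print("All three CSV files are in place.")
```

If any file names are reported as missing, re-running the download cell is a reasonable first step.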

Now we can load the CSV files into dataframes using the `read_csv` function. The `read_csv` function converts each CSV file into the dataframe format so we can run other functions on the data.

```
import pandas as pd

# Load each semicolon-separated file into its own dataframe
ratings_df = pd.read_csv('BX-Book-Ratings.csv', delimiter=';', encoding='latin-1', on_bad_lines='skip')
books_df = pd.read_csv('BX-Books.csv', delimiter=';', encoding='latin-1', on_bad_lines='skip')
users_df = pd.read_csv('BX-Users.csv', delimiter=';', encoding='latin-1', on_bad_lines='skip')
```
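To get a feel for the `delimiter` and `on_bad_lines` arguments without touching the real files, here is a small sketch that reads made-up text from memory. The column names and values are invented for illustration, and `encoding` is left out because the sample is already text rather than bytes read from a file.

```python
import io

import pandas as pd

# Made-up sample text: semicolon-separated, with one malformed line
# (the third data line has four fields instead of three).
sample = io.StringIO(
    "user;book;rating\n"
    "1;111;5\n"
    "2;222;8\n"
    "3;333;7;extra\n"
    "4;444;0\n"
)

# delimiter=";" splits fields on semicolons; on_bad_lines="skip" drops
# lines that have too many fields instead of stopping with an error.
sample_df = pd.read_csv(sample, delimiter=";", on_bad_lines="skip")
print(sample_df)
```

The malformed line is dropped, so the resulting dataframe has three rows. Without `on_bad_lines="skip"`, the same call would stop with a parsing error on that line.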

The `read_csv` function takes parameters that control how the data is loaded, including what separates the fields (`delimiter=';'`, a semicolon instead of a comma), which character encoding the file uses (`encoding='latin-1'`), and whether the program should fail on bad lines or skip them (`on_bad_lines='skip'`, because we want to skip them). After we've loaded the datasets into dataframes, we can print out the contents of each dataframe with the `print` function.

```
print(ratings_df)
```

You can add similar cells for the `books_df` and `users_df` dataframes as well. To see summary statistics (such as the count, mean, minimum, and maximum) for the numeric fields in a dataframe, you can use the `describe()` function.

```
ratings_df.describe()
```
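As a small illustration of what `describe()` returns, the sketch below builds a tiny dataframe by hand. The data is made up and is not taken from the book reviews dataset. When a dataframe mixes text and numeric columns, `describe()` summarizes only the numeric ones by default.

```python
import pandas as pd

# Made-up example data, not taken from the book reviews dataset.
example_df = pd.DataFrame(
    {
        "title": ["Book A", "Book B", "Book C", "Book D"],
        "rating": [0, 5, 8, 7],
    }
)

# describe() returns a new dataframe whose rows are statistics
# (count, mean, std, min, quartiles, max) for each numeric column.
stats = example_df.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
print("Average rating:", stats.loc["mean", "rating"])
```

Here the `title` column is left out of the summary because it holds text, and the average rating works out to 5.0.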

## Summary
In this lesson, we learned how to use Python to create our own dataset and to load a dataset from existing files that we downloaded. We were introduced to the concept of a dataframe, which is how tabular data is represented in Python. We also learned about functions, which are reusable pieces of code that allow us to accomplish a specific task. Some of the functions we practiced were `print()` and `describe()`.
