Moving lesson answers to answer key

mattahrens · Nov 10, 2023 · 6abcf21
1 parent d9da249
commit 6abcf21

Showing 6 changed files with 187 additions and 190 deletions.
diff --git a/docs/05-Querying-data-in-python.md b/docs/05-Querying-data-in-python.md
@@ -88,29 +88,3 @@ If you have successfully built all of those queries to answer the questions, the
 
 ## Summary
 In this lesson, we learned how to write queries in Python using functions.  We explored our book ratings datasets to ask questions of the data.  We used different functions to help us get the answers we wanted.  Some of the functions included: `count()`, `groupby()`, `sort_values()`, and `head()`.
-
-## Answer key
-1. How many users are there of each age?  Hint: you will have to use the users dataset for this query.
-```
-users_df.groupby('Age').count().sort_values(by=['Age'])
-```
-
-2. What is the overall average age of users?  Hint: you will have to use the `mean()` function.
-```
-users_df['Age'].mean()
-```
-
-3. How many ratings are there at each rating value (0 - 10)?  Hint: you will have to use the ratings dataset.
-```
-ratings_df.groupby('Book-Rating').count().sort_values(by=['Book-Rating'])
-```
-
-4. What is the overall average book rating from all ratings?  Hint: you will have to use the `mean()` function.
-```
-ratings_df['Book-Rating'].mean()
-```
-
-5. How many distinct authors are in the dataset?  Hint: you will have to use the books dataset and the `nunique()` function.
-```
-books_df['Book-Author'].nunique()
-```
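For readers without pandas at hand, here is a rough, stdlib-only sketch of what answers 1, 2 and 5 above compute. The toy rows are made up, and `Counter`, `mean` and a set comprehension only approximate pandas' `groupby().count()`, `mean()` and `nunique()`; this is an analogy, not how pandas implements them.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy rows standing in for users_df and books_df.
# None plays the role of a missing (NaN) age.
users = [
    {"User-ID": 1, "Age": 24},
    {"User-ID": 2, "Age": None},
    {"User-ID": 3, "Age": 24},
    {"User-ID": 4, "Age": 35},
]
books = [
    {"ISBN": "a", "Book-Author": "Author X"},
    {"ISBN": "b", "Book-Author": "Author Y"},
    {"ISBN": "c", "Book-Author": "Author X"},
]

# pandas skips missing values in groupby keys and in mean(), so drop them here too.
ages = [u["Age"] for u in users if u["Age"] is not None]

# Rough analogue of users_df.groupby('Age').count().sort_values(by=['Age']):
# number of users per known age, ordered by age.
users_per_age = sorted(Counter(ages).items())

# Rough analogue of users_df['Age'].mean().
average_age = mean(ages)

# Rough analogue of books_df['Book-Author'].nunique().
distinct_authors = len({b["Book-Author"] for b in books})

print(users_per_age, average_age, distinct_authors)
```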
diff --git a/docs/06-Writing-sql-query.md b/docs/06-Writing-sql-query.md
@@ -99,30 +99,3 @@ Now you can try to build your own SQL queries.  Here are a few to start with:
 
 ## Summary
 In this lesson, we learned about SQL and what the main keywords in SQL mean.  We then were able to write our own SQL queries in Python to ask questions of our data.  We saw how writing a SQL query is similar to using functions.
-
-## Answer key
-1. How many users are in the dataset?
-```
-query = """
-   SELECT count(*)
-   FROM users_df
-   """
-sqldf(query)
-```
-2. How many books are in the dataset?
-```
-query = """
-   SELECT count(*)
-   FROM books_df
-   """
-sqldf(query)
-```
-3. What are the minimum and maximum ratings that can be given for a book?  (Hint: use `MIN()` and `MAX()` functions in the SELECT part of your query.)
-```
-query = """
-   SELECT MIN(`Book-Rating`), MAX(`Book-Rating`)
-   FROM ratings_df
-   """
-sqldf(query)
-```
-
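The SQL answers run through `sqldf`, which with pandasql's defaults executes them against an in-memory SQLite database. The sketch below reproduces the shape of answers 1 and 3 with Python's built-in `sqlite3` module on made-up tables named like the lesson's DataFrames; backticks quote the hyphenated column names, which SQLite accepts.

```python
import sqlite3

# In-memory SQLite database with hypothetical toy tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_df (`User-ID` INTEGER, `Age` REAL)")
conn.execute("CREATE TABLE ratings_df (`User-ID` INTEGER, `ISBN` TEXT, `Book-Rating` INTEGER)")
conn.executemany("INSERT INTO users_df VALUES (?, ?)", [(1, 24), (2, None), (3, 35)])
conn.executemany(
    "INSERT INTO ratings_df VALUES (?, ?, ?)",
    [(1, "a", 0), (1, "b", 8), (3, "a", 10)],
)

# Same shape as answer 1: count(*) counts every row, including rows with NULLs.
(user_count,) = conn.execute("SELECT count(*) FROM users_df").fetchone()

# Same shape as answer 3: smallest and largest rating present in the table.
min_rating, max_rating = conn.execute(
    "SELECT MIN(`Book-Rating`), MAX(`Book-Rating`) FROM ratings_df"
).fetchone()

print(user_count, min_rating, max_rating)
```

`MIN()` and `MAX()` report the range observed in the data, so they match the allowed 0 - 10 scale only if both extremes actually occur.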
diff --git a/docs/07-Advanced-sql-queries.md b/docs/07-Advanced-sql-queries.md
@@ -84,45 +84,3 @@ Now try to build your own advanced SQL queries.  Here are a few to start with:
 
 ## Summary
 In this lesson, we learned how to write more advanced queries, specifically how to filter records on numeric and string fields.  We also learned how to count unique values in a dataset with the **DISTINCT** keyword.
-
-## Answer key
-1. What book (ISBN) has the most ratings = 10 and which book (ISBN) has the most ratings = 0?
-```
-query = """
-  SELECT `ISBN`, count(*) as total
-  FROM ratings_df
-  WHERE `Book-Rating` = 10
-  GROUP BY `ISBN`
-  ORDER BY total desc
-"""
-sqldf(query)
-
-query = """
-  SELECT `ISBN`, count(*) as total
-  FROM ratings_df
-  WHERE `Book-Rating` = 0
-  GROUP BY `ISBN`
-  ORDER BY total desc
-"""
-sqldf(query)
-```
-
-2. What is the average age for the top cities in the United States for users in the dataset? (Hint: use the **AVG** keyword in your SQL query.)
-```
-query = """
-  SELECT AVG(`Age`)
-  FROM users_df
-  WHERE `Location` LIKE "%usa%"
-"""
-sqldf(query)
-```
-
-3. How many unique publishers did J.K. Rowling use for her Harry Potter books?
-```
-query = """
-  SELECT count(distinct `Publisher`)
-  FROM books_df
-  WHERE `Book-Title` LIKE "%Harry Potter%" and `Book-Author` LIKE "%Rowling%"
-"""
-sqldf(query)
-```
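A small `sqlite3` sketch of the filtering patterns in answers 2 and 3, again on made-up rows. It uses single-quoted string literals, which is standard SQL; SQLite usually also tolerates the double-quoted `"%usa%"` form in the removed answer by treating it as a string when no column of that name exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_df (`User-ID` INTEGER, `Location` TEXT, `Age` REAL)")
conn.execute(
    "CREATE TABLE books_df (`ISBN` TEXT, `Book-Title` TEXT, `Book-Author` TEXT, `Publisher` TEXT)"
)
# Hypothetical toy rows.
conn.executemany("INSERT INTO users_df VALUES (?, ?, ?)", [
    (1, "portland, oregon, usa", 30),
    (2, "toronto, ontario, canada", 40),
    (3, "austin, texas, USA", 50),
])
conn.executemany("INSERT INTO books_df VALUES (?, ?, ?, ?)", [
    ("a", "Harry Potter and X", "J. K. Rowling", "Pub One"),
    ("b", "Harry Potter and Y", "J.K. Rowling", "Pub One"),
    ("c", "Harry Potter and Z", "J. K. Rowling", "Pub Two"),
    ("d", "Some Other Book", "J. K. Rowling", "Pub Three"),
])

# Answer 2 shape: % is a wildcard, and SQLite's LIKE ignores ASCII case by default,
# so '%usa%' matches both 'usa' and 'USA'.
(avg_age,) = conn.execute(
    "SELECT AVG(`Age`) FROM users_df WHERE `Location` LIKE '%usa%'"
).fetchone()

# Answer 3 shape: count(DISTINCT ...) counts each publisher only once.
(publishers,) = conn.execute(
    "SELECT count(DISTINCT `Publisher`) FROM books_df "
    "WHERE `Book-Title` LIKE '%Harry Potter%' AND `Book-Author` LIKE '%Rowling%'"
).fetchone()

print(avg_age, publishers)
```

Note that `LIKE '%usa%'` keeps every location containing that substring, so it averages over all US users rather than over the top US cities the question mentions.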
diff --git a/docs/09-Advanced-join-queries.md b/docs/09-Advanced-join-queries.md
@@ -45,49 +45,3 @@ Now you're ready to write your own advanced join queries in SQL with our books d
 
 ## Summary
 In this lesson, we wrote more advanced queries including a multi-join query to join three datasets together and also a left join.  We also saw how join queries can include other functions and filters to get the answer desired from the query.
-
-## Answer key
-1. Which user location has the most book ratings?
-```
-query = """
-  SELECT `Location`, count(`Book-Rating`) as rating_cnt
-  FROM ratings_df
-  INNER JOIN users_df
-  ON ratings_df.`User-ID` = users_df.`User-ID`
-  GROUP BY users_df.`Location`
-  ORDER BY rating_cnt desc
-"""
-sqldf(query)
-```
-
-2. Which publication year has the lowest average book rating, counting only years with more than 10 ratings?
-```
-query = """
-  SELECT `Year-Of-Publication`, AVG(`Book-Rating`) as rating_avg
-  FROM books_df
-  INNER JOIN ratings_df
-  ON books_df.`ISBN` = ratings_df.`ISBN`
-  GROUP BY `Year-Of-Publication`
-  HAVING COUNT(`Book-Rating`) > 10
-  ORDER BY rating_avg
-"""
-sqldf(query)
-```
-
-3. What age of users has the highest average rating for books that were published between 2000 and 2003?
-```
-query = """
-  SELECT `Age`, AVG(`Book-Rating`) as rating_avg
-  FROM ratings_df
-  INNER JOIN users_df
-  ON ratings_df.`User-ID` = users_df.`User-ID`
-  INNER JOIN books_df
-  ON ratings_df.`ISBN` = books_df.`ISBN`
-  WHERE `Year-Of-Publication` >= 2000 and `Year-Of-Publication` <= 2003
-  GROUP BY users_df.`Age`
-  ORDER BY rating_avg desc
-"""
-sqldf(query)
-```
-
-
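The join-plus-`HAVING` pattern from answer 2, sketched with `sqlite3` on made-up data. The `HAVING` threshold is lowered from 10 to 1 so the toy data exercises it; `HAVING` filters whole groups after aggregation, while `WHERE` filters individual rows before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books_df (`ISBN` TEXT, `Year-Of-Publication` INTEGER)")
conn.execute("CREATE TABLE ratings_df (`User-ID` INTEGER, `ISBN` TEXT, `Book-Rating` INTEGER)")
# Hypothetical toy rows: 2000 and 2001 get two ratings each, 2002 only one.
conn.executemany("INSERT INTO books_df VALUES (?, ?)", [("a", 2000), ("b", 2001), ("c", 2002)])
conn.executemany(
    "INSERT INTO ratings_df VALUES (?, ?, ?)",
    [(1, "a", 2), (2, "a", 4), (1, "b", 8), (3, "b", 10), (2, "c", 0)],
)

query = """
  SELECT `Year-Of-Publication`, AVG(`Book-Rating`) AS rating_avg
  FROM books_df
  INNER JOIN ratings_df
  ON books_df.`ISBN` = ratings_df.`ISBN`
  GROUP BY `Year-Of-Publication`
  HAVING COUNT(`Book-Rating`) > 1  -- toy threshold; the lesson uses > 10
  ORDER BY rating_avg              -- ascending, so the lowest average comes first
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The year 2002 drops out because its group has only one rating, and the first remaining row is the answer.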
diff --git a/docs/10-Data-visualization-in-python.md b/docs/10-Data-visualization-in-python.md
@@ -85,51 +85,3 @@ Here are some visualization challenges for you to try out:
 
 ## Summary
 In this lesson, we explored 4 basic data visualizations and how they differ in displaying information about a dataset.  We then used various plot functions in Python to display different types of data from the books datasets.
-
-## Answer key
-1. Create a line chart to show the number of unique users who gave ratings per year of publication from 1992 to 2002.  Hint: you will have to use the `DISTINCT` keyword.
-```
-query = """
-  SELECT `Year-Of-Publication` as year, count(distinct(users_df.`User-ID`)) as users
-  FROM ratings_df
-  INNER JOIN users_df
-  ON ratings_df.`User-ID` = users_df.`User-ID`
-  INNER JOIN books_df
-  ON ratings_df.`ISBN` = books_df.`ISBN`
-  WHERE year >= 1992 and year <= 2002
-  GROUP BY year
-  ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.plot.line(x='year', y='users')
-```
-
-2. Create a pie chart for the number of books per year of publication from 1992 to 2002.  
-```
-query = """
-  SELECT `Year-Of-Publication` as year, count(distinct books_df.`ISBN`) as books
-  FROM ratings_df
-  INNER JOIN books_df
-  ON ratings_df.`ISBN` = books_df.`ISBN`
-  WHERE year >= 1992 and year <= 2002
-  GROUP BY year
-  ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.set_index('year').plot.pie(y='books')
-```
-
-3. Create a scatter plot to show the relationship between year of publication and average book rating (for 1992 - 2002).  Each year should be a single point in the plot.
-```
-query = """
-  SELECT `Year-Of-Publication` as year, avg(`Book-Rating`) as rating_avg
-  FROM ratings_df
-  INNER JOIN books_df
-  ON ratings_df.`ISBN` = books_df.`ISBN`
-  WHERE year >= 1992 and year <= 2002
-  GROUP BY year
-  ORDER BY year
-"""
-year_counts = sqldf(query)
-year_counts.plot.scatter(x='year', y='rating_avg')
-```
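A `sqlite3` sketch of the per-year aggregation behind chart 1, on made-up data. Two deliberate differences from the removed answer: it filters on the underlying column with `BETWEEN` instead of the `year` alias (SQLite accepts aliases in `WHERE`, but many other databases do not), and it skips the `users_df` join because `ratings_df` already has a `User-ID` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books_df (`ISBN` TEXT, `Year-Of-Publication` INTEGER)")
conn.execute("CREATE TABLE ratings_df (`User-ID` INTEGER, `ISBN` TEXT, `Book-Rating` INTEGER)")
# Hypothetical toy rows: user 1 rates two 1995 books, user 2 one; 2010 is out of range.
conn.executemany("INSERT INTO books_df VALUES (?, ?)", [("a", 1995), ("b", 1995), ("c", 2010)])
conn.executemany("INSERT INTO ratings_df VALUES (?, ?, ?)", [
    (1, "a", 7), (1, "b", 5), (2, "a", 9), (3, "c", 8),
])

query = """
  SELECT `Year-Of-Publication` AS year,
         count(DISTINCT ratings_df.`User-ID`) AS users  -- each user counted once per year
  FROM ratings_df
  INNER JOIN books_df
  ON ratings_df.`ISBN` = books_df.`ISBN`
  WHERE `Year-Of-Publication` BETWEEN 1992 AND 2002     -- inclusive on both ends
  GROUP BY `Year-Of-Publication`
  ORDER BY year
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The resulting rows could then be wrapped in a DataFrame and plotted with `plot.line(x='year', y='users')` as in the removed answer.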