From ae94175140044cf7d77c6ba4e28cc361a952292c Mon Sep 17 00:00:00 2001 From: Conor Smith Date: Tue, 18 Oct 2022 19:05:48 -0400 Subject: [PATCH 1/7] completed mini-project --- mec-3.4.1-api-mini-project/.env | 2 +- .../api_data_wrangling_mini_project.ipynb | 576 +++++++++++++++--- mec-3.4.1-api-mini-project/python.gitignore | 160 +++++ 3 files changed, 660 insertions(+), 78 deletions(-) create mode 100644 mec-3.4.1-api-mini-project/python.gitignore diff --git a/mec-3.4.1-api-mini-project/.env b/mec-3.4.1-api-mini-project/.env index 5d011ea4..5080b71a 100644 --- a/mec-3.4.1-api-mini-project/.env +++ b/mec-3.4.1-api-mini-project/.env @@ -1 +1 @@ -NASDAQ_API_KEY=KRfk96yoWvruWZ-LjPb +API_KEY='7MadrSm5uJz-31r7rF4z' \ No newline at end of file diff --git a/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb b/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb index 0d34bd5c..e0b34166 100755 --- a/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb +++ b/mec-3.4.1-api-mini-project/api_data_wrangling_mini_project.ipynb @@ -2,147 +2,151 @@ "cells": [ { "cell_type": "markdown", + "metadata": {}, "source": [ "This exercise will require you to pull some data from https://data.nasdaq.com/ (formerly Quandl API)." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "As a first step, you will need to register a free account on the https://data.nasdaq.com/ website." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "After you register, you will be provided with a unique API key, that you should store:\r\n", - "\r\n", - "*Note*: Use a `.env` file and put your key in there and `python-dotenv` to access it in this notebook. \r\n", - "\r\n", - "The code below uses a key that was used when generating this project but has since been deleted. Never submit your keys to source control. There is a `.env-example` file in this repository to illusrtate what you need. 
Copy that to a file called `.env` and use your own api key in that `.env` file. Make sure you also have a `.gitignore` file with a line for `.env` added to it. \r\n", - "\r\n", + "After you register, you will be provided with a unique API key, that you should store:\n", + "\n", + "*Note*: Use a `.env` file and put your key in there and `python-dotenv` to access it in this notebook. \n", + "\n", + "The code below uses a key that was used when generating this project but has since been deleted. Never submit your keys to source control. There is a `.env-example` file in this repository to illustrate what you need. Copy that to a file called `.env` and use your own api key in that `.env` file. Make sure you also have a `.gitignore` file with a line for `.env` added to it. \n", + "\n", "The standard Python gitignore is [here](https://github.com/github/gitignore/blob/master/Python.gitignore) you can just copy that. " - ], - "metadata": {} + ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7MadrSm5uJz-31r7rF4z\n" + ] + } + ], "source": [ "# get api key from your .env file\n", "import os\n", "from dotenv import load_dotenv # if missing this module, simply run `pip install python-dotenv`\n", "\n", "load_dotenv()\n", - "API_KEY = os.getenv('NASDAQ_API_KEY')\n", + "API_KEY = os.getenv('API_KEY')\n", "\n", "print(API_KEY)" - ], - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "KRfk96yoWvruWZ-LjPbo\n" - ] - } - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Nasdaq Data has a large number of data sources, but, unfortunately, most of them require a Premium subscription. Still, there are also a good number of free datasets."
- ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "For this mini project, we will focus on equities data from the Frankfurt Stock Exhange (FSE), which is available for free. We'll try and analyze the stock prices of a company called Carl Zeiss Meditec, which manufactures tools for eye examinations, as well as medical lasers for laser eye surgery: https://www.zeiss.com/meditec/int/home.html. The company is listed under the stock ticker AFX_X." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "You can find the detailed Nasdaq Data API instructions here: https://docs.data.nasdaq.com/docs/in-depth-usage" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "While there is a dedicated Python package for connecting to the Nasdaq API, we would prefer that you use the *requests* package, which can be easily downloaded using *pip* or *conda*. You can find the documentation for the package here: http://docs.python-requests.org/en/master/ " - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Finally, apart from the *requests* package, you are encouraged to not use any third party Python packages, such as *pandas*, and instead focus on what's available in the Python Standard Library (the *collections* module might come in handy: https://pymotw.com/3/collections/).\n", "Also, since you won't have access to DataFrames, you are encouraged to us Python's native data structures - preferably dictionaries, though some questions can also be answered using lists.\n", "You can read more on these data structures here: https://docs.python.org/3/tutorial/datastructures.html" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Keep in mind that the JSON responses you will be getting from the API map almost one-to-one to Python's dictionaries. 
Unfortunately, they can be very nested, so make sure you read up on indexing dictionaries in the documentation provided above." - ], - "metadata": {} + ] }, { "cell_type": "code", - "execution_count": 6, - "source": [ - "# First, import the relevant modules" - ], + "execution_count": 4, + "metadata": {}, "outputs": [], - "metadata": {} + "source": [ + "import requests" + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ - "Note: API's can change a bit with each version, for this exercise it is reccomended to use the nasdaq api at `https://data.nasdaq.com/api/v3/`. This is the same api as what used to be quandl so `https://www.quandl.com/api/v3/` should work too.\r\n", - "\r\n", + "Note: APIs can change a bit with each version, for this exercise it is recommended to use the nasdaq api at `https://data.nasdaq.com/api/v3/`. This is the same api as what used to be quandl so `https://www.quandl.com/api/v3/` should work too.\n", + "\n", "Hint: We are looking for the `AFX_X` data on the `datasets/FSE/` dataset." - ], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": 7, - "source": [ - "# Now, call the Nasdaq API and pull out a small sample of the data (only one day) to get a glimpse\n", - "# into the JSON structure that will be returned" - ], - "outputs": [], - "metadata": {} + ] }, { "cell_type": "code", - "execution_count": 9, - "source": [ - "# Inspect the JSON structure of the object you created, and take note of how nested it is,\n", - "# as well as the overall structure" - ], + "execution_count": 50, + "metadata": {}, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ - "{'dataset': {'id': 10095370, 'dataset_code': 'AFX_X', 'database_code': 'FSE', 'name': 'Carl Zeiss Meditec (AFX_X)', 'description': 'Stock Prices for Carl Zeiss Meditec (2020-11-02) from the Frankfurt Stock Exchange.

Trading System: Xetra

ISIN: DE0005313704', 'refreshed_at': '2020-12-01T14:48:09.907Z', 'newest_available_date': '2020-12-01', 'oldest_available_date': '2000-06-07', 'column_names': ['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover'], 'frequency': 'daily', 'type': 'Time Series', 'premium': False, 'limit': None, 'transform': None, 'column_index': None, 'start_date': '2021-01-03', 'end_date': '2020-12-01', 'data': [], 'collapse': None, 'order': None, 'database_id': 6129}}\n" + "<class 'dict'>\n" ] } ], - "metadata": {} + "source": [ + "#1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).\n", + "fse = requests.get(f'https://data.nasdaq.com/api/v3/datasets/FSE/AFX_X.json?api_key={API_KEY}')\n", + "fse = requests.get(f'https://data.nasdaq.com/api/v3/datasets/FSE/VNA_X?start_date=2017-01-01&end_date=2017-12-31&api_key={API_KEY}')\n", + "#2. Convert the returned JSON object into a Python dictionary.\n", + "json = fse.json()\n", + "print(type(json))" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "These are your tasks for this mini project:\n", "\n", @@ -153,28 +157,449 @@ "5. What was the largest change between any two days (based on Closing Price)?\n", "6. What was the average daily trading volume during this year?\n", "7. (Optional) What was the median trading volume during this year. 
(Note: you may need to implement your own function for calculating the median.)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover']\n" + ] + } ], - "metadata": {} + "source": [ + "print(json['dataset']['column_names'])" + ] }, { "cell_type": "code", - "execution_count": null, - "source": [], + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[['2017-12-29', 41.225, 41.425, 41.145, 41.39, None, 601057.0, 24840221.0, None, None, None], ['2017-12-28', 41.3, 41.34, 41.095, 41.22, None, 608053.0, 25062545.0, None, None, None], ['2017-12-27', 41.01, 41.335, 40.815, 41.335, None, 732911.0, 30168070.0, None, None, None], ['2017-12-22', 40.64, 40.975, 40.585, 40.975, None, 843468.0, 34444774.0, None, None, None], ['2017-12-21', 41.085, 41.1, 40.565, 40.7, None, 1384516.0, 56441896.0, None, None, None], ['2017-12-20', 41.715, 41.895, 40.935, 41.035, None, 1411562.0, 58322057.0, None, None, None], ['2017-12-19', 41.95, 42.215, 41.625, 41.64, None, 1314959.0, 55010330.0, None, None, None], ['2017-12-18', 41.5, 42.05, 40.92, 41.88, None, 2098187.0, 87316028.0, None, None, None], ['2017-12-15', 40.72, 41.35, 40.68, 41.35, None, 2733044.0, 112478446.0, None, None, None], ['2017-12-14', 40.79, 41.0, 40.63, 40.94, None, 1243035.0, 50820573.0, None, None, None], ['2017-12-13', 41.205, 41.22, 40.81, 40.845, None, 1036110.0, 42434183.0, None, None, None], ['2017-12-12', 41.36, 41.435, 40.845, 41.08, None, 1477381.0, 60712041.0, None, None, None], ['2017-12-11', 41.175, 41.355, 41.04, 41.125, None, 1043727.0, 42973562.0, None, None, None], ['2017-12-08', 41.13, 41.45, 40.99, 41.22, None, 1289642.0, 53150892.0, None, None, None], ['2017-12-07', 40.74, 41.23, 40.57, 
40.84, None, 1376033.0, 56276968.0, None, None, None], ['2017-12-06', 40.54, 40.64, 40.165, 40.6, None, 1205912.0, 48764021.0, None, None, None], ['2017-12-05', 40.05, 40.775, 39.97, 40.665, None, 1948545.0, 78968966.0, None, None, None], ['2017-12-04', 39.79, 40.11, 39.585, 39.77, None, 1321182.0, 52627362.0, None, None, None], ['2017-12-01', 39.545, 39.845, 39.35, 39.44, None, 1495815.0, 59136311.0, None, None, None], ['2017-11-30', 38.95, 39.7, 38.915, 39.545, None, 3040962.0, 120157342.0, None, None, None], ['2017-11-29', 39.95, 40.025, 38.91, 39.025, None, 1493605.0, 58739775.0, None, None, None], ['2017-11-28', 39.825, 39.905, 39.5, 39.7, None, 1018883.0, 40423811.0, None, None, None], ['2017-11-27', 39.7, 40.04, 39.63, 39.8, None, 1095660.0, 43654747.0, None, None, None], ['2017-11-24', 39.595, 39.92, 39.355, 39.65, None, 999933.0, 39641078.0, None, None, None], ['2017-11-23', 39.275, 39.62, 39.21, 39.52, None, 1161907.0, 45849993.0, None, None, None], ['2017-11-22', 39.955, 40.03, 39.365, 39.365, None, 985055.0, 39043700.0, None, None, None], ['2017-11-21', 39.445, 40.23, 39.33, 39.93, None, 1531366.0, 61107977.0, None, None, None], ['2017-11-20', 39.245, 39.43, 39.14, 39.3, None, 1057105.0, 41546170.0, None, None, None], ['2017-11-17', 39.7, 39.79, 39.375, 39.375, None, 1048127.0, 41459027.0, None, None, None], ['2017-11-16', 39.45, 39.785, 39.3, 39.705, None, 1010748.0, 40046253.0, None, None, None], ['2017-11-15', 39.495, 39.495, 38.77, 39.265, None, 1408279.0, 55104780.0, None, None, None], ['2017-11-14', 39.53, 39.63, 39.285, 39.5, None, 1103999.0, 43589611.0, None, None, None], ['2017-11-13', 39.26, 39.465, 39.01, 39.43, None, 1423857.0, 55975814.0, None, None, None], ['2017-11-10', 39.24, 39.445, 38.935, 39.145, None, 2133301.0, 83564952.0, None, None, None], ['2017-11-09', 39.245, 39.29, 38.805, 39.115, None, 1762497.0, 68815373.0, None, None, None], ['2017-11-08', 39.22, 39.54, 38.205, 39.165, None, 1841312.0, 72049610.0, None, None, None], 
['2017-11-07', 39.0, 39.165, 38.73, 39.05, None, 1378274.0, 53770377.0, None, None, None], ['2017-11-06', 38.8, 38.975, 38.68, 38.81, None, 773139.0, 30002877.0, None, None, None], ['2017-11-03', 38.72, 38.815, 38.55, 38.73, None, 1147296.0, 44421907.0, None, None, None], ['2017-11-02', 38.165, 38.69, 38.145, 38.49, None, 1744271.0, 67156177.0, None, None, None], ['2017-11-01', 38.1, 38.315, 37.54, 38.2, None, 1810355.0, 69008167.0, None, None, None], ['2017-10-30', 37.32, 37.76, 37.25, 37.76, None, 1259359.0, 47379195.0, None, None, None], ['2017-10-27', 36.9, 37.33, 36.79, 37.18, None, 1362626.0, 50636915.0, None, None, None], ['2017-10-26', 36.425, 36.775, 36.355, 36.67, None, 1773417.0, 64876374.0, None, None, None], ['2017-10-25', 36.5, 36.54, 36.21, 36.28, None, 1146901.0, 41641699.0, None, None, None], ['2017-10-24', 36.755, 37.1, 36.515, 36.64, None, 1182044.0, 43411836.0, None, None, None], ['2017-10-23', 37.63, 37.635, 36.855, 36.875, None, 1281333.0, 47497515.0, None, None, None], ['2017-10-20', 37.93, 37.96, 37.27, 37.45, None, 1180003.0, 44288477.0, None, None, None], ['2017-10-19', 37.885, 38.13, 37.55, 37.77, None, 1481919.0, 55992823.0, None, None, None], ['2017-10-18', 37.285, 37.835, 37.2, 37.685, None, 1179431.0, 44414462.0, None, None, None], ['2017-10-17', 37.19, 37.25, 36.99, 37.125, None, 959055.0, 35623810.0, None, None, None], ['2017-10-16', 36.91, 37.145, 36.695, 37.12, None, 796432.0, 29483832.0, None, None, None], ['2017-10-13', 36.93, 36.955, 36.665, 36.77, None, 1030081.0, 37900999.0, None, None, None], ['2017-10-12', 36.5, 36.995, 36.425, 36.88, None, 1131777.0, 41704147.0, None, None, None], ['2017-10-11', 36.28, 36.49, 36.095, 36.45, None, 741342.0, 26962219.0, None, None, None], ['2017-10-10', 36.36, 36.5, 36.08, 36.31, None, 1184571.0, 43063747.0, None, None, None], ['2017-10-09', 36.045, 36.145, 35.895, 36.125, None, 798410.0, 28795537.0, None, None, None], ['2017-10-06', 36.55, 36.585, 35.885, 35.97, None, 1780610.0, 64285160.0, 
None, None, None], ['2017-10-05', 36.82, 36.84, 36.35, 36.56, None, 1088678.0, 39791298.0, None, None, None], ['2017-10-04', 36.435, 36.945, 36.135, 36.745, None, 2021500.0, 74201081.0, None, None, None], ['2017-10-02', 36.17, 36.325, 36.0, 36.235, None, 981385.0, 35516802.0, None, None, None], ['2017-09-29', 35.845, 36.01, 35.75, 36.0, None, 1574374.0, 56587390.0, None, None, None], ['2017-09-28', 35.85, 35.885, 35.455, 35.805, None, 1391082.0, 49641573.0, None, None, None], ['2017-09-27', 36.4, 36.4, 35.675, 35.7, None, 1265947.0, 45369154.0, None, None, None], ['2017-09-26', 35.82, 36.515, 35.805, 36.25, None, 1461266.0, 53049635.0, None, None, None], ['2017-09-25', 35.505, 35.93, 35.45, 35.795, None, 864934.0, 30942056.0, None, None, None], ['2017-09-22', 35.41, 35.66, 35.32, 35.505, None, 848527.0, 30112639.0, None, None, None], ['2017-09-21', 35.815, 35.815, 35.35, 35.45, None, 1169809.0, 41477784.0, None, None, None], ['2017-09-20', 35.91, 36.01, 35.7, 35.91, None, 772531.0, 27712890.0, None, None, None], ['2017-09-19', 36.215, 36.26, 35.915, 35.97, None, 867630.0, 31271571.0, None, None, None], ['2017-09-18', 36.635, 36.69, 36.2, 36.22, None, 897147.0, 32668669.0, None, None, None], ['2017-09-15', 36.5, 36.52, 36.125, 36.435, None, 3900972.0, 142003486.0, None, None, None], ['2017-09-14', 36.395, 36.625, 36.275, 36.41, None, 1326481.0, 48313680.0, None, None, None], ['2017-09-13', 36.365, 36.575, 36.25, 36.48, None, 1222826.0, 44582558.0, None, None, None], ['2017-09-12', 36.48, 36.595, 36.36, 36.475, None, 1132496.0, 41307182.0, None, None, None], ['2017-09-11', 36.375, 36.52, 36.255, 36.4, None, 873615.0, 31805844.0, None, None, None], ['2017-09-08', 35.935, 36.305, 35.935, 36.195, None, 980317.0, 35462482.0, None, None, None], ['2017-09-07', 35.8, 35.995, 35.585, 35.935, None, 1244383.0, 44616052.0, None, None, None], ['2017-09-06', 35.0, 35.675, 34.925, 35.55, None, 1262464.0, 44764575.0, None, None, None], ['2017-09-05', 35.425, 35.455, 35.01, 35.195, 
None, 843906.0, 29727013.0, None, None, None], ['2017-09-04', 35.13, 35.385, 35.1, 35.3, None, 556925.0, 19645220.0, None, None, None], ['2017-09-01', 35.605, 35.685, 35.36, 35.42, None, 786143.0, 27887648.0, None, None, None], ['2017-08-31', 35.435, 35.69, 35.435, 35.505, None, 1190945.0, 42323783.0, None, None, None], ['2017-08-30', 35.16, 35.5, 35.005, 35.285, None, 1117782.0, 39481349.0, None, None, None], ['2017-08-29', 34.885, 35.07, 34.76, 34.945, None, 1153084.0, 40294995.0, None, None, None], ['2017-08-28', 35.02, 35.095, 34.765, 35.035, None, 575476.0, 20128131.0, None, None, None], ['2017-08-25', 35.25, 35.31, 34.985, 35.065, None, 978790.0, 34347195.0, None, None, None], ['2017-08-24', 35.25, 35.525, 35.21, 35.21, None, 808758.0, 28566900.0, None, None, None], ['2017-08-23', 35.445, 35.49, 35.185, 35.265, None, 762435.0, 26899770.0, None, None, None], ['2017-08-22', 35.45, 35.605, 35.4, 35.405, None, 765764.0, 27158693.0, None, None, None], ['2017-08-21', 35.115, 35.36, 34.97, 35.305, None, 778364.0, 27429654.0, None, None, None], ['2017-08-18', 35.305, 35.48, 35.19, 35.255, None, 1156950.0, 40858779.0, None, None, None], ['2017-08-17', 35.425, 35.81, 35.425, 35.49, None, 948061.0, 33743508.0, None, None, None], ['2017-08-16', 35.73, 35.925, 35.41, 35.47, None, 912185.0, 32414293.0, None, None, None], ['2017-08-15', 35.76, 35.885, 35.445, 35.605, None, 1031791.0, 36719404.0, None, None, None], ['2017-08-14', 35.275, 35.89, 35.2, 35.7, None, 1268298.0, 45247331.0, None, None, None], ['2017-08-11', 35.5, 35.64, 34.88, 35.09, None, 1376672.0, 48398546.0, None, None, None], ['2017-08-10', 35.64, 35.7, 35.315, 35.55, None, 954629.0, 33905737.0, None, None, None], ['2017-08-09', 35.54, 35.81, 35.415, 35.64, None, 1032164.0, 36780128.0, None, None, None], ['2017-08-08', 35.565, 35.79, 35.405, 35.71, None, 813306.0, 28992490.0, None, None, None], ['2017-08-07', 36.0, 36.0, 35.4, 35.625, None, 993748.0, 35384039.0, None, None, None], ['2017-08-04', 35.58, 
36.005, 35.56, 35.9, None, 1048069.0, 37610121.0, None, None, None], ['2017-08-03', 35.4, 35.845, 35.335, 35.585, None, 1232461.0, 43873041.0, None, None, None], ['2017-08-02', 35.5, 35.77, 35.23, 35.44, None, 2159905.0, 76652742.0, None, None, None], ['2017-08-01', 34.185, 35.03, 34.185, 35.015, None, 1428589.0, 49719043.0, None, None, None], ['2017-07-31', 34.525, 34.625, 34.255, 34.255, None, 1226265.0, 42149590.0, None, None, None], ['2017-07-28', 34.91, 34.91, 34.32, 34.56, None, 1291761.0, 44621693.0, None, None, None], ['2017-07-27', 34.48, 35.345, 34.48, 35.005, None, 1345183.0, 47108133.0, None, None, None], ['2017-07-26', 34.315, 34.55, 34.19, 34.49, None, 1100747.0, 37890368.0, None, None, None], ['2017-07-25', 34.465, 34.57, 34.275, 34.39, None, 831703.0, 28627354.0, None, None, None], ['2017-07-24', 34.475, 34.705, 34.295, 34.38, None, 1065404.0, 36708290.0, None, None, None], ['2017-07-21', 34.42, 34.65, 34.275, 34.465, None, 1219659.0, 42017779.0, None, None, None], ['2017-07-20', 34.67, 34.845, 34.38, 34.43, None, 1120392.0, 38692096.0, None, None, None], ['2017-07-19', 34.585, 34.675, 34.425, 34.555, None, 933429.0, 32231381.0, None, None, None], ['2017-07-18', 34.575, 34.78, 34.48, 34.595, None, 1178023.0, 40774008.0, None, None, None], ['2017-07-17', 35.05, 35.125, 34.55, 34.65, None, 994306.0, 34508852.0, None, None, None], ['2017-07-14', 34.9, 35.175, 34.715, 35.01, None, 1164391.0, 40704830.0, None, None, None], ['2017-07-13', 34.87, 34.995, 34.7, 34.84, None, 909972.0, 31695835.0, None, None, None], ['2017-07-12', 34.16, 34.92, 34.13, 34.815, None, 1203034.0, 41655881.0, None, None, None], ['2017-07-11', 34.59, 34.59, 33.96, 34.17, None, 1193468.0, 40775450.0, None, None, None], ['2017-07-10', 34.145, 34.585, 34.145, 34.43, None, 1113906.0, 38372776.0, None, None, None], ['2017-07-07', 34.07, 34.08, 33.78, 33.95, None, 1155041.0, 39171093.0, None, None, None], ['2017-07-06', 34.525, 34.59, 33.74, 34.1, None, 1661288.0, 56571459.0, None, None, 
None], ['2017-07-05', 34.34, 34.45, 34.055, 34.445, None, 907138.0, 31133385.0, None, None, None], ['2017-07-04', 34.415, 34.54, 34.25, 34.405, None, 929186.0, 31955883.0, None, None, None], ['2017-07-03', 34.895, 34.91, 34.3, 34.53, None, 1081145.0, 37334057.0, None, None, None], ['2017-06-30', 34.49, 34.83, 34.205, 34.765, None, 1743689.0, 60419168.0, None, None, None], ['2017-06-29', 34.92, 35.215, 34.215, 34.545, None, 1702233.0, 58941623.0, None, None, None], ['2017-06-28', 34.96, 35.15, 34.6, 34.8, None, 1352427.0, 47155417.0, None, None, None], ['2017-06-27', 35.5, 35.59, 35.04, 35.125, None, 971323.0, 34191541.0, None, None, None], ['2017-06-26', 35.64, 35.78, 35.45, 35.5, None, 711806.0, 25324155.0, None, None, None], ['2017-06-23', 35.375, 35.74, 35.365, 35.6, None, 605987.0, 21578659.0, None, None, None], ['2017-06-22', 35.52, 35.555, 34.55, 35.4, None, 1133172.0, 40077360.0, None, None, None], ['2017-06-21', 35.995, 35.995, 35.42, 35.5, None, 1109333.0, 39488323.0, None, None, None], ['2017-06-20', 36.38, 36.385, 35.86, 35.965, None, 1102387.0, 39733201.0, None, None, None], ['2017-06-19', 36.45, 36.565, 36.155, 36.27, None, 751509.0, 27310028.0, None, None, None], ['2017-06-16', 36.285, 36.495, 36.035, 36.35, None, 2708923.0, 98420026.0, None, None, None], ['2017-06-15', 36.345, 36.7, 36.125, 36.185, None, 1394942.0, 50666215.0, None, None, None], ['2017-06-14', 35.95, 36.47, 35.81, 36.275, None, 1474349.0, 53508257.0, None, None, None], ['2017-06-13', 35.775, 36.085, 35.65, 35.83, None, 899809.0, 32285012.0, None, None, None], ['2017-06-12', 35.905, 36.0, 35.405, 35.61, None, 964494.0, 34340062.0, None, None, None], ['2017-06-09', 35.8, 36.03, 35.67, 35.995, None, 855293.0, 30725496.0, None, None, None], ['2017-06-08', 36.26, 36.43, 35.855, 35.89, None, 910894.0, 32860828.0, None, None, None], ['2017-06-07', 36.165, 36.56, 36.06, 36.28, None, 1522772.0, 55358099.0, None, None, None], ['2017-06-06', 35.87, 36.25, 35.535, 36.18, None, 2249405.0, 
80988781.0, None, None, None], ['2017-06-02', 35.4, 35.915, 35.32, 35.905, None, 1509212.0, 53890609.0, None, None, None], ['2017-06-01', 34.805, 35.625, 34.635, 35.4, None, 2670943.0, 94268991.0, None, None, None], ['2017-05-31', 34.97, 35.275, 34.895, 34.975, None, 1606543.0, 56280617.0, None, None, None], ['2017-05-30', 34.815, 35.245, 34.745, 35.03, None, 1040767.0, 36479165.0, None, None, None], ['2017-05-29', 35.05, 35.065, 34.77, 34.85, None, 479585.0, 16721712.0, None, None, None], ['2017-05-26', 34.99, 35.28, 34.81, 34.98, None, 1060107.0, 37132136.0, None, None, None], ['2017-05-25', 35.1, 35.25, 34.835, 34.965, None, 744523.0, 26040310.0, None, None, None], ['2017-05-24', 34.955, 35.015, 34.305, 34.92, None, 1710780.0, 59474808.0, None, None, None], ['2017-05-23', 34.83, 34.87, 34.28, 34.495, None, 1399576.0, 48264736.0, None, None, None], ['2017-05-22', 34.39, 34.94, 34.38, 34.825, None, 1222210.0, 42488151.0, None, None, None], ['2017-05-19', 34.37, 34.73, 34.27, 34.35, None, 1819807.0, 62695579.0, None, None, None], ['2017-05-18', 34.45, 34.82, 34.185, 34.34, None, 2155789.0, 74159697.0, None, None, None], ['2017-05-17', 34.57, 34.78, 34.255, 34.66, None, 1915980.0, 66259874.0, None, None, None], ['2017-05-16', 35.655, 35.965, 35.54, 35.865, None, 1829267.0, 65483151.0, None, None, None], ['2017-05-15', 36.1, 36.185, 35.445, 35.655, None, 1621456.0, 57963632.0, None, None, None], ['2017-05-12', 35.55, 35.95, 35.505, 35.95, None, 2399764.0, 86010930.0, None, None, None], ['2017-05-11', 34.82, 35.765, 34.78, 35.55, None, 3126178.0, 110720040.0, None, None, None], ['2017-05-10', 34.65, 34.785, 34.415, 34.785, None, 1419844.0, 49222455.0, None, None, None], ['2017-05-09', 34.48, 34.97, 34.35, 34.64, None, 1750992.0, 60807785.0, None, None, None], ['2017-05-08', 34.15, 34.775, 34.15, 34.47, None, 1968354.0, 67951441.0, None, None, None], ['2017-05-05', 33.83, 34.145, 33.675, 34.085, None, 1485165.0, 50496085.0, None, None, None], ['2017-05-04', 33.685, 
33.99, 33.42, 33.835, None, 1603664.0, 54226513.0, None, None, None], ['2017-05-03', 33.595, 33.765, 33.465, 33.59, None, 1248177.0, 41940157.0, None, None, None], ['2017-05-02', 33.39, 33.64, 33.045, 33.57, None, 2020909.0, 67578132.0, None, None, None], ['2017-04-28', 33.965, 33.965, 33.125, 33.235, None, 2250037.0, 74945444.0, None, None, None], ['2017-04-27', 33.75, 34.0, 33.615, 33.85, None, 1189265.0, 40232795.0, None, None, None], ['2017-04-26', 33.775, 33.985, 33.44, 33.89, None, 1173019.0, 39739022.0, None, None, None], ['2017-04-25', 33.43, 34.135, 33.33, 33.89, None, 1668864.0, 56484780.0, None, None, None], ['2017-04-24', 33.505, 33.61, 33.225, 33.38, None, 2505690.0, 83631889.0, None, None, None], ['2017-04-21', 33.645, 33.755, 33.195, 33.505, None, 1830004.0, 61197812.0, None, None, None], ['2017-04-20', 34.125, 34.175, 33.66, 33.7, None, 1464324.0, 49521743.0, None, None, None], ['2017-04-19', 34.315, 34.34, 33.98, 34.16, None, 1497935.0, 51127941.0, None, None, None], ['2017-04-18', 34.42, 34.585, 34.105, 34.375, None, 1456744.0, 49993484.0, None, None, None], ['2017-04-13', 34.3, 34.55, 34.25, 34.55, None, 1130793.0, 39005532.0, None, None, None], ['2017-04-12', 34.155, 34.525, 34.1, 34.355, None, 1418600.0, 48742545.0, None, None, None], ['2017-04-11', 33.905, 34.175, 33.89, 34.15, None, 1334812.0, 45501164.0, None, None, None], ['2017-04-10', 33.9, 33.95, 33.605, 33.945, None, 976008.0, 33022431.0, None, None, None], ['2017-04-07', 33.555, 33.89, 33.55, 33.81, None, 1245354.0, 42058867.0, None, None, None], ['2017-04-06', 33.44, 33.88, 33.44, 33.655, None, 1279422.0, 43062063.0, None, None, None], ['2017-04-05', 33.595, 33.67, 33.405, 33.51, None, 1247907.0, 41841367.0, None, None, None], ['2017-04-04', 33.14, 33.625, 33.12, 33.52, None, 1412273.0, 47301542.0, None, None, None], ['2017-04-03', 33.2, 33.2, 32.95, 33.185, None, 1098924.0, 36377790.0, None, None, None], ['2017-03-31', 32.585, 33.045, 32.55, 33.03, None, 1400053.0, 46049974.0, None, 
None, None], ['2017-03-30', 32.8, 32.805, 32.555, 32.715, None, 1138883.0, 37212643.0, None, None, None], ['2017-03-29', 32.45, 32.765, 32.295, 32.765, None, 1366373.0, 44540644.0, None, None, None], ['2017-03-28', 32.67, 32.765, 32.32, 32.46, None, 1358644.0, 44132290.0, None, None, None], ['2017-03-27', 32.6, 32.77, 32.405, 32.595, None, 990944.0, 32283093.0, None, None, None], ['2017-03-24', 32.52, 32.76, 32.485, 32.68, None, 948258.0, 30996490.0, None, None, None], ['2017-03-23', 32.445, 32.59, 32.32, 32.59, None, 1372593.0, 44624296.0, None, None, None], ['2017-03-22', 32.395, 32.61, 32.395, 32.445, None, 1331064.0, 43221125.0, None, None, None], ['2017-03-21', 32.58, 32.58, 32.4, 32.505, None, 1034376.0, 33593372.0, None, None, None], ['2017-03-20', 32.645, 32.745, 32.52, 32.55, None, 933804.0, 30443315.0, None, None, None], ['2017-03-17', 32.525, 32.745, 32.38, 32.71, None, 2486474.0, 81149012.0, None, None, None], ['2017-03-16', 32.49, 32.65, 32.43, 32.585, None, 2091661.0, 68106355.0, None, None, None], ['2017-03-15', 32.27, 32.27, 31.985, 32.215, None, 1154450.0, 37132114.0, None, None, None], ['2017-03-14', 32.025, 32.29, 31.92, 32.185, None, 1367474.0, 43883777.0, None, None, None], ['2017-03-13', 32.0, 32.28, 32.0, 32.1, None, 1344172.0, 43149844.0, None, None, None], ['2017-03-10', 32.765, 32.765, 32.005, 32.05, None, 2805533.0, 90397453.0, None, None, None], ['2017-03-09', 32.55, 32.87, 32.43, 32.655, None, 1521825.0, 49731230.0, None, None, None], ['2017-03-08', 32.59, 32.59, 32.27, 32.55, None, 1583434.0, 51386372.0, None, None, None], ['2017-03-07', 32.8, 33.18, 32.435, 32.62, None, 2264252.0, 74099675.0, None, None, None], ['2017-03-06', 32.665, 32.905, 32.575, 32.895, None, 1485150.0, 48649782.0, None, None, None], ['2017-03-03', 33.025, 33.04, 32.66, 32.82, None, 1780099.0, 58419579.0, None, None, None], ['2017-03-02', 33.095, 33.15, 32.89, 33.1, None, 1392995.0, 46026980.0, None, None, None], ['2017-03-01', 33.0, 33.09, 32.81, 33.06, None, 
1587643.0, 52361412.0, None, None, None], ['2017-02-28', 33.025, 33.18, 32.625, 32.89, None, 2165586.0, 71214194.0, None, None, None], ['2017-02-27', 33.31, 33.315, 32.87, 33.04, None, 1242571.0, 41015807.0, None, None, None], ['2017-02-24', 33.35, 33.46, 33.085, 33.25, None, 1569880.0, 52174520.0, None, None, None], ['2017-02-23', 33.345, 33.405, 33.15, 33.35, None, 1112515.0, 37041481.0, None, None, None], ['2017-02-22', 33.0, 33.485, 32.97, 33.31, None, 1981963.0, 65992846.0, None, None, None], ['2017-02-21', 32.78, 33.075, 32.75, 32.95, None, 1321345.0, 43526332.0, None, None, None], ['2017-02-20', 33.0, 33.05, 32.835, 32.86, None, 701112.0, 23076010.0, None, None, None], ['2017-02-17', 32.83, 33.065, 32.62, 32.88, None, 1609227.0, 52901994.0, None, None, None], ['2017-02-16', 32.76, 32.94, 32.6, 32.865, None, 1368754.0, 44911948.0, None, None, None], ['2017-02-15', 32.615, 32.885, 32.535, 32.86, None, 1999129.0, 65480253.0, None, None, None], ['2017-02-14', 32.35, 32.515, 32.145, 32.505, None, 1588415.0, 51460458.0, None, None, None], ['2017-02-13', 32.355, 32.395, 32.15, 32.18, None, 1471864.0, 47486838.0, None, None, None], ['2017-02-10', 32.255, 32.29, 32.06, 32.29, None, 1471607.0, 47422355.0, None, None, None], ['2017-02-09', 32.365, 32.385, 31.935, 32.2, None, 2312232.0, 74509222.0, None, None, None], ['2017-02-08', 31.32, 32.32, 31.215, 32.2, None, 2809351.0, 89874852.0, None, None, None], ['2017-02-07', 30.67, 31.375, 30.59, 31.355, None, 1724119.0, 53663330.0, None, None, None], ['2017-02-06', 30.665, 30.835, 30.55, 30.58, None, 1514442.0, 46432461.0, None, None, None], ['2017-02-03', 30.795, 31.015, 30.66, 30.78, None, 1105674.0, 34038970.0, None, None, None], ['2017-02-02', 30.765, 30.915, 30.605, 30.665, None, 1362135.0, 41892577.0, None, None, None], ['2017-02-01', 30.63, 30.95, 30.505, 30.755, None, 1953175.0, 60079315.0, None, None, None], ['2017-01-31', 30.01, 30.365, 29.905, 30.27, None, 1571973.0, 47483499.0, None, None, None], ['2017-01-30', 
30.05, 30.19, 29.745, 30.06, None, 1261362.0, 37810228.0, None, None, None], ['2017-01-27', 30.045, 30.08, 29.73, 30.015, None, 1361939.0, 40718988.0, None, None, None], ['2017-01-26', 30.07, 30.12, 29.81, 29.985, None, 1881781.0, 56384464.0, None, None, None], ['2017-01-25', 30.11, 30.145, 29.825, 29.955, None, 1918842.0, 57497812.0, None, None, None], ['2017-01-24', 30.46, 30.515, 30.05, 30.05, None, 1782575.0, 53883315.0, None, None, None], ['2017-01-23', 30.445, 30.555, 30.15, 30.385, None, 1923439.0, 58375436.0, None, None, None], ['2017-01-20', 30.42, 30.54, 30.005, 30.38, None, 1727697.0, 52422042.0, None, None, None], ['2017-01-19', 30.955, 30.955, 30.345, 30.45, None, 1737477.0, 53047844.0, None, None, None], ['2017-01-18', 30.95, 31.065, 30.775, 30.995, None, 1194228.0, 36964666.0, None, None, None], ['2017-01-17', 31.165, 31.17, 30.79, 30.9, None, 1208729.0, 37359507.0, None, None, None], ['2017-01-16', 30.93, 31.35, 30.93, 31.19, None, 920478.0, 28727634.0, None, None, None], ['2017-01-13', 31.215, 31.38, 30.88, 30.95, None, 1134887.0, 35256121.0, None, None, None], ['2017-01-12', 31.4, 31.44, 31.095, 31.16, None, 1269233.0, 39595118.0, None, None, None], ['2017-01-11', 31.39, 31.535, 31.255, 31.4, None, 1181293.0, 37089422.0, None, None, None], ['2017-01-10', 31.09, 31.525, 31.02, 31.475, None, 1241573.0, 38995926.0, None, None, None], ['2017-01-09', 31.38, 31.475, 31.025, 31.025, None, 1007257.0, 31355905.0, None, None, None], ['2017-01-06', 31.425, 31.75, 31.175, 31.25, None, 1236453.0, 38846256.0, None, None, None], ['2017-01-05', 31.065, 31.45, 31.03, 31.405, None, 1652789.0, 51686972.0, None, None, None], ['2017-01-04', 30.85, 30.96, 30.44, 30.8, None, 1265640.0, 38936241.0, None, None, None], ['2017-01-03', 31.48, 31.48, 30.745, 30.8, None, 1613584.0, 49922671.0, None, None, None], ['2017-01-02', 31.05, 31.48, 30.865, 31.35, None, 574317.0, 17953577.0, None, None, None]]\n" + ] + } + ], + "source": [ + "dataset = json['dataset']['data']\n", + 
"print(dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "metadata": {}, "outputs": [], - "metadata": {} + "source": [ + "#3. Calculate what the highest and lowest opening prices were for the stock in this period.\n", + "open_prices = {}\n", + "for data in dataset:\n", + " open_prices[data[0]] = data[1]\n", + "open_max = max(open_prices, key=open_prices.get), ':', max(open_prices.values())\n", + "open_min = min(open_prices, key=open_prices.get), ':', min(open_prices.values())" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('2017-12-19', ':', 41.95)\n", + "('2017-01-31', ':', 30.01)\n" + ] + } + ], + "source": [ + "print(open_max)\n", + "print(open_min)" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('2017-11-08', ':', 1.3350000000000009)\n" + ] + } + ], + "source": [ + "#4. What was the largest change in any one day (based on High and Low price)?\n", + "change_in_day = {}\n", + "for data in dataset:\n", + " change_in_day[data[0]] = data[2] - data[3]\n", + "max_change_in_day = max(change_in_day, key=change_in_day.get), ':', max(change_in_day.values())\n", + "print(max_change_in_day)" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(('2017-05-17', 'to', '2017-05-16'), ':', 1.2050000000000054)\n" + ] + } + ], + "source": [ + "#5. 
What was the largest change between any two days (based on Closing Price)?\n", + "change_btw_days = {}\n", + "for i in range(len(dataset) - 1):\n", + " change_btw_days[dataset[i][0], 'to', dataset[i+1][0]] = abs(dataset[i][4] - dataset[i+1][4])\n", + "max_change_btw_days = max(change_btw_days, key=change_btw_days.get), \":\", max(change_btw_days.values())\n", + "print(max_change_btw_days)" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1356351.1746031747\n" + ] + } + ], + "source": [ + "#6. What was the average daily trading volume during this year?\n", + "volumes = []\n", + "for data in dataset:\n", + " volumes.append(data[6])\n", + "total_vol = sum(volumes)\n", + "trading_days = len(volumes)\n", + "average_vol = total_vol / trading_days\n", + "print(average_vol)" + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1253768.0\n" + ] + } + ], + "source": [ + "#7. (Optional) What was the median trading volume during this year. 
(Note: you may need to implement your own function for calculating the median.)\n", + "volumes.sort()\n", + "mid = len(volumes) // 2\n", + "if len(volumes) % 2 == 0:\n", + "    median = (volumes[mid - 1] + volumes[mid]) / 2\n", + "else:\n", + "    median = volumes[mid]\n", + "print(median)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[479585.0,\n", + " 556925.0,\n", + " 574317.0,\n", + " 575476.0,\n", + " 601057.0,\n", + " 605987.0,\n", + " 608053.0,\n", + " 701112.0,\n", + " 711806.0,\n", + " 732911.0,\n", + " 741342.0,\n", + " 744523.0,\n", + " 751509.0,\n", + " 762435.0,\n", + " 765764.0,\n", + " 772531.0,\n", + " 773139.0,\n", + " 778364.0,\n", + " 786143.0,\n", + " 796432.0,\n", + " 798410.0,\n", + " 808758.0,\n", + " 813306.0,\n", + " 831703.0,\n", + " 843468.0,\n", + " 843906.0,\n", + " 848527.0,\n", + " 855293.0,\n", + " 864934.0,\n", + " 867630.0,\n", + " 873615.0,\n", + " 897147.0,\n", + " 899809.0,\n", + " 907138.0,\n", + " 909972.0,\n", + " 910894.0,\n", + " 912185.0,\n", + " 920478.0,\n", + " 929186.0,\n", + " 933429.0,\n", + " 933804.0,\n", + " 948061.0,\n", + " 948258.0,\n", + " 954629.0,\n", + " 959055.0,\n", + " 964494.0,\n", + " 971323.0,\n", + " 976008.0,\n", + " 978790.0,\n", + " 980317.0,\n", + " 981385.0,\n", + " 985055.0,\n", + " 990944.0,\n", + " 993748.0,\n", + " 994306.0,\n", + " 999933.0,\n", + " 1007257.0,\n", + " 1010748.0,\n", + " 1018883.0,\n", + " 1030081.0,\n", + " 1031791.0,\n", + " 1032164.0,\n", + " 1034376.0,\n", + " 1036110.0,\n", + " 1040767.0,\n", + " 1043727.0,\n", + " 1048069.0,\n", + " 1048127.0,\n", + " 1057105.0,\n", + " 1060107.0,\n", + " 1065404.0,\n", + " 1081145.0,\n", + " 1088678.0,\n", + " 1095660.0,\n", + " 1098924.0,\n", + " 1100747.0,\n", + " 1102387.0,\n", + " 1103999.0,\n", + " 1105674.0,\n", + " 1109333.0,\n", + " 1112515.0,\n", + " 1113906.0,\n", + " 1117782.0,\n", + " 1120392.0,\n", + " 1130793.0,\n", + " 1131777.0,\n", + " 1132496.0,\n", + " 1133172.0,\n", + " 1134887.0,\n", + " 1138883.0,\n", +
" 1146901.0,\n", + " 1147296.0,\n", + " 1153084.0,\n", + " 1154450.0,\n", + " 1155041.0,\n", + " 1156950.0,\n", + " 1161907.0,\n", + " 1164391.0,\n", + " 1169809.0,\n", + " 1173019.0,\n", + " 1178023.0,\n", + " 1179431.0,\n", + " 1180003.0,\n", + " 1181293.0,\n", + " 1182044.0,\n", + " 1184571.0,\n", + " 1189265.0,\n", + " 1190945.0,\n", + " 1193468.0,\n", + " 1194228.0,\n", + " 1203034.0,\n", + " 1205912.0,\n", + " 1208729.0,\n", + " 1219659.0,\n", + " 1222210.0,\n", + " 1222826.0,\n", + " 1226265.0,\n", + " 1232461.0,\n", + " 1236453.0,\n", + " 1241573.0,\n", + " 1242571.0,\n", + " 1243035.0,\n", + " 1244383.0,\n", + " 1245354.0,\n", + " 1247907.0,\n", + " 1248177.0,\n", + " 1259359.0,\n", + " 1261362.0,\n", + " 1262464.0,\n", + " 1265640.0,\n", + " 1265947.0,\n", + " 1268298.0,\n", + " 1269233.0,\n", + " 1279422.0,\n", + " 1281333.0,\n", + " 1289642.0,\n", + " 1291761.0,\n", + " 1314959.0,\n", + " 1321182.0,\n", + " 1321345.0,\n", + " 1326481.0,\n", + " 1331064.0,\n", + " 1334812.0,\n", + " 1344172.0,\n", + " 1345183.0,\n", + " 1352427.0,\n", + " 1358644.0,\n", + " 1361939.0,\n", + " 1362135.0,\n", + " 1362626.0,\n", + " 1366373.0,\n", + " 1367474.0,\n", + " 1368754.0,\n", + " 1372593.0,\n", + " 1376033.0,\n", + " 1376672.0,\n", + " 1378274.0,\n", + " 1384516.0,\n", + " 1391082.0,\n", + " 1392995.0,\n", + " 1394942.0,\n", + " 1399576.0,\n", + " 1400053.0,\n", + " 1408279.0,\n", + " 1411562.0,\n", + " 1412273.0,\n", + " 1418600.0,\n", + " 1419844.0,\n", + " 1423857.0,\n", + " 1428589.0,\n", + " 1456744.0,\n", + " 1461266.0,\n", + " 1464324.0,\n", + " 1471607.0,\n", + " 1471864.0,\n", + " 1474349.0,\n", + " 1477381.0,\n", + " 1481919.0,\n", + " 1485150.0,\n", + " 1485165.0,\n", + " 1493605.0,\n", + " 1495815.0,\n", + " 1497935.0,\n", + " 1509212.0,\n", + " 1514442.0,\n", + " 1521825.0,\n", + " 1522772.0,\n", + " 1531366.0,\n", + " 1569880.0,\n", + " 1571973.0,\n", + " 1574374.0,\n", + " 1583434.0,\n", + " 1587643.0,\n", + " 1588415.0,\n", + " 1603664.0,\n", + " 
1606543.0,\n", + " 1609227.0,\n", + " 1613584.0,\n", + " 1621456.0,\n", + " 1652789.0,\n", + " 1661288.0,\n", + " 1668864.0,\n", + " 1702233.0,\n", + " 1710780.0,\n", + " 1724119.0,\n", + " 1727697.0,\n", + " 1737477.0,\n", + " 1743689.0,\n", + " 1744271.0,\n", + " 1750992.0,\n", + " 1762497.0,\n", + " 1773417.0,\n", + " 1780099.0,\n", + " 1780610.0,\n", + " 1782575.0,\n", + " 1810355.0,\n", + " 1819807.0,\n", + " 1829267.0,\n", + " 1830004.0,\n", + " 1841312.0,\n", + " 1881781.0,\n", + " 1915980.0,\n", + " 1918842.0,\n", + " 1923439.0,\n", + " 1948545.0,\n", + " 1953175.0,\n", + " 1968354.0,\n", + " 1981963.0,\n", + " 1999129.0,\n", + " 2020909.0,\n", + " 2021500.0,\n", + " 2091661.0,\n", + " 2098187.0,\n", + " 2133301.0,\n", + " 2155789.0,\n", + " 2159905.0,\n", + " 2165586.0,\n", + " 2249405.0,\n", + " 2250037.0,\n", + " 2264252.0,\n", + " 2312232.0,\n", + " 2399764.0,\n", + " 2486474.0,\n", + " 2505690.0,\n", + " 2670943.0,\n", + " 2708923.0,\n", + " 2733044.0,\n", + " 2805533.0,\n", + " 2809351.0,\n", + " 3040962.0,\n", + " 3126178.0,\n", + " 3900972.0]" + ] + }, + "execution_count": 117, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [] }, { "cell_type": "code", "execution_count": null, - "source": [], + "metadata": {}, "outputs": [], - "metadata": {} + "source": [] } ], "metadata": { + "interpreter": { + "hash": "4885f37acae9217c235118400878352aafa7b76e66df698a1f601374f86939a7" + }, "kernelspec": { - "name": "python3", - "display_name": "Python 3.7.9 64-bit ('springboard': conda)" + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -186,12 +611,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.9" - }, - "interpreter": { - "hash": "4885f37acae9217c235118400878352aafa7b76e66df698a1f601374f86939a7" + "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git 
a/mec-3.4.1-api-mini-project/python.gitignore b/mec-3.4.1-api-mini-project/python.gitignore new file mode 100644 index 00000000..1c22fb78 --- /dev/null +++ b/mec-3.4.1-api-mini-project/python.gitignore @@ -0,0 +1,160 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# py +# For a library or package, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# poetry +# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 
+# This is especially recommended for binary packages to ensure reproducibility, and is more +# commonly ignored for libraries. +# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control +#poetry.lock + +# pdm +# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. +#pdm.lock +# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it +# in version control. +# https://pdm.fming.dev/#use-with-ide +.pdm.toml + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# PyCharm +# JetBrains specific template is maintained in a separate JetBrains.gitignore that can +# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore +# and can be added to the global gitignore or merged into this file. For a more nuclear +# option (not recommended) you can uncomment the following to ignore the entire idea folder. 
+#.idea/ \ No newline at end of file From 0e1a6e3a647c2fcd61d01b1cdc3f0d46f3626b99 Mon Sep 17 00:00:00 2001 From: csmitty3 Date: Tue, 18 Oct 2022 19:06:46 -0400 Subject: [PATCH 2/7] Update .env --- mec-3.4.1-api-mini-project/.env | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mec-3.4.1-api-mini-project/.env b/mec-3.4.1-api-mini-project/.env index 5080b71a..8b137891 100644 --- a/mec-3.4.1-api-mini-project/.env +++ b/mec-3.4.1-api-mini-project/.env @@ -1 +1 @@ -API_KEY='7MadrSm5uJz-31r7rF4z' \ No newline at end of file + From 5d746f14954de9724c76dbf8c4c80c59cd265f8e Mon Sep 17 00:00:00 2001 From: Conor Smith Date: Mon, 24 Oct 2022 07:34:46 -0400 Subject: [PATCH 3/7] Mostly Finished --- .../Mini_Project_Data_Wrangling_Pandas.ipynb | 2026 +++++++++++++++-- 1 file changed, 1869 insertions(+), 157 deletions(-) diff --git a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb index ed51607a..4c8beee7 100755 --- a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb +++ b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb @@ -36,14 +36,14 @@ "metadata": {}, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ - "'0.25.3'" + "'1.4.4'" ] }, + "execution_count": 2, "metadata": {}, - "execution_count": 2 + "output_type": "execute_result" } ], "source": [ @@ -162,6 +162,13 @@ "movies.head()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": {}, @@ -176,14 +183,26 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 5, "metadata": {}, "outputs": [ { - "output_type": "stream", "name": "stdout", + "output_type": "stream", "text": [ - "\nRangeIndex: 3786176 entries, 0 to 3786175\nData columns (total 6 
columns):\ntitle object\nyear int64\nname object\ntype object\ncharacter object\nn float64\ndtypes: float64(1), int64(1), object(4)\nmemory usage: 173.3+ MB\n" + "\n", + "RangeIndex: 3786176 entries, 0 to 3786175\n", + "Data columns (total 6 columns):\n", + " # Column Dtype \n", + "--- ------ ----- \n", + " 0 title object \n", + " 1 year int64 \n", + " 2 name object \n", + " 3 type object \n", + " 4 character object \n", + " 5 n float64\n", + "dtypes: float64(1), int64(1), object(4)\n", + "memory usage: 173.3+ MB\n" ] } ], @@ -379,7 +398,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -407,7 +426,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -486,7 +505,7 @@ "4 #Ewankosau saranghaeyo 2015 Philippines 2015-01-21" ] }, - "execution_count": 8, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -511,9 +530,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "244914" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "len(movies)" ] @@ -527,9 +557,67 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total Batman Movies: 2\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyear
52734Batman1943
150621Batman1989
\n", + "
" + ], + "text/plain": [ + " title year\n", + "52734 Batman 1943\n", + "150621 Batman 1989" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "batman_df = movies[movies.title == 'Batman']\n", "print('Total Batman Movies:', len(batman_df))\n", @@ -545,9 +633,115 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total Batman Movies: 35\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyear
16813Batman: Anarchy2016
30236Batman Forever1995
31674Batman Untold2010
31711Scooby-Doo & Batman: the Brave and the Bold2018
41881Batman the Rise of Red Hood2018
43484Batman: Return of the Caped Crusaders2016
46333Batman & Robin1997
51811Batman Revealed2012
52734Batman1943
56029Batman Beyond: Rising Knight2014
\n", + "
" + ], + "text/plain": [ + " title year\n", + "16813 Batman: Anarchy 2016\n", + "30236 Batman Forever 1995\n", + "31674 Batman Untold 2010\n", + "31711 Scooby-Doo & Batman: the Brave and the Bold 2018\n", + "41881 Batman the Rise of Red Hood 2018\n", + "43484 Batman: Return of the Caped Crusaders 2016\n", + "46333 Batman & Robin 1997\n", + "51811 Batman Revealed 2012\n", + "52734 Batman 1943\n", + "56029 Batman Beyond: Rising Knight 2014" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "batman_df = movies[movies.title.str.contains('Batman', case=False)]\n", "print('Total Batman Movies:', len(batman_df))\n", @@ -563,9 +757,138 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyear
52734Batman1943
100056Batman and Robin1949
161439Batman Dracula1964
84327Alyas Batman at Robin1965
68364James Batman1966
161527Batman: The Movie1966
56159Batman Fights Dracula1967
168504Fight! Batman, Fight!1973
150621Batman1989
156239Alyas Batman en Robin1991
156755Batman Returns1992
63366Batman: Mask of the Phantasm1993
30236Batman Forever1995
46333Batman & Robin1997
208220Batman Begins2005
\n", + "
" + ], + "text/plain": [ + " title year\n", + "52734 Batman 1943\n", + "100056 Batman and Robin 1949\n", + "161439 Batman Dracula 1964\n", + "84327 Alyas Batman at Robin 1965\n", + "68364 James Batman 1966\n", + "161527 Batman: The Movie 1966\n", + "56159 Batman Fights Dracula 1967\n", + "168504 Fight! Batman, Fight! 1973\n", + "150621 Batman 1989\n", + "156239 Alyas Batman en Robin 1991\n", + "156755 Batman Returns 1992\n", + "63366 Batman: Mask of the Phantasm 1993\n", + "30236 Batman Forever 1995\n", + "46333 Batman & Robin 1997\n", + "208220 Batman Begins 2005" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "batman_df.sort_values(by=['year'], ascending=True).iloc[:15]" ] @@ -579,55 +902,182 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How many movies were made in the year 2017?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "len(movies[movies.year == 2017])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Section I - Q2 : How many movies were made in the year 2015?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", + "execution_count": 12, "metadata": {}, - "source": [ + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyear
143147Harry Potter and the Deathly Hallows: Part 22011
152831Harry Potter and the Deathly Hallows: Part 12010
109213Harry Potter and the Half-Blood Prince2009
50581Harry Potter and the Order of the Phoenix2007
187926Harry Potter and the Goblet of Fire2005
61957Harry Potter and the Prisoner of Azkaban2004
82791Harry Potter and the Chamber of Secrets2002
223087Harry Potter and the Sorcerer's Stone2001
\n", + "
" + ], + "text/plain": [ + " title year\n", + "143147 Harry Potter and the Deathly Hallows: Part 2 2011\n", + "152831 Harry Potter and the Deathly Hallows: Part 1 2010\n", + "109213 Harry Potter and the Half-Blood Prince 2009\n", + "50581 Harry Potter and the Order of the Phoenix 2007\n", + "187926 Harry Potter and the Goblet of Fire 2005\n", + "61957 Harry Potter and the Prisoner of Azkaban 2004\n", + "82791 Harry Potter and the Chamber of Secrets 2002\n", + "223087 Harry Potter and the Sorcerer's Stone 2001" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harry_potter_df = movies[movies.title.str.contains('Harry Potter', case=False)].sort_values(by='year', ascending=False)\n", + "harry_potter_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### How many movies were made in the year 2017?" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "11474" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(movies[movies.year == 2017])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Section I - Q2 : How many movies were made in the year 2015?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8702" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(movies[movies['year'] == 2015])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "### Section I - Q3 : How many movies were made from 2000 till 2018?\n", "- You can chain multiple conditions using OR (`|`) as well as AND (`&`) depending on the condition" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "244914" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(movies[(movies['year'] >= 2000) & (movies['year'] <= 2018)])" + ] }, { "cell_type": "markdown", @@ -638,10 +1088,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "20" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(movies[movies['title'] == 'Hamlet'])" + ] }, { "cell_type": "markdown", @@ -654,10 +1117,94 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 23, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyear
55639Hamlet2000
1931Hamlet2009
227953Hamlet2011
178290Hamlet2014
186137Hamlet2015
191940Hamlet2016
244747Hamlet2017
\n", + "
" + ], + "text/plain": [ + " title year\n", + "55639 Hamlet 2000\n", + "1931 Hamlet 2009\n", + "227953 Hamlet 2011\n", + "178290 Hamlet 2014\n", + "186137 Hamlet 2015\n", + "191940 Hamlet 2016\n", + "244747 Hamlet 2017" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hamlet_df = movies[(movies['title'] == 'Hamlet') & (movies['year'] >= 2000)].sort_values(by='year')\n", + "hamlet_df" + ] }, { "cell_type": "markdown", @@ -670,10 +1217,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 26, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "27" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(cast[(cast['title'] == 'Inception') & (cast['n'].isna())])" + ] }, { "cell_type": "markdown", @@ -685,10 +1245,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 27, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "51" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(cast[(cast['title'] == 'Inception') & (cast['n'].notna())])" + ] }, { "cell_type": "markdown", @@ -701,37 +1274,422 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 29, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyearnametypecharactern
590576Inception2010Leonardo DiCaprioactorCobb1.0
859993Inception2010Joseph Gordon-LevittactorArthur2.0
3387147Inception2010Ellen PageactressAriadne3.0
940923Inception2010Tom HardyactorEames4.0
2406531Inception2010Ken WatanabeactorSaito5.0
1876301Inception2010Dileep RaoactorYusuf6.0
1615709Inception2010Cillian MurphyactorRobert Fischer7.0
183937Inception2010Tom BerengeractorBrowning8.0
2765969Inception2010Marion CotillardactressMal9.0
1826027Inception2010Pete PostlethwaiteactorMaurice Fischer10.0
\n", + "
" + ], + "text/plain": [ + " title year name type character n\n", + "590576 Inception 2010 Leonardo DiCaprio actor Cobb 1.0\n", + "859993 Inception 2010 Joseph Gordon-Levitt actor Arthur 2.0\n", + "3387147 Inception 2010 Ellen Page actress Ariadne 3.0\n", + "940923 Inception 2010 Tom Hardy actor Eames 4.0\n", + "2406531 Inception 2010 Ken Watanabe actor Saito 5.0\n", + "1876301 Inception 2010 Dileep Rao actor Yusuf 6.0\n", + "1615709 Inception 2010 Cillian Murphy actor Robert Fischer 7.0\n", + "183937 Inception 2010 Tom Berenger actor Browning 8.0\n", + "2765969 Inception 2010 Marion Cotillard actress Mal 9.0\n", + "1826027 Inception 2010 Pete Postlethwaite actor Maurice Fischer 10.0" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topten_inception = cast[(cast['title'] == 'Inception')].sort_values(by='n').iloc[:10]\n", + "topten_inception" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Section I - Q9:\n", - "\n", - "(A) List all movies where there was a character 'Albus Dumbledore' \n", - "\n", - "(B) Now modify the above to show only the actors who played the character 'Albus Dumbledore'\n", - "- For Part (B) remember the same actor might play the same role in multiple movies" + "### Section I - Q9:\n", + "\n", + "(A) List all movies where there was a character 'Albus Dumbledore' \n", + "\n", + "(B) Now modify the above to show only the actors who played the character 'Albus Dumbledore'\n", + "- For Part (B) remember the same actor might play the same role in multiple movies" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
titleyearnametypecharactern
704984Epic Movie2007Dane FarwellactorAlbus Dumbledore17.0
792421Harry Potter and the Goblet of Fire2005Michael GambonactorAlbus Dumbledore37.0
792423Harry Potter and the Order of the Phoenix2007Michael GambonactorAlbus Dumbledore36.0
792424Harry Potter and the Prisoner of Azkaban2004Michael GambonactorAlbus Dumbledore27.0
947789Harry Potter and the Chamber of Secrets2002Richard HarrisactorAlbus Dumbledore32.0
947790Harry Potter and the Sorcerer's Stone2001Richard HarrisactorAlbus Dumbledore1.0
1685537Ultimate Hero Project2013George (X) O'ConnoractorAlbus DumbledoreNaN
2248085Potter2015Timothy TedmansonactorAlbus DumbledoreNaN
\n", + "
" + ], + "text/plain": [ + " title year name \\\n", + "704984 Epic Movie 2007 Dane Farwell \n", + "792421 Harry Potter and the Goblet of Fire 2005 Michael Gambon \n", + "792423 Harry Potter and the Order of the Phoenix 2007 Michael Gambon \n", + "792424 Harry Potter and the Prisoner of Azkaban 2004 Michael Gambon \n", + "947789 Harry Potter and the Chamber of Secrets 2002 Richard Harris \n", + "947790 Harry Potter and the Sorcerer's Stone 2001 Richard Harris \n", + "1685537 Ultimate Hero Project 2013 George (X) O'Connor \n", + "2248085 Potter 2015 Timothy Tedmanson \n", + "\n", + " type character n \n", + "704984 actor Albus Dumbledore 17.0 \n", + "792421 actor Albus Dumbledore 37.0 \n", + "792423 actor Albus Dumbledore 36.0 \n", + "792424 actor Albus Dumbledore 27.0 \n", + "947789 actor Albus Dumbledore 32.0 \n", + "947790 actor Albus Dumbledore 1.0 \n", + "1685537 actor Albus Dumbledore NaN \n", + "2248085 actor Albus Dumbledore NaN " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dumbledore = cast[cast['character'] == 'Albus Dumbledore']\n", + "dumbledore" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " title year name \\\n", + "792421 Harry Potter and the Goblet of Fire 2005 Michael Gambon \n", + "792423 Harry Potter and the Order of the Phoenix 2007 Michael Gambon \n", + "792424 Harry Potter and the Prisoner of Azkaban 2004 Michael Gambon \n", + "947789 Harry Potter and the Chamber of Secrets 2002 Richard Harris \n", + "947790 Harry Potter and the Sorcerer's Stone 2001 Richard Harris \n", + "\n", + " type character n \n", + "792421 actor Albus Dumbledore 37.0 \n", + "792423 actor Albus Dumbledore 36.0 \n", + "792424 actor Albus Dumbledore 27.0 \n", + "947789 actor Albus Dumbledore 32.0 \n", + "947790 actor Albus Dumbledore 1.0 " + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hp_dumbledore = dumbledore[dumbledore['title'].str.contains('Harry Potter', case=False)]\n", + "hp_dumbledore" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -745,17 +1703,243 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 34, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "62" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(cast[cast['name'] == 'Keanu Reeves'])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 37, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
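The leading-role query for Keanu Reeves combines three conditions: a name match, a year cutoff, and `n == 1.0`. The standalone `(cast['n'])` term in that mask adds nothing. `cast['n'] == 1.0` is already `False` wherever `n` is missing, because `NaN == 1.0` is `False`.

The sketch below shows the same filter on a small hypothetical frame. It reuses the notebook's column names, but the rows (`Actor A`, `Film 1`, ...) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the notebook's `cast` frame: same columns, made-up rows.
cast = pd.DataFrame({
    "title": ["Film 1", "Film 2", "Film 3", "Film 4", "Film 5"],
    "year": [1995, 2005, 2003, 2010, 2000],
    "name": ["Actor A", "Actor A", "Actor A", "Actor B", "Actor A"],
    "type": ["actor"] * 5,
    "character": ["Role 1", "Role 2", "Role 3", "Role 4", "Role 5"],
    "n": [1.0, 1.0, np.nan, 1.0, 1.0],
})

# Each condition is parenthesized because & binds tighter than == and >=.
# n == 1.0 is already False for NaN, so no separate null check is needed.
mask = (cast["name"] == "Actor A") & (cast["year"] >= 1999) & (cast["n"] == 1.0)
leading = cast[mask].sort_values(by="year")
print(leading[["title", "year", "character"]])
```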
" + ], + "text/plain": [ + " title year name type \\\n", + "1892390 The Matrix 1999 Keanu Reeves actor \n", + "1892397 The Replacements 2000 Keanu Reeves actor \n", + "1892358 Hard Ball 2001 Keanu Reeves actor \n", + "1892383 Sweet November 2001 Keanu Reeves actor \n", + "1892348 Constantine 2005 Keanu Reeves actor \n", + "1892388 The Lake House 2006 Keanu Reeves actor \n", + "1892382 Street Kings 2008 Keanu Reeves actor \n", + "1892385 The Day the Earth Stood Still 2008 Keanu Reeves actor \n", + "1892359 Henry's Crime 2010 Keanu Reeves actor \n", + "1892342 47 Ronin 2013 Keanu Reeves actor \n", + "1892361 John Wick 2014 Keanu Reeves actor \n", + "1892366 Knock Knock 2015 Keanu Reeves actor \n", + "1892399 The Whole Truth 2016 Keanu Reeves actor \n", + "1892362 John Wick: Chapter 2 2017 Keanu Reeves actor \n", + "1892378 Siberia 2018 Keanu Reeves actor \n", + "\n", + " character n \n", + "1892390 Neo 1.0 \n", + "1892397 Shane Falco 1.0 \n", + "1892358 Conor O'Neill 1.0 \n", + "1892383 Nelson Moss 1.0 \n", + "1892348 John Constantine 1.0 \n", + "1892388 Alex Wyler 1.0 \n", + "1892382 Detective Tom Ludlow 1.0 \n", + "1892385 Klaatu 1.0 \n", + "1892359 Henry Torne 1.0 \n", + "1892342 Kai 1.0 \n", + "1892361 John Wick 1.0 \n", + "1892366 Evan 1.0 \n", + "1892399 Ramsey 1.0 \n", + "1892362 John Wick 1.0 \n", + "1892378 Lucas Hill 1.0 " + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "keanu_reeves = cast[(cast['name'] == 'Keanu Reeves') & (cast['year'] >= 1999) & (cast['n']) & (cast['n'] == 1.0)].sort_values(by='year')\n", + "keanu_reeves" + ] }, { "cell_type": "markdown", @@ -770,17 +1954,45 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 48, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "234635" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fifties = 
cast[(cast['year'] >= 1950) & (cast['year'] <= 1960)]\n", + "len(fifties[(fifties['type'] == 'actress') | (fifties['type'] == 'actor')])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 49, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "1452413" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tenyears = cast[(cast['year'] >= 2007) & (cast['year'] <= 2017)]\n", + "len(tenyears[(tenyears['type'] == 'actress') | (tenyears['type'] == 'actor')])" + ] }, { "cell_type": "markdown", @@ -797,24 +2009,64 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 56, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "153233" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "millenium = cast[cast['year'] >= 200]\n", + "len(millenium[millenium['n'] == 1.0])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 59, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "2174370" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(millenium[(millenium['n'].notnull()) & (millenium['n'] != 1.0)])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 60, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "1458573" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(millenium[millenium['n'].isna()])" + ] }, { "cell_type": "markdown", @@ -832,9 +2084,30 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 61, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "Hamlet 20\n", + "Carmen 17\n", + 
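The decade-count cells use inclusive bounds on both ends. As a result, `>= 1950` with `<= 1960` spans eleven years, 1950 through 1960. `Series.between` makes the inclusive range explicit, and restricting to the 1950s alone would need an upper bound of 1959.

The millennium cell filters on `cast['year'] >= 200`. That condition matches every film year in the data. Counting roles since 2000, as the variable name `millenium` suggests, would presumably need `>= 2000`.

Here is a sketch on a hypothetical frame with made-up rows. The type check uses `isin` in place of the two `|`-joined comparisons.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `cast`; only the columns used here, made-up rows.
cast = pd.DataFrame({
    "year": [1949, 1950, 1959, 1960, 2000, 2012],
    "type": ["actor", "actress", "actor", "actress", "actor", "actress"],
    "n": [1.0, 2.0, np.nan, 1.0, 1.0, np.nan],
})

# between() is inclusive on both ends by default: 1950..1959 is exactly one decade.
fifties = cast[cast["year"].between(1950, 1959)]
roles_1950s = fifties["type"].isin(["actor", "actress"]).sum()

# Roles since 2000, split into leading (n == 1), other ranked, and unranked (NaN).
since_2000 = cast[cast["year"] >= 2000]
leading = (since_2000["n"] == 1.0).sum()
supporting = (since_2000["n"].notna() & (since_2000["n"] != 1.0)).sum()
unranked = since_2000["n"].isna().sum()
print(roles_1950s, leading, supporting, unranked)
```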
"Macbeth 16\n", + "Maya 12\n", + "Temptation 12\n", + "The Outsider 12\n", + "Freedom 11\n", + "The Three Musketeers 11\n", + "Honeymoon 11\n", + "Othello 11\n", + "Name: title, dtype: int64" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "top_ten = movies.title.value_counts()[:10]\n", "top_ten" @@ -849,9 +2122,30 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 62, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "top_ten.plot(kind='barh')" ] @@ -865,10 +2159,28 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 70, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "year\n", + "2009 6125\n", + "2008 5151\n", + "2007 4467\n", + "Name: year, dtype: int64" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top3 = movies[(movies['year'] >= 2000) & (movies['year'] <= 2009)].groupby('year')['year'].count().sort_values(ascending=False).iloc[:3]\n", + "top3" + ] }, { "cell_type": "markdown", @@ -881,10 +2193,28 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 75, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "ename": "TypeError", + "evalue": "cannot convert the series to ", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "Input \u001b[0;32mIn [75]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmath\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m movies[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mdecade\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m math\u001b[38;5;241m.\u001b[39mfloor(\u001b[38;5;28;43mfloat\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mmovies\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43myear\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m/\u001b[39m \u001b[38;5;241m10\u001b[39m)\n\u001b[1;32m 3\u001b[0m movies\u001b[38;5;241m.\u001b[39mhead()\n", + "File 
\u001b[0;32m~/Desktop/mec-mini-projects/mec-5.3.10-data-wranging-with-pandas-mini-project/env/lib/python3.10/site-packages/pandas/core/series.py:191\u001b[0m, in \u001b[0;36m_coerce_method..wrapper\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 190\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m converter(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39miloc[\u001b[38;5;241m0\u001b[39m])\n\u001b[0;32m--> 191\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcannot convert the series to \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mconverter\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", + "\u001b[0;31mTypeError\u001b[0m: cannot convert the series to " + ] + } + ], + "source": [ + "import math\n", + "\n", + "movies['decade'] = math.floor(float(movies['year']) / 10)\n", + "movies.head()" + ] }, { "cell_type": "markdown", @@ -901,24 +2231,96 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 79, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "character\n", + "Himself 20746\n", + "Dancer 12477\n", + "Extra 11948\n", + "Reporter 8434\n", + "Student 7773\n", + "Doctor 7669\n", + "Party Guest 7245\n", + "Policeman 7029\n", + "Nurse 6999\n", + "Bartender 6802\n", + "Name: character, dtype: int64" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast.groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 85, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "character name name \n", + 
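The `TypeError` in the decade cell comes from calling `float()` on the whole `year` Series. Both `float()` and `math.floor()` expect a single number, not a column. Floor division, by contrast, applies element-wise, so the decade column can be built without a loop. `value_counts` then gives the per-decade counts and the busiest years.

This is a minimal sketch on a hypothetical `movies` frame with made-up rows.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `movies` frame (made-up rows).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "year": [1989, 1990, 1999, 2007, 2009],
})

# Floor division works element-wise on the whole column: 1989 // 10 * 10 == 1980.
movies["decade"] = movies["year"] // 10 * 10

# Films per decade, in chronological order.
per_decade = movies["decade"].value_counts().sort_index()

# Busiest years within 2000..2009; nlargest keeps the highest counts first.
in_2000s = movies[movies["year"].between(2000, 2009)]
top_years = in_2000s["year"].value_counts().nlargest(3)
print(per_decade)
print(top_years)
```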
"Herself Queen Elizabeth II Queen Elizabeth II 12\n", + " Joyce Brothers Joyce Brothers 9\n", + " Luisa Horga Luisa Horga 9\n", + " Mar?a Luisa (V) Mart?n Mar?a Luisa (V) Mart?n 9\n", + " Hillary Clinton Hillary Clinton 8\n", + " Margaret Thatcher Margaret Thatcher 8\n", + " In?s J. Southern In?s J. Southern 6\n", + " Marta Berrocal Marta Berrocal 6\n", + " Oprah Winfrey Oprah Winfrey 6\n", + " Marilyn Monroe Marilyn Monroe 6\n", + "Name: name, dtype: int64" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['character'] == 'Herself'].groupby(['character', 'name'])['name'].value_counts().sort_values(ascending=False).iloc[:10]" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 86, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "character name name \n", + "Himself Adolf Hitler Adolf Hitler 99\n", + " Richard Nixon Richard Nixon 44\n", + " Ronald Reagan Ronald Reagan 41\n", + " John F. Kennedy John F. Kennedy 37\n", + " George W. Bush George W. Bush 25\n", + " Winston Churchill Winston Churchill 24\n", + " Martin Luther King Martin Luther King 23\n", + " Bill Clinton Bill Clinton 22\n", + " Ron Jeremy Ron Jeremy 22\n", + " Franklin D. Roosevelt Franklin D. 
Roosevelt 21\n", + "Name: name, dtype: int64" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['character'] == 'Himself'].groupby(['character', 'name'])['name'].value_counts().sort_values(ascending=False).iloc[:10]" + ] }, { "cell_type": "markdown", @@ -935,17 +2337,65 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 88, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "character\n", + "Zombie 6264\n", + "Zombie Horde 206\n", + "Zombie - Protestor - Victim 78\n", + "Zombie Extra 70\n", + "Zombie Dancer 43\n", + "Zombie Girl 36\n", + "Zombie #1 36\n", + "Zombie #2 31\n", + "Zombie Vampire 25\n", + "Zombie Victim 22\n", + "Name: character, dtype: int64" + ] + }, + "execution_count": 88, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['character'].str.startswith('Zombie')].groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 89, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "character\n", + "Policeman 7029\n", + "Police Officer 4808\n", + "Police Inspector 742\n", + "Police Sergeant 674\n", + "Police officer 539\n", + "Police 456\n", + "Policewoman 415\n", + "Police Chief 410\n", + "Police Captain 387\n", + "Police Commissioner 337\n", + "Name: character, dtype: int64" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['character'].str.startswith('Police')].groupby('character')['character'].count().sort_values(ascending=False).iloc[:10]" + ] }, { "cell_type": "markdown", @@ -956,10 +2406,53 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 92, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + 
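The `Herself` and `Himself` cells group by `['character', 'name']` and then call `value_counts` on `name`. That is why the output index repeats `name`. Selecting the `name` column and calling `value_counts` directly gives the same counts with a single index level. Because `value_counts` already sorts by count in descending order, `head(10)` is enough to get the top ten. The same pattern, combined with `str.startswith`, counts character names by prefix, as the Zombie and Police cells do.

Here is a sketch on a hypothetical frame with made-up rows.

```python
import pandas as pd

# Hypothetical stand-in for `cast` (made-up rows).
cast = pd.DataFrame({
    "name": ["P1", "P1", "P2", "P3", "P3", "P3"],
    "character": ["Himself", "Himself", "Himself", "Zombie", "Zombie #1", "Police Officer"],
})

# value_counts sorts by count (descending), so no extra sort_values is needed.
himself = cast.loc[cast["character"] == "Himself", "name"].value_counts().head(10)

# Character names beginning with a prefix; na=False treats missing values as non-matches.
zombies = cast.loc[cast["character"].str.startswith("Zombie", na=False), "character"].value_counts()
print(himself)
print(zombies)
```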
"text/plain": [ + "year\n", + "1985 1\n", + "1986 3\n", + "1988 4\n", + "1989 2\n", + "1990 2\n", + "1991 3\n", + "1992 1\n", + "1993 4\n", + "1994 1\n", + "1995 2\n", + "1996 2\n", + "1997 2\n", + "1999 3\n", + "2000 3\n", + "2001 2\n", + "2003 3\n", + "2005 3\n", + "2006 2\n", + "2008 2\n", + "2009 1\n", + "2010 1\n", + "2012 1\n", + "2013 2\n", + "2014 1\n", + "2015 1\n", + "2016 5\n", + "2017 3\n", + "2018 1\n", + "2019 1\n", + "Name: year, dtype: int64" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['name'] == 'Keanu Reeves'].groupby('year')['year'].count()" + ] }, { "cell_type": "markdown", @@ -970,10 +2463,28 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 100, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "keanu_reeves = cast[cast['name'] == 'Keanu Reeves']\n", + "plt.scatter(x=keanu_reeves['year'], y=keanu_reeves['n'])\n", + "plt.xlabel('Year')\n", + "plt.ylabel('N')\n", + "plt.title('Keanu Reeves cast positions over his career')\n", + "plt.show()" + ] }, { "cell_type": "markdown", @@ -1025,10 +2536,181 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 120, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " title year name type \\\n", + "1723645 The Muppet Movie 1979 Frank Oz actor \n", + "1723652 The Muppet Movie 1979 Frank Oz actor \n", + "1723651 The Muppet Movie 1979 Frank Oz actor \n", + "1723650 The Muppet Movie 1979 Frank Oz actor \n", + "1723649 The Muppet Movie 1979 Frank Oz actor \n", + "... ... ... ... ... \n", + "1723661 Zathura: A Space Adventure 2005 Frank Oz actor \n", + "1723616 Sesame Street: C is for Cookie Monster 2010 Frank Oz actor \n", + "1723605 Inside Out 2015 Frank Oz actor \n", + "1723631 The Great Gilly Hopkins 2015 Frank Oz actor \n", + "1723623 Star Wars: Episode VII - The Force Awakens 2015 Frank Oz actor \n", + "\n", + " character n \n", + "1723645 Miss Piggy 2.0 \n", + "1723652 Motorcycle Guy 2.0 \n", + "1723651 Swedish Chef (assistant) 2.0 \n", + "1723650 Marvin Suggs 2.0 \n", + "1723649 Doc Hopper's Men 2.0 \n", + "... ... ... \n", + "1723661 Robot 6.0 \n", + "1723616 Cookie Monster NaN \n", + "1723605 Subconscious Guard Dave 14.0 \n", + "1723631 Cookie Monster 20.0 \n", + "1723623 Yoda NaN \n", + "\n", + "[64 rows x 6 columns]" + ] + }, + "execution_count": 120, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cast[cast['name'] == 'Frank Oz'].sort_values(by='year')" + ] }, { "cell_type": "markdown", @@ -1083,10 +2765,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 131, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "summer = release_dates[(release_dates['title'].str.contains('Summer', case=False)) & (release_dates['country'] == 'USA')]\n", + "plt.hist(x=summer['date'].dt.month)\n", + "plt.show()" + ] }, { "cell_type": "markdown", @@ -1100,10 +2797,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 135, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "action = release_dates[(release_dates['title'].str.contains('Action', case=False)) & (release_dates['country'] == 'USA')]\n", + "plt.hist(x=action['date'].dt.isocalendar().week)\n", + "plt.show()" + ] }, { "cell_type": "markdown", @@ -1153,7 +2865,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -1167,9 +2879,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.6-final" + "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 2 -} \ No newline at end of file +} From d0e02ba862a97510e38d85fb0661f7f5bf7044b6 Mon Sep 17 00:00:00 2001 From: Conor Smith Date: Mon, 24 Oct 2022 12:57:59 -0400 Subject: [PATCH 4/7] Completed Project --- .../Mini_Project_Data_Wrangling_Pandas.ipynb | 880 +++++++++++++----- 1 file changed, 662 insertions(+), 218 deletions(-) diff --git a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb index 4c8beee7..0b20583e 100755 --- a/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb +++ b/mec-5.3.10-data-wranging-with-pandas-mini-project/Mini_Project_Data_Wrangling_Pandas.ipynb @@ -398,7 +398,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -426,7 +426,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -505,7 +505,7 @@ "4 #Ewankosau saranghaeyo 2015 Philippines 2015-01-21" ] }, - "execution_count": 7, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -530,7 +530,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -539,7 +539,7 
@@ "244914" ] }, - "execution_count": 8, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -557,7 +557,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -613,7 +613,7 @@ "150621 Batman 1989" ] }, - "execution_count": 9, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } @@ -633,7 +633,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "metadata": {}, "outputs": [ { @@ -737,7 +737,7 @@ "56029 Batman Beyond: Rising Knight 2014" ] }, - "execution_count": 10, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } @@ -757,7 +757,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": {}, "outputs": [ { @@ -884,7 +884,7 @@ "208220 Batman Begins 2005" ] }, - "execution_count": 11, + "execution_count": 12, "metadata": {}, "output_type": "execute_result" } @@ -902,7 +902,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 13, "metadata": {}, "outputs": [ { @@ -987,7 +987,7 @@ "223087 Harry Potter and the Sorcerer's Stone 2001" ] }, - "execution_count": 12, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -1006,7 +1006,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -1015,7 +1015,7 @@ "11474" ] }, - "execution_count": 13, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -1033,7 +1033,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "metadata": {}, "outputs": [ { @@ -1042,7 +1042,7 @@ "8702" ] }, - "execution_count": 14, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -1061,7 +1061,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -1070,7 +1070,7 @@ "244914" ] }, - "execution_count": 15, + 
"execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -1088,7 +1088,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 17, "metadata": {}, "outputs": [ { @@ -1097,7 +1097,7 @@ "20" ] }, - "execution_count": 16, + "execution_count": 17, "metadata": {}, "output_type": "execute_result" } @@ -1117,7 +1117,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 18, "metadata": {}, "outputs": [ { @@ -1196,7 +1196,7 @@ "244747 Hamlet 2017" ] }, - "execution_count": 23, + "execution_count": 18, "metadata": {}, "output_type": "execute_result" } @@ -1217,7 +1217,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 19, "metadata": {}, "outputs": [ { @@ -1226,7 +1226,7 @@ "27" ] }, - "execution_count": 26, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -1245,7 +1245,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 20, "metadata": {}, "outputs": [ { @@ -1254,7 +1254,7 @@ "51" ] }, - "execution_count": 27, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -1274,7 +1274,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 21, "metadata": {}, "outputs": [ { @@ -1415,7 +1415,7 @@ "1826027 Inception 2010 Pete Postlethwaite actor Maurice Fischer 10.0" ] }, - "execution_count": 29, + "execution_count": 21, "metadata": {}, "output_type": "execute_result" } @@ -1439,7 +1439,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 22, "metadata": {}, "outputs": [ { @@ -1570,7 +1570,7 @@ "2248085 actor Albus Dumbledore NaN " ] }, - "execution_count": 31, + "execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -1582,7 +1582,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 23, "metadata": {}, "outputs": [ { @@ -1680,7 +1680,7 @@ "947790 actor Albus Dumbledore 1.0 " ] }, - "execution_count": 33, + "execution_count": 23, 
"metadata": {}, "output_type": "execute_result" } @@ -1703,7 +1703,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 24, "metadata": {}, "outputs": [ { @@ -1712,7 +1712,7 @@ "62" ] }, - "execution_count": 34, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } @@ -1723,7 +1723,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 25, "metadata": {}, "outputs": [ { @@ -1931,7 +1931,7 @@ "1892378 Lucas Hill 1.0 " ] }, - "execution_count": 37, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } @@ -1954,7 +1954,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 26, "metadata": {}, "outputs": [ { @@ -1963,7 +1963,7 @@ "234635" ] }, - "execution_count": 48, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } @@ -1975,7 +1975,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 27, "metadata": {}, "outputs": [ { @@ -1984,7 +1984,7 @@ "1452413" ] }, - "execution_count": 49, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } @@ -2009,7 +2009,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 28, "metadata": {}, "outputs": [ { @@ -2018,7 +2018,7 @@ "153233" ] }, - "execution_count": 56, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -2030,7 +2030,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 29, "metadata": {}, "outputs": [ { @@ -2039,7 +2039,7 @@ "2174370" ] }, - "execution_count": 59, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } @@ -2050,7 +2050,7 @@ }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 30, "metadata": {}, "outputs": [ { @@ -2059,7 +2059,7 @@ "1458573" ] }, - "execution_count": 60, + "execution_count": 30, "metadata": {}, "output_type": "execute_result" } @@ -2084,7 +2084,7 @@ }, { "cell_type": "code", - "execution_count": 61, + 
"execution_count": 31, "metadata": {}, "outputs": [ { @@ -2103,7 +2103,7 @@ "Name: title, dtype: int64" ] }, - "execution_count": 61, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -2122,7 +2122,7 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -2131,7 +2131,7 @@ "" ] }, - "execution_count": 62, + "execution_count": 32, "metadata": {}, "output_type": "execute_result" }, @@ -2159,7 +2159,7 @@ }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 33, "metadata": {}, "outputs": [ { @@ -2172,7 +2172,7 @@ "Name: year, dtype: int64" ] }, - "execution_count": 70, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } @@ -2193,27 +2193,33 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 60, "metadata": {}, "outputs": [ { - "ename": "TypeError", - "evalue": "cannot convert the series to ", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", - "Input \u001b[0;32mIn [75]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmath\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m movies[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mdecade\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m math\u001b[38;5;241m.\u001b[39mfloor(\u001b[38;5;28;43mfloat\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mmovies\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43myear\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m/\u001b[39m \u001b[38;5;241m10\u001b[39m)\n\u001b[1;32m 3\u001b[0m movies\u001b[38;5;241m.\u001b[39mhead()\n", - "File 
\u001b[0;32m~/Desktop/mec-mini-projects/mec-5.3.10-data-wranging-with-pandas-mini-project/env/lib/python3.10/site-packages/pandas/core/series.py:191\u001b[0m, in \u001b[0;36m_coerce_method..wrapper\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 190\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m converter(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39miloc[\u001b[38;5;241m0\u001b[39m])\n\u001b[0;32m--> 191\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcannot convert the series to \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mconverter\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", - "\u001b[0;31mTypeError\u001b[0m: cannot convert the series to " - ] + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" } ], "source": [ "import math\n", - "\n", - "movies['decade'] = math.floor(float(movies['year']) / 10)\n", - "movies.head()" + "decade = []\n", + "for year in movies['year']:\n", + " decade.append(math.floor(year/10) * 10)\n", + "movies['decade'] = decade\n", + "decade_df = pd.DataFrame(movies.groupby('decade')['title'].count())\n", + "plt.barh(y=decade_df.index, width=decade_df['title'])\n", + "plt.xlabel('Number of Films')\n", + "plt.ylabel('Decade')\n", + "plt.ylim(1880, 2030)\n", + "plt.title('Number of films released per decade')\n", + "plt.show()" ] }, { @@ -2231,7 +2237,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 35, "metadata": {}, "outputs": [ { @@ -2251,7 +2257,7 @@ "Name: character, dtype: int64" ] }, - "execution_count": 79, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } @@ -2262,7 +2268,7 @@ }, { "cell_type": "code", - "execution_count": 85, + "execution_count": 36, "metadata": {}, "outputs": [ { @@ -2282,7 +2288,7 @@ "Name: name, dtype: int64" ] }, - "execution_count": 85, + "execution_count": 36, "metadata": {}, "output_type": "execute_result" } @@ -2293,7 +2299,7 @@ }, { "cell_type": "code", - "execution_count": 86, + "execution_count": 37, "metadata": {}, "outputs": [ { @@ -2313,7 +2319,7 @@ "Name: name, dtype: int64" ] }, - "execution_count": 86, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -2337,7 +2343,7 @@ }, { "cell_type": "code", - "execution_count": 88, + "execution_count": 38, "metadata": {}, "outputs": [ { @@ -2357,7 +2363,7 @@ "Name: character, dtype: int64" ] }, - "execution_count": 88, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -2368,7 +2374,7 @@ }, { "cell_type": "code", - "execution_count": 89, + "execution_count": 39, "metadata": {}, "outputs": [ { @@ -2388,7 +2394,7 @@ "Name: character, dtype: int64" ] }, - "execution_count": 89, + 
"execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -2406,7 +2412,7 @@ }, { "cell_type": "code", - "execution_count": 92, + "execution_count": 40, "metadata": {}, "outputs": [ { @@ -2445,7 +2451,7 @@ "Name: year, dtype: int64" ] }, - "execution_count": 92, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -2463,7 +2469,7 @@ }, { "cell_type": "code", - "execution_count": 100, + "execution_count": 41, "metadata": {}, "outputs": [ { @@ -2495,10 +2501,28 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 69, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkQAAAHFCAYAAAAT5Oa6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAABEA0lEQVR4nO3de1xVVf7/8feRqyAcRblICjpaeCMsLUEtvAuKl181VjqIpWbz1dRwmrTJUes3idV0mTRz+pbWjDmOee0ymOUtE7wNmJfEezolaIqgWCKyfn/080xHLgKCgPv1fDzO4+Fee+21P+vsczrv9tmbYzPGGAEAAFhYneouAAAAoLoRiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiFCrLViwQDabTdu3by92fVxcnJo1a3Zji/r/rtR29OjRcm+7efNmTZ8+XWfPni3zNm+88YZatmwpd3d32Ww2nT17ViNGjCgy/2bNmmnEiBHlrulmNH36dNlstkof74cffqi0MWuKo0ePymaz6eWXX67Uca/nfXIjrF+/XjabTevXr6/uUlDFCERADbR582bNmDGjzIEoPT1d48ePV/fu3bV27VqlpKTIx8dHU6dO1fLly6u2WAC4CbhWdwEArt+ePXskSaNHj9bdd9/taG/RokV1lQQAtQpniGA5c+bM0b333quAgAB5e3srPDxcL774oi5duuTUr1u3bmrXrp1SUlLUuXNn1a1bV82aNdP8+fMlSZ988onuvPNOeXl5KTw8XMnJyWXa/+eff66ePXvK19dXXl5e6tKli7744gvH+unTp+upp56SJDVv3lw2m63UU/bdunXTb37zG0lSp06dZLPZHF+JFfeV2dWufCXwwQcf6Omnn1bjxo1Vr149DRgwQFlZWTp37pwee+wxNWrUSI0aNdIjjzyi8+fPO42xZMkSderUSXa7XV5eXvrVr36lRx999JrPhc1m07hx4zR//nyFhYWpbt266tixo1JTU2WM0UsvvaTmzZurXr166tGjhw4ePOi0/Zo1azRo0CA1adJEnp6eatmypcaMGVPsV1affPKJ2rdvLw8PDzVv3rzEr36MMXrzzTfVvn171a1bVw0aNNADDzygw4cPX3M+Vxw/flz33XeffH19Zbfb9Zvf/EanTp1yrB85cqT8/Px04cKFItv26NFDbdu2veY+rvU6kqSDBw/qkUce0a233iovLy/dcsstGjBggHbt2lVkvLNnz2rSpEn61a9+JQ8PDwUEBKhfv37at29fkb6vvPKK47hERUUpN
TW1LE+LUlNT1aVLF3l6eio4OFhTpkwp8r67YvHixYqKipK3t7fq1aunvn37Ki0trUi/LVu2aMCAAWrYsKE8PT3VokULTZw4sULPwb59+xQTEyMvLy81atRIjz/+uM6dO1dsfWV5/lHLGKAWmz9/vpFkUlNTzaVLl4o8+vXrZ0JDQ522efLJJ83cuXNNcnKyWbt2rXn11VdNo0aNzCOPPOLULzo62jRs2NCEhYWZd955x6xevdrExcUZSWbGjBkmPDzcLFq0yHz66acmMjLSeHh4mO+++65IbUeOHHG0/e1vfzM2m80MHjzYLFu2zHz00UcmLi7OuLi4mM8//9wYY8zx48fNE088YSSZZcuWmZSUFJOSkmJycnKKfQ727Nljnn32WSPJzJ8/36SkpJiDBw8aY4xJSEgoMv/Q0FCTkJDgWF63bp2RZEJDQ82IESNMcnKyeeutt0y9evVM9+7dTe/evc3vfvc789lnn5lZs2YZFxcX88QTTzi237x5s7HZbOahhx4yn376qVm7dq2ZP3++iY+Pv+bxu7Lfzp07m2XLlpnly5eb2267zfj5+Zknn3zSDBo0yHz88cdm4cKFJjAw0Nx+++2msLDQsf3cuXPNzJkzzapVq8yGDRvMe++9ZyIiIkxYWJjJz8939Pv888+Ni4uL6dq1q1m2bJlZsmSJueuuu0xISIi5+j+Do0ePNm5ubmbSpEkmOTnZfPDBB6ZVq1YmMDDQZGZmljqfadOmOeb01FNPmdWrV5tXXnnFeHt7mzvuuMNR086dO40k8/bbbxc5lpLMnDlzSt1PWV5HxhizYcMGM2nSJPPhhx+aDRs2mOXLl5vBgwebunXrmn379jn65ebmmrZt2xpvb2/z3HPPmdWrV5ulS5eaCRMmmLVr1xpjjDly5IiRZJo1a2ZiYmLMihUrzIoVK0x4eLhp0KCBOXv2bKk179mzx3h5eZk2bdqYRYsWmZUrV5q+ffs6jsEv3yd/+tOfjM1mM48++qj5+OOPzbJly0xUVJTx9vY2e/bscfRLTk42bm5u5vbbbzcLFiwwa9euNe+++6556KGHyv0cZGZmmoCAAHPLLbeY+fPnm08//dQMGzbMUd+6devK/fyjdiEQoVa7EjpKe1wdCH7p8uXL5tKlS+b99983Li4u5syZM4510dHRRpLZvn27o+306dPGxcXF1K1b1yn8pKenG0nmL3/5S5HarvyHPi8vz/j5+ZkBAwYUqSEiIsLcfffdjraXXnqpyIdEWZ6Hbdu2ObWXJxBdXdfEiRONJDN+/Hin9sGDBxs/Pz/H8ssvv2wkXfMDsTiSTFBQkDl//ryjbcWKFUaSad++vVP4ee2114wk8/XXXxc7VmFhobl06ZL59ttvjSSzcuVKx7pOnTqZ4OBg8+OPPzracnNzjZ+fn1MgSklJMZLMn//8Z6exjx8/burWrWt+//vflzqfK4HoySefdGpfuHChkWT+/ve/O9qio6NN+/btnfr99re/Nb6+vubcuXMl7qM8r6OrFRQUmPz8fHPrrbc61fjcc88ZSWbNmjUlbnslEIWHh5uCggJH+9atW40ks2jRohK3NcaYBx980NStW9cpVBYUFJhWrVo5vdaPHTtmXF1dnUK3McacO3fOBAUFmSFDhjjaWrRoYVq0aOF0XK+lpOfg6aefNjabzaSnpzv17927t1Mgup7nHzUbX5nhpvD+++9r27ZtRR5du3Yt0jctLU0DBw5Uw4YN5eLiIjc3Nw0fPlyXL1/W/v37nfo2btxYHTp0cCz7+fkpICBA7du3V3BwsKO9devWkqRvv/22xBo3b96sM2fOKCEhQQUFBY5HYWGhYmJitG3bNuXl5V3vU1FhcXFxTstX5tS/f/8i7WfOnHF8bXbXXXdJkoYMGaJ//vOf+u6778q13+7du8vb27vIfmNjY53uACvuOT558qQef/xxNW3aVK6urnJzc1NoaKgk6ZtvvpEk5eXladu2bbrvvvvk6enp2NbHx
0cDBgxwquXjjz+WzWbTb37zG6djFBQUpIiIiDLfaTRs2DCn5SFDhsjV1VXr1q1ztE2YMEHp6en66quvJEm5ubn629/+poSEBNWrV6/EscvzOiooKNALL7ygNm3ayN3dXa6urnJ3d9eBAwccz48k/etf/9Jtt92mXr16XXNu/fv3l4uLi2P59ttvl1T6a1+S1q1bp549eyowMNDR5uLiogcffNCp3+rVq1VQUKDhw4c7zc/T01PR0dGOY7B//34dOnRII0eOdDquVyvrc7Bu3Tq1bdtWERERTtsPHTrUabmmv49RcVxUjZtC69at1bFjxyLtdrtdx48fdywfO3ZM99xzj8LCwvT666+rWbNm8vT01NatWzV27Fj9+OOPTtv7+fkVGdPd3b1Iu7u7uyTpp59+KrHGrKwsSdIDDzxQYp8zZ844hYMbqaQ5lTbXevXq6d5779WKFSv0l7/8RcOHD9fFixfVtm1b/eEPf9DDDz9cJfuVpMLCQvXp00fff/+9pk6dqvDwcHl7e6uwsFCRkZGOY5mdna3CwkIFBQUV2ffVbVlZWTLGOH1o/9KvfvWra86nuHFdXV3VsGFDnT592tE2aNAgNWvWTHPmzFGXLl20YMEC5eXlaezYsaWOXZ7XUWJioubMmaOnn35a0dHRatCggerUqaNRo0Y5vdZPnTqlkJCQMs2tYcOGTsseHh6SVOS9c7XTp0+X+RhI/w3aV6tTp46jZklq0qRJqfst63Nw+vRpNW/evMz11dT3MSqOQARLWbFihfLy8rRs2TLHmQTp59vWq1qjRo0k/fz3giIjI4vtU9IHcU03aNAgDRo0SBcvXlRqaqpmzpypoUOHqlmzZoqKiqqSfe7evVs7d+7UggULlJCQ4Gi/+sLrBg0ayGazKTMzs8gYV7c1atRINptNX375peOD/peKaytOZmambrnlFsdyQUGBTp8+7RQm6tSpo7Fjx+qZZ57Rn//8Z7355pvq2bOnwsLCSh27PK+jv//97xo+fLheeOEFp/U//PCD6tev71j29/fXf/7znzLNraIaNmxY5mMgSR9++KHTe/Rq/v7+knTNusv6HJS3vpvxfWx1BCJYypWvYH75wWaM0dtvv13l++7SpYvq16+vvXv3aty4caX2Lev/ddc0Hh4eio6OVv369bV69WqlpaVVWSAq7lhK0rx585yWvb29dffdd2vZsmV66aWXHF+vnDt3Th999JFT37i4OCUlJem7777TkCFDKlzbwoULnb5q/ec//6mCggJ169bNqd+oUaM0ffp0DRs2TBkZGZo1a9Y1xy7P68hmsxV5fj755BN99913atmypaMtNjZWf/zjH7V27Vr16NGjDDMsv+7du2vVqlXKyspyBIbLly9r8eLFTv369u0rV1dXHTp0SPfff3+J4912221q0aKF3n33XSUmJpYYVsv6HHTv3l0vvviidu7c6fS12QcffOC0bXmef9QuBCJYSu/eveXu7q6HH35Yv//97/XTTz9p7ty5ys7OrvJ916tXT2+88YYSEhJ05swZPfDAAwoICNCpU6e0c+dOnTp1SnPnzpUkhYeHS5Jef/11JSQkyM3NTWFhYfLx8anyOsvrj3/8o/7zn/+oZ8+eatKkic6ePavXX39dbm5uio6OrrL9tmrVSi1atNDkyZNljJGfn58++ugjrVmzpkjf559/XjExMerdu7cmTZqky5cva9asWfL29taZM2cc/bp06aLHHntMjzzyiLZv3657771X3t7eOnHihDZt2qTw8HD99re/vWZty5Ytk6urq3r37q09e/Zo6tSpioiIKBKy6tevr+HDh2vu3LkKDQ0tck1TccrzOoqLi9OCBQvUqlUr3X777dqxY4deeumlIl8zTZw4UYsXL9agQYM0efJk3X333frxxx+1YcMGxcXFqXv37tes61qeffZZrVq1Sj169NAf//hHeXl5ac6cOUWut2nWrJmee+45/eEPf9Dhw4cVExOjBg0aKCsrS
1u3bpW3t7dmzJgh6ec/oTFgwABFRkbqySefVEhIiI4dO6bVq1dr4cKF5X4O3n33XfXv31//9//+XwUGBmrhwoVF/uxAeZ5/1DLVfFE3cF1Kurvqiv79+xe5y+qjjz4yERERxtPT09xyyy3mqaeeMv/617+K3FobHR1t2rZtW2TM0NBQ079//yLtkszYsWOL1Hb1nWIbNmww/fv3N35+fsbNzc3ccsstpn///mbJkiVO/aZMmWKCg4NNnTp1itRW1uehPHeZXb3/ksa8cifVqVOnjDHGfPzxxyY2Ntbccsstxt3d3QQEBJh+/fqZL7/8ssR6r7j6OTPmv3czvfTSS07txdW5d+9e07t3b+Pj42MaNGhgfv3rX5tjx44ZSWbatGlO269atcrcfvvtxt3d3YSEhJikpCTHXK727rvvmk6dOhlvb29Tt25d06JFCzN8+HCnOw6Lc2W8HTt2mAEDBph69eoZHx8f8/DDD5usrKxit1m/fr2RZJKSkkod+2pleR1lZ2ebkSNHmoCAAOPl5WW6du1qvvzySxMdHW2io6OdxsvOzjYTJkwwISEhxs3NzQQEBJj+/fs7bk0v6bgYY4p9vovz1VdfOf5ERVBQkHnqqafMX//612LfJytWrDDdu3c3vr6+xsPDw4SGhpoHHnigyG3tKSkpJjY21tjtduPh4WFatGjhdPdYeZ6DK68nT09P4+fnZ0aOHGlWrlxZ7PuvrO9j1B42Y4y5sREMAHDFpEmTNHfuXB0/frzIBcsAbhy+MgOAapCamqr9+/frzTff1JgxYwhDQDXjDBEAVAObzSYvLy/169dP8+fPL/VvDwGoepwhAoBqwP+LAjULf6kaAABYHoEIAABYHoEIAABYHtcQlVFhYaG+//57+fj4OP3gJAAAqLmMMTp37pyCg4Mdv4VXHAJRGX3//fdq2rRpdZcBAAAq4Pjx46X+GDCBqIyu/GTC8ePH5evrW83VAACAssjNzVXTpk2v+dNHBKIyuvI1ma+vL4EIAIBa5lqXu3BRNQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDwCEQAAsDzX6i6gtmk3bbXqeHhVdxkAANw0jib1r+4SOEMEAABAIAIAAJbHV2bltHtGX/n6+lZ3GQAAoBJxhggAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFgegQgAAFhetQaimTNn6q677pKPj48CAgI0ePBgZWRkOPUxxmj69OkKDg5W3bp11a1bN+3Zs8epz8WLF/XEE0+oUaNG8vb21sCBA/Wf//zHqU92drbi4+Nlt9tlt9sVHx+vs2fPVvUUAQBALVCtgWjDhg0aO3asUlNTtWbNGhUUFKhPnz7Ky8tz9HnxxRf1yiuvaPbs2dq2bZuCgoLUu3dvnTt3ztFn4sSJWr58uf7xj39o06ZNOn/+vOLi4nT58mVHn6FDhyo9PV3JyclKTk5Wenq64uPjb+h8AQBADWVqkJMnTxpJZsOGDcYYYwoLC01QUJBJSkpy9Pnpp5+M3W43b731ljHGmLNnzxo3Nzfzj3/8w9Hnu+++M3Xq1DHJycnGGGP27t1rJJnU1FRHn5SUFCPJ7Nu3r0y15eTkGEkmJyfnuucJAABujLJ+frtWaxq7Sk5OjiTJz89PknTkyBFlZmaqT58+jj4eHh6Kjo7W5s2bNWbMGO3YsUOXLl1y6hMcHKx27dpp8+bN6tu3r1JSUmS329WpUydHn8jISNntdm3evFlhYWFlrrHdtNWq4+F1vVMFgFrraFL/6
i4BqHQ1JhAZY5SYmKiuXbuqXbt2kqTMzExJUmBgoFPfwMBAffvtt44+7u7uatCgQZE+V7bPzMxUQEBAkX0GBAQ4+lzt4sWLunjxomM5Nze3gjMDAAA1XY25y2zcuHH6+uuvtWjRoiLrbDab07Ixpkjb1a7uU1z/0saZOXOm4wJsu92upk2blmUaAACgFqoRZ4ieeOIJrVq1Shs3blSTJk0c7UFBQZJ+PsPTuHFjR/vJkycdZ42CgoKUn5+v7Oxsp7NEJ0+eVOfOnR19srKyiuz31KlTRc4+XTFlyhQlJiY6lnNzc9W0aVPtntFXvr6+1zFbAABQ01TrGSJjjMaNG6dly5Zp7dq1at68udP65s2bKygoSGvWrHG05efna8OGDY6w06FDB7m5uTn1OXHihHbv3u3oExUVpZycHG3dutXRZ8uWLcrJyXH0uZqHh4d8fX2dHgAA4OZUrWeIxo4dqw8++EArV66Uj4+P43oeu92uunXrymazaeLEiXrhhRd066236tZbb9ULL7wgLy8vDR061NF35MiRmjRpkho2bCg/Pz/97ne/U3h4uHr16iVJat26tWJiYjR69GjNmzdPkvTYY48pLi6uXBdUAwCAm1O1BqK5c+dKkrp16+bUPn/+fI0YMUKS9Pvf/14//vij/ud//kfZ2dnq1KmTPvvsM/n4+Dj6v/rqq3J1ddWQIUP0448/qmfPnlqwYIFcXFwcfRYuXKjx48c77kYbOHCgZs+eXbUTBAAAtYLNGGOqu4jaIDc3V3a7XTk5OXx9BgBALVHWz+8ac5cZAABAdSEQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAy3Ot7gJqm3bTVquOh1d1l4GbxNGk/tVdAgBAnCECAADgDFF57Z7RV76+vtVdBgAAqEScIQIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZHIAIAAJZXrYFo48aNGjBggIKDg2Wz2bRixQqn9VlZWRoxYoSCg4Pl5eWlmJgYHThwwKlPZmam4uPjFRQUJG9vb91555368MMPnfpkZ2crPj5edrtddrtd8fHxOnv2bBXPDgAA1BbVGojy8vIUERGh2bNnF1lnjNHgwYN1+PBhrVy5UmlpaQoNDVWvXr2Ul5fn6BcfH6+MjAytWrVKu3bt0n333acHH3xQaWlpjj5Dhw5Venq6kpOTlZycrPT0dMXHx9+QOQIAgFrA1BCSzPLlyx3LGRkZRpLZvXu3o62goMD4+fmZt99+29Hm7e1t3n//faex/Pz8zP/+7/8aY4zZu3evkWRSU1Md61NSUowks2/fvjLXl5OTYySZnJyc8k4NAABUk7J+frtWZxgrzcWLFyVJnp6ejjYXFxe5u7tr06ZNGjVqlCSpa9euWrx4sfr376/69evrn//8py5evKhu3bpJklJSUmS329WpUyfHOJGRkbLb7dq8ebPCwsJK3P+VGiQpNzdXktRu2mrV8fCq1LnebI4m9a/uEgAAKJcae1F1q1atFBoaqilTpig7O1v5+flKSkpSZmamTpw44ei3ePFiFRQUqGHDhvLw8NCYMWO0fPlytWjRQtLP1xgFBAQUGT8gIECZmZkl7n/mzJmOa47sdruaNm1a+ZMEAAA1Qo0NRG5ublq6dKn2798vPz8/eXl5af369YqNjZWLi
4uj37PPPqvs7Gx9/vnn2r59uxITE/XrX/9au3btcvSx2WxFxjfGFNt+xZQpU5STk+N4HD9+vHInCAAAaowa+5WZJHXo0EHp6enKyclRfn6+/P391alTJ3Xs2FGSdOjQIc2ePVu7d+9W27ZtJUkRERH68ssvNWfOHL311lsKCgpSVlZWkbFPnTqlwMDAEvft4eEhDw+PIu27Z/SVr69vJc0QAADUBDX2DNEv2e12+fv768CBA9q+fbsGDRokSbpw4YIkqU4d52m4uLiosLBQkhQVFaWcnBxt3brVsX7Lli3KyclR586db9AMAABATVatZ4jOnz+vgwcPOpaPHDmi9PR0+fn5KSQkREuWLJG/v79CQkK0a9cuTZgwQYMHD1afPn0k/XydUcuWLTVmzBi9/PLLatiwoVasWKE1a9bo448/liS1bt1aMTExGj16tObNmydJeuyxxxQXF1fiBdUAAMBaqjUQbd++Xd27d3csJyYmSpISEhK0YMECnThxQomJicrKylLjxo01fPhwTZ061dHfzc1Nn376qSZPnqwBAwbo/Pnzatmypd577z3169fP0W/hwoUaP368I0gNHDiw2L99BAAArMlmjDHVXURtkJubK7vdrpycHK4hAgCglijr53etuIYIAACgKhGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5RGIAACA5blWdwG1Tbtpq1XHw6u6y7C8o0n9q7sEAMBNhDNEAADA8ghEAADA8vjKrJx2z+grX1/f6i4DAABUIs4QAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAy7uuQJSfn6+MjAwVFBRUVj0AAAA3XIUC0YULFzRy5Eh5eXmpbdu2OnbsmCRp/PjxSkpKqtQCAQAAqlqFAtGUKVO0c+dOrV+/Xp6eno72Xr16afHixZVWHAAAwI1Qod8yW7FihRYvXqzIyEjZbDZHe5s2bXTo0KFKKw4AAOBGqNAZolOnTikgIKBIe15enlNAAgAAqA0qFIjuuusuffLJJ47lKyHo7bffVlRUVOVUBgAAcINU6CuzmTNnKiYmRnv37lVBQYFef/117dmzRykpKdqwYUNl1wgAAFClKnSGqHPnzvrqq6904cIFtWjRQp999pkCAwOVkpKiDh06VHaNAAAAVcpmjDHVXURtkJubK7vdrpycHPn6+lZ3OQAAoAzK+vld5q/McnNzy7xzAgMAAKhNyhyI6tevX+Y7yC5fvlzhggAAAG60MgeidevWOf599OhRTZ48WSNGjHDcVZaSkqL33ntPM2fOrPwqAQAAqlCFriHq2bOnRo0apYcfftip/YMPPtBf//pXrV+/vrLqqzG4hggAgNqnrJ/fFbrLLCUlRR07dizS3rFjR23durUiQwIAAFSbCgWipk2b6q233irSPm/ePDVt2vS6iwIAALiRKvSHGV999VXdf//9Wr16tSIjIyVJqampOnTokJYuXVqpBQIAAFS1Cp0h6tevnw4cOKCBAwfqzJkzOn36tAYNGqT9+/erX79+lV0jAABAleIPM5YRF1UDAFD7VPofZizOhQsXdOzYMeXn5zu133777dczLAAAwA1Voa/MTp06pbi4OPn4+Kht27a64447nB5ltXHjRg0YMEDBwcGy2WxasWKF0/qsrCyNGDFCwcHB8vLyUkxMjA4cOFBknJSUFPXo0UPe3t6qX7++unXrph9//NGxPjs7W/Hx8bLb7bLb7YqPj9fZs2crMnUAAHATqlAgmjhxorKzs5Wamqq6desqOTlZ7733nm699VatWrWqzOPk5eUpIiJCs2fPLrLOGKPBgwfr8OHDWrlyp
dLS0hQaGqpevXopLy/P0S8lJUUxMTHq06ePtm7dqm3btmncuHGqU+e/Uxs6dKjS09OVnJys5ORkpaenKz4+viJTBwAANyNTAUFBQWbLli3GGGN8fHxMRkaGMcaYlStXmi5dulRkSCPJLF++3LGckZFhJJndu3c72goKCoyfn595++23HW2dOnUyzz77bInj7t2710gyqampjraUlBQjyezbt6/M9eXk5BhJJicnp8zbAACA6lXWz+8KXUOUl5engIAASZKfn59OnTql2267TeHh4fr3v/9dKUHt4sWLkiRPT09Hm4uLi9zd3bVp0yaNGjVKJ0+e1JYtWzRs2DB17txZhw4dUqtWrfSnP/1JXbt2lfTzGSS73a5OnTo5xomMjJTdbtfmzZsVFhZW4v6v1CD998dt201brToeXpUyx5vV0aT+1V0CAADlUqGvzMLCwpSRkSFJat++vebNm6fvvvtOb731lho3blwphbVq1UqhoaGaMmWKsrOzlZ+fr6SkJGVmZurEiROSpMOHD0uSpk+frtGjRys5OVl33nmnevbs6bjWKDMz0xHefikgIECZmZkl7n/mzJmOa47sdjt/cBIAgJtYhc4QTZw40RFKpk2bpr59+2rhwoVyd3fXggULKqUwNzc3LV26VCNHjpSfn59cXFzUq1cvxcbGOvoUFhZKksaMGaNHHnlEknTHHXfoiy++0Lvvvuv4oVmbzVZkfGNMse1XTJkyRYmJiY7l3NxcNW3aVLtn9OW2ewAAbjIVCkTDhg1z/PuOO+7Q0aNHtW/fPoWEhKhRo0aVVlyHDh2Unp6unJwc5efny9/fX506dXL8jtqVs1Ft2rRx2q5169Y6duyYJCkoKEhZWVlFxj516pQCAwNL3LeHh4c8PDwqayoAAKAGq9BXZlfz8vLSnXfeWalh6Jfsdrv8/f114MABbd++XYMGDZIkNWvWTMHBwY6v767Yv3+/QkNDJUlRUVHKyclx+tHZLVu2KCcnR507d66SegEAQO1SoTNEDzzwgDp27KjJkyc7tb/00kvaunWrlixZUqZxzp8/r4MHDzqWjxw5ovT0dPn5+SkkJERLliyRv7+/QkJCtGvXLk2YMEGDBw9Wnz59JP38VdhTTz2ladOmKSIiQu3bt9d7772nffv26cMPP5T089mimJgYjR49WvPmzZMkPfbYY4qLiyvxgmoAAGAxFbmFrVGjRubrr78u0v7111+bgICAMo+zbt06I6nIIyEhwRhjzOuvv26aNGli3NzcTEhIiHn22WfNxYsXi4wzc+ZM06RJE+Pl5WWioqLMl19+6bT+9OnTZtiwYcbHx8f4+PiYYcOGmezs7HLNmdvuAQCofcr6+V2h3zKrW7eu0tPTi5xh2bdvn+644w6nvxJ9s+C3zAAAqH3K+vldoWuI2rVrp8WLFxdp/8c//lHkAmcAAICarkLXEE2dOlX333+/Dh06pB49ekiSvvjiCy1atKjM1w8BAADUFBUKRAMHDtSKFSv0wgsv6MMPP1TdunV1++236/PPP1d0dHRl1wgAAFClKnQNkRVxDREAALVPlV5DJElnz57V//7v/+qZZ57RmTNnJEn//ve/9d1331V0SAAAgGpRoa/Mvv76a/Xq1Ut2u11Hjx7VqFGj5Ofnp+XLl+vbb7/V+++/X9l1AgAAVJkKnSFKTEzUiBEjdODAAadfo4+NjdXGjRsrrTgAAIAboUKBaNu2bRozZkyR9ltuuaXUX5AHAACoiSoUiDw9PZWbm1ukPSMjQ/7+/tddFAAAwI1UoUA0aNAgPffcc7p06ZKkn39T7NixY5o8ebLuv//+Si0QAACgqlUoEL388ss6deqUAgIC9OOPPyo6OlotW7aUj4+P/vSnP1V2jQAAAFWqQneZ+fr6atOmTVq3bp127NihwsJC3XnnnerVq1dl1wcAAFDlyh2ICgsLtWDBAi1btkxHjx6VzWZT8+bNFRQUJGOMbDZbVdQJAABQZcr1lZkxRgMHD
tSoUaP03XffKTw8XG3bttW3336rESNG6P/8n/9TVXUCAABUmXKdIVqwYIE2btyoL774Qt27d3dat3btWg0ePFjvv/++hg8fXqlFAgAAVKVynSFatGiRnnnmmSJhSJJ69OihyZMna+HChZVWHAAAwI1QrkD09ddfKyYmpsT1sbGx2rlz53UXBQAAcCOVKxCdOXNGgYGBJa4PDAxUdnb2dRcFAABwI5UrEF2+fFmuriVfduTi4qKCgoLrLgoAAOBGKtdF1cYYjRgxQh4eHsWuv3jxYqUUBQAAcCOVKxAlJCRcsw93mAEAgNqmXIFo/vz5VVUHAABAtanQb5kBAADcTAhEAADA8ghEAADA8ir0a/dW1m7aatXx8Kqy8Y8m9a+ysQEAQPE4QwQAACyPQAQAACyPr8zKafeMvvL19a3uMgAAQCXiDBEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALA8AhEAALC8ag1EGzdu1IABAxQcHCybzaYVK1Y4rc/KytKIESMUHBwsLy8vxcTE6MCBA8WOZYxRbGxsseNkZ2crPj5edrtddrtd8fHxOnv2bNVMCgAA1DrVGojy8vIUERGh2bNnF1lnjNHgwYN1+PBhrVy5UmlpaQoNDVWvXr2Ul5dXpP9rr70mm81W7H6GDh2q9PR0JScnKzk5Wenp6YqPj6/0+QAAgNrJtTp3Hhsbq9jY2GLXHThwQKmpqdq9e7fatm0rSXrzzTcVEBCgRYsWadSoUY6+O3fu1CuvvKJt27apcePGTuN88803Sk5OVmpqqjp16iRJevvttxUVFaWMjAyFhYVV0ewAAEBtUa2BqDQXL16UJHl6ejraXFxc5O7urk2bNjkC0YULF/Twww9r9uzZCgoKKjJOSkqK7Ha7IwxJUmRkpOx2uzZv3lzuQNRu2mrV8fCqyJTK5GhS/yobGwAAFK/GXlTdqlUrhYaGasqUKcrOzlZ+fr6SkpKUmZmpEydOOPo9+eST6ty5swYNGlTsOJmZmQoICCjSHhAQoMzMzBL3f/HiReXm5jo9AADAzanGBiI3NzctXbpU+/fvl5+fn7y8vLR+/XrFxsbKxcVFkrRq1SqtXbtWr732WqljFXdtkTGmxGuOJGnmzJmOi7DtdruaNm16XfMBAAA1V439ykySOnTooPT0dOXk5Cg/P1/+/v7q1KmTOnbsKElau3atDh06pPr16zttd//99+uee+7R+vXrFRQUpKysrCJjnzp1SoGBgSXue8qUKUpMTHQs5+bmqmnTpto9o698fX0rZ4IAAKBGqNGB6Aq73S7p5wutt2/frueff16SNHnyZKeLqyUpPDxcr776qgYMGCBJioqKUk5OjrZu3aq7775bkrRlyxbl5OSoc+fOJe7Tw8NDHh4eVTEdAABQw1RrIDp//rwOHjzoWD5y5IjS09Pl5+enkJAQLVmyRP7+/goJCdGuXbs0YcIEDR48WH369JEkBQUFFXshdUhIiJo3by5Jat26tWJiYjR69GjNmzdPkvTYY48pLi6OO8wAAICkag5E27dvV/fu3R3LV76iSkhI0IIFC3TixAklJiYqKytLjRs31vDhwzV16tRy72fhwoUaP368I0gNHDiw2L99BAAArMlmjDHVXURtkJubK7vdrpycHK4hAgCglijr53eNvcsMAADgRiEQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAy
yMQAQAAyyMQAQAAyyMQAQAAyyMQAQAAy3Ot7gJqm3bTVquOh1eVjX80qX+VjQ0AAIrHGSIAAGB5nCEqp90z+srX17e6ywAAAJWIM0QAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyCEQAAMDyqjUQbdy4UQMGDFBwcLBsNptWrFjhtD4rK0sjRoxQcHCwvLy8FBMTowMHDjjWnzlzRk888YTCwsLk5eWlkJAQjR8/Xjk5OU7jZGdnKz4+Xna7XXa7XfHx8Tp79uwNmCEAAKgNqjUQ5eXlKSIiQrNnzy6yzhijwYMH6/Dhw1q5cqXS0tIUGhqqXr16KS8vT5L0/fff6/vvv9fLL7+sXbt2acGCBUpOTtbIkSOdxho6dKjS09OVnJys5ORkpaenKz4+/obMEQAA1Hw2Y4yp7iIkyWazafny5Ro8eLAkaf/+/QoLC9Pu3bvVtm1bSdLly5cVEBCgWbNmadSoUcWOs2TJEv3mN79RXl6eXF1d9c0336hNmzZKTU1Vp06dJEmpqamKiorSvn37FBYWVqb6cnNzZbfblZOTI19f3+ufMAAAqHJl/fx2vYE1lcvFixclSZ6eno42FxcXubu7a9OmTSUGoisTdnX9eWopKSmy2+2OMCRJkZGRstvt2rx5c4mB6OLFi44apJ+fUElqN2216nh4Xd/kqtnRpP7VXQIAADVKjb2oulWrVgoNDdWUKVOUnZ2t/Px8JSUlKTMzUydOnCh2m9OnT+v555/XmDFjHG2ZmZkKCAgo0jcgIECZmZkl7n/mzJmOa47sdruaNm16/ZMCAAA1Uo0NRG5ublq6dKn2798vPz8/eXl5af369YqNjZWLi0uR/rm5uerfv7/atGmjadOmOa2z2WxF+htjim2/YsqUKcrJyXE8jh8/fv2TAgAANVKN/cpMkjp06KD09HTl5OQoPz9f/v7+6tSpkzp27OjU79y5c4qJiVG9evW0fPlyubm5OdYFBQUpKyuryNinTp1SYGBgifv28PCQh4dHkfbdM/pyDREAADeZGnuG6Jfsdrv8/f114MABbd++XYMGDXKsy83NVZ8+feTu7q5Vq1Y5XXMkSVFRUcrJydHWrVsdbVu2bFFOTo46d+58w+YAAABqrmo9Q3T+/HkdPHjQsXzkyBGlp6fLz89PISEhWrJkifz9/RUSEqJdu3ZpwoQJGjx4sPr06SPp5zNDffr00YULF/T3v/9dubm5jouf/f395eLiotatWysmJkajR4/WvHnzJEmPPfaY4uLiynyHGQAAuLlVayDavn27unfv7lhOTEyUJCUkJGjBggU6ceKEEhMTlZWVpcaNG2v48OGaOnWqo/+OHTu0ZcsWSVLLli2dxj5y5IiaNWsmSVq4cKHGjx/vCFIDBw4s9m8fAQAAa6oxf4eopuPvEAEAUPuU9fO7VlxDBAAAUJUIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPIIRAAAwPJcq7uA2qbdtNWq4+FV3WVY3tGk/tVdAgDgJsIZIgAAYHkEIgAAYHl8ZVZOu2f0la+vb3WXAQAAKhFniAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOURiAAAgOW5VncBt
YUxRpKUm5tbzZUAAICyuvK5feVzvCQEojI6ffq0JKlp06bVXAkAACivc+fOyW63l7ieQFRGfn5+kqRjx46V+oTebHJzc9W0aVMdP35cvr6+1V3ODcO8mbdVWHXuzNs68zbG6Ny5cwoODi61H4GojOrU+flyK7vdbpkX0S/5+voybwth3tZj1bkzb2soy4kMLqoGAACWRyACAACWRyAqIw8PD02bNk0eHh7VXcoNxbyZtxVYdd6SdefOvK0177KwmWvdhwYAAHCT4wwRAACwPAIRAACwPAIRAACwPAIRAACwPALRL7z55ptq3ry5PD091aFDB3355Zel9t+wYYM6dOggT09P/epXv9Jbb711gyqtHDNnztRdd90lHx8fBQQEaPDgwcrIyCh1m/Xr18tmsxV57Nu37wZVff2mT59epP6goKBSt6ntx1qSmjVrVuyxGzt2bLH9a+ux3rhxowYMGKDg4GDZbDatWLHCab0xRtOnT1dwcLDq1q2rbt26ac+ePdccd+nSpWrTpo08PDzUpk0bLV++vIpmUHGlzf3SpUt6+umnFR4eLm9vbwUHB2v48OH6/vvvSx1zwYIFxb4OfvrppyqeTdld65iPGDGiSP2RkZHXHLemH/Nrzbu442az2fTSSy+VOGZtON5VhUD0/y1evFgTJ07UH/7wB6Wlpemee+5RbGysjh07Vmz/I0eOqF+/frrnnnuUlpamZ555RuPHj9fSpUtvcOUVt2HDBo0dO1apqalas2aNCgoK1KdPH+Xl5V1z24yMDJ04ccLxuPXWW29AxZWnbdu2TvXv2rWrxL43w7GWpG3btjnNec2aNZKkX//616VuV9uOdV5eniIiIjR79uxi17/44ot65ZVXNHv2bG3btk1BQUHq3bu3zp07V+KYKSkpevDBBxUfH6+dO3cqPj5eQ4YM0ZYtW6pqGhVS2twvXLigf//735o6dar+/e9/a9myZdq/f78GDhx4zXF9fX2dXgMnTpyQp6dnVUyhQq51zCUpJibGqf5PP/201DFrwzG/1ryvPmbvvvuubDab7r///lLHrenHu8oYGGOMufvuu83jjz/u1NaqVSszefLkYvv//ve/N61atXJqGzNmjImMjKyyGqvayZMnjSSzYcOGEvusW7fOSDLZ2dk3rrBKNm3aNBMREVHm/jfjsTbGmAkTJpgWLVqYwsLCYtffDMdaklm+fLljubCw0AQFBZmkpCRH208//WTsdrt56623ShxnyJAhJiYmxqmtb9++5qGHHqr0mivL1XMvztatW40k8+2335bYZ/78+cZut1ducVWouHknJCSYQYMGlWuc2nbMy3K8Bw0aZHr06FFqn9p2vCsTZ4gk5efna8eOHerTp49Te58+fbR58+Zit0lJSSnSv2/fvtq+fbsuXbpUZbVWpZycHEn//SHb0txxxx1q3LixevbsqXXr1lV1aZXuwIEDCg4OVvPmzfXQQw/p8OHDJfa9GY91fn6+/v73v+vRRx+VzWYrtW9tP9a/dOTIEWVmZjodTw8PD0VHR5f4XpdKfg2Utk1tkJOTI5vNpvr165fa7/z58woNDVWTJk0UFxentLS0G1NgJVq/fr0CAgJ02223afTo0Tp58mSp/W+2Y56VlaVPPvlEI0eOvGbfm+F4VwSBSNIPP/ygy5cvKzAw0Kk9MDBQmZmZxW6TmZlZbP+CggL98MMPVVZrVTHGKDExUV27dlW7du1K7Ne4cWP99a9/1dKlS7Vs2TKFhYWpZ8+e2rhx4w2s9vp06tRJ77//vlavXq23335bmZmZ6ty5s06fPl1s/5vtWEvSihUrdPbsWY0YMaLEPjfDsb7alfdzed7rV7Yr7zY13U8//aTJkydr6NChpf7IZ6tWrbRgwQKtWrVKixYtkqenp7p06aIDBw7cwGqvT2xsrBYuXKi1a9fqz3/+s7Zt26YePXro4sWLJW5zsx3z9957Tz4+PrrvvvtK7XczHO+K4tfuf+Hq/1M2xpT6f
8/F9S+uvTYYN26cvv76a23atKnUfmFhYQoLC3MsR0VF6fjx43r55Zd17733VnWZlSI2Ntbx7/DwcEVFRalFixZ67733lJiYWOw2N9OxlqR33nlHsbGxCg4OLrHPzXCsS1Le93pFt6mpLl26pIceekiFhYV68803S+0bGRnpdAFyly5ddOedd+qNN97QX/7yl6outVI8+OCDjn+3a9dOHTt2VGhoqD755JNSA8LNdMzfffddDRs27JrXAt0Mx7uiOEMkqVGjRnJxcSmS/E+ePFnk/xCuCAoKKra/q6urGjZsWGW1VoUnnnhCq1at0rp169SkSZNybx8ZGVmr/+/B29tb4eHhJc7hZjrWkvTtt9/q888/16hRo8q9bW0/1lfuJizPe/3KduXdpqa6dOmShgwZoiNHjmjNmjWlnh0qTp06dXTXXXfV6tdB48aNFRoaWuocbqZj/uWXXyojI6NC7/mb4XiXFYFIkru7uzp06OC46+aKNWvWqHPnzsVuExUVVaT/Z599po4dO8rNza3Kaq1MxhiNGzdOy5Yt09q1a9W8efMKjZOWlqbGjRtXcnU3zsWLF/XNN9+UOIeb4Vj/0vz58xUQEKD+/fuXe9vafqybN2+uoKAgp+OZn5+vDRs2lPhel0p+DZS2TU10JQwdOHBAn3/+eYUCvTFG6enptfp1cPr0aR0/frzUOdwsx1z6+Yxwhw4dFBERUe5tb4bjXWbVdTV3TfOPf/zDuLm5mXfeecfs3bvXTJw40Xh7e5ujR48aY4yZPHmyiY+Pd/Q/fPiw8fLyMk8++aTZu3eveeedd4ybm5v58MMPq2sK5fbb3/7W2O12s379enPixAnH48KFC44+V8/71VdfNcuXLzf79+83u3fvNpMnTzaSzNKlS6tjChUyadIks379enP48GGTmppq4uLijI+Pz019rK+4fPmyCQkJMU8//XSRdTfLsT537pxJS0szaWlpRpJ55ZVXTFpamuNOqqSkJGO3282yZcvMrl27zMMPP2waN25scnNzHWPEx8c73WH61VdfGRcXF5OUlGS++eYbk5SUZFxdXU1qauoNn19pSpv7pUuXzMCBA02TJk1Menq603v+4sWLjjGunvv06dNNcnKyOXTokElLSzOPPPKIcXV1NVu2bKmOKRartHmfO3fOTJo0yWzevNkcOXLErFu3zkRFRZlbbrml1h/za73WjTEmJyfHeHl5mblz5xY7Rm083lWFQPQLc+bMMaGhocbd3d3ceeedTrefJyQkmOjoaKf+69evN3fccYdxd3c3zZo1K/EFV1NJKvYxf/58R5+r5z1r1izTokUL4+npaRo0aGC6du1qPvnkkxtf/HV48MEHTePGjY2bm5sJDg429913n9mzZ49j/c14rK9YvXq1kWQyMjKKrLtZjvWVPxdw9SMhIcEY8/Ot99OmTTNBQUHGw8PD3HvvvWbXrl1OY0RHRzv6X7FkyRITFhZm3NzcTKtWrWpkMCxt7keOHCnxPb9u3TrHGFfPfeLEiSYkJMS4u7sbf39/06dPH7N58+YbP7lSlDbvCxcumD59+hh/f3/j5uZmQkJCTEJCgjl27JjTGLXxmF/rtW6MMfPmzTN169Y1Z8+eLXaM2ni8q4rNmP9/dSgAAIBFcQ0RAACwPAIRAACwPAIRAACwPAIRAACwPAIRAACwPAIRAACwPAIRAACwPAIRgFrh6NGjstlsSk9Pr+5SHPbt26fIyEh5enqqffv25d6+W7dumjhxomO5WbNmeu211yqtPgBlRyACUCYjRoyQzWZTUlKSU/uKFStq7S+AX69p06bJ29tbGRkZ+uKLL4rtc+V5u/px8OBBLVu2TM8///wNrhpAcQhEAMrM09NTs2bNUnZ2dnWXUmny8/MrvO2hQ4fUtWtXhYaGlvpDqTExMTpx4oTTo3nz5vLz85OPj0+F9w+g8hCIAJRZr169FBQUpJkzZ5bYZ/r06UW+PnrttdfUrFkzx/KIESM0ePBgv
fDCCwoMDFT9+vU1Y8YMFRQU6KmnnpKfn5+aNGmid999t8j4+/btU+fOneXp6am2bdtq/fr1Tuv37t2rfv36qV69egoMDFR8fLx++OEHx/pu3bpp3LhxSkxMVKNGjdS7d+9i51FYWKjnnntOTZo0kYeHh9q3b6/k5GTHepvNph07dui5556TzWbT9OnTS3xOPDw8FBQU5PRwcXEp8pXZ1Ww2m+bNm6e4uDh5eXmpdevWSklJ0cGDB9WtWzd5e3srKipKhw4dcmyzc+dOde/eXT4+PvL19VWHDh20ffv2EvcB4GcEIgBl5uLiohdeeEFvvPGG/vOf/1zXWGvXrtX333+vjRs36pVXXtH06dMVFxenBg0aaMuWLXr88cf1+OOP6/jx407bPfXUU5o0aZLS0tLUuXNnDRw4UKdPn5YknThxQtHR0Wrfvr22b9+u5ORkZWVlaciQIU5jvPfee3J1ddVXX32lefPmFVvf66+/rj//+c96+eWX9fXXX6tv374aOHCgDhw44NhX27ZtNWnSJJ04cUK/+93vruv5KMnzzz+v4cOHKz09Xa1atdLQoUM1ZswYTZkyxRF0xo0b5+g/bNgwNWnSRNu2bdOOHTs0efJkubm5VUltwE2lun9dFkDtkJCQYAYNGmSMMSYyMtI8+uijxhhjli9fbn75n5Jp06aZiIgIp21fffVVExoa6jRWaGiouXz5sqMtLCzM3HPPPY7lgoIC4+3tbRYtWmSMMY5fa09KSnL0uXTpkmnSpImZNWuWMcaYqVOnmj59+jjt+/jx40aSycjIMMb8/Ove7du3v+Z8g4ODzZ/+9Centrvuusv8z//8j2M5IiLCTJs2rdRxEhISjIuLi/H29nY8HnjgAUctEyZMcPQNDQ01r776qmNZknn22WcdyykpKUaSeeeddxxtixYtMp6eno5lHx8fs2DBgmvOD4Az12pNYwBqpVmzZqlHjx6aNGlShcdo27at6tT570nqwMBAtWvXzrHs4uKihg0b6uTJk07bRUVFOf7t6uqqjh076ptvvpEk7dixQ+vWrVO9evWK7O/QoUO67bbbJEkdO3Ystbbc3Fx9//336tKli1N7ly5dtHPnzjLO8L+6d++uuXPnOpa9vb3LvO3tt9/u+HdgYKAkKTw83Kntp59+Um5urnx9fZWYmKhRo0bpb3/7m3r16qVf//rXatGiRblrBqyGr8wAlNu9996rvn376plnnimyrk6dOjLGOLVdunSpSL+rv8ax2WzFthUWFl6znit3uRUWFmrAgAFKT093ehw4cED33nuvo39ZA8nVd88ZYyp0R523t7datmzpeDRu3LjM2/7yObmy7+LarjxP06dP1549e9S/f3+tXbtWbdq00fLly8tdM2A1BCIAFZKUlKSPPvpImzdvdmr39/dXZmamUyiqzL8dlJqa6vh3QUGBduzYoVatWkmS7rzzTu3Zs0fNmjVzCiAtW7Ys11kZX19fBQcHa9OmTU7tmzdvVuvWrStnIlXotttu05NPPqnPPvtM9913n+bPn1/dJQE1HoEIQIWEh4dr2LBheuONN5zau3XrplOnTunFF1/UoUOHNGfOHP3rX/+qtP3OmTNHy5cv1759+zR27FhlZ2fr0UcflSSNHTtWZ86c0cMPP6ytW7fq8OHD+uyzz/Too4/q8uXL5drPU089pVmzZmnx4sXKyMjQ5MmTlZ6ergkTJlTaXCrbjz/+qHHjxmn9+vX69ttv9dVXX2nbtm21IsQB1Y1ABKDCnn/++SJfj7Vu3Vpvvvmm5syZo4iICG3durVS78BKSkrSrFmzFBERoS+//FIrV65Uo0aNJEnBwcH66quvdPnyZfXt21ft2rXThAkTZLfbna5XKovx48dr0qRJmjRpksLDw5WcnKxVq1bp1ltvrbS5VDYXFxedPn1aw4cP12233aYhQ4YoNjZWM2bMqO7SgBrPZq7+rxkAAIDFcIYIAABYHoEIAABYHoEIAABYHoEIAABYH
oEIAABYHoEIAABYHoEIAABYHoEIAABYHoEIAABYHoEIAABYHoEIAABYHoEIAABY3v8De1Xm74w3oVkAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "hamlet_df = pd.DataFrame(movies[movies['title'].str.contains('Hamlet', case=False)].groupby('decade')['title'].count())\n", + "plt.barh(y=hamlet_df.index, width=hamlet_df['title'])\n", + "plt.title('Hamlet films made by each decade')\n", + "plt.xlabel('Number of Films')\n", + "plt.ylabel('Decade')\n", + "plt.show()" + ] }, { "cell_type": "markdown", @@ -2515,17 +2539,43 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 71, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "11823" + ] + }, + "execution_count": 71, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(cast[(cast['year'] >= 1960) & (cast['year'] <= 1969) & (cast['n'] == 1.0) & (cast['type'].isin(['actor', 'actress']))])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 72, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "26344" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(cast[(cast['year'] >= 2000) & (cast['year'] <= 2009) & (cast['n'] == 1.0) & (cast['type'].isin(['actor', 'actress']))])" + ] }, { "cell_type": "markdown", @@ -2536,7 +2586,7 @@ }, { "cell_type": "code", - "execution_count": 120, + "execution_count": 82, "metadata": {}, "outputs": [ { @@ -2560,156 +2610,94 @@ " \n", " \n", " \n", + " \n", + " character\n", + " \n", + " \n", " title\n", " year\n", - " name\n", - " type\n", - " character\n", - " n\n", + " \n", " \n", " \n", " \n", " \n", - " 1723645\n", - " The Muppet Movie\n", - " 1979\n", - " Frank Oz\n", - " actor\n", - " Miss Piggy\n", - " 2.0\n", + " The Muppet Movie\n", + " 1979\n", + " 8\n", " \n", " \n", - " 1723652\n", - " The Muppet Movie\n", - " 1979\n", - " Frank Oz\n", - " actor\n", - " Motorcycle Guy\n", - " 2.0\n", + " An American 
Werewolf in London\n", + " 1981\n", + " 2\n", " \n", " \n", - " 1723651\n", - " The Muppet Movie\n", - " 1979\n", - " Frank Oz\n", - " actor\n", - " Swedish Chef (assistant)\n", - " 2.0\n", + " The Great Muppet Caper\n", + " 1981\n", + " 6\n", " \n", " \n", - " 1723650\n", - " The Muppet Movie\n", - " 1979\n", - " Frank Oz\n", - " actor\n", - " Marvin Suggs\n", - " 2.0\n", + " The Dark Crystal\n", + " 1982\n", + " 2\n", " \n", " \n", - " 1723649\n", - " The Muppet Movie\n", - " 1979\n", - " Frank Oz\n", - " actor\n", - " Doc Hopper's Men\n", - " 2.0\n", + " The Muppets Take Manhattan\n", + " 1984\n", + " 7\n", " \n", " \n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", + " Follow That Bird\n", + " 1985\n", + " 3\n", " \n", " \n", - " 1723661\n", - " Zathura: A Space Adventure\n", - " 2005\n", - " Frank Oz\n", - " actor\n", - " Robot\n", - " 6.0\n", + " The Muppet Christmas Carol\n", + " 1992\n", + " 7\n", " \n", " \n", - " 1723616\n", - " Sesame Street: C is for Cookie Monster\n", - " 2010\n", - " Frank Oz\n", - " actor\n", - " Cookie Monster\n", - " NaN\n", + " Muppet Treasure Island\n", + " 1996\n", + " 4\n", " \n", " \n", - " 1723605\n", - " Inside Out\n", - " 2015\n", - " Frank Oz\n", - " actor\n", - " Subconscious Guard Dave\n", - " 14.0\n", + " Muppets from Space\n", + " 1999\n", + " 4\n", " \n", " \n", - " 1723631\n", - " The Great Gilly Hopkins\n", - " 2015\n", - " Frank Oz\n", - " actor\n", - " Cookie Monster\n", - " 20.0\n", - " \n", - " \n", - " 1723623\n", - " Star Wars: Episode VII - The Force Awakens\n", - " 2015\n", - " Frank Oz\n", - " actor\n", - " Yoda\n", - " NaN\n", + " The Adventures of Elmo in Grouchland\n", + " 1999\n", + " 3\n", " \n", " \n", "\n", - "

64 rows × 6 columns

\n", "" ], "text/plain": [ - " title year name type \\\n", - "1723645 The Muppet Movie 1979 Frank Oz actor \n", - "1723652 The Muppet Movie 1979 Frank Oz actor \n", - "1723651 The Muppet Movie 1979 Frank Oz actor \n", - "1723650 The Muppet Movie 1979 Frank Oz actor \n", - "1723649 The Muppet Movie 1979 Frank Oz actor \n", - "... ... ... ... ... \n", - "1723661 Zathura: A Space Adventure 2005 Frank Oz actor \n", - "1723616 Sesame Street: C is for Cookie Monster 2010 Frank Oz actor \n", - "1723605 Inside Out 2015 Frank Oz actor \n", - "1723631 The Great Gilly Hopkins 2015 Frank Oz actor \n", - "1723623 Star Wars: Episode VII - The Force Awakens 2015 Frank Oz actor \n", - "\n", - " character n \n", - "1723645 Miss Piggy 2.0 \n", - "1723652 Motorcycle Guy 2.0 \n", - "1723651 Swedish Chef (assistant) 2.0 \n", - "1723650 Marvin Suggs 2.0 \n", - "1723649 Doc Hopper's Men 2.0 \n", - "... ... ... \n", - "1723661 Robot 6.0 \n", - "1723616 Cookie Monster NaN \n", - "1723605 Subconscious Guard Dave 14.0 \n", - "1723631 Cookie Monster 20.0 \n", - "1723623 Yoda NaN \n", - "\n", - "[64 rows x 6 columns]" + " character\n", + "title year \n", + "The Muppet Movie 1979 8\n", + "An American Werewolf in London 1981 2\n", + "The Great Muppet Caper 1981 6\n", + "The Dark Crystal 1982 2\n", + "The Muppets Take Manhattan 1984 7\n", + "Follow That Bird 1985 3\n", + "The Muppet Christmas Carol 1992 7\n", + "Muppet Treasure Island 1996 4\n", + "Muppets from Space 1999 4\n", + "The Adventures of Elmo in Grouchland 1999 3" ] }, - "execution_count": 120, + "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "cast[cast['name'] == 'Frank Oz'].sort_values(by='year')" + "frank_oz = cast[cast['name'] == 'Frank Oz']\n", + "frank_oz_more_than_one = pd.DataFrame(frank_oz.groupby(['title', 'year'])['character'].count())\n", + "frank_oz_more_than_one[frank_oz_more_than_one['character'] > 1].sort_index(axis='index', level='year')" ] }, { @@ -2721,10 +2709,96 @@ }, { 
"cell_type": "code", - "execution_count": null, + "execution_count": 85, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " title\n", + "character \n", + "Animal 6\n", + "Bert 3\n", + "Cookie Monster 5\n", + "Fozzie Bear 4\n", + "Grover 2\n", + "Miss Piggy 6\n", + "Sam the Eagle 5\n", + "Yoda 6" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "frank_oz_twice = pd.DataFrame(frank_oz.groupby('character')['title'].count())\n", + "frank_oz_twice[frank_oz_twice['title'] >= 2]" + ] }, { "cell_type": "markdown", @@ -2745,9 +2819,30 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 43, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "christmas = release_dates[(release_dates.title.str.contains('Christmas')) & (release_dates.country == 'USA')]\n", "christmas.date.dt.month.value_counts().sort_index().plot(kind='bar')" @@ -2765,12 +2860,12 @@ }, { "cell_type": "code", - "execution_count": 131, + "execution_count": 86, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -2782,6 +2877,9 @@ "source": [ "summer = release_dates[(release_dates['title'].str.contains('Summer', case=False)) & (release_dates['country'] == 'USA')]\n", "plt.hist(x=summer['date'].dt.month)\n", + "plt.title('Frequency by month of movies released in the USA with \"Summer\" in the title')\n", + "plt.ylabel('Frequency')\n", + "plt.xlabel('Month Number')\n", "plt.show()" ] }, @@ -2797,12 +2895,12 @@ }, { "cell_type": "code", - "execution_count": 135, + "execution_count": 87, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -2814,6 +2912,9 @@ "source": [ "action = release_dates[(release_dates['title'].str.contains('Action', case=False)) & (release_dates['country'] == 'USA')]\n", "plt.hist(x=action['date'].dt.isocalendar().week)\n", + "plt.title('Frequency by week of movies released in the USA with \"Action\" in the title')\n", + "plt.ylabel('Frequency')\n", + "plt.xlabel('Week Number')\n", "plt.show()" ] }, @@ -2827,11 +2928,291 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 98, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + "
\n", + "
" + ], + "text/plain": [ + " title name n country date\n", + "17 Speed Keanu Reeves 1.0 USA 1922-10-22\n", + "18 Speed Keanu Reeves 1.0 USA 1936-05-08\n", + "21 Sweet November Keanu Reeves 1.0 USA 1968-02-08\n", + "27 The Night Before Keanu Reeves 1.0 USA 1988-04-15\n", + "3 Bill & Ted's Excellent Adventure Keanu Reeves 1.0 USA 1989-02-17\n", + "2 Bill & Ted's Bogus Journey Keanu Reeves 1.0 USA 1991-07-19\n", + "14 Little Buddha Keanu Reeves 1.0 USA 1994-05-25\n", + "19 Speed Keanu Reeves 1.0 USA 1994-06-10\n", + "11 Johnny Mnemonic Keanu Reeves 1.0 USA 1995-05-26\n", + "1 A Walk in the Clouds Keanu Reeves 1.0 USA 1995-08-11\n", + "4 Chain Reaction Keanu Reeves 1.0 USA 1996-08-02\n", + "6 Feeling Minnesota Keanu Reeves 1.0 USA 1996-09-13\n", + "24 The Devil's Advocate Keanu Reeves 1.0 USA 1997-10-17\n", + "26 The Matrix Keanu Reeves 1.0 USA 1999-03-31\n", + "28 The Replacements Keanu Reeves 1.0 USA 2000-08-11\n", + "22 Sweet November Keanu Reeves 1.0 USA 2001-02-16\n", + "7 Hard Ball Keanu Reeves 1.0 USA 2001-09-14\n", + "5 Constantine Keanu Reeves 1.0 USA 2005-02-18\n", + "25 The Lake House Keanu Reeves 1.0 USA 2006-06-16\n", + "20 Street Kings Keanu Reeves 1.0 USA 2008-04-11\n", + "23 The Day the Earth Stood Still Keanu Reeves 1.0 USA 2008-12-12\n", + "0 47 Ronin Keanu Reeves 1.0 USA 2013-12-25\n", + "9 John Wick Keanu Reeves 1.0 USA 2014-10-24\n", + "12 Knock Knock Keanu Reeves 1.0 USA 2015-10-09\n", + "10 John Wick: Chapter 2 Keanu Reeves 1.0 USA 2017-02-10\n", + "13 Knock Knock Keanu Reeves 1.0 USA 2017-10-06" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - " " + "keanu_lead = cast[(cast['name'] == 'Keanu Reeves') & (cast['n'] == 1.0)]\n", + "usa = release_dates[release_dates['country'] == 'USA']\n", + "keanu_merge = pd.merge(keanu_lead, usa, how='left', on='title')\n", + "keanu_merge_usa = keanu_merge[keanu_merge['country'] == 'USA']\n", + "keanu_merge_usa[['title', 'name', 'n', 'country', 
'date']].sort_values(by='date')" ] }, { @@ -2843,10 +3224,44 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 110, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/b6/4qc_2zbx4bg37ybn_70yv7xc0000gn/T/ipykernel_27695/3150025360.py:4: SettingWithCopyWarning: \n", + "A value is trying to be set on a copy of a slice from a DataFrame.\n", + "Try using .loc[row_indexer,col_indexer] = value instead\n", + "\n", + "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", + " keanu_movies_usa['month'] = keanu_movies_usa['date'].dt.month\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjYAAAHFCAYAAADhWLMfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAABAHElEQVR4nO3deVhUdf//8deIMCwiAi6IIpqamvuSlVhq5pK4dme5K5apuVPupWkmbplZaVm5tLjclZlmamRKmZr7bpnldpuGpeGWiPL5/dGP+TaCOoPgwOn5uK65LuYzZ3mfz5wz58VZZmzGGCMAAAALyOPpAgAAALIKwQYAAFgGwQYAAFgGwQYAAFgGwQYAAFgGwQYAAFgGwQYAAFgGwQYAAFgGwQYAAFhGjgw2c+fOlc1mk6+vr44cOZLu9fr166tSpUoeqExau3atbDabPv74Y4/M312HDx9WdHS0QkJCZLPZNHDgwOsOW7JkSTVv3jxd+zvvvCMvLy+1bNlSly5dysZqs9fhw4dls9kcjzx58ig4OFgNGzbUl19+6eny/rXStvfDhw97uhS3dOvWTSVLlvR0GbckbZuYO3dulk1z/vz5mjZt2nXnNWXKlCybV0ZuNp8pU6akW99SUlL01ltv6e6771ZISIj8/f0VGRmpVq1a6dNPP81wOikpKQoLC3N7f5C2vm/ZssWt5bKC27Vu5MhgkyY5OVnPPfecp8vI1QYNGqTvv/9es2fP1oYNGzRo0CC3xp88ebJ69Oihjh07avHixfL19c2mSm+ffv36acOGDfr22281ZcoU/fTTT2rWrJm++eYbT5f2rxQdHa0NGzaoaNGini7lX6do0aLasGGDoqOjs2ya19t55WSdO3dWv3791KBBA33wwQdatmyZnnvuOeXNm1erVq3KcJzPP/9cv/32myTp3XffvZ3l5lq3a93Im+1zuAVNmzbV/Pnz9eyzz6pq1aqeLue2+uuvv+Tr6yubzXZL09mzZ49q166t1q1buz3uiBEjFBcXp379+unVV1+95VpyihIlSujee++VJEVFRals2bKqV6+e3n33XT3wwAMeru7fp1ChQipUqJCny9DFixfl7+/v6TJuK7vd7tgW/q0OHTqkRYsWadSoURozZoyjvWHDhurRo4dSU1MzHO/dd9+Vj4+P6tWrpy+//FL/+9//VLx48dtVNm4gRx+xGTJkiEJDQzV06NAb
Dnejw6k2m00vvPCC4/kLL7wgm82mXbt2qW3btgoKClJISIhiY2N15coV/fjjj2ratKkCAwNVsmRJTZo0KcN5Xrp0SbGxsQoLC5Ofn5/q1aun7du3pxtuy5YtatmypUJCQuTr66vq1avrv//9r9MwaYcmv/zyS3Xv3l2FChWSv7+/kpOTr7vMR48eVadOnVS4cGHZ7XZVqFBBL7/8smMjTDtldvDgQa1YscJx+sWVw/2pqanq3bu34uLiNGrUKE2fPt0p1BhjNGPGDFWrVk1+fn4KDg7Wo48+ql9++cVpOvHx8WrVqpWKFy8uX19flSlTRj179tTvv//uNFzae7J37161b99eQUFBKlKkiLp3766kpCTHcO68z+6oVauWJDn++0pz8uRJ9ezZU8WLF5ePj49KlSqlMWPG6MqVK07DXb58WePGjVP58uVlt9tVqFAhxcTE6NSpU45hWrdurcjIyAw/JO+55x7VqFHD8dzV/t2+fbuaN2/uWAfCw8MVHR2t//3vfzdc3rRTuRs2bFCdOnXk5+enkiVLas6cOZKk5cuXq0aNGvL391flypW1cuXKdNNYt26dGjZsqMDAQPn7+6tOnTpavny54/WdO3fKZrNl+J9s2vq4dOlSSdc/FfXVV1+pYcOGyp8/v/z9/RUVFaXVq1c7DXPq1Ck99dRTioiIcPR9VFSUvvrqqxv2Qdo6t23bNj366KMKDg5W6dKlJbne/xnJ6m3D1eVzpa8yktE25er2mJH69etr+fLlOnLkiNNp32tNnTpVpUqVUr58+XTfffdp48aN6YZx5bMzK/zxxx+SdN0jhnnypN9N/vrrr1q5cqVatGihwYMHKzU11e3TeWfOnFFMTIxCQkIUEBCgFi1aOK0nL774ovLmzatjx46lG7d79+4KDQ294aUB3bp1U758+fTDDz+oSZMmCggIUNGiRTVhwgRJ0saNG1W3bl0FBATozjvv1Lx589JNY8+ePWrVqpWCg4Pl6+uratWqpRsubV+zYMECjRw5UuHh4cqfP78eeugh/fjjj47hsnLduCmTA82ZM8dIMps3bzavvvqqkWRWr17teL1evXqmYsWKjueHDh0yksycOXPSTUuSGT16tOP56NGjjSRTrlw58+KLL5r4+HgzZMgQI8n07dvXlC9f3kyfPt3Ex8ebmJgYI8l88sknjvHXrFljJJmIiAjTqlUrs2zZMvPBBx+YMmXKmPz585uff/7ZMezXX39tfHx8zP33328WLVpkVq5cabp165au1rTlLVasmHnqqafMihUrzMcff2yuXLmSYf8kJiaaYsWKmUKFCpk333zTrFy50vTt29dIMr179zbGGJOUlGQ2bNhgwsLCTFRUlNmwYYPZsGGDuXTp0nX7PTIy0jRu3Ni0a9fO2Gw28+qrr2Y4XI8ePYy3t7d55plnzMqVK838+fNN+fLlTZEiRczJkycdw82cOdPExcWZpUuXmoSEBDNv3jxTtWpVU65cOXP58uUM35NRo0aZ+Ph4M3XqVGO3201MTIxjOHfe54ykjT958mSn9j179hhJpl+/fo62EydOmIiICBMZGWneeust89VXX5kXX3zR2O12061bN8dwV69eNU2bNjUBAQFmzJgxJj4+3rzzzjumWLFi5q677jIXL140xhjz2WefGUkmPj7ead779+83ksz06dPd6t/z58+b0NBQU6tWLfPf//7XJCQkmEWLFplevXqZffv23bAf6tWrZ0JDQ025cuXMu+++a1atWmWaN29uJJkxY8aYypUrmwULFpgvvvjC3HvvvcZut5vjx487xl+7dq3x9vY2NWvWNIsWLTJLliwxjRs3NjabzSxcuNAxXPXq1U1UVFS6+T/22GOmcOHCJiUlxRjzf+v/oUOHHMO8//77xmazmdatW5vFixebZcuWmebNmxsvLy/z1VdfOYZr0qSJKVSokJk1a5ZZu3atWbJkiRk1apRTHRlJW+ciIyPN0KFDTXx8vFmyZInL/W+M
MV27djWRkZFO083qbcOV5XO1rzKS0Tbl6vaYkb1795qoqCgTFhbm+MzZsGGD07xKlixpmjZtapYsWWKWLFliKleubIKDg82ff/7pmI6rn503WqZrt/M0kydPdlrfzp8/bwoUKGDCwsLMW2+95bQeXs9LL71kJJnly5eb1NRUExkZaUqVKmVSU1NvOm7a+h4REWG6d+9uVqxYYWbNmmUKFy5sIiIizJkzZ4wxxvz222/GbrebkSNHOo3/xx9/GD8/PzN48OAbzqdr167Gx8fHVKhQwbz66qtO+7Thw4ebO++8M932v2XLFsf4P/zwgwkMDDSlS5c27733nlm+fLlp3769kWQmTpzoGC5tn1iyZEnTsWNHs3z5crNgwQJTokQJU7ZsWcd+LKvWDVfk+GCTnJxs7rjjDlOrVi3HSpMVwebll192Gq5atWpGklm8eLGjLSUlxRQqVMg88sgjjra0N7FGjRpOK/Hhw4eNt7e3efLJJx1t5cuXN9WrV3d8gKdp3ry5KVq0qLl69arT8nbp0sWl/hk2bJiRZL7//nun9t69exubzWZ+/PFHR1tkZKSJjo52abqRkZFGkpFkRowYkeEwGzZsyLD/jh07Zvz8/MyQIUMyHC81NdWkpKSYI0eOGEnms88+c7yW9p5MmjTJaZynn37a+Pr6Ovo5q4LNxIkTTUpKirl06ZLZsWOHue+++0zRokWdPtB69uxp8uXLZ44cOeI0jSlTphhJZu/evcYYYxYsWJAu/BpjzObNm40kM2PGDGPM3+tSkSJFTIcOHZyGGzJkiPHx8TG///67Mcb1/t2yZYuR5NgZu6NevXrpPsT++OMP4+XlZfz8/JxCzI4dO9IFr3vvvdcULlzYnDt3ztF25coVU6lSJVO8eHHH+zV9+nQjyWl9PH36tLHb7eaZZ55xtF0bbC5cuGBCQkJMixYtnOq+evWqqVq1qqldu7ajLV++fGbgwIFu90HaOjdq1CindnfW72uDTXZsGzdbPnf6KiM3CjY32x6vJzo6Ol3g++e8Kleu7PRP26ZNm4wks2DBAkebq5+dN1omV4ONMcYsX77cFCxY0PH5Fxoaatq2bWuWLl2abvzU1FRTpkwZU6xYMcdypPXZP/8Bv5609b1NmzZO7d99952RZMaNG+do69q1qylcuLBJTk52tE2cONHkyZPnpgGsa9eu6T6b0vZpksy2bdsc7Wnbf2xsrKOtXbt2xm63m6NHjzpN9+GHHzb+/v6OsJG2T2zWrJnTcP/973+NJEd4MSZr1g1X5OhTUZLk4+OjcePGacuWLVl6GPLau38qVKggm82mhx9+2NGWN29elSlTJsM7szp06OB0GC0yMlJ16tTRmjVrJEkHDx7UDz/8oI4dO0qSrly54ng0a9ZMJ06ccDpMJ0n/+c9/XKr966+/1l133aXatWs7tXfr1k3GGH399dcuTScj1apVU4kSJfT6669neAjw888/l81mU6dOnZyWKSwsTFWrVtXatWsdwyYmJqpXr16KiIhQ3rx55e3trcjISEnS/v370027ZcuWTs+rVKmiS5cuKTExMdPLk5GhQ4fK29vbcWh1z549WrZsmdMdLp9//rkaNGig8PBwp+VMWz8SEhIcwxUoUEAtWrRwGq5atWoKCwtz9EfevHnVqVMnLV682HE4/+rVq3r//ffVqlUrhYaGOqbnSv+WKVNGwcHBGjp0qN58803t27fPrT4oWrSoatas6XgeEhKiwoULq1q1agoPD3e0V6hQQZIc28CFCxf0/fff69FHH1W+fPkcw3l5ealz58763//+51ivO3bsKLvd7nSIfsGCBUpOTlZMTMx1a1u/fr1Onz6trl27OvVBamqqmjZtqs2bN+vChQuSpNq1a2vu3LkaN26cNm7cqJSUFLf64dptzp31+1rZsW3cbPnc6St3Zdf2GB0dLS8vL6fpSv+3jmXms/NWNWvWTEePHtWnn36qZ599VhUrVtSSJUvUsmVL9e3b12nY
hIQEHTx4UF27dnUsR0xMjGw2m2bPnu3yPNOWL02dOnUUGRnp2IdI0oABA5SYmKiPPvpI0t+XCcycOVPR0dEu3ZFns9nUrFkzx/O0fVrRokVVvXp1R3va9v/Pfd3XX3+thg0bKiIiwmma3bp108WLF7Vhwwan9ozWF0kZ7j+v52brhqtyfLCRpHbt2qlGjRoaOXKk2x9c1xMSEuL03MfHR/7+/unu+vHx8cnwPGZYWFiGbWnna9Ou13j22Wfl7e3t9Hj66aclKd35dFfvCvnjjz8yHDZth5RWQ2YUK1ZMa9euVXBwsJo0aZJu5f3tt99kjFGRIkXSLdfGjRsdy5SamqrGjRtr8eLFGjJkiFavXq1NmzY5wtJff/2Vbt5pO/c0drv9usPeigEDBmjz5s1at26dpkyZopSUFLVq1cqp33777TctW7Ys3TJWrFhR0v+9d7/99pv+/PNP+fj4pBv25MmTTu9x9+7ddenSJS1cuFCStGrVKp04ccJpJ+9q/wYFBSkhIUHVqlXTiBEjVLFiRYWHh2v06NEubSPXrv/S3+t6RtuFJMc2cObMGRljXFr/QkJC1LJlS7333nu6evWqpL+vp6ldu7ajHzOStu08+uij6fpg4sSJMsbo9OnTkqRFixapa9eueuedd3TfffcpJCREXbp00cmTJ2/aB1L6bc7V/r9e3Vm9bdxs+dzpK3dl1/Z4s+lm5rPzn/Lm/fuemLR17lpp18h5e3s7tfv5+al169aaPHmyI7zcddddeuONN7R3717HcGnXjbVp00Z//vmn/vzzTwUFBalu3br65JNP9Oeff7rUDzfbh0hS9erVdf/99+uNN96Q9Hd4Pnz4cLqwdT3X26ddb/v/577O3f1MVqwvWbXO5ei7otLYbDZNnDhRjRo10qxZs9K9nvbGXXux7a3s4G8mow/OkydPOt6YggULSpKGDx+uRx55JMNplCtXzum5q3cdhYaG6sSJE+naf/31V6d5Z1apUqW0du1aNWjQQE2aNNHKlStVp04dx7RtNpu+/fZbx0r3T2lte/bs0c6dOzV37lx17drV8frBgwczXVdWvc/Fixd3XDAcFRWlsLAwderUSaNHj9brr78u6e/lrFKlil566aUMp5G2cRcsWFChoaEZXmArSYGBgY6/046yzZkzRz179tScOXMUHh6uxo0bO4ZxtX8lqXLlylq4cKGMMdq1a5fmzp2rsWPHys/PT8OGDXOrT1wVHBysPHnyuLz+xcTE6KOPPlJ8fLxKlCihzZs3a+bMmTecR9r4r7322nXv2ClSpIhj2GnTpmnatGk6evSoli5dqmHDhikxMfG678k/XbvNudP/GdWd1dvGzZbPnb7KLTLz2Xnt+F5eXjp+/HiGrx8/flxeXl7pdqLXKlGihJ566ikNHDhQe/fuVcWKFZWUlKRPPvlEknT33XdnON78+fMdAexGrrcPKVOmjFNb//791bZtW23btk2vv/667rzzTjVq1Oim079V2b2fyU65IthI0kMPPaRGjRpp7Nix6Q6NFSlSRL6+vtq1a5dT+2effZZt9SxYsECxsbGOD8YjR45o/fr16tKli6S/N7yyZctq586dGj9+fJbOu2HDhoqLi9O2bduc7qZ57733ZLPZ1KBBg1ueR8mSJR3hpmnTplqxYoWioqLUvHlzTZgwQcePH9djjz123fHT+uXaD/i33nor0zVl1/vcsWNHvfPOO3r77bc1ePBgRUZGqnnz5vriiy9UunRpBQcHX3fc5s2ba+HChbp69aruueeem84rJiZGvXv31rp167Rs2TLFxsY6HXp1tX//yWazqWrVqnrllVc0d+5cbdu2zaXxMiMgIED33HOPFi9erClTpsjPz0/S30chPvjgAxUvXlx33nmnY/jGjRurWLFimjNnjkqUKCFfX1+1b9/+hvOIiopSgQIFtG/fPpf/M5X+3hH17dtXq1ev1nfffZep5ctM/7s7bma3jYyWL7N9lZ3sdvstHdW5
1c9OX19fRUVFaenSpZo0aZLTEYtLly5p6dKlqlu3rqP93LlzstlsTqdW06SdFkz7R2b+/Pn666+/9OKLL6pu3brphm/btq1mz57tUrD58MMPnU6Frl+/XkeOHNGTTz7pNFybNm1UokQJPfPMM0pISNArr7xyW756o2HDhvr000/166+/Op2efu+99+Tv75+prwm41XXDVbkm2EjSxIkTVbNmTSUmJjodyk47rz179myVLl1aVatW1aZNmzR//vxsqyUxMVFt2rRRjx49lJSUpNGjR8vX11fDhw93DPPWW2/p4YcfVpMmTdStWzcVK1ZMp0+f1v79+7Vt2zbHeVN3DRo0SO+9956io6M1duxYRUZGavny5ZoxY4Z69+7ttGO5FZGRkU7h5osvvtD999+vp556SjExMdqyZYseeOABBQQE6MSJE1q3bp0qV66s3r17q3z58ipdurSGDRsmY4xCQkK0bNkyxcfHZ7qe7HyfJ06cqHvuuUcvvvii3nnnHY0dO1bx8fGqU6eO+vfvr3LlyunSpUs6fPiwvvjiC7355psqXry42rVrpw8//FDNmjXTgAEDVLt2bXl7e+t///uf1qxZo1atWqlNmzaO+bRv316xsbFq3769kpOT1a1bN6c6oqKiXOrfzz//XDNmzFDr1q11xx13yBijxYsX688//8z2/+bi4uLUqFEjNWjQQM8++6x8fHw0Y8YM7dmzRwsWLHD60PXy8lKXLl00depU5c+fX4888oiCgoJuOP18+fLptddeU9euXXX69Gk9+uijKly4sE6dOqWdO3fq1KlTmjlzppKSktSgQQN16NBB5cuXV2BgoDZv3qyVK1de9z/9m3G1/29lXFe3DVeWz9W+up0qV66sxYsXa+bMmapZs6by5MnjOELqqlv97JwwYYIaNGig++67TwMHDlSJEiV09OhRTZs2Tb/99pvjdLAk/fjjj2rSpInatWunevXqqWjRojpz5oyWL1+uWbNmqX79+o4j1u+++66Cg4P17LPPZvhlpWnr+s6dO2/63WtbtmzRk08+qbZt2+rYsWMaOXKkihUrli4UeXl5qU+fPho6dKgCAgLSfWZkl9GjRzuuNRw1apRCQkL04Ycfavny5Zo0adJNt+OMZMW64RK3LjW+Tf55V9S1OnToYCQ53RVlzN+3Nz/55JOmSJEiJiAgwLRo0cIcPnz4undFnTp1ymn8rl27moCAgHTzu/YOrLQrwN9//33Tv39/U6hQIWO3283999/vdJdJmp07dzpub/X29jZhYWHmwQcfNG+++aZLy3s9R44cMR06dDChoaHG29vblCtXzkyePDnd3QLu3hWV0bBHjx41pUuXNgEBASYhIcEYY8zs2bPNPffcYwICAoyfn58pXbq06dKli1Mf7Nu3zzRq1MgEBgaa4OBg07ZtW3P06FGX35OMbgN29X3OyM3ulmjbtq3JmzevOXjwoDHGmFOnTpn+/fubUqVKGW9vbxMSEmJq1qxpRo4cac6fP+8YLyUlxUyZMsVUrVrV+Pr6mnz58pny5cubnj17mp9++indfNLW4YxuhU5zs/794YcfTPv27U3p0qWNn5+fCQoKMrVr1zZz5869YR8Yk36dTnO991+S6dOnj1Pbt99+ax588EFHfffee69ZtmxZhvM7cOCA426Ta293Nybj99kYYxISEkx0dLQJCQkx3t7eplixYiY6Otp89NFHxhhjLl26ZHr16mWqVKli8ufPb/z8/Ey5cuXM6NGjzYULF27YB9db59K4sn5ndLu3q+O6sm24s3w366vrudFdUa5sjxk5ffq0efTRR02BAgWMzWYzabuZG21/GW2/rnx23siWLVtMmzZtTMGCBY2Xl5cpWLCgadOmjdm6davTcGfOnDHjxo0zDz74oClWrJjx8fExAQEBplq1ambcuHGOr2zYuXOnkXTDu9R++OGHdF8dca20fvzyyy9N586dTYECBYyfn59p1qxZhp8XxhjHZ1yvXr1c
WnZjXN+npclo+9+9e7dp0aKFCQoKMj4+PqZq1arp7kpN2ydeu65ltG5l1bpxM7b/PyIAAMiBXnvtNfXv31979uy54YX3+BvBBgCAHGj79u06dOiQevbsqaioKC1ZssTTJeUKBBsAAHKgkiVL6uTJk7r//vv1/vvvZ3iLONIj2AAAAMvIFV/QBwAA4AqCDQAAsAyCDQAAsIxc9QV910pNTdWvv/6qwMDA2/JNjAAA4NYZY3Tu3DmFh4crT56sPcaSq4PNr7/+mu7nFQAAQO5w7NgxFS9ePEunmauDTdoPDB47dkz58+f3cDUAAMAVZ8+eVUREhNMPBWeVXB1s0k4/5c+fn2ADAEAukx2XkXDxMAAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsIy8ni4A+DcrOWy5R+Z7eEK0R+YLANmNIzYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyCDYAAMAyPBpsrly5oueee06lSpWSn5+f7rjjDo0dO1apqameLAsAAORSeT0584kTJ+rNN9/UvHnzVLFiRW3ZskUxMTEKCgrSgAEDPFkaAADIhTwabDZs2KBWrVopOjpaklSyZEktWLBAW7Zs8WRZAAAgl/Loqai6detq9erVOnDggCRp586dWrdunZo1a5bh8MnJyTp79qzTAwAAII1Hj9gMHTpUSUlJKl++vLy8vHT16lW99NJLat++fYbDx8XFacyYMbe5SgAAkFt49IjNokWL9MEHH2j+/Pnatm2b5s2bpylTpmjevHkZDj98+HAlJSU5HseOHbvNFQMAgJzMo0dsBg8erGHDhqldu3aSpMqVK+vIkSOKi4tT165d0w1vt9tlt9tvd5kAACCX8OgRm4sXLypPHucSvLy8uN0bAABkikeP2LRo0UIvvfSSSpQooYoVK2r79u2aOnWqunfv7smyAABALuXRYPPaa6/p+eef19NPP63ExESFh4erZ8+eGjVqlCfLAgAAuZRHg01gYKCmTZumadOmebIMAABgEfxWFAAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsAyCDQAAsIy8ni4A7is5bLlH5nt4QrRH5gsAgKs4YgMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACyDYAMAACzD7WCzbds27d692/H8s88+U+vWrTVixAhdvnw5S4sDAABwh9vBpmfPnjpw4IAk6ZdfflG7du3k7++vjz76SEOGDHG7gOPHj6tTp04KDQ2Vv7+/qlWrpq1bt7o9HQAA
ALeDzYEDB1StWjVJ0kcffaQHHnhA8+fP19y5c/XJJ5+4Na0zZ84oKipK3t7eWrFihfbt26eXX35ZBQoUcLcsAAAA5XV3BGOMUlNTJUlfffWVmjdvLkmKiIjQ77//7ta0Jk6cqIiICM2ZM8fRVrJkSXdLAgAAkJSJIza1atXSuHHj9P777yshIUHR0dGSpEOHDqlIkSJuTWvp0qWqVauW2rZtq8KFC6t69ep6++23rzt8cnKyzp496/QAAABI4/YRm2nTpqljx45asmSJRo4cqTJlykiSPv74Y9WpU8etaf3yyy+aOXOmYmNjNWLECG3atEn9+/eX3W5Xly5d0g0fFxenMWPGuFsy/uVKDlvusXkfnhDtsXkDOZmntku2SetzO9hUqVLF6a6oNJMnT5aXl5db00pNTVWtWrU0fvx4SVL16tW1d+9ezZw5M8NgM3z4cMXGxjqenz17VhEREW4uAQAAsCq3g80/nT9/3nG9TRpvb2+Xxy9atKjuuusup7YKFSpc9yJku90uu93ufqEAAOBfwe1rbA4dOqTo6GgFBAQoKChIwcHBCg4OVoECBRQcHOzWtKKiovTjjz86tR04cECRkZHulgUAAOD+EZuOHTtKkmbPnq0iRYrIZrNleuaDBg1SnTp1NH78eD322GPatGmTZs2apVmzZmV6mgAA4N/L7WCza9cubd26VeXKlbvlmd9999369NNPNXz4cI0dO1alSpVyXJwMAADgLreDzd13361jx45lSbCRpObNmzu+CwcAAOBWuB1s3nnnHfXq1UvHjx9XpUqV0l0sXKVKlSwrDgAAwB1uB5tTp07p559/VkxMjKPNZrPJGCObzaarV69maYEAAACucjvYdO/eXdWrV9eCBQtu+eJhAACArOR2sDly5IiWLl3q+MZhAACAnMLt77F58MEHtXPnzuyoBQAA4Ja4fcSmRYsWGjRokHbv3q3KlSunu3i4ZcuWWVYcAACAO9wONr169ZIkjR07Nt1rXDwMAAA8ye1gc+1vQwEAAOQUbl9jAwAAkFO5dMRm+vTpLk+wf//+mS4GAADgVrgUbF555RWXJmaz2Qg2AADAY1wKNocOHcruOgAAAG4Z19gAAADLcOmITWxsrF588UUFBAQoNjb2hsNOnTo1SwoDAABwl0vBZvv27UpJSXH8fT38bhQAAPAkl4LNmjVr9MsvvygoKEhr1qzJ7poAAAAyxeVrbMqWLatTp045nj/++OP67bffsqUoAACAzHA52BhjnJ5/8cUXunDhQpYXBAAAkFncFQUAACzD5WBjs9nSXRzMxcIAACAncflHMI0x6tatm+x2uyTp0qVL6tWrlwICApyGW7x4cdZWCAAA4CKXg03Xrl2dnnfq1CnLiwEAALgVLgebOXPmZGcdAAAAt4yLhwEAgGUQbAAAgGUQbAAAgGUQbAAAgGW4FGxq1KihM2fOSJLGjh2rixcvZmtRAAAAmeFSsNm/f7/j5xPGjBmj8+fPZ2tRAAAAmeHS7d7VqlVTTEyM6tatK2OMpkyZonz58mU47KhRo7K0QAAAAFe5FGzmzp2r0aNH6/PPP5fNZtOKFSuUN2/6UW02G8EGAAB4jEvBply5clq4cKEkKU+ePFq9erUKFy6crYUBAAC4y+VvHk6TmpqaHXUAyEFKDlvukfkenhB9w9c9VZd089oA5AxuBxtJ+vnnnzVt2jTt379fNptNFSpU0IABA1S6dOmsrg8AAMBlbn+PzapVq3TXXXdp06ZNqlKliipVqqTvv/9eFStWVHx8fHbUCAAA4BK3j9gMGzZMgwYN0oQJE9K1Dx06VI0aNcqy4gAAANzh9hGb/fv364knnkjX3r17d+3bty9LigIAAMgMt4NNoUKFtGPHjnTtO3bs4E4pAADgUW6fiurRo4eeeuop/fLLL6pTp45sNpvWrVuniRMn6plnnsmOGgEAAFzidrB5/vnnFRgYqJdfflnDhw+XJIWHh+uFF15Q//79s7xAAAAA
V7kdbGw2mwYNGqRBgwbp3LlzkqTAwMAsLwwAAMBdmfoemzQEGgAAkJO4ffEwAABATkWwAQAAlkGwAQAAluFWsElJSVGDBg104MCB7KoHAAAg09wKNt7e3tqzZ49sNlt21QMAAJBpbp+K6tKli959993sqAUAAOCWuH279+XLl/XOO+8oPj5etWrVUkBAgNPrU6dOzbLiAAAA3OF2sNmzZ49q1KghSemuteEUFQAA8CS3g82aNWuyow4AAIBblunbvQ8ePKhVq1bpr7/+kiQZY7KsKAAAgMxwO9j88ccfatiwoe688041a9ZMJ06ckCQ9+eST/Lo3AADwKLeDzaBBg+Tt7a2jR4/K39/f0f74449r5cqVWVocAACAO9y+xubLL7/UqlWrVLx4caf2smXL6siRI1lWGAAAgLvcPmJz4cIFpyM1aX7//XfZ7fYsKQoAACAz3A42DzzwgN577z3Hc5vNptTUVE2ePFkNGjTI0uIAAADc4fapqMmTJ6t+/frasmWLLl++rCFDhmjv3r06ffq0vvvuu+yoEQAAwCVuH7G56667tGvXLtWuXVuNGjXShQsX9Mgjj2j79u0qXbp0dtQIAADgEreP2EhSWFiYxowZk9W1AAAA3JJMBZszZ87o3Xff1f79+2Wz2VShQgXFxMQoJCQkq+sDAABwmdunohISElSqVClNnz5dZ86c0enTpzV9+nSVKlVKCQkJ2VEjAACAS9w+YtOnTx899thjmjlzpry8vCRJV69e1dNPP60+ffpoz549WV4kAACAK9w+YvPzzz/rmWeecYQaSfLy8lJsbKx+/vnnLC0OAADAHW4Hmxo1amj//v3p2vfv369q1aplRU0AAACZ4tKpqF27djn+7t+/vwYMGKCDBw/q3nvvlSRt3LhRb7zxhiZMmJA9VQIAALjApWBTrVo12Ww2GWMcbUOGDEk3XIcOHfT4449nXXUAAABucCnYHDp0KLvrUFxcnEaMGKEBAwZo2rRp2T4/AABgPS4Fm8jIyGwtYvPmzZo1a5aqVKmSrfMBAADWlqkv6Dt+/Li+++47JSYmKjU11em1/v37uzWt8+fPq2PHjnr77bc1bty4zJQDAAAgKRPBZs6cOerVq5d8fHwUGhoqm83meM1ms7kdbPr06aPo6Gg99NBDNw02ycnJSk5Odjw/e/ase8UDAABLczvYjBo1SqNGjdLw4cOVJ4/bd4s7WbhwobZt26bNmze7NHxcXBy/UQUALio5bLnH5n14QrTH5m1FvJeuczuZXLx4Ue3atbvlUHPs2DENGDBAH3zwgXx9fV0aZ/jw4UpKSnI8jh07dks1AAAAa3E7nTzxxBP66KOPbnnGW7duVWJiomrWrKm8efMqb968SkhI0PTp05U3b15dvXo13Th2u1358+d3egAAAKRx+1RUXFycmjdvrpUrV6py5cry9vZ2en3q1KkuTadhw4bavXu3U1tMTIzKly+voUOHOv1kAwAAgCvcDjbjx4/XqlWrVK5cOUlKd/GwqwIDA1WpUiWntoCAAIWGhqZrBwAAcIXbwWbq1KmaPXu2unXrlg3lAAAAZJ7bwcZutysqKio7atHatWuzZboAAODfwe2LhwcMGKDXXnstO2oBAAC4JW4fsdm0aZO+/vprff7556pYsWK6i4cXL16cZcUBAAC4w+1gU6BAAT3yyCPZUQsAAMAtydRPKgAAAOREt/b1wQAAADmI20dsSpUqdcPvq/nll19uqSAAAIDMcjvYDBw40Ol5SkqKtm/frpUrV2rw4MFZVRcAAIDb3A42AwYMyLD9jTfe0JYtW265IAAAgMzKsmtsHn74YX3yySdZNTkAAAC3ZVmw+fjjjxUSEpJVkwMAAHCb26eiqlev7nTxsDFGJ0+e1KlTpzRjxowsLQ4AAMAdbgeb1q1bOz3PkyePChUqpPr166t8+fJZVRcAAIDb3A42o0ePzo46AAAAbhlf0AcAACzD5SM2efLkueEX80mSzWbTlStX
brkoAACAzHA52Hz66afXfW39+vV67bXXZIzJkqIAAAAyw+Vg06pVq3RtP/zwg4YPH65ly5apY8eOevHFF7O0OAAAAHdk6hqbX3/9VT169FCVKlV05coV7dixQ/PmzVOJEiWyuj4AAACXuRVskpKSNHToUJUpU0Z79+7V6tWrtWzZMlWqVCm76gMAAHCZy6eiJk2apIkTJyosLEwLFizI8NQUAACAJ7kcbIYNGyY/Pz+VKVNG8+bN07x58zIcbvHixVlWHAAAgDtcDjZdunS56e3eAAAAnuRysJk7d242lgEAAHDr+OZhAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGS5/8/C/Uclhyz0y38MToj0yXwCZ46nPConPi6zG537uxxEbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGQQbAABgGR4NNnFxcbr77rsVGBiowoULq3Xr1vrxxx89WRIAAMjFPBpsEhIS1KdPH23cuFHx8fG6cuWKGjdurAsXLniyLAAAkEvl9eTMV65c6fR8zpw5Kly4sLZu3aoHHnjAQ1UBAIDcKkddY5OUlCRJCgkJ8XAlAAAgN/LoEZt/MsYoNjZWdevWVaVKlTIcJjk5WcnJyY7nZ8+evV3lAQCAXCDHHLHp27evdu3apQULFlx3mLi4OAUFBTkeERERt7FCAACQ0+WIYNOvXz8tXbpUa9asUfHixa873PDhw5WUlOR4HDt27DZWCQAAcjqPnooyxqhfv3769NNPtXbtWpUqVeqGw9vtdtnt9ttUHQAAyG08Gmz69Omj+fPn67PPPlNgYKBOnjwpSQoKCpKfn58nSwMAALmQR09FzZw5U0lJSapfv76KFi3qeCxatMiTZQEAgFzK46eiAAAAskqOuHgYAAAgKxBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZRBsAACAZeT1dAGwjpLDlnts3ocnRHts3gCAnIMjNgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDIINgAAwDI8HmxmzJihUqVKydfXVzVr1tS3337r6ZIAAEAu5dFgs2jRIg0cOFAjR47U9u3bdf/99+vhhx/W0aNHPVkWAADIpTwabKZOnaonnnhCTz75pCpUqKBp06YpIiJCM2fO9GRZAAAgl/JYsLl8+bK2bt2qxo0bO7U3btxY69ev91BVAAAgN8vrqRn//vvvunr1qooUKeLUXqRIEZ08eTLDcZKTk5WcnOx4npSUJEk6e/ZsttSYmnwxW6Z7MzdbHupK70a15dS6JN7La+XUuqTcuY7l1Lok1rFr5dS6
pOzZx6ZN0xiT5dOW8ZDjx48bSWb9+vVO7ePGjTPlypXLcJzRo0cbSTx48ODBgwcPCzyOHTuW5fnCY0dsChYsKC8vr3RHZxITE9MdxUkzfPhwxcbGOp6npqbq9OnTCg0Nlc1my9Z6reDs2bOKiIjQsWPHlD9/fk+XkyvQZ+6hv9xDf7mPPnNPTu0vY4zOnTun8PDwLJ+2x4KNj4+Patasqfj4eLVp08bRHh8fr1atWmU4jt1ul91ud2orUKBAdpZpSfnz589RK3huQJ+5h/5yD/3lPvrMPTmxv4KCgrJluh4LNpIUGxurzp07q1atWrrvvvs0a9YsHT16VL169fJkWQAAIJfyaLB5/PHH9ccff2js2LE6ceKEKlWqpC+++EKRkZGeLAsAAORSHg02kvT000/r6aef9nQZ/wp2u12jR49OdzoP10efuYf+cg/95T76zD3/xv6yGZMd91oBAADcfh7/rSgAAICsQrABAACWQbABAACWQbABAACWQbD5F4iLi9Pdd9+twMBAFS5cWK1bt9aPP/7o6bJyjbi4ONlsNg0cONDTpeRox48fV6dOnRQaGip/f39Vq1ZNW7du9XRZOdKVK1f03HPPqVSpUvLz89Mdd9yhsWPHKjU11dOl5RjffPONWrRoofDwcNlsNi1ZssTpdWOMXnjhBYWHh8vPz0/169fX3r17PVNsDnCj/kpJSdHQoUNVuXJlBQQEKDw8XF26dNGvv/7quYKzEcHmXyAhIUF9+vTRxo0bFR8frytXrqhx48a6cOGCp0vL8TZv3qxZs2apSpUqni4lRztz5oyioqLk7e2tFStWaN++fXr55Zf5ZvDrmDhxot588029/vrr2r9/vyZNmqTJkyfrtdde83RpOcaFCxdUtWpVvf766xm+PmnSJE2dOlWvv/66Nm/erLCwMDVq1Ejnzp27zZXmDDfqr4sXL2rbtm16/vnntW3bNi1evFgHDhxQy5YtPVDpbZDlvz6FHC8xMdFIMgkJCZ4uJUc7d+6cKVu2rImPjzf16tUzAwYM8HRJOdbQoUNN3bp1PV1GrhEdHW26d+/u1PbII4+YTp06eaiinE2S+fTTTx3PU1NTTVhYmJkwYYKj7dKlSyYoKMi8+eabHqgwZ7m2vzKyadMmI8kcOXLk9hR1G3HE5l8oKSlJkhQSEuLhSnK2Pn36KDo6Wg899JCnS8nxli5dqlq1aqlt27YqXLiwqlevrrffftvTZeVYdevW1erVq3XgwAFJ0s6dO7Vu3To1a9bMw5XlDocOHdLJkyfVuHFjR5vdble9evW0fv16D1aWeyQlJclms1nyqKrHv3kYt5cxRrGxsapbt64qVark6XJyrIULF2rbtm3avHmzp0vJFX755RfNnDlTsbGxGjFihDZt2qT+/fvLbrerS5cuni4vxxk6dKiSkpJUvnx5eXl56erVq3rppZfUvn17T5eWK5w8eVKSVKRIEaf2IkWK6MiRI54oKVe5dOmShg0bpg4dOuS4H8bMCgSbf5m+fftq165dWrdunadLybGOHTumAQMG6Msvv5Svr6+ny8kVUlNTVatWLY0fP16SVL16de3du1czZ84k2GRg0aJF+uCDDzR//nxVrFhRO3bs0MCBAxUeHq6uXbt6urxcw2azOT03xqRrg7OUlBS1a9dOqampmjFjhqfLyRYEm3+Rfv36aenSpfrmm29UvHhxT5eTY23dulWJiYmqWbOmo+3q1av65ptv9Prrrys5OVleXl4erDDnKVq0qO666y6ntgoVKuiTTz7xUEU52+DBgzVs2DC1a9dOklS5cmUdOXJEcXFxBBsXhIWFSfr7yE3RokUd7YmJiemO4uD/pKSk6LHHHtOhQ4f09ddfW/JojcRdUf8Kxhj17dtXixcv1tdff61SpUp5uqQcrWHDhtq9e7d27NjheNSqVUsdO3bUjh07CDUZiIqKSvcVAgcOHFBkZKSHKsrZLl68qDx5nD9+vby8uN3bRaVKlVJYWJji
4+MdbZcvX1ZCQoLq1KnjwcpyrrRQ89NPP+mrr75SaGiop0vKNhyx+Rfo06eP5s+fr88++0yBgYGO89NBQUHy8/PzcHU5T2BgYLrrjwICAhQaGsp1SdcxaNAg1alTR+PHj9djjz2mTZs2adasWZo1a5anS8uRWrRooZdeekklSpRQxYoVtX37dk2dOlXdu3f3dGk5xvnz53Xw4EHH80OHDmnHjh0KCQlRiRIlNHDgQI0fP15ly5ZV2bJlNX78ePn7+6tDhw4erNpzbtRf4eHhevTRR7Vt2zZ9/vnnunr1qmM/EBISIh8fH0+VnT08fFcWbgNJGT7mzJnj6dJyDW73vrlly5aZSpUqGbvdbsqXL29mzZrl6ZJyrLNnz5oBAwaYEiVKGF9fX3PHHXeYkSNHmuTkZE+XlmOsWbMmw8+trl27GmP+vuV79OjRJiwszNjtdvPAAw+Y3bt3e7ZoD7pRfx06dOi6+4E1a9Z4uvQsZzPGmNsZpAAAALIL19gAAADLINgAAADLINgAAADLINgAAADLINgAAADLINgAAADLINgAAADLINgAyDY2m01LlizxdBluWbt2rWw2m/78809PlwIgEwg2gMV069ZNNptNvXr1Svfa008/LZvNpm7dumXpPF944QVVq1YtS6ZVv3592Ww2LVy40Kl92rRpKlmyZJbMA4B1EWwAC4qIiNDChQv1119/OdouXbqkBQsWqESJEh6szDW+vr567rnnlJKS4ulSsszly5c9XQLwr0CwASyoRo0aKlGihBYvXuxoW7x4sSIiIlS9enWnYZOTk9W/f38VLlxYvr6+qlu3rjZv3ux4Pe3UzOrVq1WrVi35+/urTp06jl/znjt3rsaMGaOdO3fKZrPJZrNp7ty5jvF///13tWnTRv7+/ipbtqyWLl160/rbt2+vpKQkvf3229cdplu3bmrdurVT28CBA1W/fn3H8/r166tfv34aOHCggoODVaRIEc2aNUsXLlxQTEyMAgMDVbp0aa1YsSLd9L/77jtVrVpVvr6+uueee7R7926n19evX68HHnhAfn5+ioiIUP/+/XXhwgXH6yVLltS4cePUrVs3BQUFqUePHjddbgC3jmADWFRMTIzmzJnjeD579uwMfz16yJAh+uSTTzRv3jxt27ZNZcqUUZMmTXT69Gmn4UaOHKmXX35ZW7ZsUd68eR3Tevzxx/XMM8+oYsWKOnHihE6cOKHHH3/cMd6YMWP02GOPadeuXWrWrJk6duyYbtrXyp8/v0aMGKGxY8c6hYXMmDdvngoWLKhNmzapX79+6t27t9q2bas6depo27ZtatKkiTp37qyLFy86jTd48GBNmTJFmzdvVuHChdWyZUvHEaTdu3erSZMmeuSRR7Rr1y4tWrRI69atU9++fZ2mMXnyZFWqVElbt27V888/f0vLAcBFnv4VTgBZq2vXrqZVq1bm1KlTxm63m0OHDpnDhw8bX19fc+rUKdOqVSvHLySfP3/eeHt7mw8//NAx/uXLl014eLiZNGmSMeb/fjX4q6++cgyzfPlyI8n89ddfxhhjRo8ebapWrZquFknmueeeczw/f/68sdlsZsWKFdetP+2X1C9dumQiIyPN2LFjjTHGvPLKKyYyMjLdcv7TgAEDTL169ZymVbduXcfzK1eumICAANO5c2dH24kTJ4wks2HDBqflXbhwoWOYP/74w/j5+ZlFixYZY4zp3Lmzeeqpp5zm/e2335o8efI4+iQyMtK0bt36ussJIHvk9WiqApBtChYsqOjoaM2bN0/GGEVHR6tgwYJOw/z8889KSUlRVFSUo83b21u1a9fW/v37nYatUqWK4++iRYtKkhITE296zc4/xwsICFBgYKASExNvWr/dbtfYsWPVt29f9e7d+6bDuzJ/Ly8vhYaGqnLlyo62IkWKSFK6mu677z7H3yEhISpXrpyjT7Zu3aqDBw/qww8/dAxjjFFqaqoOHTqkChUqSJJq1aqV6boBZA7BBrCw
7t27O06PvPHGG+leN8ZI+vu27Gvbr23z9vZ2/J32Wmpq6k1r+Od4aeO6Mp4kderUSVOmTNG4cePS3RGVJ08eR/1pMrrYOKP5Z3ZZ/jlsz5491b9//3TD/DPoBQQE3HSaALIW19gAFta0aVNdvnxZly9fVpMmTdK9XqZMGfn4+GjdunWOtpSUFG3ZssVx1MEVPj4+unr1apbU/E958uRRXFycZs6cqcOHDzu9VqhQIZ04ccKpbceOHVk2740bNzr+PnPmjA4cOKDy5ctL+vvi7L1796pMmTLpHj4+PllWAwD3EWwAC/Py8tL+/fu1f/9+eXl5pXs9ICBAvXv31uDBg7Vy5Urt27dPPXr00MWLF/XEE0+4PJ+SJUvq0KFD2rFjh37//XclJydn2TJER0frnnvu0VtvveXU/uCDD2rLli1677339NNPP2n06NHas2dPls137NixWr16tfbs2aNu3bqpYMGCjruwhg4dqg0bNqhPnz7asWOHfvrpJy1dulT9+vXLsvkDyByCDWBx+fPnV/78+a/7+oQJE/Sf//xHnTt3Vo0aNXTw4EGtWrVKwcHBLs/jP//5j5o2baoGDRqoUKFCWrBgQVaU7jBx4kRdunTJqa1JkyZ6/vnnNWTIEN199906d+6cunTpkmXznDBhggYMGKCaNWvqxIkTWrp0qeNoTJUqVZSQkKCffvpJ999/v6pXr67nn3/ece0RAM+xmWtPUgMAAORSHLEBAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACWQbABAACW8f8AH1loFfwamLgAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "keanu = cast[cast['name'] == 'Keanu Reeves']\n", + "keanu_movies = pd.merge(keanu, release_dates, how='left', on='title')\n", + "keanu_movies_usa = keanu_movies[keanu_movies['country'] == 'USA'].copy()\n", + "keanu_movies_usa['month'] = keanu_movies_usa['date'].dt.month\n", + "keanu_month_df = pd.DataFrame(keanu_movies_usa.groupby('month')['title'].count())\n", + "plt.bar(x=keanu_month_df.index, height=keanu_month_df['title'])\n", + "plt.title('Number of Keanu Reeves movies released in the USA by month')\n", + "plt.xlabel('Month Number')\n", + "plt.ylabel('Number of Films')\n", + "plt.show()" + ] }, { "cell_type": "markdown", @@ -2855,6 +3270,35 @@ "### Section III - Q5: Make a bar plot showing the years in which movies with Ian McKellen tend to be released in the USA?" ] }, + { + "cell_type": "code", + "execution_count": 122, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "ian_df = cast[cast['name'] == 'Ian McKellen']\n", + "ian_merge = pd.merge(ian_df, release_dates, how='left', on='title')\n", + "ian_usa = ian_merge[ian_merge['country'] == 'USA']\n", + "ian_year_df = pd.DataFrame(ian_usa.groupby('year_y')['title'].count())\n", + "plt.bar(x=ian_year_df.index, height=ian_year_df['title'])\n", + "plt.title('Years in which Ian McKellen movies were released in the USA')\n", + "plt.xlabel('Year')\n", + "plt.ylabel('Number of Films')\n", + "plt.show()" + ] + }, { "cell_type": "code", "execution_count": null, From e9becc9b54b679e8ba9f4a3f97364b6a557155df Mon Sep 17 00:00:00 2001 From: Conor Smith Date: Tue, 1 Nov 2022 05:55:00 -0400 Subject: [PATCH 5/7] Completed Project after deleting previously --- ...Mini_Project_Wrangling_Json_Exercise.ipynb | 742 +++++++++++++++++- 1 file changed, 721 insertions(+), 21 deletions(-) diff --git a/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb b/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb index a8bfea9e..5ccdabb4 100755 --- a/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb +++ b/mec-5.4.4-json-data-wrangling-mini-project/Mini_Project_Wrangling_Json_Exercise.ipynb @@ -80,9 +80,7 @@ { "cell_type": "code", "execution_count": 7, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -148,9 +146,7 @@ { "cell_type": "code", "execution_count": 8, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -246,9 +242,7 @@ { "cell_type": "code", "execution_count": 9, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -433,9 +427,7 @@ { "cell_type": "code", "execution_count": 10, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -586,35 +578,743 @@ "3. In 2.
above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in." ] }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " sector supplementprojectflg \\\n", + "0 [{'Name': 'Primary education'}, {'Name': 'Seco... N \n", + "1 [{'Name': 'Public administration- Other social... N \n", + "2 [{'Name': 'Rural and Inter-Urban Roads and Hig... Y \n", + "3 [{'Name': 'Other social services'}] N \n", + "4 [{'Name': 'General industry and trade sector'}... N \n", + "\n", + " projectfinancialtype prodline \\\n", + "0 IDA PE \n", + "1 OTHER RE \n", + "2 IDA PE \n", + "3 OTHER RE \n", + "4 IDA PE \n", + "\n", + " mjtheme idacommamt \\\n", + "0 [Human development] 130000000 \n", + "1 [Economic management, Social protection and ri... 0 \n", + "2 [Trade and integration, Public sector governan... 6060000 \n", + "3 [Social dev/gender/inclusion, Social dev/gende... 0 \n", + "4 [Trade and integration, Financial and private ... 13100000 \n", + "\n", + " impagency \\\n", + "0 MINISTRY OF EDUCATION \n", + "1 MINISTRY OF FINANCE \n", + "2 MINISTRY OF TRANSPORT AND COMMUNICATIONS \n", + "3 LABOR INTENSIVE PUBLIC WORKS PROJECT PMU \n", + "4 MINISTRY OF TRADE AND INDUSTRY \n", + "\n", + " project_name mjthemecode \\\n", + "0 Ethiopia General Education Quality Improvement... 8,11 \n", + "1 TN: DTF Social Protection Reforms Support 1,6 \n", + "2 Tuvalu Aviation Investment Project - Additiona... 5,2,11,6 \n", + "3 Gov't and Civil Society Organization Partnership 7,7 \n", + "4 Second Private Sector Competitiveness and Econ... 5,4 \n", + "\n", + " closingdate ... \\\n", + "0 2018-07-07T00:00:00Z ... \n", + "1 NaN ... \n", + "2 NaN ... \n", + "3 NaN ... \n", + "4 2019-04-30T00:00:00Z ... \n", + "\n", + " majorsector_percent board_approval_month \\\n", + "0 [{'Percent': 46, 'Name': 'Education'}, {'Perce... November \n", + "1 [{'Percent': 70, 'Name': 'Public Administratio... November \n", + "2 [{'Percent': 100, 'Name': 'Transportation'}] November \n", + "3 [{'Percent': 100, 'Name': 'Health and other so... October \n", + "4 [{'Percent': 50, 'Name': 'Industry and trade'}... 
October \n", + "\n", + " theme_namecode \\\n", + "0 [{'code': '65', 'name': 'Education for all'}] \n", + "1 [{'code': '24', 'name': 'Other economic manage... \n", + "2 [{'code': '47', 'name': 'Regional integration'... \n", + "3 [{'code': '57', 'name': 'Participation and civ... \n", + "4 [{'code': '45', 'name': 'Export development an... \n", + "\n", + " countryname \\\n", + "0 Federal Democratic Republic of Ethiopia \n", + "1 Republic of Tunisia \n", + "2 Tuvalu \n", + "3 Republic of Yemen \n", + "4 Kingdom of Lesotho \n", + "\n", + " url source \\\n", + "0 http://www.worldbank.org/projects/P129828/ethi... IBRD \n", + "1 http://www.worldbank.org/projects/P144674?lang=en IBRD \n", + "2 http://www.worldbank.org/projects/P145310?lang=en IBRD \n", + "3 http://www.worldbank.org/projects/P144665?lang=en IBRD \n", + "4 http://www.worldbank.org/projects/P144933/seco... IBRD \n", + "\n", + " projectstatusdisplay ibrdcommamt \\\n", + "0 Active 0 \n", + "1 Active 0 \n", + "2 Active 0 \n", + "3 Active 0 \n", + "4 Active 0 \n", + "\n", + " sector_namecode \\\n", + "0 [{'code': 'EP', 'name': 'Primary education'}, ... \n", + "1 [{'code': 'BS', 'name': 'Public administration... \n", + "2 [{'code': 'TI', 'name': 'Rural and Inter-Urban... \n", + "3 [{'code': 'JB', 'name': 'Other social services'}] \n", + "4 [{'code': 'YZ', 'name': 'General industry and ... 
\n", + "\n", + " _id \n", + "0 {'$oid': '52b213b38594d8a2be17c780'} \n", + "1 {'$oid': '52b213b38594d8a2be17c781'} \n", + "2 {'$oid': '52b213b38594d8a2be17c782'} \n", + "3 {'$oid': '52b213b38594d8a2be17c783'} \n", + "4 {'$oid': '52b213b38594d8a2be17c784'} \n", + "\n", + "[5 rows x 50 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "import json\n", + "\n", + "data = pd.read_json('data/world_bank_projects.json')\n", + "data.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'sector': [{'Name': 'Primary education'},\n", + " {'Name': 'Secondary education'},\n", + " {'Name': 'Public administration- Other social services'},\n", + " {'Name': 'Tertiary education'}],\n", + " 'supplementprojectflg': 'N',\n", + " 'projectfinancialtype': 'IDA',\n", + " 'prodline': 'PE',\n", + " 'mjtheme': ['Human development'],\n", + " 'idacommamt': 130000000,\n", + " 'impagency': 'MINISTRY OF EDUCATION',\n", + " 'project_name': 'Ethiopia General Education Quality Improvement Project II',\n", + " 'mjthemecode': '8,11',\n", + " 'closingdate': '2018-07-07T00:00:00Z',\n", + " 'totalcommamt': 130000000,\n", + " 'id': 'P129828',\n", + " 'mjsector_namecode': [{'code': 'EX', 'name': 'Education'},\n", + " {'code': 'EX', 'name': 'Education'},\n", + " {'code': 'BX', 'name': 'Public Administration, Law, and Justice'},\n", + " {'code': 'EX', 'name': 'Education'}],\n", + " 'docty': 'Project Information Document,Indigenous Peoples Plan,Project Information Document',\n", + " 'sector1': {'Percent': 46, 'Name': 'Primary education'},\n", + " 'lendinginstr': 'Investment Project Financing',\n", + " 'countrycode': 'ET',\n", + " 'sector2': {'Percent': 26, 'Name': 'Secondary education'},\n", + " 'totalamt': 130000000,\n", + " 'mjtheme_namecode': [{'code': '8', 'name': 'Human
development'},\n", + " {'code': '11', 'name': ''}],\n", + " 'boardapprovaldate': '2013-11-12T00:00:00Z',\n", + " 'countryshortname': 'Ethiopia',\n", + " 'sector4': {'Percent': 12, 'Name': 'Tertiary education'},\n", + " 'prodlinetext': 'IBRD/IDA',\n", + " 'productlinetype': 'L',\n", + " 'regionname': 'Africa',\n", + " 'status': 'Active',\n", + " 'country_namecode': 'Federal Democratic Republic of Ethiopia!$!ET',\n", + " 'envassesmentcategorycode': 'C',\n", + " 'project_abstract': {'cdata': 'The development objective of the Second Phase of General Education Quality Improvement Project for Ethiopia is to improve learning conditions in primary and secondary schools and strengthen institutions at different levels of educational administration. The project has six components. The first component is curriculum, textbooks, assessment, examinations, and inspection. This component will support improvement of learning conditions in grades KG-12 by providing increased access to teaching and learning materials and through improvements to the curriculum by assessing the strengths and weaknesses of the current curriculum. This component has following four sub-components: (i) curriculum reform and implementation; (ii) teaching and learning materials; (iii) assessment and examinations; and (iv) inspection. The second component is teacher development program (TDP). This component will support improvements in learning conditions in both primary and secondary schools by advancing the quality of teaching in general education through: (a) enhancing the training of pre-service teachers in teacher education institutions; and (b) improving the quality of in-service teacher training. This component has following three sub-components: (i) pre-service teacher training; (ii) in-service teacher training; and (iii) licensing and relicensing of teachers and school leaders. The third component is school improvement plan. 
This component will support the strengthening of school planning in order to improve learning outcomes, and to partly fund the school improvement plans through school grants. It has following two sub-components: (i) school improvement plan; and (ii) school grants. The fourth component is management and capacity building, including education management information systems (EMIS). This component will support management and capacity building aspect of the project. This component has following three sub-components: (i) capacity building for education planning and management; (ii) capacity building for school planning and management; and (iii) EMIS. The fifth component is improving the quality of learning and teaching in secondary schools and universities through the use of information and communications technology (ICT). It has following five sub-components: (i) national policy and institution for ICT in general education; (ii) national ICT infrastructure improvement plan for general education; (iii) develop an integrated monitoring, evaluation, and learning system specifically for the ICT component; (iv) teacher professional development in the use of ICT; and (v) provision of limited number of e-Braille display readers with the possibility to scale up to all secondary education schools based on the successful implementation and usage of the readers. The sixth component is program coordination, monitoring and evaluation, and communication. It will support institutional strengthening by developing capacities in all aspects of program coordination, monitoring and evaluation; a new sub-component on communications will support information sharing for better management and accountability. 
It has following three sub-components: (i) program coordination; (ii) monitoring and evaluation (M and E); and (iii) communication.'},\n", + " 'approvalfy': 1999,\n", + " 'projectdocs': [{'DocDate': '28-AUG-2013',\n", + " 'EntityID': '090224b081e545fb_1_0',\n", + " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=090224b081e545fb_1_0',\n", + " 'DocType': 'PID',\n", + " 'DocTypeDesc': 'Project Information Document (PID), Vol.'},\n", + " {'DocDate': '01-JUL-2013',\n", + " 'EntityID': '000442464_20130920111729',\n", + " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=000442464_20130920111729',\n", + " 'DocType': 'IP',\n", + " 'DocTypeDesc': 'Indigenous Peoples Plan (IP), Vol.1 of 1'},\n", + " {'DocDate': '22-NOV-2012',\n", + " 'EntityID': '090224b0817b19e2_1_0',\n", + " 'DocURL': 'http://www-wds.worldbank.org/servlet/WDSServlet?pcont=details&eid=090224b0817b19e2_1_0',\n", + " 'DocType': 'PID',\n", + " 'DocTypeDesc': 'Project Information Document (PID), Vol.'}],\n", + " 'lendprojectcost': 550000000,\n", + " 'lendinginstrtype': 'IN',\n", + " 'theme1': {'Percent': 100, 'Name': 'Education for all'},\n", + " 'grantamt': 0,\n", + " 'themecode': '65',\n", + " 'borrower': 'FEDERAL DEMOCRATIC REPUBLIC OF ETHIOPIA',\n", + " 'sectorcode': 'ET,BS,ES,EP',\n", + " 'sector3': {'Percent': 16,\n", + " 'Name': 'Public administration- Other social services'},\n", + " 'majorsector_percent': [{'Percent': 46, 'Name': 'Education'},\n", + " {'Percent': 26, 'Name': 'Education'},\n", + " {'Percent': 16, 'Name': 'Public Administration, Law, and Justice'},\n", + " {'Percent': 12, 'Name': 'Education'}],\n", + " 'board_approval_month': 'November',\n", + " 'theme_namecode': [{'code': '65', 'name': 'Education for all'}],\n", + " 'countryname': 'Federal Democratic Republic of Ethiopia',\n", + " 'url': 'http://www.worldbank.org/projects/P129828/ethiopia-general-education-quality-improvement-project-ii?lang=en',\n", + " 'source': 'IBRD',\n", + " 
'projectstatusdisplay': 'Active',\n", + " 'ibrdcommamt': 0,\n", + " 'sector_namecode': [{'code': 'EP', 'name': 'Primary education'},\n", + " {'code': 'ES', 'name': 'Secondary education'},\n", + " {'code': 'BS', 'name': 'Public administration- Other social services'},\n", + " {'code': 'ET', 'name': 'Tertiary education'}],\n", + " '_id': {'$oid': '52b213b38594d8a2be17c780'}}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "with open('data/world_bank_projects.json') as f:\n", + "    raw_data = json.load(f)\n", + "raw_data[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Task 1: Find the top 10 countries with the most projects" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "countryname\n", + "People's Republic of China 19\n", + "Republic of Indonesia 19\n", + "Socialist Republic of Vietnam 17\n", + "Republic of India 16\n", + "Republic of Yemen 13\n", + "People's Republic of Bangladesh 12\n", + "Nepal 12\n", + "Kingdom of Morocco 12\n", + "Republic of Mozambique 11\n", + "Africa 11\n", + "Name: project_name, dtype: int64" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top 10 countries with the most projects\n", + "data.groupby('countryname')['project_name'].count().sort_values(ascending=False).head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Task 2: Find the top 10 major project themes (using column 'mjtheme_namecode')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " code name\n", + "0 8 Human development\n", + "1 11 \n", + "2 1 Economic management\n", + "3 6 Social protection and risk management\n", + "4 5 Trade and integration\n", + "... ... ...\n", + "1494 10 Rural development\n", + "1495 9 Urban development\n", + "1496 8 Human development\n", + "1497 5 Trade and integration\n", + "1498 4 Financial and private sector development\n", + "\n", + "[1499 rows x 2 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mjtheme = pd.json_normalize(raw_data, 'mjtheme_namecode')\n", + "mjtheme" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'8': 'Human development',\n", + " '1': 'Economic management',\n", + " '6': 'Social protection and risk management',\n", + " '5': 'Trade and integration',\n", + " '2': 'Public sector governance',\n", + " '11': 'Environment and natural resources management',\n", + " '7': 'Social dev/gender/inclusion',\n", + " '4': 'Financial and private sector development',\n", + " '10': 'Rural development',\n", + " '9': 'Urban development',\n", + " '3': 'Rule of law'}" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Build a code -> name lookup from the rows whose name is present\n", + "mjtheme_subset = mjtheme[mjtheme['name'] != '']\n", + "project_dict = dict(zip(mjtheme_subset['code'], mjtheme_subset['name']))\n", + "project_dict" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Human development\n", + "1 \n", + "2 Economic management\n", + "3 Social protection and risk management\n", + "4 Trade and integration\n", + " ...
\n", + "1494 Rural development\n", + "1495 Urban development\n", + "1496 Human development\n", + "1497 Trade and integration\n", + "1498 Financial and private sector development\n", + "Name: name, Length: 1499, dtype: object" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mjtheme['name']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. In 2. above you will notice that some entries have only the code and the name is missing. Create a dataframe with the missing names filled in." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 Human development\n", + "1 Environment and natural resources management\n", + "2 Economic management\n", + "3 Social protection and risk management\n", + "4 Trade and integration\n", + " ... \n", + "1494 Rural development\n", + "1495 Urban development\n", + "1496 Human development\n", + "1497 Trade and integration\n", + "1498 Financial and private sector development\n", + "Name: name, Length: 1499, dtype: object" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Fill the empty names using the code -> name lookup built above\n", + "missing = mjtheme['name'] == ''\n", + "mjtheme.loc[missing, 'name'] = mjtheme.loc[missing, 'code'].map(project_dict)\n", + "mjtheme['name']" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mjtheme['name'].isna().any()" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "name\n", + "Environment and natural resources management 250\n", + "Rural development 216\n", + "Human development
210\n", + "Public sector governance 199\n", + "Social protection and risk management 168\n", + "Financial and private sector development 146\n", + "Social dev/gender/inclusion 130\n", + "Trade and integration 77\n", + "Urban development 50\n", + "Economic management 38\n", + "Name: code, dtype: int64" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top 10 major project themes\n", + "top_projects = mjtheme.groupby('name')['code'].count().sort_values(ascending=False).head(10)\n", + "top_projects" + ] + }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": true - }, + "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 2", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "python2" + "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.9" + "pygments_lexer": "ipython3", + "version": "3.10.6" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 } From 68c9d677055ef00166d7046408d473b463dd971f Mon Sep 17 00:00:00 2001 From: Conor Smith Date: Tue, 8 Nov 2022 06:30:52 -0500 Subject: [PATCH 6/7] Completed Databricks --- Mini_Project_SQL_with_Spark 5.6.ipynb | 1 + 1 file changed, 1 insertion(+) create mode 100644 Mini_Project_SQL_with_Spark 5.6.ipynb diff --git a/Mini_Project_SQL_with_Spark 5.6.ipynb b/Mini_Project_SQL_with_Spark 5.6.ipynb new file mode 100644 index 00000000..5369b254 --- /dev/null +++ b/Mini_Project_SQL_with_Spark 5.6.ipynb @@ -0,0 +1 @@ +{"cells":[{"cell_type":"markdown","source":["## SQL at Scale with Spark SQL\n\nWelcome to the SQL mini project. For this project, you will use the Databricks Platform and work through a series of exercises using Spark SQL.
The dataset is not very large, but the intent here is to familiarize yourself with the Spark SQL interface, which scales easily to huge datasets without requiring any changes to your SQL queries. \n\nThe data you need is in the mini-project folder in the form of three CSV files. This data will be imported into Databricks to create the following tables under the __`country_club`__ database.\n\n
\n1. The __`bookings`__ table,\n2. The __`facilities`__ table, and\n3. The __`members`__ table.\n\nYou will shortly upload these datasets into the Databricks platform and see how to create a database within minutes! Once the database and the tables are populated, you will focus on the mini-project questions.\n\nIn the mini project, you'll be asked a series of questions. You can solve them using the Databricks platform, but for the final deliverable,\nplease download this notebook as an IPython notebook (__`File -> Export -> IPython Notebook`__) and upload it to your GitHub."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"7dc8cef6-8322-4e3a-950b-757de959bbd7","inputWidgets":{},"title":""}}},{"cell_type":"markdown","source":["### Creating the Database\n\nWe will first create the database that will hold our three tables of interest."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"3bd664ca-d7cc-4b4d-9c35-9957dd665c78","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql \ndrop database if exists country_club cascade;\ncreate database country_club;\nshow
databases;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"98ba3faa-c4e8-48ef-9cc8-e2226e31582d","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club"],["default"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"databaseName","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Creating the Tables\n\nIn this section, we will create the three tables of interest and populate them with the data from the CSV files already available to you. \nTo get started, first upload the three CSV files to DBFS as shown in the following figure:\n\n![](https://i.imgur.com/QcCruBr.png)\n\n\nOnce you have done this, remember to execute the following code to build the DataFrames that will be saved as tables in our database."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"89fe2dd6-f130-4979-abff-3cfd7eefc14f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["# File location and type\nfile_location_bookings = \"/FileStore/tables/Bookings.csv\"\nfile_location_facilities = \"/FileStore/tables/Facilities.csv\"\nfile_location_members = \"/FileStore/tables/Members.csv\"\n\nfile_type = \"csv\"\n\n# CSV options\ninfer_schema = \"true\"\nfirst_row_is_header = \"true\"\ndelimiter = \",\"\n\n# The applied options are for CSV files.
For other file types, these will be ignored.\nbookings_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_bookings))\n\nfacilities_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_facilities))\n\nmembers_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_members))"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"936f355f-a485-4d3c-9a04-87bb55965d65","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Viewing the DataFrame schemas\n\nLet's look at the schemas of the DataFrames that will soon be written to our database as tables."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"f10ed1f5-65a6-4bc3-a902-606102a12222","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["print('Bookings Schema')\nbookings_df.printSchema()\nprint('Facilities Schema')\nfacilities_df.printSchema()\nprint('Members Schema')\nmembers_df.printSchema()"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"45dd3bb9-3cc9-415b-a0a7-891c8a0ade8c","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double
(nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double (nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Create permanent tables\nAs discussed previously, the following code creates the three permanent tables in our __`country_club`__ database."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"8766081c-ff5f-4bfa-870c-dcb7f9d1698c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["permanent_table_name_bookings = 
\"country_club.Bookings1\"\nbookings_df.write.format(\"parquet\").saveAsTable(permanent_table_name_bookings)\n\npermanent_table_name_facilities = \"country_club.Facilities1\"\nfacilities_df.write.format(\"parquet\").saveAsTable(permanent_table_name_facilities)\n\npermanent_table_name_members = \"country_club.Members1\"\nmembers_df.write.format(\"parquet\").saveAsTable(permanent_table_name_members)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a989021a-29b8-4159-9a8d-5f3a707379e3","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Refresh tables and check them"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a8d01df0-94bc-4097-845e-02e7e1637e4f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nuse country_club;\nREFRESH table bookings1;\nREFRESH table facilities1;\nREFRESH table members1;\nshow tables;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"11600185-2386-4341-89cb-66d50b9a29ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club","bookings1",false],["country_club","facilities1",false],["country_club","members1",false]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"database","type":"\"string\"","metadata":"{}"},{"name":"tableName","type":"\"string\"","metadata":"{}"},{"name":"isTemporary","type":"\"boolean\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_ty
pe":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Test a sample SQL query\n\n__Note:__ You can use __`%sql`__ at the beginning of a cell and write SQL queries directly, as seen in the following cell. Neat, isn't it?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"fdae66bd-5e7a-48f5-b715-ac4f761050ae","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nselect * from bookings1 limit 3"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"4339b5ab-b006-4458-aad5-b1f0a5c1ec87","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,3,1,"2012-07-03T11:00:00.000+0000",2],[1,4,1,"2012-07-03T08:00:00.000+0000",2],[2,6,0,"2012-07-03T18:00:00.000+0000",2]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"bookid","type":"\"integer\"","metadata":"{}"},{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"memid","type":"\"integer\"","metadata":"{}"},{"name":"starttime","type":"\"timestamp\"","metadata":"{}"},{"name":"slots","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q1: Some of the facilities charge a fee to members, but some do not. Please list the names of the facilities that do."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"17c520af-243e-4a39-8a24-ea6aa3b6a368","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name FROM facilities1 WHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"21f137b3-edf7-4c65-853a-42b836fa3481","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Badminton Court"],["Table Tennis"],["Snooker Table"],["Pool Table"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q2: How many facilities do not charge a fee to members?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"79bf7b92-87d8-4efb-ba7d-f0edcb59cc4b","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT COUNT(*) AS Count FROM facilities1 WHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"0b10a941-41e4-4145-b853-801859a6bfa5","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[4]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Count","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q3: How can you produce a list of facilities that charge a fee to members, where the fee is less than 20% of the facility's monthly maintenance cost? \n#### Return the facid, facility name, member cost, and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"bc6cd845-0be6-4c95-ade4-7d52c3a13cc8","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facid, name, membercost, monthlymaintenance FROM facilities1 WHERE (membercost > 0) AND (membercost < monthlymaintenance * .2)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"d35a57f8-07ea-42dc-9f4f-53694daefff1","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,"Tennis Court 1",5.0,200],[1,"Tennis Court 2",5.0,200],[4,"Massage Room 1",9.9,3000],[5,"Massage Room 2",9.9,3000],[6,"Squash Court",3.5,80]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q4: How can you retrieve the details of facilities with ID 1 and 5? Write the query without using the OR operator."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"9bc31a3f-ab2c-413c-9b99-46581023ae0c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT * FROM facilities1 WHERE facid IN (1, 5)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"cb034b10-5840-43e9-a25a-62503daa7c09","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1,"Tennis Court 2",5.0,25.0,8000,200],[5,"Massage Room 2",9.9,80.0,4000,3000]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"guestcost","type":"\"double\"","metadata":"{}"},{"name":"initialoutlay","type":"\"integer\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q5: How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive', depending on if their monthly maintenance cost is more than $100? \n#### Return the name and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"0e0302a2-2911-41be-9599-e12323e7f23c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name, monthlymaintenance, CASE WHEN monthlymaintenance > 100 THEN \"expensive\" ELSE \"cheap\" END AS value FROM facilities1;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"41373ea4-9038-4c8f-842f-8aae7b074809","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 1",200,"expensive"],["Tennis Court 2",200,"expensive"],["Badminton Court",50,"cheap"],["Table Tennis",10,"cheap"],["Massage Room 1",3000,"expensive"],["Massage Room 2",3000,"expensive"],["Squash Court",80,"cheap"],["Snooker Table",15,"cheap"],["Pool Table",15,"cheap"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"},{"name":"value","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q6: You'd like to get the first and last name of the last member(s) who signed up. Do not use the LIMIT clause for your solution."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"30f9e29f-9608-4c5d-a371-fc4cb22f9ea2","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT firstname, surname FROM members1 WHERE joindate in (SELECT MAX(joindate) FROM members1)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"74bf3f5b-924d-4d90-b978-ffd456c22f43","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Darren","Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"firstname","type":"\"string\"","metadata":"{}"},{"name":"surname","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q7: How can you produce a list of all members who have used a tennis court?\n- Include in your output the name of the court, and the name of the member formatted as a single column. \n- Ensure no duplicate data\n- Also order by the member name."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"ded40971-9804-46e8-a647-5b9cefce363e","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT DISTINCT facilities1.name AS Court_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name FROM ((bookings1 INNER JOIN members1 ON bookings1.memid = members1.memid) INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) WHERE facilities1.name LIKE \"Tennis Court%\" ORDER BY Member_Name;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"879ff42d-7f1d-47e6-a828-5cd82775c0ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 2","Anne Baker"],["Tennis Court 1","Anne Baker"],["Tennis Court 2","Burton Tracy"],["Tennis Court 1","Burton Tracy"],["Tennis Court 1","Charles Owen"],["Tennis Court 2","Charles Owen"],["Tennis Court 2","Darren Smith"],["Tennis Court 2","David Farrell"],["Tennis Court 1","David Farrell"],["Tennis Court 2","David Jones"],["Tennis Court 1","David Jones"],["Tennis Court 1","David Pinker"],["Tennis Court 1","Douglas Jones"],["Tennis Court 1","Erica Crumpet"],["Tennis Court 1","Florence Bader"],["Tennis Court 2","Florence Bader"],["Tennis Court 1","GUEST GUEST"],["Tennis Court 2","GUEST GUEST"],["Tennis Court 2","Gerald Butters"],["Tennis Court 1","Gerald Butters"],["Tennis Court 2","Henrietta Rumney"],["Tennis Court 1","Jack Smith"],["Tennis Court 2","Jack Smith"],["Tennis Court 2","Janice Joplette"],["Tennis Court 
1","Janice Joplette"],["Tennis Court 2","Jemima Farrell"],["Tennis Court 1","Jemima Farrell"],["Tennis Court 1","Joan Coplin"],["Tennis Court 1","John Hunt"],["Tennis Court 2","John Hunt"],["Tennis Court 1","Matthew Genting"],["Tennis Court 2","Millicent Purview"],["Tennis Court 2","Nancy Dare"],["Tennis Court 1","Nancy Dare"],["Tennis Court 1","Ponder Stibbons"],["Tennis Court 2","Ponder Stibbons"],["Tennis Court 1","Ramnaresh Sarwin"],["Tennis Court 2","Ramnaresh Sarwin"],["Tennis Court 1","Tim Boothe"],["Tennis Court 2","Tim Boothe"],["Tennis Court 2","Tim Rownam"],["Tennis Court 1","Tim Rownam"],["Tennis Court 2","Timothy Baker"],["Tennis Court 1","Timothy Baker"],["Tennis Court 2","Tracy Smith"],["Tennis Court 1","Tracy Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Court_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q8: How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? \n\n- Remember that guests have different costs to members (the listed costs are per half-hour 'slot')\n- The guest user's ID is always 0. \n\n#### Include in your output the name of the facility, the name of the member formatted as a single column, and the cost.\n\n- Order by descending cost, and do not use any subqueries."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"eb23ed45-ca1c-46b3-9371-ccf3d2904fb9","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name, CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END AS Total_Cost FROM ((bookings1 INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) INNER JOIN members1 ON bookings1.memid = members1.memid) WHERE bookings1.starttime LIKE \"2012-09-14%\" AND CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots > 30 ELSE facilities1.membercost * bookings1.slots > 30 END ORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"3ec2175c-8f0f-45fd-ae9a-414a6fb3ce28","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 
1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q9: This time, produce the same result as in Q8, but using a subquery."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"757c6468-3d07-42e2-b2b9-59e82b96350a","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name, CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots ELSE facilities1.membercost * booking.slots END AS Total_Cost FROM (((SELECT * FROM bookings1 WHERE starttime LIKE \"2012-09-14%\") AS booking INNER JOIN facilities1 ON booking.facid = facilities1.facid) INNER JOIN members1 ON booking.memid = members1.memid) WHERE CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots > 30 ELSE facilities1.membercost * booking.slots > 30 END ORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"72f9d8b6-2d51-4af1-9fa1-d183a0369d30","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST 
GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q10: Produce a list of facilities with a total revenue less than 1000.\n- The output should have facility name and total revenue, sorted by revenue. \n- Remember that there's a different cost for guests and members!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"dc14e8de-3daa-4339-b78c-2a8d78e599d1","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name, SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) AS Total_Revenue FROM ((bookings1 INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) INNER JOIN members1 ON bookings1.memid = members1.memid) GROUP BY facilities1.name HAVING SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) < 1000 ORDER BY Total_Revenue;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"53422808-236b-4ebd-af5f-abc9c1bb70de","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Table Tennis",180.0],["Snooker Table",240.0],["Pool Table",270.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"Total_Revenue","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"code","source":[""],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"1ff7e759-05a7-4e6f-8e4f-9cc05a74316c","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0}],"metadata":{"name":"Mini_Project_SQL_with_Spark","notebookId":1931807081501742,"application/vnd.databricks.v1+notebook":{"notebookName":"Mini_Project_SQL_with_Spark","dashboards":[],"notebookMetadata":{"pythonIndentUnit":4,"mostRecentlyExecutedCommandWithImplicitDF":{"commandId":551598812990966,"dataframes":["_sqldf"]}},"language":"python","widgets":{},"notebookOrigID":551598812990935}},"nbformat":4,"nbformat_minor":0}
From 7f1076b00b784427384403f4783638f07f96129a Mon Sep 17 00:00:00 2001
From: Conor Smith
Date: Tue, 8 Nov 2022 06:43:23 -0500
Subject: [PATCH 7/7] Reformatted

---
 Mini_Project_SQL_with_Spark 5.6.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Mini_Project_SQL_with_Spark 5.6.ipynb b/Mini_Project_SQL_with_Spark 5.6.ipynb
index 5369b254..e1bd71e6 100644
--- a/Mini_Project_SQL_with_Spark 5.6.ipynb
+++ b/Mini_Project_SQL_with_Spark 5.6.ipynb
@@ -1 +1 @@
-{"cells":[{"cell_type":"markdown","source":["## SQL at Scale with Spark SQL\n\nWelcome to the SQL mini project. For this project, you will use the Databricks Platform and work through a series of exercises using Spark SQL. The dataset size may not be too big, but the intent here is to familiarize yourself with the Spark SQL interface, which scales easily to huge datasets without you having to worry about changing your SQL queries. \n\nThe data you need is present in the mini-project folder in the form of three CSV files. This data will be imported into Databricks to create the following tables under the __`country_club`__ database.\n\n
\n1. The __`bookings`__ table,\n2. The __`facilities`__ table, and\n3. The __`members`__ table.\n\nYou will shortly upload these datasets to the Databricks platform and see how to create a database within minutes! Once the database and tables are populated, you will focus on the mini-project questions.\n\nIn the mini project, you'll be asked a series of questions. You can solve them using the Databricks platform, but for the final deliverable,\nplease download this notebook as an IPython notebook (__`File -> Export -> IPython Notebook`__) and upload it to your GitHub."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"7dc8cef6-8322-4e3a-950b-757de959bbd7","inputWidgets":{},"title":""}}},{"cell_type":"markdown","source":["### Creating the Database\n\nWe will first create the database that will hold our three tables of interest."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"3bd664ca-d7cc-4b4d-9c35-9957dd665c78","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql \ndrop database if exists country_club cascade;\ncreate database country_club;\nshow 
databases;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"98ba3faa-c4e8-48ef-9cc8-e2226e31582d","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club"],["default"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"databaseName","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Creating the Tables\n\nIn this section, we will create the three tables of interest and populate them with the data from the CSV files already available to you. \nTo get started, first upload the three CSV files to DBFS, as shown in the following figure:\n\n![](https://i.imgur.com/QcCruBr.png)\n\n\nOnce you have done this, please remember to execute the following code to build the dataframes that will be saved as tables in our database."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"89fe2dd6-f130-4979-abff-3cfd7eefc14f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["# File location and type\nfile_location_bookings = \"/FileStore/tables/Bookings.csv\"\nfile_location_facilities = \"/FileStore/tables/Facilities.csv\"\nfile_location_members = \"/FileStore/tables/Members.csv\"\n\nfile_type = \"csv\"\n\n# CSV options\ninfer_schema = \"true\"\nfirst_row_is_header = \"true\"\ndelimiter = \",\"\n\n# The applied options are for CSV files. 
For other file types, these will be ignored.\nbookings_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_bookings))\n\nfacilities_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_facilities))\n\nmembers_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_members))"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"936f355f-a485-4d3c-9a04-87bb55965d65","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Viewing the dataframe schemas\n\nWe can take a look at the schemas of the tables we will soon write to our database."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"f10ed1f5-65a6-4bc3-a902-606102a12222","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["print('Bookings Schema')\nbookings_df.printSchema()\nprint('Facilities Schema')\nfacilities_df.printSchema()\nprint('Members Schema')\nmembers_df.printSchema()"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"45dd3bb9-3cc9-415b-a0a7-891c8a0ade8c","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double
(nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double (nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Create permanent tables\nWe will be creating three permanent tables here in our __`country_club`__ database as we discussed previously with the following code"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"8766081c-ff5f-4bfa-870c-dcb7f9d1698c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["permanent_table_name_bookings = 
\"country_club.Bookings1\"\nbookings_df.write.format(\"parquet\").saveAsTable(permanent_table_name_bookings)\n\npermanent_table_name_facilities = \"country_club.Facilities1\"\nfacilities_df.write.format(\"parquet\").saveAsTable(permanent_table_name_facilities)\n\npermanent_table_name_members = \"country_club.Members1\"\nmembers_df.write.format(\"parquet\").saveAsTable(permanent_table_name_members)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a989021a-29b8-4159-9a8d-5f3a707379e3","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Refresh tables and check them"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a8d01df0-94bc-4097-845e-02e7e1637e4f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nuse country_club;\nREFRESH table bookings1;\nREFRESH table facilities1;\nREFRESH table members1;\nshow tables;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"11600185-2386-4341-89cb-66d50b9a29ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club","bookings1",false],["country_club","facilities1",false],["country_club","members1",false]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"database","type":"\"string\"","metadata":"{}"},{"name":"tableName","type":"\"string\"","metadata":"{}"},{"name":"isTemporary","type":"\"boolean\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_ty
pe":"display_data","data":{"text/html":["
database | tableName | isTemporary
country_club | bookings1 | false
country_club | facilities1 | false
country_club | members1 | false
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Test a sample SQL query\n\n__Note:__ You can use __`%sql`__ at the beginning of a cell and write SQL queries directly as seen in the following cell. Neat isn't it!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"fdae66bd-5e7a-48f5-b715-ac4f761050ae","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nselect * from bookings1 limit 3"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"4339b5ab-b006-4458-aad5-b1f0a5c1ec87","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,3,1,"2012-07-03T11:00:00.000+0000",2],[1,4,1,"2012-07-03T08:00:00.000+0000",2],[2,6,0,"2012-07-03T18:00:00.000+0000",2]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"bookid","type":"\"integer\"","metadata":"{}"},{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"memid","type":"\"integer\"","metadata":"{}"},{"name":"starttime","type":"\"timestamp\"","metadata":"{}"},{"name":"slots","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
bookid | facid | memid | starttime | slots
0 | 3 | 1 | 2012-07-03T11:00:00.000+0000 | 2
1 | 4 | 1 | 2012-07-03T08:00:00.000+0000 | 2
2 | 6 | 0 | 2012-07-03T18:00:00.000+0000 | 2
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q1: Some of the facilities charge a fee to members, but some do not. Please list the names of the facilities that do."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"17c520af-243e-4a39-8a24-ea6aa3b6a368","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name FROM facilities1 WHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"21f137b3-edf7-4c65-853a-42b836fa3481","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Badminton Court"],["Table Tennis"],["Snooker Table"],["Pool Table"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
name
Badminton Court
Table Tennis
Snooker Table
Pool Table
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q2: How many facilities do not charge a fee to members?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"79bf7b92-87d8-4efb-ba7d-f0edcb59cc4b","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT COUNT(*) AS Count FROM facilities1 WHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"0b10a941-41e4-4145-b853-801859a6bfa5","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[4]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Count","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
Count
4
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q3: How can you produce a list of facilities that charge a fee to members, where the fee is less than 20% of the facility's monthly maintenance cost? \n#### Return the facid, facility name, member cost, and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"bc6cd845-0be6-4c95-ade4-7d52c3a13cc8","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facid, name, membercost, monthlymaintenance FROM facilities1 WHERE (membercost > 0) AND (membercost < monthlymaintenance * .2)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"d35a57f8-07ea-42dc-9f4f-53694daefff1","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,"Tennis Court 1",5.0,200],[1,"Tennis Court 2",5.0,200],[4,"Massage Room 1",9.9,3000],[5,"Massage Room 2",9.9,3000],[6,"Squash Court",3.5,80]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
facid | name | membercost | monthlymaintenance
0 | Tennis Court 1 | 5.0 | 200
1 | Tennis Court 2 | 5.0 | 200
4 | Massage Room 1 | 9.9 | 3000
5 | Massage Room 2 | 9.9 | 3000
6 | Squash Court | 3.5 | 80
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q4: How can you retrieve the details of facilities with ID 1 and 5? Write the query without using the OR operator."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"9bc31a3f-ab2c-413c-9b99-46581023ae0c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT * FROM facilities1 WHERE facid IN (1, 5)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"cb034b10-5840-43e9-a25a-62503daa7c09","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1,"Tennis Court 2",5.0,25.0,8000,200],[5,"Massage Room 2",9.9,80.0,4000,3000]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"guestcost","type":"\"double\"","metadata":"{}"},{"name":"initialoutlay","type":"\"integer\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
facid | name | membercost | guestcost | initialoutlay | monthlymaintenance
1 | Tennis Court 2 | 5.0 | 25.0 | 8000 | 200
5 | Massage Room 2 | 9.9 | 80.0 | 4000 | 3000
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q5: How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive', depending on if their monthly maintenance cost is more than $100? \n#### Return the name and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"0e0302a2-2911-41be-9599-e12323e7f23c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name, monthlymaintenance, CASE WHEN monthlymaintenance > 100 THEN \"expensive\" ELSE \"cheap\" END AS value FROM facilities1;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"41373ea4-9038-4c8f-842f-8aae7b074809","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 1",200,"expensive"],["Tennis Court 2",200,"expensive"],["Badminton Court",50,"cheap"],["Table Tennis",10,"cheap"],["Massage Room 1",3000,"expensive"],["Massage Room 2",3000,"expensive"],["Squash Court",80,"cheap"],["Snooker Table",15,"cheap"],["Pool Table",15,"cheap"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"},{"name":"value","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
name | monthlymaintenance | value
Tennis Court 1 | 200 | expensive
Tennis Court 2 | 200 | expensive
Badminton Court | 50 | cheap
Table Tennis | 10 | cheap
Massage Room 1 | 3000 | expensive
Massage Room 2 | 3000 | expensive
Squash Court | 80 | cheap
Snooker Table | 15 | cheap
Pool Table | 15 | cheap
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q6: You'd like to get the first and last name of the last member(s) who signed up. Do not use the LIMIT clause for your solution."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"30f9e29f-9608-4c5d-a371-fc4cb22f9ea2","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT firstname, surname FROM members1 WHERE joindate in (SELECT MAX(joindate) FROM members1)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"74bf3f5b-924d-4d90-b978-ffd456c22f43","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Darren","Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"firstname","type":"\"string\"","metadata":"{}"},{"name":"surname","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
firstname | surname
Darren | Smith
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q7: How can you produce a list of all members who have used a tennis court?\n- Include in your output the name of the court, and the name of the member formatted as a single column. \n- Ensure no duplicate data\n- Also order by the member name."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"ded40971-9804-46e8-a647-5b9cefce363e","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT DISTINCT facilities1.name AS Court_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name FROM ((bookings1 INNER JOIN members1 ON bookings1.memid = members1.memid) INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) WHERE facilities1.name LIKE \"Tennis Court%\" ORDER BY Member_Name;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"879ff42d-7f1d-47e6-a828-5cd82775c0ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 2","Anne Baker"],["Tennis Court 1","Anne Baker"],["Tennis Court 2","Burton Tracy"],["Tennis Court 1","Burton Tracy"],["Tennis Court 1","Charles Owen"],["Tennis Court 2","Charles Owen"],["Tennis Court 2","Darren Smith"],["Tennis Court 2","David Farrell"],["Tennis Court 1","David Farrell"],["Tennis Court 2","David Jones"],["Tennis Court 1","David Jones"],["Tennis Court 1","David Pinker"],["Tennis Court 1","Douglas Jones"],["Tennis Court 1","Erica Crumpet"],["Tennis Court 1","Florence Bader"],["Tennis Court 2","Florence Bader"],["Tennis Court 1","GUEST GUEST"],["Tennis Court 2","GUEST GUEST"],["Tennis Court 2","Gerald Butters"],["Tennis Court 1","Gerald Butters"],["Tennis Court 2","Henrietta Rumney"],["Tennis Court 1","Jack Smith"],["Tennis Court 2","Jack Smith"],["Tennis Court 2","Janice Joplette"],["Tennis Court 
1","Janice Joplette"],["Tennis Court 2","Jemima Farrell"],["Tennis Court 1","Jemima Farrell"],["Tennis Court 1","Joan Coplin"],["Tennis Court 1","John Hunt"],["Tennis Court 2","John Hunt"],["Tennis Court 1","Matthew Genting"],["Tennis Court 2","Millicent Purview"],["Tennis Court 2","Nancy Dare"],["Tennis Court 1","Nancy Dare"],["Tennis Court 1","Ponder Stibbons"],["Tennis Court 2","Ponder Stibbons"],["Tennis Court 1","Ramnaresh Sarwin"],["Tennis Court 2","Ramnaresh Sarwin"],["Tennis Court 1","Tim Boothe"],["Tennis Court 2","Tim Boothe"],["Tennis Court 2","Tim Rownam"],["Tennis Court 1","Tim Rownam"],["Tennis Court 2","Timothy Baker"],["Tennis Court 1","Timothy Baker"],["Tennis Court 2","Tracy Smith"],["Tennis Court 1","Tracy Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Court_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
Court_Name | Member_Name
Tennis Court 2 | Anne Baker
Tennis Court 1 | Anne Baker
Tennis Court 2 | Burton Tracy
Tennis Court 1 | Burton Tracy
Tennis Court 1 | Charles Owen
Tennis Court 2 | Charles Owen
Tennis Court 2 | Darren Smith
Tennis Court 2 | David Farrell
Tennis Court 1 | David Farrell
Tennis Court 2 | David Jones
Tennis Court 1 | David Jones
Tennis Court 1 | David Pinker
Tennis Court 1 | Douglas Jones
Tennis Court 1 | Erica Crumpet
Tennis Court 1 | Florence Bader
Tennis Court 2 | Florence Bader
Tennis Court 1 | GUEST GUEST
Tennis Court 2 | GUEST GUEST
Tennis Court 2 | Gerald Butters
Tennis Court 1 | Gerald Butters
Tennis Court 2 | Henrietta Rumney
Tennis Court 1 | Jack Smith
Tennis Court 2 | Jack Smith
Tennis Court 2 | Janice Joplette
Tennis Court 1 | Janice Joplette
Tennis Court 2 | Jemima Farrell
Tennis Court 1 | Jemima Farrell
Tennis Court 1 | Joan Coplin
Tennis Court 1 | John Hunt
Tennis Court 2 | John Hunt
Tennis Court 1 | Matthew Genting
Tennis Court 2 | Millicent Purview
Tennis Court 2 | Nancy Dare
Tennis Court 1 | Nancy Dare
Tennis Court 1 | Ponder Stibbons
Tennis Court 2 | Ponder Stibbons
Tennis Court 1 | Ramnaresh Sarwin
Tennis Court 2 | Ramnaresh Sarwin
Tennis Court 1 | Tim Boothe
Tennis Court 2 | Tim Boothe
Tennis Court 2 | Tim Rownam
Tennis Court 1 | Tim Rownam
Tennis Court 2 | Timothy Baker
Tennis Court 1 | Timothy Baker
Tennis Court 2 | Tracy Smith
Tennis Court 1 | Tracy Smith
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q8: How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? \n\n- Remember that guests have different costs to members (the listed costs are per half-hour 'slot')\n- The guest user's ID is always 0. \n\n#### Include in your output the name of the facility, the name of the member formatted as a single column, and the cost.\n\n- Order by descending cost, and do not use any subqueries."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"eb23ed45-ca1c-46b3-9371-ccf3d2904fb9","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name, CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END AS Total_Cost FROM ((bookings1 INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) INNER JOIN members1 ON bookings1.memid = members1.memid) WHERE bookings1.starttime LIKE \"2012-09-14%\" AND CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots > 30 ELSE facilities1.membercost * bookings1.slots > 30 END ORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"3ec2175c-8f0f-45fd-ae9a-414a6fb3ce28","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 
1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
Facility_Name | Member_Name | Total_Cost
Massage Room 2 | GUEST GUEST | 320.0
Massage Room 1 | GUEST GUEST | 160.0
Massage Room 1 | GUEST GUEST | 160.0
Massage Room 1 | GUEST GUEST | 160.0
Tennis Court 2 | GUEST GUEST | 150.0
Tennis Court 2 | GUEST GUEST | 75.0
Tennis Court 1 | GUEST GUEST | 75.0
Tennis Court 1 | GUEST GUEST | 75.0
Squash Court | GUEST GUEST | 70.0
Massage Room 1 | Jemima Farrell | 39.6
Squash Court | GUEST GUEST | 35.0
Squash Court | GUEST GUEST | 35.0
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q9: This time, produce the same result as in Q8, but using a subquery."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"757c6468-3d07-42e2-b2b9-59e82b96350a","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name, CONCAT(members1.firstname, \" \", members1.surname) AS Member_Name, CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots ELSE facilities1.membercost * booking.slots END AS Total_Cost FROM (((SELECT * FROM bookings1 WHERE starttime LIKE \"2012-09-14%\") AS booking INNER JOIN facilities1 ON booking.facid = facilities1.facid) INNER JOIN members1 ON booking.memid = members1.memid) WHERE CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots > 30 ELSE facilities1.membercost * booking.slots > 30 END ORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"72f9d8b6-2d51-4af1-9fa1-d183a0369d30","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST 
GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
Facility_Name | Member_Name | Total_Cost
Massage Room 2 | GUEST GUEST | 320.0
Massage Room 1 | GUEST GUEST | 160.0
Massage Room 1 | GUEST GUEST | 160.0
Massage Room 1 | GUEST GUEST | 160.0
Tennis Court 2 | GUEST GUEST | 150.0
Tennis Court 2 | GUEST GUEST | 75.0
Tennis Court 1 | GUEST GUEST | 75.0
Tennis Court 1 | GUEST GUEST | 75.0
Squash Court | GUEST GUEST | 70.0
Massage Room 1 | Jemima Farrell | 39.6
Squash Court | GUEST GUEST | 35.0
Squash Court | GUEST GUEST | 35.0
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q10: Produce a list of facilities with a total revenue less than 1000.\n- The output should have facility name and total revenue, sorted by revenue. \n- Remember that there's a different cost for guests and members!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"dc14e8de-3daa-4339-b78c-2a8d78e599d1","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name, SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) AS Total_Revenue FROM ((bookings1 INNER JOIN facilities1 ON bookings1.facid = facilities1.facid) INNER JOIN members1 ON bookings1.memid = members1.memid) GROUP BY facilities1.name HAVING SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) < 1000 ORDER BY Total_Revenue;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"53422808-236b-4ebd-af5f-abc9c1bb70de","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Table Tennis",180.0],["Snooker Table",240.0],["Pool Table",270.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"Total_Revenue","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
name | Total_Revenue
Table Tennis | 180.0
Snooker Table | 240.0
Pool Table | 270.0
"]}}],"execution_count":0},{"cell_type":"code","source":[""],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"1ff7e759-05a7-4e6f-8e4f-9cc05a74316c","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0}],"metadata":{"name":"Mini_Project_SQL_with_Spark","notebookId":1931807081501742,"application/vnd.databricks.v1+notebook":{"notebookName":"Mini_Project_SQL_with_Spark","dashboards":[],"notebookMetadata":{"pythonIndentUnit":4,"mostRecentlyExecutedCommandWithImplicitDF":{"commandId":551598812990966,"dataframes":["_sqldf"]}},"language":"python","widgets":{},"notebookOrigID":551598812990935}},"nbformat":4,"nbformat_minor":0} +{"cells":[{"cell_type":"markdown","source":["## SQL at Scale with Spark SQL\n\nWelcome to the SQL mini project. For this project, you will use the Databricks Platform and work through a series of exercises using Spark SQL. The dataset size may not be too big but the intent here is to familiarize yourself with the Spark SQL interface which scales easily to huge datasets, without you having to worry about changing your SQL queries. \n\nThe data you need is present in the mini-project folder in the form of three CSV files. This data will be imported in Databricks to create the following tables under the __`country_club`__ database.\n\n
\n1. The __`bookings`__ table,\n2. The __`facilities`__ table, and\n3. The __`members`__ table.\n\nYou will be uploading these datasets shortly into the Databricks platform to understand how to create a database within minutes! Once the database and the tables are populated, you will be focusing on the mini-project questions.\n\nIn the mini project, you'll be asked a series of questions. You can solve them using the databricks platform, but for the final deliverable,\nplease download this notebook as an IPython notebook (__`File -> Export -> IPython Notebook`__) and upload it to your GitHub."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"7dc8cef6-8322-4e3a-950b-757de959bbd7","inputWidgets":{},"title":""}}},{"cell_type":"markdown","source":["### Creating the Database\n\nWe will first create our database in which we will be creating our three tables of interest"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"3bd664ca-d7cc-4b4d-9c35-9957dd665c78","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql \ndrop database if exists country_club cascade;\ncreate database country_club;\nshow 
databases;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"98ba3faa-c4e8-48ef-9cc8-e2226e31582d","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club"],["default"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"databaseName","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
databaseName
country_club
default
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Creating the Tables\n\nIn this section, we will be creating the three tables of interest and populate them with the data from the CSV files already available to you. \nTo get started, first upload the three CSV files to the DBFS as depicted in the following figure\n\n![](https://i.imgur.com/QcCruBr.png)\n\n\nOnce you have done this, please remember to execute the following code to build the dataframes which will be saved as tables in our database"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"89fe2dd6-f130-4979-abff-3cfd7eefc14f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["# File location and type\nfile_location_bookings = \"/FileStore/tables/Bookings.csv\"\nfile_location_facilities = \"/FileStore/tables/Facilities.csv\"\nfile_location_members = \"/FileStore/tables/Members.csv\"\n\nfile_type = \"csv\"\n\n# CSV options\ninfer_schema = \"true\"\nfirst_row_is_header = \"true\"\ndelimiter = \",\"\n\n# The applied options are for CSV files. 
For other file types, these will be ignored.\nbookings_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_bookings))\n\nfacilities_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_facilities))\n\nmembers_df = (spark.read.format(file_type) \n .option(\"inferSchema\", infer_schema) \n .option(\"header\", first_row_is_header) \n .option(\"sep\", delimiter) \n .load(file_location_members))"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"936f355f-a485-4d3c-9a04-87bb55965d65","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Viewing the DataFrame schemas\n\nBefore writing the DataFrames to our database as tables, we can inspect their schemas."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"f10ed1f5-65a6-4bc3-a902-606102a12222","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["print('Bookings Schema')\nbookings_df.printSchema()\nprint('Facilities Schema')\nfacilities_df.printSchema()\nprint('Members Schema')\nmembers_df.printSchema()"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"45dd3bb9-3cc9-415b-a0a7-891c8a0ade8c","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"datasetInfos":[],"data":"Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double 
(nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n","removedWidgets":[],"addedWidgets":{},"metadata":{},"type":"ansi","arguments":{}}},"output_type":"display_data","data":{"text/plain":["Bookings Schema\nroot\n |-- bookid: integer (nullable = true)\n |-- facid: integer (nullable = true)\n |-- memid: integer (nullable = true)\n |-- starttime: timestamp (nullable = true)\n |-- slots: integer (nullable = true)\n\nFacilities Schema\nroot\n |-- facid: integer (nullable = true)\n |-- name: string (nullable = true)\n |-- membercost: double (nullable = true)\n |-- guestcost: double (nullable = true)\n |-- initialoutlay: integer (nullable = true)\n |-- monthlymaintenance: integer (nullable = true)\n\nMembers Schema\nroot\n |-- memid: integer (nullable = true)\n |-- surname: string (nullable = true)\n |-- firstname: string (nullable = true)\n |-- address: string (nullable = true)\n |-- zipcode: integer (nullable = true)\n |-- telephone: string (nullable = true)\n |-- recommendedby: integer (nullable = true)\n |-- joindate: timestamp (nullable = true)\n\n"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Create permanent tables\n\nThe following code saves the three DataFrames as permanent tables in our __`country_club`__ database, as discussed previously."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"8766081c-ff5f-4bfa-870c-dcb7f9d1698c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["permanent_table_name_bookings = 
\"country_club.Bookings1\"\nbookings_df.write.format(\"parquet\").saveAsTable(permanent_table_name_bookings)\n\npermanent_table_name_facilities = \"country_club.Facilities1\"\nfacilities_df.write.format(\"parquet\").saveAsTable(permanent_table_name_facilities)\n\npermanent_table_name_members = \"country_club.Members1\"\nmembers_df.write.format(\"parquet\").saveAsTable(permanent_table_name_members)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a989021a-29b8-4159-9a8d-5f3a707379e3","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["### Refresh tables and check them"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"a8d01df0-94bc-4097-845e-02e7e1637e4f","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nuse country_club;\nREFRESH table bookings1;\nREFRESH table facilities1;\nREFRESH table members1;\nshow tables;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"11600185-2386-4341-89cb-66d50b9a29ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["country_club","bookings1",false],["country_club","facilities1",false],["country_club","members1",false]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"database","type":"\"string\"","metadata":"{}"},{"name":"tableName","type":"\"string\"","metadata":"{}"},{"name":"isTemporary","type":"\"boolean\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["### Test a sample SQL query\n\n__Note:__ You can use __`%sql`__ at the beginning of a cell and write SQL queries directly, as shown in the following cell. Neat, isn't it?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"fdae66bd-5e7a-48f5-b715-ac4f761050ae","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nselect * from bookings1 limit 3"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"4339b5ab-b006-4458-aad5-b1f0a5c1ec87","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,3,1,"2012-07-03T11:00:00.000+0000",2],[1,4,1,"2012-07-03T08:00:00.000+0000",2],[2,6,0,"2012-07-03T18:00:00.000+0000",2]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"bookid","type":"\"integer\"","metadata":"{}"},{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"memid","type":"\"integer\"","metadata":"{}"},{"name":"starttime","type":"\"timestamp\"","metadata":"{}"},{"name":"slots","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q1: Some of the facilities charge a fee to members, but some do not. Please list the names of the facilities that do."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"17c520af-243e-4a39-8a24-ea6aa3b6a368","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name \nFROM facilities1 \nWHERE membercost > 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"21f137b3-edf7-4c65-853a-42b836fa3481","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 1"],["Tennis Court 2"],["Massage Room 1"],["Massage Room 2"],["Squash Court"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
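Q1 and Q2 use complementary filters: `membercost > 0` selects the facilities that charge members, while `membercost = 0` selects the free ones. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for Spark SQL (an assumption for illustration only), loaded with the facility names and member costs that appear in this notebook's query outputs, to check that the two filters split the nine facilities with no overlap. The IDs of the four free facilities are assumed from their row order in the Q5 output.

```python
import sqlite3

# (facid, name, membercost) as shown in this notebook's outputs;
# IDs 2, 3, 7 and 8 are assumed from the row order of the Q5 output.
FACILITIES = [
    (0, 'Tennis Court 1', 5.0),
    (1, 'Tennis Court 2', 5.0),
    (2, 'Badminton Court', 0.0),
    (3, 'Table Tennis', 0.0),
    (4, 'Massage Room 1', 9.9),
    (5, 'Massage Room 2', 9.9),
    (6, 'Squash Court', 3.5),
    (7, 'Snooker Table', 0.0),
    (8, 'Pool Table', 0.0),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE facilities (facid INTEGER, name TEXT, membercost REAL)')
conn.executemany('INSERT INTO facilities VALUES (?, ?, ?)', FACILITIES)

# Q1: facilities that charge members a fee
paying = [row[0] for row in conn.execute(
    'SELECT name FROM facilities WHERE membercost > 0 ORDER BY facid')]

# Q2: number of facilities that are free for members
free_count = conn.execute(
    'SELECT COUNT(*) FROM facilities WHERE membercost = 0').fetchone()[0]

print(paying)
print(free_count)
```

If `membercost` could be NULL, neither filter would match that row, so the two results would no longer add up to the size of the table.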
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q2: How many facilities do not charge a fee to members?"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"79bf7b92-87d8-4efb-ba7d-f0edcb59cc4b","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT COUNT(*) AS Count \nFROM facilities1 \nWHERE membercost = 0;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"0b10a941-41e4-4145-b853-801859a6bfa5","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[4]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Count","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q3: How can you produce a list of facilities that charge a fee to members, where the fee is less than 20% of the facility's monthly maintenance cost? \n#### Return the facid, facility name, member cost, and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"bc6cd845-0be6-4c95-ade4-7d52c3a13cc8","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facid, \nname, \nmembercost, \nmonthlymaintenance \nFROM facilities1 \nWHERE (membercost > 0) \nAND (membercost < monthlymaintenance * .2)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"d35a57f8-07ea-42dc-9f4f-53694daefff1","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[0,"Tennis Court 1",5.0,200],[1,"Tennis Court 2",5.0,200],[4,"Massage Room 1",9.9,3000],[5,"Massage Room 2",9.9,3000],[6,"Squash Court",3.5,80]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
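The Q3 condition compares each member fee with 20% of the facility's monthly maintenance, so the Squash Court's fee of 3.5 qualifies because 0.2 × 80 = 16. A plain-Python sketch of the same filter follows, using the member costs and maintenance figures shown in this notebook's outputs (the function name `low_fee_facilities` is hypothetical). It also prints each fee as a share of monthly maintenance.

```python
# (facid, name, membercost, monthlymaintenance) as shown in this notebook's outputs;
# IDs 2, 3, 7 and 8 are assumed from the row order of the Q5 output.
FACILITIES = [
    (0, 'Tennis Court 1', 5.0, 200),
    (1, 'Tennis Court 2', 5.0, 200),
    (2, 'Badminton Court', 0.0, 50),
    (3, 'Table Tennis', 0.0, 10),
    (4, 'Massage Room 1', 9.9, 3000),
    (5, 'Massage Room 2', 9.9, 3000),
    (6, 'Squash Court', 3.5, 80),
    (7, 'Snooker Table', 0.0, 15),
    (8, 'Pool Table', 0.0, 15),
]


def low_fee_facilities(rows, share=0.2):
    # Keep facilities whose member fee is positive but below share * monthly maintenance
    return [row for row in rows if 0 < row[2] < row[3] * share]


for facid, name, fee, upkeep in low_fee_facilities(FACILITIES):
    print(facid, name, fee, upkeep, f'{fee / upkeep:.1%}')
```

The chained comparison `0 < fee < upkeep * share` mirrors the two conditions joined by `AND` in the SQL.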
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q4: How can you retrieve the details of facilities with ID 1 and 5? Write the query without using the OR operator."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"9bc31a3f-ab2c-413c-9b99-46581023ae0c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT * \nFROM facilities1 \nWHERE facid IN (1, 5)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"cb034b10-5840-43e9-a25a-62503daa7c09","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1,"Tennis Court 2",5.0,25.0,8000,200],[5,"Massage Room 2",9.9,80.0,4000,3000]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"facid","type":"\"integer\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"membercost","type":"\"double\"","metadata":"{}"},{"name":"guestcost","type":"\"double\"","metadata":"{}"},{"name":"initialoutlay","type":"\"integer\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q5: How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive', depending on whether their monthly maintenance cost is more than $100? \n#### Return the name and monthly maintenance of the facilities in question."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"0e0302a2-2911-41be-9599-e12323e7f23c","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT name, \nmonthlymaintenance, \nCASE WHEN monthlymaintenance > 100 \nTHEN \"expensive\" \nELSE \"cheap\" END AS value \nFROM facilities1;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"41373ea4-9038-4c8f-842f-8aae7b074809","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 1",200,"expensive"],["Tennis Court 2",200,"expensive"],["Badminton Court",50,"cheap"],["Table Tennis",10,"cheap"],["Massage Room 1",3000,"expensive"],["Massage Room 2",3000,"expensive"],["Squash Court",80,"cheap"],["Snooker Table",15,"cheap"],["Pool Table",15,"cheap"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"monthlymaintenance","type":"\"integer\"","metadata":"{}"},{"name":"value","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
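The `CASE WHEN condition THEN a ELSE b END` expression in Q5 assigns exactly one label per row, so a facility costing exactly $100 a month would fall into the `ELSE` branch and be labelled cheap. The sketch below reproduces the query in Python's built-in `sqlite3` module (a stand-in for Spark SQL, assumed only for illustration) with the maintenance figures from the Q5 output. It writes the labels with single quotes: standard SQL reserves double quotes for identifiers, so single-quoted string literals are the portable choice.

```python
import sqlite3

# (name, monthlymaintenance) pairs as shown in the Q5 output
FACILITIES = [
    ('Tennis Court 1', 200), ('Tennis Court 2', 200), ('Badminton Court', 50),
    ('Table Tennis', 10), ('Massage Room 1', 3000), ('Massage Room 2', 3000),
    ('Squash Court', 80), ('Snooker Table', 15), ('Pool Table', 15),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE facilities (name TEXT, monthlymaintenance INTEGER)')
conn.executemany('INSERT INTO facilities VALUES (?, ?)', FACILITIES)

# Single-quoted string literals; rows come back in insertion (rowid) order
QUERY = '''
SELECT name,
       monthlymaintenance,
       CASE WHEN monthlymaintenance > 100 THEN 'expensive' ELSE 'cheap' END AS label
FROM facilities
ORDER BY rowid
'''
labelled = conn.execute(QUERY).fetchall()
for row in labelled:
    print(row)
```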
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q6: You'd like to get the first and last name of the last member(s) who signed up. Do not use the LIMIT clause for your solution."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"30f9e29f-9608-4c5d-a371-fc4cb22f9ea2","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT firstname, \nsurname \nFROM members1 \nWHERE joindate in (SELECT MAX(joindate) FROM members1)"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"74bf3f5b-924d-4d90-b978-ffd456c22f43","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Darren","Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"firstname","type":"\"string\"","metadata":"{}"},{"name":"surname","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
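Comparing `joindate` with `(SELECT MAX(joindate) FROM members1)` returns every member who joined at the latest timestamp, which is why the question says member(s); with `ORDER BY joindate DESC LIMIT 1`, a tie would silently drop rows. The sketch below demonstrates this in Python's built-in `sqlite3` module, using three hypothetical members (not from the real table), two of whom share the latest join date. Zero-padded ISO date strings sort chronologically when compared as text.

```python
import sqlite3

# Hypothetical members (not from the real table); two share the latest join date
MEMBERS = [
    ('Ann', 'Early', '2012-07-01 09:00:00'),
    ('Ben', 'Late', '2012-09-30 10:00:00'),
    ('Cara', 'Late', '2012-09-30 10:00:00'),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE members (firstname TEXT, surname TEXT, joindate TEXT)')
conn.executemany('INSERT INTO members VALUES (?, ?, ?)', MEMBERS)

# Subquery approach: every member whose joindate equals the maximum
latest = conn.execute(
    'SELECT firstname, surname FROM members '
    'WHERE joindate = (SELECT MAX(joindate) FROM members) '
    'ORDER BY firstname'
).fetchall()

# LIMIT approach (ruled out by Q6): keeps only one of the tied rows
only_one = conn.execute(
    'SELECT firstname, surname FROM members ORDER BY joindate DESC LIMIT 1'
).fetchall()

print(latest)
print(only_one)
```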
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q7: How can you produce a list of all members who have used a tennis court?\n- Include in your output the name of the court, and the name of the member formatted as a single column. \n- Ensure no duplicate data\n- Also order by the member name."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"ded40971-9804-46e8-a647-5b9cefce363e","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT DISTINCT facilities1.name AS Court_Name, \nCONCAT(members1.firstname, \" \", members1.surname) AS Member_Name \nFROM ((bookings1 \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nWHERE facilities1.name LIKE \"Tennis Court%\" \nORDER BY Member_Name;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"879ff42d-7f1d-47e6-a828-5cd82775c0ee","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Tennis Court 2","Anne Baker"],["Tennis Court 1","Anne Baker"],["Tennis Court 2","Burton Tracy"],["Tennis Court 1","Burton Tracy"],["Tennis Court 1","Charles Owen"],["Tennis Court 2","Charles Owen"],["Tennis Court 2","Darren Smith"],["Tennis Court 2","David Farrell"],["Tennis Court 1","David Farrell"],["Tennis Court 2","David Jones"],["Tennis Court 1","David Jones"],["Tennis Court 1","David Pinker"],["Tennis Court 1","Douglas Jones"],["Tennis Court 1","Erica Crumpet"],["Tennis Court 1","Florence Bader"],["Tennis Court 2","Florence Bader"],["Tennis Court 1","GUEST GUEST"],["Tennis Court 2","GUEST GUEST"],["Tennis Court 2","Gerald Butters"],["Tennis Court 1","Gerald Butters"],["Tennis Court 2","Henrietta Rumney"],["Tennis Court 1","Jack Smith"],["Tennis Court 2","Jack Smith"],["Tennis Court 2","Janice 
Joplette"],["Tennis Court 1","Janice Joplette"],["Tennis Court 2","Jemima Farrell"],["Tennis Court 1","Jemima Farrell"],["Tennis Court 1","Joan Coplin"],["Tennis Court 1","John Hunt"],["Tennis Court 2","John Hunt"],["Tennis Court 1","Matthew Genting"],["Tennis Court 2","Millicent Purview"],["Tennis Court 2","Nancy Dare"],["Tennis Court 1","Nancy Dare"],["Tennis Court 1","Ponder Stibbons"],["Tennis Court 2","Ponder Stibbons"],["Tennis Court 1","Ramnaresh Sarwin"],["Tennis Court 2","Ramnaresh Sarwin"],["Tennis Court 1","Tim Boothe"],["Tennis Court 2","Tim Boothe"],["Tennis Court 2","Tim Rownam"],["Tennis Court 1","Tim Rownam"],["Tennis Court 2","Timothy Baker"],["Tennis Court 1","Timothy Baker"],["Tennis Court 2","Tracy Smith"],["Tennis Court 1","Tracy Smith"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Court_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
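In Q7, `DISTINCT` collapses repeated bookings of the same court by the same member into a single row. Because the query orders only by member name, the order of the two courts within one member is not guaranteed (the output above lists Tennis Court 2 first for some members and Tennis Court 1 first for others); adding the court name as a second sort key makes it deterministic. The sketch below uses Python's built-in `sqlite3` module with a few hypothetical bookings, and concatenates with the standard `||` operator, which SQLite supports (older SQLite versions have no `CONCAT` function).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Hypothetical rows; only the member and court names echo the notebook's data
conn.executescript('''
CREATE TABLE members (memid INTEGER, firstname TEXT, surname TEXT);
CREATE TABLE facilities (facid INTEGER, name TEXT);
CREATE TABLE bookings (bookid INTEGER, facid INTEGER, memid INTEGER);
INSERT INTO members VALUES (1, 'Anne', 'Baker'), (2, 'Darren', 'Smith');
INSERT INTO facilities VALUES (0, 'Tennis Court 1'), (1, 'Tennis Court 2'), (6, 'Squash Court');
INSERT INTO bookings VALUES (10, 0, 1), (11, 0, 1), (12, 1, 1), (13, 1, 2), (14, 6, 2);
''')

QUERY = '''
SELECT DISTINCT f.name AS court_name,
       m.firstname || ' ' || m.surname AS member_name
FROM bookings AS b
JOIN members AS m ON b.memid = m.memid
JOIN facilities AS f ON b.facid = f.facid
WHERE f.name LIKE 'Tennis Court%'
ORDER BY member_name, court_name
'''
rows = conn.execute(QUERY).fetchall()

# Without DISTINCT, Anne Baker's two bookings of Tennis Court 1 both appear
with_duplicates = len(conn.execute(QUERY.replace('SELECT DISTINCT', 'SELECT')).fetchall())

for row in rows:
    print(row)
print(with_duplicates)
```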
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q8: How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30? \n\n- Remember that guests have different costs to members (the listed costs are per half-hour 'slot')\n- The guest user's ID is always 0. \n\n#### Include in your output the name of the facility, the name of the member formatted as a single column, and the cost.\n\n- Order by descending cost, and do not use any subqueries."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"eb23ed45-ca1c-46b3-9371-ccf3d2904fb9","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name,\nCONCAT(members1.firstname, \" \", members1.surname) AS Member_Name,\nCASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots \nELSE facilities1.membercost * bookings1.slots END AS Total_Cost \nFROM ((bookings1 \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nWHERE bookings1.starttime LIKE \"2012-09-14%\" \nAND CASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots > 30 \nELSE facilities1.membercost * bookings1.slots > 30 END \nORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"3ec2175c-8f0f-45fd-ae9a-414a6fb3ce28","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST 
GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
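Because Q8 rules out subqueries, the cost expression has to appear twice, once in the `SELECT` list and once in `WHERE`, since standard SQL does not let `WHERE` refer to a `SELECT` alias of the same query. Two simplifications are still possible: the `CASE` can choose only the per-slot price, with `slots` multiplied once outside it, and the day can be matched by extracting the date instead of applying `LIKE` to a timestamp. The sketch below shows both in Python's built-in `sqlite3` module, where `date()` extracts the day from a text timestamp (Spark SQL's `to_date` function plays a similar role). Facility costs come from the Q4 output; the bookings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE facilities (facid INTEGER, name TEXT, membercost REAL, guestcost REAL);
CREATE TABLE members (memid INTEGER, firstname TEXT, surname TEXT);
CREATE TABLE bookings (facid INTEGER, memid INTEGER, starttime TEXT, slots INTEGER);
-- Costs from the Q4 output, members and bookings are hypothetical
INSERT INTO facilities VALUES (1, 'Tennis Court 2', 5.0, 25.0), (5, 'Massage Room 2', 9.9, 80.0);
INSERT INTO members VALUES (0, 'GUEST', 'GUEST'), (2, 'Jemima', 'Farrell');
INSERT INTO bookings VALUES
  (5, 0, '2012-09-14 11:00:00', 4),
  (5, 2, '2012-09-14 15:00:00', 4),
  (1, 2, '2012-09-14 09:00:00', 2),
  (1, 0, '2012-09-14 18:00:00', 3),
  (1, 0, '2012-09-13 18:00:00', 6);
''')

# memid 0 is the guest user; the CASE picks the per-slot price, then slots multiplies it
QUERY = '''
SELECT f.name AS facility_name,
       m.firstname || ' ' || m.surname AS member_name,
       CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END * b.slots AS cost
FROM bookings AS b
JOIN facilities AS f ON b.facid = f.facid
JOIN members AS m ON b.memid = m.memid
WHERE date(b.starttime) = '2012-09-14'
  AND CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END * b.slots > 30
ORDER BY cost DESC
'''
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```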
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q9: This time, produce the same result as in Q8, but using a subquery."],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"757c6468-3d07-42e2-b2b9-59e82b96350a","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name AS Facility_Name,\nCONCAT(members1.firstname, \" \",members1.surname) AS Member_Name,\nCASE WHEN booking.memid = 0 \nTHEN facilities1.guestcost * booking.slots \nELSE facilities1.membercost * booking.slots END AS Total_Cost \nFROM \n(((SELECT * \nFROM bookings1 \nWHERE starttime LIKE \"2012-09-14%\") AS booking \nINNER JOIN facilities1 \nON booking.facid = facilities1.facid) \nINNER JOIN members1 \nON booking.memid = members1.memid) \nWHERE CASE WHEN booking.memid = 0 THEN facilities1.guestcost * booking.slots > 30 ELSE facilities1.membercost * booking.slots > 30 END \nORDER BY Total_Cost desc;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"72f9d8b6-2d51-4af1-9fa1-d183a0369d30","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Massage Room 2","GUEST GUEST",320.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Massage Room 1","GUEST GUEST",160.0],["Tennis Court 2","GUEST GUEST",150.0],["Tennis Court 2","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Tennis Court 1","GUEST GUEST",75.0],["Squash Court","GUEST GUEST",70.0],["Massage Room 1","Jemima Farrell",39.6],["Squash Court","GUEST GUEST",35.0],["Squash Court","GUEST 
GUEST",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"Facility_Name","type":"\"string\"","metadata":"{}"},{"name":"Member_Name","type":"\"string\"","metadata":"{}"},{"name":"Total_Cost","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
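Once a subquery is allowed, the cost can also be computed a single time: a derived table names it as a column, and the outer query filters and sorts on that name. This is a different restructuring from the Q9 solution above, which uses its subquery only to pre-filter the day. The sketch below applies it in Python's built-in `sqlite3` module to hypothetical bookings, with facility costs from the Q4 output.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE facilities (facid INTEGER, name TEXT, membercost REAL, guestcost REAL);
CREATE TABLE members (memid INTEGER, firstname TEXT, surname TEXT);
CREATE TABLE bookings (facid INTEGER, memid INTEGER, starttime TEXT, slots INTEGER);
-- Costs from the Q4 output, members and bookings are hypothetical
INSERT INTO facilities VALUES (1, 'Tennis Court 2', 5.0, 25.0), (5, 'Massage Room 2', 9.9, 80.0);
INSERT INTO members VALUES (0, 'GUEST', 'GUEST'), (2, 'Jemima', 'Farrell');
INSERT INTO bookings VALUES
  (5, 0, '2012-09-14 11:00:00', 4),
  (1, 2, '2012-09-14 09:00:00', 2),
  (1, 0, '2012-09-14 18:00:00', 3);
''')

# The derived table computes cost once; the outer query filters and sorts on it
QUERY = '''
SELECT facility_name, member_name, cost
FROM (
    SELECT f.name AS facility_name,
           m.firstname || ' ' || m.surname AS member_name,
           CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END * b.slots AS cost
    FROM bookings AS b
    JOIN facilities AS f ON b.facid = f.facid
    JOIN members AS m ON b.memid = m.memid
    WHERE date(b.starttime) = '2012-09-14'
) AS day_costs
WHERE cost > 30
ORDER BY cost DESC
'''
rows = conn.execute(QUERY).fetchall()
print(rows)
```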
"]}}],"execution_count":0},{"cell_type":"markdown","source":["#### Q10: Produce a list of facilities with a total revenue less than 1000.\n- The output should have facility name and total revenue, sorted by revenue. \n- Remember that there's a different cost for guests and members!"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"dc14e8de-3daa-4339-b78c-2a8d78e599d1","inputWidgets":{},"title":""}}},{"cell_type":"code","source":["%sql\nSELECT facilities1.name,\nSUM(CASE WHEN bookings1.memid = 0 \nTHEN facilities1.guestcost * bookings1.slots \nELSE facilities1.membercost * bookings1.slots END) AS Total_Revenue \nFROM \n((bookings1 \nINNER JOIN facilities1 \nON bookings1.facid = facilities1.facid) \nINNER JOIN members1 \nON bookings1.memid = members1.memid) \nGROUP BY facilities1.name \nHAVING SUM(CASE WHEN bookings1.memid = 0 THEN facilities1.guestcost * bookings1.slots ELSE facilities1.membercost * bookings1.slots END) < 1000 \nORDER BY Total_Revenue;"],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{"implicitDf":true},"nuid":"53422808-236b-4ebd-af5f-abc9c1bb70de","inputWidgets":{},"title":""}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["Table Tennis",180.0],["Snooker Table",240.0],["Pool Table",270.0]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"Total_Revenue","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["
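Q10 aggregates per facility before filtering, which is why the threshold sits in `HAVING` rather than `WHERE`: `WHERE` sees individual bookings, while `HAVING` sees group totals. The same group-then-filter logic in plain Python is sketched below, with costs from the Q4 output and hypothetical bookings (the helper name `revenue_below` is likewise hypothetical).

```python
from collections import defaultdict

# name -> (membercost, guestcost) per half-hour slot, from the Q4 output
COSTS = {'Tennis Court 2': (5.0, 25.0), 'Massage Room 2': (9.9, 80.0)}

# Hypothetical bookings: (facility name, memid, slots); memid 0 is the guest user
BOOKINGS = [
    ('Tennis Court 2', 0, 3),   # guest: 3 * 25.0 = 75.0
    ('Tennis Court 2', 2, 10),  # member: 10 * 5.0 = 50.0
    ('Massage Room 2', 0, 4),   # guest: 4 * 80.0 = 320.0
    ('Massage Room 2', 0, 4),
    ('Massage Room 2', 0, 4),
    ('Massage Room 2', 0, 4),
]


def revenue_below(bookings, costs, limit=1000):
    # GROUP BY: accumulate revenue per facility, choosing the guest or member price
    totals = defaultdict(float)
    for name, memid, slots in bookings:
        member_cost, guest_cost = costs[name]
        totals[name] += slots * (guest_cost if memid == 0 else member_cost)
    # HAVING and ORDER BY: keep totals under the limit, sorted by revenue
    return sorted(((n, t) for n, t in totals.items() if t < limit), key=lambda item: item[1])


print(revenue_below(BOOKINGS, COSTS))
```

The notebook's query also inner-joins `members1`, which drops any booking whose `memid` has no matching member row; that join is not needed to compute revenue.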
"]}}],"execution_count":0},{"cell_type":"code","source":[""],"metadata":{"application/vnd.databricks.v1+cell":{"showTitle":false,"cellMetadata":{},"nuid":"1ff7e759-05a7-4e6f-8e4f-9cc05a74316c","inputWidgets":{},"title":""}},"outputs":[],"execution_count":0}],"metadata":{"name":"Mini_Project_SQL_with_Spark","notebookId":1931807081501742,"application/vnd.databricks.v1+notebook":{"notebookName":"Mini_Project_SQL_with_Spark","dashboards":[],"notebookMetadata":{"pythonIndentUnit":4,"mostRecentlyExecutedCommandWithImplicitDF":{"commandId":551598812990950,"dataframes":["_sqldf"]}},"language":"python","widgets":{},"notebookOrigID":551598812990935}},"nbformat":4,"nbformat_minor":0}