Hello everyone! I am excited to share my first complete end-to-end data engineering project, the Uber Data Analysis Project, covering everything from building the data pipeline to creating the final dashboard.
(Video: Python preprocessing walkthrough)
Step 4: Create a project and a bucket on Google Cloud Platform, upload the data, select a region for the bucket, and set the appropriate access permissions.
Note: Project ID and Project Number are hidden.
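For reference, the upload can also be scripted with the google-cloud-storage client. This is a minimal sketch only; the project ID, bucket name, and file names below are placeholders, not the ones used in this project.

```python
# Minimal sketch: upload the raw dataset to a GCS bucket.
# Project ID, bucket, and file names are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")  # assumes gcloud auth is configured
bucket = client.bucket("your-uber-data-bucket")

# Upload the local CSV so the pipeline can read it from GCS later.
blob = bucket.blob("uber_data.csv")
blob.upload_from_filename("uber_data.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```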
Step 6: Connect to the VM over SSH from a Linux terminal, install the necessary dependencies, and create a Mage project.
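As a rough sketch, the terminal setup looks something like this; the instance name, zone, and project name are placeholders, and your exact dependency list may differ.

```bash
# Sketch only: instance, zone, and project names are placeholders.
# SSH into the VM from a local terminal.
gcloud compute ssh uber-de-vm --zone=us-central1-a

# On the VM: install Mage and the dependencies the pipeline needs.
pip install mage-ai pandas google-cloud-bigquery

# Create and start a new Mage project (the UI is served on port 6789 by default).
mage start uber_de_project
```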
Step 7: Create a data pipeline using Mage blocks such as data loaders, transformers, and exporters. Add your transformation code to the transformer block with the necessary changes.
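For illustration, a Mage transformer block has the following shape; the column names and cleaning steps here are hypothetical stand-ins for the project's actual transformation code.

```python
import pandas as pd

# Mage provides this decorator when the block runs inside a pipeline;
# the guard imports it for standalone use.
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Hypothetical cleaning steps; replace with the project's real logic.
    df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
    df = df.drop_duplicates().reset_index(drop=True)
    return df
```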
Step 8: Once the pipeline is ready, add your GCP credentials to the 'io_config.yaml' configuration file. You can get the credentials from the APIs & Services tab in the Google Cloud console.
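As a sketch, the relevant part of 'io_config.yaml' looks roughly like this; the key names follow Mage's template, while the file path and location values are placeholders.

```yaml
default:
  # Path to the service account key downloaded from APIs & Services (placeholder).
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/home/user/keys/service-account.json"
  GOOGLE_LOCATION: US  # optional
```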
Step 9: Use BigQuery to query the data and perform ETL operations so the data can be used for analysis, such as building dashboards and reports.
(Video: querying the data in BigQuery)
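The analysis queries listed at the end of this post can be run in the BigQuery console, or programmatically; here is a minimal sketch using the official Python client (the table names match the queries below).

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials

# Example: run one of the analysis queries and print the results.
query = """
    SELECT pickup_location_id, COUNT(trip_id) AS No_of_Trips
    FROM `uber-big-data-analysis.uber_dataset.fact_table`
    GROUP BY pickup_location_id
    ORDER BY No_of_Trips DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.pickup_location_id, row.No_of_Trips)
```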
Step 10: Finally, create a dashboard using any dashboarding/reporting tool. I used Looker Studio, but other tools like Power BI, Tableau, or Qlik work just as well.
View Live Dashboard Here: https://lookerstudio.google.com/s/nQI06ax2wMY
Here are the SQL queries I used for the analysis. Note that BigQuery requires backticks around fully qualified table names when the project ID contains hyphens.

```sql
-- Top 10 pickup locations by number of trips
SELECT
  pickup_location_id,
  COUNT(trip_id) AS No_of_Trips
FROM `uber-big-data-analysis.uber_dataset.fact_table`
GROUP BY pickup_location_id
ORDER BY No_of_Trips DESC
LIMIT 10;

-- Total number of trips by passenger count
SELECT
  passenger_count,
  COUNT(passenger_count) AS No_of_Trips
FROM `uber-big-data-analysis.uber_dataset.passenger_count_dim`
GROUP BY passenger_count;

-- Average fare amount by hour of the day
SELECT
  d.pick_hour,
  AVG(f.fare_amount) AS Avg_Fare_Amt
FROM `uber-big-data-analysis.uber_dataset.datetime_dim` d
JOIN `uber-big-data-analysis.uber_dataset.fact_table` f
  ON d.datetime_id = f.datetime_id
GROUP BY d.pick_hour
ORDER BY Avg_Fare_Amt DESC;
```