Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: introduce nycweather features to feature-engineering on fabric single-tech sample
1 parent 5a9b9d9, commit a0a9bde
Showing 15 changed files with 35 additions and 27 deletions.
Binary file modified, +323 KB (110%): ...es/fabric/feature_engineering_on_fabric/images/data_lineage/feature_lineage.gif
Binary file modified, -345 KB (78%): ...ic/feature_engineering_on_fabric/images/data_lineage/model_training_lineage.gif
Binary file modified, -11 KB (80%): .../fabric/feature_engineering_on_fabric/images/data_pipeline/data_pipeline_09.png
Binary file modified, +16 KB (140%): .../fabric/feature_engineering_on_fabric/images/data_pipeline/data_pipeline_10.png
Binary file modified, +38.1 KB (220%): ...tech_samples/fabric/feature_engineering_on_fabric/images/inferencing_result.png
Binary file added, +54.4 KB: ...ch_samples/fabric/feature_engineering_on_fabric/images/inferencing_result_2.png
Binary file modified, +228 KB (120%): ...h_samples/fabric/feature_engineering_on_fabric/images/managed_feature_store.gif
2 changes: 1 addition & 1 deletion
single_tech_samples/fabric/feature_engineering_on_fabric/src/notebooks/data_cleansing.ipynb
Large diffs are not rendered by default.
2 changes: 1 addition & 1 deletion
...tech_samples/fabric/feature_engineering_on_fabric/src/notebooks/data_transformation.ipynb
Large diffs are not rendered by default.
2 changes: 1 addition & 1 deletion
...amples/fabric/feature_engineering_on_fabric/src/notebooks/exploratory_data_analysis.ipynb
@@ -1 +1 @@
-{"cells":[{"cell_type":"markdown","id":"fb692fa2","metadata":{},"source":["### Load ingested data from staging zone"]},{"cell_type":"code","execution_count":null,"id":"a1f94d23","metadata":{},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import seaborn as sns"]},{"cell_type":"code","execution_count":null,"id":"6a719fb9","metadata":{},"outputs":[],"source":["# Load Yellow Taxi Trip Records parquet file from staging zone to pandas dataframe\n","year = \"2022\"\n","staging_path = \"02_staging\"\n","\n","pd_df = pd.read_parquet(f\"/lakehouse/default/Files/{staging_path}/yellow_taxi_tripdata_{year}.parquet\", engine=\"pyarrow\")\n","pd_df.head()"]},{"cell_type":"code","execution_count":null,"id":"56e6a87f","metadata":{},"outputs":[],"source":["# Load location zones data from landing zone\n","landing_path = \"01_landing\"\n","zones_df = pd.read_csv(f\"/lakehouse/default/Files/{landing_path}/taxi_zone_lookup.csv\")\n","zones_df.head()\n"]},{"cell_type":"markdown","id":"918cf82d","metadata":{},"source":["## EDA"]},{"cell_type":"code","execution_count":null,"id":"f23ef820","metadata":{},"outputs":[],"source":["# Check null values for columns\n","pd_df.isnull().sum()"]},{"cell_type":"code","execution_count":null,"id":"50904bd6","metadata":{},"outputs":[],"source":["# Check unknown (264 and 265) location for PULocationID columns\n","pd_df[(pd_df[\"PULocationID\"] == 264) | (pd_df[\"PULocationID\"] == 265)]"]},{"cell_type":"code","execution_count":null,"id":"37afb3dc","metadata":{},"outputs":[],"source":["sns.displot(pd_df[\"passenger_count\"], kde=True, stat=\"density\")\n","plt.show()"]},{"cell_type":"code","execution_count":null,"id":"1964d672","metadata":{},"outputs":[],"source":["# Check location zones data\n","zones_df.isnull().sum()"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Synapse PySpark","language":"Python","name":"synapse_pyspark"},"language_info":{"name":"python"},"microsoft":{"host":{},"language":"python","ms_spell_check":{"ms_spell_check_language":"en"}},"notebook_environment":{},"nteract":{"version":"[email protected]"},"save_output":true,"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{},"enableDebugMode":false}},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5}
+{"cells":[{"cell_type":"markdown","id":"fb692fa2","metadata":{},"source":["### Load ingested data from staging zone"]},{"cell_type":"code","execution_count":null,"id":"a1f94d23","metadata":{},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import seaborn as sns"]},{"cell_type":"code","execution_count":null,"id":"6a719fb9","metadata":{},"outputs":[],"source":["# Load Yellow Taxi Trip Records parquet file from staging zone to pandas dataframe\n","year = \"2022\"\n","staging_path = \"02_staging\"\n","\n","pd_df = pd.read_parquet(f\"/lakehouse/default/Files/{staging_path}/yellow_taxi_tripdata_{year}.parquet\", engine=\"pyarrow\")\n","pd_df.head()"]},{"cell_type":"code","execution_count":null,"id":"56e6a87f","metadata":{},"outputs":[],"source":["# Load NYC location zones data from landing zone\n","landing_path = \"01_landing\"\n","zones_df = pd.read_csv(f\"/lakehouse/default/Files/{landing_path}/taxi_zone_lookup.csv\")\n","zones_df.head()\n"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Load NYC weather data from landing zone\n","nycweather_df = pd.read_csv(f\"/lakehouse/default/Files/{landing_path}/nyc_weather_{year}.csv\")\n","nycweather_df.head()\n"]},{"cell_type":"markdown","id":"918cf82d","metadata":{},"source":["## EDA"]},{"cell_type":"code","execution_count":null,"id":"f23ef820","metadata":{},"outputs":[],"source":["# Check null values for columns of NYC yellow taxi trip data\n","pd_df.isnull().sum()"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Check null values for columns of NYC weather data\n","nycweather_df.isnull().sum()\n"]},{"cell_type":"code","execution_count":null,"id":"50904bd6","metadata":{},"outputs":[],"source":["# Check unknown (264 and 265) location for PULocationID columns\n","pd_df[(pd_df[\"PULocationID\"] == 264) | (pd_df[\"PULocationID\"] == 265)]"]},{"cell_type":"code","execution_count":null,"id":"37afb3dc","metadata":{},"outputs":[],"source":["sns.displot(pd_df[\"passenger_count\"], kde=True, stat=\"density\")\n","plt.show()"]},{"cell_type":"code","execution_count":null,"id":"1964d672","metadata":{},"outputs":[],"source":["# Check location zones data\n","zones_df.isnull().sum()"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Synapse PySpark","language":"Python","name":"synapse_pyspark"},"language_info":{"name":"python"},"microsoft":{"host":{},"language":"python","ms_spell_check":{"ms_spell_check_language":"en"}},"notebook_environment":{},"nteract":{"version":"[email protected]"},"save_output":true,"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{},"enableDebugMode":false}},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5}
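Because the notebook is stored as a single line of JSON, the hunk above replaces the whole file and the actual cell-level change is hard to read. In this hunk, the change adds two cells (loading `nyc_weather_{year}.csv` and checking it for nulls) and rewords the comments in two existing cells. A minimal sketch for listing changed cells from two notebook versions follows; the helper names `cell_sources` and `changed_cells` and the cut-down stand-in notebooks are hypothetical and not part of the commit.

```python
import json


def cell_sources(notebook_json):
    """Return the source text of each cell in a notebook JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook["cells"]:
        src = cell["source"]
        # nbformat allows "source" to be one string or a list of line strings
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def changed_cells(old_json, new_json):
    """Return (added, removed) cell sources between two notebook versions."""
    old_cells = cell_sources(old_json)
    new_cells = cell_sources(new_json)
    added = [s for s in new_cells if s not in old_cells]
    removed = [s for s in old_cells if s not in new_cells]
    return added, removed


# Hypothetical stand-ins, cut down from the two notebook versions in the diff.
zones_old = {"cell_type": "code", "source": [
    "# Load location zones data from landing zone\n",
    "zones_df.head()\n",
]}
zones_new = {"cell_type": "code", "source": [
    "# Load NYC location zones data from landing zone\n",
    "zones_df.head()\n",
]}
weather_new = {"cell_type": "code", "source": [
    "# Load NYC weather data from landing zone\n",
    'nycweather_df = pd.read_csv(f"/lakehouse/default/Files/{landing_path}/nyc_weather_{year}.csv")\n',
    "nycweather_df.head()\n",
]}
old_notebook = json.dumps({"cells": [zones_old]})
new_notebook = json.dumps({"cells": [zones_new, weather_new]})

added, removed = changed_cells(old_notebook, new_notebook)
for source in removed:
    print("- " + source.splitlines()[0])
for source in added:
    print("+ " + source.splitlines()[0])
```

A cell whose comment was merely reworded shows up once as removed and once as added, which matches how the two reworded cells in this hunk would appear. Matching on source text rather than cell `id` is a deliberate choice here, since the two cells added in this hunk carry no `id` field.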
2 changes: 1 addition & 1 deletion
...samples/fabric/feature_engineering_on_fabric/src/notebooks/feature_set_registration.ipynb
Large diffs are not rendered by default.