feat: Initial feature-engineering-on-fabric single-tech sample check-in #652

Merged
merged 27 commits on Nov 17, 2023
Changes from 4 commits
32f162c
feat: initial feature-eng-on-fabric single-tech sample check-in
thurstonchen Nov 9, 2023
78f56b3
doc: resize screenshots with minor contents updates
thurstonchen Nov 10, 2023
0db5811
update for data source landing
Nick287 Nov 10, 2023
05d9dc0
code: update model training notebook
cchenshu Nov 10, 2023
747a039
update for data loading base url and relative path
Nick287 Nov 13, 2023
44ac774
Apply suggestions from code review (Nov. 13th)
thurstonchen Nov 13, 2023
81ecab0
remove App service code and update images
Nick287 Nov 14, 2023
32bc03d
for simplicity remove option 1 and send it as a footnote info no details
Nick287 Nov 14, 2023
64823b6
fix: use Fabric workspace & lakehouse id in Purview qualified names, …
thurstonchen Nov 14, 2023
a0f74b6
Fixing some linking errors
promisinganuj Nov 14, 2023
e5094f2
Updated introduction and architecture description
promisinganuj Nov 15, 2023
08fc3ac
Updated environment setup details
promisinganuj Nov 15, 2023
2469bdf
Updated 'Source Dataset' section
promisinganuj Nov 15, 2023
7f531f5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
4769fb9
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
8525ae5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
e7623dc
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
7bb0749
doc: add contents on verifying lineage in Purview
thurstonchen Nov 15, 2023
2b2225f
doc: add missed bullet to Contents table
thurstonchen Nov 15, 2023
e02b924
Updating Lineage section
promisinganuj Nov 16, 2023
786922d
Updating Lineage section
promisinganuj Nov 16, 2023
597e774
Updating Lineage section
promisinganuj Nov 16, 2023
4cc2d21
Updating Lineage section
promisinganuj Nov 16, 2023
0b4f621
Updating Lineage section
promisinganuj Nov 16, 2023
b5d1586
Updating 'Required resources' header
promisinganuj Nov 16, 2023
4622bb2
Fixing URL checks
promisinganuj Nov 16, 2023
2382f59
Fixing URL checks
promisinganuj Nov 16, 2023
365 changes: 365 additions & 0 deletions single_tech_samples/fabric/feature_engineering_on_fabric/README.md

Large diffs are not rendered by default.

promisinganuj marked this conversation as resolved.
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.5.33414.496
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "DataSourceAPP", "DataSourceAPP\DataSourceAPP.csproj", "{05E8B453-D790-4FE3-BE67-B1E895C51422}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{05E8B453-D790-4FE3-BE67-B1E895C51422}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{05E8B453-D790-4FE3-BE67-B1E895C51422}.Debug|Any CPU.Build.0 = Debug|Any CPU
{05E8B453-D790-4FE3-BE67-B1E895C51422}.Release|Any CPU.ActiveCfg = Release|Any CPU
{05E8B453-D790-4FE3-BE67-B1E895C51422}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {870223E8-315A-4A0F-B392-517A7BD997CA}
EndGlobalSection
EndGlobal
@@ -0,0 +1,12 @@
{
"version": 1,
"isRoot": true,
"tools": {
"dotnet-ef": {
"version": "7.0.12",
"commands": [
"dotnet-ef"
]
}
}
}
@@ -0,0 +1,29 @@
using Microsoft.AspNetCore.Mvc;

namespace DataSourceAPP.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class DownloadController : ControllerBase
    {
        // GET api/download/{FileName}, e.g. api/download/taxi_zone_lookup.csv
        [HttpGet("{FileName}")]
        public IActionResult Download(string FileName)
        {
            var folder = Path.Combine(Directory.GetCurrentDirectory(), "SourceFiles", "TLC Trip Record Data");
            // Keep only the file-name component so a request cannot escape the source folder.
            var safeName = Path.GetFileName(FileName);
            var filePath = Path.Combine(folder, safeName);
            if (!System.IO.File.Exists(filePath))
            {
                return NotFound();
            }

            var fileContents = System.IO.File.ReadAllBytes(filePath);
            // The parquet files are binary, so serve a generic binary content type.
            var contentType = "application/octet-stream";
            return File(fileContents, contentType, safeName);
        }

        // Enabling basic publishing credentials for App Service deployment:
        // https://learn.microsoft.com/en-us/azure/app-service/deploy-configure-credentials?tabs=portal
        // az resource update --resource-group fsd1 --name ftp --namespace Microsoft.Web --resource-type basicPublishingCredentialsPolicies --parent sites/fsd1-webapp --set properties.allow=true
        // az resource update --resource-group fsd1 --name scm --namespace Microsoft.Web --resource-type basicPublishingCredentialsPolicies --parent sites/fsd1-webapp --set properties.allow=true
    }
}
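A client of the `api/download/{FileName}` route in this controller might build its request URLs as sketched below. Only the route shape and the file names come from the diff; the base URL is a placeholder and `build_download_urls` is a hypothetical helper, not part of the sample.

```python
from urllib.parse import quote

# Hypothetical base URL; replace with the host that serves the DownloadController.
BASE_URL = "https://example.invalid"


def build_download_urls(base_url: str, year: int = 2022) -> list:
    """Return download URLs for the zone lookup CSV and the 12 monthly parquet files."""
    file_names = ["taxi_zone_lookup.csv"]
    file_names += [f"yellow_tripdata_{year}-{month:02d}.parquet" for month in range(1, 13)]
    root = base_url.rstrip("/")
    # quote(..., safe="") percent-encodes anything unsafe in a single path segment.
    return [f"{root}/api/download/{quote(name, safe='')}" for name in file_names]


if __name__ == "__main__":
    for url in build_download_urls(BASE_URL):
        print(url)
```

Generating the names from the year and month keeps the client in step with the file list in the project file, assuming the monthly naming pattern stays the same.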
@@ -0,0 +1,55 @@
<Project Sdk="Microsoft.NET.Sdk.Web">

<PropertyGroup>
<TargetFramework>net6.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Swashbuckle.AspNetCore" Version="6.2.3" />
</ItemGroup>

<ItemGroup>
<None Update="SourceFiles\TLC Trip Record Data\taxi_zone_lookup.csv">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-01.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-02.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-03.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-04.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-05.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-06.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-07.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-08.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-09.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-10.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-11.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="SourceFiles\TLC Trip Record Data\yellow_tripdata_2022-12.parquet">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
</ItemGroup>

</Project>
@@ -0,0 +1,25 @@
var builder = WebApplication.CreateBuilder(args);

// Add services to the container.

builder.Services.AddControllers();
// Learn more about configuring Swagger/OpenAPI at https://aka.ms/aspnetcore/swashbuckle
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

// Configure the HTTP request pipeline.
if (app.Environment.IsDevelopment())
{
app.UseSwagger();
app.UseSwaggerUI();
}

app.UseHttpsRedirection();

app.UseAuthorization();

app.MapControllers();

app.Run();
@@ -0,0 +1,8 @@
{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
}
}
@@ -0,0 +1,9 @@
{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
},
"AllowedHosts": "*"
}
@@ -0,0 +1,3 @@
dependencies:
  - pip:
      - azureml-featurestore==0.1.0b5
@@ -0,0 +1,11 @@
runtime_version: '1.1'
spark_conf:
- spark.fsd.client_id: <sp-client-id>
- spark.fsd.tenant_id: <sp-tenant-id>
- spark.fsd.subscription_id: <subscription-id>
- spark.fsd.rg_name: <feature-store-resource-group>
- spark.fsd.name: <feature-store-name>
- spark.fsd.fabric.tenant: <fabric-tenant-name> # Fetch from Fabric base URL, like https://<fabric-tenant-name>.powerbi.com/
- spark.fsd.fabric.workspace: <fabric-workspace>
- spark.fsd.fabric.lakehouse: <fabric-lakehouse>
- spark.fsd.purview.account: <purview-account-name>
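The notebooks in this PR read these values with `spark.conf.get`. The sketch below shows one way to collect them up front and fail early when a key is missing or still holds a `<placeholder>`. `load_fsd_settings` is a hypothetical helper, and a plain dict stands in for the Spark session so the snippet runs outside Fabric.

```python
from typing import Callable, Dict, Optional

PREFIX = "spark.fsd."

# Keys taken from the spark_conf section of this file.
FSD_KEYS = [
    "spark.fsd.client_id",
    "spark.fsd.tenant_id",
    "spark.fsd.subscription_id",
    "spark.fsd.rg_name",
    "spark.fsd.name",
    "spark.fsd.fabric.tenant",
    "spark.fsd.fabric.workspace",
    "spark.fsd.fabric.lakehouse",
    "spark.fsd.purview.account",
]


def load_fsd_settings(conf_get: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Read every spark.fsd.* key and reject empty or unreplaced placeholder values."""
    settings = {}
    for key in FSD_KEYS:
        value = conf_get(key)
        if not value or (value.startswith("<") and value.endswith(">")):
            raise ValueError(f"Spark conf '{key}' is not configured")
        settings[key[len(PREFIX):]] = value
    return settings


if __name__ == "__main__":
    # Stand-in for spark.conf; in a notebook, pass spark.conf.get instead of fake_conf.get.
    fake_conf = {key: "example-value" for key in FSD_KEYS}
    print(load_fsd_settings(fake_conf.get))
```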
promisinganuj marked this conversation as resolved.
@@ -0,0 +1 @@
{"cells":[{"cell_type":"markdown","id":"fb692fa2","metadata":{},"source":["### Load ingested data from staging zone"]},{"cell_type":"code","execution_count":null,"id":"a1f94d23","metadata":{},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import seaborn as sns"]},{"cell_type":"code","execution_count":null,"id":"6a719fb9","metadata":{},"outputs":[],"source":["# Load Yellow Taxi Trip Records parquet file from staging zone to pandas dataframe\n","year = \"2022\"\n","staging_path = \"02_staging\"\n","\n","pd_df = pd.read_parquet(f\"/lakehouse/default/Files/{staging_path}/yellow_taxi_tripdata_{year}.parquet\", engine=\"pyarrow\")\n","pd_df.head()"]},{"cell_type":"code","execution_count":null,"id":"56e6a87f","metadata":{},"outputs":[],"source":["# Load location zones data from landing zone\n","landing_path = \"01_landing\"\n","zones_df = pd.read_csv(f\"/lakehouse/default/Files/{landing_path}/taxi_zone_lookup.csv\")\n","zones_df.head()\n"]},{"cell_type":"markdown","id":"918cf82d","metadata":{},"source":["## EDA"]},{"cell_type":"code","execution_count":null,"id":"f23ef820","metadata":{},"outputs":[],"source":["# Check null values for columns\n","pd_df.isnull().sum()"]},{"cell_type":"code","execution_count":null,"id":"50904bd6","metadata":{},"outputs":[],"source":["# Check for unknown locations (264 and 265) in the PULocationID column\n","pd_df[(pd_df[\"PULocationID\"] == 264) | (pd_df[\"PULocationID\"] == 265)]"]},{"cell_type":"code","execution_count":null,"id":"37afb3dc","metadata":{},"outputs":[],"source":["sns.displot(pd_df[\"passenger_count\"], kde=True, stat=\"density\")\n","plt.show()"]},{"cell_type":"code","execution_count":null,"id":"1964d672","metadata":{},"outputs":[],"source":["# Check location zones data\n","zones_df.isnull().sum()"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Synapse PySpark","language":"Python","name":"synapse_pyspark"},"language_info":{"name":"python"},"microsoft":{"host":{},"language":"python","ms_spell_check":{"ms_spell_check_language":"en"}},"notebook_environment":{},"nteract":{"version":"nteract-front-end@1.0.0"},"save_output":true,"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{},"enableDebugMode":false}},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5}
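The checks in this notebook come down to two counts: nulls per column, and pickups in the unknown zones 264 and 265. The plain-Python sketch below restates them on a few made-up rows so it runs without pandas or a lakehouse. The sample rows and helper names are illustrative only and are not part of the sample.

```python
UNKNOWN_LOCATION_IDS = {264, 265}

# Made-up rows that mimic three columns of the trip data.
rows = [
    {"passenger_count": 1.0, "PULocationID": 142, "DOLocationID": 236},
    {"passenger_count": None, "PULocationID": 264, "DOLocationID": 7},
    {"passenger_count": 2.0, "PULocationID": 265, "DOLocationID": 265},
    {"passenger_count": None, "PULocationID": 48, "DOLocationID": 68},
]


def null_counts(records):
    """Count missing values per column, similar in spirit to DataFrame.isnull().sum()."""
    counts = {column: 0 for column in records[0]}
    for record in records:
        for column, value in record.items():
            if value is None:
                counts[column] += 1
    return counts


def unknown_pickups(records):
    """Return the rows whose pickup zone is one of the unknown location IDs."""
    return [r for r in records if r["PULocationID"] in UNKNOWN_LOCATION_IDS]


if __name__ == "__main__":
    print(null_counts(rows))
    print(len(unknown_pickups(rows)))
```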
promisinganuj marked this conversation as resolved.

Large diffs are not rendered by default.

@@ -0,0 +1 @@
{"cells":[{"cell_type":"markdown","id":"f6cc8419-b0c9-448a-a510-901e14519b7c","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Parameter Setup"]},{"cell_type":"code","execution_count":null,"id":"70870927-0f1d-486a-873a-f4c1d3cceeae","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}},"tags":["parameters"]},"outputs":[],"source":["fabric_tenant = spark.conf.get(\"spark.fsd.fabric.tenant\")\n","fabric_workspace = spark.conf.get(\"spark.fsd.fabric.workspace\")\n","fabric_lakehouse = spark.conf.get(\"spark.fsd.fabric.lakehouse\")"]},{"cell_type":"markdown","id":"a98f3aac-4908-4f7f-b54e-eb28a5e11a38","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Load ingested data from staging zone"]},{"cell_type":"code","execution_count":null,"id":"0b57611a-575c-484e-b668-51e5fdadc824","metadata":{},"outputs":[],"source":["import pandas as pd\n","\n","# Load Yellow Taxi Trip Records parquet file from staging zone to pandas dataframe\n","year = \"2022\"\n","staging_path = \"02_staging\"\n","\n","pd_df = pd.read_parquet(f\"/lakehouse/default/Files/{staging_path}/yellow_taxi_tripdata_{year}.parquet\", engine=\"pyarrow\")\n","pd_df.head()\n"]},{"cell_type":"code","execution_count":null,"id":"f38d4394-076c-46c5-86fe-3c54eca92080","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["# Load location zones data from landing zone\n","landing_path = \"01_landing\"\n","zones_df = pd.read_csv(f\"/lakehouse/default/Files/{landing_path}/taxi_zone_lookup.csv\")\n","zones_df.head()\n"]},{"cell_type":"markdown","id":"1a167bbe-f0b1-4351-b18c-93fe3bbbd480","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Data cleansing"]},{"cell_type":"code","execution_count":null,"id":"f7dc8421-e1b5-4ede-8869-c661ee03b2a7","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["# Remove rows with null passenger_count\n","pd_df = pd_df.dropna(subset=[\"passenger_count\"])\n","pd_df.isnull().sum()\n"]},{"cell_type":"code","execution_count":null,"id":"7a2304ea-4269-4e77-95f1-af079cafc61b","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["# Remove rows with unknown location ID (264 and 265) for PULocationID and DOLocationID columns\n","pd_df = pd_df.drop(pd_df[\"PULocationID\"].loc[(pd_df[\"PULocationID\"] == 264) | (pd_df[\"PULocationID\"] == 265)].index)\n","pd_df = pd_df.drop(pd_df[\"DOLocationID\"].loc[(pd_df[\"DOLocationID\"] == 264) | (pd_df[\"DOLocationID\"] == 265)].index)\n"]},{"cell_type":"code","execution_count":null,"id":"ba4bdfcb-6f55-4e92-addf-fb34095186ee","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["# Remove rows with null values for service_zone column of zones data\n","zones_df = zones_df.dropna(subset=[\"service_zone\"])\n","zones_df.isnull().sum()\n"]},{"cell_type":"markdown","id":"12bf196f-5096-4850-bb87-5c34ad685838","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Sink cleaned data to standardization zone"]},{"cell_type":"code","execution_count":null,"id":"3d354eb6-44b5-482c-b9da-03c0516529d5","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["# Sink cleaned trip records and zones data to 03_standard path of Fabric OneLake\n","standard_path = \"03_standard\"\n","mssparkutils.fs.mkdirs(f\"Files/{standard_path}\")\n","\n","pd_df.to_parquet(f\"/lakehouse/default/Files/{standard_path}/cleaned_yellow_taxi_tripdata_{year}.parquet\")\n","zones_df.to_parquet(f\"/lakehouse/default/Files/{standard_path}/nyc_zones.parquet\")\n"]},{"cell_type":"markdown","id":"0d1c6ab9-fef3-46de-b394-a2994855e71c","metadata":{"nteract":{"transient":{"deleting":false}}},"source":["### Register data assets and lineage of data pipeline to Purview"]},{"cell_type":"code","execution_count":null,"id":"2f7cb196-1b42-4f17-9d5f-9a05cc9b3d57","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["%run data_catalog_and_lineage"]},{"cell_type":"code","execution_count":null,"id":"1f8f353c-4b41-4c53-b276-f68338408399","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["%run utils"]},{"cell_type":"code","execution_count":null,"id":"898ba087-d7af-4da2-aab3-e1236b8fefa7","metadata":{"jupyter":{"outputs_hidden":false,"source_hidden":false},"nteract":{"transient":{"deleting":false}}},"outputs":[],"source":["purview_data_catalog = PurviewDataCatalog()\n","\n","fabric_onelake_tenant = get_fabric_onelake_tenant()\n","onelake_base_path = f\"abfss://{fabric_workspace}@{fabric_onelake_tenant}.dfs.fabric.microsoft.com/{fabric_lakehouse}.Lakehouse/Files\"\n","\n","# Create source data assets list\n","source_data_assets = []\n","trip_data_source_file = f\"yellow_taxi_tripdata_{year}.parquet\"\n","source_data_asset_1 = DataAsset(trip_data_source_file,\n"," \"parquet\",\n"," f\"{onelake_base_path}/{staging_path}/{trip_data_source_file}\")\n","\n","zones_data_source_file = \"taxi_zone_lookup.csv\"\n","source_data_asset_2 = DataAsset(zones_data_source_file,\n"," \"csv\",\n"," f\"{onelake_base_path}/{landing_path}/{zones_data_source_file}\")\n","source_data_assets.append(source_data_asset_1)\n","source_data_assets.append(source_data_asset_2)\n","\n","# Create sink data assets list\n","sink_data_assets = []\n","cleaned_trip_data_file = f\"cleaned_yellow_taxi_tripdata_{year}.parquet\"\n","sink_data_asset_1 = DataAsset(cleaned_trip_data_file,\n"," \"parquet\",\n"," f\"{onelake_base_path}/{standard_path}/{cleaned_trip_data_file}\")\n","\n","cleaned_zones_data = \"nyc_zones.parquet\"\n","sink_data_asset_2 = DataAsset(cleaned_zones_data,\n"," \"parquet\",\n"," f\"{onelake_base_path}/{standard_path}/{cleaned_zones_data}\")\n","\n","sink_data_assets.append(sink_data_asset_1)\n","sink_data_assets.append(sink_data_asset_2)\n","\n","# Create process data asset\n","current_notebook_context = mssparkutils.notebook.nb.context\n","workspace_id = current_notebook_context[\"currentWorkspaceId\"]\n","notebook_id = current_notebook_context[\"currentNotebookId\"]\n","# notebook_name = current_notebook_context[\"currentNotebookName\"]\n","process_data_asset = DataAsset(\"data_cleansing (Fabric notebook)\",\n"," \"process\",\n"," f\"https://{fabric_tenant}.powerbi.com/groups/{workspace_id}/synapsenotebooks/{notebook_id}\")\n","\n","# Create lineage for data pipeline\n","data_pipeline_lineage = DataLineage(source_data_assets, sink_data_assets, process_data_asset)\n","\n","# Register lineage of data pipeline to Purview\n","purview_data_catalog.register_lineage(data_pipeline_lineage)\n"]}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"display_name":"Synapse PySpark","language":"Python","name":"synapse_pyspark"},"language_info":{"name":"python"},"microsoft":{"host":{},"language":"python"},"notebook_environment":{},"nteract":{"version":"nteract-front-end@1.0.0"},"save_output":true,"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{},"enableDebugMode":false}},"synapse_widget":{"state":{},"version":"0.1"},"widgets":{}},"nbformat":4,"nbformat_minor":5}
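The lineage cell in this notebook builds every asset's qualified name from the same few string patterns. The sketch below isolates those patterns as small functions so they can be checked on their own. The function names and sample values are hypothetical; the URL and path formats are copied from the notebook cell, and `DataAsset`, `DataLineage`, and `PurviewDataCatalog` are left out because their definitions are not part of this diff.

```python
def onelake_files_path(workspace: str, onelake_tenant: str, lakehouse: str) -> str:
    """Base abfss path of a lakehouse's Files area, in the format used by the notebook."""
    return (
        f"abfss://{workspace}@{onelake_tenant}.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files"
    )


def asset_path(base_path: str, zone: str, file_name: str) -> str:
    """Qualified path of one file inside a zone folder such as 02_staging."""
    return f"{base_path}/{zone}/{file_name}"


def notebook_url(fabric_tenant: str, workspace_id: str, notebook_id: str) -> str:
    """URL used in the notebook as the qualified name of the process asset."""
    return (
        f"https://{fabric_tenant}.powerbi.com/groups/{workspace_id}"
        f"/synapsenotebooks/{notebook_id}"
    )


if __name__ == "__main__":
    # Sample values are placeholders, not real workspace or tenant names.
    base = onelake_files_path("my-workspace", "my-onelake-tenant", "my-lakehouse")
    print(asset_path(base, "02_staging", "yellow_taxi_tripdata_2022.parquet"))
    print(asset_path(base, "03_standard", "nyc_zones.parquet"))
    print(notebook_url("my-fabric-tenant", "workspace-id", "notebook-id"))
```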