ADS Expertise Notebooks

The Accelerated Data Science (ADS) SDK is maintained by the Oracle Cloud Infrastructure Data Science service team. It speeds up common data science activities by providing tools that automate or simplify routine tasks, and it offers a data-scientist-friendly Pythonic interface to Oracle Cloud Infrastructure (OCI) services, most notably OCI Data Science, Data Flow, Object Storage, and Autonomous Database. ADS gives you an interface to manage the lifecycle of machine learning models, from data acquisition to model evaluation, interpretation, and deployment.

The ADS SDK can be downloaded from PyPI. Contributions are welcome on GitHub.

Topics

Audi Autonomous Driving Dataset Repository
Bank Graph Example Notebook
Building a Forecaster using AutoMLx
Building and Explaining a Classifier using AutoMLx
Building and Explaining a Regressor using AutoMLx
Building and Explaining a Text Classifier using AutoMLx
Building and Explaining an Anomaly Detector using AutoMLx - Experimental
Caltech Pedestrian Detection Benchmark Repository
Connect to Oracle Big Data Service
Fairness with AutoMLx
Graph Analytics and Graph Machine Learning with PyPGX
How to Read Data with fsspec from Oracle Big Data Service (BDS)
Intel Extension for Scikit-Learn
Introduction to ADSTuner
Introduction to Model Version Set
Introduction to SQL Magic
Introduction to Streaming
Introduction to the Oracle Cloud Infrastructure Data Flow Studio
Loading Data With Pandas & Dask
Model Evaluation with ADSEvaluator
Natural Language Processing
ONNX Integration with the Accelerated Data Science (ADS) SDK
PySpark
Spark NLP within Oracle Cloud Infrastructure Data Flow Studio
Text Classification and Model Explanations using LIME
Text Classification with Data Labeling Service Integration
Text Extraction Using the Accelerated Data Science (ADS) SDK
Train, Register, and Deploy a Generic Model
Train, Register, and Deploy a LightGBM Model
Train, Register, and Deploy a PyTorch Model
Train, Register, and Deploy a TensorFlow Model
Train, Register, and Deploy an XGBoost Model
Train, register, and deploy HuggingFace Pipeline
Train, register, and deploy Sklearn Model
Using Data Catalog Metastore with DataFlow
Using Data Catalog Metastore with PySpark
Using Livy on the Big Data Service
Visual Genome Repository
Visualizing Data
Working with Pipelines
XGBoost with RAPIDS
Medical Data Management Using Feature Store
Storage of Hugging Face Embeddings Using Feature Store
Storage of OpenAI Embeddings Using Feature Store
Synthetic Data Generation Using Feature Store
PII Redaction Using Feature Store
Querying Operations Using Feature Store
Quickstart for Feature Store
Schema Evolution and Schema Enforcement Using Feature Store
Big Data Operations Using Feature Store
Streaming Operations Using Feature Store

Notebooks

- Building and Explaining an Anomaly Detector using AutoMLx - Experimental

_{Updated: 05/29/2023}

`automlx-anomaly_detection.ipynb`

Build an anomaly detection model using the experimental, fully unsupervised anomaly detection pipeline in Oracle AutoMLx for the public Credit Card Fraud dataset.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v2

automlx anomaly detection

_{Universal Permissive License v 1.0}

- Building and Explaining a Classifier using AutoMLx

_{Updated: 05/29/2023}

`automlx-classifier.ipynb`

Build a classifier using the Oracle AutoMLx tool and a binary classification data set of Census income data.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx classification classifier

_{Universal Permissive License v 1.0}

- Fairness with AutoMLx

_{Updated: 05/29/2023}

`automlx-fairness.ipynb`

Develop a model and evaluate its fairness.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx fairness

_{Universal Permissive License v 1.0}

- Building and Explaining a Regressor using AutoMLx

_{Updated: 05/29/2023}

`automlx-regression.ipynb`

Build a regressor using Oracle AutoMLx and a pricing data set. Training options will be explored and the resulting AutoMLx models will be evaluated.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx regression

_{Universal Permissive License v 1.0}

- Building and Explaining a Text Classifier using AutoMLx

_{Updated: 05/29/2023}

`automlx-text_classification.ipynb`

Build a classifier using the Oracle AutoMLx tool for the public 20 Newsgroups dataset.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx text classification text classifier

_{Universal Permissive License v 1.0}

- Audi Autonomous Driving Dataset Repository

_{Updated: 03/30/2023}

`audi-autonomous_driving-oracle_open_data.ipynb`

Download, process and display autonomous driving data, and map LiDAR data onto images.

This notebook was developed on the conda pack with slug: computervision_p37_cpu_v1

autonomous driving oracle open data

_{Universal Permissive License v 1.0}

- Using Livy on the Big Data Service

_{Updated: 03/26/2023}

`big_data_service-(BDS)-livy.ipynb`

Work interactively with a BDS cluster using Livy and two connection techniques: SparkMagic (for a notebook environment) and REST.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

bds big data service livy

_{Universal Permissive License v 1.0}

- How to Read Data with fsspec from Oracle Big Data Service (BDS)

_{Updated: 03/29/2023}

`read-write-big_data_service-(BDS).ipynb`

Manage data using the fsspec file system interface. Read and save data with pandas and PyArrow through fsspec.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

bds fsspec

_{Universal Permissive License v 1.0}

- Caltech Pedestrian Detection Benchmark Repository

_{Updated: 03/30/2023}

`caltech-pedestrian_detection-oracle_open_data.ipynb`

Download and process annotated video data of vehicles and pedestrians.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

caltech pedestrian detection oracle open data

_{Universal Permissive License v 1.0}

- Using Data Catalog Metastore with DataFlow

_{Updated: 03/26/2023}

`pyspark-data_catalog-hive_metastore-data_flow.ipynb`

Write and test a Data Flow batch application using the Oracle Cloud Infrastructure (OCI) Data Catalog Metastore. Configure the job, run the application and clean up resources.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

data catalog metastore data flow

_{Universal Permissive License v 1.0}

- Text Classification with Data Labeling Service Integration

_{Updated: 03/30/2023}

`data_labeling-text_classification.ipynb`

Use the Oracle Cloud Infrastructure (OCI) Data Labeling service to efficiently build enriched, labeled datasets for accurately training AI/ML models. This notebook demonstrates operations that can be performed using the Accelerated Data Science (ADS) Data Labeling module.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

data labeling text classification

_{Universal Permissive License v 1.0}

- Visualizing Data

_{Updated: 03/30/2023}

`visualizing_data-exploring_data.ipynb`

Perform common data visualization tasks and explore data with the ADS SDK. Plotting approaches include 3D plots, pie charts, GIS plots, and Seaborn pairplot graphs.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

data visualization seaborn plot charts

_{Universal Permissive License v 1.0}

- Using Data Catalog Metastore with PySpark

_{Updated: 03/30/2023}

`pyspark-data_catalog-hive_metastore.ipynb`

Configure and use PySpark to process data in the Oracle Cloud Infrastructure (OCI) Data Catalog metastore, including common operations like creating and loading data from the metastore.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

dcat data catalog metastore pyspark

_{Universal Permissive License v 1.0}

- Train, Register, and Deploy a Generic Model

_{Updated: 03/26/2023}

`train-register-deploy-other-frameworks.ipynb`

Train, register, and deploy a generic model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

generic model deploy model register model train model

_{Universal Permissive License v 1.0}

- Bank Graph Example Notebook

_{Updated: 06/05/2023}

`graph_insight-autonomous_database.ipynb`

Access

This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1

graph_insight autonomous_database

_{Universal Permissive License v 1.0}

- Train, register, and deploy HuggingFace Pipeline

_{Updated: 03/26/2023}

`train-register-deploy-huggingface-pipeline.ipynb`

Train, register, and deploy a huggingface pipeline.

This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1

huggingface deploy model register model train model

_{Universal Permissive License v 1.0}

- Introduction to ADSTuner

_{Updated: 03/30/2023}

`hyperparameter_tuning.ipynb`

Use ADSTuner to optimize an estimator using the scikit-learn API.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

hyperparameter tuning

_{Universal Permissive License v 1.0}

- Intel Extension for Scikit-Learn

_{Updated: 03/26/2023}

`accelerate-scikit_learn-with-intel_extension.ipynb`

Enhance performance of scikit-learn models using the Intel(R) oneAPI Data Analytics Library. Train a k-means model using both sklearn and the accelerated Intel library and compare performance.

This notebook was developed on the conda pack with slug: sklearnex202130_p37_cpu_v1

intel intel extension scikit-learn scikit learn

_{Universal Permissive License v 1.0}

- Connect to Oracle Big Data Service

_{Updated: 03/27/2023}

`big_data_service-(BDS)-kerberos.ipynb`

Connect to Oracle Big Data services using Kerberos.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

kerberos big data service bds

_{Universal Permissive License v 1.0}

- Natural Language Processing

_{Updated: 03/26/2023}

`natural_language_processing.ipynb`

Use the ADS SDK to process and manipulate strings. This notebook includes regular expression matching and natural language processing (NLP) parsing, including part-of-speech tagging, named entity recognition, and sentiment analysis. It also shows how to create and use custom plugins tailored to your needs.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

language services string manipulation regex regular expression natural language processing NLP part-of-speech tagging named entity recognition sentiment analysis custom plugins

_{Universal Permissive License v 1.0}
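
As a minimal sketch of the regular expression matching that the Natural Language Processing notebook mentions, the snippet below uses only Python's standard `re` module rather than the ADS SDK. The pattern, the `extract_dates` helper, and the sample sentence are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: ISO-style dates (YYYY-MM-DD) with named groups.
DATE_PATTERN = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")


def extract_dates(text):
    """Return every date found in the text as a (year, month, day) tuple of ints."""
    return [
        (int(match["year"]), int(match["month"]), int(match["day"]))
        for match in DATE_PATTERN.finditer(text)
    ]


# Illustrative input only; it is not taken from the notebook.
sample = "The model was trained on 2023-03-26 and redeployed on 2023-05-29."
dates = extract_dates(sample)
print(dates)
```

Named groups keep the extraction readable when a pattern grows, because each captured piece is addressed by name instead of by position.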

- Building a Forecaster using AutoMLx

_{Updated: 05/29/2023}

`automlx-forecasting.ipynb`

Use Oracle AutoMLx to build a forecast model with real-world data sets.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx forecasting

_{Universal Permissive License v 1.0}

- Train, Register, and Deploy a LightGBM Model

_{Updated: 03/26/2023}

`train-register-deploy-lightgbm.ipynb`

Train, register, and deploy a LightGBM model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

lightgbm deploy model register model train model

_{Universal Permissive License v 1.0}

- Loading Data With Pandas & Dask

_{Updated: 03/26/2023}

`load_data-object_storage-hive-autonomous-database.ipynb`

Load data from sources including ADW, Object Storage, and Hive in formats such as Parquet and CSV.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

loading data autonomous database adw hive pandas dask object storage

_{Universal Permissive License v 1.0}

- Introduction to Model Version Set

_{Updated: 03/26/2023}

`model_version_set.ipynb`

A model version set is a way to track the relationships between models. As a container, the model version set takes a collection of models. Those models are assigned a sequential version number based on the order they are entered into the model version set.

This notebook was developed on the conda pack with slug: dbexp_p38_cpu_v1

model model experiments model version set

_{Universal Permissive License v 1.0}
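
The sequential numbering described for a model version set can be illustrated with a small, self-contained sketch. This is not the ADS API: the `ToyVersionSet` class, its methods, and the model identifiers are hypothetical, and the assumption that numbering starts at 1 is made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionSet:
    """Hypothetical stand-in for a model version set (not the ADS API)."""

    name: str
    _models: list = field(default_factory=list)

    def add(self, model_id):
        """Add a model and return its sequential version number.

        Assumption for this sketch: versions start at 1 and follow insertion order.
        """
        self._models.append(model_id)
        return len(self._models)

    def version_of(self, model_id):
        """Look up the version number that was assigned to a model."""
        return self._models.index(model_id) + 1


version_set = ToyVersionSet(name="churn-experiments")
first = version_set.add("model-a")
second = version_set.add("model-b")
print(first, second, version_set.version_of("model-a"))
```

Because the version number is derived from insertion order, the container itself records how the models relate to one another over time.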

- Model Evaluation with ADSEvaluator

_{Updated: 03/30/2023}

`model_evaluation-with-ADSEvaluator.ipynb`

Train and evaluate different types of models: binary classification using an imbalanced dataset, multi-class classification using a synthetically generated dataset consisting of three equally distributed classes, and a regression using a synthetically generated dataset with positive targets.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

model evaluation binary classification regression multi-class classification imbalanced dataset synthetic dataset

_{Universal Permissive License v 1.0}
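
To show why an imbalanced binary dataset needs more than accuracy, here is a minimal sketch in plain Python. It does not use ADSEvaluator; the helper names and the 95/5 class split are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(accuracy, precision, recall)
```

The always-negative classifier scores 95% accuracy yet finds none of the positive cases, which is why precision and recall are usually reported alongside accuracy for imbalanced problems.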

- Text Classification and Model Explanations using LIME

_{Updated: 03/30/2023}

`text_classification-model_explanation-lime.ipynb`

Perform model explanations on an NLP classifier using the Local Interpretable Model-agnostic Explanations (LIME) technique.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

nlp lime model_explanation text_classification text_explanation

_{Universal Permissive License v 1.0}

- Visual Genome Repository

_{Updated: 03/30/2023}

`genome_visualization-oracle_open_data.ipynb`

Load visual data, define regions, and visualize objects using metadata to connect structured images to language.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

object annotation genome visualization oracle open data

_{Universal Permissive License v 1.0 (https://oss.oracle.com/licenses/upl/)}

- ONNX Integration with the Accelerated Data Science (ADS) SDK

_{Updated: 07/17/2023}

`onnx-integration-ads.ipynb`

Demonstrates the ONNX integration in the Accelerated Data Science (ADS) SDK, including model deployment.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

onnx deploy model

_{Universal Permissive License v 1.0}

- Working with Pipelines

_{Updated: 03/26/2023}

`pipelines-ml_lifecycle.ipynb`

Create and use ML pipelines through the entire machine learning lifecycle.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

pipelines pipeline step jobs pipeline

_{Universal Permissive License v 1.0}

- Graph Analytics and Graph Machine Learning with PyPGX

_{Updated: 03/26/2023}

`pypgx-graph_analytics-machine_learning.ipynb`

Use Oracle's Graph Analytics libraries to demonstrate graph algorithms, graph machine learning models, and the Property Graph Query Language (PGQL).

This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1

pypgx graph analytics pgx

_{Universal Permissive License v 1.0}

- Introduction to the Oracle Cloud Infrastructure Data Flow Studio

_{Updated: 03/26/2023}

`pyspark-data_flow_studio-introduction.ipynb`

Run interactive Spark workloads on a long-lasting Oracle Cloud Infrastructure Data Flow Spark cluster through Apache Livy integration. Data Flow Spark Magic is used for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. It includes a set of magic commands for interactively running Spark code.

This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v2

pyspark data flow

_{Universal Permissive License v 1.0}

- Spark NLP within Oracle Cloud Infrastructure Data Flow Studio

_{Updated: 03/26/2023}

`pyspark-data_flow_studio-spark_nlp.ipynb`

Demonstrates how to use Spark NLP within a long-lasting Oracle Cloud Infrastructure Data Flow cluster.

This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v1

pyspark data flow

_{Universal Permissive License v 1.0}

- PySpark

_{Updated: 06/02/2023}

`pyspark-data_flow-application.ipynb`

Develop local PySpark applications and work with remote clusters using Data Flow.

This notebook was developed on the conda pack with slug: pyspark24_p37_cpu_v3

pyspark data flow

_{Universal Permissive License v 1.0}

- Train, Register, and Deploy a PyTorch Model

_{Updated: 03/26/2023}

`train-register-deploy-pytorch.ipynb`

Train, register, and deploy a PyTorch model.

This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1

pytorch deploy model register model train model

_{Universal Permissive License v 1.0}

- Train, register, and deploy Sklearn Model

_{Updated: 03/26/2023}

`train-register-deploy-sklearn.ipynb`

Train, register, and deploy a scikit-learn model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

scikit-learn deploy model register model train model

_{Universal Permissive License v 1.0}

- Introduction to SQL Magic

_{Updated: 03/30/2023}

`sql_magic-commands-with-autonomous_database.ipynb`

Use SQL Magic commands to work with a database within a Jupyter notebook. This notebook shows how to use both line and cell magics.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

sql magic autonomous database

_{Universal Permissive License v 1.0}

- Introduction to Streaming

_{Updated: 03/30/2023}

`streaming-service-introduction.ipynb`

Connect to the Oracle Cloud Infrastructure (OCI) Streaming service with Kafka.

This notebook was developed on the conda pack with slug: dataexpl_p37_cpu_v3

streaming kafka

_{Universal Permissive License v 1.0}

- Train, Register, and Deploy a TensorFlow Model

_{Updated: 03/26/2023}

`train-register-deploy-tensorflow.ipynb`

Train, register, and deploy a TensorFlow model.

This notebook was developed on the conda pack with slug: tensorflow28_p38_cpu_v1

tensorflow deploy model register model train model

_{Universal Permissive License v 1.0}

- Text Extraction Using the Accelerated Data Science (ADS) SDK

_{Updated: 03/26/2023}

`document-text_extraction.ipynb`

Extract text from common formats (e.g. PDF and Word) into plain text. Customize this process for individual use cases.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

text extraction nlp

_{Universal Permissive License v 1.0}

- Train, Register, and Deploy an XGBoost Model

_{Updated: 03/26/2023}

`train-register-deploy-xgboost.ipynb`

Train, register, and deploy an XGBoost model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

xgboost deploy model register model train model

_{Universal Permissive License v 1.0}

- XGBoost with RAPIDS

_{Updated: 03/30/2023}

`xgboost-with-rapids.ipynb`

Compare training time between CPU-trained and GPU-trained models using XGBoost.

This notebook was developed on the conda pack with slug: rapids2110_p37_gpu_v1

xgboost rapids gpu machine learning classification

_{Universal Permissive License v 1.0}

- Medical Data Management Using Feature Store

_{Updated: 11/13/2023}

`feature_store_ehr_data.ipynb`

Manage and utilize medical data efficiently using a feature store. This notebook demonstrates the storage, retrieval, and manipulation of Electronic Health Record (EHR) data within a feature store framework.

feature store medical data data management

_{License: Universal Permissive License v 1.0}

- Storage of Hugging Face Embeddings Using Feature Store

_{Updated: 11/13/2023}

`feature_store_embeddings.ipynb`

Explore the storage and retrieval of Hugging Face embeddings within a feature store setup. This notebook provides insights into storing and utilizing pre-trained embeddings for various natural language processing tasks using the feature store infrastructure.

feature store Hugging Face embeddings storage

_{License: Universal Permissive License v 1.0}

- Storage of OpenAI Embeddings Using Feature Store

_{Updated: 11/13/2023}

`feature_store_embeddings_openai.ipynb`

Learn how to store and leverage OpenAI embeddings effectively within a feature store environment. This notebook guides users through the process of managing and utilizing OpenAI-generated embeddings for diverse machine learning applications within a feature store framework.

feature store OpenAI embeddings storage

_{License: Universal Permissive License v 1.0}

- Synthetic Data Generation Using Feature Store

_{Updated: 11/13/2023}

`feature_store_medical_synthetic_data_openai.ipynb`

Generate synthetic medical data leveraging OpenAI tools within a feature store. This notebook illustrates the process of creating synthetic medical data for various research and analysis purposes using the capabilities of a feature store.

feature store synthetic data generation medical data

_{License: Universal Permissive License v 1.0}

- PII Redaction Using Feature Store

_{Updated: 11/13/2023}

`feature_store_pii_redaction_and_transformation.ipynb`

Explore techniques and methods for Personally Identifiable Information (PII) redaction and transformation within a feature store environment. This notebook demonstrates how to manage sensitive data securely by implementing PII masking and transformation techniques using a feature store.

feature store PII redaction data security

_{License: Universal Permissive License v 1.0}
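
As a rough sketch of what PII masking can look like before data is stored as features, the snippet below redacts two kinds of identifiers with Python's standard `re` module. It does not use the feature store API; the patterns, the `redact` helper, and the sample record are hypothetical, and real PII detection generally needs far broader coverage than two regular expressions.

```python
import re

# Hypothetical patterns for illustration; they are deliberately simple.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each matched identifier with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Made-up record; no real personal data is used here.
record = "Patient jane.doe@example.com, SSN 123-45-6789, was admitted."
masked = redact(record)
print(masked)
```

Replacing values with a label instead of deleting them keeps the structure of the text intact, so downstream transformations can still see that an identifier was present.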

- Querying Operations Using Feature Store

_{Updated: 11/13/2023}

`feature_store_querying.ipynb`

Understand and perform querying operations within a feature store setup. This notebook covers querying techniques, data retrieval, and manipulation strategies to efficiently access and utilize stored features in a feature store environment.

feature store querying data operations

_{License: Universal Permissive License v 1.0}

- Quickstart for Feature Store

_{Updated: 11/13/2023}

`feature_store_quickstart.ipynb`

Get started quickly with a feature store setup using this introductory notebook. It provides step-by-step guidance and essential information for setting up and utilizing a feature store environment for efficient data management and analysis.

feature store quickstart data management

_{License: Universal Permissive License v 1.0}

- Schema Evolution and Schema Enforcement Using Feature Store

_{Updated: 11/13/2023}

`feature_store_schema_evolution.ipynb`

Learn about schema evolution and enforcement techniques within a feature store. This notebook explores methods to handle schema changes, enforce data integrity, and manage evolving data structures effectively in a feature store environment.

feature store schema evolution data integrity

_{License: Universal Permissive License v 1.0}
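
The ideas of schema enforcement and schema evolution can be sketched in plain Python without any feature store dependency. The schema, the `enforce_schema` and `evolve_schema` helpers, and the sample rows below are hypothetical; the feature store's own behavior may differ.

```python
# Hypothetical schema: column name mapped to the expected Python type.
EXPECTED_SCHEMA = {"patient_id": int, "age": int, "diagnosis": str}


def enforce_schema(row, schema):
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for column, expected_type in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            actual = type(row[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    for column in row:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


def evolve_schema(schema, new_columns):
    """Return a new schema with extra columns added (additive evolution only)."""
    evolved = dict(schema)
    evolved.update(new_columns)
    return evolved


old_row = {"patient_id": 1, "age": 42, "diagnosis": "flu"}
new_row = {"patient_id": 2, "age": 37, "diagnosis": "cold", "blood_type": "O+"}

before = enforce_schema(new_row, EXPECTED_SCHEMA)
evolved_schema = evolve_schema(EXPECTED_SCHEMA, {"blood_type": str})
after = enforce_schema(new_row, evolved_schema)
print(before, after)
```

Enforcement rejects the new column until the schema is evolved to include it. Note that rows written under the old schema would then be reported as missing `blood_type`, so a real system also needs a policy for defaults or nullable columns.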

- Big Data Operations Using Feature Store

_{Updated: 11/13/2023}

`feature_store_spark_magic.ipynb`

Explore big data operations within a feature store using Spark magic commands. This notebook demonstrates how to leverage the power of Spark for efficient data handling and analysis in a feature store environment.

feature store big data operations Spark

_{License: Universal Permissive License v 1.0}

- Streaming Operations Using Feature Store

_{Updated: 11/13/2023}

`feature_store_streaming_data_frame.ipynb`

Explore streaming operations within a feature store using Spark. This notebook demonstrates how to leverage the power of Spark Streaming for efficient data handling and analysis in a feature store environment.

feature store big data operations Spark Spark Streaming

_{License: Universal Permissive License v 1.0}
