ADS Expertise Notebooks

The Accelerated Data Science (ADS) SDK is maintained by the Oracle Cloud Infrastructure Data Science service team. It speeds up common data science activities by providing tools that automate or simplify them, and it offers a data-scientist-friendly Pythonic interface to Oracle Cloud Infrastructure (OCI) services, most notably OCI Data Science, Data Flow, Object Storage, and the Autonomous Database. ADS gives you an interface to manage the lifecycle of machine learning models, from data acquisition to model evaluation, interpretation, and deployment.

The ADS SDK can be downloaded from PyPI. Contributions are welcome on GitHub.

Topics

Contents

Notebooks

- Building and Explaining an Anomaly Detector using AutoMLx - Experimental

Updated: 05/29/2023

Build an anomaly detection model using the experimental, fully unsupervised anomaly detection pipeline in Oracle AutoMLx for the public Credit Card Fraud dataset.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v2

automlx anomaly detection

Universal Permissive License v 1.0


- Building and Explaining a Classifier using AutoMLx

Updated: 05/29/2023

Build a binary classifier using the Oracle AutoMLx tool and the Census income data set.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx classification classifier

Universal Permissive License v 1.0


- Fairness with AutoMLx

Updated: 05/29/2023

Develop a model and evaluate its fairness.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx fairness

Universal Permissive License v 1.0


- Building and Explaining a Regressor using AutoMLx

Updated: 05/29/2023

Build a regressor using Oracle AutoMLx and a pricing data set. Training options will be explored and the resulting AutoMLx models will be evaluated.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx regression

Universal Permissive License v 1.0


- Building and Explaining a Text Classifier using AutoMLx

Updated: 05/29/2023

Build a classifier using the Oracle AutoMLx tool for the public 20 Newsgroups dataset.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx text classification text classifier

Universal Permissive License v 1.0


- Audi Autonomous Driving Dataset Repository

Updated: 03/30/2023

Download, process and display autonomous driving data, and map LiDAR data onto images.

This notebook was developed on the conda pack with slug: computervision_p37_cpu_v1

autonomous driving oracle open data

Universal Permissive License v 1.0


- Using Livy on the Big Data Service

Updated: 03/26/2023

Work interactively with a BDS cluster using Livy and two different connection techniques: SparkMagic (for a notebook environment) and REST.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

bds big data service livy

Universal Permissive License v 1.0
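
As a rough illustration of the REST technique mentioned in the Livy entry above, the sketch below builds (but does not send) the two requests that Livy's REST API uses to start an interactive session and to submit a statement to it. The host `livy.example.com`, the port, and the helper function names are assumptions made for illustration and are not taken from the notebook. A real BDS cluster will typically also require Kerberos authentication, which is omitted here.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; replace with the Livy URL of your own cluster.
LIVY_URL = "http://livy.example.com:8998"


def build_session_request(base_url: str, kind: str = "pyspark") -> Request:
    """Build a POST /sessions request asking Livy for a new interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(base_url: str, session_id: int, code: str) -> Request:
    """Build a POST /sessions/{id}/statements request that runs code in a session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build the requests only; nothing is sent over the network here.
session_req = build_session_request(LIVY_URL)
statement_req = build_statement_request(
    LIVY_URL, session_id=0, code="spark.range(5).count()"
)
```

Passing either object to `urllib.request.urlopen` would send it. The session id used above is a placeholder, since in practice it comes from the JSON response to the session request.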


- How to Read Data with fsspec from Oracle Big Data Service (BDS)

Updated: 03/29/2023

Manage data using the fsspec file system interface. Read and save data with pandas and pyarrow through fsspec.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

bds fsspec

Universal Permissive License v 1.0


- Caltech Pedestrian Detection Benchmark Repository

Updated: 03/30/2023

Download and process annotated video data of vehicles and pedestrians.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

caltech pedestrian detection oracle open data

Universal Permissive License v 1.0


- Using Data Catalog Metastore with DataFlow

Updated: 03/26/2023

Write and test a Data Flow batch application using the Oracle Cloud Infrastructure (OCI) Data Catalog Metastore. Configure the job, run the application and clean up resources.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

data catalog metastore data flow

Universal Permissive License v 1.0


- Text Classification with Data Labeling Service Integration

Updated: 03/30/2023

Use the Oracle Cloud Infrastructure (OCI) Data Labeling service to efficiently build enriched, labeled datasets for accurately training AI/ML models. This notebook demonstrates operations that can be performed using the Accelerated Data Science (ADS) Data Labeling module.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

data labeling text classification

Universal Permissive License v 1.0


- Visualizing Data

Updated: 03/30/2023

Perform common data visualization tasks and explore data with the ADS SDK. Plotting approaches include 3D plots, pie charts, GIS plots, and Seaborn pairplot graphs.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

data visualization seaborn plot charts

Universal Permissive License v 1.0


- Using Data Catalog Metastore with PySpark

Updated: 03/30/2023

Configure and use PySpark to process data in the Oracle Cloud Infrastructure (OCI) Data Catalog metastore, including common operations like creating and loading data from the metastore.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

dcat data catalog metastore pyspark

Universal Permissive License v 1.0


- Train, Register, and Deploy a Generic Model

Updated: 03/26/2023

Train, register, and deploy a generic model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

generic model deploy model register model train model

Universal Permissive License v 1.0


- Bank Graph Example Notebook

Updated: 06/05/2023

Access

This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1

graph_insight autonomous_database

Universal Permissive License v 1.0


- Train, register, and deploy HuggingFace Pipeline

Updated: 03/26/2023

Train, register, and deploy a Hugging Face pipeline.

This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1

huggingface deploy model register model train model

Universal Permissive License v 1.0


- Introduction to ADSTuner

Updated: 03/30/2023

Use ADSTuner to optimize an estimator using the scikit-learn API.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

hyperparameter tuning

Universal Permissive License v 1.0


- Intel Extension for Scikit-Learn

Updated: 03/26/2023

Enhance performance of scikit-learn models using the Intel(R) oneAPI Data Analytics Library. Train a k-means model using both sklearn and the accelerated Intel library and compare performance.

This notebook was developed on the conda pack with slug: sklearnex202130_p37_cpu_v1

intel intel extension scikit-learn scikit learn

Universal Permissive License v 1.0


- Connect to Oracle Big Data Service

Updated: 03/27/2023

Connect to Oracle Big Data services using Kerberos.

This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5

kerberos big data service bds

Universal Permissive License v 1.0


- Natural Language Processing

Updated: 03/26/2023

Use the ADS SDK to process and manipulate strings. This notebook includes regular expression matching and natural language processing (NLP) parsing, including part-of-speech tagging, named entity recognition, and sentiment analysis. It also shows how to create and use custom plugins tailored to your needs.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

language services string manipulation regex regular expression natural language processing NLP part-of-speech tagging named entity recognition sentiment analysis custom plugins

Universal Permissive License v 1.0


- Building a Forecaster using AutoMLx

Updated: 05/29/2023

Use Oracle AutoMLx to build a forecast model with real-world data sets.

This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3

automlx forecasting

Universal Permissive License v 1.0


- Train, Register, and Deploy a LightGBM Model

Updated: 03/26/2023

Train, register, and deploy a LightGBM model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

lightgbm deploy model register model train model

Universal Permissive License v 1.0


- Loading Data With Pandas & Dask

Updated: 03/26/2023

Load data from sources including ADW, Object Storage, and Hive in formats such as Parquet and CSV.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

loading data autonomous database adw hive pandas dask object storage

Universal Permissive License v 1.0


- Introduction to Model Version Set

Updated: 03/26/2023

A model version set is a way to track the relationships between models. As a container, the model version set takes a collection of models. Those models are assigned a sequential version number based on the order they are entered into the model version set.

This notebook was developed on the conda pack with slug: dbexp_p38_cpu_v1

model model experiments model version set

Universal Permissive License v 1.0
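
To make the sequential numbering described in the model version set entry above concrete, here is a minimal sketch of a container that assigns version numbers in insertion order. `ToyModelVersionSet` and `VersionedModel` are hypothetical stand-ins written for this illustration. They are not the ADS classes and do not talk to the OCI model catalog.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedModel:
    """A model name paired with the version number it received."""
    name: str
    version: int


@dataclass
class ToyModelVersionSet:
    """Hypothetical stand-in for illustration; not the ADS model version set API."""
    name: str
    models: List[VersionedModel] = field(default_factory=list)

    def add(self, model_name: str) -> VersionedModel:
        # The next version is one more than the number of models already stored,
        # so versions follow the order in which models are added.
        entry = VersionedModel(name=model_name, version=len(self.models) + 1)
        self.models.append(entry)
        return entry


experiments = ToyModelVersionSet(name="churn-experiments")
first = experiments.add("logistic-regression")
second = experiments.add("random-forest")
versions = [(m.name, m.version) for m in experiments.models]
```

After these two calls, `versions` holds `("logistic-regression", 1)` followed by `("random-forest", 2)`.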


- Model Evaluation with ADSEvaluator

Updated: 03/30/2023

Train and evaluate different types of models: binary classification using an imbalanced dataset, multi-class classification using a synthetically generated dataset consisting of three equally distributed classes, and a regression using a synthetically generated dataset with positive targets.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

model evaluation binary classification regression multi-class classification imbalanced dataset synthetic dataset

Universal Permissive License v 1.0
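
The imbalanced binary case mentioned in the evaluation entry above is worth a small worked example, because accuracy alone can look good while the model misses every positive. The sketch below uses made-up labels and plain Python instead of ADSEvaluator. The function name and the data are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


# Made-up imbalanced data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
# Recall is the share of true positives found; guard against division by zero.
recall = tp / (tp + fn) if (tp + fn) else 0.0
```

Here accuracy is 0.95 while recall is 0.0, which is why evaluation on imbalanced data usually looks at more than one metric.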


- Text Classification and Model Explanations using LIME

Updated: 03/30/2023

Perform model explanations on an NLP classifier using the Local Interpretable Model-agnostic Explanations (LIME) technique.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

nlp lime model_explanation text_classification text_explanation

Universal Permissive License v 1.0


- Visual Genome Repository

Updated: 03/30/2023

Load visual data, define regions, and visualize objects using metadata to connect structured images to language.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

object annotation genome visualization oracle open data

Universal Permissive License v 1.0 (https://oss.oracle.com/licenses/upl/)


- ONNX Integration with the Accelerated Data Science (ADS) SDK

Updated: 07/17/2023

Use ONNX with the ADS SDK to prepare and deploy a model.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

onnx deploy model

Universal Permissive License v 1.0


- Working with Pipelines

Updated: 03/26/2023

Create and use ML pipelines through the entire machine learning lifecycle.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

pipelines pipeline step jobs pipeline

Universal Permissive License v 1.0


- Graph Analytics and Graph Machine Learning with PyPGX

Updated: 03/26/2023

Use Oracle's Graph Analytics libraries to demonstrate graph algorithms, graph machine learning models, and the property graph query language (PGQL).

This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1

pypgx graph analytics pgx

Universal Permissive License v 1.0


- Introduction to the Oracle Cloud Infrastructure Data Flow Studio

Updated: 03/26/2023

Run interactive Spark workloads on a long-lasting Oracle Cloud Infrastructure Data Flow Spark cluster through Apache Livy integration. Data Flow Spark Magic is used for interactively working with remote Spark clusters through Livy, a Spark REST server, in Jupyter notebooks. It includes a set of magic commands for interactively running Spark code.

This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v2

pyspark data flow

Universal Permissive License v 1.0


- Spark NLP within Oracle Cloud Infrastructure Data Flow Studio

Updated: 03/26/2023

Demonstrates how to use Spark NLP within a long-lasting Oracle Cloud Infrastructure Data Flow cluster.

This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v1

pyspark data flow

Universal Permissive License v 1.0


- PySpark

Updated: 06/02/2023

Develop local PySpark applications and work with remote clusters using Data Flow.

This notebook was developed on the conda pack with slug: pyspark24_p37_cpu_v3

pyspark data flow

Universal Permissive License v 1.0


- Train, Register, and Deploy a PyTorch Model

Updated: 03/26/2023

Train, register, and deploy a PyTorch model.

This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1

pytorch deploy model register model train model

Universal Permissive License v 1.0


- Train, register, and deploy Sklearn Model

Updated: 03/26/2023

Train, register, and deploy a scikit-learn model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

scikit-learn deploy model register model train model

Universal Permissive License v 1.0


- Introduction to SQL Magic

Updated: 03/30/2023

Use SQL Magic commands to work with a database within a Jupyter notebook. This notebook shows how to use both line and cell magics.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

sql magic autonomous database

Universal Permissive License v 1.0


- Introduction to Streaming

Updated: 03/30/2023

Connect to the Oracle Cloud Infrastructure (OCI) Streaming service with Kafka.

This notebook was developed on the conda pack with slug: dataexpl_p37_cpu_v3

streaming kafka

Universal Permissive License v 1.0


- Train, Register, and Deploy a TensorFlow Model

Updated: 03/26/2023

Train, register, and deploy a TensorFlow model.

This notebook was developed on the conda pack with slug: tensorflow28_p38_cpu_v1

tensorflow deploy model register model train model

Universal Permissive License v 1.0


- Text Extraction Using the Accelerated Data Science (ADS) SDK

Updated: 03/26/2023

Extract text from common formats (e.g. PDF and Word) into plain text. Customize this process for individual use cases.

This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2

text extraction nlp

Universal Permissive License v 1.0


- Train, Register, and Deploy an XGBoost Model

Updated: 03/26/2023

Train, register, and deploy an XGBoost model.

This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1

xgboost deploy model register model train model

Universal Permissive License v 1.0


- XGBoost with RAPIDS

Updated: 03/30/2023

Compare training time between CPU- and GPU-trained models using XGBoost.

This notebook was developed on the conda pack with slug: rapids2110_p37_gpu_v1

xgboost rapids gpu machine learning classification

Universal Permissive License v 1.0


- Medical Data Management Using Feature Store

Updated: 11/13/2023

Manage and utilize medical data efficiently using a feature store. This notebook demonstrates the storage, retrieval, and manipulation of Electronic Health Record (EHR) data within a feature store framework.

feature store medical data data management

License: Universal Permissive License v 1.0


- Storage of Hugging Face Embeddings Using Feature Store

Updated: 11/13/2023

Explore the storage and retrieval of Hugging Face embeddings within a feature store setup. This notebook provides insights into storing and utilizing pre-trained embeddings for various natural language processing tasks using the feature store infrastructure.

feature store Hugging Face embeddings storage

License: Universal Permissive License v 1.0


- Storage of OpenAI Embeddings Using Feature Store

Updated: 11/13/2023

Learn how to store and leverage OpenAI embeddings effectively within a feature store environment. This notebook guides users through the process of managing and utilizing OpenAI-generated embeddings for diverse machine learning applications within a feature store framework.

feature store OpenAI embeddings storage

License: Universal Permissive License v 1.0


- Synthetic Data Generation Using Feature Store

Updated: 11/13/2023

Generate synthetic medical data leveraging OpenAI tools within a feature store. This notebook illustrates the process of creating synthetic medical data for various research and analysis purposes using the capabilities of a feature store.

feature store synthetic data generation medical data

License: Universal Permissive License v 1.0


- PII Redaction Using Feature Store

Updated: 11/13/2023

Explore techniques and methods for Personally Identifiable Information (PII) redaction and transformation within a feature store environment. This notebook demonstrates how to manage sensitive data securely by implementing PII masking and transformation techniques using a feature store.

feature store PII redaction data security

License: Universal Permissive License v 1.0
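
As a small illustration of the masking idea in the PII redaction entry above, the sketch below replaces email addresses and one phone number format with placeholder tokens using regular expressions. The patterns, the placeholder tokens, and the `mask_pii` helper are assumptions made for this example. They are not the feature store's transformation API, and they are far from exhaustive.

```python
import re

# Simplified patterns for illustration; real PII detection needs broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace email addresses and NNN-NNN-NNNN phone numbers with tokens."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


record = "Contact Jane at jane.doe@example.com or 555-123-4567."
masked = mask_pii(record)
# masked == "Contact Jane at [EMAIL] or [PHONE]."
```

The personal name in the record is left untouched, which shows the limit of pattern-based masking: free-text entities such as names generally need a dedicated detection step.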


- Querying Operations Using Feature Store

Updated: 11/13/2023

Understand and perform querying operations within a feature store setup. This notebook covers querying techniques, data retrieval, and manipulation strategies to efficiently access and utilize stored features in a feature store environment.

feature store querying data operations

License: Universal Permissive License v 1.0


- Quickstart for Feature Store

Updated: 11/13/2023

Get started quickly with a feature store setup using this introductory notebook. It provides step-by-step guidance and essential information for setting up and utilizing a feature store environment for efficient data management and analysis.

feature store quickstart data management

License: Universal Permissive License v 1.0


- Schema Evolution and Schema Enforcement Using Feature Store

Updated: 11/13/2023

Learn about schema evolution and enforcement techniques within a feature store. This notebook explores methods to handle schema changes, enforce data integrity, and manage evolving data structures effectively in a feature store environment.

feature store schema evolution data integrity

License: Universal Permissive License v 1.0


- Big Data Operations Using Feature Store

Updated: 11/13/2023

Explore big data operations within a feature store using Spark magic commands. This notebook demonstrates how to leverage the power of Spark for efficient data handling and analysis in a feature store environment.

feature store big data operations Spark

License: Universal Permissive License v 1.0


- Streaming Operations Using Feature Store

Updated: 11/13/2023

Explore streaming operations within a feature store using Spark. This notebook demonstrates how to leverage the power of Spark Streaming for efficient data handling and analysis in a feature store environment.

feature store big data operations Spark Spark Streaming

License: Universal Permissive License v 1.0