The Accelerated Data Science (ADS) SDK is maintained by the Oracle Cloud Infrastructure Data Science service team. It speeds up data science work by providing tools that automate and simplify common tasks, and it offers a data-scientist-friendly, Pythonic interface to Oracle Cloud Infrastructure (OCI) services, most notably OCI Data Science, Data Flow, Object Storage, and Autonomous Database. ADS gives you an interface to manage the lifecycle of machine learning models, from data acquisition to model evaluation, interpretation, and model deployment.
The ADS SDK can be downloaded from PyPI, and contributions are welcome on GitHub.
- Audi Autonomous Driving Dataset Repository
- Bank Graph Example Notebook
- Building a Forecaster using AutoMLx
- Building and Explaining a Classifier using AutoMLx
- Building and Explaining a Regressor using AutoMLx
- Building and Explaining a Text Classifier using AutoMLx
- Building and Explaining an Anomaly Detector using AutoMLx - Experimental
- Caltech Pedestrian Detection Benchmark Repository
- Connect to Oracle Big Data Service
- Fairness with AutoMLx
- Graph Analytics and Graph Machine Learning with PyPGX
- How to Read Data with fsspec from Oracle Big Data Service (BDS)
- Intel Extension for Scikit-Learn
- Introduction to ADSTuner
- Introduction to Model Version Set
- Introduction to SQL Magic
- Introduction to Streaming
- Introduction to the Oracle Cloud Infrastructure Data Flow Studio
- Loading Data With Pandas & Dask
- Model Evaluation with ADSEvaluator
- Natural Language Processing
- ONNX Integration with the Accelerated Data Science (ADS) SDK
- PySpark
- Spark NLP within Oracle Cloud Infrastructure Data Flow Studio
- Text Classification and Model Explanations using LIME
- Text Classification with Data Labeling Service Integration
- Text Extraction Using the Accelerated Data Science (ADS) SDK
- Train, Register, and Deploy a Generic Model
- Train, Register, and Deploy a LightGBM Model
- Train, Register, and Deploy a PyTorch Model
- Train, Register, and Deploy a TensorFlow Model
- Train, Register, and Deploy an XGBoost Model
- Train, register, and deploy HuggingFace Pipeline
- Train, register, and deploy Sklearn Model
- Using Data Catalog Metastore with DataFlow
- Using Data Catalog Metastore with PySpark
- Using Livy on the Big Data Service
- Visual Genome Repository
- Visualizing Data
- Working with Pipelines
- XGBoost with RAPIDS
- Medical data using feature store
- Storage of Hugging Face embeddings using feature store
- Storage of OpenAI embeddings using feature store
- Synthetic data generation using feature store
- PII redaction using feature store
- Querying operations using feature store
- Quickstart for feature store
- Schema evolution and schema enforcement using feature store
- Big data operations using feature store
- Streaming operations using feature store
Updated: 05/29/2023
Build an anomaly detection model using the experimental, fully unsupervised anomaly detection pipeline in Oracle AutoMLx for the public Credit Card Fraud dataset.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v2
automlx
anomaly detection
Universal Permissive License v 1.0
Updated: 05/29/2023
Build a binary classifier using the Oracle AutoMLx tool and a census income dataset.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3
automlx
classification
classifier
Universal Permissive License v 1.0
Updated: 05/29/2023
Develop a model with Oracle AutoMLx and evaluate its fairness.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3
automlx
fairness
Universal Permissive License v 1.0
Updated: 05/29/2023
Build a regressor using Oracle AutoMLx and a pricing dataset, explore training options, and evaluate the resulting AutoMLx models.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3
automlx
regression
Universal Permissive License v 1.0
Updated: 05/29/2023
Build a text classifier using the Oracle AutoMLx tool for the public 20 Newsgroups dataset.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3
automlx
text classification
text classifier
Universal Permissive License v 1.0
Updated: 03/30/2023
Download, process and display autonomous driving data, and map LiDAR data onto images.
This notebook was developed on the conda pack with slug: computervision_p37_cpu_v1
autonomous driving
oracle open data
Universal Permissive License v 1.0
Updated: 03/26/2023
Work interactively with a BDS cluster through Livy using two connection techniques: SparkMagic (for a notebook environment) and REST.
This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5
bds
big data service
livy
Universal Permissive License v 1.0
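The REST technique talks to Livy's HTTP API directly: POST /sessions starts an interactive session, POST /sessions/{id}/statements runs code in it, and GET /sessions/{id}/statements/{statementId} polls the result. The sketch below only builds those requests with the standard library so their shape is visible; the host name is a hypothetical placeholder (8998 is Livy's default port), and the Kerberos authentication a secured BDS cluster would require is not shown.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your cluster's Livy URL. 8998 is Livy's default port.
LIVY_URL = "http://livy-host.example.com:8998"
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(kind: str = "pyspark") -> urllib.request.Request:
    """Build the request that starts an interactive Livy session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(f"{LIVY_URL}/sessions", data=body,
                                  headers=JSON_HEADERS, method="POST")


def submit_statement_request(session_id: int, code: str) -> urllib.request.Request:
    """Build the request that runs a code snippet inside an existing session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(f"{LIVY_URL}/sessions/{session_id}/statements",
                                  data=body, headers=JSON_HEADERS, method="POST")


def statement_status_request(session_id: int, statement_id: int) -> urllib.request.Request:
    """Build the request that polls a statement's state and output."""
    return urllib.request.Request(
        f"{LIVY_URL}/sessions/{session_id}/statements/{statement_id}", method="GET")
```

Against a reachable cluster, each request would be sent with `urllib.request.urlopen(req)` and the JSON response parsed to read the session or statement ID and state.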
Updated: 03/29/2023
Manage data using an fsspec file system. Read and save data with pandas and PyArrow through the fsspec file system.
This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5
bds
fsspec
Universal Permissive License v 1.0
Updated: 03/30/2023
Download and process annotated video data of vehicles and pedestrians.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
caltech
pedestrian detection
oracle open data
Universal Permissive License v 1.0
Updated: 03/26/2023
Write and test a Data Flow batch application using the Oracle Cloud Infrastructure (OCI) Data Catalog Metastore. Configure the job, run the application and clean up resources.
This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5
data catalog metastore
data flow
Universal Permissive License v 1.0
Updated: 03/30/2023
Use the Oracle Cloud Infrastructure (OCI) Data Labeling service to efficiently build enriched, labeled datasets for accurately training AI/ML models. This notebook demonstrates operations that can be performed using the Accelerated Data Science (ADS) Data Labeling module.
This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2
data labeling
text classification
Universal Permissive License v 1.0
Updated: 03/30/2023
Perform common data visualization tasks and explore data with the ADS SDK. Plotting approaches include 3D plots, pie charts, GIS plots, and Seaborn pairplot graphs.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
data visualization
seaborn plot
charts
Universal Permissive License v 1.0
Updated: 03/30/2023
Configure and use PySpark to process data in the Oracle Cloud Infrastructure (OCI) Data Catalog metastore, including common operations like creating and loading data from the metastore.
This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5
dcat
data catalog metastore
pyspark
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a generic model.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
generic model
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 06/05/2023
Access
This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1
graph_insight
autonomous_database
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a Hugging Face pipeline.
This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1
huggingface
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/30/2023
Use ADSTuner to optimize an estimator that follows the scikit-learn API.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
hyperparameter tuning
Universal Permissive License v 1.0
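Hyperparameter tuning boils down to sampling candidate settings from a search space, scoring each one, and keeping the best. The sketch below uses plain random search with the standard library to show that loop; it is not ADSTuner's API or search strategy, and `toy_objective` is a hypothetical stand-in for a cross-validated model score.

```python
import random
from typing import Callable, Dict, Tuple

SearchSpace = Dict[str, Tuple[float, float]]  # name -> (low, high)


def random_search(objective: Callable[[Dict[str, float]], float], space: SearchSpace,
                  n_trials: int = 200, seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Sample each hyperparameter uniformly from its range; keep the best-scoring trial."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, float]) -> float:
    # Hypothetical score surface that peaks at alpha=0.3, l1_ratio=0.7.
    return -((params["alpha"] - 0.3) ** 2 + (params["l1_ratio"] - 0.7) ** 2)


best, score = random_search(toy_objective, {"alpha": (0.0, 1.0), "l1_ratio": (0.0, 1.0)})
print(best, score)
```

Random search is the simplest baseline; dedicated tuners typically add smarter sampling and early stopping of unpromising trials.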
Updated: 03/26/2023
Enhance performance of scikit-learn models using the Intel(R) oneAPI Data Analytics Library. Train a k-means model using both sklearn and the accelerated Intel library and compare performance.
This notebook was developed on the conda pack with slug: sklearnex202130_p37_cpu_v1
intel
intel extension
scikit-learn
scikit learn
Universal Permissive License v 1.0
Updated: 03/27/2023
Connect to Oracle Big Data Service (BDS) using Kerberos.
This notebook was developed on the conda pack with slug: pyspark30_p37_cpu_v5
kerberos
big data service
bds
Universal Permissive License v 1.0
Updated: 03/26/2023
Use the ADS SDK to process and manipulate strings. This notebook covers regular expression matching and natural language processing (NLP), including part-of-speech tagging, named entity recognition, and sentiment analysis. It also shows how to create and use custom plugins for your specific needs.
This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2
language services
string manipulation
regex
regular expression
natural language processing
NLP
part-of-speech tagging
named entity recognition
sentiment analysis
custom plugins
Universal Permissive License v 1.0
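Regular expression matching is the first technique this notebook covers. The sketch below shows the underlying idea with Python's standard `re` module rather than the ADS string-processing API; the two patterns are illustrative assumptions and are deliberately simpler than a production email or date parser.

```python
import re

# Illustrative patterns only; robust email/date parsing needs more rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_emails(text: str) -> list:
    """Return every email-like substring in order of appearance."""
    return EMAIL.findall(text)


def extract_dates(text: str) -> list:
    """Return (year, month, day) integer tuples for ISO-8601 style dates."""
    return [tuple(int(part) for part in groups) for groups in ISO_DATE.findall(text)]


sample = "Contact ana.lee@example.com or ops@data.example.org before 2023-05-29."
print(extract_emails(sample))  # ['ana.lee@example.com', 'ops@data.example.org']
print(extract_dates(sample))   # [(2023, 5, 29)]
```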
Updated: 05/29/2023
Use Oracle AutoMLx to build a forecast model with real-world data sets.
This notebook was developed on the conda pack with slug: automlx_p38_cpu_v3
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a LightGBM model.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
lightgbm
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/26/2023
Load data from sources including ADW, Object Storage, and Hive in formats such as Parquet and CSV.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
loading data
autonomous database
adw
hive
pandas
dask
object storage
Universal Permissive License v 1.0
Updated: 03/26/2023
A model version set is a way to track the relationships between models. As a container, the model version set takes a collection of models. Those models are assigned a sequential version number based on the order they are entered into the model version set.
This notebook was developed on the conda pack with slug: dbexp_p38_cpu_v1
model
model experiments
model version set
Universal Permissive License v 1.0
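The versioning rule described above (models receive sequential version numbers in the order they are added) can be illustrated with a small container. `VersionSetSketch` is a hypothetical class, not the ADS `ModelVersionSet` API, and numbering from 1 is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionSetSketch:
    """Hypothetical stand-in for a model version set (not the ADS API)."""
    name: str
    _models: List[str] = field(default_factory=list)

    def add(self, model_id: str) -> int:
        """Append a model and return its sequential version number (assumed 1-based)."""
        if model_id in self._models:
            raise ValueError(f"{model_id} is already in {self.name}")
        self._models.append(model_id)
        return len(self._models)

    def versions(self) -> Dict[int, str]:
        """Map version number to model ID, in insertion order."""
        return {i + 1: model_id for i, model_id in enumerate(self._models)}

    def latest(self) -> str:
        """Return the most recently added model."""
        return self._models[-1]


mvs = VersionSetSketch("churn-experiments")
mvs.add("model-a")
mvs.add("model-b")
print(mvs.versions())  # {1: 'model-a', 2: 'model-b'}
```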
Updated: 03/30/2023
Train and evaluate different types of models: binary classification using an imbalanced dataset, multi-class classification using a synthetically generated dataset consisting of three equally distributed classes, and a regression using a synthetically generated dataset with positive targets.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
model evaluation
binary classification
regression
multi-class classification
imbalanced dataset
synthetic dataset
Universal Permissive License v 1.0
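The imbalanced binary case is where the choice of metric matters most. The standard-library sketch below computes accuracy, precision, recall, and F1 from labels and shows why accuracy alone is misleading; it illustrates the metrics themselves and is not the ADSEvaluator API.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# 95 negatives and 5 positives: a model that always predicts 0 still scores 95% accuracy.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100
print(binary_metrics(y_true, always_negative))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```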
Updated: 03/30/2023
Explain the predictions of an NLP classifier using the local interpretable model-agnostic explanations (LIME) technique.
This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2
nlp
lime
model_explanation
text_classification
text_explanation
Universal Permissive License v 1.0
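LIME explains one prediction by perturbing the input (here, dropping words), querying the black-box model on each perturbation, weighting samples by their closeness to the original, and fitting a weighted linear model whose coefficients serve as per-word importances. The standard-library sketch below follows those steps in simplified form: it drops each word independently with probability 0.5 and uses a plain exponential kernel on cosine distance, which differs in detail from the `lime` package's sampling and kernel. `toy_sentiment` is a hypothetical black box used only for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def explain_text(text: str, predict: Callable[[str], float], num_samples: int = 500,
                 kernel_width: float = 0.25, ridge: float = 1e-3,
                 seed: int = 0) -> List[Tuple[str, float]]:
    """Simplified LIME: one local linear weight per word of `text`."""
    words = text.split()
    d = len(words)
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for i in range(num_samples):
        # Sample 0 is the original text; the others drop each word with p = 0.5.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept = sum(mask)
        perturbed = " ".join(w for w, keep in zip(words, mask) if keep)
        # Cosine distance between the binary mask and the all-ones original.
        distance = 1.0 - (kept / math.sqrt(kept * d) if kept else 0.0)
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in mask])  # intercept + word indicators
        targets.append(predict(perturbed))
    # Weighted ridge regression: (X^T W X + ridge * I) beta = X^T W y
    p = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights)) for i in range(p)]
    beta = _solve(xtwx, xtwy)
    return list(zip(words, beta[1:]))  # drop the intercept


def toy_sentiment(text: str) -> float:
    # Hypothetical black-box classifier score in [0, 1].
    tokens = text.split()
    return 0.5 + 0.4 * ("great" in tokens) - 0.3 * ("awful" in tokens)


print(explain_text("great plot awful acting", toy_sentiment))
```

Because the toy model is linear in the word indicators, the fitted weights land close to +0.4 for "great", -0.3 for "awful", and near 0 for the other words; with a real classifier the fit is only a local approximation.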
Updated: 03/30/2023
Load visual data, define regions, and visualize objects using metadata to connect structured images to language.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
object annotation
genome visualization
oracle open data
Universal Permissive License v 1.0 (https://oss.oracle.com/licenses/upl/)
Updated: 07/17/2023
Extract text from common formats (e.g. PDF and Word) into plain text. Customize this process for individual use cases.
This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2
onnx
deploy model
Universal Permissive License v 1.0
Updated: 03/26/2023
Create and use ML pipelines throughout the entire machine learning lifecycle.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
pipelines
pipeline step
jobs pipeline
Universal Permissive License v 1.0
Updated: 03/26/2023
Use Oracle's Graph Analytics libraries to demonstrate graph algorithms, graph machine learning models, and the Property Graph Query Language (PGQL).
This notebook was developed on the conda pack with slug: pypgx2310_p38_cpu_v1
pypgx
graph analytics
pgx
Universal Permissive License v 1.0
Updated: 03/26/2023
Run interactive Spark workloads on a long-lasting Oracle Cloud Infrastructure Data Flow Spark cluster through Apache Livy integration. Data Flow Spark Magic lets you work interactively with remote Spark clusters from Jupyter notebooks through Livy, a Spark REST server. It includes a set of magic commands for running Spark code interactively.
This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v2
pyspark
data flow
Universal Permissive License v 1.0
Updated: 03/26/2023
Use Spark NLP within a long-lasting Oracle Cloud Infrastructure Data Flow cluster.
This notebook was developed on the conda pack with slug: pyspark32_p38_cpu_v1
pyspark
data flow
Universal Permissive License v 1.0
Updated: 06/02/2023
Develop local PySpark applications and work with remote clusters using Data Flow.
This notebook was developed on the conda pack with slug: pyspark24_p37_cpu_v3
pyspark
data flow
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a PyTorch model.
This notebook was developed on the conda pack with slug: pytorch110_p38_cpu_v1
pytorch
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a scikit-learn model.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
scikit-learn
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/30/2023
Use SQL Magic commands to work with a database within a Jupyter notebook. This notebook shows how to use both line and cell magics.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
sql magic
autonomous database
Universal Permissive License v 1.0
Updated: 03/30/2023
Connect to the Oracle Cloud Infrastructure (OCI) Streaming service with Kafka.
This notebook was developed on the conda pack with slug: dataexpl_p37_cpu_v3
streaming
kafka
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy a TensorFlow model.
This notebook was developed on the conda pack with slug: tensorflow28_p38_cpu_v1
tensorflow
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/26/2023
Extract text from common formats (e.g. PDF and Word) into plain text. Customize this process for individual use cases.
This notebook was developed on the conda pack with slug: nlp_p37_cpu_v2
text extraction
nlp
Universal Permissive License v 1.0
Updated: 03/26/2023
Train, register, and deploy an XGBoost model.
This notebook was developed on the conda pack with slug: generalml_p38_cpu_v1
xgboost
deploy model
register model
train model
Universal Permissive License v 1.0
Updated: 03/30/2023
Compare training time between CPU-trained and GPU-trained XGBoost models.
This notebook was developed on the conda pack with slug: rapids2110_p37_gpu_v1
xgboost
rapids
gpu
machine learning
classification
Universal Permissive License v 1.0
Updated: 11/13/2023
Manage and utilize medical data efficiently using a feature store. This notebook demonstrates the storage, retrieval, and manipulation of Electronic Health Record (EHR) data within a feature store framework.
feature store
medical data
data management
License: Universal Permissive License v 1.0
Updated: 11/13/2023
Explore the storage and retrieval of Hugging Face embeddings within a feature store setup. This notebook provides insights into storing and utilizing pre-trained embeddings for various natural language processing tasks using the feature store infrastructure.
feature store
Hugging Face embeddings
storage
License: Universal Permissive License v 1.0
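Storing embeddings comes down to keeping fixed-length vectors keyed by an entity ID and retrieving neighbors by a similarity measure. The standard-library sketch below shows that pattern with cosine similarity; `EmbeddingTable` is a hypothetical in-memory stand-in, not the ADS feature store API, and the three-dimensional vectors are toy values rather than real model embeddings.

```python
import math
from typing import Dict, List, Tuple


class EmbeddingTable:
    """Hypothetical in-memory stand-in for a feature group that holds embeddings."""

    def __init__(self, dim: int):
        self.dim = dim
        self._rows: Dict[str, List[float]] = {}

    def upsert(self, key: str, vector: List[float]) -> None:
        """Insert or replace the vector for `key`, enforcing a fixed dimension."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(vector)}")
        self._rows[key] = list(vector)

    def most_similar(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the top-k keys ranked by cosine similarity to `query`."""
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(y * y for y in b))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

        scored = [(key, cosine(query, vec)) for key, vec in self._rows.items()]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]


table = EmbeddingTable(dim=3)
table.upsert("doc-a", [1.0, 0.0, 0.0])
table.upsert("doc-b", [0.0, 1.0, 0.0])
table.upsert("doc-c", [0.9, 0.1, 0.0])
print(table.most_similar([1.0, 0.0, 0.0], k=2))
```

Enforcing the dimension on write mirrors the idea that every row in an embedding feature should come from the same model.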
Updated: 11/13/2023
Learn how to store and leverage OpenAI embeddings effectively within a feature store environment. This notebook guides users through the process of managing and utilizing OpenAI-generated embeddings for diverse machine learning applications within a feature store framework.
feature store
OpenAI embeddings
storage
License: Universal Permissive License v 1.0
Updated: 11/13/2023
Generate synthetic medical data leveraging OpenAI tools within a feature store. This notebook illustrates the process of creating synthetic medical data for various research and analysis purposes using the capabilities of a feature store.
feature store
synthetic data generation
medical data
License: Universal Permissive License v 1.0
Updated: 11/13/2023
Explore techniques and methods for Personally Identifiable Information (PII) redaction and transformation within a feature store environment. This notebook demonstrates how to manage sensitive data securely by implementing PII masking and transformation techniques using a feature store.
feature store
PII redaction
data security
License: Universal Permissive License v 1.0
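Two common PII transformations are masking (replacing sensitive substrings with placeholders) and pseudonymization (replacing identifiers with stable tokens so records can still be joined). The standard-library sketch below shows both; it is not the feature store's transformation API, the regex patterns are simplified assumptions, and `SECRET_KEY` is a hypothetical placeholder that would come from a secrets manager in practice. A keyed HMAC is used instead of a bare hash so low-entropy IDs cannot be recovered by hashing guesses without the key.

```python
import hashlib
import hmac
import re

# Illustrative patterns; real PII detection needs broader, locale-aware rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical placeholder key


def mask_pii(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with fixed placeholders."""
    return US_SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed, deterministic token: equal inputs map to equal tokens, so joins still work."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "id_" + digest[:12]


print(mask_pii("Patient jo@example.com, SSN 123-45-6789"))  # Patient [EMAIL], SSN [SSN]
print(pseudonymize("MRN-001"))
```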
Updated: 11/13/2023
Understand and perform querying operations within a feature store setup. This notebook covers querying techniques, data retrieval, and manipulation strategies to efficiently access and utilize stored features in a feature store environment.
feature store
querying
data operations
License: Universal Permissive License v 1.0
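A typical feature query combines several feature tables on a shared entity key and filters on feature values. The sketch below uses the standard-library `sqlite3` module as a stand-in to show that join-and-filter pattern; it is not the feature store's query API, and the table names, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_vitals (patient_id INTEGER PRIMARY KEY, heart_rate REAL, bmi REAL);
CREATE TABLE patient_labs (patient_id INTEGER PRIMARY KEY, glucose REAL);
INSERT INTO patient_vitals VALUES (1, 72.0, 24.1), (2, 88.0, 31.5), (3, 65.0, 22.0);
INSERT INTO patient_labs VALUES (1, 95.0), (2, 140.0);
""")

# Join the two feature tables on the entity key, then filter on a feature value.
rows = conn.execute("""
    SELECT v.patient_id, v.bmi, l.glucose
    FROM patient_vitals AS v
    LEFT JOIN patient_labs AS l USING (patient_id)
    WHERE v.bmi > ?
    ORDER BY v.patient_id
""", (23.0,)).fetchall()
print(rows)  # [(1, 24.1, 95.0), (2, 31.5, 140.0)]
```

The LEFT JOIN keeps every entity from the first table even when the second table has no matching features, which then appear as NULL (None in Python).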
Updated: 11/13/2023
Get started quickly with a feature store setup using this introductory notebook. It provides step-by-step guidance and essential information for setting up and utilizing a feature store environment for efficient data management and analysis.
feature store
quickstart
data management
License: Universal Permissive License v 1.0
Updated: 11/13/2023
Learn about schema evolution and enforcement techniques within a feature store. This notebook explores methods to handle schema changes, enforce data integrity, and manage evolving data structures effectively in a feature store environment.
feature store
schema evolution
data integrity
License: Universal Permissive License v 1.0
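Schema enforcement rejects writes that do not match the declared schema, while schema evolution deliberately widens the schema when new columns arrive. The standard-library sketch below shows one common policy (additive evolution: add new columns, never drop or retype existing ones); it is an illustrative assumption, not the feature store's actual rules, and the column names are hypothetical.

```python
from typing import Any, Dict

Schema = Dict[str, type]


def enforce(record: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Schema enforcement: reject unknown columns and mistyped values; fill gaps with None."""
    unknown = set(record) - set(schema)
    if unknown:
        raise ValueError(f"columns not in schema: {sorted(unknown)}")
    for col, typ in schema.items():
        value = record.get(col)
        if value is not None and not isinstance(value, typ):
            raise TypeError(f"{col}: expected {typ.__name__}, got {type(value).__name__}")
    return {col: record.get(col) for col in schema}


def evolve(schema: Schema, record: Dict[str, Any]) -> Schema:
    """Additive schema evolution: append new columns, never drop or retype existing ones."""
    merged = dict(schema)
    for col, value in record.items():
        if col not in merged:
            if value is None:
                raise ValueError(f"cannot infer a type for new column {col!r} from None")
            merged[col] = type(value)
        elif value is not None and not isinstance(value, merged[col]):
            raise TypeError(f"incompatible type change for column {col!r}")
    return merged


schema = {"patient_id": int, "weight_kg": float}
new_record = {"patient_id": 2, "height_cm": 170.0}
schema = evolve(schema, new_record)  # adds height_cm: float
print(enforce(new_record, schema))   # {'patient_id': 2, 'weight_kg': None, 'height_cm': 170.0}
```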
Updated: 11/13/2023
Explore big data operations within a feature store using Spark magic commands. This notebook demonstrates how to leverage the power of Spark for efficient data handling and analysis in a feature store environment.
feature store
big data operations
Spark
License: Universal Permissive License v 1.0
Updated: 11/13/2023
Explore streaming operations within a feature store using Spark. This notebook demonstrates how to leverage the power of Spark Streaming for efficient data handling and analysis in a feature store environment.
feature store
big data operations
Spark
Spark Streaming
License: Universal Permissive License v 1.0