Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
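For a sense of that programming model, here is a minimal PySpark sketch. It assumes a local PySpark installation; the application name and the local[*] master are illustrative choices, not anything prescribed by the projects listed below.

```python
# Minimal PySpark sketch of Spark's programming model: a driver program
# describes transformations over a distributed dataset, and Spark runs
# them in parallel across the cluster's executors.
from operator import add

from pyspark.sql import SparkSession

# "local[*]" runs an in-process cluster using all available cores; on a
# real deployment this would point at YARN, Kubernetes, or a standalone master.
spark = SparkSession.builder.master("local[*]").appName("spark-sketch").getOrCreate()
sc = spark.sparkContext

# parallelize() partitions the data across executors; map() and reduce()
# run per partition in parallel, and lost partitions are recomputed from
# the RDD lineage if an executor fails (the implicit fault tolerance).
total = sc.parallelize(range(1, 1_000_001)).map(lambda x: x * x).reduce(add)
print(total)

spark.stop()
```

The repositories below build on this core in different directions: cluster operators, ML integrations, data versioning, and language bindings.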
Open source platform for the machine learning lifecycle
Simple and Distributed Machine Learning
lakeFS - Data version control for your data lake | Git for data
酷玩 Spark (CoolplaySpark): Spark source code analysis, Spark libraries, and more.
Interactive and Reactive Data Science using Scala and Spark.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Spark Docker image
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Feathr – A scalable, unified data and AI engineering platform for enterprise
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
A curated list of awesome Apache Spark packages and resources.
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark, and PySpark.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition].
PySpark + Scikit-learn = Sparkit-learn
(Deprecated) Scikit-learn integration package for Apache Spark
MapReduce, Spark, Java, and Scala for Data Algorithms Book
R interface for Apache Spark
Apache Spark was created by Matei Zaharia and released on May 26, 2014.