    Repositories list

    • Python
      Apache License 2.0
      Updated Sep 5, 2024
    • Patterns and concepts for building resilient data pipelines in Python and Scala
      Updated Aug 27, 2024
    • Repository for the PII Anonymizer code package and a sample FastAPI API that uses it to talk to an LLM
      Jupyter Notebook
      Updated Jun 21, 2024
    • nuxtjs-template

      Public template
      JavaScript
      Updated Jan 21, 2024
    • sso-sync tool to help with the SCIM setup for bigspark.
      Go
      Apache License 2.0
      Updated Oct 26, 2023
    • To test a Glue job
      Python
      Updated Aug 1, 2023
    • test_glue

      Public
      To test a Glue job
      Python
      Updated Jul 14, 2023
    • General Purpose repo for NW AI Hackathon 2023
      Jupyter Notebook
      Apache License 2.0
      Updated Apr 20, 2023
    • A StreamSets DC sample processor for validating records against a specified JSON schema
      Java
      Apache License 2.0
      Updated Apr 14, 2023
    • Shell
      Updated Dec 1, 2022
    • vcs_demoo

      Public
      Updated Sep 12, 2022
    • Mirror of Apache Livy (Incubating)
      Scala
      Apache License 2.0
      Updated Jul 22, 2022
    • Updated Jun 30, 2022
    • barcode-server

      Public archive
      Java
      Updated Apr 20, 2022
    • Shell
      MIT No Attribution
      Updated Mar 25, 2022
    • Data profiling example using Snowflake sample datasets and Scala
      Jupyter Notebook
      Apache License 2.0
      Updated Feb 15, 2022
    • This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
      Scala
      Apache License 2.0
      Updated Feb 1, 2022
    • Shell
      Updated Feb 1, 2022
    • JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
      Java
      Other
      Updated Dec 2, 2021
    • emr-uber-profiler-notebooks
      Updated Nov 19, 2021
    • Java
      Updated Sep 9, 2021
    • deequ

      Public
      Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
      Scala
      Apache License 2.0
      Updated Jul 8, 2021
    • tutorials

      Public
      StreamSets Tutorials
      Java
      Apache License 2.0
      Updated Apr 19, 2021
    • Java
      Updated Mar 16, 2021
    • JavaScript
      Other
      Updated Jan 10, 2021
    • JavaScript
      MIT License
      Updated Dec 30, 2020
    • kafkademo

      Public
      Java
      Updated Oct 13, 2020
    • Basic single broker Kafka cluster - docker compose using confluent image
      Updated Aug 3, 2020
    • Java
      Updated Apr 16, 2020
    • kafdrop

      Public
      Kafka Web UI
      Java
      Apache License 2.0
      Updated Apr 9, 2020