The Deephaven Core Roadmap 1H24

Pete Goddard edited this page Mar 11, 2024 · 5 revisions

Our roadmap for the first half of 2024 is centered around Deephaven’s core usage patterns as (1) a UI framework for live data (and batch data, too), (2) a versatile query engine for live workloads in Python and Java, and (3) a live data pipeline utility.

Annotations:

Mark Status
* work not yet started
🏃 work in progress
work completed
💪 stretch goal
💡 needs research
🟡 particularly important

Project organization

We intend to release new versions of the project at the end of each month. At any time, deliverables planned for the following two months can be found in the corresponding GitHub Milestones.

Themes

Work will fall into the following categories:

  • UI/UX framework capabilities. (To be delivered in this deephaven-core project, as well as in web-client-ui, plugins, ipywidgets, and other complementary projects.)
  • Live ingestion.
  • Data lake interoperability.
  • Engine capabilities and Python interoperability.
  • Client APIs and the “Barrage” wire protocol (as found in this repo and github/barrage).

UI/UX framework

  • 🟡 🏃 MVP version of deephaven.ui, a complete framework for live dashboards and widgets.

    • 🏃 Integration of the React Spectrum library for interactive, adaptive UI experiences.
    • 🏃 A rich interface for client-side callbacks and interactivity.
    • 🏃 Programmatic layouts via Python scripts.

  • 🟡 🏃 Easier deployment of Deephaven’s plug-in infrastructure.

    • 🏃 Deephaven’s integration with Plotly Express (“Deephaven Express”).
      • Integrate Deephaven’s smart downsampler into line plots (including real-time, ticking ones).
      • Comprehensive documentation of Deephaven Express, Plotly, Matplotlib, seaborn, and Java plotting.
  • 💪 An interactive SQL UX (similar to the Python and Java/Groovy exploratory and development IDEs provided today).

  • 💡 🟡 UI/UX for writing queries using natural language.

    • Slick integration of LLM-to-SQL utilities, leveraging Deephaven’s declarative Query Syntax Tree (QST) API and Apache Calcite.
    • GUI experiences for typing plain English and receiving tables, plots, widgets, and visualizations that update in real time.

Live ingestion

  • 🟡 Packaging of powerful, general-purpose abstractions for ingesting live data from a variety of sources.
  • 🏃 Live mapping of 1-to-N and streaming data with nested formats into live Deephaven tables.
  • 🏃 Improved Kafka support: broader payload coverage and better performance.
  • 🏃 Improved JSON capabilities:
    • Support _metadata and _common_metadata files in Parquet (this supports adding partitions to Parquet).
    • 💪 💡 Integrate with Iceberg’s dynamic capabilities.

Data lake interoperability

  • 🟡 🏃 Efficient reading of Parquet files from S3.
    • 🟡 🏃 Continued Parquet feature coverage; predicate pushdown.
    • 🟡 💡 Iceberg integration.
      • Metadata management.
      • Data cataloging and discovery.

Engine capabilities & Python interoperability

  • 🟡 🏃 Multi-key indexes for batch and live data.

  • 🏃 A multi-dimensional integration with NumPy.

  • Data exhaust utilities:

    • More elegant publishing of cell-, array-, and chunk-data to client applications.
    • General-purpose abstraction for streaming egress.
  • 🟡 Formula decomposition and improved parsing to accelerate the processing of UDFs.

  • 🟡 💡 Battle-hardening of the deephaven.learn library to support bidirectional movement of live (real-time, updating) arrays into and out of NumPy, PyTorch, TensorFlow, and RAG-related libraries.

  • Increased concurrency for more sophisticated uses of Deephaven’s select() and where() operations.

Deephaven's wire format & client APIs

  • The JavaScript API:

    • Refactor the JS API’s Barrage subscription.
    • Better documentation of the JS API.
    • More examples of JS applications.
  • Greater coverage of operating systems for the C++ API.

Leading possibilities for the remainder of 2024

  • Support for custom data types.
  • 💡 Direct-from-CDC connector (without a Debezium translation).
  • Support for multi-dimensional arrays for Python data science and Gen-AI-integrated use cases.
  • 💡 Integration with RAG frameworks to deliver Deephaven’s live arrays into best-of-breed, enterprise-grade “LLM-meets-proprietary-data-sets” solutions.

The Deephaven team is excited by the strategic and tactical initiatives underway. Please help us imagine, design, and prioritize work by tracking our GitHub milestones, filing issues, and communicating with us on Slack.

Deephaven’s live dataframes offer unique, broad, and exciting capabilities at the intersection of modern data-driven workloads, analytics, and applications – particularly those driven by live, real-time data.