Project: improved viewer support for table-style data #8692

Open
2 tasks
jleibs opened this issue Jan 14, 2025 · 1 comment
Labels
project Tracking issues for so-called "Projects"

Comments

@jleibs
Member

jleibs commented Jan 14, 2025

Under development by @jleibs

Summary description

Make the viewer work better for table-style data, such as data sourced from the recording catalog.

Background and problem

The viewer currently makes fairly deep assumptions that the user's primary workflow is understanding how one or more entities evolve over time.

  • By default, views show all of the entities, but only data for a single time-point for each (unless using range queries). This is efficient when the number of entities is small, but not when it is large (see the assorted many-entities performance issues).
  • Additionally, the hover/selection model is entity-centric. You can select and inspect an entity. There is no equivalent selection for "index" -- there is always a single time-point implicitly selected by the position of the time cursor.

These assumptions flow all the way into the storage layer -- each entity gets its own parallel data-structure, and our IPC format requires a separate chunk per-entity. Each entity logically exists as one or more columns.

However, this model is at odds with the way users think about some kinds of tabular data, in particular data where we think of the row as "entity-like". For example, in a catalog view each row corresponds to a "recording".

Currently we have two choices for how we think about mapping a table of recordings into the viewer:

  1. We can give each recording its own entity-path
  • This totally falls over after a few hundred recordings due to the many-entities problem
  • The dataframe view is effectively useless -- we just end up with a super-wide table and no way to correlate data between recordings
  • However, other views conceptually work, and you can select a recording and view its user-component-metadata
  2. We can use an index such as registration time or even just row-id.
  • This has the benefit of working reasonably for the dataframe view
  • However, the timeline view isn't particularly useful
  • There's also no way to "select" a recording
  • The tree shows a hierarchy of "recording properties" rather than a hierarchical organization of recordings
  • Non-table views generally need to be configured to show VisibleTimeRange to make sense or else they only show data associated with a single recording.
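
To make the trade-off between the two mappings concrete, here is a minimal sketch in plain Python. It uses no Rerun APIs; the catalog rows, the `/recordings/...` entity paths, the column names, and both helper functions are hypothetical and only illustrate the shape of the data. Mapping 1 produces one column per (recording, component) pair, so the table gets wider with every recording. Mapping 2 keeps a fixed set of columns and grows in rows instead.

```python
# Hypothetical catalog rows; field names are illustrative only.
catalog = [
    {"recording_id": "rec_a", "lat": 59.33, "lon": 18.07},
    {"recording_id": "rec_b", "lat": 52.52, "lon": 13.40},
    {"recording_id": "rec_c", "lat": 37.77, "lon": -122.42},
]


def map_to_entities(rows):
    """Mapping 1: one entity path per recording, one column per (entity, component)."""
    columns = {}
    for row in rows:
        entity_path = f"/recordings/{row['recording_id']}"
        for component, value in row.items():
            if component != "recording_id":
                columns[f"{entity_path}:{component}"] = value
    return columns


def map_to_index(rows):
    """Mapping 2: a single shared schema, with the row number used as the index."""
    return [{"row_id": i, **row} for i, row in enumerate(rows)]


wide = map_to_entities(catalog)
long = map_to_index(catalog)

print(len(wide))                # 6 columns: 3 recordings x 2 components
print(len(long), len(long[0]))  # 3 rows, 4 fields per row
```

With a few hundred recordings, the first mapping yields hundreds of column groups, which is the super-wide table described above; the second stays at four columns, but nothing in it identifies a recording as a selectable thing.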

Grounding use-cases

Assume we have added Lat/Lon metadata for every recording in our catalog.

Send the catalog of recordings to the viewer:

  • Initially view it as a table
  • Switch to a map view and see all the recordings as points on the map
  • Be able to "select" a single recording

How to describe, demo and evaluate

TODO

Designs and plans

Two possible initial ideas

Option 1

Expand the types of indexes we support and remove special-casing about temporal/sequential indexing:

  • add an "id" index type
  • if the index is of "id" type, we could default views to show the full index range (we would need to rename "visible time range")
  • make it possible to hover/select an "id" (index-value)
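
As a rough illustration of Option 1, the sketch below models the index kind as an enum and lets the default query range depend on it. This is plain Python, not viewer code: `IndexKind`, `IndexRange`, `default_visible_range`, and `select_by_id` are hypothetical names invented for this example, and the real design may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class IndexKind(Enum):
    TEMPORAL = "temporal"
    SEQUENCE = "sequence"
    ID = "id"  # hypothetical new kind proposed in Option 1


@dataclass(frozen=True)
class IndexRange:
    start: int
    end: int  # inclusive


def default_visible_range(kind, full_range, cursor):
    """Pick the index range a view queries when the user has configured nothing."""
    if kind is IndexKind.ID:
        # An "id" index has no notion of "now", so show every row by default.
        return full_range
    # Temporal and sequence indexes keep the current behavior: a single point at the cursor.
    return IndexRange(cursor, cursor)


def select_by_id(rows, index_value):
    """Return the row whose "id" equals the selected index value, or None if absent."""
    return next((row for row in rows if row["id"] == index_value), None)


full = IndexRange(0, 99)
print(default_visible_range(IndexKind.ID, full, cursor=42))        # the whole range
print(default_visible_range(IndexKind.TEMPORAL, full, cursor=42))  # only the cursor position

rows = [{"id": 7, "name": "rec_a"}, {"id": 8, "name": "rec_b"}]
print(select_by_id(rows, 8))
```

The point of the sketch is that the "show everything" default and the selectable index value both fall out of the index kind, rather than requiring per-view configuration.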

Option 2

More optimizations around a very specific many-entity workflow:

  • Introduce a multi-entity chunk where we store entity as a column. (This could be a configuration option chosen at the top-level of a recording).
  • This would maybe necessitate an entire alternative store that organizes and indexes things differently... not sure if it's possible to ultimately map this efficiently to the queries the views do, or if this would also require more uniform-entity optimizations across views.
  • Make the dataframe-view detect this and do the right thing, placing entity values in their own column
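
A minimal sketch of what "entity as a column" could mean for Option 2, again in plain Python with no Rerun APIs. The input layout (`{entity_path: {component: [values]}}`), the `entity` column name, and `to_multi_entity_chunk` are assumptions made for illustration; they are not the actual chunk or IPC format.

```python
def to_multi_entity_chunk(per_entity_chunks):
    """Merge per-entity columnar chunks into one chunk with an explicit "entity" column.

    Input:  {entity_path: {component: [values, ...]}}
    Output: {"entity": [...], component_a: [...], component_b: [...], ...}
    Components missing for an entity are filled with None.
    """
    components = sorted({c for chunk in per_entity_chunks.values() for c in chunk})
    merged = {"entity": [], **{c: [] for c in components}}
    for entity_path, chunk in per_entity_chunks.items():
        n_rows = max((len(values) for values in chunk.values()), default=0)
        for i in range(n_rows):
            merged["entity"].append(entity_path)
            for c in components:
                values = chunk.get(c, [])
                merged[c].append(values[i] if i < len(values) else None)
    return merged


per_entity = {
    "/recordings/rec_a": {"lat": [59.33], "lon": [18.07]},
    "/recordings/rec_b": {"lat": [52.52], "lon": [13.40]},
}
merged = to_multi_entity_chunk(per_entity)
print(merged["entity"])  # ['/recordings/rec_a', '/recordings/rec_b']
print(merged["lat"])     # [59.33, 52.52]
```

In this layout a dataframe view could place the entity values in their own column directly, instead of producing one column group per entity.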

Tasks

  • Simple inline tasks
  • Links to sub-issues

Non-goals and won't do

@jleibs jleibs added the project Tracking issues for so-called "Projects" label Jan 14, 2025
@nikolausWest
Member

The idea of introducing a non-time index makes a lot of sense to me. That's also something we'd want in the viewer, for instance to be able to represent datasets of image samples better.
