Make the viewer work more nicely for table-style data such as data sourced from the recording catalog
Background and problem
The viewer currently makes fairly deep assumptions that a user's primary workflow is to understand how one or more entities evolve over time.
By default, views will show all of the entities, but only data for a single time-point for each (unless using range queries). This is efficient when the number of entities is small (see the assorted many-entities performance issues).
Additionally, the hover/selection model is entity-centric. You can select and inspect an entity. There is no equivalent selection for "index" -- there is always a single time-point implicitly selected by the position of the time cursor.
These assumptions flow all the way into the storage layer -- each entity gets its own parallel data-structure, and our IPC format requires a separate chunk per-entity. Each entity logically exists as one or more columns.
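For intuition, here is a rough sketch of that per-entity layout in Arrow terms (the chunk store and IPC format are Arrow-based, but the column and component names below are illustrative, not the actual chunk schema):

```python
import pyarrow as pa

# Sketch: today every entity gets its own chunk, so the entity path is implicit
# in which chunk you hold, and each component is a column alongside the index.
chunk_for_one_entity = pa.table({
    "log_time": [1_718_000_000, 1_718_000_040],        # index column
    "Position3D": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.2]],  # component column
    "Color": [0xFF0000FF, 0x00FF00FF],                  # component column
})

# A second entity means a second, parallel chunk with the same kind of schema.
```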
However, this model is at odds with the way users think about some kinds of tabular data, in particular data where we think of each row as "entity-like". For example, in a catalog view each row corresponds to a "recording".
Currently we have two choices for how we think about mapping a table of recordings into the viewer (a rough SDK sketch follows this list):
We can give each recording its own entity-path
This totally falls over after a few hundred recordings due to the many-entities problem
The dataframe view is effectively useless -- we just end up with a super-wide table and no way to correlate data between recordings
But other views conceptually work, and you can select a recording and view its user component metadata
We can use an index such as registration time, or even just row-id.
This has the benefit of working reasonably for the dataframe view
However, the timeline view isn't particularly useful
There's also no way to "select" a recording
The tree shows a hierarchy of "recording properties" rather than a hierarchical organization of recordings
Non-table views generally need a VisibleTimeRange configured to make sense, or else they only show data associated with a single recording.
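To make the trade-off concrete, here is a rough sketch of the two mappings from the Python SDK's point of view (API calls are from memory of the public SDK and may differ slightly between versions; the catalog fields are made up):

```python
import rerun as rr

rr.init("catalog_demo", spawn=True)

catalog = [
    {"id": "rec_001", "lat": 59.33, "lon": 18.07, "duration_s": 12.4},
    {"id": "rec_002", "lat": 48.86, "lon": 2.35, "duration_s": 7.9},
]

# Choice 1: one entity path per recording. Recordings are selectable, but this
# falls over with many recordings and yields an unusably wide dataframe view.
for rec in catalog:
    rr.log(f"catalog/{rec['id']}", rr.AnyValues(duration_s=rec["duration_s"]))

# Choice 2: a single entity, with the row as the index (registration time, or
# just a row counter). The dataframe view works, but rows are not selectable.
for row, rec in enumerate(catalog):
    rr.set_time_sequence("row", row)
    rr.log("catalog", rr.AnyValues(duration_s=rec["duration_s"]))
```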
Grounding use-cases
Assume we have added Lat/Lon metadata for every recording in our catalog.
Send the catalog of recordings to the viewer:
Initially view it as a table
Switch to a map view and see all the recordings as points on the map
Be able to "select" a single recording
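A sketch of what the desired flow could look like from the SDK side, assuming the GeoPoints archetype available in recent SDK versions and the single-entity-plus-row-index mapping from above (entity path and field names are illustrative):

```python
import rerun as rr

rr.init("catalog_map", spawn=True)

catalog = [
    {"id": "rec_001", "lat": 59.33, "lon": 18.07},
    {"id": "rec_002", "lat": 48.86, "lon": 2.35},
]

for row, rec in enumerate(catalog):
    rr.set_time_sequence("row", row)
    # One geo point per recording. The dataframe view covers use-case 1 (a table),
    # a map view showing all rows at once covers use-case 2, and use-case 3 needs
    # a way to select a row/recording -- which is exactly what's missing today.
    rr.log("catalog", rr.GeoPoints(lat_lon=[[rec["lat"], rec["lon"]]]))
```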
How to describe, demo and evaluate
TODO
Designs and plans
Two possible initial ideas
Option 1
Expand the types of indexes we support and remove the special-casing around temporal/sequential indexing (a hypothetical sketch follows this list):
add an "id" index type
if the index is "id" type, we could default views to show the full index range (need to rename visible time range)
make it possible to hover/select an "id" (index-value)
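A purely hypothetical sketch of what an "id" index and index-value selection might look like; none of these types exist in the viewer today and all names are invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto

class IndexKind(Enum):
    SEQUENCE = auto()   # existing: monotonically increasing frame/tick counters
    TEMPORAL = auto()   # existing: timestamps
    ID = auto()         # proposed: opaque identifiers, e.g. one per catalog row

@dataclass(frozen=True)
class IndexSelection:
    """A selection target keyed by index value instead of by entity path."""
    index_name: str     # e.g. "row_id" or "registration_time"
    value: int          # the selected index value, e.g. a catalog row

def default_visible_range(kind: IndexKind) -> str:
    # For "id" indexes there is no meaningful "current time", so views would
    # default to the full index range rather than a single cursor position
    # (hence the need to rename "visible time range").
    return "full-range" if kind is IndexKind.ID else "latest-at-cursor"
```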
Option 2
More optimizations around a very specific many-entity workflow (an Arrow-layout sketch follows this list):
Introduce a multi-entity chunk where we store the entity as a column. (This could be a configuration option chosen at the top level of a recording.)
This would maybe necessitate an entire alternative store that organizes and indexes things differently... not sure if it's possible to ultimately map this to the queries the views do efficiently, or if this would also require more uniform-entity optimizations across views.
Make the dataframe-view detect this and do the right thing, placing entity values in their own column
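To illustrate the storage-side difference, here is a hand-written Arrow sketch contrasting today's per-entity chunks with a multi-entity chunk that stores the entity as a column (schemas and names are illustrative only):

```python
import pyarrow as pa

# Today: one chunk per recording-entity; the recording is implicit in the chunk.
chunk_rec_001 = pa.table({
    "row_id": [0],
    "LatLon": [[59.33, 18.07]],
    "RegistrationTime": [1_718_000_000],
})

# Proposed: one multi-entity chunk where the entity (recording) is itself a
# column. A dataframe view could then surface it directly as its own column,
# and a catalog of N recordings stays N rows rather than N parallel chunks.
multi_entity_chunk = pa.table({
    "entity_path": ["/catalog/rec_001", "/catalog/rec_002", "/catalog/rec_003"],
    "LatLon": [[59.33, 18.07], [48.86, 2.35], [40.71, -74.01]],
    "RegistrationTime": [1_718_000_000, 1_718_000_100, 1_718_000_200],
})
```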
Tasks
Simple inline tasks
Links to sub-issues
Non-goals and won't do
The idea of introducing a non-time index makes a lot of sense to me. That's also something we'd want in the viewer, for instance to be able to better represent datasets of image samples.