Develop CSV importer #4
Comments
Mind sharing those use cases and how a CSV file would map to the structure of an index?
The mapping for relational data is outlined in our docs at https://www.pilosa.com/docs/latest/data-model/#relational-analogy, and we have a few use case writeups at https://www.pilosa.com/use-cases/. I believe the two referenced in this ticket are transportation and network traffic. Note that these pages are overdue for some updates; you can see up-to-date PDK use case code in the repo: https://github.com/pilosa/pdk/tree/master/usecase.
Thanks. I found the table in the Python notebook you put together helpful, as well as the suggestion for binning strategies. Is the general recommendation that row IDs be contiguous, to optimize the bitmap compression (via Roaring)? And is this handled when a field is created with key support?
@bruth it isn't as crucial that row IDs be contiguous, but column IDs should be as close to contiguous as possible. This is handled if you use keys.
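To illustrate what "handled if you use keys" could look like in principle, here is a minimal sketch of a key-to-ID translator that hands out contiguous integer IDs in first-seen order. This is not Pilosa's actual key translation code; the `KeyTranslator` class and the sample keys are hypothetical and only show the idea.

```python
class KeyTranslator:
    """Hypothetical helper: maps string keys to contiguous integer IDs."""

    def __init__(self):
        self._ids = {}

    def translate(self, key):
        # Reuse the existing ID, or hand out the next unused integer.
        # len(self._ids) is evaluated before any insertion happens.
        return self._ids.setdefault(key, len(self._ids))


columns = KeyTranslator()
# Sparse, arbitrary keys end up on a dense 0..n-1 range.
ids = [columns.translate(k) for k in ["user-9001", "user-17", "user-9001", "user-42"]]
print(ids)
```

Because IDs are assigned in arrival order, the column space stays dense regardless of how sparse or arbitrary the original keys are.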
For the use case work, we put together a CSV import system that is specific to the two use cases, but lays some groundwork for working with more general data sources. The scope is limited to well-formatted, well-defined tabular data, so users will be responsible for providing clean data.
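As a rough sketch of how well-formed tabular data might map onto an index under the relational analogy (each CSV record becomes a column, each attribute value becomes a row within a field), the snippet below turns a small CSV into `(field, row, column)` tuples. The column names, the sample data, and the 5-mile bin width are assumptions made for illustration; this is not the PDK importer and it does not call any Pilosa API.

```python
import csv
import io

# Hypothetical sample data; a real import would read from a file.
SAMPLE = """id,cab_type,distance_miles
0,yellow,1.2
1,green,7.9
2,yellow,0.4
"""


def bin_distance(miles, width=5.0):
    """Bin a continuous value into an integer row ID (fixed-width bins)."""
    return int(float(miles) // width)


def csv_to_bits(text):
    """Yield one (field, row, column) tuple per attribute of each record."""
    bits = []
    for record in csv.DictReader(io.StringIO(text)):
        column = int(record["id"])  # one column per CSV record
        bits.append(("cab_type", record["cab_type"], column))
        bits.append(("distance_bin", bin_distance(record["distance_miles"]), column))
    return bits


bits = csv_to_bits(SAMPLE)
for bit in bits:
    print(bit)
```

The sketch assumes clean input, in line with the stated scope: a missing or non-numeric value would raise an exception rather than be repaired.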