Motivation
Right now we're using NumPy saved files that store structured arrays for the results, but this might change in the future (see #14), especially to accommodate some visualization utilities that would benefit from having all the results for an image in one container that is indexed intelligently.
Another annoyance is that each CLI utility uses duplicated code to open, inspect, read, and write the result files. Ideally this should be refactored into a common set of functions.
Proposal
Implement a "drivers" for each format (so just NumPy for now) that contains the logic for inspecting/reading/writing to/etc. each format. Eventually this will necessitate updating the configuration files to specify what result storage driver should be used.
The currently implemented iter_records, for example, would still iterate over result records, but would do so in a way that makes sense for the format. For the current NumPy saved files, we'd yield one row's worth of records at a time. If we used something that stores results in blocks, maybe it would yield chunks of data regardless of the row:
driver = drivers.register(result_format)
for rec in driver.iter_records(config):
    # do stuff
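For the NumPy saved files, iter_records could be as simple as loading each per-row result file and yielding its structured array. The glob pattern, the 'record' key, and the 'output_dir' config entry below are assumptions made only to sketch the idea:

import os
from glob import glob

import numpy as np


class NumPyDriver(object):
    """Hypothetical driver for the current NumPy saved-file format."""

    def iter_records(self, config):
        # Assumes one result file per row, saved with np.savez and holding
        # its structured array under a 'record' key.
        pattern = os.path.join(config['output_dir'], '*.npz')
        for filename in sorted(glob(pattern)):
            yield np.load(filename)['record']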
We usually want to perform a query on the records based on the segment dates, so there could be some higher level API access that would perform a query optimized for the format (NumPy files would just use a simple np.where against them, but we could use in-kernel searches if using pytables):
driver = drivers.register(result_format)
for matching_rec in driver.query_records(config, start='2000-01-01', end='2001-01-01'):
    # do more stuff
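For the NumPy format that query could simply layer np.where over iter_records. Shown here as a standalone helper for brevity; in the proposal it would be a method on the NumPy driver, and the 'start'/'end' field names and ordinal-date storage are assumptions:

from datetime import datetime as dt

import numpy as np


def query_records(driver, config, start=None, end=None):
    """Yield only the records whose segment dates fall within [start, end]."""
    start = dt.strptime(start, '%Y-%m-%d').toordinal() if start else -np.inf
    end = dt.strptime(end, '%Y-%m-%d').toordinal() if end else np.inf

    for rec in driver.iter_records(config):
        idx = np.where((rec['start'] >= start) & (rec['end'] <= end))[0]
        if idx.size:
            yield rec[idx]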
Justification
If we refactor out all of the result IO from the CLI scripts, we'll make testing much easier and probably reduce the overall amount of code. Refactoring out just the NumPy format probably won't take too much time and would set us up to easily transition to a better file format.