RADPS (NRAO) Requirements and Feature Requests #64
Here is an IPython notebook that compares the performance of casa-formats-io and python-casacore:
IPython notebook: casa_formats_io_vs_python_casacore.ipynb
Dataset (3.36 GB): VLASS3.2.sb45755730.eb46170641.60480.16266136574_spw10_split.ms.zip
On my M3 Mac, casa-formats-io takes approximately 11 seconds and python-casacore approximately 3 seconds to read all of the main table data. Initial tests show that the time taken to read the data and reshape it (using np.fromfile and casa_formats_io._casa_chunking._combine_chunks) is comparable between the two libraries, so the performance difference is likely related to how the data are organized.
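The read-and-reshape step mentioned above can be timed in isolation with a small harness like the one below. This is a sketch under stated assumptions, not the notebook's code: the column is assumed to be a flat, headerless binary file of `complex64` values with a hypothetical (nrow, nchan, npol) shape, which is not the actual CASA table storage layout, and `time_call` and `read_and_reshape` are hypothetical helpers that are not part of casa-formats-io or python-casacore.

```python
import os
import tempfile
import time

import numpy as np


def time_call(func, repeat=3):
    """Return the best wall-clock time (seconds) over `repeat` calls and the last result."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result


def read_and_reshape(path, shape, dtype=np.complex64):
    """Read a flat binary file (hypothetical layout) and reshape it to `shape`."""
    data = np.fromfile(path, dtype=dtype)
    return data.reshape(shape)


# Demo on synthetic data: hypothetical (nrow, nchan, npol) column shape.
shape = (1000, 64, 4)
expected = (np.arange(np.prod(shape)) % 7).astype(np.complex64).reshape(shape)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.bin")
    expected.tofile(path)
    elapsed, column = time_call(lambda: read_and_reshape(path, shape))
print(f"read + reshape: {elapsed:.4f} s, shape={column.shape}")
```

Reporting the best of several repeats reduces noise from the operating-system file cache; for a real comparison, the two libraries' read calls would be passed to `time_call` in place of `read_and_reshape`.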
Thanks, this is very useful! I haven't really done any performance optimisation in casa-formats-io at this point, so I am sure there is a lot of low-hanging fruit. I will have a think about the requirements and will follow up soon.
@astrofrog, any update on your thoughts about the requirements?
Sorry for not getting back to you sooner; I was off work for a significant fraction of the summer. I have had a chance to think about the requirements you mention and have a few follow-up questions/comments. First, do you need to be able to access just part of the data, or would you always load an entire column into memory? Second, you mention 'Single-threaded (no Dask)'. Note that it is possible to use dask in single-threaded mode, so just to make sure we are on the same page: do you object in general to making use of the dask API (specifically the fact that the astropy table we currently return has dask arrays that require The high-level API I was striving for here aims to completely hide the details of a table from the user, and they would inspect the table using e.g. If the use of the dask API is a deal breaker, maybe we could agree on a public lower-level API that both you and the dask interface could use.
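To make the partial-access question and the "public lower-level API" idea concrete, here is a minimal NumPy-only sketch of a reader that returns either a whole column or a contiguous row range as a plain array. It assumes a hypothetical layout (a headerless file of fixed-shape rows stored contiguously), which is not the CASA table format, and `read_rows` is a hypothetical name, not an existing casa-formats-io function.

```python
import os
import tempfile

import numpy as np


def read_rows(path, row_shape, dtype, start=0, stop=None):
    """Read rows [start, stop) of a flat, row-major binary column file.

    Hypothetical layout: every row has shape `row_shape`, rows are stored
    contiguously, and the file has no header.
    """
    dtype = np.dtype(dtype)
    row_items = int(np.prod(row_shape))
    row_bytes = row_items * dtype.itemsize
    nrows = os.path.getsize(path) // row_bytes
    start = min(start, nrows)
    stop = nrows if stop is None else min(stop, nrows)
    count = max(stop - start, 0) * row_items
    # Seek straight to the first requested row instead of reading the whole file.
    data = np.fromfile(path, dtype=dtype, count=count, offset=start * row_bytes)
    return data.reshape((-1,) + tuple(row_shape))


# Demo: 100 rows, each with a hypothetical (nchan, npol) = (64, 4) shape.
row_shape = (64, 4)
full = np.arange(100 * 64 * 4, dtype=np.float32).reshape((100,) + row_shape)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "column.bin")
    full.tofile(path)
    whole = read_rows(path, row_shape, np.float32)           # entire column
    subset = read_rows(path, row_shape, np.float32, 10, 20)  # rows 10-19 only
```

Because the function returns ordinary NumPy arrays, an eager caller could use it directly, while a dask-based interface could, for example, wrap one call per chunk of rows with `dask.delayed`; both would then share the same low-level reading code.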
@astrofrog, we now have some developer time available and plan to start looking into this. Have you had a chance to consider it further? |
@Jan-Willem, sorry for the delay; I'll try to reply tonight! I'm sure we can find a way forward that avoids duplicating efforts, and I'll write up some thoughts/suggestions this evening. In any case, one thing that definitely needs doing is documenting the format, so that would be worth starting if you have time available now.
As mentioned in #63, here is what the NRAO would be interested in working on (casa-formats-io might already have some of these features):
@astrofrog, @keflavich, @e-koch, please let us know if you think something like this is feasible. We would, of course, be happy to contribute developer effort to achieve this.