CASATable not working for dataset #46

Open
miguelcarcamov opened this issue May 6, 2022 · 5 comments

miguelcarcamov commented May 6, 2022

I'm working with this PDS70 dataset.
I am running these lines:

rslt = CASATable.read(ms_name)
tables = rslt.as_astropy_table(data_desc_id="all")

And I'm getting the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Input In [10], in <cell line: 2>()
      1 rslt = CASATable.read(ms_name)
----> 2 tables = rslt.as_astropy_table(data_desc_id="all")

File ~/Documents/casa-formats-io/casa_formats_io/casa_low_level_io/table.py:365, in CASATable.as_astropy_table(self, data_desc_id, include_columns)
    362         colindex_in_dm += 1
    364 if hasattr(dm, 'read_column'):
--> 365     coldata = dm.read_column(self._filename, seqnr, self.column_set.columns[colindex], coldesc[colindex], colindex_in_dm)
    366     if coldata is not None:
    367         table_columns[colname] = coldata

File ~/Documents/casa-formats-io/casa_formats_io/casa_low_level_io/data_managers/standard.py:201, in StandardStMan.read_column(self, filename, seqnr, column, coldesc, colindex_in_dm)
    199 for irow in range(rows_in_bucket[bucket_id]):
    200     offset = read_int64(f)
--> 201     fi.seek(offset)
    202     ndim = read_int32(fi)
    203     subshape = []

File ~/Documents/casa-formats-io/casa_formats_io/casa_low_level_io/core.py:47, in EndianAwareFileHandle.seek(self, n)
     46 def seek(self, n):
---> 47     return self.file_handle.seek(n)

OSError: [Errno 22] Invalid argument

keflavich commented May 13, 2022

I can reproduce this:

import casa_formats_io
from astropy.table import Table
ms_name = 'residuals.ms'
rslt = Table.read(ms_name)
tables = rslt.as_astropy_table(data_desc_id="all")

yields

Traceback (most recent call last):
  File "<ipython-input-5-e56083fa4cb2>", line 1, in <module>
    rslt = Table.read(ms_name)
  File "/home/adam/anaconda3/lib/python3.8/site-packages/astropy/table/connect.py", line 62, in __call__
    out = self.registry.read(cls, *args, **kwargs)
  File "/home/adam/anaconda3/lib/python3.8/site-packages/astropy/io/registry/core.py", line 199, in read
    data = reader(*args, **kwargs)
  File "/home/adam/repos/casa-formats-io/casa_formats_io/table_reader.py", line 18, in read_casa_table
    return table.as_astropy_table(data_desc_id=data_desc_id)
  File "/home/adam/repos/casa-formats-io/casa_formats_io/casa_low_level_io/table.py", line 365, in as_astropy_table
    coldata = dm.read_column(self._filename, seqnr, self.column_set.columns[colindex], coldesc[colindex], colindex_in_dm)
  File "/home/adam/repos/casa-formats-io/casa_formats_io/casa_low_level_io/data_managers/standard.py", line 201, in read_column
    fi.seek(offset)
  File "/home/adam/repos/casa-formats-io/casa_formats_io/casa_low_level_io/core.py", line 47, in seek
    return self.file_handle.seek(n)
OSError: [Errno 22] Invalid argument

n is -1:

ipdb> print(n)
-1
ipdb> self.file_handle.tell()
12

@astrofrog any chance you can help figure this out?
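
The `n is -1` observation above is enough to explain the errno by itself: asking the OS to seek a regular file to a negative absolute position is rejected as an invalid argument. Below is a minimal sketch of that behaviour, independent of casa-formats-io. The helper name `seek_error_for` and the 16-byte scratch file are made up for illustration and are not part of the library. It does not explain why the offset stored in the bucket is -1 for this MS.

```python
import errno
import os
import tempfile


def seek_error_for(offset):
    """Seek a small scratch file to `offset`; return the OSError raised, or None.

    Hypothetical helper for illustration only (not part of casa-formats-io).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "scratch.bin")
        # Write a few bytes so that small positive offsets are valid positions.
        with open(path, "wb") as fh:
            fh.write(b"\x00" * 16)
        with open(path, "rb") as fh:
            try:
                fh.seek(offset)  # absolute seek, same form as file_handle.seek(n)
            except OSError as exc:
                return exc
    return None


err = seek_error_for(-1)
print(err)
print(err is not None and err.errno == errno.EINVAL)
```

A non-negative offset such as `seek_error_for(12)` returns `None`, so the failure is tied to the negative value read from the file rather than to the file handle itself.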

@keflavich
Contributor

@miguelcarcamov can you say anything more about how this MS was made? I can't tell yet whether this is a bug on our side or a not-yet-supported file type, but it would be helpful to know where this comes from.


miguelcarcamov commented May 13, 2022

Yes, @keflavich. The measurement set contains the residuals from gpuvmem (image reconstruction software). At this point, the only things gpuvmem does are change the data column and add a model_data column. The original dataset comes from the ALMA 2019 Long Baseline data of PDS70 (paper here). I can confirm that the data can be read with CASA and also with dask-ms.

@miguelcarcamov
Author

Also note that in the POLARIZATION table there are two rows. One of the rows has only 1 correlation and is never used in the DATA_DESCRIPTION table.

@astrofrog
Member

I can try and look at this soon
