When calling the slide_etl CLI function as a Python function from src/luna/pathology/cli/slide_etl.py, I am unable to write the ETL dataframe to a parquet file. The data appears to be generated correctly, but the write step itself fails. As a workaround, I was able to write the dataframe to a .csv file instead.
Code snippet:
from luna.pathology.cli.slide_etl import cli as slide_etl
from dask.distributed import Client

ETL_OUTPUT_PATH = "./slide_etls/"


def create_slide_etl(path, project):
    with Client() as client:
        slide_etl(
            slide_urlpath=path,
            output_urlpath=ETL_OUTPUT_PATH,
            project_name=project,
            comment="Automated slide etl",
            no_copy=True,
        )


if __name__ == "__main__":
    create_slide_etl("/gpfs/mskmind_ess/pathology_images/spectrum_hnes", "SPECTRUM")
Error traceback:
2023-11-02 13:33:48.758 | INFO | luna.pathology.cli.slide_etl:cli:95 - id project_name comment slide_size uuid ... properties.aperio.Parmset properties.aperio.Filtered properties.aperio.Gamma properties.aperio.Rack properties.aperio.Slide
0 1054708 SPECTRUM Automated slide etl 150163869 1b8e0567-4fab-3297-b1c8-7ab9737b8448 ... NaN NaN NaN NaN NaN
1 1054710 SPECTRUM Automated slide etl 165100695 02d61566-1a26-307a-a079-ac96d7973afc ... NaN NaN NaN NaN NaN
2 1148882 SPECTRUM Automated slide etl 254362059 1493d7fe-d8ae-3ba9-8a89-7b1e9d082fec ... NaN NaN NaN NaN NaN
3 1448993 SPECTRUM Automated slide etl 203616883 010f15aa-a8e5-3e5b-907e-be5d3480aba3 ... NaN NaN NaN NaN NaN
4 1465759 SPECTRUM Automated slide etl 88301569 e25c7caf-9ced-3cae-9587-70cfc436b2bf ... NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ...
1297 936675 SPECTRUM Automated slide etl 501182037 53780d71-5a8e-3741-8d62-9491bb3ffc9f ... NaN NaN NaN NaN NaN
1298 955274 SPECTRUM Automated slide etl 566795321 ecb9deac-f076-37eb-b038-cdec4df4823c ... NaN NaN NaN NaN NaN
1299 955292 SPECTRUM Automated slide etl 486125355 8767d44e-cd48-3340-b7c4-f1e40c975e59 ... NaN NaN NaN NaN NaN
1300 957788 SPECTRUM Automated slide etl 479193387 6b871890-3a2d-3d79-afaa-02ce7423a2b9 ... NaN NaN NaN NaN NaN
1301 957796 SPECTRUM Automated slide etl 361443771 cbf3b996-69a7-3120-bbc2-6f42f4f027e5 ... NaN NaN NaN NaN NaN
[1302 rows x 76 columns]
2023-11-02 13:33:48.838 | INFO | luna.pathology.cli.slide_etl:cli:107 - Writing to parquet file
Traceback (most recent call last):
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/slide_inventory/create_slide_etl.py", line 17, in <module>
    create_slide_etl("/gpfs/mskmind_ess/pathology_images/spectrum_hnes", "SPECTRUM")
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/slide_inventory/create_slide_etl.py", line 8, in create_slide_etl
    slide_etl(
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/src/luna/common/utils.py", line 143, in wrapper
    result = func(*args, **kwargs)
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/src/luna/pathology/cli/slide_etl.py", line 108, in cli
    df.to_parquet(of)
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pandas/core/frame.py", line 2976, in to_parquet
    return to_parquet(
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pandas/io/parquet.py", line 430, in to_parquet
    impl.write(
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pandas/io/parquet.py", line 174, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
    arrays = [convert_column(c, f)
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
    arrays = [convert_column(c, f)
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/gpfs/mskmind_emc/data_user/shared_data_folder/moored2/luna/.venv/luna/lib/python3.9/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'int' object", 'Conversion failed for column properties.aperio.DSR ID with type object')
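The error message suggests that the object-dtype column properties.aperio.DSR ID holds a mix of strings and integers (presumably because the Aperio metadata stores the DSR ID as text for some slides and as a number for others), so pyarrow cannot map the column to a single Arrow type. That would also explain why the CSV workaround succeeds: CSV writes every value as text. Below is a minimal pandas-only sketch of one possible fix, assuming it is acceptable to store such columns as strings. The helper names find_mixed_type_columns and coerce_mixed_columns_to_str are hypothetical and not part of luna.

```python
import pandas as pd


def find_mixed_type_columns(df: pd.DataFrame) -> list:
    """Return object-dtype columns whose non-null values span more than one Python type."""
    mixed = []
    for col in df.columns:
        if df[col].dtype == object:
            # Collect the distinct Python types of the non-missing values.
            types = {type(v) for v in df[col].dropna()}
            if len(types) > 1:
                mixed.append(col)
    return mixed


def coerce_mixed_columns_to_str(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with mixed-type object columns cast to str.

    Missing values are kept as NaN (not the string 'nan'), so they still
    become nulls when the frame is later converted to Arrow/parquet.
    """
    out = df.copy()
    for col in find_mixed_type_columns(out):
        s = out[col]
        out[col] = s.astype(str).mask(s.isna())
    return out
```

If this approach holds up, the write at line 108 of slide_etl.py could hypothetically become coerce_mixed_columns_to_str(df).to_parquet(of). This has not been tested against luna itself; casting to str trades numeric typing for a loss-free, uniform column, which seems reasonable for free-form slide metadata such as IDs.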