Problem with reading tables from deltalake using AzureBlobFileSystem in adlfs #python #44

Open
shr-poojary opened this issue Sep 8, 2022 · 1 comment

@shr-poojary

Hi,
I am facing an issue while reading delta tables from Azure Delta Lake using AzureBlobFileSystem in adlfs. I am not sure whether it's an issue with syntax or with library versions. Kindly help!

Libraries:

  1. delta-lake-reader (Version: 0.2.13)
  2. adlfs (Version: 0.7.7)
  3. azure-identity (Version: 1.7.1)
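
Code:

from adlfs import AzureBlobFileSystem
from azure.identity import ClientSecretCredential
from deltalake import DeltaTable  # module installed by delta-lake-reader

# Service principal credentials (values are defined elsewhere)
token_credential = ClientSecretCredential(
    active_directory_tenant_id,
    active_directory_client_id,
    active_directory_client_secret,
)

fs = AzureBlobFileSystem(account_name=storage_account, credential=token_credential)

# tablefolder contains the _delta_log directory
dt = DeltaTable('containername/folder1/folder2/tablefolder', file_system=fs)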

Error:


TypeError                                 Traceback (most recent call last)
<ipython-input-24-35f4abc79a4b> in <module>
      7 pathtosa = deltaLakeFinalUri + conatainerPathToDeltaTable + deltaTableName
      8 print(pathtosa)
----> 9 dt = DeltaTable('dqdata/RAW/Revenue/actual_revenue_and_margin', file_system=fs)
     10 df=dt.to_pandas()

~\AppData\Roaming\Python\Python38\site-packages\deltalake\deltatable.py in __init__(self, path, file_system)
     43                 Make sure you point to the root of a delta table"""
     44             )
---> 45         self._as_newest_version()
     46 
     47         # The PyArrow Dataset is exposed by a factory class,

~\AppData\Roaming\Python\Python38\site-packages\deltalake\deltatable.py in _as_newest_version(self)
    149         # apply remaining versions. This can be a maximum of 9 versions.
    150         # we will just break when we don't find any newer logs
--> 151         self._apply_partial_logs(version=self.checkpoint + 9)
    152 
    153     def to_table(self, *args, **kwargs):

~\AppData\Roaming\Python\Python38\site-packages\deltalake\deltatable.py in _apply_partial_logs(self, version)
    130                     elif "metaData" in meta_data.keys():
    131                         schema_string = meta_data["metaData"]["schemaString"]
--> 132                         self.schema = schema_from_string(schema_string)
    133                 # Stop if we have reatched the desired version
    134                 if self.version == version:

~\AppData\Roaming\Python\Python38\site-packages\deltalake\schema.py in schema_from_string(schema_string)
     17         pa_type = map_type(type)
     18 
---> 19         fields.append(pa.field(name, pa_type, nullable=nullable, metadata=metadata))
     20     return pa.schema(fields)
     21 

C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\types.pxi in pyarrow.lib.field()

C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\types.pxi in pyarrow.lib.ensure_metadata()

C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\types.pxi in pyarrow.lib.KeyValueMetadata.__init__()

C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\lib.cp38-win_amd64.pyd in string.from_py.__pyx_convert_string_from_py_std__in_string()

TypeError: expected bytes, int found
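
The traceback ends inside `pyarrow.lib.KeyValueMetadata.__init__`, and "expected bytes, int found" is what pyarrow raises when a metadata value is an integer rather than a string or bytes. One possible cause, not confirmed in this thread, is a column whose schema metadata contains a non-string value. A minimal sketch of a workaround under that assumption follows; `stringify_metadata` and the sample values are hypothetical and are not part of delta-lake-reader.

```python
import json

try:
    import pyarrow as pa  # optional here; only used for the final check
except ImportError:
    pa = None


def stringify_metadata(metadata):
    """Return a copy of a metadata dict with every key and value as str.

    Hypothetical helper: non-string values are JSON-encoded so that
    pyarrow can store them as bytes.
    """
    if not metadata:
        return None
    return {
        str(key): value if isinstance(value, str) else json.dumps(value)
        for key, value in metadata.items()
    }


# Made-up column metadata with an integer value, the assumed trigger.
raw = {"comment": "net revenue", "scale": 2}
clean = stringify_metadata(raw)
print(clean)

if pa is not None:
    # Passing `raw` here would be expected to fail; `clean` is accepted.
    field = pa.field("amount", pa.float64(), nullable=True, metadata=clean)
    print(field.metadata)
```

If this assumption holds, applying the same conversion before the `pa.field(...)` call in `schema_from_string` would avoid the error, at the cost of turning numeric metadata values into their string form.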


@jeppe742
Owner

jeppe742 commented Oct 3, 2022

Hey @shr-poojary
Sorry for the long wait.
Do you still have this issue?
Would you be able to provide one of the files from the _delta_log folder?
Maybe there is something in there that isn't handled correctly in the reader.
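
Before sharing a log file, it may help to check it locally. The sketch below assumes the usual Delta log layout (one JSON action per line, with the table schema stored as a JSON string under `metaData.schemaString`, as the traceback above suggests) and lists top-level fields whose metadata values are not strings. The function name and the sample log content are made up for illustration; nested struct fields are not inspected.

```python
import json


def find_non_string_metadata(log_text):
    """Return (field_name, key, value) tuples for top-level schema fields
    whose metadata values are not strings."""
    findings = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "metaData" not in action:
            continue  # skip commitInfo, add, remove, and other actions
        schema = json.loads(action["metaData"]["schemaString"])
        for field in schema.get("fields", []):
            for key, value in (field.get("metadata") or {}).items():
                if not isinstance(value, str):
                    findings.append((field["name"], key, value))
    return findings


# Made-up commit content standing in for a real _delta_log JSON file.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "amount", "type": "double", "nullable": True,
         "metadata": {"scale": 2}},
    ],
}
sample_log = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"metaData": {"schemaString": json.dumps(sample_schema)}}),
])

print(find_non_string_metadata(sample_log))
```

For a real table, the text could be read through the same file system object, for example with `fs.cat(path).decode("utf-8")`, where `path` points at one of the JSON files in the table's `_delta_log` folder.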
