Dear Docling team, we have a problem in the pipeline with LlamaIndex pipelining in the hierarchical chunker. Please find the documents attached below. The latest version used is 2.2.0, with the inlined Docling parser from the feature request for the official LlamaIndex Docling parser.
Parsing itself seems to be fine, so the converter with the JSON ExportType is working. Adding the documents to the Milvus vector store with a node parser/transformation based on the HierarchicalChunker then fails with:
```
2024-10-25 16:51:57,840 - ERROR - 58ebd550-a879-466f-9903-007bf468ebf5 - Exception details:
Traceback (most recent call last):
  File "/Users/C/test/processor.py", line 336, in process_single_pdf
    index = self.upload_to_milvus(pdf_path, milvus_url, cleaned_milvus_coll, ingest)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/C/test/processor.py", line 292, in upload_to_milvus
    index = VectorStoreIndex.from_documents(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/indices/base.py", line 112, in from_documents
    nodes = run_transformations(
            ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/ingestion/pipeline.py", line 100, in run_transformations
    nodes = transform(nodes, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 311, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/node_parser/interface.py", line 193, in __call__
    return self.get_nodes_from_documents(nodes, **kwargs)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/node_parser/interface.py", line 165, in get_nodes_from_documents
    nodes = self._parse_nodes(documents, show_progress=show_progress, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 311, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/C/test/processor.py", line 211, in _parse_nodes
    for i, chunk in enumerate(chunk_iter):
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/docling_core/transforms/chunker/hierarchical_chunker.py", line 211, in chunk
    text = self._triplet_serialize(table_df=table_df)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/docling_core/transforms/chunker/hierarchical_chunker.py", line 132, in _triplet_serialize
    rows = [item.strip() for item in table_df.iloc[:, 0].to_list()]
            ^^^^^^^^^^
AttributeError: 'int' object has no attribute 'strip'
```

Attachments:

- [ACCIONA_compressed-1-1.pdf](https://github.com/user-attachments/files/17529462/ACCIONA_compressed-1-1.pdf)
- [Greenalia_.pdf](https://github.com/user-attachments/files/17529473/Greenalia_.pdf)
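The failing expression from `_triplet_serialize` (line 132 in the traceback) can be reproduced with a plain pandas DataFrame whose first column holds numeric cells; the DataFrame below is a hypothetical stand-in for a table extracted from one of the attached PDFs:

```python
import pandas as pd

# Hypothetical extracted table: the first column contains integers
# (e.g. year values), not strings.
table_df = pd.DataFrame({0: [2021, 2022], 1: ["revenue", "costs"]})

# Same list comprehension as in docling_core's _triplet_serialize:
try:
    rows = [item.strip() for item in table_df.iloc[:, 0].to_list()]
except AttributeError as exc:
    print(exc)  # 'int' object has no attribute 'strip'
```

This suggests the error depends on the cell types the PDF table extraction produces, which would explain why only some documents fail.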
Some converted PDF documents seem to be fine, but some fail with that error. They could all be converted with the inlined 1.x parser and JSON export. Do you have any idea where I can look? Thank you.
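As a stopgap until the chunker handles non-string table cells, one possible workaround is to coerce each cell to `str` before stripping. This is a sketch, not the official docling-core API; `triplet_safe` is a hypothetical helper mirroring the failing list comprehension:

```python
import pandas as pd

def triplet_safe(table_df: pd.DataFrame) -> list[str]:
    """Like the failing line in _triplet_serialize, but coerces
    non-string cells (ints, floats) to str before stripping."""
    return [str(item).strip() for item in table_df.iloc[:, 0].to_list()]

# Mixed-type first column no longer raises AttributeError:
df = pd.DataFrame({0: [2021, " Q1 "], 1: ["x", "y"]})
print(triplet_safe(df))  # ['2021', 'Q1']
```

In a pipeline, the same effect could be had by normalizing the extracted table with `table_df.astype(str)` before it reaches the chunker.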