Issue with chunking + GDS library #856

Open
prisciliapangg opened this issue Nov 12, 2024 · 4 comments
@prisciliapangg

message': 'Failed To Process File:xx.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for xx.pdf. Please re-upload file and try.', 'file_name': 'xx.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://xx.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'}
Traceback (most recent call last):

and

Failed to create GDS driver: The Graph Data Science library is not correctly installed on the Neo4j server.

@kartikpersistent added the bug (Something isn't working) label on Nov 12, 2024
@aashipandya
Collaborator

Have you tried re-uploading the file and generating the graph again?

If you still get the error, please share the full error trace and the PDF file if possible.

@prisciliapangg
Author

prisciliapangg commented Nov 12, 2024

This is the document that I am working on:
government-data-security-policies.pdf

[INFO]{'api_name': 'extract', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'source_url': None, 'aws_access_key_id': None, 'model': 'openai-gpt-4o', 'gcs_bucket_name': None, 'gcs_bucket_folder': None, 'source_type': 'local file', 'gcs_blob_filename': None, 'file_name': 'government-data-security-policies.pdf', 'gcs_project_id': None, 'wiki_query': None, 'allowedNodes': '', 'allowedRelationship': '', 'language': None, 'retry_condition': '', 'logging_time': '2024-11-12 02:42:17 UTC'}
2024-11-12 10:42:17,761 - File path:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf
2024-11-12 10:42:17,761 - Process file name :government-data-security-policies.pdf
2024-11-12 10:42:17,959 - Time taken database connection: 0.20 seconds
2024-11-12 10:42:18,184 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 10:42:18,184 - file government-data-security-policies.pdf deleted successfully
[ERROR]{'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'}
2024-11-12 10:42:18,184 - File Failed in extraction: {'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 02:42:18 UTC'}
Traceback (most recent call last):
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/score.py", line 193, in extract_knowledge_graph_from_file
uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 226, in extract_graph_from_file_local_file
return await processing_source(uri, userName, password, database, model, fileName, [], allowedNodes, allowedRelationship, True, merged_file_path, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 308, in processing_source
total_chunks, chunkId_chunkDoc_list = get_chunkId_chunkDoc_list(graph, file_name, pages, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 525, in get_chunkId_chunkDoc_list
raise Exception(f"Chunks are not created for {file_name}. Please re-upload file and try.")
Exception: Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.
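
The traceback shows the exception being raised in `get_chunkId_chunkDoc_list`, which suggests that no chunks were available for the file. One way to narrow this down locally is to check whether any text was extracted from the PDF before chunking. The sketch below is hypothetical: `diagnose_pages` and its messages are not part of llm-graph-builder, and pages are modelled as plain strings rather than the loader's document objects.

```python
from typing import List


def diagnose_pages(pages: List[str], file_name: str) -> str:
    """Return a short diagnosis of why chunking might yield nothing.

    Hypothetical helper: `pages` stands in for the per-page text that a
    PDF loader returned for `file_name`.
    """
    # Case 1: the loader produced no pages at all.
    if not pages:
        return f"{file_name}: loader returned no pages"

    # Case 2: pages exist, but every one is empty or whitespace only
    # (for example, a scanned PDF without a text layer).
    non_empty = [p for p in pages if p and p.strip()]
    if not non_empty:
        return f"{file_name}: {len(pages)} page(s) loaded, none contain text"

    # Case 3: text is present, so the problem is likely elsewhere.
    total_chars = sum(len(p.strip()) for p in non_empty)
    return (
        f"{file_name}: {len(non_empty)}/{len(pages)} page(s) contain text "
        f"({total_chars} characters)"
    )


if __name__ == "__main__":
    print(diagnose_pages([], "example.pdf"))
    print(diagnose_pages(["", "  \n"], "example.pdf"))
    print(diagnose_pages(["Policy text", ""], "example.pdf"))
```

If the third case is reported for the real file, the PDF content itself is probably not the cause and the local setup is worth a closer look.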

@aashipandya
Collaborator

The file processes successfully on our end.
image

image

Try selecting and deleting this file from the UI, then upload it again.

@prisciliapangg
Author

I tried that and the problem still remains:
INFO: 127.0.0.1:63194 - "POST /post_processing HTTP/1.1" 200 OK
[INFO]{'api_name': 'delete_document_and_entities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'filenames': '["government-data-security-policies.pdf"]', 'deleteEntities': 'true', 'source_types': '["local file"]', 'logging_time': '2024-11-12 12:25:13 UTC'}
2024-11-12 20:25:14,105 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 20:25:14,419 - Deleting 1 documents = '['government-data-security-policies.pdf']' from '['local file']' from database
[INFO]{'api_name': 'delete_document_and_entities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:14 UTC', 'elapsed_api_time': '0.76'}
INFO: 127.0.0.1:63194 - "POST /delete_document_and_entities HTTP/1.1" 200 OK
[INFO]{'api_name': 'upload', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'chunkNumber': '1', 'totalChunks': '1', 'original_file_name': 'government-data-security-policies.pdf', 'model': 'openai-gpt-4o', 'logging_time': '2024-11-12 12:25:18 UTC'}
2024-11-12 20:25:19,174 - gcs file cache: False
2024-11-12 20:25:19,174 - Chunk File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/chunks/government-data-security-policies.pdf_part_1
2024-11-12 20:25:19,175 - Merged File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files
2024-11-12 20:25:19,176 - Chunk File Path While Merging Parts:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/chunks/government-data-security-policies.pdf_part_1
2024-11-12 20:25:19,176 - Chunks merged successfully and return file size
2024-11-12 20:25:19,176 - File merged successfully
2024-11-12 20:25:19,176 - creating source node if does not exist
[INFO]{'api_name': 'upload', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:19 UTC', 'elapsed_api_time': '0.52'}
INFO: 127.0.0.1:63194 - "POST /upload HTTP/1.1" 200 OK
[INFO]{'api_name': 'extract', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'source_url': None, 'aws_access_key_id': None, 'model': 'openai-gpt-4o', 'gcs_bucket_name': None, 'gcs_bucket_folder': None, 'source_type': 'local file', 'gcs_blob_filename': None, 'file_name': 'government-data-security-policies.pdf', 'gcs_project_id': None, 'wiki_query': None, 'allowedNodes': '', 'allowedRelationship': '', 'language': None, 'retry_condition': '', 'logging_time': '2024-11-12 12:25:21 UTC'}
2024-11-12 20:25:21,432 - File path:/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf
2024-11-12 20:25:21,432 - Process file name :government-data-security-policies.pdf
2024-11-12 20:25:21,742 - Time taken database connection: 0.31 seconds
2024-11-12 20:25:21,898 - Deleted File Path: /Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/merged_files/government-data-security-policies.pdf and Deleted File Name : government-data-security-policies.pdf
2024-11-12 20:25:21,899 - file government-data-security-policies.pdf deleted successfully
[ERROR]{'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 12:25:21 UTC'}
2024-11-12 20:25:21,899 - File Failed in extraction: {'message': 'Failed To Process File:government-data-security-policies.pdf or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.', 'file_name': 'government-data-security-policies.pdf', 'status': 'Failed', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-11-12 12:25:21 UTC'}
Traceback (most recent call last):
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/score.py", line 193, in extract_knowledge_graph_from_file
uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 226, in extract_graph_from_file_local_file
return await processing_source(uri, userName, password, database, model, fileName, [], allowedNodes, allowedRelationship, True, merged_file_path, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 308, in processing_source
total_chunks, chunkId_chunkDoc_list = get_chunkId_chunkDoc_list(graph, file_name, pages, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/priscilia/Desktop/folder/llmgraphbuilder/llm-graph-builder/backend/src/main.py", line 525, in get_chunkId_chunkDoc_list
raise Exception(f"Chunks are not created for {file_name}. Please re-upload file and try.")
Exception: Chunks are not created for government-data-security-policies.pdf. Please re-upload file and try.
INFO: 127.0.0.1:63194 - "POST /extract HTTP/1.1" 200 OK
INFO: 127.0.0.1:63711 - "GET /update_extract_status/government-data-security-policies.pdf?url=neo4j+s://915323c5.databases.neo4j.io:7687&userName=neo4j&password=cjY1cTZsSDUwVVA3cWdFaGxObzVIVmVULUVoRS1JRVI4dEdXTHRhSlJPbw==&database=neo4j HTTP/1.1" 200 OK
[INFO]{'api_name': 'post_processing', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'userName': 'neo4j', 'database': 'neo4j', 'tasks': '["materialize_text_chunk_similarities","enable_hybrid_search_and_fulltext_search_in_bloom","materialize_entity_similarities","enable_communities"]', 'logging_time': '2024-11-12 12:25:22 UTC'}
2024-11-12 20:25:23,381 - update KNN graph
2024-11-12 20:25:23,615 - SSE Client disconnected
2024-11-12 20:25:23,615 - Updated KNN Graph
2024-11-12 20:25:23,615 - Starting the process of creating full-text indexes.
2024-11-12 20:25:23,948 - Database connectivity verified.
2024-11-12 20:25:23,948 - Creating a full-text index for type 'entities'.
2024-11-12 20:25:23,967 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:23,997 - Full text index is not created as labels are empty
2024-11-12 20:25:23,997 - Process completed in 0.05 seconds.
2024-11-12 20:25:23,997 - Full-text index for type 'entities' created successfully.
2024-11-12 20:25:23,997 - Creating a full-text index for type 'hybrid'.
2024-11-12 20:25:24,021 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:24,050 - Created full-text index in 0.03 seconds.
2024-11-12 20:25:24,053 - Process completed in 0.06 seconds.
2024-11-12 20:25:24,053 - Full-text index for type 'hybrid' created successfully.
2024-11-12 20:25:24,053 - Creating a vector index for type 'vector'.
2024-11-12 20:25:24,053 - Starting the process to create vector index.
2024-11-12 20:25:24,069 - Dropped existing index (if any) in 0.02 seconds.
2024-11-12 20:25:24,094 - Created vector index in 0.02 seconds.
2024-11-12 20:25:24,097 - Vector index for chunk created successfully.
2024-11-12 20:25:24,097 - Driver closed successfully.
2024-11-12 20:25:24,098 - Full-text and vector index creation process completed.
2024-11-12 20:25:24,098 - Full Text index created
2024-11-12 20:25:24,174 - Entity Embeddings created
2024-11-12 20:25:24,571 - Failed to create GDS driver: The Graph Data Science library is not correctly installed on the Neo4j server.
Please refer to https://neo4j.com/docs/graph-data-science/current/installation/.

2024-11-12 20:25:24,571 - Failed to create communities: The Graph Data Science library is not correctly installed on the Neo4j server.
Please refer to https://neo4j.com/docs/graph-data-science/current/installation/.

2024-11-12 20:25:24,571 - created communities
[DEFAULT]{'api_name': 'post_processing/create_communities', 'db_url': 'neo4j+s://915323c5.databases.neo4j.io:7687', 'logging_time': '2024-11-12 12:25:24 UTC'}
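
The GDS failure in the log above is separate from the chunking error. A quick way to confirm whether the Graph Data Science library is reachable is to run `RETURN gds.version()` against the database. The sketch below wraps that check; `gds_version` and the `run_query` callable are hypothetical names used for illustration, not part of llm-graph-builder. Whether GDS is available on a hosted instance depends on the instance type.

```python
from typing import Any, Callable, Dict, List, Optional

# A callable that runs a Cypher statement and returns rows as dictionaries.
QueryRunner = Callable[[str], List[Dict[str, Any]]]


def gds_version(run_query: QueryRunner) -> Optional[str]:
    """Return the GDS version string, or None if it cannot be determined.

    `run_query` is a hypothetical adapter around whatever Neo4j client
    is in use; it is injected so the check can be exercised offline.
    """
    try:
        rows = run_query("RETURN gds.version() AS version")
    except Exception:
        # A failure here (for example, an unknown-function error)
        # usually means GDS is not installed or not exposed.
        return None
    if not rows:
        return None
    return rows[0].get("version")


if __name__ == "__main__":
    # Offline demonstration with stand-in runners.
    def fake_ok(query: str) -> List[Dict[str, Any]]:
        return [{"version": "2.x"}]

    def fake_missing(query: str) -> List[Dict[str, Any]]:
        raise RuntimeError("Unknown function 'gds.version'")

    print(gds_version(fake_ok))
    print(gds_version(fake_missing))
```

With the official Neo4j Python driver, an adapter along the lines of `lambda q: [r.data() for r in driver.execute_query(q).records]` should work, assuming a driver version that provides `execute_query`. If the function returns `None`, post-processing tasks that need GDS, such as `enable_communities`, can be expected to fail as shown in the log.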
