Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR for issue #824 #840

Closed
wants to merge 3 commits into from
Closed

PR for issue #824 #840

wants to merge 3 commits into from

Conversation

parmesant
Copy link
Contributor

Fixes #824 (Temporarily)

Description

PR suggests a temporary fix for issue #824. The issue might lie in the way an internal dependency is parsing the path for a given parquet file on Windows as per my conversation with @nikhilsinhaparseable.

Windows uses \ as path separator which, if present in a string, appears as an escaped \\. Rust is able to reach the manifest.json by following the path given in .stream.json as ..."manifest_list":[{"manifest_path":"G:\\projects\\local_parseable\\parseable\\data\\test/date=2024-07-06/manifest.json",.... By extension, it should be able to read the parquet files from their path present in manifest.json as ..."files":[{"file_path":"G:\\projects\\local_parseable\\parseable\\data\\test/date=2024-07-06/hour=06/minute=52/DESKTOP-C8E53PR.data.Z6KsJTjB2iryjtr.parquet",...
Looking at partitioned_files, it is clear that the issue is not with how Rust is treating the path, but probably with how the physical plan is executed by Datafusion.

Excerpts from the logs-

[2024-07-06T14:33:06Z DEBUG datafusion::physical_planner] Optimized physical plan:
    SortPreservingMergeExec: [timestamp@0 ASC NULLS LAST] ....

                 .... ParquetExec: file_groups={1 group: [[G:%5Cprojects%5Clocal_parseable%5Cparseable%5Cdata%5Ctest/date=2024-07-06/hour=14/minute=32/DESKTOP-C8E53PR.data.IdtF1BXkh0THAzp.parquet]]}, projection=[p_timestamp],....

[2024-07-06T14:33:06Z DEBUG datafusion_physical_plan::aggregates::row_hash] Creating GroupedHashAggregateStream

[2024-07-06T14:33:06Z DEBUG actix_web::middleware::logger] Error in response: Execute(Datafusion(External(ParquetError(General("AsyncChunkReader::get_bytes error: Object at location G:\\%5Cprojects%5Clocal_parseable%5Cparseable%5Cdata%5Ctest\\date=2024-07-06\\hour=14\\minute=32\\DESKTOP-C8E53PR.data.IdtF1BXkh0THAzp.parquet not found: The system cannot find the path specified. (os error 3)")))))

Another strange thing is that, in the error, the path appears to have both the escaped backslash \\ as well as the hex notation of a backslash %5C. Which could indicate that the path is getting corrupted somehow.

For now, a temporary fix in the form of string replacement can be used (since windows works just fine with forward slashes in place of backslashes)

This PR also fixes issues present with the powershell script which is in charge or downloading and setting-up Parseable on a Windows machine. Over time, the nomenclature rules of new releases changed which broke the script. Similar issues exist in the .sh script as well but it relies on some zsh specific commands and thus could not be fixed by me (I would propose changing it to a bash script since macOS supports that also).


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Copy link
Contributor

github-actions bot commented Jul 6, 2024

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

@parmesant
Copy link
Contributor Author

Closing PR in favour of taking more time to find (and hoepfully fix) the actual issue coming from Datafusion.

@parmesant parmesant closed this Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File path issue : Windows
1 participant