Avoid ignoring files visible in Hive #16932

Merged
merged 1 commit into from
Apr 11, 2023

findinpath
Contributor

@findinpath findinpath commented Apr 7, 2023

Description

Ignore in Hive only those files whose own name, or the name of one of
their ancestor directories, begins with a `.` or `_` character.
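
The rule above can be illustrated with a minimal Java sketch. This is not the code from the PR: the class `HiddenFileSketch`, both method names, and the example paths are hypothetical. It also assumes that "ancestor" means the directories between the table location and the file, so that a `.` or `_` in the table location itself does not hide anything. That reading follows the PR title but is not spelled out in the description.

```java
import java.util.Arrays;

public class HiddenFileSketch {
    // Returns true when any segment of the relative path (the file name or
    // one of its ancestor directories) starts with '.' or '_'.
    static boolean isHiddenRelativePath(String relativePath) {
        return Arrays.stream(relativePath.split("/"))
                .filter(segment -> !segment.isEmpty())
                .anyMatch(segment -> segment.startsWith(".") || segment.startsWith("_"));
    }

    // Hypothetical helper: strips the table location prefix so that only the
    // part of the path below the table location is inspected (an assumption,
    // not confirmed by the PR text).
    static boolean isHidden(String tableLocation, String filePath) {
        String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        if (!filePath.startsWith(prefix)) {
            throw new IllegalArgumentException(filePath + " is not under " + tableLocation);
        }
        return isHiddenRelativePath(filePath.substring(prefix.length()));
    }

    public static void main(String[] args) {
        // The table location itself contains a segment starting with '_'.
        String table = "hdfs://nn/warehouse/_staging/orders";
        System.out.println(isHidden(table, table + "/part-0000.parquet"));      // false
        System.out.println(isHidden(table, table + "/.part-0000.parquet"));     // true
        System.out.println(isHidden(table, table + "/_tmp/part-0000.parquet")); // true
    }
}
```

In this sketch the `_staging` segment belongs to the table location and is never inspected, so `part-0000.parquet` stays visible, while a file or subdirectory below the table location that starts with `.` or `_` is treated as hidden.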

As agreed on Slack with @dprophet, this PR is a spin-off from #16387, used only to illustrate how to create product tests for the initial PR.

@dprophet feel free to use the product tests in the original PR. No need for co-authoring.

Fixes #16386

Additional context and related issues

The newly added product tests can be executed with

testing/bin/ptl test run --environment singlenode-hdp3 --  -t io.trino.tests.product.hive.TestHiveHiddenFiles

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive
* Fix incorrect results when directory or file names contain hidden characters. ({issue}`16386`)

@cla-bot cla-bot bot added the cla-signed label Apr 7, 2023
// Rename the table files to Hive hidden files (prefixed by `.` or `_` characters)
for (String filename : hdfsClient.listDirectory(tableLocation)) {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
        hdfsClient.loadFile(tableLocation + "/" + filename, bos);
Contributor Author

Renaming the files could be done much more easily with trinodb/tempto#95

cc @ebyhr @findepi

@findinpath findinpath requested review from findepi and ebyhr April 7, 2023 16:03
@github-actions github-actions bot added hive Hive connector tests:hive labels Apr 7, 2023
Member

@pettyjamesm pettyjamesm left a comment

LGTM, thanks for picking this up to get it over the finish line @findinpath

Member

@ebyhr ebyhr left a comment

Could you change this to a non-draft PR if it's ready for review?

@findinpath findinpath marked this pull request as ready for review April 11, 2023 16:30
Ignore in Hive only the files which have their names or the
names of their ancestor beginning with `.` or `_` characters.
@ebyhr ebyhr merged commit 03d3e8d into trinodb:master Apr 11, 2023
@ebyhr ebyhr mentioned this pull request Apr 11, 2023
@github-actions github-actions bot added this to the 413 milestone Apr 11, 2023
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

Off by +1 error causing HIVE connector skipping parquet files
4 participants