[GOBBLIN-2159] Adding support for partition level copy in Iceberg distcp #4058
For good measure, you could also make `IcebergTable.TableNotFoundException` a declared (checked) exception here. I'm tempted to re-situate the exception as `IcebergCatalog.TableNotFoundException`, but I don't want two classes with the same semantics, and it's probably too late to rename public interfaces... so I'll make peace with the current name.

As discussed, I'm not throwing here; instead I'm catching `NoSuchTableException` in `BaseIcebergCatalog::openTable` and throwing `IcebergTable.TableNotFoundException` from there.

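For anyone following the thread, the translation described above might look roughly like the sketch below. It is not the PR's code: `SketchCatalog`, its `loadTable` helper, and the string table ids are hypothetical, and the `NoSuchTableException` and `IcebergTable` classes here are simplified stand-ins for the real Iceberg and Gobblin types so that the snippet compiles on its own.

```java
import java.util.Set;

// Stand-in for Iceberg's unchecked NoSuchTableException (assumption: simplified for this sketch).
class NoSuchTableException extends RuntimeException {
  NoSuchTableException(String message) {
    super(message);
  }
}

// Simplified stand-in for Gobblin's IcebergTable; only the nested checked exception matters here.
class IcebergTable {
  // Checked exception, so callers are forced to handle a missing table.
  static class TableNotFoundException extends Exception {
    private final String tableId;

    TableNotFoundException(String tableId, Throwable cause) {
      super("Not found: '" + tableId + "'", cause);
      this.tableId = tableId;
    }

    String getTableId() {
      return tableId;
    }
  }

  private final String tableId;

  IcebergTable(String tableId) {
    this.tableId = tableId;
  }

  String getTableId() {
    return tableId;
  }
}

// Hypothetical catalog showing where the unchecked-to-checked translation would live.
class SketchCatalog {
  private final Set<String> knownTables;

  SketchCatalog(Set<String> knownTables) {
    this.knownTables = knownTables;
  }

  // Mimics an underlying catalog lookup that fails with an unchecked exception.
  private void loadTable(String tableId) {
    if (!knownTables.contains(tableId)) {
      throw new NoSuchTableException("Table does not exist: " + tableId);
    }
  }

  // Catches the unchecked exception once, here, and rethrows the checked one with the cause attached.
  IcebergTable openTable(String tableId) throws IcebergTable.TableNotFoundException {
    try {
      loadTable(tableId);
      return new IcebergTable(tableId);
    } catch (NoSuchTableException e) {
      throw new IcebergTable.TableNotFoundException(tableId, e);
    }
  }
}
```

Doing the translation in one place means callers of `openTable` only ever deal with the checked exception, while the original failure stays reachable through `getCause()`.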
I agree this makes more sense here, given that the synchronous reading of every manifest file happens within this method, rather than in the style of the `Iterator<IcebergSnapshotInfo>` returned by `IcebergTable::getIncrementalSnapshotInfosIterator`. That said, I doubt we should still log tracked growth, as this very same list is later transformed in `IcebergPartitionDataset::calcDestDataFileBySrcPath`. All the network calls are in this method, rather than over there, so the in-process transformation into CopyEntities should be quite fast. Maybe just log once at the end of `calcDestDataFileBySrcPath`.

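To illustrate the "log once at the end" idea, here is a minimal sketch. It is an assumption-laden simplification, not the PR's `calcDestDataFileBySrcPath`: the class `DestPathSketch`, the method `calcDestPathBySrcPath`, the prefix-swap rule, and the use of plain strings in place of paths and data files are all hypothetical, and `java.util.logging` is used only to keep the snippet dependency-free.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper: a purely in-process transformation with a single summary log line.
class DestPathSketch {
  private static final Logger LOG = Logger.getLogger(DestPathSketch.class.getName());

  // Maps each source path to a destination path by swapping the prefix (assumed rewrite rule).
  static Map<String, String> calcDestPathBySrcPath(List<String> srcPaths, String srcPrefix, String destPrefix) {
    Map<String, String> destBySrc = new LinkedHashMap<>();
    for (String src : srcPaths) {
      // Paths outside the source prefix are left unchanged in this sketch.
      String dest = src.startsWith(srcPrefix) ? destPrefix + src.substring(srcPrefix.length()) : src;
      destBySrc.put(src, dest);
    }
    // No per-milestone progress logging: the loop makes no network calls, so one line at the end suffices.
    LOG.info(String.format("Calculated %d destination paths", destBySrc.size()));
    return destBySrc;
  }
}
```

Progress-style logging stays worthwhile in the method that actually reads the manifests, since that is where the time is spent.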
Yes, that seems like a valid approach; let me remove growthMileStonetracker from that function.

I really like how returning this `Map` allows you to be so succinct at every point of use: nice work!