Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Deletion of Previous Metadata and Statistics Files #312

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
d0cd456
delete manifest, manifest list, prev files, stats when drop table wit…
danielhumanmod Sep 22, 2024
26e03ac
unit test for drop table
danielhumanmod Sep 22, 2024
0f8e8f4
refine warning code
danielhumanmod Sep 22, 2024
1b525de
code format
danielhumanmod Sep 22, 2024
e8b26d2
refine warning code
danielhumanmod Sep 22, 2024
2ee6dee
remove unused code
danielhumanmod Sep 22, 2024
806f46d
remove unused import
danielhumanmod Sep 22, 2024
4f1d3c9
code format
danielhumanmod Sep 23, 2024
0a77bfa
remove additional manifest and manifest list deletion
danielhumanmod Sep 24, 2024
47dc60a
add stat deletion test
danielhumanmod Sep 24, 2024
9d835b3
code format
danielhumanmod Sep 24, 2024
40c6147
add new AsyncTaskType
danielhumanmod Oct 6, 2024
f354d1c
Schedule prev metadata and stat files deletion in seperated tasks
danielhumanmod Oct 6, 2024
88c6651
Table content cleanup task handler
danielhumanmod Oct 6, 2024
af3efab
Unit test for table clean up
danielhumanmod Oct 6, 2024
278ab7e
code format
danielhumanmod Oct 6, 2024
ed30fb0
register task handler
danielhumanmod Oct 7, 2024
05c3dd9
handler table content files in batch
danielhumanmod Oct 7, 2024
49dbe68
adjust unit test after batch processing
danielhumanmod Oct 7, 2024
8eea50d
add unit test for TableContentCleanupTaskHandler
danielhumanmod Oct 7, 2024
d9804e6
code format
danielhumanmod Oct 7, 2024
54511de
Merge branch 'main' into pr-289
danielhumanmod Oct 17, 2024
56ba4f2
Merge branch 'main' into pr-289
danielhumanmod Oct 26, 2024
e92852e
Merge branch 'main' into pr-289
danielhumanmod Nov 4, 2024
27ea1b3
merge cleanup tasks into one
danielhumanmod Nov 4, 2024
4d1b68b
Merge remote-tracking branch 'origin/pr-289' into pr-289
danielhumanmod Nov 4, 2024
eb533d7
code format
danielhumanmod Nov 4, 2024
47f760f
Merge branch 'main' into pr-289
flyrain Nov 9, 2024
988e530
refactor manifest cleanup handler based on comments
danielhumanmod Nov 14, 2024
4965d5c
refactor table cleanup handler based on comments
danielhumanmod Nov 14, 2024
5f81483
add TODO
danielhumanmod Nov 14, 2024
097189c
Merge branch 'pr-289' of https://github.com/danielhumanmod/polaris in…
danielhumanmod Nov 14, 2024
651ece0
Merge branch 'main' into pr-289
danielhumanmod Nov 14, 2024
16bb5fe
renaming
danielhumanmod Nov 14, 2024
d276ae6
split the task type in cleanup task handler
danielhumanmod Nov 20, 2024
187b47e
error handling
danielhumanmod Nov 20, 2024
ba5c47c
Merge branch 'main' into pr-289
danielhumanmod Nov 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,13 @@
import org.slf4j.LoggerFactory;

/**
* {@link TaskHandler} responsible for deleting all of the files in a manifest and the manifest
* itself. Since data files may be present in multiple manifests across different snapshots, we
* assume a data file that doesn't exist is missing because it was already deleted by another task.
* {@link TaskHandler} responsible for deleting table files: 1. Manifest files: It contains all the
* files in a manifest and the manifest itself. Since data files may be present in multiple
* manifests across different snapshots, we assume a data file that doesn't exist is missing because
* it was already deleted by another task. 2. Table metadata files: It contains previous metadata
* and statistics files, which are grouped and deleted in batch
*/
// TODO: Rename this class since we introducing metadata cleanup here
public class ManifestFileCleanupTaskHandler implements TaskHandler {
Copy link
Author

@danielhumanmod danielhumanmod Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming this task will triger lots of relevent changes. If a rename is needed, we may want to handle it in a separate PR to avoid too much changes (Leave a TODO here)

public static final int MAX_ATTEMPTS = 3;
public static final int FILE_DELETION_RETRY_MILLIS = 100;
Expand All @@ -68,58 +71,110 @@ public boolean canHandleTask(TaskEntity task) {
@Override
public boolean handleTask(TaskEntity task) {
ManifestCleanupTask cleanupTask = task.readData(ManifestCleanupTask.class);
ManifestFile manifestFile = decodeManifestData(cleanupTask.getManifestFileData());
TableIdentifier tableId = cleanupTask.getTableId();
try (FileIO authorizedFileIO = fileIOSupplier.apply(task)) {

// if the file doesn't exist, we assume that another task execution was successful, but failed
// to drop the task entity. Log a warning and return success
if (!TaskUtils.exists(manifestFile.path(), authorizedFileIO)) {
if (cleanupTask.getManifestFileData() != null) {
ManifestFile manifestFile = decodeManifestData(cleanupTask.getManifestFileData());
return cleanUpManifestFile(manifestFile, authorizedFileIO, tableId);
} else if (cleanupTask.getMetadataFiles() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to belabor this, but I don't want to overload this class with logic for handling many different file types. As I mentioned, the AsyncTaskType.FILE_CLEANUP enum is poorly named, making it sound like this class can just handle any kind of file clean up.

In order to avoid bogging down this PR too much, can we just add a second task type for the metadata files and predicate the logic here on the task type rather than testing the presence of getManifestFileData or getMetadataFiles? A future PR can refactor this code to use a common base class for two handlers to avoid duplicating logic while maintaining a clear separation of responsibilities.

return cleanUpMetadataFiles(cleanupTask.getMetadataFiles(), authorizedFileIO, tableId);
} else {
LOGGER
.atWarn()
.addKeyValue("manifestFile", manifestFile.path())
.addKeyValue("tableId", tableId)
.log("Manifest cleanup task scheduled, but manifest file doesn't exist");
.log("Cleanup task scheduled, but input file doesn't exist");
return true;
}
}
}

ManifestReader<DataFile> dataFiles = ManifestFiles.read(manifestFile, authorizedFileIO);
List<CompletableFuture<Void>> dataFileDeletes =
StreamSupport.stream(
Spliterators.spliteratorUnknownSize(dataFiles.iterator(), Spliterator.IMMUTABLE),
false)
.map(
file ->
tryDelete(
tableId, authorizedFileIO, manifestFile, file.path().toString(), null, 1))
.toList();
LOGGER.debug(
"Scheduled {} data files to be deleted from manifest {}",
dataFileDeletes.size(),
manifestFile.path());
try {
// wait for all data files to be deleted, then wait for the manifest itself to be deleted
CompletableFuture.allOf(dataFileDeletes.toArray(CompletableFuture[]::new))
.thenCompose(
(v) -> {
LOGGER
.atInfo()
.addKeyValue("manifestFile", manifestFile.path())
.log("All data files in manifest deleted - deleting manifest");
return tryDelete(
tableId, authorizedFileIO, manifestFile, manifestFile.path(), null, 1);
})
.get();
return true;
} catch (InterruptedException e) {
LOGGER.error(
"Interrupted exception deleting data files from manifest {}", manifestFile.path(), e);
throw new RuntimeException(e);
} catch (ExecutionException e) {
LOGGER.error("Unable to delete data files from manifest {}", manifestFile.path(), e);
return false;
}
private boolean cleanUpManifestFile(
Copy link
Author

@danielhumanmod danielhumanmod Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the lots of changes here, but don’t worry—it’s mainly because I refactored the deletion logic for the manifest and all its data into a new method; no other changes were made in lines 91-135.

ManifestFile manifestFile, FileIO fileIO, TableIdentifier tableId) {
// if the file doesn't exist, we assume that another task execution was successful, but
// failed to drop the task entity. Log a warning and return success
if (!TaskUtils.exists(manifestFile.path(), fileIO)) {
LOGGER
.atWarn()
.addKeyValue("manifestFile", manifestFile.path())
.addKeyValue("tableId", tableId)
.log("Manifest cleanup task scheduled, but manifest file doesn't exist");
return true;
}

ManifestReader<DataFile> dataFiles = ManifestFiles.read(manifestFile, fileIO);
List<CompletableFuture<Void>> dataFileDeletes =
StreamSupport.stream(
Spliterators.spliteratorUnknownSize(dataFiles.iterator(), Spliterator.IMMUTABLE),
false)
.map(file -> tryDelete(tableId, fileIO, manifestFile, file.path().toString(), null, 1))
.toList();
LOGGER.debug(
"Scheduled {} data files to be deleted from manifest {}",
dataFileDeletes.size(),
manifestFile.path());
try {
// wait for all data files to be deleted, then wait for the manifest itself to be deleted
CompletableFuture.allOf(dataFileDeletes.toArray(CompletableFuture[]::new))
.thenCompose(
(v) -> {
LOGGER
.atInfo()
.addKeyValue("manifestFile", manifestFile.path())
.log("All data files in manifest deleted - deleting manifest");
return tryDelete(tableId, fileIO, manifestFile, manifestFile.path(), null, 1);
})
.get();
return true;
} catch (InterruptedException e) {
LOGGER.error(
"Interrupted exception deleting data files from manifest {}", manifestFile.path(), e);
throw new RuntimeException(e);
} catch (ExecutionException e) {
LOGGER.error("Unable to delete data files from manifest {}", manifestFile.path(), e);
return false;
}
}

private boolean cleanUpMetadataFiles(
List<String> metadataFiles, FileIO fileIO, TableIdentifier tableId) {
List<String> validFiles =
metadataFiles.stream().filter(file -> TaskUtils.exists(file, fileIO)).toList();
if (validFiles.isEmpty()) {
LOGGER
.atWarn()
.addKeyValue("metadataFiles", metadataFiles.toString())
.addKeyValue("tableId", tableId)
.log("Table metadata cleanup task scheduled, but the none of the file in batch exists");
return true;
}
if (validFiles.size() < metadataFiles.size()) {
List<String> missingFiles =
metadataFiles.stream().filter(file -> !TaskUtils.exists(file, fileIO)).toList();
LOGGER
.atWarn()
.addKeyValue("metadataFiles", metadataFiles.toString())
.addKeyValue("missingFiles", missingFiles)
.addKeyValue("tableId", tableId)
.log(
"Table metadata cleanup task scheduled, but {} files in the batch are missing",
missingFiles.size());
}

// Schedule the deletion for each file asynchronously
List<CompletableFuture<Void>> deleteFutures =
validFiles.stream().map(file -> tryDelete(tableId, fileIO, null, file, null, 1)).toList();

try {
// Wait for all delete operations to finish
CompletableFuture<Void> allDeletes =
CompletableFuture.allOf(deleteFutures.toArray(new CompletableFuture[0]));
allDeletes.join();
} catch (Exception e) {
LOGGER.error("Exception detected during metadata file deletion", e);
return false;
}

return true;
}

private static ManifestFile decodeManifestData(String manifestFileData) {
Expand All @@ -134,16 +189,16 @@ private CompletableFuture<Void> tryDelete(
TableIdentifier tableId,
FileIO fileIO,
ManifestFile manifestFile,
String dataFile,
String file,
Throwable e,
int attempt) {
if (e != null && attempt <= MAX_ATTEMPTS) {
LOGGER
.atWarn()
.addKeyValue("dataFile", dataFile)
.addKeyValue("file", file)
.addKeyValue("attempt", attempt)
.addKeyValue("error", e.getMessage())
.log("Error encountered attempting to delete data file");
.log("Error encountered attempting to delete file");
}
if (attempt > MAX_ATTEMPTS && e != null) {
return CompletableFuture.failedFuture(e);
Expand All @@ -155,27 +210,27 @@ private CompletableFuture<Void> tryDelete(
// file's existence, but then it is deleted before we have a chance to
// send the delete request. In such a case, we <i>should</i> retry
// and find
if (TaskUtils.exists(dataFile, fileIO)) {
fileIO.deleteFile(dataFile);
if (TaskUtils.exists(file, fileIO)) {
fileIO.deleteFile(file);
} else {
LOGGER
.atInfo()
.addKeyValue("dataFile", dataFile)
.addKeyValue("manifestFile", manifestFile.path())
.addKeyValue("file", file)
.addKeyValue("manifestFile", manifestFile != null ? manifestFile.path() : "")
.addKeyValue("tableId", tableId)
.log("Manifest cleanup task scheduled, but data file doesn't exist");
.log("table file cleanup task scheduled, but data file doesn't exist");
}
},
executorService)
.exceptionallyComposeAsync(
newEx -> {
LOGGER
.atWarn()
.addKeyValue("dataFile", dataFile)
.addKeyValue("tableIdentifer", tableId)
.addKeyValue("manifestFile", manifestFile.path())
.addKeyValue("dataFile", file)
.addKeyValue("tableIdentifier", tableId)
.addKeyValue("manifestFile", manifestFile != null ? manifestFile.path() : "")
.log("Exception caught deleting data file from manifest", newEx);
return tryDelete(tableId, fileIO, manifestFile, dataFile, newEx, attempt + 1);
return tryDelete(tableId, fileIO, manifestFile, file, newEx, attempt + 1);
},
CompletableFuture.delayedExecutor(
FILE_DELETION_RETRY_MILLIS, TimeUnit.MILLISECONDS, executorService));
Expand All @@ -185,12 +240,18 @@ private CompletableFuture<Void> tryDelete(
public static final class ManifestCleanupTask {
private TableIdentifier tableId;
private String manifestFileData;
private List<String> metadataFiles;

public ManifestCleanupTask(TableIdentifier tableId, String manifestFileData) {
this.tableId = tableId;
this.manifestFileData = manifestFileData;
}

public ManifestCleanupTask(TableIdentifier tableId, List<String> metadataFiles) {
this.tableId = tableId;
this.metadataFiles = metadataFiles;
}

public ManifestCleanupTask() {}

public TableIdentifier getTableId() {
Expand All @@ -209,17 +270,26 @@ public void setManifestFileData(String manifestFileData) {
this.manifestFileData = manifestFileData;
}

public List<String> getMetadataFiles() {
return metadataFiles;
}

public void setMetadataFiles(List<String> metadataFiles) {
this.metadataFiles = metadataFiles;
}

@Override
public boolean equals(Object object) {
if (this == object) return true;
if (!(object instanceof ManifestCleanupTask that)) return false;
return Objects.equals(tableId, that.tableId)
&& Objects.equals(manifestFileData, that.manifestFileData);
&& Objects.equals(manifestFileData, that.manifestFileData)
&& Objects.equals(metadataFiles, that.metadataFiles);
}

@Override
public int hashCode() {
return Objects.hash(tableId, manifestFileData);
return Objects.hash(tableId, manifestFileData, metadataFiles);
}
}
}
Loading