Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator #43502

Closed
wants to merge 37 commits into from

Conversation

zhaomin1423
Copy link
Member

@zhaomin1423 zhaomin1423 commented Oct 24, 2023

What changes were proposed in this pull request?

use java.lang.ref.Cleaner instead of finalize() for RocksDBIterator

Why are the changes needed?

The finalize() method has been marked as deprecated since Java 9 and will be removed in the future, java.lang.ref.Cleaner is the more recommended solution.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass actions and new tests

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Oct 24, 2023
@zhaomin1423
Copy link
Member Author

cc @LuciferYang

@LuciferYang
Copy link
Contributor

also cc @mridulm @beliefer @srowen @dongjoon-hyun

@LuciferYang
Copy link
Contributor

LuciferYang commented Oct 24, 2023

Although the current pr only fixes part of RocksDB now, I suggest adding the LevelDB part in this PR once the RocksDB part is deemed feasible. I prefer to complete them in one.

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we finish the tests?

@zhaomin1423
Copy link
Member Author

Although the current pr only fixes part of RocksDB now, I suggest adding the LevelDB part in this PR once the RocksDB part is deemed feasible. I prefer to complete them in one.

ok, will add LevelDB part later.

@zhaomin1423
Copy link
Member Author

How do we finish the tests?

I haven’t thought of a good testing method yet, do you have any suggestions?

@LuciferYang
Copy link
Contributor

LuciferYang commented Oct 24, 2023

@zhaomin1423

Firstly, we may need to add two helper methods:

  1. Add a method in RocksDB that can get its corresponding WeakReference through RocksDBIterator, here I assume it's called iteratorReference
  2. Add a method in RocksDBIterator to get RocksIterator, here I assume it's called internalIterator

Then, the test case may like the following:

@Test
public void testResourceCleaner() throws Exception {
  File dbPathForCleanerTest = File.createTempFile(
    "test_db_cleaner.", ".rdb");
  dbPathForCleanerTest.delete();

  RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
  for (int i = 0; i < 8192; i++) {
    dbForCleanerTest.write(createCustomType1(i));
  }

  RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
  Optional<Reference<RocksDBIterator<?>>> referenceOpt = dbForCleanerTest.iteratorReference(rocksDBIterator);
  assertTrue(referenceOpt.isPresent());
  RocksIterator it = rocksDBIterator.internalIterator();
  assertTrue(it.isOwningHandle()); // it has not been closed yet, isOwningHandle should be true.
  rocksDBIterator = null; // Manually set rocksDBIterator to null, to be GC.
  Reference<RocksDBIterator<?>> ref = referenceOpt.get();
  // 100 times gc, the rocksDBIterator should be GCed.
  int count = 0;
  while (count < 100 && !ref.refersTo(null)) {
    System.gc();
    count++;
    Thread.sleep(100);
  }
  assertTrue(ref.refersTo(null)); // check rocksDBIterator should be GCed
  // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
  assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));

  dbForCleanerTest.close();
}

Of course, we can also add a state in ResourceCleaner to indicate whether it has been executed, initially set to false, and turns true after execution. These are just some of my thoughts, there might be better ways to test.

@zhaomin1423
Copy link
Member Author

@zhaomin1423

Firstly, we may need to add two helper methods:

  1. Add a method in RocksDB that can get its corresponding WeakReference through RocksDBIterator, here I assume it's called iteratorReference
  2. Add a method in RocksDBIterator to get RocksIterator, here I assume it's called internalIterator

Then, the test case may like the following:

@Test
public void testResourceCleaner() throws Exception {
  File dbPathForCleanerTest = File.createTempFile(
    "test_db_cleaner.", ".rdb");
  dbPathForCleanerTest.delete();

  RocksDB dbForCleanerTest = new RocksDB(dbPathForCleanerTest);
  for (int i = 0; i < 8192; i++) {
    dbForCleanerTest.write(createCustomType1(i));
  }

  RocksDBIterator<CustomType1> rocksDBIterator = (RocksDBIterator<CustomType1>) dbForCleanerTest.view(CustomType1.class).iterator();
  Optional<Reference<RocksDBIterator<?>>> referenceOpt = dbForCleanerTest.iteratorReference(rocksDBIterator);
  assertTrue(referenceOpt.isPresent());
  RocksIterator it = rocksDBIterator.internalIterator();
  assertTrue(it.isOwningHandle()); // it has not been closed yet, isOwningHandle should be true.
  rocksDBIterator = null; // Manually set rocksDBIterator to null, to be GC.
  Reference<RocksDBIterator<?>> ref = referenceOpt.get();
  // 100 times gc, the rocksDBIterator should be GCed.
  int count = 0;
  while (count < 100 && !ref.refersTo(null)) {
    System.gc();
    count++;
    Thread.sleep(100);
  }
  assertTrue(ref.refersTo(null)); // check rocksDBIterator should be GCed
  // Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
  assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));

  dbForCleanerTest.close();
}

Of course, we can also add a state in ResourceCleaner to indicate whether it has been executed, initially set to false, and turns true after execution. These are just some of my thoughts, there might be better ways to test.

Adding a state is simpler and works for different classes, but I don't know if adding variables for testing is a common approach.

@LuciferYang
Copy link
Contributor

Let's add a test case first.

@LuciferYang LuciferYang changed the title [SPARK-45533][CORE]Use j.l.r.Cleaner instead of finalize for RocksDBIterator [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator Oct 25, 2023
// Verify that the Cleaner will be executed after a period of time, and it.isOwningHandle() will become false.
assertTimeout(java.time.Duration.ofSeconds(5), () -> assertFalse(it.isOwningHandle()));

dbForCleanerTest.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move dbForCleanerTest.close() to finally block and t would be better to add FileUtils.deleteQuietly(dbPathForCleanerTest); after close in finally block

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's fine as long as we can ensure there are no residues after testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, if remove it, there is an exception.

org.rocksdb.RocksDBException: `/Users/zhaomin/Files/project/spark/common/kvstore/target/tmp/test_db_cleaner.6329921973463604622.rdb' exists but is not a directory

	at org.rocksdb.RocksDB.open(Native Method)
	at org.rocksdb.RocksDB.open(RocksDB.java:249)
	at org.apache.spark.util.kvstore.RocksDB.<init>(RocksDB.java:122)
	at org.apache.spark.util.kvstore.RocksDB.<init>(RocksDB.java:116)

return iteratorTracker;
}

@VisibleForTesting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. this method can be moved to RocksDBSuite.java as a private method due to RocksDB already has getIteratorTracker(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@LuciferYang
Copy link
Contributor

friendly ping @dongjoon-hyun @mridulm @beliefer Could you please take another look at this pr if you have time? Thanks ~

@zhaomin1423 zhaomin1423 changed the title [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator Oct 26, 2023
@LuciferYang
Copy link
Contributor

Merged into master for Spark 4.0.
Thanks @zhaomin1423 ! Your work is greatly appreciated.
Thank you for your review! @mridulm @beliefer

@zhaomin1423
Copy link
Member Author

Merged into master for Spark 4.0. Thanks @zhaomin1423 ! Your work is greatly appreciated. Thank you for your review! @mridulm @beliefer

Thank you very much for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants