Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue-967 #969

Closed
wants to merge 15 commits into from
Closed

Issue-967 #969

wants to merge 15 commits into from

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Nov 14, 2017

Initial implementation for getsAsync()
Created the following classes:
AsyncReader
TransactionImplTest

Created the following classes:
  AsyncReader
  TransactionImplTest
@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 14, 2017

This is my first pass at implementing getsAsync, though I haven't had a good chance to look it over and test properly. I started to try and test it but was having trouble creating a transaction to test with. I'm going to give it another go tomorrow.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 14, 2017

Also need to implement close() still

import org.apache.fluo.core.impl.TransactionImpl;

public class AsyncReader {
private PriorityQueue<AsyncGet> getsQueue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using a BlockingQueue and not a PriorityQueue. No need to sort request.


public AsyncReader(TransactionImpl tx) {
this.tx = tx;
getsQueue = new PriorityQueue<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could create a LinkedBlockingQueue

AsyncGet curAsyncGet = new AsyncGet(row, column, defaultValue);
getsQueue.add(curAsyncGet);
executorService.submit(() -> {
int i = 0;
Copy link
Contributor

@keith-turner keith-turner Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the face of concurrency this lambda is not correct. I would recommend something like the following and I will try to explain why.

  //use arraylist instead of hashset because it has better performance and no need for unique
  List <AsyncGet> gets = new ArrayList();  

  asyncGets.drainTo(gets);

  //not sure if I have the syntax below correct, but I think Guava has this method
  Collection<RowColumn> rowColumns = Collections2.transform(gets, ag -> ag.rc); 
  Map<RowColumn, Bytes> getsMap = tx.get(rowColumns);

  //process everything we drained off the queue... do not want to touch the queue as new things may have been added.
  for(AsyncGet asynGet : gets) {
    Bytes result = getsMap.get(asyncGet.rc);
    asyncGet.res.complete(result);  //TODO handle default value
  }

With the current code if the execservice had multiple threads then multiple threads could process the same AsyncGet request. Using drainTo with LinkedBlockingQueue avoids this. This is not true for all collections, but this particular collection impl gets a lock in drainTo.

Map<RowColumn, Bytes> getsMap = tx.get(rowColumns);

while (!getsQueue.isEmpty()) {
AsyncGet asyncGet = getsQueue.poll();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, should not access queue here. Consider the case where things were added to the queue after get was called above. For those cases null or defaultValue may be incorrectly returned.

import org.junit.Assert;
import org.junit.Test;

public class TransactionImplTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be in a different place. Fluo has Integration test (called ITs) that run against a real Fluo instance (backed by real Accumulo and Zookeeper instances). These test are at fluo/modules/integration/src/test/java/org/apache/fluo/integration. I would recommend putting your test there under client or impl and extending ITBaseImpl. After that each test should have a FluoClient called client available. So should be able to do the following.

@Test
 public void testBasic() {
     try (Transaction tx = client.newTransaction()) {
          tx.set("row1", new Column("col1"), "val1");
          tx.set("row2", new Column("col2"), "val2");
          tx.set("row3", new Column("col3"), "val3");
 
          tx.commit();
     }

     try (Transaction tx = client.newTransaction()) {
         //do async gets....
     }
     
}

Changed getsAsync tasks to drain the AsyncGet Queue.
Deleted TransactionImplTest and created TransactionImplIT.
Assert.assertEquals("val2", tx.getsAsync("row2", new Column("col2"), "foo").get());
Assert.assertEquals("val3", tx.getsAsync("row3", new Column("col3")).get());
Assert.assertEquals("val4", tx.getsAsync("row4", new Column("col4"), "val4").get());
} catch (Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can make the test method throw an exception and drop this catch.

Assert.assertEquals("val4", tx.getsAsync("row4", new Column("col4"), "val4").get());
} catch (Exception e) {
Assert.fail("getsAsync caused the following exception: " + e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to also test the following :

 try (Transaction tx = client.newTransaction()) {
  Future<Bytes> val1 = tx.getsAsync("row1", new Column("col1"));
  Future<Bytes> val2 = tx.getsAsync("row2", new Column("col2"), "foo");
  Future<Bytes> val3 = tx.getsAsync("row3", new Column("col3"));
  Future<Bytes> val4 = tx.getsAsync("row4", new Column("col4"), "val4");
 
  //TODO check values
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkosh44 did you see the comment above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, does the testgetAsync() method not cover these cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, the test immediately calls get() on the Future and get() blocks so its never doing more than one async operation at time. I modified one of the test to do the following and ran it locally, I put in the timing info to see make sure the getAsync calls w/o the get() were fast.

    try (Transaction tx = client.newTransaction()) {
      long t1 = System.currentTimeMillis();
      Assert.assertEquals("val1", tx.getsAsync("row1", new Column("col1")).get());
      Assert.assertEquals("val2", tx.getsAsync("row2", new Column("col2"), "foo").get());
      Assert.assertEquals("val3", tx.getsAsync("row3", new Column("col3")).get());
      Assert.assertEquals("val4", tx.getsAsync("row4", new Column("col4"), "val4").get());
      long t2 = System.currentTimeMillis();
      System.out.println(t2-t1);
    }

    try (Transaction tx = client.newTransaction()) {
      long t1 = System.currentTimeMillis();
      CompletableFuture<String> val1 = tx.getsAsync("row1", new Column("col1"));
      CompletableFuture<String> val2 = tx.getsAsync("row2", new Column("col2"), "foo");
      CompletableFuture<String> val3 = tx.getsAsync("row3", new Column("col3"));
      CompletableFuture<String> val4 = tx.getsAsync("row4", new Column("col4"), "val4");

      long t2 = System.currentTimeMillis();

      Assert.assertEquals("val1", val1.get());
      Assert.assertEquals("val2", val2.get());
      Assert.assertEquals("val3", val3.get());
      Assert.assertEquals("val4", val4.get());
      long t3 = System.currentTimeMillis();

      System.out.println((t2-t1) + " " + (t3- t2));
    }

@@ -165,4 +167,14 @@
* @return transactions start timestamp allocated from Oracle.
*/
long getStartTimestamp();

default CompletableFuture<String> getsAsync(String row, Column column) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need Bytes getAsync(Bytes row, Column column) and Bytes getAsync(Bytes row, Column column) methods also. The string methods could just call the bytes methods.


asyncGetsQueue.drainTo(getsList);

Collection<RowColumn> rowColumns = Collections2.transform(getsList, ag -> ag.rc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its possible this will throw an exception. If it does, need to complete all of the futures exceptionally.

…nd added unit tests.

Added an exception catch during async methods to complete exceptionally
@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 18, 2017

Just a note, I noticed all of the get methods with Byte were called "get(...)" while all the get methods with Strings were called "gets(...)". I kept this consistent with the async methods.

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkosh44 this is looking good. I am thinking of holding this PR until after 1.2.0 is released. The reason I would like to do that is because there are some other places in the API where it would be nice to have async functionality, like LoaderExecutor and the commit method.

I would like all of the async changes to be considered together, and not done across releases. If we release this in 1.2.0 and then discover we would like to change this API while workiing on other async functionality then it can not be changed w/o deprecation.

Also I would like to do some performance testing of this change. When do that, I will post the results here.

executorService.shutdown();
}

class AsyncGet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make this static. Since it is not static it has an extra hidden reference to AsyncReader.this which is not used and makes the objects slightly larger.


public CompletableFuture<String> gets(String row, Column column, String defaultValue) {
Bytes defaultValueBytes = defaultValue == null ? new Bytes() : Bytes.of(defaultValue);
return get(Bytes.of(row), column, defaultValueBytes).thenApply(b -> b.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like the use of thenApply here.

}
} catch (Exception e) {
for (AsyncGet asyncGet : getsList) {
asyncGet.res.completeExceptionally(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an exception were to occur in the for loop inside the try, then it is possible that some of the futures are already completed. That case seems unexpected, not sure what if anything we should do about that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about removing each AsyncGet object from getsList as we complete it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at the javadoc and I don't think anything needs to be done. The javadoc for completeExceptionally says the following

If not already completed, causes invocations of get() and related methods to throw the given exception.

So it seems if its already completed, then nothing will happen.


@Override
public CompletableFuture<String> getsAsync(String row, Column column) {
if (asyncReader == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could shorten these method with a method like the following

private AsyncReader getAsyncReader() {
  if (asyncReader == null) {
    asyncReader = new AsyncReader(this);
  }
}

Then could do the following below

  return getAsyncReader().gets(row, column);

}

default CompletableFuture<String> getsAsync(String row, Column column, String defaultValue) {
return CompletableFuture.supplyAsync(() -> gets(row, column, defaultValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see my comment on the issue about supplyAsync? These request will be put on a JVM wide thread pool and these request do I/O. So they may block other unrelated work in the JVM, but not completely sure. I will try to look into this some more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see but I thought it was OK for the default method to be blocking as long as the implementations weren't blocking. I can remove the default implementation all together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The supplyAsync method will run the task on the ForkJoin common pool which is a JVM wide pool. I was thinking the default method should just do something like the following to avoid this.

  return CompletableFuture.completedFuture(gets(row, column, defaultValue));

@keith-turner
Copy link
Contributor

@jkosh44 I noticed your branch is based off an older commit. I usually add a git remote called upstream like the following.

git remote add upstream https://github.com/apache/fluo.git

Then with that I added I update my local master with the latest commits from upstream before branching for a new issue.

# update all remote tracking branches
git fetch --all
git checkout master
# bring local master branch up to date with latest commits
git merge upstream/master
# create new branch from latest master
git checkout -b issue-xyz

Don't worry about this PR being based on an older commit, its ok.

…emoved from the list so it is not later completed excpetionally in the case of an exception
@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 21, 2017

Is it too late to follow the git upstream steps for this issue?

@keith-turner
Copy link
Contributor

Is it too late to follow the git upstream steps for this issue?

No, you can try. If you are new to git I would recommend experimenting in another branch. Try the following just to see what happens. Does the rebase command have conflicts?

git checkout issue-967
# create a new branch to test rebase
git checkout -b issue-967-rebase-test
git fetch --all
git rebase upstream/master

for (AsyncGet asyncGet : getsList) {
Bytes result = getsMap.get(asyncGet.rc);
asyncGet.res.complete(result == null ? defaultValue : result);
getsList.remove(asyncGet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following comments are moot because I don't think things needs to be removed based on earlier comment. Just sharing some knowledge in following comments.

This could cause a concurrent modification exception. The foreach has a hidden iterator. In java, many iterators will throw a concurrent modification exception if the collection is modified during iteration. Using a while loop with an explicit iterator and calling remove on the iterator is the best way to remove while iterating.

Also removing items from an arraylist could be an O(N^2) operation. In the situation of removing from the middle of a list a linked list may be better.


for (AsyncGet asyncGet : getsList) {
Bytes result = getsMap.get(asyncGet.rc);
asyncGet.res.complete(result == null ? defaultValue : result);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be asyncGet.defaultValue

Corrected errors involved with defaultValue
No longer remove completed Futures from AsyncGet list
@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 22, 2017

My latest commit should address all the issues mentioned.
I changed to asyncGet.defaultValue.
I no longer remove AsyncGets from getsList after completing them.
I changed the default method to your suggestion.
The rebase command had a conflict at TransactionImpl.java because of one of the imports, I fixed the conflict and then the rebase continued and completed fine

@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 22, 2017

@keith-turner Also I never commented on the 1.2.0 release. That sounds good to me. If you need someone to work on other Async functionality let me know and I'd be happy to help.

Created the following classes:
  AsyncReader
  TransactionImplTest
Changed getsAsync tasks to drain the AsyncGet Queue.
Deleted TransactionImplTest and created TransactionImplIT.
…nd added unit tests.

Added an exception catch during async methods to complete exceptionally
…emoved from the list so it is not later completed excpetionally in the case of an exception
Corrected errors involved with defaultValue
No longer remove completed Futures from AsyncGet list
…e-967

Conflicts:
	modules/core/src/main/java/org/apache/fluo/core/impl/TransactionImpl.java
@keith-turner
Copy link
Contributor

@jkosh44 these changes are looking good. I am still puzzling out the async story for Fluo. Currently there are a few areas I have identified and opened issue about.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Nov 28, 2017

I left a comment on #722 and will give it a go. On an unrelated note, I feel like all my commits fail on the continuous-integration Travis CI build. Is there a specific reason? I'm not really sure what I'm doing to cause this.

@keith-turner
Copy link
Contributor

On an unrelated note, I feel like all my commits fail on the continuous-integration Travis CI build. Is there a specific reason?

The builds on travis are not always predictable. Sometimes this is an issue with Travis and sometimes its Fluo. I have been trying to correct Fluo's flakiness as we run into it. For your latest commit, the new IT is missing an import stmt for CompletableFuture. Lets see what happens with Travis after that is fixed.

https://travis-ci.org/apache/fluo/builds/307577351#L2882

@jkosh44 jkosh44 changed the title [WIP] Issue-967 Issue-967 Jan 22, 2018
@keith-turner
Copy link
Contributor

I squashed and rebased this and added a few comments in #1099. Looking at this, it would be nice to have in the next release.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jul 1, 2020

Just reviewed it and compared it to this one and it looks good to me

@keith-turner
Copy link
Contributor

This work was merged in #1099

@ctubbsii ctubbsii added this to the 2.0.0 milestone Jul 1, 2020
@ctubbsii ctubbsii linked an issue Jul 1, 2020 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prototype adding async get methods to Transaction
3 participants