Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Parallel Download of Forms using Multi-Threading and Locks #60

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

prabs3257
Copy link

Closes #7

What has been done to verify that this works as intended?

Tested the code by artificially increasing the download time by adding delays, results show that the code works as intended. Verified that all the form files are being downloaded and saved properly.

Why is this the best possible solution? Were any other approaches considered?

Increases the performance of the app exponentially.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This change does not affect the user

Do we need any specific form for testing your changes? If so, please attach one.

No

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No

Before submitting this PR, please make sure you have:

  • run ./gradlew checkAll and confirmed all checks still pass OR confirm CircleCI build passes and run ./gradlew connectedDebugAndroidTest locally.
  • verified that any code or assets from external sources are properly credited in comments and/or in the about file.
  • verified that any new UI elements use theme colors. UI Components Style guidelines

Comment on lines 143 to 147
try {
installEverything(tempMediaPath, fileResult, parsedFields, formsDirPath, newAttachmentsDetected);
lock.unlock();
} catch (FormDownloadException.DiskError e) {
cleanUp(fileResult, tempMediaPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't released the lock in catch. This could lead to a deadlock if an exception occurs. This also needs to be handled in other catch blocks as well.


index++;
}
executorService.shutdown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executorService may shut down before all the threads have completed work, it is not a blocking method. In such a scenario results might not contain all the forms data.

new Handler(Looper.getMainLooper()).post(new Runnable() {
@Override
public void run() {
results.put(serverFormDetails, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashMaps are not thread safe.

}
String currentFormNumber = String.valueOf(index);
String totalForms = String.valueOf(values[0].size());
publishProgress(serverFormDetails.getFormName(), currentFormNumber, totalForms);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this process is not synchronous now, currentFormNumber will return wrong data when used in a concurrent threaded system.

// download media files if there are any
lock.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've acquired the lock at the beginning of the process and released the lock at the end of the download process. What this essentially does is:

  • A new thread acquires the lock to download form media.
  • The thread downloads the form media by making a web request.
  • The thread releases the lock.

How is different than synchronous processing if the other threads are anyways waiting for the lock to be released while the download process is on?

@@ -94,8 +98,8 @@ private void processOneForm(ServerFormDetails fd, OngoingWorkListener stateListe
// get the xml file
// if we've downloaded a duplicate, this gives us the file
fileResult = downloadXform(fd.getFormName(), fd.getDownloadUrl(), stateListener, tempDir, formsDirPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this method is thread safe? If so, how?

@prabs3257
Copy link
Author

prabs3257 commented Jun 7, 2023

@chinmoy12c These changes should make the media downloads also parallel and should properly shut down the executorservice and I have also released the locks in the catch blocks also. Also used locks in the downloadXForm function so that its thread safe. Is this correct ?

Copy link
Member

@chinmoy12c chinmoy12c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also find a way to test these with a large number of forms. Could you set up such an environment and share the test results?

import org.odk.collect.forms.FormSourceException
import org.odk.collect.forms.FormsRepository
import org.odk.collect.forms.MediaFile
import org.odk.collect.forms.*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove * imports and use specific imports.


class FormMediaDownloader(
private val formsRepository: FormsRepository,
private val formSource: FormSource
) {

private val lock: Lock = ReentrantLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, add a newline.

executorService.shutdown();
// Wait for all tasks to complete
try {
executorService.awaitTermination(1, TimeUnit.MINUTES);
Copy link
Member

@chinmoy12c chinmoy12c Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the basis for choosing 1 minute for timeout? If there are 500 forms, will those be completed in 500 forms?

String totalForms = String.valueOf(values[0].size());
publishProgress(serverFormDetails.getFormName(), currentFormNumber, totalForms);

Thread thread = new Thread() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we limiting resource consumption on this? If there are a large number of forms let's say 1000 forms, this would open 1000 threads which is not exactly resource friendly. We should provide a configurable way to limit the number of threads.

@prabs3257
Copy link
Author

prabs3257 commented Jun 15, 2023

@chinmoy12c @charanpreet-samagra I have set up my central project with 100 forms to simulate a demanding task. The proposed solution completes the download of all the forms in roughly 2 seconds (where download start time = 03:40:23.052 and end time = 03:40:25.000)

When the number of forms was increased in the project to 514. The proposed solution completes the download of all the forms in roughly 8 seconds (where download start time = 15:04:27.172 and end time = 15:04:35.285)

All these forms were simple forms with only one input field and no media attached. The bulk upload was done using ODK Central API and a self-written python script

This is the profiling when there are 514 forms to download
image

@chinmoy12c
Copy link
Member

@chinmoy12c @charanpreet-samagra I have set up my central project with 100 forms to simulate a demanding task. The proposed solution completes the download of all the forms in roughly 2 seconds (where download start time = 03:40:23.052 and end time = 03:40:25.000)

When the number of forms was increased in the project to 514. The proposed solution completes the download of all the forms in roughly 8 seconds (where download start time = 15:04:27.172 and end time = 15:04:35.285)

All these forms were simple forms with only one input field and no media attached. The bulk upload was done using ODK Central API and a self-written python script

This is the profiling when there are 514 forms to download image

This is great @prabs3257. Could you make the final changes that I requested?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement parallel download of forms
2 participants