Allow user to remove broadcast variables when they are no longer used #771

Open

RongGu wants to merge 2 commits into master
Conversation

@RongGu commented Aug 2, 2013

In Spark, users can create broadcast variables to share read-only data across tasks or operations, which is especially useful when the data is large. However, Spark currently does not allow users to remove those variables within a single SparkContext. This becomes a major issue for long-running Shark servers, which use one SparkContext. To address this, this patch allows users to remove broadcast variables once they are no longer needed. To remove a broadcast variable, users only need to call the Broadcast.rm(toClearSource: Boolean) method, and the broadcast variable will be deleted across the slaves. If toClearSource is set to true, the data source (e.g., the file used by HttpServer) will be deleted as well.
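A minimal usage sketch of the proposed API follows; the lookup table and the surrounding job are made up for illustration, and only Broadcast.rm(toClearSource) comes from this patch.

```scala
// Usage sketch only: the lookup table and the job around it are illustrative.
val lookupTable = sc.broadcast(Map("a" -> 1, "b" -> 2))

val counts = sc.parallelize(Seq("a", "b", "c"))
  .map(key => lookupTable.value.getOrElse(key, 0))
  .collect()

// Once the broadcast variable is no longer needed, remove it from the slaves.
// Passing true also deletes the data source (e.g. the file served by the HttpServer).
lookupTable.rm(toClearSource = true)
```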

@AmplabJenkins

Thank you for your pull request. An admin will review this request soon.

@@ -46,6 +47,21 @@ extends Broadcast[T](id) with Logging with Serializable {
if (!isLocal) {
sendBroadcast()
}

override def rm(toClearSource: Boolean = false) {
Member

Can we rename this function to remove, and toClearSource to releaseSource?

2. Add a parameter to determine whether block managers report the broadcast block to the master or not.
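As a rough illustration of what such a flag could look like when the broadcast block is stored, here is a sketch only: the method body below is assumed, not quoted from the patch, and the exact BlockManager call signature may differ in this Spark version.

```scala
// Illustrative sketch only, not the actual diff of this pull request.
// It assumes the block manager's put call takes a trailing "tell master" flag,
// so that when reportToMaster is false the broadcast block is stored locally
// without being registered with the block manager master.
private def sendBroadcast(reportToMaster: Boolean) {
  val blockId = "broadcast_" + id
  SparkEnv.get.blockManager.putSingle(
    blockId, value_, StorageLevel.MEMORY_AND_DISK, reportToMaster)  // last arg: tellMaster
}
```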
@AmplabJenkins

Thank you for your pull request. An admin will review this request soon.

@jerryshao (Contributor)

Hi @RongGu, AFAIK Spark already has a time-based automatic cleanup mechanism in HttpBroadcast when spark.cleaner.ttl is enabled; this can mostly clean up the JobConf in HadoopRDD. But this mechanism has an issue with Spark Streaming (https://spark-project.atlassian.net/browse/STREAMING-38?jql=project%20%3D%20STREAMING), so it would be a great help to clean broadcast variables automatically with a memory-tracking approach rather than the time-based one.
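For context, the existing time-based cleaner mentioned above is driven by the spark.cleaner.ttl property, set before the SparkContext is created. A minimal sketch; the master URL, app name, and TTL value are illustrative:

```scala
// Sketch of enabling the existing time-based cleaner; the values are illustrative.
System.setProperty("spark.cleaner.ttl", "3600")  // purge metadata older than 3600 seconds
val sc = new SparkContext("local[2]", "ttl-example")
```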

@RongGu (Author) commented Aug 6, 2013

Hi @jerryshao, thanks for your comment. It would be nice to have an automatic memory cleaner for broadcast variables. Nevertheless, the purpose of this patch is to provide a broadcast-removal API to users, and the two do not conflict in essence. For memory cleanup, the lesson I have learned is that no program-monitoring mechanism beats letting users clear memory explicitly when that is possible: GC is not always timely and has overhead, and in this case it is hard to determine automatically whether a broadcast variable will still be used, so a TTL may lead to errors, as the Spark Streaming issue shows. On the other hand, leaving large unused broadcast variables in memory is a real problem, and users currently have no way to handle it. Therefore, we provide an explicit broadcast-removal method to users.
