
Mesos Elasticsearch framework eating up all the offers. #575

Open
rsingh2411 opened this issue Jul 19, 2016 · 9 comments

@rsingh2411

Hi All,

We are using the Mesos Elasticsearch framework to bring up Elasticsearch Docker containers, with Flocker alongside Docker (OpenStack is the underlying infrastructure). We have observed that sometimes the executors come up quickly and everything runs fine. However, when there is network latency or an OpenStack issue, the executors take a long time to come up, or they don't come up at all and the task remains stuck in the staging state.

The issue is that during this time all offers are hogged by the scheduler and are not processed. When I suspend the framework from Marathon, all the offers are released. Is this the intended behavior? We have other frameworks and tasks running in parallel and they are starving. Why isn't the decline-offer mechanism working?

Is it because the scheduler thread cannot process further offers until task execution completes (driver.launchTasks is called from the resourceOffers method in ElasticsearchScheduler.java)?

Is there any workaround? I have seen that Marathon implemented a filter in its declineOffer calls together with reviveOffers calls. Is the same fix required here?
Refer to mesosphere/marathon#1931.
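For reference, here is a minimal sketch of the Marathon-style mitigation referred to above, assuming only the standard Mesos Java API (`SchedulerDriver.declineOffer(offerId, filters)` and `reviveOffers()`). It is not something the ES framework currently does, and the 120-second value is purely illustrative:

```java
import java.util.List;
import org.apache.mesos.Protos;
import org.apache.mesos.SchedulerDriver;

// Sketch only: decline offers with a long refuse_seconds filter so the master stops
// re-offering the same resources for a while, then revive offers when the framework
// actually needs resources again.
class OfferBackoffSketch {
    static void declineWithBackoff(SchedulerDriver driver, List<Protos.Offer> offers) {
        Protos.Filters backOff = Protos.Filters.newBuilder()
                .setRefuseSeconds(120)              // suppress re-offers for ~2 minutes
                .build();
        for (Protos.Offer offer : offers) {
            driver.declineOffer(offer.getId(), backOff);
        }
    }

    static void onNeedResourcesAgain(SchedulerDriver driver) {
        driver.reviveOffers();                      // clears previously set filters
    }
}
```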

rsingh2411 changed the title from "Mesos Elasticsearch framework using all the offers." to "Mesos Elasticsearch framework eating up all the offers." on Jul 19, 2016
@philwinder
Contributor

This is one (of many) aspects of Mesos that I don't like. In the Offers tab in the UI, you would intuitively think it represents the offers in use. It doesn't. It represents offers to frameworks that have not yet been accepted or declined, i.e. they are in limbo.

You have a concoction of infrastructure and applications, so I'm unable to give you any specific help. But I can point you in the right direction.

I can't comment on the executor start speed. That is more likely to be an infra/setup problem. But without evidence, I can't be much more help here. Keep digging.

The offer hogging should definitely not happen. The ES framework should only ever accept a certain number of offers then start staging. During the staging, it should not accept any more offers because as far as the framework is concerned, all tasks are now isolated.

I have never seen this before, so I'm reasonably confident that it's not the fault of the framework. Again, keep digging and try to provide evidence.

And I'm sure that this workaround isn't any help here. The ES framework simply declines offers when it receives them; it never attempts to revive offers. You will see that Mesos re-offers resources roughly every 2 seconds, and ES should simply decline them each time. Check the logs.

Sorry I can't be of more help at this point.

@rsingh2411
Author

rsingh2411 commented Jul 21, 2016

Hi Phil,

I agree with most of the points, but here is the scenario we faced (number of Elasticsearch nodes = 1):

  1. The first Elasticsearch node was not responding due to a Flocker issue.
  2. The Offers UI tab showed that almost all resources were offered to the Elasticsearch scheduler.
  3. In the logs, the scheduler was not declining offers frequently, which means it was processing the offers slowly.

Question: why is the master offering all the resources to the Elasticsearch scheduler?
Answer: Mesos repeatedly re-offers resources to the scheduler, roughly every 2 seconds.
I also found out that offers made to a framework do not time out on their own unless --offer_timeout is specified in the master configuration (see the example below).
As a result, the master keeps offering almost all resources to Elasticsearch, and the other frameworks are starved or receive offers much less frequently.
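A hedged example of that master flag follows; the 30-second value and the other flags are illustrative placeholders, not a recommended configuration:

```sh
# Rescind offers that a framework has neither accepted nor declined within 30 seconds.
mesos-master --offer_timeout=30secs \
             --work_dir=/var/lib/mesos \
             --quorum=1 \
             --zk=zk://10.0.0.1:2181/mesos
```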

Regarding point 3: whenever the Elasticsearch node is up and running, the decline-offer log lines appear frequently and in large numbers.
But when the first Elasticsearch node is not up and running, the decline-offer log lines become much less frequent.

I may be wrong, but is the isAtLeastOneESNodeRunning() method in OfferStrategy taking longer to respond? I checked the Unirest documentation: the default socket timeout is 60 seconds and the connection timeout is 10 seconds. So every offer decline is taking at least 10 seconds, maybe more.

If the node is up, the response time will be at most two seconds. What do you think?
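To illustrate the timing concern, here is a minimal sketch assuming the health check goes through unirest-java 1.x, whose documented defaults are a 10-second connection timeout and a 60-second socket timeout; the tighter values below are illustrative only, not a maintainer recommendation:

```java
import com.mashape.unirest.http.Unirest;

// With the library defaults (10s connect / 60s socket), a single unreachable ES node can
// block the offer-handling path for tens of seconds per offer. One possible mitigation is
// to tighten the timeouts before any requests are made.
public class UnirestTimeoutSketch {
    public static void main(String[] args) {
        Unirest.setTimeouts(2000, 5000); // connectionTimeout = 2s, socketTimeout = 5s
    }
}
```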

@philwinder
Contributor

Hi @rsingh2411, thanks for the information.

So obviously there should not be unused offers in the offers tab in the UI.

You may be right about the isAtleastOneNode... idea, but I have no way of checking because it seems that the issues are caused by your setup. If you correct your setup and all the nodes are correctly routable, then I suspect these issues will disappear.

@rsingh2411
Author

Hi Phil,

One last naive question: is the resourceOffers method in ElasticsearchScheduler serial? I mean, if Mesos sends several batches of offers simultaneously, say batch1, batch2, batch3, do all three batches execute in parallel or serially?
I think it is serial.

@philwinder
Contributor

I'm not completely sure about this. I thought that Mesos bundled all offers from different slaves into one batch. So we iterate over the list of offers (serially), but I didn't think there would be more than one group of offers.
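For clarity, a simplified sketch of the callback shape being discussed (this is not the actual ElasticsearchScheduler code; shouldAccept() and tasksFor() are hypothetical helpers): the driver delivers a batch of offers as a single list and invokes the callback from one driver thread, so the offers are processed one at a time.

```java
import java.util.Collections;
import java.util.List;
import org.apache.mesos.Protos;
import org.apache.mesos.SchedulerDriver;

class SerialOfferHandlingSketch {
    void resourceOffers(SchedulerDriver driver, List<Protos.Offer> offers) {
        for (Protos.Offer offer : offers) {                       // serial iteration over the batch
            if (shouldAccept(offer)) {
                driver.launchTasks(Collections.singletonList(offer.getId()), tasksFor(offer));
            } else {
                driver.declineOffer(offer.getId());
            }
        }
    }

    // Placeholders so the sketch compiles; the real framework's decision logic differs.
    private boolean shouldAccept(Protos.Offer offer) { return false; }
    private List<Protos.TaskInfo> tasksFor(Protos.Offer offer) { return Collections.emptyList(); }
}
```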

@SlevinBE
Contributor

SlevinBE commented Jan 4, 2017

I think I found the cause of this issue. I also encountered something similar: a running Elasticsearch cluster receiving resource offers and declining them, yet the result was an accepted offer.

Elasticsearch logging indicating it is declining the offer:

[DEBUG] 2017-01-04 12:35:20,640 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "293c719d-9e02-4eae-8c03-736e7a8a4a21-O3275459" }, framework_id { value: "293c719d-9e02-4eae-8c03-736e7a8a4a21-0020" }, slave_id { value: "293c719d-9e02-4eae-8c03-736e7a8a4a21-S2" }, hostname: "10.0.3.188", resources { ......}, Reason: Cluster size already fulfilled

but the mesos logs are showing the following:

I0104 11:27:01.671387  1892 master.cpp:5728] Sending 3 offers to framework 293c719d-9e02-4eae-8c03-736e7a8a4a21-0020 (elasticsearch) at [email protected]:39583
I0104 11:27:01.695638  1888 master.cpp:3342] Processing ACCEPT call for offers: [ 293c719d-9e02-4eae-8c03-736e7a8a4a21-O3267448 ] on agent 293c719d-9e02-4eae-8c03-736e7a8a4a21-S3 at slave(1)@10.0.3.190:5051 (10.0.3.190) for framework 293c719d-9e02-4eae-8c03-736e7a8a4a21-0020 (elasticsearch) at [email protected]:39583
I0104 11:27:01.695823  1888 master.cpp:3342] Processing ACCEPT call for offers: [ 293c719d-9e02-4eae-8c03-736e7a8a4a21-O3267449 ] on agent 293c719d-9e02-4eae-8c03-736e7a8a4a21-S4 at slave(1)@10.0.0.233:5051 (10.0.0.233) for framework 293c719d-9e02-4eae-8c03-736e7a8a4a21-0020 (elasticsearch) at [email protected]:39583
I0104 11:27:01.695963  1888 master.cpp:3342] Processing ACCEPT call for offers: [ 293c719d-9e02-4eae-8c03-736e7a8a4a21-O3267450 ] on agent 293c719d-9e02-4eae-8c03-736e7a8a4a21-S0 at slave(1)@10.0.3.189:5051 (10.0.3.189) for framework 293c719d-9e02-4eae-8c03-736e7a8a4a21-0020 (elasticsearch) at [email protected]:39583

I found the following Mesos issue describing a declineOffer resulting in an acceptOffer, which was fixed in Mesos 0.26.0:
https://issues.apache.org/jira/browse/MESOS-3522

However, this elasticsearch project is still using Mesos version 0.25.0:
https://github.com/mesos/elasticsearch/blob/master/build.gradle#L27

So an upgrade to at least Mesos 0.26.0 should resolve this issue.

@philwinder
Contributor

Hi @SlevinBE
Thanks for that thorough debugging.

That line you linked refers to the API version used by the code, not the version it is compatible with.

AFAIK you should be able to use it with any Mesos version (greater than 0.25, possibly even below). It depends on whether they change the API or not.

Give it a try and let me know. Thanks.

@SlevinBE
Contributor

SlevinBE commented Jan 4, 2017

Oh indeed, you're right; it's only the API being used in this project, not the implementation.

@SlevinBE
Contributor

SlevinBE commented Jan 5, 2017

I think I found it now. We use the 'mesos/elasticsearch-scheduler:1.0.1' Docker image to deploy Elasticsearch on DC/OS, which is based on the following common Dockerfile (correct me if I'm wrong):
https://github.com/mesos/elasticsearch/blob/master/commons/Dockerfile#L12

However, at the indicated line it installs Mesos 0.25.0 on the image, which still contains the previously mentioned bug. This native code is what the Mesos Java API calls via JNI, if I'm right.
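If that is indeed the culprit, the kind of change implied would look roughly like the sketch below; the apt package/version string is an assumption, so check what the Mesosphere repository actually provides (e.g. with `apt-cache madison mesos`) before using it:

```dockerfile
# Sketch only: install a Mesos native library >= 0.26.0 instead of 0.25.0 in the scheduler image.
RUN apt-get update && \
    apt-get install -y mesos=0.26.0*
```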
