
fix #1597 #1623

Open

andyxning wants to merge 8 commits into master

Conversation

andyxning

This fixes the problem reported in #1597 and in WebUI #128.

@naparuba
Contributor

Thanks, I'll take a look and see if we've got what we need for the regenerator tests, so we can add a failing one and apply this fix :)

@gst
Contributor

gst commented May 11, 2015

Otherwise, very good catch & job already :)

@andyxning
Author

@gst, I have fixed the problem you pointed out. You can have a look at the code, and if something is still wrong, please let me know.

@gst
Contributor

gst commented May 12, 2015

But first I'm going to try to reproduce the problem on my side... though I'm not completely sure that I have an exact/good understanding of it... we'll see :)

gst added a commit to shinken-monitoring/mod-livestatus that referenced this pull request May 12, 2015
With this the Livestatus was simply not usable at all within a scheduler :|

Seen while investigating on shinken-solutions/shinken#1623
@gst
Contributor

gst commented May 12, 2015

Damn...

The function modified here (load_from_scheduler) is only called, as far as I can see, by/within a Shinken scheduler daemon (when it has LiveStatus or WebUI enabled). But the problem is that it could never actually have been called, because there is/was a bug (the one I've just referenced: shinken-monitoring/mod-livestatus#64) within Livestatus preventing that function from ever being called... :|

Oh boy... have to untangle this kind of spaghetti situation... again :s

@mohierf
Contributor

mohierf commented May 12, 2015

+1 for you @andyxning

@andyxning
Author

@gst, I have removed all the setattr calls in favor of the instance.attribute = value pattern.

@gst
Contributor

gst commented May 13, 2015

Great @andyxning (that already looks slightly better, doesn't it? Hold on, there is something better still, see the bottom... ;))

But as I explained in my previous comment: I'm rather unsure that the proposed change will fix the memory consumption problem in/with the broker daemon, simply because the code/function involved here is only called (as far as I can see) if LiveStatus or WebUI are used in the scheduler, not in the broker...

Now, maybe this would or could also fix some similar things in the scheduler case... so this is to be confirmed... ;)

Now, back to this commit and this specific part within the function:

         self.notificationways = c.notificationways
         for notificationway in c.notificationways:
             self.notificationways[notificationway.id].regenerator_refs = 0 

This is completely equivalent to:

         self.notificationways = c.notificationways
         for nw in c.notificationways:
             nw.regenerator_refs = 0 

which is, I hope you agree ;), now clearly better than the previous, already slightly better, form... ;)
(The same can be done for the contacts just after, and also in manage_initial_contact_status_brok, with both lines changed; see the sketch below.)
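
Concretely, a sketch of what that could look like for the contacts (an assumption based on the snippet above, since only the notificationways part is quoted here; regenerator_refs is the counter introduced by this PR):

         # Hypothetical contacts counterpart of the notificationways snippet,
         # using direct attribute assignment instead of an id-indexed lookup.
         self.contacts = c.contacts
         for contact in c.contacts:
             contact.regenerator_refs = 0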

But again, I say that mostly for information/teaching purposes, because again: I'm unsure about all this, unfortunately :(

Though if it fixes some things regarding the memory problems we are hunting in your actual setup/configuration, then I'd be happy to see that.

I will only be able to get back to this next week.

@andyxning
Author

Thanks for the feedback. I will think about it and refactor this again.

To be clearer: I have not enabled the regenerator module for the Scheduler, so the logic in the load_from_scheduler method is not tested in our setup. I only assumed it would be the same as the situation in the Broker. :)

However, in my opinion, we have fixed the main cause of the memory leak. As shown in the code below:

        nws = c.notificationways
        safe_print("Got notif ways", nws)
        new_notifways = []
        for cnw in nws:
            new_notifways_flag = False
            nwname = cnw.notificationway_name
            nw = self.notificationways.find_by_name(nwname)
            if not nw:
                safe_print("Creating notif way", nwname)
                nw = NotificationWay([])
                new_notifways_flag = True
            # Now update it
            for prop in NotificationWay.properties:
                if hasattr(cnw, prop):
                    setattr(nw, prop, getattr(cnw, prop))

            # We must call self.notificationways.add_item after the
            # initial/update phase (the for loop above), because we need
            # 'notificationway_name' as the key to insert the
            # notificationway into self.notificationways.
            if new_notifways_flag:
                self.notificationways.add_item(nw)

            new_notifways.append(nw)

As the comment says, this fixes the problem of notificationways being duplicated on each reload.

All the other work is there to make sure that when some contacts are deleted, we can also delete the corresponding contacts and notificationways, which is currently not supported in the regenerator. At the moment the regenerator only supports host and service deletion; all other objects, contacts and notificationways, are only ever inserted or updated, never deleted. :)
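
To sketch what deletion support could look like (hypothetical, not the PR code: regenerator_refs is the counter discussed above, while the pruning helper and the remove_item call are assumptions about the container API):

    # Hypothetical pruning pass, run once all contacts have been relinked
    # after a reload: drop notificationways that nobody references anymore.
    def prune_unreferenced_notificationways(self):
        for nw in list(self.notificationways):
            if getattr(nw, 'regenerator_refs', 0) == 0:
                # remove_item is assumed here; the real Items API may differ.
                self.notificationways.remove_item(nw)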


@andyxning
Author

I have tested this PR some more in our environment, and it misbehaves in some situations.

I don't know why it throws a KeyError when deleting notificationways, but it is definitely true that we sometimes reach a state where self.contacts[contact.id] is not the same object as self.contacts[contact.contact_name]. So we changed the lookup criteria to find the specified contact or notificationway by contact_name and notificationway_name instead of by id, since ids may change between configuration reloads when contacts or notificationways are added or deleted.
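
In code, the lookup change amounts to something like this (a sketch; find_by_name is the method used in the snippet quoted earlier, the surrounding variable names are illustrative):

    # Resolve by stable name instead of by id: ids can be reassigned
    # between configuration reloads, names cannot.
    contact = self.contacts.find_by_name(contact_name)
    nw = self.notificationways.find_by_name(notificationway_name)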

@andyxning
Author

The linear memory growth is driving me crazy. With the patch in this pull request, I thought I had resolved the memory leak. However, after running the modified version for four days, memory is in fact still rising. That is very disappointing.
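
For reference, a minimal, generic way to see which object types keep accumulating across reloads (standard library only; not part of this PR):

    import gc
    from collections import Counter

    def snapshot_type_counts():
        # Count live objects per type name; diff two snapshots taken
        # before and after a reload to see what accumulates.
        return Counter(type(o).__name__ for o in gc.get_objects())

    before = snapshot_type_counts()
    # ... trigger a configuration reload here ...
    after = snapshot_type_counts()
    for name, delta in (after - before).most_common(10):
        print("%s +%d" % (name, delta))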

@naparuba
Contributor

Is the scheduler's memory rising in the same way or not? (e.g. with comment or downtime objects)


@andyxning
Author

Yes, the scheduler seems to have a memory leak too: with an initial memory usage of 1.7 GB, after some configuration reloads the memory usage reaches 7.2 GB.

However, the regenerator leak in the WebUI seems to have nothing to do with the memory leak in the scheduler, apart from the configuration broks sent by the scheduler.

Please help me get out of this. I have been working on it for more than a week.


@naparuba
Contributor

OK, so I think the problem is not in the regenerator or such things, nor in the broker part.

If the schedulers are impacted, it means the leak is upstream of the broker, and the broker is only affected as a consequence.

Can you try without the retention module? Does the memory grow immediately after the reload, or slowly over time? (The first points to retention or old-configuration management, the second to queues.)

Thanks,
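
One simple way to tell the two patterns apart is to log the daemon's resident memory around a reload, along these lines (a sketch, standard library only; the sampling interval is arbitrary):

    import resource
    import time

    def log_rss(tag):
        # Peak resident set size, in kB on Linux; for a monotonically
        # growing leak this tracks current usage closely.
        rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("%s rss=%d kB" % (tag, rss_kb))

    log_rss("before reload")
    # ... trigger the configuration reload here ...
    log_rss("right after reload")  # big jump => retention / old-conf handling
    for i in range(12):
        time.sleep(300)
        log_rss("t+%d min" % (5 * (i + 1)))  # steady climb => queue growth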


@andyxning
Author

I have not yet tried the scheduler with the retention module disabled. I will give that a try.

I think you can reopen this issue so that more people see it, and maybe we can solve it more quickly. :)


@naparuba
Contributor

It's reopened.


@gst
Contributor

gst commented Dec 11, 2015

I may have some new results about this (about the memory leak, more exactly)...

@andyxning
Author

@gst, it has been a long time; back then I spent more than two weeks trying to figure this out. The PR may well not work any more, and I cannot remember exactly whether I worked it out at the time. So it is difficult for me to get back into this one; I have been away from it for more than four months.
