Fedora 33 exec and java memory usage of allocation is always 0 bytes #10190

Closed
antongocode opened this issue Mar 17, 2021 · 5 comments · Fixed by #10286

@antongocode

Nomad version

Version 1.0.4

Operating system and Environment details

Fedora 33
Kernel version 5.10.20-200.fc33.x86_64

awk '{print $1 " " $4}' /proc/cgroups
#subsys_name enabled
cpuset 1
cpu 1
cpuacct 1
blkio 1
memory 1
devices 1
freezer 1
net_cls 1
perf_event 1
net_prio 1
hugetlb 1
pids 1

Issue

On Fedora 33, the java and exec drivers, which use cgroup and chroot isolation, show 0 memory usage.

The stats object for the allocation looks as follows:

{
    "ResourceUsage": {
        "MemoryStats": {
            "RSS": 0,
            "Cache": 0,
            "Swap": 1430994944,
            "Usage": 314494976,
            "MaxUsage": 0,
            "KernelUsage": 0,
            "KernelMaxUsage": 0,
            "Measured": [
                "RSS",
                "Cache",
                "Swap",
                "Usage",
                "Max Usage",
                "Kernel Usage",
                "Kernel Max Usage"
            ]
        },
        "CpuStats": {
            "SystemMode": 32.01995405139944,
            "UserMode": 14.847059837776385,
            "TotalTicks": 1218.5424242818265,
            "ThrottledPeriods": 0,
            "ThrottledTime": 0,
            "Percent": 46.86701631853179,
            "Measured": [
                "System Mode",
                "User Mode",
                "Throttled Periods",
                "Throttled Time",
                "Percent"
            ]
        },
        "DeviceStats": []
    },
    "Tasks": {
        "server": {
            "ResourceUsage": {
                "MemoryStats": {
                    "RSS": 0,
                    "Cache": 0,
                    "Swap": 1430994944,
                    "Usage": 314494976,
                    "MaxUsage": 0,
                    "KernelUsage": 0,
                    "KernelMaxUsage": 0,
                    "Measured": [
                        "RSS",
                        "Cache",
                        "Swap",
                        "Usage",
                        "Max Usage",
                        "Kernel Usage",
                        "Kernel Max Usage"
                    ]
                },
                "CpuStats": {
                    "SystemMode": 32.01995405139944,
                    "UserMode": 14.847059837776385,
                    "TotalTicks": 1218.5424242818265,
                    "ThrottledPeriods": 0,
                    "ThrottledTime": 0,
                    "Percent": 46.86701631853179,
                    "Measured": [
                        "System Mode",
                        "User Mode",
                        "Throttled Periods",
                        "Throttled Time",
                        "Percent"
                    ]
                },
                "DeviceStats": null
            },
            "Timestamp": 1615904472393702100,
            "Pids": {
                "105077": {
                    "MemoryStats": {
                        "RSS": 321540096,
                        "Cache": 0,
                        "Swap": 0,
                        "Usage": 0,
                        "MaxUsage": 0,
                        "KernelUsage": 0,
                        "KernelMaxUsage": 0,
                        "Measured": [
                            "RSS",
                            "Swap"
                        ]
                    },
                    "CpuStats": {
                        "SystemMode": 32.001905649477614,
                        "UserMode": 15.000890137819889,
                        "TotalTicks": 0,
                        "ThrottledPeriods": 0,
                        "ThrottledTime": 0,
                        "Percent": 47.00277673603846,
                        "Measured": [
                            "System Mode",
                            "User Mode",
                            "Percent"
                        ]
                    },
                    "DeviceStats": null
                }
            }
        }
    },
    "Timestamp": 1615904472393702100
}

CPU stats seem to be unaffected, and running the same job on an Ubuntu 20.04 installation shows the memory usage correctly.
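To make the symptom easier to spot, here is a minimal Python sketch that lists which fields a `MemoryStats` object claims to measure but reports as 0. The helper name `zero_measured_fields` is hypothetical, and the mapping from a `Measured` label such as "Max Usage" to the key `MaxUsage` (strip the spaces) is an assumption based only on the JSON above, not on Nomad's source.

```python
import json


def zero_measured_fields(memory_stats):
    """Return the measured MemoryStats keys whose reported value is 0."""
    zeros = []
    for label in memory_stats.get("Measured", []):
        # Assumption: a label like "Max Usage" corresponds to the key "MaxUsage".
        key = label.replace(" ", "")
        if memory_stats.get(key, 0) == 0:
            zeros.append(key)
    return zeros


# Allocation-level MemoryStats copied from the stats object in this report.
ALLOC_MEMORY_STATS = json.loads("""
{
    "RSS": 0,
    "Cache": 0,
    "Swap": 1430994944,
    "Usage": 314494976,
    "MaxUsage": 0,
    "KernelUsage": 0,
    "KernelMaxUsage": 0,
    "Measured": ["RSS", "Cache", "Swap", "Usage",
                 "Max Usage", "Kernel Usage", "Kernel Max Usage"]
}
""")

if __name__ == "__main__":
    print(zero_measured_fields(ALLOC_MEMORY_STATS))
```

Run against the allocation-level stats, this flags RSS, Cache, MaxUsage, KernelUsage and KernelMaxUsage, while Swap and Usage are non-zero.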

Reproduction steps

This can be reproduced with any simple job using the exec or java driver on a Fedora 33 Vagrant box.

Expected Result

Expecting the memory usage to be populated correctly. I found this previous issue 9120, but I don't think it's related.

@notnoop
Contributor

notnoop commented Mar 17, 2021

Thanks for the detailed report, @antongocode! We'll examine it and follow up.

Also, to make sure we are looking at the same thing: which of the memory fields reported as 0 matter to you here? In the allocation stats, I see non-zero memory usage but 0 RSS and 0 max usage:

            "RSS": 0,
            "Cache": 0,
            "Swap": 1430994944,
            "Usage": 314494976,
            "MaxUsage": 0,

Also, I noticed that one PID is reporting non-zero RSS but zero usage:

                    "MemoryStats": {
                        "RSS": 321540096,
                        "Cache": 0,
                        "Swap": 0,
                        "Usage": 0,
                        "MaxUsage": 0,

We'll dig into both, but I'm not sure which one impacts you more. Thanks!

@antongocode
Author

I picked this up from the UI showing 0 memory usage, which I assumed comes from ResourceUsage.MemoryStats.
My understanding (which could be wrong) is that the ResourceUsage.MemoryStats values are scraped from the cgroup, and I think this is where the issue is.
From what I've seen, the per-PID usage and the ResourceUsage values don't match up exactly even on systems where memory is reported correctly. I hadn't noticed that the per-PID stats were also inconsistent.
Let me know if there's any more information I can provide.
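As a rough illustration of how a cgroup scrape could produce zeros, the Python sketch below parses `memory.stat`-style text. In cgroup v1 that file has `rss` and `cache` keys, while in cgroup v2 the nearest counterparts are named `anon` and `file`, so a reader that only looks up the v1 names gets 0 on a v2 host. Whether this is the cause here is not confirmed by this thread; the function names and the sample byte counts are made up for the example.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines from a cgroup memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def rss_and_cache(stats):
    """Return (rss, cache), falling back to the cgroup v2 key names.

    Assumption: 'anon' and 'file' are treated as the closest v2
    counterparts of the v1 'rss' and 'cache' counters.
    """
    rss = stats.get("rss", stats.get("anon", 0))
    cache = stats.get("cache", stats.get("file", 0))
    return rss, cache


# Made-up sample contents, only to show the two key layouts.
V1_SAMPLE = "cache 4096\nrss 8192\nswap 0\n"
V2_SAMPLE = "anon 8192\nfile 4096\nkernel_stack 0\n"

if __name__ == "__main__":
    v2 = parse_memory_stat(V2_SAMPLE)
    # A v1-only lookup finds nothing on v2-style content.
    print(v2.get("rss", 0), v2.get("cache", 0))
    # The fallback lookup recovers both values.
    print(rss_and_cache(v2))
```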

@notnoop
Contributor

notnoop commented Mar 17, 2021

Got it. I'll attempt to reproduce and follow up if we need anything. Thanks for the info and quick response!

@notnoop
Contributor

notnoop commented Mar 29, 2021

Hi @antongocode. Thanks again for reporting the bug. I found the underlying issue to be Nomad's mishandling of cgroup2, and that affects ~all drivers, including Docker! Since the problem is more far-reaching than Fedora alone and may require us to rethink how we handle metrics, I opened a new issue at #10251 and will keep you posted.
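For anyone wanting to check whether a host is affected, a minimal Python sketch follows. It relies on the fact that a cgroup v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root. The function name `is_cgroup_v2` and the default mount point `/sys/fs/cgroup` are assumptions for illustration, and hosts running in hybrid mode would need a different check.

```python
import os
import tempfile


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like the root of a cgroup v2 hierarchy."""
    # Only the unified (v2) hierarchy has cgroup.controllers at its root.
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    print("host uses cgroup v2:", is_cgroup_v2())

    # Self-check against a fake hierarchy, so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as fake_root:
        print("empty dir:", is_cgroup_v2(fake_root))
        with open(os.path.join(fake_root, "cgroup.controllers"), "w") as f:
            f.write("cpu memory pids\n")
        print("with cgroup.controllers:", is_cgroup_v2(fake_root))
```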

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022