Performance issue with /metrics endpoint #28

Open · xsb opened this issue Jul 25, 2019 · 7 comments

@xsb (Contributor) commented Jul 25, 2019

I am trying to use lnd+lndmon on a rock64 board (similar to a Raspberry Pi, with arm64 and 4GB RAM), but Grafana only shows data points coming directly from lnd (Go Runtime + Performance dashboard). Everything that is supposed to come from lndmon is missing.

I noticed that when running simple queries with PromQL I immediately got the error: "the queries returned no data for a table". I then went to the Explore section and checked up, where I can see the lndmon process reported as down, which is not true.

After that I tried to fetch the metrics directly and realized I was getting slow response times on the /metrics endpoint (usually between 10s and 12s):

$ time curl -s --output /dev/null localhost:9092/metrics

real	0m10.717s
user	0m0.022s
sys	0m0.015s

I haven't investigated this deeply yet, but the instance has more than enough RAM, and the CPU usage and load average don't look that bad.

I will try to spend more time on this later, but wanted to report it early in case it's happening to more people.

@valentinewallace (Contributor) commented:

Hm, admittedly lndmon has not been tested on rpi-type hardware.

@Roasbeef (Member) commented Aug 7, 2019

Is this you attempting to hit the /metrics endpoint on lnd?

@xsb (Contributor, Author) commented Aug 7, 2019

@Roasbeef lnd uses port :8989 for its metrics. I forgot to mention that that part works fine; I get the output in just a few milliseconds.

Honestly I haven't spent much time trying to debug this, but neither Prometheus nor I (from the CLI) can hit the metrics endpoint on lndmon (port :9092) fast enough.

@xsb (Contributor, Author) commented Aug 8, 2019

After some time debugging, I found out that what is taking so long is the GraphCollector's DescribeGraph request against lnd. The scrape frequency seems to be too high for such an expensive call.
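
For intuition, here is a minimal, hypothetical sketch of a Prometheus collector that issues the full graph RPC on every scrape. The names (graphClient, lnd_graph_node_count, etc.) are made up for illustration and do not reflect lndmon's actual code, but the shape shows why every scrape pays the full DescribeGraph cost:

package graphsketch

import (
	"github.com/prometheus/client_golang/prometheus"
)

// graphClient stands in for the lnd RPC client; DescribeGraph is the
// expensive call that returns the entire channel graph.
type graphClient interface {
	DescribeGraph() (numNodes, numChannels int, err error)
}

type graphCollector struct {
	client       graphClient
	numNodesDesc *prometheus.Desc
	numChansDesc *prometheus.Desc
}

func newGraphCollector(c graphClient) *graphCollector {
	return &graphCollector{
		client: c,
		numNodesDesc: prometheus.NewDesc(
			"lnd_graph_node_count", "Number of nodes in the graph.", nil, nil),
		numChansDesc: prometheus.NewDesc(
			"lnd_graph_chan_count", "Number of channels in the graph.", nil, nil),
	}
}

func (g *graphCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- g.numNodesDesc
	ch <- g.numChansDesc
}

// Collect runs once per Prometheus scrape, so the whole DescribeGraph
// round trip (several seconds on slow hardware) happens every interval.
func (g *graphCollector) Collect(ch chan<- prometheus.Metric) {
	nodes, channels, err := g.client.DescribeGraph()
	if err != nil {
		return
	}
	ch <- prometheus.MustNewConstMetric(
		g.numNodesDesc, prometheus.GaugeValue, float64(nodes))
	ch <- prometheus.MustNewConstMetric(
		g.numChansDesc, prometheus.GaugeValue, float64(channels))
}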

@xsb (Contributor, Author) commented Aug 8, 2019

GraphCollector is taking more than 30% of the CPU time (understandable, since this is the biggest dataset being ingested). pprof does not take I/O into account, so reality is much worse than what the flame graph shows. The main issue then seems to be that lnd takes a few seconds to serve the whole graph. Would it be possible to make this call less often?

[Screenshot: pprof flame graph showing GraphCollector CPU usage, 2019-08-08]
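
One possible way to make the call less frequent (a sketch only, reusing the hypothetical graphClient from the earlier snippet, and not lndmon's actual approach) would be to cache the DescribeGraph result and refresh it only when it is older than some TTL, so most scrapes are served from memory:

package graphsketch

import (
	"sync"
	"time"
)

// graphSnapshot holds the handful of numbers the collector actually
// needs from the (much larger) DescribeGraph response.
type graphSnapshot struct {
	numNodes    int
	numChannels int
}

// cachedGraphSource wraps the expensive RPC behind a TTL cache.
type cachedGraphSource struct {
	client graphClient // hypothetical client from the sketch above
	ttl    time.Duration

	mu        sync.Mutex
	snapshot  graphSnapshot
	fetchedAt time.Time
}

// get returns the cached snapshot and only calls DescribeGraph again
// when the cached copy is older than the TTL.
func (c *cachedGraphSource) get() (graphSnapshot, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if !c.fetchedAt.IsZero() && time.Since(c.fetchedAt) < c.ttl {
		return c.snapshot, nil
	}

	nodes, channels, err := c.client.DescribeGraph()
	if err != nil {
		return graphSnapshot{}, err
	}
	c.snapshot = graphSnapshot{numNodes: nodes, numChannels: channels}
	c.fetchedAt = time.Now()
	return c.snapshot, nil
}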

@xsb (Contributor, Author) commented Aug 8, 2019

I changed my Prometheus config (longer scrape interval + higher timeout) and I am now running lndmon on mainnet without issues 😄.

diff --git a/prometheus.yml b/prometheus.yml
index 01797c0..81d781c 100755
--- a/prometheus.yml
+++ b/prometheus.yml
@@ -1,6 +1,7 @@
 scrape_configs:
 - job_name: "lndmon"
-  scrape_interval: "20s"
+  scrape_interval: "30s"
+  scrape_timeout: "15s"
   static_configs:
   - targets: ['lndmon:9092']
 - job_name: "lnd"

I am not saying this should be merged, because these values are totally arbitrary. A bigger network and/or slower hardware would require even more conservative defaults.

@menzels commented Aug 13, 2019

Thanks for the research @xsb, I had the same problem. For me the scrape time was 30-50 seconds.
I am using an rpi3 for lnd, connected to lndmon running in the cloud. The uplink bandwidth is about 2-3 Mb/s, so I guess the slowdown is a combination of CPU load and the bandwidth limit.
I set the scrape interval and timeout to 60s; like this it seems to be working for now.
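
For reference, a config along these lines (adapting the diff above to the 60s values mentioned here; the exact numbers are illustrative and should be tuned per setup) would look roughly like:

scrape_configs:
- job_name: "lndmon"
  scrape_interval: "60s"
  scrape_timeout: "60s"
  static_configs:
  - targets: ['lndmon:9092']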
