There's still a need to bump the memcache size #1107
Comments
There is no evidence at all in the graphs that this is in fact an issue. I definitely see the issue that you are referring to, but I am unable to explain it, as all the evidence says it shouldn't be down to memcache. |
Could these session disconnections be caused by server restarts, or does the server never restart? |
I don't see any server restart in the stats, at least for the last 6 months: https://prometheus.openstreetmap.org/d/l4zgNUdMz/memcached?orgId=1&refresh=1m&from=now-6M&to=now Also, the OP didn't provide any details how frequently they have to log in again. There might be external factors, like cookies being removed by the browser or some browser extension, etc. |
I thought everybody else also has to login again at least once every three or four days. |
Ah, the link wasn't that helpful. There are about 11 memcached instances overall. However, for the 3 frontend servers, only 3 memcached instances (spike-06 ... spike-08) are relevant. Items in cache and memory usage are fairly stable for these three. I think this should match the following config in chef: https://github.com/openstreetmap/chef/blob/45dc24b65b23a6c1dcc2f0ba2aa971563555c35e/roles/web.rb#L20 |
A restart would indeed lose all sessions but as @mmd-osm says it's only those three machines that we're talking about here and they last restarted in November last year: At that time it took nearly two months for the caches to fill up which suggests that it should take about that long for things to get expired unless there has been a significant increase in the cache usage since. |
The eviction rate has increased since November, but it hasn't consistently been more than double. Commands/second has remained the same. |
I logged back in 5 days ago: 1 day later my session was still active, but today I'm logged out. I suggest storing the sessions in the DB and using memcache only to speed up session checks for frequently used sessions. |
One of the machines was rebooted yesterday while fighting the DDoS, so 1/3 of the cache entries were lost. |
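The suggestion above amounts to a read-through cache in front of a durable store. A minimal sketch of that idea, assuming plain Ruby hashes as stand-ins for the database and for memcached (the `SessionStore` class and its method names are hypothetical and not part of the website code):

```ruby
# Toy stand-ins: plain Hashes play the role of the database and of memcached.
class SessionStore
  def initialize(db:, cache:)
    @db = db
    @cache = cache
  end

  def write(session_id, data)
    @db[session_id] = data      # durable copy survives cache eviction
    @cache[session_id] = data   # hot copy for fast lookups
  end

  def read(session_id)
    cached = @cache[session_id]
    return cached if cached

    data = @db[session_id]      # cache miss: fall back to the database
    @cache[session_id] = data if data
    data
  end
end

db = {}
cache = {}
store = SessionStore.new(db: db, cache: cache)
store.write("abc", { user: 42 })
cache.clear                     # simulate an eviction or a memcached restart
session = store.read("abc")
puts session.inspect
```

With this layout an eviction only costs one extra database lookup instead of a lost login.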
I'm wondering how many of these entries originate from CGImap (key prefix would be "cgimap:"). For some reason, these entries have the expiration value set to 0 (unlimited). This doesn't make a whole lot of sense for rate limiting requests, where the exact timestamp would be known upfront at which time these entries become irrelevant. |
At least when testing locally, I've noticed that every anonymous user creates a rails session without expiry (that's the "0" in "1 0 73" below), whereas logged in users have an entry with 4-5 weeks expiration. Anonymous user sessions:
Logged in user: Expires at 1723288155 = Sat Aug 10 13:09:15 CEST 2024
|
Expiry shouldn't really matter that much because anything that isn't used just moves down the LRU list and gets discarded eventually when we need space for a new entry. Logged in sessions (with "remember me" checked) do get an expiry of 28 days which matches the cookie expiry while other sessions (not logged in and logged in without "remember me" checked) actually don't have an expiry but issue a session cookie that expires when the browser is closed. |
First of all, I find it a bit difficult to reason about the logged in sessions based on Prometheus stats, in particular after how many days these entries would be discarded. memcached has an LRU crawler which reclaims expired entries even before they reach the end of the LRU list. With a non-zero TTL, we might get rid of many "non-logged in user" entries early on, before they can evict "logged in user" entries. |
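The difference between the two positions can be illustrated with a toy model. This is only a sketch under strong assumptions: `ToyLru` is a hypothetical, heavily simplified cache (one LRU list, a tiny capacity, a crawler pass before every write) and not how memcached is implemented. It shows an idle logged-in session being pushed out by never-expiring anonymous entries, and surviving when those entries carry a short TTL.

```ruby
class ToyLru
  attr_reader :evictions

  def initialize(capacity)
    @capacity = capacity
    @items = {}        # Ruby hashes keep insertion order: first key = least recently used
    @evictions = 0
  end

  # A ttl of 0 means "never expires", mirroring the memcached convention.
  def set(key, now:, ttl: 0)
    @items.delete(key)
    if @items.size >= @capacity
      @items.shift     # drop the least recently used entry
      @evictions += 1
    end
    @items[key] = ttl.zero? ? nil : now + ttl
  end

  # Stand-in for the LRU crawler: reclaim expired entries wherever they sit.
  def crawl(now:)
    @items.delete_if { |_key, expires_at| expires_at && expires_at <= now }
  end

  def key?(key)
    @items.key?(key)
  end
end

# One idle logged-in session, followed by ten anonymous sessions.
def simulate(anonymous_ttl:)
  cache = ToyLru.new(5)
  cache.set("logged-in", now: 0, ttl: 1000)
  (1..10).each do |t|
    cache.crawl(now: t)
    cache.set("anon-#{t}", now: t, ttl: anonymous_ttl)
  end
  [cache.key?("logged-in"), cache.evictions]
end

without_ttl = simulate(anonymous_ttl: 0)  # => [false, 6]
with_ttl = simulate(anonymous_ttl: 2)     # => [true, 0]
puts "no TTL:    logged-in kept=#{without_ttl[0]}, evictions=#{without_ttl[1]}"
puts "short TTL: logged-in kept=#{with_ttl[0]}, evictions=#{with_ttl[1]}"
```

The model ignores reads, which would move an active session back to the front of the list, so it only speaks to users who stay away for a while.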
At the current growth rate, we will likely see some evictions in about 10 days (=21 days after last memcached restart). @jidanni : did you notice any issues with lost login sessions in the last 8-9 days? If so, it can’t be memcached related… |
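An estimate like this is a linear extrapolation of memory use. A minimal sketch of the arithmetic, where the helper `days_until_full` and all figures are made up for illustration and are not taken from the Prometheus data:

```ruby
# Days until the cache is full, assuming memory use keeps growing linearly.
def days_until_full(used_bytes:, limit_bytes:, growth_bytes_per_day:)
  return 0.0 if used_bytes >= limit_bytes
  return Float::INFINITY if growth_bytes_per_day <= 0

  (limit_bytes - used_bytes).to_f / growth_bytes_per_day
end

# Hypothetical figures: 11 days after a restart the cache holds 1100 units
# out of 2100 and grows by 100 units per day.
days = days_until_full(used_bytes: 1100, limit_bytes: 2100, growth_bytes_per_day: 100)
puts "first evictions expected in about #{days.round} days"
```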
It's not that simple because only one machine was reset I think? So only keys which hash to that machine are currently exempt from being evicted. |
I think spike-06..08 were all restarted, the aggregated cached items count on Prometheus shows 0 entries about 10 days ago. |
@mmd-osm rather than using my misty memory, |
We want to hear from you first hand, as you’ve also raised the issue. Misty memory is ok. If you say it hasn’t bothered you recently then that’s good enough for now. What we see in the charts right now is that no entries are being removed. So chances are that your session is still around. |
Okay. I will remember next time to report each and every incident right here to the thread. |
Okay. Just had to log in again as you can see in your logs perhaps. |
Thank you for the feedback. This is not completely unexpected. Evicting entries started again on August 1st, even a bit sooner than estimated. |
On a laptop I hadn't used in five days: |
At least 8 other users have reported the same issue in https://community.openstreetmap.org/t/osm-webseite-standiges-login-notig/120072 They use a range of browsers, not only Firefox. I could also reproduce it today on my mobile. spike-0[6-8] have been seeing some cache evictions again for the past few days:
Following up on my previous comment about getting rid of anonymous sessions as early as possible, we could check how Gitlab addressed the issue. They're having similar issues with Redis and unauthenticated users filling up the memory. The Redis and Memcached implementations should be fairly similar; ['rack.session.options'][:expire_after] is also used by the memcached client. Initially, Gitlab added a special helper for this purpose: https://gitlab.com/gitlab-org/gitlab/-/blob/ee088fc0d53198016e245c515f28e03d8229e297/app/controllers/application_controller.rb#L29 and some PRs on the topic: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/88514/diffs
Lately they seem to have moved it into a separate rack middleware to cover more scenarios: https://gitlab.com/gitlab-org/gitlab/-/commit/8c85364205ccb1f4602ab3543d10ff55295bd6cc This might be worth checking out. |
I've adjusted the Gitlab code a bit to work with the osm website: https://github.com/mmd-osm/openstreetmap-website/tree/patch/sessionexpiry It's more of a proof of concept at this time, to demo the idea. I can create a PR to continue the discussion, if needed. For testing, I recommend checking the output of "memcached-tool localhost:11211 dump" after each activity, in particular the TTL value. That's the second-to-last value in each line starting with "add rails:session:2:..." (format: unix epoch). /fyi: @AntonKhorev Meanwhile, memcached has also been restarted or purged, so we're down to 0 evictions for the next few weeks. |
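A rough sketch of what such a middleware could look like. The class name, the one-hour TTL and the `session[:user]` check are assumptions for illustration and are not copied from either project; only the `'rack.session.options'` / `:expire_after` keys come from the comment above. A lambda and plain hashes stand in for the Rails app and the Rack environment so the example runs without any gems.

```ruby
# Hypothetical middleware: shorten the lifetime of sessions without a user.
class UnauthenticatedSessionExpiry
  ANONYMOUS_TTL = 3600 # seconds; an arbitrary value chosen for this sketch

  def initialize(app)
    @app = app
  end

  def call(env)
    result = @app.call(env)

    session = env["rack.session"] || {}
    options = env["rack.session.options"]
    # Only touch sessions that carry no logged-in user (assumed key: :user).
    options[:expire_after] = ANONYMOUS_TTL if options && !session[:user]

    result
  end
end

# Stand-in for the application: a lambda returning a Rack response triple.
app = ->(_env) { [200, {}, ["ok"]] }
middleware = UnauthenticatedSessionExpiry.new(app)

anonymous = { "rack.session" => {}, "rack.session.options" => {} }
logged_in = {
  "rack.session" => { user: 1 },
  "rack.session.options" => { expire_after: 28 * 24 * 3600 }
}
middleware.call(anonymous)
middleware.call(logged_in)

puts anonymous["rack.session.options"].inspect
puts logged_in["rack.session.options"].inspect
```

For the option to take effect, a middleware like this would presumably have to run inside the session store middleware, so that the option is set before the session is written back.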
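Reading the TTL out of the dump by eye gets tedious, so a small parser can help. This is a sketch assuming the line layout described above ("add", key, flags, expiry, size); the helper name `parse_dump_line`, the sample keys and the size 250 are made up, while "1 0 73" and the epoch 1723288155 are the values quoted earlier in the thread.

```ruby
# Parse one "add <key> <flags> <exptime> <bytes>" line; returns nil for other lines.
def parse_dump_line(line)
  fields = line.split
  return nil unless fields.length == 5 && fields[0] == "add"

  exptime = Integer(fields[3])
  {
    key: fields[1],
    # An expiry of 0 means the entry never expires.
    expires_at: exptime.zero? ? nil : Time.at(exptime).utc
  }
end

# Made-up sample lines in the assumed format.
sample = <<~DUMP
  add rails:session:2:anonymous-example 1 0 73
  add rails:session:2:logged-in-example 1 1723288155 250
DUMP

entries = sample.lines.map { |line| parse_dump_line(line) }.compact
entries.each do |entry|
  expiry = entry[:expires_at] ? entry[:expires_at].to_s : "no expiry"
  puts "#{entry[:key]}: #{expiry}"
end
```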
Today I had to log in again. |
Hello. In openstreetmap/openstreetmap-website#2457 I was told to open an issue here. But as it is getting a little over my head, I will just leave this here.