Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gvfs-helper: add gvfs.fallback and unit tests #665

Merged
merged 5 commits into from
Jul 1, 2024

Conversation

jeffhostetler
Copy link

By default, GVFS Protocol-enabled Scalar clones will fall back to the origin server if there is a network issue with the cache servers. However (and especially for the prefetch endpoint) this may be a very expensive operation for the origin server, leading to the user being throttled. This shows up later in cases such as 'git push' or other web operations.

To avoid this, create a new config option, 'gvfs.fallback', which defaults to true. When set to 'false', pass '--no-fallback' from the gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the future. In case this becomes a more widespread problem, engineering systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users, but at least that will help diagnose the problem when it occurs instead of later when the throttling shows up and the server load has already passed, damage done.

This change only applies to interactions with Azure DevOps and the
GVFS Protocol.


  • This change only applies to interactions with Azure DevOps and the
    GVFS Protocol.

@jeffhostetler jeffhostetler self-assigned this Jun 28, 2024
@jeffhostetler
Copy link
Author

@derrickstolee I took your #664 and added new unit tests. Hopefully, it all makes sense.

I debated putting the config lookup in gvfs-helper.exe, rather than in the client code, but decided that was more risk than I wanted (not that it was hard, but that it would change the historical default behavior, since gvfs-helper.exe does not have fallback turned on and only does it when requested, so it felt kinda backwards to move the decision at this point.)

Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

stop_gvfs_protocol_server &&

grep -q "error: get: (http:503)" OUT.stderr &&
verify_connection_count 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever, the connection count is 6 with the fall-back, and 3 without it. I had to puzzle about the difference for a bit because the test cases are so nearly identical that I suspect that it might make for a fine addition to the commit message to explain this fine point?

@dscho
Copy link
Member

dscho commented Jul 1, 2024

@jeffhostetler thank you so much for taking the time to add the tests. This will be an invaluable source of documentation for future changes to the gvfs-helper. ❤️

jeffhostetler and others added 4 commits July 1, 2024 09:41
Construct 2 new unit tests to explicitly verify the use of
`--fallback` and `--no-fallback` arguments to `gvfs-helper`.

When a cache-server is enabled, `gvfs-helper` will try to fetch
objects from it rather than the origin server.  If the cache-server
fails (and all cache-server retry attempts have been exhausted),
`gvfs-helper` can optionally "fallback" and try to fetch the objects
from the origin server.  (The retry logic is also applied to the
origin server, if the origin server fails on the first request.)

Add new unit tests to verify that `gvfs-helper` respects both the
`--max-retries` and `--[no-]fallback` arguments.

We use the "http_503" mayhem feature of the `test_gvfs_protocol`
server to force a 503 response on all requests to the cache-server and
the origin server end-points.  We can then count the number of connection
requests that `gvfs-helper` makes to the server and confirm both the
per-server retries and whether fallback was attempted.

Signed-off-by: Jeff Hostetler <[email protected]>
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

Signed-off-by: Derrick Stolee <[email protected]>
Create new `cache_http_503` mayhem method where only the cache server
sends a 503.  The normal `http_503` directs both cache and origin
server to send 503s.  This will be used to help test fallback.

Signed-off-by: Jeff Hostetler <[email protected]>
@jeffhostetler jeffhostetler force-pushed the jh/gvfs-helper-fallback-config branch from 2466cad to 8668b8e Compare July 1, 2024 14:04
@jeffhostetler
Copy link
Author

I just added enhanced comments in-line in the test script and in the commit messages.

@dscho
Copy link
Member

dscho commented Jul 1, 2024

I just added enhanced comments in-line in the test script and in the commit messages.

Thank you!

Please ignore the test failure, this is my mistake.

@dscho
Copy link
Member

dscho commented Jul 1, 2024

Please ignore the test failure, this is my mistake.

@jeffhostetler actually, if I could get your review of #666, that would be nice...

Copy link
Collaborator

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing these tests, @jeffhostetler! I have a lot more confidence in the change, now.

…er-fallback-config

Let's include #666 to let the PR
builds pass.

Signed-off-by: Johannes Schindelin <[email protected]>
Copy link
Collaborator

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving once more!

@dscho dscho merged commit 648c5a2 into vfs-2.45.2 Jul 1, 2024
91 checks passed
@dscho dscho deleted the jh/gvfs-helper-fallback-config branch July 1, 2024 20:01
dscho added a commit that referenced this pull request Jul 17, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Jul 17, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Jul 17, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Jul 18, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
mjcheetham pushed a commit that referenced this pull request Jul 23, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Jul 25, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
mjcheetham pushed a commit that referenced this pull request Jul 29, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Sep 18, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Sep 24, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
dscho added a commit that referenced this pull request Oct 8, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
mjcheetham pushed a commit that referenced this pull request Dec 3, 2024
By default, GVFS Protocol-enabled Scalar clones will fall back to the
origin server if there is a network issue with the cache servers.
However (and especially for the prefetch endpoint) this may be a very
expensive operation for the origin server, leading to the user being
throttled. This shows up later in cases such as 'git push' or other web
operations.

To avoid this, create a new config option, 'gvfs.fallback', which
defaults to true. When set to 'false', pass '--no-fallback' from the
gvfs-helper client to the child gvfs-helper server process.

This will allow users who have hit this problem to avoid it in the
future. In case this becomes a more widespread problem, engineering
systems can enable the config option more broadly.

Enabling the config will of course lead to immediate failures for users,
but at least that will help diagnose the problem when it occurs instead
of later when the throttling shows up and the server load has already
passed, damage done.

 This change only applies to interactions with Azure DevOps and the
GVFS Protocol.

---

* [x] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants