-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unconditionally query crates.io HTTP index for publish status #693
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for exploring thos!
src/steps/index.rs
Outdated
let index = tame_index::index::ComboIndexCache::new(tame_index::IndexLocation::new( | ||
tame_index::IndexUrl::crates_io(None, None, None)?, | ||
))?; | ||
|
||
let index = match index { | ||
tame_index::index::ComboIndexCache::Git(gi) => { | ||
tame_index::index::RemoteGitIndex::new(gi)?.into() | ||
} | ||
tame_index::index::ComboIndexCache::Sparse(si) => { | ||
tame_index::index::RemoteSparseIndex::new( | ||
si, | ||
tame_index::external::reqwest::blocking::Client::builder() | ||
.http2_prior_knowledge() | ||
.build() | ||
.map_err(tame_index::Error::from)?, | ||
) | ||
.into() | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too thrilled with the amount of ceremony from this API...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understandable, but this API was a response to crates-index
having a much too rigid API that made testing and integration into other crates much more difficult, and some scenarios impossible without replicating pieces of logic that were not public in crates-index
.
That being said it's trivial to to add helper functions to tame-index to open the crates.io index with zero knobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of libraries have a builder for complex operations and then have a convenience new
function on the built type. I would assume something like that would exist here.
src/ops/cargo.rs
Outdated
.flat_map(|c| c.versions().iter()) | ||
.any(|v| v.version() == version) | ||
pub fn is_published(index: &tame_index::index::ComboIndex, name: &str, version: &str) -> bool { | ||
match index.krate(name.try_into().expect("crate name is invalid"), true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this call always do a network operation for sparse index?
Inside of cargo, we unconditionally cache lookups in-memory (and on-disk) and it requires an explicit call to force a new lookup. If the caller has to explicitly decide to do a krate
call and then cached_krate
, that seems onerous.
If this doesn't always do a network operation, then this is broken for sparse index as we need to ensure we get the latest published status.
(granted, I should soon just say we don' support those older cargos)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For git it does no network I/O only reading from the cache or git blob if the cache is outdated, for sparse it always does a network call, using the local cache entry to set the e-tag so that the response will only be 200 if the crate metadata has been updated.
src/ops/cargo.rs
Outdated
if let Err(e) = index.update() { | ||
log::debug!("crate index update failed with {}", e); | ||
if let tame_index::index::ComboIndex::Git(gi) = index { | ||
if let Err(e) = gi.fetch() { | ||
log::debug!("crate index update failed with {}", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the locking scheme?
Granted, I'm not correctly handling it for crates_index
today but its what has stopped me from migrating from v1 to v2 in cargo-edit
is the lack of clarify on this.
In cargo-edit
, we've had to implement a retry loop for locking. I see that tame-index advertises (b)locking support but its unclear to me which locks are involved seeing as tame-index is directly reading and manipulating internal cargo caches (not really thrilled with that btw). Is it using the same locking policy as cargo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the opening of a git index is locked via gix's mechanism, but tame-index doesn't currently support the global package lock mechanism that cargo uses. I'll open an issue to add that eventually, but I think the easier thing to do for this crate is to simply do one of the options I mentioned in the PR, and just only use the sparse index, and do zero disk operations at all, and remove the gix dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant about the idea of exclusively doing network operations to the point that I'd need a strong justification to move this PR forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that a problem? The current code is doing far more network I/O due to always performing a git fetch, changing it to do a single HTTP request for the crate being released with its etag means drastically reduced and faster checking for the publish status of the crate.
So I refactored this to still use AFAICT there are no tests in this crate around index communication, which makes sense, so I ran a quick test by just releasing one of my own crates with the changes in this PR.
Obviously you can't tell from the output but the waiting was < 2s, and most importantly, I had blown away |
crates-index
with tame-index
So I had to bump the MSRV, as tame-index requires 1.67.0 due to |
Actually I lied, bumped it to 1.67.1 to get rust-lang/rust-clippy#10265 and avoid a bunch of churn that only applies to 1.67.0. |
As pointed out in #30, gix's locking mechanism is insufficient in the face of SIGKILL or other process interruption that can't be caught by gix's signal handler, in addition to the giant footgun that tame-index doesn't actually hook that signal handler for you, meaning applications using it that forget to hook it also run into issues. In addition, I raised #17 while doing crate-ci/cargo-release#693 (https://github.com/crate-ci/cargo-release/pull/693/files/012d3e9a7be23db14096e6a0d41cea7528f9348c#r1301688472) as right now tame-index can perform mutation concurrently with cargo itself, or potentially read partial data while it is being written. This PR resolves both of these issues by forcing all R/W operations on indexes to take a `&FileLock` argument, as well as providing a `LockOptions` "builder" to create file locks. These locks are created using flock on unix and LockFileEx on windows, meaning they can properly be cleaned up by the OS in all situations, including SIGKILL and power loss etc, unlike gix's locks, and is the same mechanism that cargo uses for its global package lock, meaning downstream users can ensure they play nicely with cargo. The lock facilities are part of the public API of tame-index as I opted to roll my own implementation instead of using fslock, as it is very outdated, and doesn't support timeouts. This does mean a lot of unsafe has been added, but it is tested and not _too_ bad. This can potentially be moved out to a separate crate in the future, but is fine for now. This means it could be used to resolve rustsec/rustsec#1011, and is something I will use in cargo-deny for the same thing, protecting access to the advisory database during mutation. It should also be noted that one can also just construct a `FileLock::unlocked()` to satisfy the API, without actually performing any locking, for cases where it's not needed/testing/etc. Resolves: #17 Resolves: #30
Finally giving in to going with this approach. I made some tweaks in #719. |
In doing the upgrades (#722), it drew attention that this wasn't correct and disk IO is done, even in 0.5.1 which this PR is based on. |
No, it is only done if you don't set the etag, which this PR always did. |
This replaces the
crates-index
withtame-index
, which adds built-in support for sparse registries, resolving a couple of issues, as well as usinggix
instead ofgit2
which has improved functionality for authentication, resolving another issue (or at least, I believe it should).This unfortunately drastically increases the number of dependencies for this crate, due to gix and reqwest being pulled in, but there are a couple of options to mitigate this.1. Replace the usage ofgit2
withgix
in a follow up PR, which would remove some crates completely and not have 2 implementations of git. There is at least one missing feature in gix that I am aware of that this crate uses (dirty checking of the repository), but since this crate already uses the git cli for some operations like fetch anyways, that would be easy enough to replicate with the cli until gix gets that functionality in the future.2. Just drop support for git registry indices altogether. This may not be possible if using a custom registry that only supports the git sparse protocol, but some alternative registry sources such as cloudsmith already support the sparse protocol, so maybe not that big of an issue. Only using the crates.io sparse registry, even if the user uses an old cargo that doesn't support it, is irrelevant for this crate's usage as it is only checking for the existence of the crate, or the published version.Resolves: #218
Resolves: #679
Resolves: #692