-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow revision #45
base: master
Are you sure you want to change the base?
Allow revision #45
Conversation
This could be a half-measure for purescript#33. This does not prevent usage of master, but it arguably does behave how you'd expect: it always keeps whatever reference you're using as a package set up to date, and happens to also allow you to use a SHA in addition to a revision. I'm going to give this a spin, but if we later wanted to disallow branches, we could probably cobble together a git command that would tell us if a tree-ish or whatever they call it is a branch or not.
view actually dumps the exit status, sh does not.
Sorry for the delay, it's been a busy week. This seems like it could decrease performance for repos with many tags. Is there no simpler way to pull a single SHA hash from a remote repo? |
Sorry for the delay. That's a valid concern if I understand it correctly. I think we could avoid ever redundantly fetching if we could treat the reference as immutable, which SHAs and tags should be. So I see a few paths forward:
|
@MichaelXavier Why wouldn't
So you should be able to do something like this:
The above command only prints the branches/tags which exactly match You can even specify multiple tags/commits if you want. So all you have to do is verify that there aren't any And it also helpfully gives you the commit for the tag, so you can use that for additional robustness (e.g. if the tag changes while downloading the repo it will still checkout the correct commit). There might be an easier or faster way of checking, but I think using |
As for downloading a specific commit, it seems that's not possible. The best you can do seems to be:
So by using
|
For GitHub in particular you can efficiently download a specific commit by downloading this URL:
e.g. But that only works for GitHub, not Git repos in general. |
Oh, you can also use For reference, Nix uses the Nix does a Personally, I recommend downloading the |
Yes, I'd definitely like to use that path to pull things from GitHub, but we can deal with that separately. I like the idea of using |
@Pauan thanks for weighing in. I knew someone with a lot of deep git knowledge would have a solution. I am not sure when I'll have time to work on this again, probably wednesday but @Pauan if you have a workable solution and some free time until then, feel free to fixup my patch. Basically I'm just going to be implementing the commends you recommended anyway. |
I discovered Here is a list of "tree-ish" things that it should accept. $ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz <tree-ish>
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz v3.1.0
$ git archive --remote=https://github.com/purescript/purescript-prelude.git --output=./dep-version.tar.gz 6d411827970302769895877511f51f7a414ecbb3 |
This is for purescript#45. We'll see what the tests say. Summary: * I use a newtype for Repo since things were starting to get complicated differentiating it from tags. Hope that's ok. * I use the ls-remote trick (you're a lifesaver, @Pauan!) to pull refs. If the ref is *proven* to be a branch, bail out with an error. If its *proven* to be a tag, we can shortcut the clone without running a checkout afterwards. Otherwise, assume its a SHA. * Restore the "exists" checks since we have decided to treat SHAs and tags as immutable. This should avoid unnecessary network calls on subsequent runs.
Conflicts: app/Main.hs
app/Main.hs
Outdated
let refs = Set.fromList (mapMaybe parseRef remoteLines) | ||
if Set.member rawAsBranch refs | ||
then do | ||
echoT (raw <> " is a branch. psc-package only supports tags and SHAs.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of disallowing this outright, I'm thinking we might want to put it behind a flag. What do you think?
We'd probably need to clone the repo into a directory named after the SHA hash of the branch or something though. Otherwise the files on disk could get out of sync with the remote branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disallowing branching was basically to allow us to not have to check every time. SHA and tag are both considered fixed commits whereas branches can change at any time. It just doesn't seem to me that the argument for a branch is worth the cost, but I could be convinced otherwise.
If we did stick to just SHA and tag then I don't think we'd have to bother with special folder naming because if you pulled it once, you're done.
app/Main.hs
Outdated
where | ||
rawAsBranch = "refs/heads/" <> raw | ||
rawAsTag = "refs/tags/" <> raw | ||
gitProc = inproc "git" ["ls-remote", "-q", "--refs", from] empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why inproc
here? Don't we use proc
elsewhere? Is there any important difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proc
seems to return an ExitCode
. inproc
returns a ByteString
which I need in order to parse the output.
app/Main.hs
Outdated
, sha | ||
] empty .||. exit (ExitFailure 1) | ||
where | ||
inGitRepo m = (sh (pushd into >> m)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: redundant parens. Also can you please indent this past the where
(or use a let
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I personally like the indented style but I've seen both in the file. I'll just go through and standardize in this file.
app/Main.hs
Outdated
] empty .||. exit (ExitFailure 1) | ||
cloneShallow (Repo from) (CloneSHA sha) into = do | ||
void $ proc "git" | ||
[ "clone" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well pull this common prefix out into a function or constant since the two are so similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is done on the branch now.
Looks good, thanks for the updates. I do still have a slight concern that this will add a performance overhead, since running
Perhaps what we can do is to only look at tags and heads (with |
@paf31 I didn't realize that Using |
As per @Paun's suggestion as it returns less extraneous data this way.
@paf31 just bumping this to see if you needed anything else. |
] empty .||. exit (ExitFailure 1) | ||
-> IO () | ||
cloneShallow (Repo from) tgt into = do | ||
void $ proc "git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try this with a SHA hash, I get fatal: Remote branch <sha> not found in upstream origin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we confirm this works by now?
The code looks great, thanks. I'd also like to try using |
Another option, which allows us to avoid the costly What do you both think? Edit: to be clear, I'm suggesting we look for the prefix
|
I think it would make more sense to just use |
I also just found out about |
Sorry for the long delay. I think we should try to pick this back up. Since we're making breaking changes anyway, I think we should just use the URL approach, and aim to disallow branches completely.
Which do you think is the slow method? I think what we use right now should still be possible for tags, no? Only |
I think we're aligned. I think I can move forward on this when I have time:
|
I would be fine with that (and yes, we'd need to update the package set), but I'd also be fine with just looking for |
In response to recent comments in purescript#45. Now we don't need an expensive ls-remote call to determine if a clone target is a tag or sha. I'm still lacking a good way to test this out though.
@paf31 I threw up a quick implementation, swapping out the function which calls |
What's the status on this now? Looks like this was in a good place, but I don't think @hdgarrood 's point on the URL format was taken into account - |
This is in accordance to the comments in purescript#45
@nwolverson Just pushed an update. I think I glossed over the part about removing the |
I know there were some changes in who is working on this repo. Any chance of getting this merged? I don't know what is going on with appveyor but it looks unrelated to the changes. |
If there are no objections, we should really try to merge this soon, especially since the existing behavior leads to unpleasant surprises. |
else case T.toLower schemeName of | ||
"sha" -> Right (CloneSHA withoutScheme) | ||
"tag" -> Right (CloneTag withoutScheme) | ||
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should probably allow using tags by default and warn about it. Thoughts? cc @kritzcreek
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused. I could see adding a case for "" -> Right (CloneTag withoutScheme)
. But if they affirmatively provided a scheme that's some unknown, it seems like we should still error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've read all of the code instead of just this bit :D I see now that we only hit this case if the target contained a :
in the first place.
else case T.toLower schemeName of | ||
"sha" -> Right (CloneSHA withoutScheme) | ||
"tag" -> Right (CloneTag withoutScheme) | ||
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should've read all of the code instead of just this bit :D I see now that we only hit this case if the target contained a :
in the first place.
] empty .||. exit (ExitFailure 1) | ||
-> IO () | ||
cloneShallow (Repo from) tgt into = do | ||
void $ proc "git" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we confirm this works by now?
app/Main.hs
Outdated
_ -> Left ("Invalid scheme. Expected sha: | tag: but got " <> schemeName) | ||
where | ||
(schemeName, remainder) = T.breakOn ":" t | ||
withoutScheme = T.drop 3 remainder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably want to use stripPrefix "://"
here instead? I could see some very hard to debug behaviour resulting from malformed URLs otherwise.
https://www.stackage.org/haddock/lts-10.5/text-1.2.2.2/Data-Text.html#v:stripPrefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should actually just be T.drop 1
here, since the format is now sha:27492fbb19484...
, without the slashes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kritzcreek stripPrefix
won't work because there's content before the "://", we're trying to extract the scheme beforehand. If we want to be really paranoid about parsing the URI, we could pull in uri-bytestring
, but that is probably overkill.
@hdgarrood I think you're right. Probably should have console tested this since we don't really have a formal test suite for this project.
I don't know why it doesn't let me comment on the testing. I'm pretty sure I ran the commands on a terminal when I originally wrote it. Considering that @hdgarrood pointed out what appears to be a text parsing bug and the travis tests passed, I'm wondering if the test suite is insufficient as is. I feel unprepared to really thoroughly test this PR. I'm trying to figure a good point to hand this work off. |
The drop call was not updated when we went from "scheme://identifier" to "scheme:identifier" and was eating 2 characters from the identifier when a scheme was specified.
@hdgarrood Just confirmed the bug you spotted with the drop, pushed an update fixing it. |
Bumping this again. Possibly tagging @justinwoo |
Needs some conflict fixing, but otherwise this is good right? Anyone in disagreement? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just pushed a merge commit with master. The only conflict was it looks like there was a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs more testing or something around cloneShallow
, as I can't quite get everything running naively when it should be using the default tag-checkout
Talked this over with @justinwoo. I think we're on the same page as far as the idea of this PR:
However, I haven't really tracked the progress of this project over the 8 months this PR has been open and don't have a good test plan to ensure this continues to work. Sounded like @justinwoo was willing to take this change across the finish line so I'll leave it to him from here. |
* Add lens and mathbox * Add freeap * Add flare * Add aui
Would love to see this land sometime soon. Thanks for all the work so far. |
So I did some more digging into this issue and what can be done to fetch repos, but it seems that there's actually no real way to fetch repositories by a commit SHA reliably, being that this flag must be set on the server: git/git@68ee628
I would personally like to close this and document these findings somewhere clearly so that this feature doesn't get suggested over and over again. |
Sorry, I did not remember that this solution works by shallow cloning, so it's already "expensive". |
I didn’t realize this but apparently NPM only supports commit hashes for GitHub too. I think supporting this only for GitHub would be better than not at all. |
Should we close this PR? I haven't read through this PR's content and what it's trying to change. However, |
This is to further the discussion we were having in #33
I merged in upstream master so this should merge cleanly. You were asking if its possible for
git ls-remote
could be used to clean this up. I think that's for phase 2 of what I described in the original issue. The way I see it, this gets psc-package operating as-documented. Right now it says you can specify a tag or SHA. In reality, it lets you specify a branch or tag but not a SHA. It seems like supporting a branch may be a misfeature because it is not a fixed single commit but rather a moving target, yet psc-package basically assumes its a fixed target leading to some confusion.The correct final solution here is to forbid branches, which involves some sort of git incantation to tell us what type a tree-ish is and giving a nice error message if its a branch explaining that psc-package targets must be immutable. I also suspect that once we do that, we can bring back the "exists" checks to do fewer downloads. Its totally ok if you don't want to do this half-measure, it was just the simplest solution I knew how to implement right now and solved a problem I was having.