Wrong indirect reverse dependencies for arithmoi #1234

Open
Bodigrim opened this issue Jul 29, 2023 · 20 comments

Comments

@Bodigrim
Contributor

https://hackage.haskell.org/package/arithmoi-0.13.0.0 says "Reverse Dependencies | 22 direct, 7356 indirect". This does not look right: I'm pretty sure that the transitive closure is well below one hundred packages.

@gbaz
Contributor

gbaz commented Jul 29, 2023

Good eye. This has to do with how revdeps are calculated on Hackage, which we could tune. scientific is listed as a direct revdep because it depended on arithmoi at circa version 3.0.0 for a while, but it no longer does. So the revdeps of scientific in turn get added to the transitive closure, and so on.

Arguably we should have some constraint on which version of a package constitutes a "revdep" -- only the most current, perhaps? cc: @ysangkok
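To make the effect concrete, here is a minimal sketch (not Hackage's actual code) of how counting an edge from any historical version inflates the transitive closure compared with counting only each package's latest version. The package names are borrowed from this thread; the versions, the dependency sets, and `my-number-lib` are invented for illustration.

```python
from collections import defaultdict

# Toy index: package -> {version: dependencies declared by that version}.
# Names come from the thread; versions and dependency sets are made up.
INDEX = {
    "arithmoi": {"0.13.0.0": set()},
    "scientific": {"0.1": {"arithmoi"}, "0.3": set()},  # only the old version depends on arithmoi
    "aeson": {"2.2": {"scientific"}},
    "yaml": {"0.11": {"aeson"}},
    "my-number-lib": {"1.0": {"arithmoi"}},  # hypothetical package
}

def version_key(v):
    """Turn '0.13.0.0' into (0, 13, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def reverse_edges(index, latest_only):
    """Map each package to the set of packages that depend on it directly."""
    rev = defaultdict(set)
    for pkg, versions in index.items():
        chosen = [max(versions, key=version_key)] if latest_only else list(versions)
        for v in chosen:
            for dep in versions[v]:
                rev[dep].add(pkg)
    return rev

def transitive_revdeps(index, target, latest_only):
    """Collect every package that reaches target through reverse edges."""
    rev = reverse_edges(index, latest_only)
    seen, stack = set(), [target]
    while stack:
        for parent in rev[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(transitive_revdeps(INDEX, "arithmoi", latest_only=False)))
print(sorted(transitive_revdeps(INDEX, "arithmoi", latest_only=True)))
```

In this toy index the single stale edge from the old `scientific` version pulls `aeson` and `yaml` into the closure, while the latest-only policy leaves just the one package that still depends on `arithmoi`.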

@ysangkok
Member

Note that we already do have a more restrictive listing on https://hackage.haskell.org/package/arithmoi/reverse , where it explicitly filters the revdeps for whether they accept the latest version.
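As a rough illustration of that kind of filter (an assumption about its shape, not the code behind the /reverse page), the sketch below keeps only the revdeps whose declared bounds admit the latest version. Bounds are simplified to a `(lower_inclusive, upper_exclusive)` pair, real Cabal version ranges are richer, and the package names and bounds are hypothetical.

```python
def parse(v):
    """Turn '0.13.0.0' into (0, 13, 0, 0) for numeric comparison."""
    return tuple(int(p) for p in v.split("."))

def accepts(bounds, version):
    """bounds is (lower_inclusive, upper_exclusive); either side may be None."""
    lower, upper = bounds
    v = parse(version)
    if lower is not None and v < parse(lower):
        return False
    if upper is not None and v >= parse(upper):
        return False
    return True

def revdeps_accepting(revdep_bounds, latest):
    """Names of the revdeps whose bounds admit the given version."""
    return sorted(p for p, b in revdep_bounds.items() if accepts(b, latest))

# Hypothetical revdeps of arithmoi with made-up bounds on it.
REVDEP_BOUNDS = {
    "old-client": ("0.2", "0.5"),
    "current-client": ("0.12", "0.14"),
    "unbounded-client": (None, None),
}

print(revdeps_accepting(REVDEP_BOUNDS, "0.13.0.0"))
```

Note that a revdep with no bounds at all passes this filter, which is one of the edge cases such a listing has to decide on.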

@gbaz
Contributor

gbaz commented Jul 30, 2023

How did we decide which page would show which, and also how do we provide a link to that second page? I'm not sure if I recall the discussions over the tradeoffs here...

@ysangkok
Member

I don't think we had a discussion on this. If I remember correctly, the deps that don't accept a specific version are filtered out after getting the closure that is "too large".

The reason that I chose to keep the package page unfiltered is probably because I thought that it would be better to minimize the processing done on the package page. I am not sure whether this choice makes sense, since I haven't actually benchmarked it. I wasn't focusing too much on the UI, since my main goal was to get the rev dep email notifications ready.

As this issue report shows, it is confusing to people when the closure includes old rev deps. The reason it does this might be because a decision was made (before I got involved) to never delete the rev-dep edges. I never really questioned this.

We should think about a new UI for this. I think some decent primitives are already in place. So we should be able to iterate on the UI without too much risk.

I was wondering whether the "drill-down"[0] mode should really be the primary way of interaction, since I personally haven't used this much. Something that would be interesting to see, which we aren't exposing now, is the actual ranges used by the rev deps. That could easily fit in the table as it currently is. But the question again arises of how to pick a version to pull the range from.

There are many edge cases, due to dependencies getting set conditionally and such.

[0]: finding rev-deps, following a link, getting new rev-deps and repeating

@andreasabel
Member

Anecdotally, I found the numbers given for reverse dependencies so wrong that they are at least useless spam if not harmfully misleading. For instance I was interested in the importance of https://hackage.haskell.org/package/unix-compat which is presented as:

Reverse Dependencies
81 direct, 4219 indirect [details]

Oh wow, I thought, this package is super important, the Haskell ecosystem will collapse impromptu if this package ever gets outdated. 4219 dependencies, this is more than what is captured by stackage (3000 packages).
Clicking on "details" I get a long popup with tons of trash that has accumulated on hackage, like https://hackage.haskell.org/package/bugzilla (requiring base < 4.7!).

So, yes, this isn't useful information. How can we improve it?

  1. First, if you publish data, you have to lay open how you came about this data. So, when I click on "details", I want to read there how these numbers "81 direct, 4219 indirect" were computed. Because it is far from obvious what should count as a dependency. Maybe atm it is something like "we ignore all version bounds, assume any condition and any flag setting always to be true (even if logically inconsistent) and treat A as a dependency of B if any version of B mentioned any version of A in any branch of a conditional or flag selection in any cabal file of B published on hackage". This truthful statement, however embarrassing it might be, gives the user an explanation and justification for the data they see.

  2. We should strive for information to be relevant. First, I am not at all interested in all the garbage we have on hackage, and what dependency it has on other garbage. Unfortunately, with the death of the hackage matrix we lost a tool that could help us sort away the garbage. But we could at least let the packages that claim themselves that they are garbage be garbage. E.g. bugzilla claims it is garbage by requiring base < 4.7 so we should not toss it to the user.
    In some sense, stackage has some advantage in presenting data such as reverse dependencies because they can start from a curated set of non-garbage packages.

  3. Maybe a way to shrink the rev deps to a sane amount would be to only consider the latest version of a package. This would fit the main philosophy that only one version (the latest) of a package is of most interest and eventually all packages that depend on it will strive to depend on its latest version.

P.S.: "drill-down" is just good old depth-first graph search, isn't it?

@phadej
Contributor

phadej commented Jul 31, 2023

> claims it is garbage by requiring base < 4.7

That is a somewhat opinionated take. The <6 packages can be considered as garbage as well. In fact I'd say that bugzilla is less garbage, as at least it's honest about not being updated. I.e. just by looking at bounds it's hard to say anything, and you'd punish packages with correct metadata, which are just not updated.

IMHO, packdeps has useful info, e.g. https://packdeps.haskellers.com/reverse/unix-compat or https://packdeps.haskellers.com/reverse/arithmoi though it includes any usages IIRC (and doesn't list which components use stuff), not only in library components.

@ysangkok
Member

ysangkok commented Aug 4, 2023

I think that a first good step might be to

  • Make the info pane consistently show the same info as the /reverse page. Hopefully this won't be too much load, I will have to benchmark this.
  • Include the range of the dependent package like packdeps.haskellers.com.

I won't work on this for a while though. When I get more free time in two weeks, I will start working more on Hackage, starting with the vouching feature.

When I start on this feature, I plan on making mockups first.

@juhp
Contributor

juhp commented Jan 9, 2024

I have to agree with earlier comments: the current results are often really bad.

I thought that acme-everything was also part of the problem? Or was that a wrong hunch?

@ysangkok
Member

ysangkok commented Jan 9, 2024

The main problem is just that all versions are mushed together, so if a library at one point had a dependency, it keeps having it forever.

@juhp
Contributor

juhp commented Jan 10, 2024

I find it hard to believe that even all package versions can account for the vast number of indirect dependents of many packages.

But if you say so I have to believe it I guess, but some seem impossible to me.
Or maybe I didn't understand what "indirect" means.

@ysangkok
Member

@juhp Please show me the concrete example, then we can determine whether it is the same issue, or whether there is a separate issue. It sounds like you're not referring to the arithmoi example.

@juhp
Contributor

juhp commented May 4, 2024

https://hackage.haskell.org/package/http2 [1]

It is exactly the same problem as arithmoi I believe...

I do feel that acme-everything (which has 7533 dependencies) should be filtered out of reverse dependency results anyway.

If one looks on Stackage: only 2 dependents are listed: https://www.stackage.org/nightly-2024-05-04/package/http2-5.0.1/revdeps - just for comparison

[1] I chose http2 because it is apparently the most frequently downloaded package currently, which I suppose is a counterpoint or actually makes it a potentially bad example.

@juhp
Contributor

juhp commented May 4, 2024

But I guess it is not that simple, or it is still a mystery to me anyway. I think we would need to look at specific examples to understand what is going on.

I took the example of Biobase, whose relation to http2 I cannot easily see.
Perhaps there are better examples: I guess one needs to search recursively to check carefully.
Nevertheless the results still seem a bit suspicious/surprising to me.

Anyway even just restricting to revdeps for latest version of a package or using reverse/ as Janus suggests would be a vast improvement: basically "anything" would be better than the status quo (it is also a bit annoying that it is not possible to link directly to the reverse deps pop-up).

(For bonus points it could close with Escape - nice to dream 😃)

@gbaz
Contributor

gbaz commented May 4, 2024

Biobase -> PrimitiveArray -> DPUtils -> streaming-bytestring -> http-client

note that streaming-bytestring only depended on http-client in the 1.0 version.
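A chain like the one above is the kind of answer a small path-finding script could produce. The following is a minimal sketch using breadth-first search over "depends on" edges. The edge table is just the chain quoted in this comment, with any-version edges assumed; a real tool would have to build the table from the Hackage index.

```python
from collections import deque

# Any-version dependency edges, copied from the chain quoted above.
DEPENDS_ON = {
    "Biobase": ["PrimitiveArray"],
    "PrimitiveArray": ["DPUtils"],
    "DPUtils": ["streaming-bytestring"],
    "streaming-bytestring": ["http-client"],  # only in the 1.0 version, per the note above
    "http-client": [],
}

def explain(depends_on, start, target):
    """Return one shortest dependency chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in depends_on.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None

chain = explain(DEPENDS_ON, "Biobase", "http-client")
print(" -> ".join(chain))
```

Breadth-first search is used so that the reported chain is a shortest one, which tends to be the easiest to check by hand.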

@juhp
Contributor

juhp commented Oct 31, 2024

Another example: https://hackage.haskell.org/package/gi-gtk (36 direct, 3592 indirect) 🔥

I guess one should write a tool at this point to understand the results.

@gbaz
Contributor

gbaz commented Oct 31, 2024

[Lots of stuff using the crypto ecosystem] -> [A Huge Crypto Ecosystem] -> crypto-api -> entropy -> jsaddle -> gi-webkit -> gi-gtk

I agree it's not so intuitive or easy to browse, but one can follow the links in the detail view to get there...

In this case, the difference between the results on hackage and stackage is that entropy will use jsaddle only if the compiler is ghcjs. The hackage implementation follows dependencies even when guarded by flags, while it appears the stackage implementation does not.
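The difference between the two behaviours can be sketched as follows. This is a simplified, hypothetical model of a package description, not the Cabal data types or either site's implementation: unconditional dependencies plus `(compiler, deps)` pairs. The `jsaddle` edge guarded by ghcjs is taken from the comment above; the unconditional dependency set is illustrative.

```python
# Hypothetical, simplified package description for entropy.
ENTROPY = {
    "always": {"base", "bytestring"},          # illustrative unconditional deps
    "conditional": [("ghcjs", {"jsaddle"})],   # used only when the compiler is ghcjs
}

def deps_following_all_branches(desc):
    """Union of dependencies from every branch, ignoring the conditions."""
    deps = set(desc["always"])
    for _condition, branch_deps in desc["conditional"]:
        deps |= branch_deps
    return deps

def deps_for_compiler(desc, compiler):
    """Dependencies that apply when building with one specific compiler."""
    deps = set(desc["always"])
    for condition, branch_deps in desc["conditional"]:
        if condition == compiler:
            deps |= branch_deps
    return deps

print(sorted(deps_following_all_branches(ENTROPY)))
print(sorted(deps_for_compiler(ENTROPY, "ghc")))
```

Under the first function `entropy` gains an edge to `jsaddle`, and with it every reverse dependency of `entropy` becomes an indirect reverse dependency of `gi-gtk`; under the second, with `ghc`, that edge disappears.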

@andreasabel
Member

> The hackage implementation follows dependencies even when guarded by flags,

And it does so transitively, I suppose.
And I guess taking all possible releases of a package into account even if they have failed to build for years already.

Since there are so many false positives, I consider the hackage dependency feature not helpful atm.
Unfortunately packdeps is abandoned now, so I fell back to following reverse dependencies on Stackage.

Reverse dependencies within a single snapshot seem to me the more meaningful concept.

The (minimum) quality assurance provided by Stackage is the reason why I slowly learned to embrace stack (even though the common wisdom nowadays seems to be that stack is obsolete with v2-cabal gaining maturity).

@gbaz
Contributor

gbaz commented Nov 2, 2024

There is a large design space and a PR is welcome to tune what is displayed by default (as we calculate enough information to only consider most recent versions as well, for example).

@Bodigrim
Contributor Author

Bodigrim commented Nov 2, 2024

> Unfortunately packdeps is abandoned now, so I fell back to following reverse dependencies on Stackage.

@andreasabel you might be interested to try https://github.com/Bodigrim/hackage-revdeps, it's roughly equivalent to what packdeps was doing.

@juhp
Contributor

juhp commented Nov 3, 2024

I basically also use stackage these days for this. Flora also has sane reverse deps info.

I actually had a look at the Hackage code some days ago, trying to figure out how the implementation could be switched out, but I found it pretty hard to find what I was looking for. A simple useful workaround could be just to link to the other reverse dependencies page instead, but I wasn't able to find the relevant code for it.

I still reckon acme-everything should be excluded too, though it probably wouldn't help as much as I thought.
