Feature: "Port-Impact" #581

Open
tcberner opened this issue Jul 30, 2024 · 16 comments

@tcberner

Moin moin

portmgr had the idea some time ago to try to measure how much impact a given port has.

This could be used to gauge whether a port should be maintained by a group (bus-factor), or give some idea of when to do exp-runs.

A first stab at this could be to count the reverse dependencies of a given port. That would, for example, give higher importance to devel/cmake than to, say, www/firefox. However, the same example shows that this metric alone is not enough, as it gives no importance at all to leaf ports like www/firefox.

A suggestion by dvl was to also consider the "watchers" of a given port on freshports -- which could help give some weight to important leaf-ports.

mfg Tobias
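
A minimal sketch of how the two signals mentioned above (reverse dependencies and FreshPorts watchers) might be combined into one number. The log-scaled weighted sum, the function names, and the two `example/` ports are assumptions made for illustration; nothing here is a formula portmgr has proposed. The devel/gmake figures are the ones reported by the queries in this thread.

```python
import math

# devel/gmake figures are taken from the query output in this thread.
# The two "example/" ports are hypothetical placeholders, not FreshPorts data.
PORTS = {
    "devel/gmake": {"reverse_deps": 8355, "watchers": 738},
    "example/popular-leaf": {"reverse_deps": 0, "watchers": 400},  # hypothetical
    "example/obscure-leaf": {"reverse_deps": 0, "watchers": 2},    # hypothetical
}


def impact_score(reverse_deps, watchers, dep_weight=1.0, watch_weight=1.0):
    """Weighted sum of log-scaled counts (an assumed formula, for illustration)."""
    return dep_weight * math.log1p(reverse_deps) + watch_weight * math.log1p(watchers)


def rank(ports):
    """Return port names ordered from highest to lowest score."""
    return sorted(ports, key=lambda name: impact_score(**ports[name]), reverse=True)


if __name__ == "__main__":
    for name in rank(PORTS):
        print(f"{name:24s} {impact_score(**PORTS[name]):6.2f}")
```

The log scaling is only there so that a port with thousands of reverse dependencies does not completely drown out a leaf port with a few hundred watchers; the weights would need tuning against real data.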

@dlangille
Contributor

Let's take gmake as the first example:

freshports.dvl=# select id, name, category, element_pathname(element_id) from ports_active where name = 'gmake' and category = 'devel';
 id  | name  | category |    element_pathname     
-----+-------+----------+-------------------------
 239 | gmake | devel    | /ports/head/devel/gmake

OK, that's the right port, let's get totals:

freshports.dvl=# select dependency_type, count(*) from port_dependencies where port_id_dependent_upon = (select id from ports_active where name = 'gmake' and category = 'devel')  group by dependency_type order by dependency_type;
 dependency_type | count 
-----------------+-------
 B               |  8286
 P               |     4
 R               |    50
 T               |    15
(4 rows)

@dlangille
Contributor

Based on the above, gmake is:

  • vital to building 8286 ports
  • used to patch 4 ports
  • necessary at run time for 50 ports
  • used to test 15 ports

Overall, it is used by 8355 ports:

freshports.dvl=# select count(*) from port_dependencies where port_id_dependent_upon = (select id from ports_active where name = 'gmake' and category = 'devel');
 count 
-------
  8355
(1 row)
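
As a quick cross-check, the per-type counts from the first query can be summed and compared with the total from the second one. This is only a sketch: the dictionary is copied by hand from the output above, and the helper names are made up for illustration.

```python
# Counts per dependency type for devel/gmake, copied from the query output above.
# The letter meanings follow the summary in this comment.
GMAKE_DEPENDENCIES = {
    "B": 8286,  # build
    "P": 4,     # patch
    "R": 50,    # run
    "T": 15,    # test
}


def total_dependencies(by_type):
    """Sum the rows across all dependency types."""
    return sum(by_type.values())


def share_by_type(by_type):
    """Fraction of all dependency rows that each type accounts for."""
    total = total_dependencies(by_type)
    return {kind: count / total for kind, count in by_type.items()}


if __name__ == "__main__":
    print(total_dependencies(GMAKE_DEPENDENCIES))
    for kind, share in share_by_type(GMAKE_DEPENDENCIES).items():
        print(f"{kind}: {share:.2%}")
```

Note that both figures count rows in port_dependencies. If a port depends on gmake in more than one way (say, build and run), it may be counted once per type; the output above does not say whether that happens.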

@dlangille
Contributor

The top 20 ports:

freshports.dvl=# select getport(port_id_dependent_upon), count(*) from port_dependencies group by port_id_dependent_upon order by count(*) desc limit 20;
              getport              | count 
-----------------------------------+-------
 /ports/head/lang/python39         | 13758
 /ports/head/devel/ruby-gems       | 11363
 /ports/head/lang/perl5.32         | 11223
 /ports/head/devel/py-setuptools   | 10132
 /ports/head/devel/gmake           |  8355
 /ports/head/devel/pkgconf         |  8086
 /ports/head/devel/gettext-runtime |  6594
 /ports/head/lang/ruby30           |  5676
 /ports/head/x11/libX11            |  5515
 /ports/head/lang/ruby32           |  5179
 /ports/head/devel/ninja           |  4358
 /ports/head/lang/python27         |  4267
 /ports/head/lang/python311        |  4214
 /ports/head/devel/glib20          |  3418
 /ports/head/devel/cmake-core      |  3289
 /ports/head/devel/gettext-tools   |  3242
 /ports/head/devel/autoconf        |  2911
 /ports/head/lang/perl5.36         |  2642
 /ports/head/x11/libXext           |  2530
 /ports/head/x11-toolkits/pango    |  2421
(20 rows)

freshports.dvl=# 

@dlangille
Contributor

dlangille commented Jul 30, 2024

[20:07 pg03 dvl ~] % echo 'select getport(port_id_dependent_upon), count(*) from port_dependencies group by port_id_dependent_upon order by count(*) desc' | psql freshports.dvl > popular
[20:07 pg03 dvl ~] % wc -l popular
   26837 popular

The full output is at https://gist.github.com/dlangille/9f95843f5d49d44b670497ee0a0fd81d

WARNING: 3.43M

@dlangille
Contributor

Issues:

  • deleted ports
  • ports on branches

@dlangille
Contributor

This output features active ports only

[20:14 pg03 dvl ~] % echo 'select getport(PD.port_id_dependent_upon), count(*) from port_dependencies PD join ports_active PA on PA.id = PD.port_id_dependent_upon  group by port_id_dependent_upon order by count(*) desc' | psql freshports.dvl > popular 
[20:14 pg03 dvl ~] % wc -l popular
   15733 popular

Output at:

https://gist.github.com/dlangille/a22b87bcb44126e118c4304d185fe1c4

@dlangille
Contributor

We can consider the top-20 most watched ports:

freshports.dvl=# select element_pathname(WLE.element_id), count(*) from watch_list_element WLE join ports_active PA on WLE.element_id = PA.element_id group by WLE.element_id order by count(*) desc limit 20;
         element_pathname         | count 
----------------------------------+-------
 /ports/head/devel/gmake          |   738
 /ports/head/converters/libiconv  |   737
 /ports/head/devel/gettext        |   721
 /ports/head/textproc/expat2      |   715
 /ports/head/print/freetype2      |   676
 /ports/head/graphics/png         |   676
 /ports/head/devel/m4             |   674
 /ports/head/archivers/unzip      |   633
 /ports/head/textproc/libxml2     |   576
 /ports/head/devel/pcre           |   573
 /ports/head/graphics/tiff        |   545
 /ports/head/lang/python          |   543
 /ports/head/misc/help2man        |   518
 /ports/head/ftp/wget             |   514
 /ports/head/devel/bison          |   502
 /ports/head/security/nmap        |   494
 /ports/head/security/sudo        |   492
 /ports/head/devel/popt           |   491
 /ports/head/x11-fonts/fontconfig |   487
 /ports/head/mail/postfix         |   466
(20 rows)

@dlangille
Contributor

dlangille commented Aug 13, 2024

Things to do:

  • ignore branches on dependent ports (FreshPorts only has a port on a branch if a commit has occurred for that port on that branch - a FreshPorts branch is not like a repo branch)
  • add maintainer
  • restrict list to those with >= 600 dependencies
  • drop /ports/head from the output
  • upload to freefall
  • refresh the list monthly
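
A rough sketch of the filtering steps in the list above, applied to rows after they leave the database. The row layout `(pathname, maintainer, count)`, the function name, the branch path, the maintainer address, and the below-threshold row are all assumptions for illustration; only the python39, gmake and pango counts come from the top-20 output earlier in this thread.

```python
PREFIX = "/ports/head/"
MIN_DEPENDENCIES = 600  # threshold from the to-do list


def impact_list(rows, minimum=MIN_DEPENDENCIES):
    """Filter and tidy (pathname, maintainer, count) rows.

    Keeps only ports under /ports/head/, drops that prefix from the name,
    keeps rows with at least `minimum` dependencies, and sorts by count.
    """
    kept = []
    for pathname, maintainer, count in rows:
        if not pathname.startswith(PREFIX):
            continue  # not on head (for example, a port on a branch)
        if count < minimum:
            continue
        kept.append((pathname[len(PREFIX):], maintainer, count))
    return sorted(kept, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    placeholder = "maintainer@example.org"  # hypothetical address
    rows = [
        ("/ports/head/devel/gmake", placeholder, 8355),
        ("/ports/head/lang/python39", placeholder, 13758),
        ("/ports/head/x11-toolkits/pango", placeholder, 2421),
        ("/ports/head/example/small-port", placeholder, 42),         # hypothetical row
        ("/ports/branches/2024Q3/devel/gmake", placeholder, 900),    # hypothetical path
    ]
    for name, maintainer, count in impact_list(rows):
        print(f"{name:24s} {maintainer:24s} {count:6d}")
```

This only filters on the path of the listed port. Excluding dependents that themselves live on a branch would need the dependent port's path as well, which this sketch does not model.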

@dlangille
Contributor

dlangille commented Aug 14, 2024

Here is the new approach. This query takes < 20ms to run.

[11:53 pg03 dvl ~] % echo " with PDC as (     
select PD.port_id_dependent_upon as port_id, count(*) as count
from port_dependencies PD
group by PD.port_id_dependent_upon )

select split_part(EP.pathname, '/ports/head/', 2) as name, P.maintainer, count
  FROM ports P join PDC on P.id = PDC.port_id
               JOIN element_pathname EP ON P.element_id = EP.element_id
                     and EP.pathname like '/ports/head/%'
group by name, maintainer, count
having count > 500
ORDER BY count desc; " | psql freshports.dvl > ports.txt

@dlangille
Contributor

What it looks like:

              name              |       maintainer       | count 
--------------------------------+------------------------+-------
 lang/python39                  | [email protected]     | 13427
 devel/ruby-gems                | [email protected]       | 11384

@dlangille
Contributor

If this proves useful, it can be automated to update on a regular basis.

@grahamperrin
Contributor

… try to measure how much impact

For what it's worth (not to complicate this issue), it might be useful to treat:

  • devel/electron30 as medium/high impact but "not measurably so" (or words to that effect)

– for as long as version 30 will be required to build Signal net-im/signal-desktop, which has some passionate users.


There might be any number of other cases that are not easily measurable in a way that corresponds with reported effects on end users. Signal comes to mind only because I'm aware of things being relatively noisy in and around package infrastructure bug 270565, where (understandably) no more than one version of Electron is built, at this time.

@dlangille
Contributor

How do we code that without special casing it? Is there something in the port we can detect?

@grahamperrin
Contributor

I'm no expert, but I can't think of anything detectable.

∑ (watch list counts) are low: 2 for electron30, 11 for signal-desktop.

@mirror176

mirror176 commented Sep 10, 2024

If the pkg servers keep tallies of how many downloads each package gets, that would be more useful, but a 'proper' list would need other data or exceptions, as not all ports are permitted to be distributed by public pkg repos. If any ports have public download counts for their master_sites, those could be turned into a metric, but that sounds about as difficult as automatically checking which program versions are available from the original source. Total count and relative count would be different but both relevant values. They are separate because some downloads are hosted through 3rd parties, unofficial mirrors/sites, p2p, etc., where download counts are not maintained, so each may have relevance.

Another freshports metric that could be gathered would be page views per port. Its value is debatable: I'm sure I've used ports whose freshports entry I never visited, and sometimes I view a page but never install the port.

Watchlist still seems like a better count, as it means someone took the time to say 'I care about this port'; my main fear is that the logged-in freshports userbase is too small. An automated way to transfer a list of non-automatic packages from a system to freshports would help make that more complete and up to date, but it would benefit further from dependencies being counted too, to try to follow flavors and build option differences. Similarly, I've thought there could be value in tracking build options, just to get a user count for different port options and the user bases that form around specific divisions. Obviously, gathering any of these metrics has difficulties, since users' security and privacy are impacted when program installation records are tracked and shared.

Originally I was expecting this to be more about build time and resources than about userbase. Poudriere build runs log time, but that can vary depending on the hardware it runs on and whether compiler caching helped, and for local builders there are questions of how much RAM there is, what is in RAM vs. on disk, what the disk speed is, etc. It will be great if such metrics also come into play, but they would be even more valuable in the ports tree itself, so that a builder like poudriere can set how many of which jobs to run, in what order and when (preemptively), for downloading, extracting, compiling, packaging, etc. to better utilize resources. These are considerations I had when I thought about writing a competitor/replacement before poudriere existed, though I never got anywhere with useful code. A 'basic' runtime of the port and a summary of its dependencies' build times may be interesting, but I don't know how useful it will be once differences are considered, specifically CPU+threads, RAM, and cache; it could be attached to, or laid out similarly to, the latest version table.
