meta_cache_list() - empty published column for recent packages #109

Closed
pawelru opened this issue Apr 19, 2024 · 4 comments

@pawelru
Contributor

pawelru commented Apr 19, 2024

Sys.Date()
#> [1] "2024-04-19"

library(pkgcache)
meta_cache_update()
#> 
#> ℹ Updating metadata database
#> ✔ Updating metadata database ... done
#> 
max(meta_cache_list()$published, na.rm = TRUE) # note the diff from today!
#> [1] "2024-04-09 16:50:05 GMT"

# an example - package `tensorflow` released on 15th of Apr
library(rvest)
read_html("https://cran.r-project.org/web/packages/tensorflow/index.html") |> 
    html_element("table") |> 
    html_table() |> 
    head(x = _, 5)
#> # A tibble: 5 × 2
#>   X1         X2                                                                 
#>   <chr>      <chr>                                                              
#> 1 Version:   "2.16.0"                                                           
#> 2 Depends:   "R (≥ 3.6)"                                                        
#> 3 Imports:   "config, processx, reticulate (≥ 1.32), tfruns (≥ 1.0), utils, yam…
#> 4 Suggests:  "testthat (≥ 2.1.0), keras3, pillar, withr, callr"                 
#> 5 Published: "2024-04-15"

meta_cache_list(packages = "tensorflow")[, c("package", "version", "published")]
#> # A data frame: 2 × 3
#>   package    version published
#> * <chr>      <chr>   <dttm>   
#> 1 tensorflow 2.16.0  NA       
#> 2 tensorflow 2.16.0  NA

Created on 2024-04-19 with reprex v2.1.0

Is this a bug? What can I do to force an update of the cache? I am analysing CRAN data, and the release / publication date is one of my inputs.

@gaborcsardi
Member

That column is from metadata that is not on CRAN and we need to collect it separately. Unfortunately I had to shut down the infrastructure that collects it, so it hasn't been updated for a couple of days.

The metadata itself is now here: https://github.com/r-hub/cran-metadata/tree/gh-pages but until I write the code that updates it, it won't be updated. The old update code used a local CRAN mirror, which I don't have any more, so we need a completely new way of updating.

The published field is actually easy, so maybe I'll do that first. The hard ones are the hashes: for those I need to download the package files, and Windows binaries are rebuilt all the time, so that's potentially a lot of downloads.

Anyway, I wasn't aware of any use for that metadata, apart from pak printing the file sizes, so opening this issue was a good idea.

@pawelru
Contributor Author

pawelru commented Apr 19, 2024

Thanks @gaborcsardi for the prompt reply. I'll have a look at what you linked and consider it as an alternative to scraping this from the CRAN web page with rvest. I'm definitely looking forward to this being brought back. The pkgcache API is so convenient for my use case. As far as I'm concerned, I don't use hashes at all, so if they are the biggest piece of work, they can definitely be postponed.

@gaborcsardi
Member

No need to scrape this field; you can also do something like:

db <- tools::CRAN_package_db()
db$`Date/Publication`
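
As a rough sketch of how that could be narrowed down to specific packages: the helper name `published_dates` below is made up for illustration and is not part of pkgcache or tools. It assumes the data frame has `Package`, `Version` and `Date/Publication` columns, and that the timestamps are strings such as "2024-04-15 20:40:02 UTC".

```r
# Hypothetical helper (not part of pkgcache or tools): look up publication
# timestamps in a data frame shaped like tools::CRAN_package_db().
# Assumes columns `Package`, `Version` and `Date/Publication`.
published_dates <- function(db, packages) {
  rows <- db[db$Package %in% packages, c("Package", "Version", "Date/Publication")]
  # Parse the timestamp strings; UTC is an assumption about the time zone.
  rows$published <- as.POSIXct(rows[["Date/Publication"]], tz = "UTC")
  rows[, c("Package", "Version", "published")]
}

# Usage (needs network access, so it is left commented out):
# db <- tools::CRAN_package_db()
# published_dates(db, "tensorflow")
```

The result keeps one row per matching package, so it can be compared against the `published` column of `meta_cache_list()`.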

@gaborcsardi
Member

This is finally fixed in pkgcache as well; the metadata is now updated daily. We could probably update it more often if we wanted to.

The only caveat is that updates are now based on package name and R version, so rebuilds of binaries are not picked up. Source packages should not be rebuilt, so this should not affect them.

I think it would be possible to update the binaries without keeping a CRAN mirror. I'll open an issue for that: #119.
