Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single release CMake file that contains all required links/hashes. #4631

Merged
merged 61 commits into from
Mar 4, 2024

Conversation

dudoslav
Copy link
Collaborator

@dudoslav dudoslav commented Jan 15, 2024

This PR aims to autogenerate cmake file containing all required hash/url combinations so that other repositories can simply pull this file and forget about changing all url/hash occurences in their code. The output assets look like this now:

https://github.com/dudoslav/TileDB/releases/tag/t01

Please let me know if the produced CMake file is ok, or should be changed to a different format.


TYPE: BUILD
DESC: Single release CMake file that contains all required links/hashes.

@dudoslav dudoslav requested a review from ihnorton January 15, 2024 11:46
@dudoslav dudoslav self-assigned this Jan 15, 2024
Copy link

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a great start that will simplify updates for downstream packages.

I am hesitant to have a tight coupling between all of FindTileDB_EP.cmake and downstream packages. For now I think we should just provide the capability to get TileDB from github releases, but not aim to have users replace the whole FindTileDB_EP.cmake (in case there are package-specific modifications, etc.)

I think we should limit this to a .cmake file with:

  1. package URLs and SHAs
  2. create a CMake function for the code here, and include it in the file. Downstream packages can call the function inside of their own FindTileDB_EP.cmake.

See comment inline about using cmake configure_file to create this from a .in template.

@@ -0,0 +1,141 @@
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this TileDB-<version>-packages.cmake.in, and use cmake configure file to do the substitution: https://cmake.org/cmake/help/latest/command/configure_file.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I might not understand properly what you are trying to achieve. configure_file(...) can be called from within another CMake file, so we would need something like generate.cmake and TileDB-<version>-packages.cmake.in that would be used to generate TileDB-<version>-packages.cmake. In any case we still need to substitute all values in BASH script since it has the whole list-regex logic. Is this what you had in mind?:

  • GitHub CI -> creates generate.cmake with all required values already in it.
  • generate.cmake uses these values to configure_file TileDB-.....cmake.in (This could have been done by CI in the first step)
  • upload the produced TileDB-...cmake with its SHA to release (We don't need to upload generate.cmake)

Do I understand it correctly?
If not could you perhaps explain your idea a bit more?

@eddelbuettel
Copy link
Contributor

Love this. Having the information in one place (by release) is very good. But may I raise my hand and ask for a (complementary) non-cmake format? Maybe JSON? Maybe key:value pairs? R does use cmake and while I could shimmy it (with help) a simpler approach would be good too.

@dudoslav
Copy link
Collaborator Author

dudoslav commented Jan 15, 2024

I have no problem also adding a json format, however, do you really need release build outside cmake?

I can read JSON from CMAKE without any issues since I can use string(JSON ...) functions

@ihnorton
Copy link
Member

Yes, good point @eddelbuettel

I have no problem also adding a json format, however, do you really need release build outside cmake?

Yes, these builds are used by TileDB-R outside of CMake.

@eddelbuettel
Copy link
Contributor

To be perfectly plain, I would prefer non-JSON and reading JSON happens to not be batteries included. I currently use the 'Debian Control File' format R itself uses in its DESCRIPTION file, for that we have a ready-made parser included (read.dcf() that is):

https://github.com/TileDB-Inc/TileDB-R/blob/e17563a554921d952ac473d5e49b65888a6a035a/tools/tiledbVersion.txt#L1-L2

Of course, other formats work but at the cost of added dependencies. So JSON, YAML, TOML, ... are all equally (in)convenient with the need for other library.

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Jan 15, 2024

do you really need release build outside cmake?

Yeah. Let's plan on a quick huddle this week and I show you a quick demo and more resources. In essence, R is driven by e.g. R CMD INSTALL nameOfArchve_x.y.z.tar.gz and invokes just make and other standard tools (via pre-made added Makefile snippets). One can farm out to cmake and other callable tools but it counts as extra dependency. We currently only depend on cmake if a build from (full, ie TileDB) source is requested else R and shell etc lead to the download of an artifact.


function(DOWNLOAD_TILEDB FROM_SOURCE)
# Available builds
set(URL_TILEDB_LINUX_X86_64 "{{URL_TILEDB_LINUX_X86_64}}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set these in PARENT_SCOPE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to pollute the parent scope with these variables, since noone else needs them

Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick comments, I will need some time to think of this design.

ExternalProject_Add(ep_tiledb
PREFIX "externals"
URL "${URL_TILEDB_SOURCE}"
URL_HASH SHA1=${SHA_TILEDB_SOURCE}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHA-1 is insecure; we should be using at least SHA-256.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I would like to change it if everyone is OK with it

CMAKE_ARGS
-DCMAKE_INSTALL_PREFIX=${EP_INSTALL_PREFIX}
-DCMAKE_PREFIX_PATH=${EP_INSTALL_PREFIX}
-DTILEDB_S3=${TILEDB_S3}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be passing more options here. The available set of options is at BuildOptions.cmake. I'm not saying to pass all of them, but the user should be able to set some more like Azure, GCS and WebP.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add these options to function's arguments

@dudoslav
Copy link
Collaborator Author

Ok so my next plan is to:

  1. Create CSV file of all releases + source
  2. Change sha1 to sha256 if its ok with everyone (because it might create more problems downstream)
  3. Modify cmake to use this csv file
  4. Divide cmake function into download_prebuilt and download_source

@dudoslav
Copy link
Collaborator Author

This might be a format that could be compatible with both R and CMake:

(base) dudoslav@beetle:~$ cat rellist.csv
os,arch,url,sha,noavx2
LINUX,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,
LINUX,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,NOAVX2
WINDOWS,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,

@teo-tsirpanis
Copy link
Member

DESC: Look above

The item in the DESC field will be included in the release notes. You should set it to something meaningful like Add redistributable CMake file that downloads prebuilt binaries from GitHub Releases.

@teo-tsirpanis
Copy link
Member

teo-tsirpanis commented Jan 16, 2024

Here's how I am imagining this. Each release will have a JSON (or another format like CSV but I would prefer JSON IMHO) file named say artifacts.json that will contain information about the available release artifacts. Here's an example for 2.19.0:

{
    // We could add this to help non-vcpkg builds from source.
    "git-ref": "fa30a88abc5504ed8387e25937744afcd3802232",
    "artifacts": {
        "x64-windows": {
            "filename": "tiledb-windows-x86_64-2.19.0-fa30a88a.zip",
            "sha512": ""
        },
        "x64-linux": {
            "filename": "tiledb-linux-x86_64-2.19.0-fa30a88a.zip",
            "sha512": ""
        },
        "x64-linux-noavx2": {
            "filename": "tiledb-linux-x86_64-noavx2-2.19.0-fa30a88a.zip",
            "sha512": ""
        },
        "x64-macos": {
            "filename": "tiledb-macos-x86_64-2.19.0-fa30a88a.zip",
            "sha512": ""
        },
        "arm64-macos": {
            "filename": "tiledb-macos-arm64-2.19.0-fa30a88a.zip",
            "sha512": ""
        }
    }
}

By reading from the JSON file, the CMake script can be version-agnostic and be checked in once for every downstream repository. Here's how the script could be used:

include(DownloadPrebuiltTileDB)

# Downloads prebuilt binaries of TileDB from GitHub
# Releases and calls find_package to import them.
find_prebuilt_tiledb(
  # The version of TileDB to use. Required.
  VERSION 2.19.0
  # The hash of the artifacts.json file. Optional, will warn if not
  # specified, maybe suggesting to at least enable CMAKE_TLS_VERIFY.
  MANIFEST_SHA512 …
  # The name of the release artifact to download.
  # Optional, will try to autodetect if not specified.
  ARTIFACT_NAME x64-linux-noavx2
  # Flags that will also be passed to the find_package call.
  REQUIRED QUIET
)

target_link_libraries(my_project PRIVATE TileDB::tiledb)

Regarding building from source I am wondering if it would be a better idea to have this CMake script exclusively deal with downloading prebuilt binaries, and direct users that want more customizability to vcpkg (after we create a tiledb port, which can be done at any time). Scoping this out will significantly simplify the script.

@dudoslav
Copy link
Collaborator Author

@teo-tsirpanis I really like your suggestion, but there is a requirement from the R team for a format that could be easily included without dependencies. So JSON is somewhat out of question. Other than that, the CMake script will contain two functions:

fetch_release_list that downloads and parses the release list (rellist.csv)
download_source_tiledb that downloads and build source tiledb
download_prebuilt_tiledb that downloads prebuilt tiledb

The question is that if we don't need download_source_tiledb then I could get rid of fetch_release_list function. However, are we really sure we don't need that source tiledb distribution?

@eddelbuettel
Copy link
Contributor

The R build would need to load another package just to parse JSON, there is no built-in reader. Csv and alike work fine.

Another idea, because we are not short those few bytes, would be to store both sha1 and sha256 checksums.

@dudoslav
Copy link
Collaborator Author

Also, I am still not sure where to put this download logic, since it will not change between releases. Only the rellist changes. Maybe if I would hardcode the latest version into it it could be also distributed as release asset otherwise I am not sure what to do with it

@dudoslav
Copy link
Collaborator Author

dudoslav commented Jan 16, 2024

Ok, so what do you guys think about the latest changes and mostly about the DownloadPrebuiltTileDB.cmake file? I removed the source distribution logic as requested and add parameters to that CMake script.

Edit 1.: As always the produced CSV can be found here: https://github.com/dudoslav/TileDB/releases/download/t01/rellist.csv

Edit 2: Fixed link

Edit 2: Change link directly to CSV

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Jan 16, 2024

As always the produced CSV can be found here:

That's a cmake file, not a csv file.

Edit two weeks later: Not sure what I was up to in that comment. Hm.

@dudoslav dudoslav force-pushed the db/sc-17775/release_cmake branch from 4a40e0c to 705260d Compare February 5, 2024 14:20
@dudoslav
Copy link
Collaborator Author

dudoslav commented Feb 5, 2024

UPDATE: I had to rebase this branch on top of another branch that add CPACK releases, since those ones actually generate sha256 hashes. This means that this PR should be merged after that change: #4694

Here are proposed release assets:
https://github.com/dudoslav/TileDB/releases/tag/t03

Copy link
Member

@ihnorton ihnorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

detect_artifact_name(FETCH_PREBUILT_TILEDB_ARTIFACT_NAME)
endif()

ExternalProject_Add(ep_tiledb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use FetchContent or plain downloading?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this PR should rework this download logic from the original approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that ExternalProject might force downstream projects to use a superbuild architecture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExternalProject

Is this something we should address now in your opinion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the issue is that they still use it right? Don't we want to switch to vcpkg style dependencies? (ports)

Copy link
Member

@teo-tsirpanis teo-tsirpanis Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the plan is that all first-party downstream projects eventually get TileDB from vcpkg.

We might want to change ExternalProject later, but it will be a breaking change which might or might not be OK to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought the use of ExternalProject in this case will not require a superbuild project because we are using pre-built artifacts. I am not opposed to ExternalProject then. There is still the question of what would happen with using this script to download source artifacts.

Comment on lines 59 to 66
list(GET LINE 0 OS)
list(GET LINE 1 ARCH)
list(GET LINE 2 URL)
list(GET LINE 3 SHA)

if(${LENGTH} GREATER 4)
list(GET LINE 4 NOAVX2)
endif()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have some reservations on how good this schema is. A more scalable and clean solution would be a PLATFORM,URL,SHA, where PLATFORM would be a string like x64-windows, x64-linux-noavx2, or source.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem reworking it that way, I assume that it would work for everyone.

@KiterLuc KiterLuc changed the title Single release CMake file that contains all required links/hashes Single release CMake file that contains all required links/hashes. Feb 28, 2024
@@ -74,7 +74,7 @@ endfunction()

function(detect_artifact_name OUT_VAR)
if (WIN32) # Windows
SET(${OUT_VAR} TILEDB_WINDOWS_X86_64 PARENT_SCOPE)
SET(${OUT_VAR} TILEDB_WINDOWS-X86_64 PARENT_SCOPE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fail on other architectures. Same in Linux.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that CMAKE can only define these three variables: WIN32/APPLE/LINUX

@dudoslav
Copy link
Collaborator Author

dudoslav commented Mar 1, 2024

I changed download logic to use FetchContent and add ReleaseListGeneration into Release workflow. This is the resulting artifacts:

https://github.com/dudoslav/TileDB/releases/tag/t09

And this is the pipeline that produced it:

https://github.com/dudoslav/TileDB/actions/runs/8110756605

I also downloaded DownloadPrebuiltTileDB.cmake and tested in local dummy project.

@KiterLuc KiterLuc merged commit 79030b1 into TileDB-Inc:dev Mar 4, 2024
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants