Add workflow to detect and auto-launch CLI when a repo has changes #16

Closed
15 of 19 tasks
alyssadai opened this issue Jul 21, 2023 · 2 comments · Fixed by #28
alyssadai commented Jul 21, 2023

We need a process that listens for changes in any of our forks of the OpenNeuroDatasets in this org and provides the name of the repo that has changed (e.g., ds004400, which is the same as the dataset ID).

This repo name will serve as the input to another process that will get the data and run the CLI (this is currently a script that lives in another repo).

Steps to implement

  • Create workflow that runs on cronjob and compares the SHA of the latest commit of each repo (testing on first 5 for now) to SHAs in an existing file
    • If SHA is different, run CLI and upload JSONLD as artifacts
    • If repo not found in existing file, write SHA to file
    • Commit the updated SHA record file
  • Uncomment part of workflow that makes it run over all repos
  • Run workflow once on all repos to populate sha.txt
  • (Manually) create a branch auto-upload-jsonld in https://github.com/neurobagel/openneuro-annotations
    • as part of first push, can create a directory jsonld/ w/ .gitkeep
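
A minimal sketch of the SHA comparison described in the steps above. The record format (one `repo sha` pair per line) and all function names are assumptions for illustration, not the actual layout of sha.txt or the workflow code. The latest SHAs are passed in as a plain dict; in a real workflow they could come from something like `git ls-remote <repo-url> HEAD`.

```python
def parse_sha_record(text):
    """Parse 'repo sha' lines into a dict (assumed, hypothetical file format)."""
    record = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            record[parts[0]] = parts[1]
    return record


def compare_shas(record, latest):
    """Compare recorded SHAs against the latest SHA of each repo.

    Returns (changed, new, updated_record):
      changed: repos whose recorded SHA differs from the latest one
      new:     repos that are not in the record yet
    """
    changed = sorted(
        repo for repo, sha in latest.items()
        if repo in record and record[repo] != sha
    )
    new = sorted(repo for repo in latest if repo not in record)
    # The updated record keeps old entries and overwrites/adds the latest SHAs.
    updated = {**record, **latest}
    return changed, new, updated


def format_sha_record(record):
    """Serialize the record back to 'repo sha' lines, sorted by repo name."""
    return "".join(f"{repo} {sha}\n" for repo, sha in sorted(record.items()))
```

Here `changed` would be the list of dataset IDs handed to the CLI step, while `new` repos only get their SHA written to the record, which is one reading of the steps above.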

To ensure we can write to openneuro-annotations from a different org:

  • create PAT w/ RW permissions to that external repo and add here

  • Update workflow to:

    • first clear the data/ directory on the branch (this ensures that no outdated JSONLD files stick around)
    • batch-commit all new JSONLD files to the data/ directory on the auto-upload-jsonld branch (ideally this is part of the same commit as the data/ directory clearing, so we can get a diff of all the files that have changed/not)
    • this is after the CLI has finished running on all updated repos
      • NOTE: b/c each CLI run produces different UUIDs, there will always be a diff for JSONLDs that are regenerated successfully
        • Similarly, b/c the log includes the CLI image downloading, the layers have unique IDs across runs and so the log file will also always have diffs in each commit to the branch
    • Remove or comment out section to upload the JSONLDs as artifacts
  • refactor code to run CLI into reusable workflow, which accepts a list of dataset IDs as input

  • create a workflow that only runs on workflow_dispatch which supplies the full list of repos by default

  • See if we can do remaining steps using normal git commands: https://stackoverflow.com/questions/62960533/how-to-use-git-commands-during-a-github-action
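
A rough sketch of the "clear data/, then batch-commit" step using plain git commands. The helper names are hypothetical, and it assumes the freshly generated JSONLD files live outside the destination directory. Staging with `git add --all` after the refresh puts the deletions and the new files into the same commit, so the diff shows everything that changed.

```python
import shutil
from pathlib import Path


def refresh_output_dir(dest, new_files):
    """Empty dest, then copy new_files into it; return the copied file names.

    Assumes new_files are located outside dest (otherwise they would be
    deleted together with the old contents).
    """
    dest = Path(dest)
    if dest.exists():
        shutil.rmtree(dest)
    dest.mkdir(parents=True)
    copied = []
    for src in map(Path, new_files):
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return sorted(copied)


def git_commands(branch, message, path="data"):
    """Git commands that stage removals and additions together and push once."""
    return [
        ["git", "add", "--all", path],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", f"HEAD:{branch}"],
    ]
```

Each command list could be passed to `subprocess.run(cmd, check=True)` inside a checkout of the auto-upload-jsonld branch.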

Nice to haves

  • Do not make empty commit if sha.txt has not changed
    • Since we switched to normal git commands for pushing files, this should not happen anymore
  • Maybe parallelize?
  • Maybe only check most recently modified repos first?

Questions for future

  • Do we want to have the CLI runner only triggered by specific changes, i.e. to participants.json/tsv? (there is a "Git worker" that updates the ON datasets fairly regularly, but not necessarily to contents we're interested in)
  • Otherwise, should we only have PR/commits to the target JSONLD destination when there are changes to the JSONLD contents?
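
If we do go with the first option, one possible shape for the check is sketched below. It assumes we can get the list of changed paths between the recorded and latest SHA (e.g. from `git diff --name-only <old_sha> <new_sha>`) and that the files of interest sit at the dataset root. The function and constant names are made up for illustration.

```python
# Root-level files whose changes should trigger a CLI run (assumption).
RELEVANT_PATHS = {"participants.json", "participants.tsv"}


def touches_participants(changed_paths):
    """Return True if any changed path is one of the root-level participants files."""
    return any(path in RELEVANT_PATHS for path in changed_paths)
```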
@alyssadai

@wizofe tagging you here since you have kindly taken a stab at implementing this! 🙌

@alyssadai alyssadai changed the title [FEAT] Add workflow to detect (+ eventually auto-launch some script) when a dataset has changes Add workflow to detect (+ eventually auto-launch some script) when a dataset has changes Aug 9, 2023
@alyssadai alyssadai changed the title Add workflow to detect (+ eventually auto-launch some script) when a dataset has changes Add workflow to detect and auto-launch CLI when a repo has changes May 6, 2024
@alyssadai

  • Can stop as soon as updated files end up in repo
  • Want to ensure any outdated files are removed from existing repo
