Write an actual worker script #2

Open
mguaypaq opened this issue Dec 22, 2022 · 4 comments · May be fixed by #5

Comments

@mguaypaq
Member

One that:

  1. creates a temp directory;
  2. checks out the given commit;
  3. git-annex gets the data files;
  4. runs bids-validator;
  5. runs the output through ansifilter --html --fragment --line-numbers --anchors=self;
  6. spits out a minimal HTML header and footer for the fragment;
  7. cleans up the temp directory.

See the current worker script for a skeleton.
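
The steps listed above could be sketched roughly as follows. This is only an illustration, not the project's actual worker: the function names, the argument order (repository path, then commit), and the HTML wrapper are assumptions, and it assumes the commit is reachable from a branch that git clone fetches. The exact bids-validator invocation may also need adjusting.

```shell
# Hypothetical worker sketch; function names and arguments are illustrative.

# Wrap an HTML fragment read from stdin in a minimal header and footer.
html_wrap() {
  printf '%s\n' '<!DOCTYPE html>' \
    '<html><head><meta charset="utf-8"><title>bids-validator</title></head><body>'
  cat
  printf '%s\n' '</body></html>'
}

# run_worker REPO COMMIT: validate COMMIT of the repository at REPO.
# The body runs in a subshell so that set -eu and the trap stay local.
run_worker() (
  set -eu
  repo="$1"
  commit="$2"
  workdir="$(mktemp -d)"                        # 1. temp directory
  # 7. clean up on exit; annexed files may be read-only, so make them writable first
  trap 'chmod -R +w "$workdir" 2>/dev/null; rm -rf "$workdir"' EXIT
  git clone --quiet --no-checkout "$repo" "$workdir/dataset"
  cd "$workdir/dataset"
  git checkout --quiet --detach "$commit"       # 2. check out the given commit
  git annex init >/dev/null                     # 3. fetch the annexed data files
  git annex get >/dev/null
  # 4.-6. validate, turn ANSI colours into an HTML fragment, wrap the fragment
  { bids-validator . 2>&1 || true; } |
    ansifilter --html --fragment --line-numbers --anchors=self |
    html_wrap
)
```

A call might look like run_worker /path/to/repo.git "$COMMIT" > report.html. The validator's exit status is deliberately ignored here so that a failing dataset still produces a report.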

@kousu
Member

kousu commented Feb 17, 2023

We should try to make the test work with plain git and with git-lfs as well. The script needs to detect, somehow, whether git-annex is in use before running git annex init; or maybe enough settings would make that unnecessary. We also must set remote.origin.annex-readonly true.
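
One possible way to do the detection, as a sketch only: the helper names are hypothetical, and the heuristics (a git-annex branch on origin means git-annex; a filter=lfs rule in the top-level .gitattributes means git-lfs) are assumptions that may miss unusual setups.

```shell
# Hypothetical helper: print "annex", "lfs" or "plain" for the clone
# in the current directory. The heuristics are assumptions, see above.
detect_storage() {
  if git show-ref --verify --quiet refs/remotes/origin/git-annex 2>/dev/null; then
    echo annex
  elif [ -f .gitattributes ] && grep -q 'filter=lfs' .gitattributes; then
    echo lfs
  else
    echo plain
  fi
}

# Fetch the large files with whichever tool the repository uses.
fetch_data() {
  case "$(detect_storage)" in
    annex)
      # never let git-annex write back to the source repository
      git config remote.origin.annex-readonly true
      git annex init
      git annex get
      ;;
    lfs)
      git lfs pull
      ;;
    plain)
      : # nothing extra to fetch
      ;;
  esac
}
```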

@kousu
Member

kousu commented Feb 17, 2023

I've done some preliminary explorations tonight to optimize git annex get.

I noticed that gitea has a folder data/tmp/. So if we cd data/tmp/bids-hook/$(mktemp); git clone ../../../gitea-repositories/${BH_USER}/${BH_REPO}.git, then git's hardlink logic should kick in, since it should be safe to assume we're on the same filesystem. git-annex has its own hardlink logic, activated by git config annex.hardlink and git config annex.thin, to further reduce storage. In fact, if we git clone --shared then git-annex automatically sets annex.hardlink true too.
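
Rather than assuming the two directories share a filesystem, a script could check. A minimal sketch, assuming GNU stat (the same stat -c used elsewhere in this thread); the function name is made up for illustration:

```shell
# Succeed if the two given paths live on the same device (filesystem).
# Hardlinks are only possible when this is true.
same_filesystem() {
  [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}
```

For example, same_filesystem data/tmp data/gitea-repositories && echo "hardlinks possible".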

I grabbed a dataset I had and uploaded it to my test instance, putting it here:

p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ du -hs .
357M    .
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ du -h -x -d 1 | sort -h
4,0K    ./branches
12K     ./info
32K     ./refs
108K    ./hooks
14M     ./objects
343M    ./annex
357M    .

Everything is singly-linked (note: find -links $n matches files with exactly $n links; -links +1 would match anything with more than one):

p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ find . -type f -links 2
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ find . -type f -links 3
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ find . -type f -links 4
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ 
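
A single query can cover every link count at once, since find's -links +1 matches files with more than one hard link. A small sketch with a hypothetical helper name:

```shell
# List regular files under DIR (default: .) that have more than one hard link.
# An empty result means every file is singly-linked.
multiply_linked() {
  find "${1:-.}" -type f -links +1
}
```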

I picked out this file to examine its hardlink status under different conditions (my stat format prints "<name> @ <inode> avec <n> liens", which is French for "with <n> links"):

p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ git show HEAD:sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz
/annex/objects/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ find . -name SHA256E-s10578473--eae91fce7714d49508044a29a6fd04b43c79420037d928b1d792bb9c1c3cd573.nii.gz
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ find . -name SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz
./annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz
./annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ stat -c "%n @ %i avec %h liens" ./annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz
./annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens

I wrote this test script to see how different combinations of flags would result in space usage:

test() {
  (set -eu

  CLONE_ARGS="$1"; shift
  LN_MODE="$1"; shift

  BEFORE="$(du -s ~/src/neurogitea/gitea/data/ | awk '{print $1}')"
  (
    set -e
    cd "$(TMPDIR=~/src/neurogitea/gitea/data/tmp/bids-hook mktemp -d --suffix="$CLONE_ARGS,$LN_MODE")"
    pwd
    git clone $CLONE_ARGS ~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git >/dev/null 2>&1
    cd spine-generic-processed

    # annex.hardlink and annex.thin both use hardlinks
    # but they are different features
    # so git-annex does not allow both at the same time
    case "$LN_MODE" in
      "hardlink")
        git config annex.hardlink true
        git config annex.thin false
        ;;
      "thin")
        git config annex.hardlink false
        git config annex.thin true
        ;;
      "both")
        # in this case, annex.thin should prevail according to the docs
        git config annex.hardlink true
        git config annex.thin true
        ;;
      "neither")
        # if in the $CLONE_ARGS="--shared" case this should be the same as the hardlink case
        git config annex.hardlink false
        git config annex.thin false
        ;;
      *)
        exit 2
        ;;
    esac

    git annex init

    git annex get >/dev/null 2>&1 || true

    stat -c "%n @ %i avec %h liens" ~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz .git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz

  )
  AFTER="$(du -s ~/src/neurogitea/gitea/data/ | awk '{print $1}')"

  echo $((($AFTER - $BEFORE)/1024))MiB
  )
}

The difference between --local and --shared:

   -l, --local
      When the repository to clone from is on a local machine, this flag bypasses the normal "Git aware" transport mechanism and clones the
      repository by making a copy of HEAD and everything under objects and refs directories. The files under .git/objects/ directory are hardlinked
      to save space when possible.
   -s, --shared
      When the repository to clone is on the local machine, instead of using hard links, automatically setup .git/objects/info/alternates to share
      the objects with the source repository. The resulting repository starts out without any object of its own.
--shared, annex.hardlink => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "hardlink"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.Rs0CFRv01f--shared,hardlink
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)


  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12205926 avec 1 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--shared, annex.thin => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "thin"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.WWqJqdrJGw--shared,thin
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12209795 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12209795 avec 2 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--shared, both => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "both"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.tJUOGGfIZe--shared,both
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12209795 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12209795 avec 2 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--shared, neither => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "neither"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.rZHXKhkA9G--shared,neither
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12205926 avec 1 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--local, annex.hardlink => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "hardlink"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.dNy74HKRRS--local,hardlink
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12206778 avec 1 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--local, annex.thin => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "thin"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.JqYuiHT7Qh--local,thin
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12210303 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12210303 avec 2 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
--local, both => +397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "both"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.ZKSZB9k2DN--local,both
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12210303 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12210303 avec 2 liens
397MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*
p115628@joplin:~/src/neurogitea/gitea/data$ 
--local, neither => +738MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "neither"
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/tmp/bids-hook/tmp.m9qqJbN4r5--local,neither
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12210340 avec 1 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 12206785 avec 1 liens
738MiB
p115628@joplin:~/src/neurogitea/gitea/data$ chmod -R +w tmp/bids-hook/ && rm -r tmp/bids-hook/*

And if we use /tmp instead of data/tmp/, which is on a different filesystem, annex.hardlink has no effect: the data gets copied twice (+727MiB, i.e. three copies in total). annex.thin still works, costing only one extra copy (+387MiB), but only with --shared; --local is not allowed cross-filesystem.

--shared, annex.hardlink => +727MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "hardlink"
/tmp/tmp.dhPe81RtbI--shared,hardlink
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 1444558 avec 1 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 1444563 avec 1 liens
727MiB
--shared, annex.thin => +387MiB
p115628@joplin:~/src/neurogitea/gitea/data$ test "--shared" "thin"
/tmp/tmp.zmE0fH5edL--shared,thin
init  (merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Repository was cloned with --shared; setting annex.hardlink=true and making repository untrusted.
ok
(recording state in git...)
/home/GRAMES.POLYMTL.CA/p115628/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git/annex/objects/7a9/50b/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 12131858 avec 1 liens
.git/annex/objects/ZM/2K/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz/SHA256E-s1329011--3178c64a6d29ded1202e720105306f528ea0dcb401417c8828090a18123b6579.nii.gz @ 1457950 avec 2 liens
sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz @ 1457950 avec 2 liens
387MiB
--local, annex.hardlink => error
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "hardlink"
/tmp/tmp.A6G1PVec7A--local,hardlink
Cloning into 'spine-generic-processed'...
fatal: failed to create link 'spine-generic-processed/.git/objects/f6/afab4f7dc9a30fc3c123118ee7d46e44748299': Invalid cross-device link
--local, annex.thin => error
p115628@joplin:~/src/neurogitea/gitea/data$ test "--local" "thin"
/tmp/tmp.VGFx3AjtC4--local,thin
Cloning into 'spine-generic-processed'...
fatal: failed to create link 'spine-generic-processed/.git/objects/f6/afab4f7dc9a30fc3c123118ee7d46e44748299': Invalid cross-device link

So it seems impossible to get a zero-copy dataset, at least with git clone: we either have to copy all the files into .git/annex/objects, or copy them from .git/annex/objects to the work tree. We cannot use hardlinks for both steps, because git-annex refuses to support both annex.hardlink and annex.thin at the same time.

@kousu
Member

kousu commented Feb 17, 2023

I tried to see if I could get zero-copies by using git --git-dir to avoid having to run the clone, but git-annex doesn't like this:

p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook$ mkdir spine-generic-processed
p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook$ cd spine-generic-processed
p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook/spine-generic-processed$ git --git-dir ../../../gitea-repositories/kousu/spine-generic-processed.git/ --work-tree=. reset --hard master 2>&1 ^C
p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook/spine-generic-processed$ ls
dataset_description.json  sub-brnoCeitec05  sub-fslPrisma03     sub-mniS04         sub-oxfordFmrib10  sub-strasbourg02    sub-ucdavis02
derivatives               sub-brnoCeitec06  sub-fslPrisma04     sub-mniS05         sub-oxfordFmrib11  sub-strasbourg03    sub-ucdavis03
participants.json         sub-brnoUhb01     sub-fslPrisma05     sub-mniS06         sub-oxfordOhba01   sub-strasbourg04    sub-ucdavis04
participants.tsv          sub-brnoUhb02     sub-fslPrisma06     sub-mniS07         sub-oxfordOhba02   sub-strasbourg05    sub-ucdavis05
README                    sub-brnoUhb03     sub-geneva01        sub-mniS08         sub-oxfordOhba03   sub-strasbourg06    sub-ucdavis06
sub-amu01                 sub-brnoUhb04     sub-geneva02        sub-mniS09         sub-oxfordOhba04   sub-tehranS01       sub-ucdavis07
sub-amu02                 sub-brnoUhb05     sub-geneva03        sub-mountSinai01   sub-oxfordOhba05   sub-tehranS02       sub-ucl01
sub-amu03                 sub-brnoUhb06     sub-geneva04        sub-mountSinai02   sub-pavia01        sub-tehranS03       sub-ucl02
sub-amu04                 sub-brnoUhb07     sub-geneva05        sub-mountSinai03   sub-pavia02        sub-tehranS04       sub-ucl03
sub-amu05                 sub-brnoUhb08     sub-geneva06        sub-mountSinai04   sub-pavia03        sub-tehranS05       sub-ucl04
sub-balgrist01            sub-cardiff01     sub-hamburg01       sub-mountSinai05   sub-pavia04        sub-tehranS06       sub-ucl05
sub-balgrist02            sub-cardiff02     sub-hamburg02       sub-mountSinai06   sub-pavia05        sub-tokyo750w01     sub-ucl06
sub-balgrist03            sub-cardiff03     sub-hamburg03       sub-mpicbs01       sub-pavia06        sub-tokyo750w02     sub-unf01
sub-balgrist04            sub-cardiff04     sub-hamburg04       sub-mpicbs02       sub-perform01      sub-tokyo750w03     sub-unf02
sub-balgrist05            sub-cardiff05     sub-hamburg05       sub-mpicbs03       sub-perform02      sub-tokyo750w04     sub-unf03
sub-balgrist06            sub-cardiff06     sub-hamburg06       sub-mpicbs05       sub-perform03      sub-tokyo750w05     sub-unf04
sub-barcelona01           sub-cmrra01       sub-juntendo750w01  sub-mpicbs06       sub-perform04      sub-tokyo750w06     sub-unf05
sub-barcelona02           sub-cmrra02       sub-juntendo750w02  sub-mpicbs07       sub-perform05      sub-tokyo750w07     sub-unf06
sub-barcelona03           sub-cmrra03       sub-juntendo750w03  sub-nottwil01      sub-perform06      sub-tokyoIngenia01  sub-unf07
sub-barcelona04           sub-cmrra04       sub-juntendo750w04  sub-nottwil02      sub-queensland01   sub-tokyoIngenia02  sub-vallHebron01
sub-barcelona05           sub-cmrra05       sub-juntendo750w05  sub-nottwil03      sub-queensland02   sub-tokyoIngenia03  sub-vallHebron02
sub-barcelona06           sub-cmrra06       sub-juntendo750w06  sub-nottwil04      sub-queensland03   sub-tokyoIngenia04  sub-vallHebron03
sub-beijingGE01           sub-cmrrb01       sub-mgh01           sub-nottwil05      sub-queensland04   sub-tokyoIngenia05  sub-vallHebron04
sub-beijingGE02           sub-cmrrb02       sub-mgh02           sub-nottwil06      sub-queensland05   sub-tokyoIngenia06  sub-vallHebron05
sub-beijingGE03           sub-cmrrb03       sub-mgh03           sub-nwu01          sub-queensland06   sub-tokyoIngenia07  sub-vallHebron06
sub-beijingGE04           sub-cmrrb04       sub-mgh04           sub-nwu02          sub-sherbrooke01   sub-tokyoSkyra01    sub-vallHebron07
sub-beijingPrisma01       sub-cmrrb05       sub-mgh05           sub-nwu03          sub-sherbrooke02   sub-tokyoSkyra02    sub-vuiisAchieva01
sub-beijingPrisma02       sub-cmrrb06       sub-mgh06           sub-nwu04          sub-sherbrooke03   sub-tokyoSkyra03    sub-vuiisAchieva02
sub-beijingPrisma03       sub-cmrrb07       sub-milan01         sub-nwu05          sub-sherbrooke04   sub-tokyoSkyra04    sub-vuiisAchieva03
sub-beijingPrisma04       sub-dresden01     sub-milan02         sub-nwu06          sub-sherbrooke05   sub-tokyoSkyra05    sub-vuiisAchieva04
sub-beijingPrisma05       sub-dresden02     sub-milan03         sub-oxfordFmrib01  sub-sherbrooke06   sub-tokyoSkyra06    sub-vuiisAchieva05
sub-beijingVerio01        sub-fslAchieva01  sub-milan04         sub-oxfordFmrib02  sub-sherbrooke07   sub-tokyoSkyra07    sub-vuiisAchieva06
sub-beijingVerio02        sub-fslAchieva02  sub-milan05         sub-oxfordFmrib03  sub-stanford01     sub-ubc01           sub-vuiisIngenia01
sub-beijingVerio03        sub-fslAchieva03  sub-milan06         sub-oxfordFmrib04  sub-stanford02     sub-ubc02           sub-vuiisIngenia02
sub-beijingVerio04        sub-fslAchieva04  sub-milan07         sub-oxfordFmrib05  sub-stanford03     sub-ubc03           sub-vuiisIngenia03
sub-brnoCeitec01          sub-fslAchieva05  sub-mniPilot1       sub-oxfordFmrib06  sub-stanford04     sub-ubc04           sub-vuiisIngenia04
sub-brnoCeitec02          sub-fslAchieva06  sub-mniS01          sub-oxfordFmrib07  sub-stanford05     sub-ubc05           sub-vuiisIngenia05
sub-brnoCeitec03          sub-fslPrisma01   sub-mniS02          sub-oxfordFmrib08  sub-stanford06     sub-ubc06           sub-vuiisIngenia06
sub-brnoCeitec04          sub-fslPrisma02   sub-mniS03          sub-oxfordFmrib09  sub-strasbourg01   sub-ucdavis01
p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook/spine-generic-processed$ git --git-dir ../../../gitea-repositories/kousu/spine-generic-processed.git/ --work-tree=. annex get sub-fslAchieva04 2>&1 
get sub-fslAchieva04/anat/sub-fslAchieva04_T1w.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/anat/sub-fslAchieva04_T2star.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/anat/sub-fslAchieva04_T2w.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/anat/sub-fslAchieva04_acq-MToff_MTS.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/anat/sub-fslAchieva04_acq-MTon_MTS.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/anat/sub-fslAchieva04_acq-T1w_MTS.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/dwi/sub-fslAchieva04_dwi.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
get sub-fslAchieva04/dwi/sub-fslAchieva04_rec-average_dwi.nii.gz (not available) 
  Maybe add some of these git remotes (git remote add ...):
        456f378d-8e63-4b77-a2f4-ec39721417ed -- [email protected]:~/datasets/spine-generic-processed2
        9d1f6a73-0da8-447e-8cee-0122a4e52c0c -- [email protected]:~/repositories/datasets/spine-generic-processed.git
        af0664a3-e273-45a6-97e5-d11581ebaf49 -- [email protected]:~/datasets/spine-generic-processed
failed
git-annex: get: 8 failed

Maybe there's a way to trick it but this seems like a dead-end.

And a risky dead-end too, because I'm pretty sure using git --git-dir means git is allowed to write back directly to the source dir. For example, using reset or checkout will edit refs/ in the source repo:

p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook/spine-generic-processed$ git --git-dir ../../../gitea-repositories/kousu/spine-generic-processed.git/ --work-tree=. reset --hard master~3 2>&1 
HEAD is now at 623542ee8 add 674 manual segmentations and edit 41
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ cat refs/heads/master 
623542ee8657b40c304901bb7a2e17dc1806f566
p115628@joplin:~/src/neurogitea/gitea/data/tmp/bids-hook/spine-generic-processed$ git --git-dir ../../../gitea-repositories/kousu/spine-generic-processed.git/ --work-tree=. reset --hard master~3 2>&1 
HEAD is now at 453988754 add missing 9 seg
p115628@joplin:~/src/neurogitea/gitea/data/gitea-repositories/kousu/spine-generic-processed.git$ cat refs/heads/master 
453988754d09d1231a7e000bd6d181693c6546a6

so that's not good.

@kousu
Member

kousu commented Feb 17, 2023

In retrospect, the best option seems to be to stick with annex.thin and use neither --shared nor --local. It achieves the minimum I've found so far: one full copy of (the relevant pieces of) annex/objects/. It also works whether or not we're checking out to the same filesystem as the source repo, because it hardlinks between .git/ and ./, which are always going to be on the same filesystem, and it doesn't require anything complicated. If we are checking out to the same filesystem, this is the same as git clone --local; if not, we have to make some copies, but they shouldn't be that expensive because they should be just the metadata files.

And to simplify further, we can default to using /tmp (but give sysadmins a way to configure a different location if they need more working storage than their system's RAM).
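
Putting those two decisions together might look something like the sketch below. The variable name BIDS_HOOK_TMPDIR and both function names are hypothetical, chosen only for illustration; nothing here is the project's actual configuration.

```shell
# Create a working directory under $BIDS_HOOK_TMPDIR (hypothetical name),
# falling back to /tmp when the variable is unset or empty.
make_workdir() {
  mktemp -d "${BIDS_HOOK_TMPDIR:-/tmp}/bids-hook.XXXXXXXXXX"
}

# clone_thin SRC DEST: plain clone (no --shared, no --local) with annex.thin,
# so the work tree files are hardlinks into DEST/.git/annex/objects.
clone_thin() (
  set -eu
  git clone --quiet "$1" "$2"
  cd "$2"
  git config annex.thin true
  git annex init
  git annex get
)
```

A sysadmin short on RAM-backed /tmp could then run the worker with, say, BIDS_HOOK_TMPDIR=/var/tmp in its environment.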

@kousu kousu linked a pull request Feb 20, 2023 that will close this issue