Skip to content
This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

Using --to-git breaks git diff functionality #195

Open
manuelakuhn opened this issue Apr 6, 2021 · 5 comments
Open

Using --to-git breaks git diff functionality #195

manuelakuhn opened this issue Apr 6, 2021 · 5 comments

Comments

@manuelakuhn
Copy link

I try to add a file to git instead of git-annex using the --to-git flag. This works fine for the initial commit but as soon as I modify the file, git diff does not work anymore because it claims that the file is stored in git-annex.

This only happens when creating the dataset using the hirni configuration template. When using a dataset created without a configuration template this problem does not exist and it works as expected.

Minimal example:

  • create dataset and save file
    $ datalad create -c hirni sourcedata
    $ cd sourcedata
    $ wget -P code https://raw.githubusercontent.com/psychoinformatics-de/datalad-hirni/master/datalad_hirni/resources/rules/custom_rules_template.py
    $ datalad save -m "Add custom rule" --to-git
    
  • open + edit file, e.g. add print statement in _rules function
  • safe change
  • investigating changes via git diff does not work anymore because the file is clamed to be stored in git-annex now:
    $ git diff
    diff --git a/code/custom_rules_template.py b/code/custom_rules_template.py
    index 0527da2..44db796 100644
    --- a/code/custom_rules_template.py
    +++ b/code/custom_rules_template.py
    @@ -1,54 +1 @@
    -"""Template for writing custom rules for dicom2spec"""
    -
    -
    -class MyDICOM2SpecRules(object):
    -
    -    def __init__(self, dicommetadata):
    -        """
    -
    -        Parameter
    -        ----------
    -        dicommetadata: list of dict
    -            dicom metadata as extracted by datalad; one dict per image series
    -        """
    -        self._dicom_series = dicommetadata
    -
    -    def __call__(self, subject=None, anon_subject=None, session=None):
    -        """
    -
    -        Parameters
    -        ----------
    -
    -        Returns
    -        -------
    -        list of tuple (dict, bool)
    -        """
    -        spec_dicts = []
    -        for dicom_dict in self._dicom_series:
    -            spec_dicts.append((self._rules(dicom_dict,
    -                                           subject=subject,
    -                                           anon_subject=anon_subject,
    -                                           session=session),
    -                               self.series_is_valid(dicom_dict)
    -                               )
    -                              )
    -        return spec_dicts
    -
    -    def _rules(self, series_dict, subject=None, anon_subject=None,
    -               session=None):
    -
    -        return {'description': series_dict['SeriesDescription']
    -                if "SeriesDescription" in series_dict else '',
    -
    -                'comment': 'I actually have no clue',
    -                'subject': series_dict['PatientID'] if not subject else subject,
    -                'anon-subject': anon_subject if anon_subject else None,
    -                'bids-session': session if session else None
    -                }
    -
    -    def series_is_valid(self, series_dict):
    -
    -        return series_dict['ProtocolName'] != 'ExamCard'
    -
    -
    -__datalad_hirni_rules = MyDICOM2SpecRules
    +/annex/objects/MD5E-s1706--837b847e5d9a8814d4aea4aa2a227aa6.py
@bpoldrack
Copy link

  • open + edit file, e.g. add print statement in _rules function
  • safe change

How exactly did you call save after editing and with what datalad version? I suspect you didn't say --to-git again?

Generally it probably boils down to the .gitattributes file. The default set by the cfg_hirni procedure would put everything into annex except for explicitly specified exceptions. That default is chosen, so one doesn't accidentally put something in git directly that may contain sensitive information (that's hard to retract). As long as things are in annex, one does maintain access control on the storage side per file. So, you could fix this by editing .gitattributes to say that this file (or every .py file, or any text file, or anything under code/, etc.) should be kept in git.

Ultimately it's a datalad issue though (as in not hirni itself) w/o an incredibly obvious solution. The .gitattributes say it should go into annex, while it's already tracked in git directly. One could argue "Well, save should respect .gitattributes for new files only!". But this isn't really inline with git-annex itself. You could base the config on file size and annex would change how the file is tracked, when it reaches that size threshold. So - not instantly clear how datalad should deal with that situation. Again: This is assuming you didn't specify --to-git with the second call to save.

However, if there's nothing to discuss about the default config set by hirni from your point of view, a discussion of a potential change of behavior in save, need for docs, etc. should move over there.

@manuelakuhn
Copy link
Author

There is no second datalad save call in this. I did not save the file to datalad at all after editing. With "safe changes" (sorry about the typo) I just mean save the change in the editor. And right after closing the editor git diff reports what is shown above.

@manuelakuhn
Copy link
Author

Just as an additional remark: I first thought this was a datalad issue in general but I did not manage to reproduce it with another type of dataset. I tried: no configuration template and the yoda template. That is when I suspected it has something to do with the hirni template and that is also why I opened a issue here.

Ultimately it's a datalad issue though (as in not hirni itself) w/o an incredibly obvious solution. The .gitattributes say it should go into annex, while it's already tracked in git directly. One could argue "Well, save should respect .gitattributes for new files only!". But this isn't really inline with git-annex itself. You could base the config on file size and annex would change how the file is tracked, when it reaches that size threshold. So - not instantly clear how datalad should deal with that situation. Again: This is assuming you didn't specify --to-git with the second call to save.

This behaviour I encountered as well and you make a good point here. While this seems to not be so straight forward, it would quite improve usability.

However, if there's nothing to discuss about the default config set by hirni from your point of view, a discussion of a potential change of behavior in save, need for docs, etc. should move over there.

This you can judge best, so feel free to do so.

@bpoldrack
Copy link

I just mean save the change in the editor. And right after closing the editor git diff reports what is shown above.

Oof. I have a suspicion (where it's actually git-diff itself that triggers the change), but will have a closer look and try to reproduce. Can you give me the output of datalad wtf please? Versions of datalad, git and git-annex may play a role here.

@manuelakuhn
Copy link
Author

I always try to reproduce the problem in a clean virtual environment with the newest datalad installation. That's why it is always in /tmp :-)

Here you go:

$ datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - SecretService Keyring
      - PlaintextKeyring with no encyption v.1.0 at /home/nela/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/nela/.config/python_keyring/keyringrc.cfg
    - data_root: /home/nela/.local/share/python_keyring
## datalad 
  - full_version: 0.14.1
  - version: 0.14.1
## dataset 
  - id: 24c8a5bf-1b95-4397-b9c4-47d40fd09a0a
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /tmp/datalad-tests/sourcedata
  - repo: AnnexRepo
## dependencies 
  - annexremote: 1.5.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:annex: 8.20201129
  - cmd:bundled-git: UNKNOWN
  - cmd:git: 2.26.3
  - cmd:system-git: 2.26.3
  - cmd:system-ssh: 8.3p1
  - exifread: 2.3.2
  - humanize: 3.3.0
  - iso8601: 0.1.14
  - keyring: 23.0.1
  - keyrings.alt: 4.0.2
  - msgpack: 1.0.2
  - mutagen: 1.45.1
  - requests: 2.25.1
  - wrapt: 1.12.1
## environment 
  - LANG: de_DE.UTF-8
  - PATH: /tmp/datalad-tests/virtualenv/bin:/home/nela/bin:/usr/share/Modules/bin:/home/nela/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/nela/bin
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 1.1.2
  - hirni: 
    - description: HIRNI workflows
    - entrypoints: 
      - datalad_hirni.commands.dicom2spec.Dicom2Spec: 
        - class: Dicom2Spec
        - load_error: None
        - module: datalad_hirni.commands.dicom2spec
        - names: 
          - hirni-dicom2spec
          - hirni_dicom2spec
      - datalad_hirni.commands.import_dicoms.ImportDicoms: 
        - class: ImportDicoms
        - load_error: None
        - module: datalad_hirni.commands.import_dicoms
        - names: 
          - hirni-import-dcm
          - hirni_import_dcm
      - datalad_hirni.commands.spec2bids.Spec2Bids: 
        - class: Spec2Bids
        - load_error: None
        - module: datalad_hirni.commands.spec2bids
        - names: 
          - hirni-spec2bids
          - hirni_spec2bids
      - datalad_hirni.commands.spec4anything.Spec4Anything: 
        - class: Spec4Anything
        - load_error: None
        - module: datalad_hirni.commands.spec4anything
        - names: 
          - hirni-spec4anything
          - hirni_spec4anything
    - load_error: None
    - module: datalad_hirni
    - version: 0.0.8
  - metalad: 
    - description: DataLad semantic metadata command suite
    - entrypoints: 
      - datalad_metalad.aggregate.Aggregate: 
        - class: Aggregate
        - load_error: None
        - module: datalad_metalad.aggregate
        - names: 
          - meta-aggregate
          - meta_aggregate
      - datalad_metalad.dump.Dump: 
        - class: Dump
        - load_error: None
        - module: datalad_metalad.dump
        - names: 
          - meta-dump
          - meta_dump
      - datalad_metalad.extract.Extract: 
        - class: Extract
        - load_error: None
        - module: datalad_metalad.extract
        - names: 
          - meta-extract
          - meta_extract
    - load_error: None
    - module: datalad_metalad
    - version: 0.2.1
  - neuroimaging: 
    - description: Neuroimaging tools
    - entrypoints: 
      - datalad_neuroimaging.bids2scidata.BIDS2Scidata: 
        - class: BIDS2Scidata
        - load_error: None
        - module: datalad_neuroimaging.bids2scidata
        - names: 
          - bids2scidata
    - load_error: None
    - module: datalad_neuroimaging
    - version: 0.3.1
  - webapp: 
    - description: Generic web app support
    - entrypoints: 
      - datalad_webapp.WebApp: 
        - class: WebApp
        - load_error: None
        - module: datalad_webapp
        - names: 
          - webapp
          - webapp
    - load_error: None
    - module: datalad_webapp
    - version: 0.3
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.21.1
    - bloomfilter-2.0.1.0
    - cryptonite-0.25
    - DAV-1.3.4
    - feed-1.1.0.0
    - ghc-8.6.5
    - http-client-0.6.4
    - persistent-sqlite-2.9.3
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.0.1
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - local repository version: 8
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions: 
    - 8
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20201129
## location 
  - path: /tmp/datalad-tests/sourcedata
  - type: dataset
## metadata_extractors 
  - annex (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.audio
    - version: None
  - bids (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.bids
    - version: None
  - datacite (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822 (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - dicom (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.dicom
    - version: None
  - exif (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.exif
    - version: None
  - frictionless_datapackage (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.image
    - version: None
  - metalad_annex (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.annex
    - version: None
  - metalad_core (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.core
    - version: None
  - metalad_custom (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.custom
    - version: None
  - metalad_runprov (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.runprov
    - version: None
  - nidm (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nidm
    - version: None
  - nifti1 (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nifti1
    - version: None
  - xmp (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.xmp
    - version: None
## metadata_indexers 
## python 
  - implementation: CPython
  - version: 3.8.7
## system 
  - distribution: fedora/32
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 285
  - name: Linux
  - release: 5.11.10-100.fc32.x86_64
  - type: posix
  - version: #1 SMP Thu Mar 25 14:26:30 UTC 2021

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants