dicom2spec error due to checking for spec key "id" which does not exist #173

pvavra · 2021-03-03T14:15:15Z

when importing a tarball (can share on medusa if desired), I am getting the following error:

[ERROR ] 'id' [spec_helpers.py:get_specval:33] (KeyError)

it seems to stem from the implicit call on line:

datalad-hirni/datalad_hirni/commands/dicom2spec.py

Line 451 in c87946d

get_specval(s, "id") > get_specval(

Indeed, for all the other tarballs this error did not get triggered. I assume that some of the preceding conditions of the if clause were False for those other tarballs (tested only with one other atm) and hence that line did not get executed.

However, I do not understand what that conditional is trying to achieve. Why should any of those conditions lead to the series being ignored for the conversion in the first place?
The offending dicom series in this particular case is a SBRef image related to a functional run and should be converted.

Or is it related the sorting needing RF a few lines above?

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436:

datalad-hirni/datalad_hirni/commands/dicom2spec.py

Lines 436 to 456 in c87946d

    
           for i in range(len(spec_series_list)): 
        
               # Note: Removed the following line from condition below, 
        
               # since it appears to be pointless. Value for 'converter' 
        
               # used to be 'heudiconv' or 'ignore' for a 'dicomseries', so 
        
               # it's not clear ATM what case this could possibly have catched: 
        
               # heuristic.has_specval(spec_series_list[i], "converter") and \ 
        
               if spec_series_list[i]["type"] == "dicomseries" and \ 
        
                   has_specval(spec_series_list[i], "bids-run") and \ 
        
                   get_specval(spec_series_list[i], "bids-run") in \ 
        
                       [get_specval(s, "bids-run") 
        
                        for s in spec_series_list[i + 1:] 
        
                        if get_specval( 
        
                               s, 
        
                               "description") == get_specval( 
        
                                   spec_series_list[i], "description") and \ 
        
                        get_specval(s, "id") > get_specval( 
        
                                                spec_series_list[i], "id") 
        
                        ]: 
        
                   lgr.debug("Ignore SeriesNumber %s for conversion" % i) 
        
                   spec_series_list[i]["tags"].append( 
        
                           'hirni-dicom-converter-ignore')

But not sure I am not breaking anything further down the line by it..

The text was updated successfully, but these errors were encountered:

bpoldrack · 2021-03-04T07:34:54Z

Hmm. Two things:

First of all, the immediate issue seems to be that the spec that is being worked over here has no id field for at least some entries. id would default to the SeriesNumber from the DICOM header fields and is used to specify order. Now, I think we've seen similar things before where there is no such entry. So, the code needs to be able to deal with that, I guess. In that sense, I consider it a bug.

Secondly, what this is trying to achieve is a guess as to what run to use when there appear to be several. If we have several series with the same description and the same previously determined run number, then this tries to check those SeriesNumber entries and mark all the apparent duplications but the last one (highest SeriesNumber) to be ignored by default. Idea is that this seems likely to be caused by aborted runs that were then redone (which is why we go for the last one). That guess might be wrong of course. In any case, you seem to have several series with the exact same Protocol field in the DICOM headers. Ultimately they need to get a specification that points to a different target in BIDS (or a choice is to be made which one to ignore for conversion). Can you tell, what's the reason for the duplication in your case, @pvavra?
I wonder whether it would be helpful to be able to disable such guessing and rather go with an initially derived specification, that we already know can't be converted without further editing. Or may be there's a better way to determine what to do with such duplications.

pvavra · 2021-03-04T08:37:08Z

Re no id:
I just re-ran the spec generation for all 30 acquisitions in my dataset and added a simple print(spec_series_list) on line 436, i.e. just before the whole for loop and save the output into a file. A ctrl-f later, I can confirm that not a single spec_series_list has the field id.
Should it maybe simply be uid instead?

In my case tarballs, the different runs are explicitly noted in the SeriesDescription (fMRI_some_more_stuff___run1 or fMRI_..._run1_...) and my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

pvavra · 2021-03-04T08:44:52Z

Note that since there is not id, this sorting should also not work:

datalad-hirni/datalad_hirni/commands/dicom2spec.py

Lines 433 to 435 in c87946d

    
           spec_series_list = sorted(spec_series_list, 
        
                                     key=lambda x: get_specval(x, 'id') 
        
                                                   if 'id' in x.keys() else 0)

bpoldrack · 2021-03-04T08:45:53Z

Should it maybe simply be uid instead?

Well, as far as identifying an entry in a spec goes, that's an option. But implies no order in time, so not useful for what that piece of code is trying to achieve.

my custom rules are able to pick up on that and set the run field correctly.

Can I see them?

However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Okay, need to have a closer look how that can be. Having your custom rules might help, though.

pvavra · 2021-03-04T08:48:32Z

cannot attach a .py file.. so here it goes:

"""Custom rules for dicom2spec for RLAMF study."""
import re  # for extracting numbers from strings
# from pprint import pprint


class MyDICOM2SpecRules(object):  # pylint: disable-msg=r0205
    """Class holding rules for dicom2spec for RLAMF study."""

    def __init__(self, dicommetadata):
        """Store dicom metadata in internal var."""
        self._dicom_series = dicommetadata

    def __call__(self, subject=None, anon_subject=None, session=None):
        """Apply rules to set of all dicom series of an acquisition."""
        spec_dicts = []

        # Heuristic for guessing session based on present fMRI series
        if any(s['SeriesDescription'].endswith("run10") for s
               in self._dicom_series):
            session = "01"
        else:
            session = "02"
        # print("==========  SESSION  ================")
        # print(session)

        for dicom_dict in self._dicom_series:
            spec_dicts.append((self._rules(dicom_dict,
                                           subject=subject,
                                           anon_subject=anon_subject,
                                           session=session),
                               self.series_is_valid(dicom_dict)
                               ))
        return spec_dicts

    def _rules(self,  # pylint: disable-msg=r0201,r0912
               series_dict,
               subject=None,
               anon_subject=None,
               session=None):
        """Apply rules to a single series of dicoms."""
        task = None
        modality = None
        comment = ''
        run = None
        acquisition = None
        # T1 ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("t1"):
            modality = 'T1w'
            acquisition = 'mprage'

        # Fieldmap ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("gre_field_mapping"):
            if "M" in series_dict['ImageType']:
                # will infer magnitude1/magnitude2 automatically
                modality = 'magnitude'
            else:
                modality = 'phasediff'

        # IR EPI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("IR-EPI"):
            # c.f. https://www.ncbi.nlm.nih.gov/pubmed/14635150
            modality = 'T1w'
            acquisition = 'irepi'

        # DTI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("PA"):
            modality = 'dwi'
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("AP"):
            # the A >> P series does have any diffusion-weighting, and is
            # essentially only for the correction of distortions for the SE-EPI
            modality = 'epi'

        # fMRI ---------------------------------------------------
        # Session 1:
        label_task = 'fMRI_SMS2_2.2iso_66sl_TR2_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'learning'
            # number afterwards indicates run number (range: 1 to 10)
            # Note: the remainder of the description does _not_ contain numbers
            tail = series_dict['SeriesDescription'][len(label_task):]
            run = re.findall(r'\d+', tail)[0]

        # Session 2:
        label_task = 'fMRI_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'recognition'
            # only 1 to 4
            run = series_dict['SeriesDescription'][len(label_task)]

        # NB: the following should inherit the rest from the above task,
        # but overwrite the modality
        if series_dict['SeriesDescription'].endswith("SBRef"):
            modality = 'sbref'

        # T2w ---------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "t2_tse_orth_hippo_448_p2"):
            modality = 'T2w'
        # Proton Density --------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "PD_ax_acpc_hippo_448_p2"):
            modality = 'pd'

        # AAHead Scouts ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("AAHead_Scout"):
            comment = "Localizer - will be ignored for BIDS"

        return {'description': series_dict['SeriesDescription']
                if "SeriesDescription" in series_dict else '',
                'comment': comment,
                'subject': series_dict['PatientID']
                if not subject else subject,
                'anon-subject': anon_subject if anon_subject else None,
                'bids-session': session if session else None,
                'bids-modality': modality,
                'bids-task': task,
                'bids-run': run,
                'bids-acquisition': acquisition
                }

    def series_is_valid(self, series_dict):  # pylint: disable-msg=r0201
        """
        Set whether a series should be converted to bids at all.

        For all series which this returns `false`, the hirni-spec2bids will
        skip the conversion to niftis.
        """
        # Skip all localizers & report
        if series_dict['SeriesDescription'].startswith("AAHead_Scout") or \
           series_dict['SeriesDescription'] == "PhoenixZIPReport":
            valid = False
        else:
            valid = True

        return valid


__datalad_hirni_rules = MyDICOM2SpecRules  # pylint: disable-msg=C0103

bpoldrack · 2021-03-04T08:51:07Z

Thanks! Will look into that more closely.

bpoldrack · 2021-03-10T11:03:39Z

my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that

No, this piece of code is executed after the rules (which it probably shouldn't, but that's another issue). If your rules really weren't triggered, they are prob. not correctly configured. The dataset should have a config (or several in correct order) datalad.hirni.dicom2spec.rules with a value that points to the file you pasted above. So, if you do that outside of a datalad python context, that would be: git config -f .datalad/config --add datalad.hirni.dicom2spec.rules <path to your file>. Can you verify such a config exists in your dataset, @pvavra? Generally, it could also be in the dataset's .git/config or your ~/-gitconfig, of course. I would just suggest that the committed .datalad/config is best for provenance tracking. That way, the spec building is done on top of a commit, that actually contains the relevant config.

Note that since there is not id, this sorting should also not work:

This sorting "works" because it assigns a sorting key of 0 if there's no id, so doesn't change the order of snippets w/o an id and puts them before all snippets with an id.

bpoldrack · 2021-03-10T11:07:01Z

Hah, hold on. Looking for another reason why your rules might not have been executed, I noticed hirni didn't catch up on a change in datalad core. With a current datalad installed, it would indeed fail to get your rule, if there are multiple rules configured. Is that the case, @pvavra?

Note to myself: ConfigManager.get requires get_all=True!

bpoldrack · 2021-03-10T11:09:54Z

But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Assuming that your rules weren't picked up, in my estimation it's simply the only one where default rules would derive the same run from the protocol field for several series. Could that be true?

A change in behavior of datalad's ConfigManager.get method, requires the use of get_all parameter in order to get all configured values. Adaption to that change missed this usage in dicom2spec as discovered in issue psychoinformatics-degh-173.

Don't apply an id based decision regarding duplicate runs if there's no id to be found in the spec snippets. See psychoinformatics-degh-173

bpoldrack · 2021-03-10T12:18:45Z

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436:
...
But not sure I am not breaking anything further down the line by it..

No, nothing really relies on it so you don't break anything. If you end up with a specification containing duplicates, however, conversion based on it might go wrong.

In any case, if you could retry with the branch of PR #174 to confirm this issue doesn't get triggered, that would be great.
Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

A change in behavior of datalad's ConfigManager.get method, requires the use of get_all parameter in order to get all configured values. Adaption to that change missed this usage in dicom2spec as discovered in issue psychoinformatics-degh-173.

Don't apply an id based decision regarding duplicate runs if there's no id to be found in the spec snippets. See psychoinformatics-degh-173

bpoldrack · 2021-03-18T10:28:54Z

@pvavra, can you confirm PR #174 works for you?

Regarding this:

Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

I'll take that back. I think there should be a serious refactoring instead, that would turn that part into a rule that is used by default (and can be used in a custom setup, too, if so desired).

pvavra · 2021-03-25T21:41:49Z

I'll give the PR a try, but am not sure what the simplest way for me to test it is... Create a dedicated venv? if yes, how do I install that pull-request? I'm asking since I hope to be testing more PRs in the near future ;) So having a good workflow would be handy..

bpoldrack · 2021-04-07T06:35:35Z

@pvavra Sorry for answering a little late - just back from vacation.

I'd go for a dedicated venv, yes. You can then install the PR (or any particular branch/tag) like this: pip install git+https://github.com/bpoldrack/datalad-hirni@fix-173

bpoldrack · 2021-04-07T07:03:39Z

If you want to work on PRs yourself anyway, there's another approach that I usually use, @pvavra.
I have a local clone (with the relevant forks of other contributors as remotes) and a dev venv, where my local clone is editable installed: pip install -e /local/path. That way the venv is using whatever is currently checked out at that path and I adjust the checkout I work on or want to test rather than the venv.

bpoldrack added the bug Something isn't working label Mar 4, 2021

bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 10, 2021

BF: Don't try to use non existent id field

4482f85

Don't apply an id based decision regarding duplicate runs if there's no id to be found in the spec snippets. See psychoinformatics-degh-173

bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 18, 2021

BF: Don't try to use non existent id field

e9564b8

Don't apply an id based decision regarding duplicate runs if there's no id to be found in the spec snippets. See psychoinformatics-degh-173

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dicom2spec error due to checking for spec key "id" which does not exist #173

dicom2spec error due to checking for spec key "id" which does not exist #173

pvavra commented Mar 3, 2021

bpoldrack commented Mar 4, 2021

pvavra commented Mar 4, 2021

pvavra commented Mar 4, 2021

bpoldrack commented Mar 4, 2021

pvavra commented Mar 4, 2021

bpoldrack commented Mar 4, 2021

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 10, 2021 •

edited

Loading

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 18, 2021

pvavra commented Mar 25, 2021

bpoldrack commented Apr 7, 2021

bpoldrack commented Apr 7, 2021 •

edited

Loading

dicom2spec error due to checking for spec key "id" which does not exist #173

dicom2spec error due to checking for spec key "id" which does not exist #173

Comments

pvavra commented Mar 3, 2021

bpoldrack commented Mar 4, 2021

pvavra commented Mar 4, 2021

pvavra commented Mar 4, 2021

bpoldrack commented Mar 4, 2021

pvavra commented Mar 4, 2021

bpoldrack commented Mar 4, 2021

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 10, 2021 • edited Loading

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 10, 2021

bpoldrack commented Mar 18, 2021

pvavra commented Mar 25, 2021

bpoldrack commented Apr 7, 2021

bpoldrack commented Apr 7, 2021 • edited Loading

bpoldrack commented Mar 10, 2021 •

edited

Loading

bpoldrack commented Apr 7, 2021 •

edited

Loading