Skip to content
This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

dicom2spec error due to checking for spec key "id" which does not exist #173

Open
pvavra opened this issue Mar 3, 2021 · 14 comments
Open
Labels
bug Something isn't working

Comments

@pvavra
Copy link

pvavra commented Mar 3, 2021

when importing a tarball (can share on medusa if desired), I am getting the following error:

[ERROR ] 'id' [spec_helpers.py:get_specval:33] (KeyError)

it seems to stem from the implicit call on line:

get_specval(s, "id") > get_specval(

Indeed, for all the other tarballs this error did not get triggered. I assume that some of the preceding conditions of the if clause were False for those other tarballs (tested only with one other atm) and hence that line did not get executed.

However, I do not understand what that conditional is trying to achieve. Why should any of those conditions lead to the series being ignored for the conversion in the first place?
The offending dicom series in this particular case is a SBRef image related to a functional run and should be converted.

Or is it related the sorting needing RF a few lines above?

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436:

for i in range(len(spec_series_list)):
# Note: Removed the following line from condition below,
# since it appears to be pointless. Value for 'converter'
# used to be 'heudiconv' or 'ignore' for a 'dicomseries', so
# it's not clear ATM what case this could possibly have catched:
# heuristic.has_specval(spec_series_list[i], "converter") and \
if spec_series_list[i]["type"] == "dicomseries" and \
has_specval(spec_series_list[i], "bids-run") and \
get_specval(spec_series_list[i], "bids-run") in \
[get_specval(s, "bids-run")
for s in spec_series_list[i + 1:]
if get_specval(
s,
"description") == get_specval(
spec_series_list[i], "description") and \
get_specval(s, "id") > get_specval(
spec_series_list[i], "id")
]:
lgr.debug("Ignore SeriesNumber %s for conversion" % i)
spec_series_list[i]["tags"].append(
'hirni-dicom-converter-ignore')

But not sure I am not breaking anything further down the line by it..

@bpoldrack
Copy link

Hmm. Two things:

First of all, the immediate issue seems to be that the spec that is being worked over here has no id field for at least some entries. id would default to the SeriesNumber from the DICOM header fields and is used to specify order. Now, I think we've seen similar things before where there is no such entry. So, the code needs to be able to deal with that, I guess. In that sense, I consider it a bug.

Secondly, what this is trying to achieve is a guess as to what run to use when there appear to be several. If we have several series with the same description and the same previously determined run number, then this tries to check those SeriesNumber entries and mark all the apparent duplications but the last one (highest SeriesNumber) to be ignored by default. Idea is that this seems likely to be caused by aborted runs that were then redone (which is why we go for the last one). That guess might be wrong of course. In any case, you seem to have several series with the exact same Protocol field in the DICOM headers. Ultimately they need to get a specification that points to a different target in BIDS (or a choice is to be made which one to ignore for conversion). Can you tell, what's the reason for the duplication in your case, @pvavra?
I wonder whether it would be helpful to be able to disable such guessing and rather go with an initially derived specification, that we already know can't be converted without further editing. Or may be there's a better way to determine what to do with such duplications.

@bpoldrack bpoldrack added the bug Something isn't working label Mar 4, 2021
@pvavra
Copy link
Author

pvavra commented Mar 4, 2021

Re no id:
I just re-ran the spec generation for all 30 acquisitions in my dataset and added a simple print(spec_series_list) on line 436, i.e. just before the whole for loop and save the output into a file. A ctrl-f later, I can confirm that not a single spec_series_list has the field id.
Should it maybe simply be uid instead?

In my case tarballs, the different runs are explicitly noted in the SeriesDescription (fMRI_some_more_stuff___run1 or fMRI_..._run1_...) and my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

@pvavra
Copy link
Author

pvavra commented Mar 4, 2021

Note that since there is not id, this sorting should also not work:

spec_series_list = sorted(spec_series_list,
key=lambda x: get_specval(x, 'id')
if 'id' in x.keys() else 0)

@bpoldrack
Copy link

Should it maybe simply be uid instead?

Well, as far as identifying an entry in a spec goes, that's an option. But implies no order in time, so not useful for what that piece of code is trying to achieve.

my custom rules are able to pick up on that and set the run field correctly.

Can I see them?

However, they didn't get triggered, I guess, because the above error halted the conversion before that. So I guess the default heuristic was not really able to pick up on this. But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Okay, need to have a closer look how that can be. Having your custom rules might help, though.

@pvavra
Copy link
Author

pvavra commented Mar 4, 2021

cannot attach a .py file.. so here it goes:

"""Custom rules for dicom2spec for RLAMF study."""
import re  # for extracting numbers from strings
# from pprint import pprint


class MyDICOM2SpecRules(object):  # pylint: disable-msg=r0205
    """Class holding rules for dicom2spec for RLAMF study."""

    def __init__(self, dicommetadata):
        """Store dicom metadata in internal var."""
        self._dicom_series = dicommetadata

    def __call__(self, subject=None, anon_subject=None, session=None):
        """Apply rules to set of all dicom series of an acquisition."""
        spec_dicts = []

        # Heuristic for guessing session based on present fMRI series
        if any(s['SeriesDescription'].endswith("run10") for s
               in self._dicom_series):
            session = "01"
        else:
            session = "02"
        # print("==========  SESSION  ================")
        # print(session)

        for dicom_dict in self._dicom_series:
            spec_dicts.append((self._rules(dicom_dict,
                                           subject=subject,
                                           anon_subject=anon_subject,
                                           session=session),
                               self.series_is_valid(dicom_dict)
                               ))
        return spec_dicts

    def _rules(self,  # pylint: disable-msg=r0201,r0912
               series_dict,
               subject=None,
               anon_subject=None,
               session=None):
        """Apply rules to a single series of dicoms."""
        task = None
        modality = None
        comment = ''
        run = None
        acquisition = None
        # T1 ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("t1"):
            modality = 'T1w'
            acquisition = 'mprage'

        # Fieldmap ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("gre_field_mapping"):
            if "M" in series_dict['ImageType']:
                # will infer magnitude1/magnitude2 automatically
                modality = 'magnitude'
            else:
                modality = 'phasediff'

        # IR EPI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("IR-EPI"):
            # c.f. https://www.ncbi.nlm.nih.gov/pubmed/14635150
            modality = 'T1w'
            acquisition = 'irepi'

        # DTI ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("PA"):
            modality = 'dwi'
        if series_dict['SeriesDescription'].startswith("DTI") and \
           series_dict['SeriesDescription'].endswith("AP"):
            # the A >> P series does have any diffusion-weighting, and is
            # essentially only for the correction of distortions for the SE-EPI
            modality = 'epi'

        # fMRI ---------------------------------------------------
        # Session 1:
        label_task = 'fMRI_SMS2_2.2iso_66sl_TR2_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'learning'
            # number afterwards indicates run number (range: 1 to 10)
            # Note: the remainder of the description does _not_ contain numbers
            tail = series_dict['SeriesDescription'][len(label_task):]
            run = re.findall(r'\d+', tail)[0]

        # Session 2:
        label_task = 'fMRI_run'
        if series_dict['SeriesDescription'].startswith(label_task):
            modality = 'bold'
            task = 'recognition'
            # only 1 to 4
            run = series_dict['SeriesDescription'][len(label_task)]

        # NB: the following should inherit the rest from the above task,
        # but overwrite the modality
        if series_dict['SeriesDescription'].endswith("SBRef"):
            modality = 'sbref'

        # T2w ---------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "t2_tse_orth_hippo_448_p2"):
            modality = 'T2w'
        # Proton Density --------------------------------------------------
        if series_dict['SeriesDescription'].endswith(
                                            "PD_ax_acpc_hippo_448_p2"):
            modality = 'pd'

        # AAHead Scouts ---------------------------------------------------
        if series_dict['SeriesDescription'].startswith("AAHead_Scout"):
            comment = "Localizer - will be ignored for BIDS"

        return {'description': series_dict['SeriesDescription']
                if "SeriesDescription" in series_dict else '',
                'comment': comment,
                'subject': series_dict['PatientID']
                if not subject else subject,
                'anon-subject': anon_subject if anon_subject else None,
                'bids-session': session if session else None,
                'bids-modality': modality,
                'bids-task': task,
                'bids-run': run,
                'bids-acquisition': acquisition
                }

    def series_is_valid(self, series_dict):  # pylint: disable-msg=r0201
        """
        Set whether a series should be converted to bids at all.

        For all series which this returns `false`, the hirni-spec2bids will
        skip the conversion to niftis.
        """
        # Skip all localizers & report
        if series_dict['SeriesDescription'].startswith("AAHead_Scout") or \
           series_dict['SeriesDescription'] == "PhoenixZIPReport":
            valid = False
        else:
            valid = True

        return valid


__datalad_hirni_rules = MyDICOM2SpecRules  # pylint: disable-msg=C0103

@bpoldrack
Copy link

Thanks! Will look into that more closely.

@bpoldrack
Copy link

my custom rules are able to pick up on that and set the run field correctly. However, they didn't get triggered, I guess, because the above error halted the conversion before that

No, this piece of code is executed after the rules (which it probably shouldn't, but that's another issue). If your rules really weren't triggered, they are prob. not correctly configured. The dataset should have a config (or several in correct order) datalad.hirni.dicom2spec.rules with a value that points to the file you pasted above. So, if you do that outside of a datalad python context, that would be: git config -f .datalad/config --add datalad.hirni.dicom2spec.rules <path to your file>. Can you verify such a config exists in your dataset, @pvavra? Generally, it could also be in the dataset's .git/config or your ~/-gitconfig, of course. I would just suggest that the committed .datalad/config is best for provenance tracking. That way, the spec building is done on top of a commit, that actually contains the relevant config.

Note that since there is not id, this sorting should also not work:

This sorting "works" because it assigns a sorting key of 0 if there's no id, so doesn't change the order of snippets w/o an id and puts them before all snippets with an id.

@bpoldrack
Copy link

bpoldrack commented Mar 10, 2021

Hah, hold on. Looking for another reason why your rules might not have been executed, I noticed hirni didn't catch up on a change in datalad core. With a current datalad installed, it would indeed fail to get your rule, if there are multiple rules configured. Is that the case, @pvavra?

Note to myself: ConfigManager.get requires get_all=True!

@bpoldrack
Copy link

But I have no clue why that get_specval(..) got only triggered in that one tarball out of 30..

Assuming that your rules weren't picked up, in my estimation it's simply the only one where default rules would derive the same run from the protocol field for several series. Could that be true?

bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 10, 2021
A change in behavior of datalad's ConfigManager.get method, requires the
use of get_all parameter in order to get all configured values. Adaption
to that change missed this usage in dicom2spec as discovered in issue
psychoinformatics-degh-173.
bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 10, 2021
Don't apply an id based decision regarding duplicate runs if there's no
id to be found in the spec snippets.

See psychoinformatics-degh-173
@bpoldrack
Copy link

As a quick workaround for this particular dataset, I can simpy comment out the whole for loop starting on line 436:
...
But not sure I am not breaking anything further down the line by it..

No, nothing really relies on it so you don't break anything. If you end up with a specification containing duplicates, however, conversion based on it might go wrong.

In any case, if you could retry with the branch of PR #174 to confirm this issue doesn't get triggered, that would be great.
Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 18, 2021
A change in behavior of datalad's ConfigManager.get method, requires the
use of get_all parameter in order to get all configured values. Adaption
to that change missed this usage in dicom2spec as discovered in issue
psychoinformatics-degh-173.
bpoldrack added a commit to bpoldrack/datalad-hirni that referenced this issue Mar 18, 2021
Don't apply an id based decision regarding duplicate runs if there's no
id to be found in the spec snippets.

See psychoinformatics-degh-173
@bpoldrack
Copy link

@pvavra, can you confirm PR #174 works for you?

Regarding this:

Apart from the already pushed changes, I'm looking into a command switch to turn that part off.

I'll take that back. I think there should be a serious refactoring instead, that would turn that part into a rule that is used by default (and can be used in a custom setup, too, if so desired).

@pvavra
Copy link
Author

pvavra commented Mar 25, 2021

I'll give the PR a try, but am not sure what the simplest way for me to test it is... Create a dedicated venv? if yes, how do I install that pull-request? I'm asking since I hope to be testing more PRs in the near future ;) So having a good workflow would be handy..

@bpoldrack
Copy link

@pvavra Sorry for answering a little late - just back from vacation.

I'd go for a dedicated venv, yes. You can then install the PR (or any particular branch/tag) like this: pip install git+https://github.com/bpoldrack/datalad-hirni@fix-173

@bpoldrack
Copy link

bpoldrack commented Apr 7, 2021

If you want to work on PRs yourself anyway, there's another approach that I usually use, @pvavra.
I have a local clone (with the relevant forks of other contributors as remotes) and a dev venv, where my local clone is editable installed: pip install -e /local/path. That way the venv is using whatever is currently checked out at that path and I adjust the checkout I work on or want to test rather than the venv.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants