Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

visualizer.py fails on chained MEG processing pipeline [update: solved] #120

Closed
robertoostenveld opened this issue Sep 14, 2023 · 14 comments

Comments

@robertoostenveld
Copy link

Following bids-standard/bep021#8 (comment) I am trying to construct a .prov.jsonld file for an example chained analysis pipeline of MEG data.

I cannot get the visualizer.py to work on my example, neither on the one that we hand-crafted in Kopenhagen. Both return an empty png figure. @cmaumet can you spot what is wrong?

{
	"BIDSProvVersion":"dev",
	"@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
	"records":{
		"prov:Agent":{
			"@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
			"label":"FieldTrip",
			"version":"0962f5dce"
		},
		"prov:Activity":[
			{
				"@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
				"label":"preproc",
				"used":"bids_raw:///sub-01/meg/sub-01_task-language_meg.ds"
			},
			{
				"@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
				"label":"avg",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"8eadc657-b256-4e76-ba36-6d3069c4d92e",
				"label":"avg",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"abd1c016-188a-4837-8101-4aa28638dfc3",
				"label":"avg",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
				"label":"combined",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds"
			},
			{
				"@id":"103d21fd-7623-4650-8926-72f527d08c05",
				"label":"combined",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds"
			},
			{
				"@id":"d3730058-518f-4793-bf48-bfba5c7f9d22",
				"label":"combined",
				"used":"bids_derivative:///sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"
			}
		],
		"prov:Entity":[
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"label":"data",
				"generatedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
				"label":"timelockFC",
				"generatedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
				"label":"timelockIC",
				"generatedBy":"8eadc657-b256-4e76-ba36-6d3069c4d92e"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
				"label":"timelockFIC",
				"generatedBy":"abd1c016-188a-4837-8101-4aa28638dfc3"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
				"label":"combinedFC",
				"generatedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
				"label":"combinedIC",
				"generatedBy":"103d21fd-7623-4650-8926-72f527d08c05"
			},
			{
				"@id":"bids_derivative/sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
				"label":"combinedFIC",
				"generatedBy":"d3730058-518f-4793-bf48-bfba5c7f9d22"
			}
		]
	}
}
@robertoostenveld
Copy link
Author

robertoostenveld commented Sep 14, 2023

Note that the processing looks something like this

digraph {
  preproc [ color=red, shape=square ]
  avg [ color=red, shape=square ]
  combined [ color=red, shape=square ]
  rawdata -> preproc -> data
  data -> avg -> timelockFC -> combined -> combinedFC
  data -> avg -> timelockIC -> combined -> combinedIC
  data -> avg -> timelockFIC -> combined -> combinedFIC
}

combined

where "preproc", "avg" and "combined" are processing steps, and all others are data.

The "avg" (computing the ERFs) is repeated 3 times for different selection of trials (corresponding to conditions), and consequently the "combined" step is also executed three times (for the three conditions).

@cmaumet
Copy link
Collaborator

cmaumet commented Sep 19, 2023

@robertoostenveld: Thanks for sharing those!

I think the problem is probably related to recent update to the library to better reflect key names defined in the spec document (mainly changes in case):

  • records --> Records
  • prov:Agent --> Software
  • label --> Label
  • version --> Version
  • prov:Activity --> Activities
  • used --> Used
  • prov:Entity --> Entities
  • generatedBy --> GeneratedBy

In case I missed some, all the keys are defined in this context file (now harmonized with what is in the BIDS-Prov spec).

A revised version of the minimal example we had worked on in Copenhagen is here: https://github.com/bids-standard/BEP028_BIDSprov/blob/master/examples/simple_example/simple_example.prov.jsonld

Does this help?

@robertoostenveld
Copy link
Author

Also the example does not result in a figure with python -m visualize --input_file ./simple_example.prov.jsonld

@cmaumet
Copy link
Collaborator

cmaumet commented Sep 19, 2023

[Edit: redirect is now in place]

Thanks for spotting that and sorry! There was an issue in the PURL redirection https://purl.org/nidash/bidsprov/context.json that I've just fixed. Now the PURL context link should redirect to https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json and solve the issue.

On my machine python -m visualize --input_file ../examples/simple_example/simple_example.prov.jsonld --output_file here.png returns:

here

@cmaumet
Copy link
Collaborator

cmaumet commented Sep 19, 2023

Updating your example to (update key names and paths in bids URIs):

{
	"BIDSProvVersion":"dev",
	"@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
	"Records":{
		"Software":{
			"@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
			"Label":"FieldTrip",
			"Version":"0962f5dce"
		},
		"Activities":[
			{
				"@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
				"Label":"preproc",
				"Used":"bids_raw::sub-01/meg/sub-01_task-language_meg.ds"
			},
			{
				"@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
				"Label":"avg",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"8eadc657-b256-4e76-ba36-6d3069c4d92e",
				"Label":"avg",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"abd1c016-188a-4837-8101-4aa28638dfc3",
				"Label":"avg",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
				"Label":"combined",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds"
			},
			{
				"@id":"103d21fd-7623-4650-8926-72f527d08c05",
				"Label":"combined",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds"
			},
			{
				"@id":"d3730058-518f-4793-bf48-bfba5c7f9d22",
				"Label":"combined",
				"Used":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"
			}
		],
		"Entities":[
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"Label":"data",
				"GeneratedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
				"Label":"timelockFC",
				"GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
				"Label":"timelockIC",
				"GeneratedBy":"8eadc657-b256-4e76-ba36-6d3069c4d92e"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
				"Label":"timelockFIC",
				"GeneratedBy":"abd1c016-188a-4837-8101-4aa28638dfc3"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
				"Label":"combinedFC",
				"GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
				"Label":"combinedIC",
				"GeneratedBy":"103d21fd-7623-4650-8926-72f527d08c05"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
				"Label":"combinedFIC",
				"GeneratedBy":"d3730058-518f-4793-bf48-bfba5c7f9d22"
			}
		]
	}
}

The vizualize function gives us:
here

Compared to your example there are 3 "combine" activities.

@cmaumet
Copy link
Collaborator

cmaumet commented Sep 19, 2023

Removing duplicate activities (this is closer to what we have above in your diagram but maybe not what we want as provenance):

{
	"BIDSProvVersion":"dev",
	"@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
	"Records":{
		"Software":{
			"@id":"ed00edd9-eb1a-4f0c-9e84-16aa7e2d63e9",
			"Label":"FieldTrip",
			"Version":"0962f5dce"
		},
		"Activities":[
			{
				"@id":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a",
				"Label":"preproc",
				"Used":"bids_raw::sub-01/meg/sub-01_task-language_meg.ds"
			},
			{
				"@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1",
				"Label":"avg",
				"Used":["bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"]
			},
			{
				"@id":"4af2120c-e484-41b0-8d9d-da63990bf17c",
				"Label":"combined",
				"Used": [
				"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds", "bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds", "bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds"]
			}
		],
		"Entities":[
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"Label":"data",
				"GeneratedBy":"d0a40395-0d97-4366-9d1e-30b2d5c1a17a"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-avg_meg.ds",
				"Label":"timelockFC",
				"GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-avg_meg.ds",
				"Label":"timelockIC",
				"GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-avg_meg.ds",
				"Label":"timelockFIC",
				"GeneratedBy":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFC_desc-combined_meg.ds",
				"Label":"combinedFC",
				"GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageIC_desc-combined_meg.ds",
				"Label":"combinedIC",
				"GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
			},
			{
				"@id":"bids_derivative::sub-01/meg/sub-01_task-languageFIC_desc-combined_meg.ds",
				"Label":"combinedFIC",
				"GeneratedBy":"4af2120c-e484-41b0-8d9d-da63990bf17c"
			}
		]
	}
}

here

@arnodelorme
Copy link

arnodelorme commented Sep 19, 2023

Following up on our discussion. Is the id, for example, "@id":"bb67ad88-e56c-4c37-ba0c-b6b9e47daff1", supposed to be common across BIDS datasets? Would it be possible to have simpler IDs? How are the IDs generated? For example, could we have "preproc-fieldtrip-0962f5dce." It would make the text file much easier to read by the common mortal.

Also, are the parameters to the functions called timelockFIC defined somewhere (even if it is not in a standard format).

@cmaumet
Copy link
Collaborator

cmaumet commented Sep 20, 2023

Hi @arnodelorme!

Yes you can have simpler ids (but they still have to be URIs), I'm pasting here the BIDS-Prov spec:


@id: REQUIRED. A unique identifier like a UUID that will be used to associate activities with this software (e.g., urn:1264-1233-11231-12312, "urn:bet-o1ef4rt"


So what you are proposing as an id could work, possibly prepending a prefix urn:

For the parameters I would recommend adding a Command, similar to here.

Pingging: @satra in case he'd like to chime in!

@satra
Copy link
Collaborator

satra commented Sep 20, 2023

@arnodelorme - in addition to what camille said, the Label is what common mortals should use for reading, the id is what computers would use to determine uniqueness of that node in a graph. also Label does not have to be a single word. it is a short name.

@robertoostenveld
Copy link
Author

Hi @arnodelorme, the example I constructed and shared above was using random identifiers generated in MATLAB with char(java.util.UUID.randomUUID.toString).

What I understand from the FSL example, for example when searching for fslmaths, is that repeated calls to the same function (or application) should have unique IDs. So my activity label should have been ft_timelockanalysis rather than avg, but the ID would remain as random as it is now.

After iterating a few more times, and also realizing that I want each activity (= function call) to be represented separately, I now have this:

{
	"BIDSProvVersion":"dev",
	"Records":{
		"Software":{
			"@id":"158ce020-01fe-4d8f-9e0d-de8c2cd711d0",
			"Label":"FieldTrip",
			"Version":"0aa4735f6"
		},
		"Activities":[
			{
				"@id":"d57adc29-f990-40db-a6d8-7cd29033f175",
				"Label":"ft_preprocessing",
				"Used":"bids_raw/sub-01/meg/sub-01_task-language_meg.ds"
			},
			{
				"@id":"d5a6423d-4dd0-467f-9b5a-85d1c54feede",
				"Label":"ft_timelockanalysis",
				"Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"073b31e3-f7b8-4121-aae8-b0639ae32619",
				"Label":"ft_timelockanalysis",
				"Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"8d5bac53-6891-4127-a627-6ab8646c4ce1",
				"Label":"ft_timelockanalysis",
				"Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds"
			},
			{
				"@id":"e1410a6f-4868-4085-9a8b-b44e17d552af",
				"Label":"ft_megplanar+ft_combineplanar",
				"Used":"sub-01/meg/sub-01_task-language_desc-timelockFC_meg.ds"
			},
			{
				"@id":"8adcb54d-f70a-4ff1-91b7-67b0c0cd70d6",
				"Label":"ft_megplanar+ft_combineplanar",
				"Used":"sub-01/meg/sub-01_task-language_desc-timelockIC_meg.ds"
			},
			{
				"@id":"38281c42-8051-4f26-bb31-03f05ff6d716",
				"Label":"ft_megplanar+ft_combineplanar",
				"Used":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds"
			}
		],
		"Entities":[
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
				"Label":"data",
				"GeneratedBy":"d57adc29-f990-40db-a6d8-7cd29033f175"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-timelockFC_meg.ds",
				"Label":"timelockFC",
				"GeneratedBy":"d5a6423d-4dd0-467f-9b5a-85d1c54feede"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-timelockIC_meg.ds",
				"Label":"timelockIC",
				"GeneratedBy":"073b31e3-f7b8-4121-aae8-b0639ae32619"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds",
				"Label":"timelockFIC",
				"GeneratedBy":"8d5bac53-6891-4127-a627-6ab8646c4ce1"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-combinedFC_meg.ds",
				"Label":"combinedFC",
				"GeneratedBy":"e1410a6f-4868-4085-9a8b-b44e17d552af"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-combinedIC_meg.ds",
				"Label":"combinedIC",
				"GeneratedBy":"8adcb54d-f70a-4ff1-91b7-67b0c0cd70d6"
			},
			{
				"@id":"sub-01/meg/sub-01_task-language_desc-combinedFIC_meg.ds",
				"Label":"combinedFIC",
				"GeneratedBy":"38281c42-8051-4f26-bb31-03f05ff6d716"
			}
		]
	},
	"@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json"
}

which renders like this

res

The most relevant changes are that I give the activities the FieldTrip function as their label, and the entities get the MATLAB variable name as the label. So the MATLAB data structures that I have in memory after running this for a single subject are data, timelockFC, timelockIC, timelockFIC, combinedFC, combinedIC, and combinedFIC, and each of these is also represented as CTF dataset (.ds) on disk.

What I don't yet have figured out is how to add "Command" details, which in FieldTrip would be the input cfg structure.

What I also have not yet figured out is how to deal with steps that are not represented on disk. In the case above that is the output of ft_megplanar, which is followed by ft_combineplanar. In the case of ICA cleaning it could also be ft_componentanalysis, followed by ft_rejectcomponent. The back-projected clean raw data could be written to disk according to BEP021, but the unmixed data (component topos and timeseries) not. I think this also shows where (i)EEG and MEG diverge from (f)MRI analyses, which mostly takes nifti files as input and spits out other nifti files. The data structures of (i)EEG/MEG analyses are not so consistent.

I'll keep working on it for a bit longer, to see whether I can make further progress in line with this BEP028 and BEP021.

@arnodelorme
Copy link

Thank you Robert. Yes, this is a step in the right direction. For the IDs of the activities, could you use something like

			{
				"@id":"code/fieldtrip_reference_functions/ft_timelockanalysis.m",
				"Label":"ft_timelockanalysis",
				"Used":"sub-01/meg/sub-01_task-language_desc-timelockFIC_meg.ds"
			}

@robertoostenveld
Copy link
Author

Hi Arno, I understand that IDs have to be unique (i.e., separate nodes in the graph), so two calls to ft_timelockanalysis should have different IDs. It would be possible to do something like this

{
  "@context":"https://raw.githubusercontent.com/bids-standard/BEP028_BIDSprov/master/context.json",
  "BIDSProvVersion":"dev",
  "Records":{
    "Software":[
      {
        "Id":"683ef168-914c-4c2e-a957-885c15cf2636",
        "RRID":"RRID:SCR_004849",
        "Label":"FieldTrip",
        "Type":"Software",
        "Version":"0aa4735f6"
      }
   ],
    "Activities":[
      {
        "Id":"ft_timelockanalysis-7e50855c",
        "Label":"ft_timelockanalysis",
        "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
        "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636"
      },
      {
        "Id":"ft_timelockanalysis-d1da5e67",
        "Label":"ft_timelockanalysis",
        "Used":"sub-01/meg/sub-01_task-language_desc-preproc_meg.ds",
        "AssociatedWith":"683ef168-914c-4c2e-a957-885c15cf2636"
      },
      ...

but I don't see an advantage of this over having fully random UUIDs. I think it is better to fully ignore the IDs when you look at it, unless you use them to link things together (as with the lines in the graph). Oh, and note that the ID also does not have to have a reference to the software package, as that is something you can do with the AssociatedWith reference.

PS I noticed that @id is not needed, it can also be Id. That makes writing the json easier. Have a look at my code_v3 directory, which has a writeprov.m function, and two functions that help in constructing the provenance as MATLAB structure.

@robertoostenveld robertoostenveld changed the title visualizer.py fails on chained MEG processing pipeline visualizer.py fails on chained MEG processing pipeline [update: solved] Sep 23, 2023
@arnodelorme
Copy link

arnodelorme commented Sep 23, 2023 via email

@robertoostenveld
Copy link
Author

Besides the BIDS specific example script whose results are being discussed at bids-standard/bep021#8, I have now also added support for the prov.jsonld output to ft_analysispipeline with fieldtrip/fieldtrip@93d3792.

I think there is nothing any more to be done on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants