Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with keys that are unavailable in the MDF #61

Closed
jusack opened this issue Sep 15, 2023 · 7 comments · Fixed by #62
Closed

Error with keys that are unavailable in the MDF #61

jusack opened this issue Sep 15, 2023 · 7 comments · Fixed by #62

Comments

@jusack
Copy link
Contributor

jusack commented Sep 15, 2023

I am trying to read MDF files that are not completely up to specification, causing me the run into the case, where there are required keys missing in the file.

The problem is now, that instead of utilizing the @keyrequired macro in

studyName(f::MDFFile)::Union{String, Missing} = @keyrequired f["/study/name"]
which should return missing, it is already detected in

MPIFiles.jl/src/MDF.jl

Lines 72 to 78 in 2c4c740

function getindex(f::MDFFile, parameter)
if haskey(f.file, parameter)
return read(f.file, parameter)
else
return nothing
end
end
that the key is missing and returned as nothing. This causes the function using the @keyrequired macro to fail, since it can´t convert nothing to Union{String, Missing}

What would be the desired solution to this? Keep the distinction between keyrequired and keyoptional and just let the getindex function throw a KeyError? Or should we just return nothing for all keys that are unavailable no matter if they are optional or not?

@jusack
Copy link
Contributor Author

jusack commented Sep 15, 2023

If it is decided that missing should be returned, this will fail since rxDataConversionFactor is @keyrequired:

a = rxDataConversionFactor(f)
if a!=nothing

@tknopp
Copy link
Member

tknopp commented Sep 15, 2023

@jonschumacher @nHackel: Any opinions on that? I have not introduced missing in MPIFiles and don't know how much this is used in the other packages (MPIReco / MPIMeasurements). The most simple thing would be to replace all occurrences of Missing/missing by nothing/Nothing.

@nHackel
Copy link
Member

nHackel commented Sep 15, 2023

MPIReco has not used a lot of missing, but the new version will use missing in the Plan interface. Internally, all Parameter structs will use nothing though. MPIMeasurements used missing a few times, but mostly we switched to nothing as it was more fitting.

I'm really unsure about which solution I'd prefer. One the one hand, using missing and throwing KeyErrors is more "correct", since the HDF5 does not have a concept of null/nothing. That would require updating cases that check for isnothing, potentially also in upstream packages. Another difficulty with missing is that it propagates and has weird behaviour, a little bit like NaN.

Functions like iszero(missing) or missing + 4 return missing. isless(missing, 2) even returns a boolean. So if we chose this solution we have to be very careful in upstream packages or even MPIFile itself to not run into weird bugs due to this behaviour. Nothing would just throw an error.

At which points are you running into issues @jusack? Is it still in the initial construction/parsing of an MDF or when "high-level" functions like getMeasurements are called?

I think my preferred solution would be something where such an initial construction "always" works but might fill up with missing values, but the actual functions on our MPIFile object then would either return a value (which could be nothing or empty depending on the function) or throw an error if something missing was accessed and used in some way. If we allow functions that aren't just accessing a missing value, then we'd probably have to think about default values, which I think would be bad/difficult 🤔

I'm not sure if the current structure of MPIFiles allows for such a solution though nor how much effort it would take

@jusack
Copy link
Contributor Author

jusack commented Sep 15, 2023

The initial construction of the MPIFile works fine, but the first occasion where it fails was the show method when printing the returned object after construction (because no study name was defined). My current goal is opening a system matrix from our 3D MPS recorded with the old software with the SFViewer, however, there were multiple things missing, that I have to add manually and I did not manage yet to add all things for the SFViewer to start.

One thing we could do is just replace the @keyrequired macro to throw an descriptive error about the dataset in the MDF that is missing, even though it is required by the standard. After all a default value for a required parameter does not really make sense

@tknopp
Copy link
Member

tknopp commented Sep 15, 2023

So, one general comment: When beginning MPIFiles we always tried to fix wrong data when accessing it. So during that time it would have been natural to provide default values.

But with the change from MDF V1 to V2 we tried to be much more restrict and the philosophy is to rather fix the wrong data instead of fixing it during loading. This does not answer the question regarding missing vs nothing but in the concrete case it makes sense to have some tools at hand to fix MDF data. We have done this during the V1 to V2 transition and the script that we used operated on the HDF5 level. So in this case it would be a small script that opens all mdf's in a directory, checks if the studyName is there and if not creates one.

@jonschumacher
Copy link
Member

The macro was introduced by me during the implementation of the in-memory MDF which also relies on the distinction between nothing and missing to distinguish required and optional values. From my point of view the getindex method should just throw an error which can then be handled in downstream functions.

@jonschumacher
Copy link
Member

jonschumacher commented Sep 18, 2023

I now added a PR changing getindex to return a KeyError. Addition to my previous comment: returning nothing can easily hide bugs since the KeyError might just stem from a spelling mistake in the key. The macros also can't check that but they are used in dedicated functions which are tested in CI and they are an active decision when getting the key typically applied in the same line as the key string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants