Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pulseq loading slow and potentially incorrect block ordering #224

Open
gabuzi opened this issue Nov 20, 2023 · 8 comments · Fixed by #321
Open

Pulseq loading slow and potentially incorrect block ordering #224

gabuzi opened this issue Nov 20, 2023 · 8 comments · Fixed by #321

Comments

@gabuzi
Copy link
Contributor

gabuzi commented Nov 20, 2023

Hello, me again :)

Two aspects on Pulseq that I have noticed:

1. Speed

I have been toying a bit more with Koma, now moving to even larger pulseq files. Pulseq loading is extremely slow for those (does not finish within 20min), and @time read_seq(...) indicates a large amount of allocations.

Let's take as an example a toy cartesian 2D bSSFP sequence with 64 phase encode steps and 2 excitations per step. The result has 768 blocks, the pulseq file is 27KB in size.

Reading this with the latest master ae27d17, gives

julia> time_read_seq(f) = @time read_seq(f)
       println("baseline")
       seq = time_read_seq(seqfile)
baseline
[ Info: Loading sequence bSSFP_FA30deg_TE10ms_TR20ms_2D_(69x64)_pulseq.seq ...
  0.752135 seconds (8.51 M allocations: 299.462 MiB, 11.12% gc time)
Sequence[ τ = 3539.2 ms | blocks: 768 | ADC: 128 | GR: 636 | RF: 128 | DEF: 11 ]

julia> 

We can see 8.5M allocations totaling 300MB and a ballpark of 0.5-1s for loading (have warmed-up the function before to force compilation).
A larger sequence file (6912 blocks), 220KB pulseq file comes in at ca 50s time and 21GiB allocations.
Extrapolating this to, say an MRF sequence, where you could easily have two to three orders of magnitude more blocks could easily make pulseq sequence loading completely infeasible (I have tried that). This was with an M1 macbook pro.

I have spent some time digging into the sequence loading code and optimizing it. I went through several iterations and I can now load the small file above in 0.027s with 300k allocations totalling in 11.6MB.

The larger file now loads in 0.25s with 2.59M allocations for 98MiB.

2. [BLOCKS] order

Along the way I realized that Koma seems to interpret the pulseq [BLOCKS] section as being reproduced on a scanner in incrementing ID order instead of line by line in [BLOCKS]. The blocks data is read into dictionaries of (ID => block_data), and then the sequence struct is assembled by iterating over incrementing IDs.
See https://github.com/cncastillo/KomaMRI.jl/blob/ae27d17326b7989d7fc1aceba01a051450e3d219/KomaMRICore/src/io/Pulseq.jl#L412
(which coincidentally is one of the main culprit lines for the slow load),
and then in get_block, we have: https://github.com/cncastillo/KomaMRI.jl/blob/ae27d17326b7989d7fc1aceba01a051450e3d219/KomaMRICore/src/io/Pulseq.jl#L642
obj["blockEvents"][i] is the mapping from block with ID==i to the data from the [BLOCKS] section for this ID.

Digging through the pulseq spec 1.4.1 (my seqs are 1.4.0, but there were no relevant changes from 1.4.1 to 1.4.0): https://pulseq.github.io/specification.pdf, we find sec 2.2:

There are no restrictions on the ordering of IDs, although sequential ordering is often implemented.

And in sec 2.7, p10 top

The sequence must declare at least one block. Any non-zero number of blocks
may be defined. The blocks are executed sequentially.

While this may be seen as a bit ambiguous ("sequentially line-by-line or by-id?"), I think together with sec 2.2, I would say it's line-by-line.

This is good, because it relieves Koma from storing the BLOCKS into a Dict and relatively expensive indirect lookups, but rather the entire [BLOCKS] section can be read into one big Int array that doesn't have to be modified, which is significantly more efficient for creation and access.

This is more or less how I achieved the optimizations described above.

3. What's next?

Since these are more invasive modifications (I had to define the == operator for sequences to ensure that the results I get form different reading functions are the same), and I could only test on my pulseq files. These are all v.1.4.0 and created via pypulseq and thus probably lack the diversity allowed by the specification. So, the obvious question is: are there pulseq test files available in Koma already? I didn't find any.

I'm happy to open a PR with the optimizations so that you can have a closer look. Let me know what you think.

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 25, 2023

At some point I did get a github notification with sequences on the writer branch, but interestingly, the comment doesn't show up here. I'll have a look at those for further testing.

@cncastillo
Copy link
Member

cncastillo commented Nov 28, 2023

Hi @gabuzi. Thank you so much for your efforts 😄! It is very nice to see people other than me and Boris contributing to the project.

We have some Pulseq files that test different relevant aspects (I changed the extension .seq to .seq.txt so I can upload them here):

Hope this is useful.

@beorostica we should add some tests to #231 compare our k-space with MATLAB's Pulseq k-space for these examples, to ensure we are reading the same waveforms.

@gabuzi
Copy link
Contributor Author

gabuzi commented Nov 28, 2023

@cncastillo No worries! To me, that's how open-source software works -- happy to give something back!

Thanks for the examples! At the moment, I am a bit short on time, unfortunately. But I came across the python implementation for pulseq (pypulseq), and I realized that it uses a very similar approach for sequence reading. I didn't think that was coincidental, so I looked at the upstream pulseq matlab implementation, which turned out to be very similar as well :)

Consequently, all implementations suffer the same ambiguity in block ordering, for which I have now opened an issue in the matlab pulseq repo about it: pulseq/pulseq#40

@cncastillo
Copy link
Member

Yeah it makes sense, the Pulseq reader in Koma is pretty much a copy-paste of Pulseq's MATLAB reader (v1.4.0 at least). It will be interesting to see what they think.

In the meantime, if you are busy, we can test the performance improvements. We would just require a draft pull request.

@cncastillo
Copy link
Member

I looked at this again using @profview, and I found the same sources of slow down you mentioned. The culprits were type instability in get_blocks and read_blocks, the flip angle calculation, the Sequence constructor being slow, and the use of scanf in some parts. I fixed them in 6176d03.

The current master:
image

New version from #321
image

So now the reader is 20 times faster! 🥳 Thanks for detecting the problem and the great progress on the solution in #238 😄

In #321 we are also including some tests to be sure that we are reading the same as MATLAB's Pulseq (v1.4.X), but we still need to add some more for PyPulseq (v1.3.X) #318.

Cheers!

@github-project-automation github-project-automation bot moved this from In Progress to Solved in Dev in KomaMRI.jl v0.8.0 Apr 8, 2024
@gabuzi
Copy link
Contributor Author

gabuzi commented Apr 8, 2024

This is great news! 20x is a great speedup 🚀

I'm just noting this here as it might have gone under in my relatively big opening post for this issue. The numbers I have obtained back then indicate that another factor of 10x should be possible. My benchs using #238 saw 50s -> 0.25s for a seq with 6912 blocks. But then again, other bits that influence loading performance might have changed.

Arguably, the 20x speedup is enough for for most standard rapid imaging sequences, but longer ones like 3D MRF without undersampling (e.g. for debugging) could become a patience tester ;)

@cncastillo cncastillo reopened this Apr 8, 2024
@cncastillo
Copy link
Member

Nice! I am all for speed improvements. I will reopen this so I don't forget to look at this on the weekend. The difference in speed could be related to the use of the dictionaries. I will need to investigate.

Also related to performance, we recently enabled a new gl option for plot_seq that should make the plots much faster #364. In the UI this option is automatically enabled if the sequence has more than 1000 blocks.

@gabuzi
Copy link
Contributor Author

gabuzi commented Apr 8, 2024

From memory, I am quite certain that it was indeed due to dictionaries. IIRC, in #238 I avoided those to get those further speedups. If I'm not mistaken, this did come with the caveat that sequence blocks were then ordered by order as they appear in the file, not by their ID (which is fine according to the pulseq standard, see #224 (comment), but crucially is not what the pulseq matlab reference implementation does!).

Also related to performance, we recently enabled a new gl option for plot_seq that should make the plots much faster #364. In the UI this option is automatically enabled if the sequence has more than 1000 blocks.

This is great to hear! I remember plots being an issue for those large sequences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants