Tiled algorithm implementations #22

Merged
merged 7 commits into from
Sep 8, 2023

graeme-a-stewart
Member

This is a big merge, which adds three implementations of the N2Tiled algorithm. These use different data layout strategies and have rather different performance.

The first uses an SoA layout attached to the tiles themselves, so that each tile's jets are stored in a compact structure. However, this requires allocating many containers (a Vector{Float64} for each tracked parameter) across many tiles, which is expensive in typical scenarios where O(500) tiles are being tracked. This is the N2TiledSoATile strategy. It is the slowest of the three implementations.
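
To make the allocation cost concrete, here is a minimal sketch of a per-tile SoA. It is written in Python purely for illustration and is not the package's Julia code; the `TileSoA` class, the parameter names in `PARAMS`, and the `container_count` helper are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical set of per-jet parameters tracked in each tile.
PARAMS = ("kt2", "eta", "phi", "nn_dist", "dij")


@dataclass
class TileSoA:
    # One separate container per tracked parameter, owned by this tile.
    columns: dict = field(default_factory=lambda: {p: [] for p in PARAMS})

    def push(self, **values):
        # Append one jet by adding a value to every column.
        for name in PARAMS:
            self.columns[name].append(values[name])


def container_count(tiles):
    # Total number of separately allocated columns across all tiles.
    return sum(len(t.columns) for t in tiles)


# Assumed scale from the description above: about 500 tiles.
tiles = [TileSoA() for _ in range(500)]
tiles[0].push(kt2=1.0, eta=0.1, phi=0.2, nn_dist=0.4, dij=0.3)
print(container_count(tiles))
```

With five parameters and 500 tiles this already means 2500 small containers, most of which hold only a handful of jets.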

The second uses a global SoA for a compact jet representation. This is much faster for global operations such as finding the minimum dij value. However, keeping the global SoA compact requires shuffling and shrinking containers as jets are merged and finalised, which is a lot of data movement. This is the N2TiledSoAGlobal strategy. It is more than 2x faster than the per-tile SoA, but still suffers from complex bookkeeping and data shuffling.
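
The shuffle-and-shrink step can be sketched as a swap-remove applied to every column of the global SoA. This is an illustrative Python sketch under assumed names (`GlobalSoA`, its columns, and the return convention are hypothetical), not the Julia implementation in this PR.

```python
class GlobalSoA:
    """One list per jet parameter, shared by all jets (illustrative only)."""

    def __init__(self):
        self.jet_id = []
        self.kt2 = []
        self.eta = []
        self.phi = []

    def _columns(self):
        return (self.jet_id, self.kt2, self.eta, self.phi)

    def push(self, jet_id, kt2, eta, phi):
        # Append one jet and return the slot it occupies.
        for col, value in zip(self._columns(), (jet_id, kt2, eta, phi)):
            col.append(value)
        return len(self.jet_id) - 1

    def remove(self, i):
        # Move the last jet into slot i in every column, then shrink.
        # Returns the id of the jet that moved (None if nothing moved), so
        # the caller can fix up any other index that pointed at its old slot.
        last = len(self.jet_id) - 1
        moved = self.jet_id[last] if i != last else None
        for col in self._columns():
            col[i] = col[last]
            col.pop()
        return moved


soa = GlobalSoA()
soa.push(10, 1.0, 0.0, 0.1)
soa.push(11, 2.0, 0.5, 0.2)
soa.push(12, 3.0, 1.0, 0.3)
moved = soa.remove(0)  # jet 12 is shuffled into slot 0
```

Every removal touches all columns and forces a fix-up of anything that referenced the moved jet, which is the bookkeeping cost described above.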

The third is a port of Philippe Gras' FastJet-inspired implementation of that package's N2Tiled algorithm. It keeps bookkeeping very light, but maintains a compact list of dij values to allow fast searching for jet merges and finalisations. This turns out to be the fastest of the three.
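
A compact dij list of this kind can be sketched as follows. This is a hedged Python illustration, not the ported code: `DijTable` and its methods are hypothetical names, and the plain linear scan stands in for the vectorised search used in the Julia implementation.

```python
class DijTable:
    """Compact table of (dij, jet) entries; illustrative only."""

    def __init__(self):
        self.dij = []    # dij values, kept contiguous for fast scanning
        self.jet = []    # index of the jet that owns each entry
        self.slot = {}   # jet index -> position of its entry in the table

    def add(self, jet, value):
        self.slot[jet] = len(self.dij)
        self.dij.append(value)
        self.jet.append(jet)

    def minimum(self):
        # Linear scan for the smallest dij; assumes the table is non-empty.
        best = 0
        for i in range(1, len(self.dij)):
            if self.dij[i] < self.dij[best]:
                best = i
        return self.jet[best], self.dij[best]

    def remove(self, jet):
        # Fill the hole with the last entry so the table stays compact.
        i = self.slot.pop(jet)
        last = len(self.dij) - 1
        if i != last:
            self.dij[i] = self.dij[last]
            self.jet[i] = self.jet[last]
            self.slot[self.jet[i]] = i
        self.dij.pop()
        self.jet.pop()


table = DijTable()
table.add(0, 3.0)
table.add(1, 0.5)
table.add(2, 1.5)
first = table.minimum()
table.remove(1)
second = table.minimum()
```

Only the small dij table is compacted on removal; the jets themselves stay where they are, which is what keeps the bookkeeping light.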

This PR is to snapshot the work done so far. It will then be the starting point for a production release of this package.

Implementation of a version of the N2Tiled algorithm used by fastjet,
but using a SoA structure for tiled jets which form part of the
structure of the tile itself.

LorentzVector code is used instead of a "plain" momentum vector.

A coordinate-pair structure is added to track neighbours, and
caches are used for each tile's rightmost-neighbour and
all-neighbour lists.
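
A minimal Python sketch of cached neighbour lists is given below. It assumes a FastJet-like convention that this commit message does not spell out: tiles form a rapidity by phi grid, phi wraps around while rapidity does not, and the "right-hand" list is the subset that lets each adjacent pair of tiles be visited exactly once. The grid size and function names are hypothetical, and this is not the package's code.

```python
from functools import lru_cache

N_ETA, N_PHI = 4, 6  # hypothetical grid size; assumes N_PHI >= 3


@lru_cache(maxsize=None)
def all_neighbours(ieta, iphi):
    # Up to 8 surrounding tiles, computed once per tile and then cached.
    out = []
    for deta in (-1, 0, 1):
        for dphi in (-1, 0, 1):
            if deta == 0 and dphi == 0:
                continue
            jeta = ieta + deta
            if 0 <= jeta < N_ETA:  # rapidity does not wrap
                out.append((jeta, (iphi + dphi) % N_PHI))  # phi wraps
    return tuple(out)


@lru_cache(maxsize=None)
def right_neighbours(ieta, iphi):
    # Next tile in phi on the same row, plus three tiles on the next row.
    out = [(ieta, (iphi + 1) % N_PHI)]
    if ieta + 1 < N_ETA:
        out.extend((ieta + 1, (iphi + d) % N_PHI) for d in (-1, 0, 1))
    return tuple(out)


tiles = [(e, p) for e in range(N_ETA) for p in range(N_PHI)]
right_pairs = [frozenset((t, n)) for t in tiles for n in right_neighbours(*t)]
all_pairs = {frozenset((t, n)) for t in tiles for n in all_neighbours(*t)}
```

Iterating over the right-hand lists covers every adjacent tile pair once, while the full lists are what a single jet needs when it searches for its nearest neighbour.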

Some optimisations and sizehints are used to speed the code up.

Switched to plain JSON package as it's much simpler than JSON3.

Added tests against the FastJet outputs for this tiled algorithm and
for the N2Plain algorithm. An enum is used to control the switching
between algorithms.
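
Enum-based switching of this kind might look like the sketch below. The strategy names are the ones used in this PR, but the `Strategy` type, the `reconstruct` function, and the placeholder implementations are hypothetical, and the sketch is in Python, not Julia.

```python
from enum import Enum


class Strategy(Enum):
    N2Plain = "N2Plain"
    N2TiledSoATile = "N2TiledSoATile"
    N2TiledSoAGlobal = "N2TiledSoAGlobal"
    N2Tiled = "N2Tiled"


def _placeholder(name):
    # Stand-in for a real reconstruction routine; it only reports what ran.
    def run(particles):
        return f"{name}: {len(particles)} particles"
    return run


# Map each enum value to the function that implements it.
DISPATCH = {s: _placeholder(s.name) for s in Strategy}


def reconstruct(particles, strategy=Strategy.N2Tiled):
    # Select the implementation from the enum value.
    return DISPATCH[strategy](particles)


print(reconstruct([1, 2, 3]))
print(reconstruct([], Strategy.N2Plain))
```

Because every implementation sits behind the same entry point, the tests can run the same input through each strategy and compare against the FastJet reference output.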

This implementation remains quite slow, as it suffers from many
bookkeeping overheads.

This is an implementation of the N2Tiled algorithm that manages
the jets as a global SoA structure and keeps a linked list of
the jets in each tile with a first -> next/previous structure.
A Set() was also tried, but it was too slow.
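
The first -> next/previous structure can be sketched with index arrays in place of pointers. This is an illustrative Python version under assumed names (`TileLists`, `NONE`), not the code in this commit.

```python
NONE = -1  # sentinel meaning "no jet"; always checked before indexing


class TileLists:
    """Per-tile doubly linked lists stored as index arrays (illustrative)."""

    def __init__(self, n_tiles, n_jets):
        self.first = [NONE] * n_tiles   # head jet of each tile
        self.next = [NONE] * n_jets     # following jet in the same tile
        self.prev = [NONE] * n_jets     # preceding jet in the same tile
        self.tile_of = [NONE] * n_jets  # tile that currently holds each jet

    def insert(self, jet, tile):
        # Push the jet onto the front of its tile's list.
        head = self.first[tile]
        self.next[jet] = head
        self.prev[jet] = NONE
        if head != NONE:
            self.prev[head] = jet
        self.first[tile] = jet
        self.tile_of[jet] = tile

    def remove(self, jet):
        # Unlink in O(1) by reconnecting the two neighbours.
        p, n = self.prev[jet], self.next[jet]
        if p == NONE:
            self.first[self.tile_of[jet]] = n
        else:
            self.next[p] = n
        if n != NONE:
            self.prev[n] = p
        self.next[jet] = self.prev[jet] = self.tile_of[jet] = NONE

    def jets_in(self, tile):
        j = self.first[tile]
        while j != NONE:
            yield j
            j = self.next[j]


lists = TileLists(n_tiles=2, n_jets=4)
for jet, tile in ((0, 0), (1, 0), (2, 0), (3, 1)):
    lists.insert(jet, tile)
lists.remove(1)
```

Insertion and removal are constant time and allocate nothing, which is the usual reason to prefer this layout over a per-tile set.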

As jets are merged and finalised, the SoA is shuffled and shrunk
to keep it compact (which turns out to be quite expensive).

Searches are considerably optimised using LoopVectorization.

Finally, this is a lot faster than the per-tile SoA, but it's still
about 2x slower than the FastJet linked list approach. The slowdown
comes from the inherent cost of managing the SoA, which outweighs
the places where it gives a performance gain.

Better separation of algorithm-specific code from code shared by
more than one tiled algorithm.

Update names of algorithms to be more descriptive of their
implementation and data structures.

This is a reimplementation of the code developed by
Philippe Gras to test a Julia implementation of the FastJet
N2Tiled algorithm.

This implementation doesn't attempt any SoA, but it's extremely
lightweight in manipulations as jets are merged and finalised.

It runs as fast as FastJet or faster, mainly because the
LoopVectorization package speeds up the search for the
minimum dij.

Match the name FastJet uses for the strategy.
Although we keep the other implementations for now, the name of
our default and fastest algorithm should match FastJet's for
consistency.

As he originally implemented the current N2Tiled algorithm, inspired by
FastJet

We require a const global struct that only works from Julia 1.8 onwards.
Bump version to 0.2.0
@graeme-a-stewart
Member Author

Just accepting this one myself, as we need this code in the repo for the CHEP paper.

@graeme-a-stewart graeme-a-stewart merged commit 15bfd59 into JuliaHEP:main Sep 8, 2023