Improve performance of VLSV format in post-processing #19

Open

rjarvinen opened this issue Feb 22, 2016 · 14 comments

@rjarvinen
Member

Study whether the performance of VLSV can be improved for post-processing. Currently ~5-10 GB VLSV files are not feasible to analyze on laptop computers. This may not be due to a RAM limit but rather an issue with the performance of the VLSV reader and the VLSV VisIt plugin. For example, could the stored cells be sorted by a separate post-processing tool for faster access?

@iljah

iljah commented Feb 22, 2016

It would be nice to have some benchmarks. For example: how much memory is required to fetch all data in one cell? How much of the file has to be read to fetch all data from one cell? How does the CPU time required to fetch M variables from N cells scale?

In files written by dccrg, cells are not guaranteed to be in any particular order, but writing a post-processing tool that sorts the cells (and their data, for faster sequential access) by ID to make them faster to find seems almost trivial.
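
A minimal sketch of such a sorting tool, assuming the cell IDs and variable arrays have already been read into numpy arrays (the VLSV read/write calls themselves are omitted):

```python
import numpy as np

def sort_cells(cellids, variables):
    """Reorder cell data by ascending cell ID for fast lookup.

    cellids:   1-D array of cell IDs in file (arbitrary) order
    variables: dict mapping variable name -> array whose first axis
               matches cellids
    """
    order = np.argsort(cellids)
    sorted_ids = cellids[order]
    sorted_vars = {name: data[order] for name, data in variables.items()}
    return sorted_ids, sorted_vars

# Once sorted, a cell is found in O(log N) with a binary search instead
# of scanning every ID:
#   idx = np.searchsorted(sorted_ids, wanted_id)
```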

@galfthan
Member

In general VLSV writes data out so that the data from each process is in order: data from rank 0 comes first, then rank 1, and so on. With dynamic load balancing the data is not in any particular order when looking at the IDs of individual cells. This means that to read the data of a particular cell, one first needs to read in all cell IDs so that the cell's location can be found. The overhead when reading data from a single point is thus very large. For example, reading rho at one particular point in a Vlasiator simulation with 1000 files, each with 4000 x 2000 cells, means reading in 64 GB of data, while the actual rho data is only 8 kB in size.
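
For reference, the arithmetic behind those numbers (8-byte cell IDs, 8-byte rho values):

```python
files = 1000
cells_per_file = 4000 * 2000
id_bytes = files * cells_per_file * 8   # 64 000 000 000 B = 64 GB of cell IDs to scan
rho_bytes = files * 8                   # 8 kB of actual rho data (one cell per file)
```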

There are in general two different solutions:

  1. Sort cells while writing, or in post-processing. I think it would be best to do it while writing, to avoid an annoying post-processing step that is potentially very slow and requires buffer space. The all-to-all-like communication step, or a complex fileview, would not be free either.

  2. Add more metadata to reduce the amount of data that has to be read. If we, for example, wrote the bounding box of each process, one could read in the cell IDs of just a few processes to find the cell (see the sketch after this list).
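
As a sketch of option 2, assuming each writing process stored its bounding box as extra metadata (a hypothetical tag; current VLSV files do not carry this), the lookup could be:

```python
import numpy as np

def candidate_domains(point, bboxes):
    """Return indices of domains whose bounding box contains 'point'.

    bboxes: (ndomains, 6) array of [xmin, ymin, zmin, xmax, ymax, zmax],
            one row per writing process.
    """
    lo, hi = bboxes[:, :3], bboxes[:, 3:]
    inside = np.all((point >= lo) & (point <= hi), axis=1)
    return np.nonzero(inside)[0]

# Only the cell-ID lists of the returned domains need to be read and
# scanned, instead of the IDs of every domain in every file.
```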

I would probably start by testing what the performance penalty is when using a custom fileview to write the data in order.
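
For a regular mesh with no holes, each process can compute its target offsets directly from the global cell IDs, so an ordered write needs no communication at all. A rough mpi4py sketch (the single-variable case and the byte layout are simplifying assumptions, not the real VLSV format):

```python
from mpi4py import MPI
import numpy as np

def write_in_cellid_order(filename, cellids, rho, data_start):
    """Write this rank's rho values into the slots given by global cell IDs.

    Assumes 1-based contiguous cell IDs, one float64 per cell, and a known
    byte offset 'data_start' where the variable's array begins.
    """
    fh = MPI.File.Open(MPI.COMM_WORLD, filename,
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    for cid, value in zip(cellids, rho):
        offset = data_start + (int(cid) - 1) * 8   # 8 bytes per float64
        fh.Write_at(offset, np.array([value], dtype=np.float64))
    fh.Close()
```

A real implementation would aggregate these into an MPI indexed datatype and a collective write (the custom fileview mentioned above) rather than issuing one small write per cell.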

@sandroos
Contributor

@rjarvinen any chance you could toss me a sample VLSV file, via Dropbox for example, that is too slow to analyze? Also check out the pull request.

@sandroos
Contributor

@galfthan Yup, a bounding box per domain that limits the cell IDs would indeed make things faster. I have a much bigger update coming for vlsv where I might implement this.

@iljah

iljah commented Feb 23, 2016 via email

@sandroos
Contributor

@rjarvinen Actually, just giving the mesh dimensions (xcells, ycells, zcells), the number of domains (roughly), and more information about the kind of data analysis you're doing might be sufficient for me to check what I can do.

@sandroos
Contributor

> How would writing cell data in cell id order work in parallel? Seems like all-to-all would still be involved at least to find out who has which cells.

For AMR yes, but for regular meshes each process can calculate the correct offsets in the output file, assuming there are no holes in the mesh.

I'm not a big fan of the idea of sorting data in VLSV files; however, it would be possible to add indexing data as a post-processing step to speed up random accesses (a sketch follows).
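
A sketch of such an index, assuming fixed-size records for a variable and leaving the data itself untouched (names are illustrative, not part of the VLSV API):

```python
import numpy as np

def build_index(cellids, record_size, data_start):
    """Build a cellid -> byte-offset index without reordering the data.

    The result could be appended to the file (or stored alongside it)
    in a post-processing step.
    """
    offsets = data_start + np.arange(len(cellids), dtype=np.int64) * record_size
    order = np.argsort(cellids)
    return cellids[order], offsets[order]

def lookup(index_ids, index_offsets, wanted_id):
    """Binary-search the index for one cell's byte offset."""
    i = np.searchsorted(index_ids, wanted_id)
    if i == len(index_ids) or index_ids[i] != wanted_id:
        raise KeyError(wanted_id)
    return index_offsets[i]
```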

@rjarvinen
Member Author

I will prepare a benchmark soon.

@rjarvinen
Member Author

Here's a quick VLSV/VisIt plugin performance test with a nominal Venus run. Compared are a VTK file from the HYB simulation and a VLSV file from a Corsair/RHybrid run. Both files have the same amount of scalar and vector variables and the same grid size of 120x160x160 (±1 cell). The VTK file uses the STRUCTURED_POINTS grid structure.

Data files are available here (file sizes: VLSV 1.3 GB, VTK 610 MB):

https://dl.dropboxusercontent.com/u/8446786/vlsv_perf_test_data_files.zip

The comparison uses the attached VisIt Python script and a shell script to run it (provided that VisIt is installed). The script opens the VLSV/VTK file, creates plots of 6 different scalar variables, and exits. VLSV takes more than twice as long as VTK to complete the script.

I don't know whether the performance difference comes from the VLSV format itself, the grid type used in the VLSV file, or the plugin code. I haven't yet tested the pull request with the new optimizations for the UCD multimesh reader, so I don't know if it affects this test.
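
For orientation, the attached visit_plotter.py is essentially of this shape (a sketch reconstructed from the description above; variable and file names are placeholders):

```python
import sys

# Open the database, plot six scalar variables one after another, exit.
OpenDatabase("run.vlsv")                      # or the corresponding VTK file
for var in ["rho", "var2", "var3", "var4", "var5", "var6"]:
    AddPlot("Pseudocolor", var)
    DrawPlots()
    DeleteAllPlots()
CloseDatabase("run.vlsv")
sys.exit(0)
```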

```
./run_perf_test.sh
VLSV format:
Running: cli2.10.0 -nowin -s visit_plotter.py
Running: viewer2.10.0 -nowin -noint -host 127.0.0.1 -port 5600
Running: mdserver2.10.0 -host 127.0.0.1 -port 5600
Running: engine_ser2.10.0 -host 127.0.0.1 -port 5600

real    1m17.649s
user    0m0.812s
sys     0m0.230s
VTK format:
Running: cli2.10.0 -nowin -s visit_plotter.py
Running: viewer2.10.0 -nowin -noint -host 127.0.0.1 -port 5600
Running: mdserver2.10.0 -host 127.0.0.1 -port 5600
Running: engine_ser2.10.0 -host 127.0.0.1 -port 5600

real    0m29.312s
user    0m0.776s
sys     0m0.272s
```

vlsv_perf_test.zip

@sandroos
Contributor

@rjarvinen One additional question: how many domains (=MPI procs) do you have in a nominal Venus run?

Please test the version in pull request #18, as it may give a major performance boost in VisIt.

@rjarvinen
Member Author

720 PEs using 60 nodes on Voima. Thanks, I'll check that patch!

@sandroos
Contributor

VTK files are still faster than VLSV even with the pull request; the speed difference mainly comes from the mesh formats. A structured grid is much easier to generate than a mesh where the cells appear in random order.

I'll see if I can speed things up further, but in the meantime you can also do parallel visualization on Voima. I'm sure @ykempf can help you out if you aren't already using Voima for remote visualization.

@rjarvinen
Member Author

The performance difference seems to come from creating individual plots.

The OpenDatabase(db) command runs faster on VLSV (3 seconds) than on VTK (11 seconds). Plotting a single variable takes roughly the same amount of time for both formats (30 seconds). Additional plots increase the running time almost linearly for VLSV, but not considerably for VTK.

Maybe VTK does some buffering, which makes it faster to use once the file is opened.

@sandroos
Contributor

After checking memory usage with the resource monitor, it does indeed seem that the VTK plugin caches the whole file in memory, which is why changing variables is faster.

I suppose running an expression in VisIt on a VLSV file may be quite slow if it reads in variable data multiple times (although reading variables shouldn't be that slow), so optimizing this a bit might be a good idea. I'm not sure what the best way to do it is for multi-domain data, though, since there is no guarantee that the same MPI processes read the same domains every time.
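
One possible shape for that optimization, sketched here with a hypothetical read_fn standing in for the plugin's actual read routine:

```python
class CachingReader:
    """Memoize per-domain variable reads so an expression that touches the
    same variable several times only reads it from disk once.

    A production cache would also need invalidation, since VisIt may
    assign domains to different MPI processes between reads.
    """
    def __init__(self, read_fn):
        self._read = read_fn
        self._cache = {}

    def read_variable(self, domain, varname):
        key = (domain, varname)
        if key not in self._cache:
            self._cache[key] = self._read(domain, varname)
        return self._cache[key]
```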
