
Add PCA Dictionary Indexing #638

Open

ZacharyVarley wants to merge 6 commits into develop

Conversation


@ZacharyVarley commented May 13, 2023

Description of the change

Add fast PCA dictionary indexing

Progress of the PR

For reviewers

  • The PR title is short, concise, and will make sense 1 year later.
  • New functions are imported in corresponding __init__.py.
  • New features, API changes, and deprecations are mentioned in the unreleased
    section in CHANGELOG.rst.
  • New contributors are added to release.py, .zenodo.json and
    .all-contributorsrc with the table regenerated.

@hakonanes hakonanes added the enhancement New feature or request label May 13, 2023
@hakonanes hakonanes added this to the v0.9.0 milestone May 13, 2023
@hakonanes (Member)

Hi @ZacharyVarley,

thank you for opening this PR! I modified the tutorial you added to compare standard DI with PCA DI on my six-year-old laptop. PCA DI is about 2x faster than standard DI, although it uses much more memory, which I hope we can look at.

Please let me know when you'd like feedback on the PR!

@hakonanes (Member)

Please also let me know whether you have any questions regarding tests, docs, dependency control, etc. You might already be aware of our contributing guide.

@ZacharyVarley (Author) commented May 21, 2023

> PCA DI is about 2x faster than standard DI, although it uses much more memory, which I hope we can look at.

@hakonanes

The memory footprint is expected to be 4x that of holding the uint8 dictionary patterns, since the patterns must be cast to 32-bit floats to compute the covariance matrix with LAPACK for the subsequent eigenvector decomposition. I realize now that this can be avoided by chunking the dictionary covariance matrix calculation. Furthermore, this approach is most beneficial when the dictionary and/or pattern dimensions are large (roughly > 100,000 patterns and > 60x60 pixels), which might be why only a 2x speedup was observed. I am hoping to have time to patch up everything in the PR this week.
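A minimal NumPy sketch of what chunked covariance accumulation could look like (the function name, chunk size, and plain-NumPy approach are assumptions for illustration, not the PR's actual implementation):

```python
import numpy as np

def chunked_covariance(patterns_u8, chunk_size=10_000):
    """Accumulate the (n_pixels, n_pixels) covariance matrix of a
    (n_patterns, n_pixels) uint8 dictionary without casting the whole
    dictionary to float32 at once."""
    n, d = patterns_u8.shape
    mean = patterns_u8.mean(axis=0).astype(np.float32)
    cov = np.zeros((d, d), dtype=np.float64)
    for start in range(0, n, chunk_size):
        # Only the current chunk is cast to float, keeping peak memory low
        chunk = patterns_u8[start:start + chunk_size].astype(np.float32) - mean
        cov += chunk.T @ chunk
    return cov / (n - 1)

# Eigendecomposition of the symmetric covariance matrix via LAPACK:
# eigvals, eigvecs = np.linalg.eigh(chunked_covariance(dictionary))
```

With this, peak memory is roughly one float32 chunk plus the (n_pixels, n_pixels) covariance matrix, instead of a full float32 copy of the dictionary.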

@hakonanes (Member)

> I realize now that this can be avoided by chunking the dictionary covariance matrix calculation.

I haven't looked at the implementation in detail, but assumed the dictionary had to be in memory for PCA DI. If not, that's great! You should be able to map the calculation across chunks of the dictionary using dask.array.map_blocks(). Dask gives the user the option to control the number of workers and the amount of memory to use for this parallelized computation outside of kikuchipy, which is one of its greatest benefits (kikuchipy doesn't have to consider these things).
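A rough sketch of how projecting dictionary chunks onto precomputed PCA components might look with dask.array.map_blocks() (all names, shapes, and the random inputs here are hypothetical, not kikuchipy API):

```python
import dask.array as da
import numpy as np

# Hypothetical chunked uint8 dictionary of shape (n_patterns, n_pixels)
dictionary = da.random.randint(
    0, 256, size=(100_000, 60 * 60), chunks=(10_000, 60 * 60)
).astype(np.uint8)
# Hypothetical precomputed PCA components of shape (n_pixels, n_components)
components = np.random.rand(60 * 60, 100).astype(np.float32)
mean = dictionary.mean(axis=0).compute().astype(np.float32)

def project_chunk(chunk, components, mean):
    # Cast only this chunk to float32 before projecting onto the components
    return (chunk.astype(np.float32) - mean) @ components

# One task per dictionary chunk; output chunks are (rows_per_chunk, n_components)
scores = dictionary.map_blocks(
    project_chunk,
    components,
    mean,
    dtype=np.float32,
    chunks=(dictionary.chunks[0], (components.shape[1],)),
)
scores_np = scores.compute()
```

Each chunk is cast and projected independently, so the scheduler, worker count, and memory budget stay under the user's control, as described above.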

From my experience, it is impossible to find a single optimum between speed and memory use of DI across different platforms and different dataset and dictionary sizes. We therefore give the user the option to control how many experimental patterns (n_per_iteration) and dictionary patterns (the chunk size of the dictionary if it is a Dask array, controlled by chunk_shape=n in calls to get_patterns()) are compared in each iteration. It would be good if similar options could be given to the user for PCA DI as well. Larger chunks mean faster indexing but higher memory use; see the sketch below.
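For reference, a hedged sketch of how these existing knobs appear in the current DI workflow (the master pattern, rotations, detector, and experimental signal s are assumed to exist from earlier setup steps, and the parameter values are placeholders):

```python
# chunk_shape controls how many dictionary patterns are compared per iteration
dictionary = master_pattern.get_patterns(
    rotations, detector, energy=20, chunk_shape=2000
)
# n_per_iteration controls how many experimental patterns are compared per iteration
xmap = s.dictionary_indexing(
    dictionary,
    metric="ncc",
    n_per_iteration=500,
)
```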

@hakonanes hakonanes removed this from the v0.9.0 milestone Aug 9, 2023
