Sparse pattern is not guaranteed to be full rank #16

Open
jrapin opened this issue Nov 17, 2021 · 0 comments
jrapin commented Nov 17, 2021

Hi,
First, thanks for this code! ;)

From my understanding, the sparsity pattern for the blocks is fully random. This is concerning, since it leads to non-full-rank matrices as sparsity increases. See the figure below, which BlockSparseLinear generated for a 256x256 matrix with 25% density and 32x32 blocks:
[Figure: block_mask_25pc_256 — the generated block mask]

If my computation is correct, at this size, block size, and sparsity, only around 20% of matrices will be full rank (for another example, only 10% of 1024x1024 matrices at 10% density will be full rank) and, if I am not completely mistaken (I might well be), 0% of the matrices created by the README example self.fc = BlockSparseLinear(1024, 256, density=0.1) will be.
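
For reference, here is a minimal sketch (not library code) that estimates how often a random block pattern yields a full-rank matrix. It assumes blocks are chosen uniformly at random and filled with Gaussian values, which should match the library's random pattern in spirit:

```python
import torch

def random_block_matrix(size=256, block=32, density=0.25):
    # build a dense matrix whose nonzero blocks are chosen uniformly at random
    nb = size // block
    n_blocks = nb * nb
    n_keep = int(round(density * n_blocks))
    keep = torch.randperm(n_blocks)[:n_keep]
    mat = torch.zeros(size, size)
    for idx in keep.tolist():
        i, j = divmod(idx, nb)
        mat[i * block:(i + 1) * block, j * block:(j + 1) * block] = torch.randn(block, block)
    return mat

trials = 200
full_rank = sum(
    torch.linalg.matrix_rank(random_block_matrix()).item() == 256
    for _ in range(trials)
)
print(f"full-rank fraction: {full_rank / trials:.2f}")
```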

I don't have any good option to propose though, sorry. I see only two complementary directions:

  • preselecting some of the tiles to make sure all input data is used and all output data is filled (e.g. a diagonal pattern for a square matrix),
  • adding an API for users to provide the sparsity pattern they want if they need more flexibility (e.g. BlockSparseLinear.from_pattern(pattern: torch.Tensor, block_shape: Tuple[int, int])), but then it's no longer a "drop-in replacement" for a linear layer. (A rough sketch of both ideas follows this list.)
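
Here is a minimal sketch (not library code) of what a pattern combining both ideas could look like: preselect the block "diagonal" so every block row and column is covered, then spend the remaining density budget on random blocks. Note that BlockSparseLinear.from_pattern is only the API suggested above and does not exist in the library; the final line is purely hypothetical usage.

```python
import torch

def diagonal_first_pattern(out_blocks, in_blocks, density):
    # boolean mask of kept blocks: True means the block is materialized
    n_keep = int(round(density * out_blocks * in_blocks))
    pattern = torch.zeros(out_blocks, in_blocks, dtype=torch.bool)
    # preselect a wrapped diagonal so every block row and column has at least
    # one block (for very low densities this can exceed the requested budget)
    for i in range(max(out_blocks, in_blocks)):
        pattern[i % out_blocks, i % in_blocks] = True
    # fill any remaining budget with random off-diagonal blocks
    remaining = (~pattern).nonzero()
    extra = n_keep - int(pattern.sum())
    if extra > 0:
        pick = remaining[torch.randperm(len(remaining))[:extra]]
        pattern[pick[:, 0], pick[:, 1]] = True
    return pattern

pattern = diagonal_first_pattern(256 // 32, 1024 // 32, density=0.1)
# hypothetical usage of the suggested API:
# layer = BlockSparseLinear.from_pattern(pattern, block_shape=(32, 32))
```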