Feature request: dense-sparse-dense bilinear #137
Comments
That's interesting. I think one current limitation is that
Hm right — I think all such bilinearities can be expressed as a linear operation applied to the outer product of x and y, which in turn is just a matmul between a flattened version of A and the outer product. As a result, I think the sparse tensor A can remain a 2D matrix, but you'd need the compute kernel to generate the appropriate elements of the outer product on-the-fly.

A quick look at the CPU kernel (https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cpu/spmm_cpu.cpp#L84) makes it seem like you're already doing random access on the dense matrix, so extending that to random access on two dense tensors doesn't seem like a big deal. OTOH, I don't know any CUDA and have no idea what that looks like in the CUDA kernels, so it may be much more complicated than for the CPU kernel.

I'm trying a lot of very different approaches to accelerating this strange computation — my A is very sparse but also has a lot of repeated structure, so it's not obvious that explicitly storing it in COO format is actually the best way to go — but I can't rule anything out either 😄
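For concreteness, a small dense sketch of that equivalence (all shapes here are arbitrary, just for illustration):

```python
import torch

# Arbitrary shapes, purely for illustration.
outdim, xdim, ydim = 4, 5, 6
A = torch.randn(outdim, xdim, ydim)
x = torch.randn(xdim)
y = torch.randn(ydim)

# Bilinear form: out[o] = sum_{i,j} A[o, i, j] * x[i] * y[j]
reference = torch.einsum('oij,i,j->o', A, x, y)

# Same result as a plain 2D matmul of a flattened A with the flattened
# outer product of x and y, so A can stay a 2D (possibly sparse) matrix
# of shape (outdim, xdim * ydim).
A_flat = A.reshape(outdim, xdim * ydim)
outer = torch.outer(x, y).reshape(-1)
via_outer = A_flat @ outer

assert torch.allclose(reference, via_outer, atol=1e-5)
```

A fused kernel would then compute `x[i] * y[j]` at each nonzero of A instead of reading a single dense entry.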
General bilinear operations take the form
x^T @ A @ y
where A is of shape (outdim, xdim, ydim) (see https://pytorch.org/docs/stable/generated/torch.nn.Bilinear.html).

Currently, as far as I can tell, implementing this for a sparse A using pytorch_sparse is only possible as a sparse-dense matmul of A with the outer product of x and y. This outer product is a huge memory cost and, if A is sparse, contains many unnecessary computations.

It seems to me like supporting this computation directly would not require dramatic changes to the existing CPU and CUDA matmul code — seemingly just a second outer loop over x in addition to y. But I'm not really familiar enough with the code to see if that's true.
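For reference, a minimal sketch of the current workaround described above, assuming A is stored as a 2D sparse matrix of shape (outdim, xdim * ydim) in COO form (indices and values below are made up):

```python
import torch
from torch_sparse import spmm

# Made-up shapes and sparsity pattern, purely for illustration.
outdim, xdim, ydim = 8, 100, 100
x = torch.randn(xdim)
y = torch.randn(ydim)

# A as a sparse (outdim, xdim * ydim) COO matrix with 3 nonzeros.
index = torch.tensor([[0,    3,    7],
                      [5, 1234, 9999]])   # [2, nnz]
value = torch.randn(index.size(1))        # [nnz]

# Workaround: materialize the full outer product (xdim * ydim entries),
# even though only nnz of them are ever read by the sparse matmul.
outer = torch.outer(x, y).reshape(-1, 1)                          # [xdim * ydim, 1]
out = spmm(index, value, outdim, xdim * ydim, outer).squeeze(1)   # [outdim]
```

A fused version would take x and y directly and form x[i] * y[j] inside the kernel, skipping the xdim * ydim intermediate entirely.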
Ideally, like existing sparse-dense multiplication, it would support a batch dimension in x and y.
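As a dense reference for those batched semantics (illustrative only; a real kernel would of course keep A sparse):

```python
import torch

# Illustrative shapes only.
batch, outdim, xdim, ydim = 32, 8, 100, 100
A = torch.randn(outdim, xdim, ydim)   # dense here only to spell out the semantics
x = torch.randn(batch, xdim)
y = torch.randn(batch, ydim)

# Desired output: out[b, o] = sum_{i,j} A[o, i, j] * x[b, i] * y[b, j]
out = torch.einsum('oij,bi,bj->bo', A, x, y)   # [batch, outdim]
```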
I would be happy to help with implementing this if you think it is doable and something you'd be interested in having in the library!