Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Everything is unsafe; use slice instead? #15

Open
daniel-vainsencher opened this issue Jul 29, 2015 · 11 comments
Open

Everything is unsafe; use slice instead? #15

daniel-vainsencher opened this issue Jul 29, 2015 · 11 comments

Comments

@daniel-vainsencher
Copy link
Contributor

Hi, at its core, the implementation of matrix is using a ptr, which then requires unsafe even to do a read only access of data (see, for example, the implementation of trace). I'm a rust newbie, but I think that unsafe should be reserved for much more dire circumstances; is there any reason not to implement "ptr" as a slice instead, and then just access its elements? slice [1] seems to be the idiomatic way to access contiguous block of memory.

[1] https://doc.rust-lang.org/std/slice/

@shailesh1729
Copy link
Contributor

Hi,

Please spend some time looking at the implementation of std::vec https://doc.rust-lang.org/src/collections/vec.rs.html which is part of standard library. In order to have efficient vector implementation, vector works with the raw pointer. The std::slice

The scrirust matrix library has been built similarly. Note that, all the unsafe operations are restricted to matrix library itself. They are getting thoroughly tested via unit tests. It helps the higher layers (linear algebra, statistics etc.) to work on the safe API provided by the matrix structure.

Efficiency is important in the implementation of core matrix operations. Any kind of range checking etc. would be a no no.

@daniel-vainsencher
Copy link
Contributor Author

I agree there is no point in writing an inefficient linear algebra library. On the other hand, anything that is exported as safe should really try to be safe. For example: the get function is unsafe (bounds checked only in debug) and exported as safe. How about instead having two versions: get uses bounds checked access, and unsafe_get avoids the bounds check, but is unsafe to use. The first is still what I would use most of the time, because accessing a single element is not what usually has to be as fast as possible, and I at least am not absolutely careful about all my indices. The second is what I would use in contexts where I take other precautions for correct indexing and speed is a concern.

I think many functions could be implemented efficiently without bounds checking, by using a few core iterator implementations that void bounds checking. For example trace: implement carefully an iterator over the main diagonal, using unsafe and exposed as safe, because under the matrix invariants this is safe. Then implement trace as just a sum over this iterator, which is obviously correct. The iterator can be reused. This should make it easier to avoid, for example, the current access out of bounds bug that currently happens when computing the trace of a zero-sized matrix (unless I missed something).

@shailesh1729
Copy link
Contributor

These changes are doable. If you are making some of these changes, I will
happily merge. Slightly busy with other things so not actively working on
SciRust right now. Will fix in due course.

On Fri, Jul 31, 2015 at 9:27 PM, Daniel Vainsencher <
[email protected]> wrote:

I agree there is no point in writing an inefficient linear algebra
library. On the other hand, anything that is exported as safe should really
try to be safe. For example: the get function is unsafe (bounds checked
only in debug) and exported as safe. How about instead having two versions:
get uses bounds checked access, and unsafe_get avoids the bounds check, but
is unsafe to use. The first is still what I would use most of the time,
because accessing a single element is not what usually has to be as fast as
possible, and I at least am not absolutely careful about all my indices.
The second is what I would use in contexts where I take other precautions
for correct indexing and speed is a concern.

I think many functions could be implemented efficiently without bounds
checking, by using a few core iterator implementations that void bounds
checking. For example trace: implement carefully an iterator over the main
diagonal, using unsafe and exposed as safe, because under the matrix
invariants this is safe. Then implement trace as just a sum over this
iterator, which is obviously correct. The iterator can be reused. This
should make it easier to avoid, for example, the current access out of
bounds bug that currently happens when computing the trace of a zero-sized
matrix (unless I missed something).


Reply to this email directly or view it on GitHub
#15 (comment).

@daniel-vainsencher
Copy link
Contributor Author

I'll try to think of a way of doing just a bit at a time, changing the representation and then fixing everything is too much for me now as well. Maybe the right way is to first create the iterators, the start converting methods to use them, then change the representation last. Anyway, I'll send PRs as I get to things.

@daniel-vainsencher
Copy link
Contributor Author

So we're making progress on this! you've created the safe and unsafe getters, and in #16 I've released the row and column iterators from some constraints on T, added the diagonal iterator, and used it in a few places. One thing that bothers me is that the iterators currently receive a copy of the pointer to the data, which I think completely avoids borrow checks. In other words, I don't think scirust currently prevents reading a matrix with an iterator while simultanuously modifying it directly (but I haven't yet tried it).

Looking back at Vec, it uses a Unique, rather than a ptr, but I have to admit I'm not well versed in the nuances. Ideally, we should be able to borrow a buffer, build a matrix around them, and operate safely on it. But I don't yet know how to do that.

@shailesh1729
Copy link
Contributor

That advanced rust book seems to have detailed discussion on implementation
of Vec struct https://doc.rust-lang.org/nightly/adv-book/. A year back Vec
was using ptr directly. That's when I had started the work on SciRust. I
think over time they shifted to Unique. I didn't really follow up.

On Tue, Aug 4, 2015 at 7:47 PM, Daniel Vainsencher <[email protected]

wrote:

So we're making progress on this! you've created the safe and unsafe
getters, and in #16 #16 I've
released the row and column iterators from some constraints on T, added the
diagonal iterator, and used it in a few places. One thing that bothers me
is that the iterators currently receive a copy of the pointer to the data,
which I think completely avoids borrow checks. In other words, I don't
think scirust currently prevents reading a matrix with an iterator while
simultanuously modifying it directly (but I haven't yet tried it).

Looking back at Vec, it uses a Unique, rather than a ptr, but I have to
admit I'm not well versed in the nuances. Ideally, we should be able to
borrow a buffer, build a matrix around them, and operate safely on it. But
I don't yet know how to do that.


Reply to this email directly or view it on GitHub
#15 (comment).

@shailesh1729
Copy link
Contributor

I stlil have problem with the get() get_unchecked() design. One of my core
design concerns was that SciRust API should be simple enough so that
writing typical scientific computing code should look like as simple as
MATLAB or Python for people who are just starting with it, while it should
provide the flexibility of a systems language for advanced users.

The revised API now forces one to write get(r, c).unwrap() for
typical situations. Alternative is to use panic() inside get()
something which I have done in many functions. This is a core API design
issue on which I am not yet settled.

The return value choices are:

  • Option
  • T and use debug asserts
  • T and use panic!
  • Result<T, Error>

You might see different choices being made for different functions in the
API.

On Tue, Aug 4, 2015 at 7:59 PM, Shailesh Kumar [email protected]
wrote:

That advanced rust book seems to have detailed discussion on
implementation of Vec struct https://doc.rust-lang.org/nightly/adv-book/.
A year back Vec was using ptr directly. That's when I had started the work
on SciRust. I think over time they shifted to Unique. I didn't really
follow up.

On Tue, Aug 4, 2015 at 7:47 PM, Daniel Vainsencher <
[email protected]> wrote:

So we're making progress on this! you've created the safe and unsafe
getters, and in #16 #16 I've
released the row and column iterators from some constraints on T, added the
diagonal iterator, and used it in a few places. One thing that bothers me
is that the iterators currently receive a copy of the pointer to the data,
which I think completely avoids borrow checks. In other words, I don't
think scirust currently prevents reading a matrix with an iterator while
simultanuously modifying it directly (but I haven't yet tried it).

Looking back at Vec, it uses a Unique, rather than a ptr, but I have to
admit I'm not well versed in the nuances. Ideally, we should be able to
borrow a buffer, build a matrix around them, and operate safely on it. But
I don't yet know how to do that.


Reply to this email directly or view it on GitHub
#15 (comment).

@qinwf
Copy link

qinwf commented Aug 4, 2015

Unsafe can be safe and efficient, unsafe is not evil.

@shailesh1729
Copy link
Contributor

@qinwf agreed. The idea I think is to see what parts of the code can be rewritten using standard Safe Rust without losing efficiency.

@daniel-vainsencher
Copy link
Contributor Author

I agree that get has to be nice to use. I think T and panic! is by far the
best option. Both Option and Result communicate that the alternative answer
is acceptable and happens, which is not the case in this case. The
alternative means out of bounds, should always indicate a bug, hence should
always halt the program and give maximal help in fixing the bug.

Hi Qin :)

Nobody said that unsafe per se is evil; but a significant reason to use
Rust is the guarantees it makes possible. A library that throws those away
by, for example by exposing unsafe interfaces as safe, is simply less
useful.

I use Python every day, and I've used Matlab on serious projects. I believe
Rust with nice libraries has the potential to be much much nicer than
either, though Python is pretty darn nice, and Matlab has a nice
environment. But Matlab is not friendly for maintaining and reusing
libraries, because the abstractions are too weak and leaky, and when
performance becomes important, Python degrades gracefully using Cython...
but it degrades. If SciRust eventually provides Rust with the equivalent of
Numpy's ndarray and scipy's matrix, except with all the safety and
efficiency that Rust makes possible, it would become a wonderful
environment for algorithm development.

Back to concrete stuff: I am not an expert in rust by a long shot. I think
that this particular area of the best methods to combine efficiency and
safety are still in flux, but a lot has changed for the better in the last
year. If you guys are willing, I think it is worth keeping open eyes for
the best design.

On Tue, Aug 4, 2015 at 11:59 AM, Shailesh Kumar [email protected]
wrote:

@qinwf https://github.com/qinwf agreed. The idea I think is to see what
parts of the code can be rewritten using standard Safe Rust without losing
efficiency.


Reply to this email directly or view it on GitHub
#15 (comment).

Daniel Vainsencher

@shailesh1729
Copy link
Contributor

Here are a couple of interesting iterator ideas.

a) A column wise cell iterator. This iterates all the entries in the matrix column wise. Since the matrix is stored in column major fashion, hence this is expected to run fast. Although already there in CellIterator but mentioning here to compare with the next one.
b) A row wise cell iterator. A naive implementation would run slow since matrix is stored in column major fashion. But, the iterator can probably hold a temporary transpose in memory. Now, it doesn't need to compute the transpose in advance. If we look at the block based transpose implementation (in matrix_transpose.rs), the iterator could just create a black matrix in advance and fill in the blocks on need basis during iteration. Typical Haskell style lazy computation.
c) A zig zag cell iterator. This goes like a_11, a_12, a_21, a_13 a_22, a_31 and so on. There are several applications of such an iterator. e.g. zig zag scanning module in JPEG / MPEG image compression.

In all these cases, the iterator could be created for any submatrix of the original matrix. So the iterator constructor will take (row, column, num_rows, num_columns) as arguments to identify the submatrix on which iteration would happen.

One simple application of (a) and (b) is in are_transpose function in matrix_transpose.rs.

Do these sound useful?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants