
add benchmarks #17

Open
ev-br opened this issue Mar 16, 2016 · 17 comments

ev-br (Owner) commented Mar 16, 2016

and actually compare to, say, dok_matrix.

pv commented Mar 28, 2016

Probably best to compare against pysparse.ll_mat

pv commented Mar 28, 2016

Also, memory usage would be important to understand and document

ev-br (Owner, Author) commented Mar 28, 2016

Very crude for now, just %timeit and %memit, using the example from the pysparse docs:
https://github.com/ev-br/sparr/blob/master/benchmarks.ipynb

Basically, this is now about 2x slower than ll_mat for single-element __setitem__, plus the Python looping overhead. The memory consumption is not great, but not terrible either, at least at this problem size.
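
For reference, a minimal sketch of what each single-element assignment amounts to on a std::map backend (the (row, col) pair key is an assumption; sparr's actual layout may differ):

```cpp
#include <map>
#include <utility>

// Hypothetical sketch of a map-backed single-element setitem.
// Each assignment is one O(log nnz) red-black tree lookup/insert,
// which is the operation the %timeit loop exercises per element.
using Key = std::pair<long, long>;   // (row, col) key -- an assumption
std::map<Key, double> data;

void set_item(long i, long j, double v) {
    data[Key{i, j}] = v;   // tree search, plus a node allocation on a miss
}
```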

pv commented Mar 28, 2016

Also see scipy/scipy#6004

pv commented Mar 28, 2016

The memory usage may be a non-trivial disadvantage for std::map
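
A rough sketch of why: each map entry lives in its own heap-allocated tree node. The exact layout is implementation-defined, so the numbers below are an estimate for a typical 64-bit implementation, not a guarantee:

```cpp
#include <cstdio>
#include <utility>

// Approximate layout of one std::map<std::pair<long, long>, double> node.
struct rb_node_estimate {
    void *parent, *left, *right;   // 3 tree pointers: 24 bytes
    int color;                     // 4 bytes + 4 bytes padding
    std::pair<long, long> key;     // 16 bytes
    double value;                  // 8 bytes
};  // ~56 bytes per stored nonzero, vs 8 bytes for the value itself

int main() {
    std::printf("estimated node size: %zu bytes\n", sizeof(rb_node_estimate));
}
```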

ev-br (Owner, Author) commented Mar 28, 2016

It could be, yes. Like I was saying over at the scipy issue, I wonder what the deal is with that Wikipedia link about map: it's being invoked by both sides of the argument.

ev-br (Owner, Author) commented Mar 28, 2016

On a more serious note, there's no question that a sorted vector of sorted vectors wins if there's an a priori idea of the number of rows/columns. Otherwise it's either reallocation, or back to essentially the same memory footprint as std::map, no?

maciejkula commented

Vec of vecs has a per-row overhead, whereas std::map has a per-entry overhead. This matters if you have a lot of entries.
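
In code, the tradeoff looks roughly like this (a sketch of the two layouts, not the scipy/scipy#6004 implementation):

```cpp
#include <map>
#include <utility>
#include <vector>

// LIL-style: one sorted (col, value) vector per row. The fixed cost is
// sizeof(std::vector) -- typically 24 bytes -- per ROW, however many
// entries the row holds.
using Row = std::vector<std::pair<long, double>>;
using LilStorage = std::vector<Row>;

// Map-style: one heap-allocated tree node per ENTRY, so the overhead
// (pointers, color, padding) scales with the number of nonzeros.
using MapStorage = std::map<std::pair<long, long>, double>;
```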

ev-br (Owner, Author) commented Mar 28, 2016

But you have to reallocate vectors if there's an insertion into a row which was previously empty and there were non-empty rows on both sides of it?

maciejkula commented

In scipy/scipy#6004 I pre-allocate all the row vectors, so I pay the per-vector overhead up-front.
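
That is, something along these lines (a sketch of the idea with hypothetical names, not the actual PR code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<std::pair<long, double>>;

// Pay the per-row overhead once, at construction time: every row vector
// exists (empty) from the start, so a first insertion into any row never
// has to restructure the outer container.
struct PreallocLil {
    std::vector<Row> rows;
    explicit PreallocLil(std::size_t nrows) : rows(nrows) {}
};
```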

ev-br (Owner, Author) commented Mar 28, 2016

Makes sense --- so you require an a priori estimate of the number of rows?

maciejkula commented

I just realised you've been working on this recently; apologies for stepping on your toes! I scanned the scipy PRs so as to not duplicate work, but yours was separate :(

maciejkula commented

Yes. This can be relaxed with some work.

ev-br (Owner, Author) commented Mar 28, 2016

Re: toes: No problem whatsoever! Hope you don't feel I'm stepping on yours.

ev-br (Owner, Author) commented Mar 28, 2016

In your solution with preallocation: you can probably just have a second vector with row indices. The overhead is at most nrows, but there's no reallocation at all.
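
Something like the following sketch (hypothetical names; a binary search over the row-index vector locates a row):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Row = std::vector<std::pair<long, double>>;

// Row-directory idea: store only the non-empty rows, plus a parallel
// sorted vector of their indices. The directory is at most nrows long,
// and rows that stay empty cost nothing.
struct RowDirLil {
    std::vector<long> row_index;   // sorted indices of non-empty rows
    std::vector<Row>  rows;        // rows[k] belongs to row_index[k]

    Row& get_row(long i) {
        auto it = std::lower_bound(row_index.begin(), row_index.end(), i);
        auto k = it - row_index.begin();
        if (it == row_index.end() || *it != i) {   // first touch of row i
            row_index.insert(it, i);
            rows.insert(rows.begin() + k, Row{});
        }
        return rows[k];
    }
};
```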

maciejkula commented

Yep, or have a resize call that allocates new vectors when necessary.
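
For the resize variant, a sketch (hypothetical; grows the outer vector lazily instead of preallocating it):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<std::pair<long, double>>;

// Grow the outer vector only when a row past the current bound is
// first touched; untouched trailing rows are never allocated.
struct GrowableLil {
    std::vector<Row> rows;
    Row& get_row(std::size_t i) {
        if (i >= rows.size())
            rows.resize(i + 1);   // creates the intervening empty rows
        return rows[i];
    }
};
```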

ev-br (Owner, Author) commented Apr 12, 2016

See #40 (comment) for some numbers on constructing a Poisson 2D matrix from the pysparse docs.

I've no idea why the memory benchmarks report all zeros. From peakmem, fast_lil_mat actually looks good :-). Timewise, ll_mat wins hands down.
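
For context, the workload here is the 2D Poisson matrix from the pysparse docs, i.e. the 5-point finite-difference stencil on an n x n grid. A self-contained sketch of that fill pattern against a map-backed matrix (the docs' exact loop may differ):

```cpp
#include <map>
#include <utility>

using Key = std::pair<long, long>;

// 5-point Poisson stencil: 4 on the diagonal, -1 for each grid neighbour.
void poisson2d(std::map<Key, double>& A, long n) {
    for (long i = 0; i < n; ++i)
        for (long j = 0; j < n; ++j) {
            long k = i * n + j;                    // grid point -> matrix row
            A[{k, k}] = 4.0;
            if (i > 0)     A[{k, k - n}] = -1.0;   // neighbour one row down
            if (i < n - 1) A[{k, k + n}] = -1.0;   // neighbour one row up
            if (j > 0)     A[{k, k - 1}] = -1.0;   // left neighbour
            if (j < n - 1) A[{k, k + 1}] = -1.0;   // right neighbour
        }
}
```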
