avoid dense vector operations in local solver #29

Open
martinjaggi opened this issue Nov 12, 2015 · 7 comments
@martinjaggi
Contributor

For good performance on sparse data, we shouldn't use any dense vector operations in a single coordinate update. Such an update should only cost on the order of the number of non-zeros of the feature vector returned by the oracle, which can be very sparse.

For example, we should remove the dense temp variables here:
https://github.com/dalab/dissolve-struct/blob/master/dissolve-struct-lib/src/main/scala/ch/ethz/dalab/dissolve/optimization/DBCFWSolverTuned.scala#L961

We can check how that affects performance on a large, sparse binarySVM dataset.
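As a rough sketch of what such an O(nnz) update could look like with Breeze (the names here are made up for illustration, not the actual DBCFWSolverTuned code):

```scala
import breeze.linalg.{DenseVector, SparseVector}

object SparseUpdateSketch {
  // Sketch only: update the dense weight vector w by a step gamma along the
  // sparse feature vector psi returned by the oracle. Cost is O(nnz(psi)),
  // independent of the dimension of w, and no temporary vector is allocated.
  def sparseCoordinateUpdate(w: DenseVector[Double],
                             psi: SparseVector[Double],
                             gamma: Double): Unit = {
    psi.activeIterator.foreach { case (idx, value) =>
      w(idx) += gamma * value // touch only the stored (non-zero) entries of psi
    }
  }
}
```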

@jderiu

jderiu commented Nov 12, 2015

I think the problem is not the performance of the operation itself, but rather the fact that a + b returns a new vector. The old weight vector object is then replaced by the new one, which over time triggers garbage collection.

I replaced updateWeights in the StructSVMModel class with a method that adds a vector to the weights without creating a new vector object. It seems to execute faster.

Before the change, I noticed that the localModel.updateWeights(tempWeights2) operation took 16ms every 5 or 6 iterations, which indicated that garbage collection was involved.

With the new methods I created (one for addition and one for subtraction), this part of the code takes 0ms.

I would like to test my methods some more; if they seem OK, I could do a pull request sometime tomorrow.
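A minimal sketch of the kind of in-place update described above (the class and method names are illustrative, not the actual StructSVMModel API):

```scala
import breeze.linalg.DenseVector

// Sketch: instead of computing w + v and replacing the stored weight vector
// (which allocates a new DenseVector on every update), mutate the existing
// vector so that the update allocates nothing.
class LocalModel(val weights: DenseVector[Double]) {
  def addToWeights(v: DenseVector[Double]): Unit = { weights += v }        // in place
  def subtractFromWeights(v: DenseVector[Double]): Unit = { weights -= v } // in place
}
```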

@martinjaggi
Contributor Author

Thanks!

Can you check how it affects performance on the binarySVM example? The RCV1 dataset would be recommended for this check, since it is very sparse.


@jderiu

jderiu commented Nov 13, 2015

I checked the performance on the RCV1 dataset. The old implementation takes approx. 1200ms per round; the new implementation takes only 750ms per round.

What is striking, however, is that each iteration in the mapper takes longer than the previous one: the first iteration takes 1ms and the last iteration (after passing through all 16k datapoints) takes 500ms.
I think this is the same kind of problem, since each vector addition and subtraction creates new objects that have to be garbage collected. In these cases, however, I don't quite know how to adapt the code yet. Maybe it would make sense to restructure the code to avoid creating new objects.
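A rough sketch of what that could look like, assuming the inner loop currently builds expressions of the form w - gamma * psiOld + gamma * psiNew, and that Breeze's axpy can add a scaled SparseVector into a DenseVector in place (the names are illustrative):

```scala
import breeze.linalg.{axpy, DenseVector, SparseVector}

object InPlaceStepSketch {
  // Instead of an expression like
  //   w = w - (gamma * psiOld) + (gamma * psiNew)   // allocates dense temporaries
  // apply the two scaled additions in place; with sparse psi vectors this only
  // walks their stored entries and allocates nothing in the inner loop.
  def inPlaceStep(w: DenseVector[Double],
                  psiOld: SparseVector[Double],
                  psiNew: SparseVector[Double],
                  gamma: Double): Unit = {
    axpy(-gamma, psiOld, w) // w -= gamma * psiOld
    axpy(gamma, psiNew, w)  // w += gamma * psiNew
  }
}
```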

@martinjaggi
Contributor Author

Yes, one has to be careful about what kind of add operation actually results when adding a sparse vector (the feature vector) to a dense one (w).
In your experiments, did you keep w dense (and w_i sparse)?
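To make the distinction concrete, a toy sketch (not the solver code; it assumes Breeze's mixed dense/sparse operators):

```scala
import breeze.linalg.{DenseVector, SparseVector}

object DenseVsSparseAdd extends App {
  val w  = DenseVector.zeros[Double](200000)           // dense weight vector
  val wi = SparseVector(200000)((3, 1.5), (42, -0.5))  // sparse feature vector

  val fresh = w + wi // allocates a brand-new dense vector: O(dim) work plus GC pressure
  w += wi            // in-place update: only the stored entries of wi are visited
}
```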

@jderiu

jderiu commented Nov 13, 2015

Yes, w is a dense vector and w_i is sparse.

The increase in duration per iteration was due to a small bug I introduced.
After rerunning the experiments, the old implementation, which replaces the (dense) weight vector with a new one, takes 1200ms per round.
Looking at the mapper iterations, one can see that most take 0ms while the others take about 15ms.

The new implementation, which adds to and subtracts from the weight vector without creating a new object, takes 400ms per round. Here all iterations in the mapper take 0ms.

@jderiu

jderiu commented Nov 13, 2015

I also ran it on my own data, which has 9600 datapoints and feature vectors with 220k dimensions.
The old implementation takes 22000ms per round, the new one 1200ms.
If you want, I can do a pull request.

@martinjaggi
Contributor Author

Yes, please make a PR.
