Benchmark behaviour on defined class is slow. #3

bon · 2016-09-01T14:05:40Z

The benchmarks provided are for methods on the built-in Lisp types number, fixnum and double-float. To test the behaviour on user-defined classes, we added a simple boxing class and found that performance degraded when using inlined-generic-function with inlining enabled. We measured the following numbers of processor cycles for the four methods in playground.lisp, respectively:

Experiment on SBCL 1.3.5.24

See bon@8b6e4d5

So my question is: does this indicate that inlined-generic-function only gives a speedup on built-in types and not on user-defined classes?

guicho271828 · 2016-09-01T17:12:39Z

it seems normal-plus is running w/o boxing, right?

bon · 2016-09-01T17:23:49Z

Correct! Fixed in bon@76d1eb6

Processor cycles are now

guicho271828 · 2016-09-01T20:45:04Z

phew.

guicho271828 · 2016-09-02T22:40:31Z

I just tested your version. On my machine, the result is still in favor of the inlined version.

Evaluation took:
  0.001 seconds of real time
  0.004000 seconds of total run time (0.004000 user, 0.000000 system)
  400.00% CPU
  638,640 processor cycles
  131,024 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  608,634 processor cycles
  163,808 bytes consed

Evaluation took:
  0.003 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  0.00% CPU
  4,543,020 processor cycles
  655,184 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  389,169 processor cycles
  163,808 bytes consed

Where does this difference come from? In your result the inlined generic function performs better, but not by much.
I use SBCL 1.3.8 via Roswell on:

$ uname -a
Linux guicho-x61 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
...
model name  : Intel(R) Core(TM)2 Duo CPU     T7100  @ 1.80GHz
...

bon · 2016-09-03T15:13:20Z

For me the cycle counts vary wildly from run to run. Sometimes the IGF is a little quicker, sometimes slower. One example is shown below.

But the more interesting question is why the IGF showed a 10x speedup on numbers but hardly any difference on defined classes. Of course I would be very happy to see a 10x speedup on defined classes too!

$ cat /proc/cpuinfo  | ag 'model name' | head -1
model name  : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
$ uname -a
Linux tie 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
$ ros use sbcl
$ ~/.roswell/impls/x86-64/linux/sbcl/1.3.9/bin/sbcl --version
SBCL 1.3.9
$ ros run
$ rlwrap ros run
* (ql:quickload :inlined-generic-function)

...

* (load "benchmark.lisp")

...

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  424,334 processor cycles
  131,024 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  362,358 processor cycles
  163,792 bytes consed

Evaluation took:
  0.001 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  0.00% CPU
  2,060,160 processor cycles
  655,200 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.003333 seconds of total run time (0.003333 user, 0.000000 system)
  100.00% CPU
  493,287 processor cycles
  163,792 bytes consed
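
Because the cycle counts above change so much between runs, a single `time` call is a weak basis for comparison. The following is a minimal sketch of one way to reduce the noise: repeat the measurement and keep the fastest run. The helper `min-run-time` is hypothetical and is not part of benchmark.lisp; it uses only standard Common Lisp and measures real time in internal time units rather than processor cycles.

```lisp
;; Hypothetical helper, not part of the repository's benchmark.lisp.
(defun min-run-time (thunk &key (runs 20))
  "Call THUNK RUNS times and return the smallest elapsed real time,
in internal time units (see INTERNAL-TIME-UNITS-PER-SECOND)."
  (loop repeat runs
        minimize (let ((start (get-internal-real-time)))
                   (funcall thunk)
                   (- (get-internal-real-time) start))))

;; Usage: wrap the code being measured in a lambda.
(min-run-time (lambda () (loop for i below 100000 sum i)))
```

Taking the minimum rather than the mean is a design choice: it discards runs that were disturbed by the scheduler or the garbage collector, at the cost of hiding genuine variance.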

guicho271828 · 2016-09-03T19:08:05Z

The 10x speedup is not achieved because of missing type information and the cost of slot access.

The contents slot of box is not typed, so the (+ (contents a) b) part always calls generic-+ rather than optimized machine arithmetic. You should check the disassembly.
The accessor contents is a normal generic function, so the slot access is slow.
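
To illustrate the first point, here is a minimal sketch in plain CLOS of a box whose contents are declared to be a fixnum. The names `typed-box`, `typed-contents`, `typed-plus` and `unbox-and-add` are hypothetical and do not appear in playground.lisp. The sketch assumes that the compiler does not propagate the slot's `:type` to callers of the accessor, so the type is restated with `the` at the point of use.

```lisp
;; Illustrative names only; none of these appear in playground.lisp.
(defclass typed-box ()
  ((contents :initarg :contents
             :type fixnum
             :accessor typed-contents)))

(defgeneric typed-plus (a b))

(defmethod typed-plus ((a typed-box) (b fixnum))
  ;; THE tells the compiler the slot value is a fixnum, so the addition
  ;; no longer has to handle arbitrary number types.
  (+ (the fixnum (typed-contents a)) b))

;; An ordinary function with the same body is easy to inspect:
(defun unbox-and-add (box n)
  (declare (type fixnum n))
  (+ (the fixnum (typed-contents box)) n))

;; Compare against an undeclared version with:
;; (disassemble 'unbox-and-add)
```

This only addresses the arithmetic. The accessor is still a generic function, so the slot-access cost from the second point remains. A `defstruct` with a typed slot is one possible alternative, since structure accessors are ordinary functions.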

Imagine the cost without these overheads is 10X for a normal GF and X for an IGF. The untyped arithmetic and the generic accessor each add an overhead, call them A and B, giving 10X+A+B versus X+A+B. A 10x speedup is then clearly not achievable, since A+B can be very large.
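
The effect of the shared overhead on the ratio can be worked out directly. The sketch below uses a hypothetical function `observed-speedup`, with made-up values for X and A+B chosen only to show the trend; they are not measurements from this thread.

```lisp
;; Hypothetical helper; X and OVERHEAD are in the same arbitrary cost unit.
(defun observed-speedup (x overhead)
  "Ratio (10X + OVERHEAD) / (X + OVERHEAD), where OVERHEAD stands for A+B."
  (/ (+ (* 10 x) overhead)
     (+ x overhead)))

(observed-speedup 1 0)   ; => 10       no shared overhead: the full 10x
(observed-speedup 1 9)   ; => 19/10    overhead 9x the IGF cost: under 2x
(observed-speedup 1 99)  ; => 109/100  overhead dominates: about 1.09x
```

As A+B grows relative to X, the ratio approaches 1, which matches the small difference seen on the boxed class.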

guicho271828 · 2016-09-16T21:22:18Z

I updated the environment and noticed that the examples in playground.lisp are getting slower. It looks like the function is being prevented from inlining.

guicho271828 · 2016-09-16T21:25:34Z

(push :inline-generic-function *features*) still successfully forces the functions to be inlined, but I don't like this solution...
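
One way to keep the flag from leaking into the whole image is to bind `*features*` dynamically around the code that should see it. This is only a sketch: the helper `call-with-forced-inlining` is hypothetical, and the thread does not say at what point the library consults the flag, so it is an assumption that a dynamic binding in effect during loading or compiling is enough.

```lisp
;; Hypothetical helper; not part of inlined-generic-function.
(defun call-with-forced-inlining (thunk)
  "Call THUNK with :INLINE-GENERIC-FUNCTION present in *FEATURES*.
The binding is undone when THUNK returns."
  (let ((*features* (cons :inline-generic-function *features*)))
    (funcall thunk)))

;; Usage, assuming the flag is read while the file is loaded:
;; (call-with-forced-inlining (lambda () (load "playground.lisp")))
```

Outside the call, `*features*` is unchanged, so other systems loaded later are not affected.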

bon mentioned this issue Apr 29, 2017

Interacts badly with package loading #10