Match speed of libvorbis #2

Open
est31 opened this issue Sep 1, 2016 · 11 comments

Comments

@est31
Member

est31 commented Sep 1, 2016

At the moment we are two times slower than libvorbis. We need to be at least as fast as it is.

@est31
Member Author

est31 commented Sep 1, 2016

Maybe there is some improvement to be had in Huffman tree decoding? No idea.

@est31
Member Author

est31 commented Oct 3, 2016

Note about current speed: the slowdown ranges between 1.6x and 1.8x for floor1 files. It's faster for floor0 files, but those don't really matter, as there are almost no files with floor0.

@est31
Member Author

est31 commented Oct 3, 2016

And part of the speed improvement was thanks to changes between the Rust 1.11 and 1.12 compilers.

@est31
Member Author

est31 commented Oct 21, 2016

Wow, it seems recent changes in rustc have led to some serious speed improvement. As of the Rust nightly compiler 2016-10-18, lewton is only 1.18 to 1.25 times as slow as libvorbis.

@est31
Member Author

est31 commented Oct 21, 2016

(Note: I'm always comparing the "Overall ratio of difference" output from running cargo run --release bench in the cmp tool.)

@est31
Member Author

est31 commented Oct 21, 2016

Hmm, it seems to have the same performance on Rust 1.12.1, so it's caused by something else? No idea. Either way, it's really good.

@est31
Member Author

est31 commented May 7, 2017

As of rustc 1.19.0-nightly (f4209651e 2017-05-05), the factor is around 1.09 to 1.12.

@est31
Member Author

est31 commented Aug 26, 2017

With rustc 1.21.0-nightly (2aeb5930f 2017-08-25), the factor is between 1.05 and 1.06.

@ashthespy

Have there been some recent regressions?
I was curious, so I ran the comparison with rustc 1.30.0 (da5f414c2 2018-10-24) and the latest master (0.9.3):

$ cargo run --release bench
    Finished release [optimized] target(s) in 0.58s
     Running `target/release/cmp bench`

Comparing speed for bwv_1043_vivace.ogg : libvorbis=0.6495s we=0.8464s difference=1.30x
Comparing speed for bwv_543_fuge.ogg    : libvorbis=0.9369s we=1.3493s difference=1.44x
Comparing speed for maple_leaf_rag.ogg  : libvorbis=0.2593s we=0.3801s difference=1.47x
Comparing speed for hoelle_rache.ogg    : libvorbis=0.4680s we=0.6724s difference=1.44x
Comparing speed for thingy-floor0.ogg   : libvorbis=0.2157s we=0.2524s difference=1.17x

Overall time spent for decoding by libvorbis: 2.5293s
Overall time spent for decoding by us: 3.5007s
Overall ratio of difference: 1.38x
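
For reading these numbers: the overall figure is consistent with dividing the summed decode times (3.5007s / 2.5293s ≈ 1.38), not with averaging the per-file ratios. Below is a minimal Rust sketch of that arithmetic, using the timings from the output above. The names `Row`, `ratio`, `overall_ratio` and `ROWS` are made up for illustration and are not taken from the cmp tool's source.

```rust
/// One benchmark row: (file name, libvorbis seconds, lewton seconds).
type Row = (&'static str, f64, f64);

/// Ratio of lewton's time to libvorbis' time; above 1.0 means lewton is slower.
fn ratio(libvorbis_secs: f64, lewton_secs: f64) -> f64 {
    lewton_secs / libvorbis_secs
}

/// Overall ratio as total lewton time divided by total libvorbis time.
/// This weights long files more heavily than a mean of per-file ratios would.
fn overall_ratio(rows: &[Row]) -> f64 {
    let libvorbis_total: f64 = rows.iter().map(|r| r.1).sum();
    let lewton_total: f64 = rows.iter().map(|r| r.2).sum();
    ratio(libvorbis_total, lewton_total)
}

/// Timings copied from the benchmark output above.
const ROWS: [Row; 5] = [
    ("bwv_1043_vivace.ogg", 0.6495, 0.8464),
    ("bwv_543_fuge.ogg", 0.9369, 1.3493),
    ("maple_leaf_rag.ogg", 0.2593, 0.3801),
    ("hoelle_rache.ogg", 0.4680, 0.6724),
    ("thingy-floor0.ogg", 0.2157, 0.2524),
];

fn main() {
    for (name, libvorbis_secs, lewton_secs) in ROWS.iter() {
        println!("{:<20}: difference={:.2}x", name, ratio(*libvorbis_secs, *lewton_secs));
    }
    println!("Overall ratio of difference: {:.2}x", overall_ratio(&ROWS));
}
```

Because the totals are what gets divided, the two longest files (bwv_543_fuge.ogg and bwv_1043_vivace.ogg) dominate the overall figure.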

@est31
Member Author

est31 commented Nov 1, 2018

@ashthespy I'm not sure where this comes from. This slow behaviour happens on rustc 1.20 stable taken from rustup as well, so it isn't a regression of rustc itself or of LLVM. It might be some improvement in how gcc optimizes: libvorbis is usually taken from the OS, so it's compiled with your OS compiler, which is usually gcc, while lewton is compiled using rustc + LLVM. To get a fair comparison, one would have to compare against libvorbis compiled with clang of the same LLVM version that the rustc is coming from.

@fdoyon

fdoyon commented Jan 24, 2019

Most of the performance delta is due to the transient Vec and SmallVec allocations, reallocations, and drops.
Here is a trace you can open with Instruments on macOS.
alloc trace.trace.zip

Please see my comments on the allocation issue regarding the need for an API and design change to solve this issue efficiently.
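
To illustrate the kind of change this refers to, here is a minimal Rust sketch contrasting a function that returns a fresh Vec per packet with one that writes into a caller-owned buffer. Both `decode_alloc` and `decode_into` are hypothetical stand-ins, not lewton's API, and the "decoding" is just a byte-to-float conversion so the example stays self-contained.

```rust
/// Hypothetical stand-in for per-packet decoding work; NOT lewton's API.
/// Allocates (and later drops) a fresh Vec on every call.
fn decode_alloc(packet: &[u8]) -> Vec<f32> {
    packet.iter().map(|&b| f32::from(b) / 255.0).collect()
}

/// Same work, but writes into a buffer owned by the caller.
/// `clear` keeps the existing capacity, so once the buffer has grown to the
/// largest packet size, further calls do not touch the allocator.
fn decode_into(packet: &[u8], out: &mut Vec<f32>) {
    out.clear();
    out.extend(packet.iter().map(|&b| f32::from(b) / 255.0));
}

fn main() {
    // Four dummy packets of 1024 bytes each (illustrative sizes only).
    let packets: Vec<Vec<u8>> = (0..4u8).map(|i| vec![i; 1024]).collect();

    // Allocating variant: one heap allocation and one drop per packet.
    for p in &packets {
        let samples = decode_alloc(p);
        assert_eq!(samples.len(), p.len());
    }

    // Reusing variant: the buffer is allocated once up front and reused.
    let mut buf = Vec::with_capacity(1024);
    for p in &packets {
        decode_into(p, &mut buf);
        assert_eq!(buf.len(), p.len());
    }
}
```

The second form changes the function signature, which is why an approach like this would need an API and design change rather than a purely internal fix.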
