PCG works slowly and type warning questions #7
Comments
By the way, I thought it may be an issue of
julia> function testn(n, rng)
           for i = 1:n
               rand(rng);
           end
       end
julia> @time testn(100000, Base.Random.GLOBAL_RNG)
0.024788 seconds (3.13 k allocations: 131.090 KB)
julia> @time testn(1000000, Base.Random.GLOBAL_RNG)
0.007132 seconds (4 allocations: 160 bytes)
julia> @time testn(10000000, Base.Random.GLOBAL_RNG)
0.065980 seconds (4 allocations: 160 bytes)
julia> @time testn(100000, r)
0.026353 seconds (200.00 k allocations: 3.052 MB)
julia> @time testn(1000000, r)
0.126789 seconds (2.00 M allocations: 30.518 MB, 10.10% gc time)
julia> @time testn(10000000, r)
0.786922 seconds (20.00 M allocations: 305.176 MB, 2.50% gc time)
The allocations are also... |
Well, it is called BigCrush for a reason. But those timings do look slow: have you tried profiling? |
Yes, the backtraces seem usual. |
What does |
|
Hmm, that's not much help, I guess because of the inlining. Since the code is quite short, the best option might be to write out all the code in one function and see if you can make it faster, then compare it with what you have here. |
All right |
Actually I have looked at the And I noticed that with the convention of |
code_native(RNG.PCG.pcg_random, (RNG.PCG.PCGStateSetseq{UInt64,Val{:XSH_RR}},))
.text
Filename: bases.jl
Source line: 0
pushq %rbp
movq %rsp, %rbp
Source line: 231
movq (%rdi), %rcx
Source line: 206
movabsq $6364136223846793005, %rax # imm = 0x5851F42D4C957F2D
imulq %rcx, %rax
addq 8(%rdi), %rax
movq %rax, (%rdi)
Source line: 50
movq %rcx, %rax
shrq $18, %rax
xorq %rcx, %rax
shrq $27, %rax
shrq $59, %rcx
Source line: 12
movl %ecx, %edx
negl %edx
movl %eax, %esi
shrl %cl, %esi
movb %dl, %cl
shll %cl, %eax
orl %esi, %eax
popq %rbp
retq
nop
I don't have much knowledge about assembly, but I think this seems simple and clear enough? Oh, OK, then I found that the |
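For readers following along, the assembly listing above appears to do one multiply-add state update followed by a shift, xor, shift and a 32-bit rotate. Below is a minimal sketch of the same step written out as a single function, in the spirit of the earlier suggestion to put everything in one place. The names `Pcg32Sketch`, `PCG_MULT`, `rotr32` and `pcg32_step!` are hypothetical and are not taken from the package; the multiplier and the shift amounts 18, 27 and 59 are read off the listing. It is written in current Julia syntax (`mutable struct`, `⊻` for xor), which differs from the Julia version used in this thread.

```julia
# Hypothetical type and names, for illustration only (not the package's API).
mutable struct Pcg32Sketch
    state::UInt64   # concrete field types keep field access type-stable
    inc::UInt64     # stream increment added after the multiply
end

# 6364136223846793005, the immediate loaded by movabsq in the listing
const PCG_MULT = 0x5851f42d4c957f2d

# 32-bit rotate right; masking keeps both shift counts in 0:31
rotr32(x::UInt32, r::UInt32) = (x >> (r & 0x1f)) | (x << ((-r) & 0x1f))

function pcg32_step!(r::Pcg32Sketch)
    old = r.state
    r.state = old * PCG_MULT + r.inc                   # wraps modulo 2^64
    xorshifted = (((old >> 18) ⊻ old) >> 27) % UInt32  # truncate to 32 bits
    rot = (old >> 59) % UInt32                         # top 5 bits pick the rotation
    return rotr32(xorshifted, rot)
end
```

Timing a loop over `pcg32_step!` against the package's own generator is one way to see whether the overhead comes from the algorithm or from how the code is structured.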
I don't think it's just the
After warmup, I get:
which suggests there is some sort of type instability or codegen problem here.
I think you're making your code way more complex than it needs to be, and this could be causing performance problems somewhere (it certainly makes it harder to find the performance problems). I would suggest:
|
Wow, this is amazing... I didn't realize parametric types could affect performance so much... |
I don't know if it is due to parametric types, but they do make it harder to figure out what is causing it.
|
Yes, I'm trying to figure it out, and rewriting some code to make it clearer. I have a problem while replacing the |
The output type is usually half the state, isn't it? So one way would be to define a function
halfwidth(::Type{UInt64}) = UInt32
halfwidth(::Type{UInt32}) = UInt16
etc., and then call this inside the function, i.e.
O = halfwidth(typeof(state))
rotr((((state >> p1) $ state) >> p2) % O,
     (state >> p3) % O) |
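A self-contained sketch of the `halfwidth` idea above, written in current Julia, where xor is spelled `⊻` rather than the `$` used in this thread. The `rotr` helper and the wrapper name `xsh_rr` are assumptions made for illustration, since the package's own `rotr` is not shown here; `p1`, `p2` and `p3` are the shift parameters from the comment.

```julia
# Map each state type to an output type of half its width.
halfwidth(::Type{UInt64}) = UInt32
halfwidth(::Type{UInt32}) = UInt16
halfwidth(::Type{UInt16}) = UInt8

# Assumed helper: rotate x right by r bits within the width of its own type.
function rotr(x::T, r::T) where {T<:Unsigned}
    bits = 8 * sizeof(T)
    k = Int(r) & (bits - 1)   # keep the rotation count in 0:bits-1
    return (x >> k) | (x << ((bits - k) & (bits - 1)))
end

# Hypothetical wrapper around the expression from the comment above.
function xsh_rr(state::T, p1, p2, p3) where {T<:Unsigned}
    O = halfwidth(T)          # resolved at compile time for a concrete T
    return rotr((((state >> p1) ⊻ state) >> p2) % O, (state >> p3) % O)
end
```

For example, `xsh_rr(0x185706b82c2e03f8, 18, 27, 59)` returns a `UInt32`, because the output type is derived from the state type by dispatch and no extra type parameter has to be passed around.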
OK, I see. |
julia> @time foo(0x185706b82c2e03f8, 10_000_000)
0.064160 seconds (5 allocations: 176 bytes)
0x004c507bc616b135
julia> @time bar(r,10_000_000)
0.034669 seconds (5 allocations: 176 bytes)
0x004c495e0178e5df
It seems fixed now. |
I tested one common PCG declared and initialized as:
It passed all the tests of the TestU01 BigCrush battery as expected, but the long CPU time frightened me. It took over 21 hours on my laptop.
For comparison, the test results of Base.Random.rand are:
Obviously, such slow speed is unacceptable. I've tried to use @code_warntype to find where the bottleneck could be, and it shows that some unexpected places in the code are inferred to return Any, which I thought would be optimized. For example,
How can I fix this problem?
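As a general illustration of how a non-concrete inferred type can arise (this is not a diagnosis of the package's code, whose relevant definitions are not shown here), the sketch below compares a struct with an abstractly typed field to a parametric one. All names (`LooseState`, `TightState`, `halve`) are hypothetical, and the syntax is current Julia rather than the version used in this thread.

```julia
# Hypothetical types, for illustration only.
mutable struct LooseState
    state::Unsigned        # abstract field: its concrete type is unknown to the compiler
end

mutable struct TightState{T<:Unsigned}
    state::T               # concrete once T is fixed
end

# The same generic code runs on both containers.
halve(r) = r.state >> 1

# Inferred return types: abstract for LooseState, UInt64 for TightState{UInt64}.
loose_ret = Base.return_types(halve, (LooseState,))[1]
tight_ret = Base.return_types(halve, (TightState{UInt64},))[1]
```

Running `@code_warntype halve(LooseState(0x01))` at the REPL should flag the abstract result, while the `TightState` version infers cleanly. Values read from an abstractly typed field generally need dynamic dispatch and boxing, which is one common source of per-call allocations.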