Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measure cpu-time and real-time and report both #94

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
Draft

Conversation

vchuravy
Copy link
Member

@vchuravy vchuravy commented Jan 18, 2018

This still needs some cleanup and the history is a bit messy.
In essence this measures both the time actually spend running on the cpu
and the time passed. If the OS migrates a process the cpu-time won't increase,
while real-time will. Similar for other things like calling into the kernel.

  • Update docs

src/execution.jl Outdated
@@ -314,21 +314,24 @@ function generate_benchmark_definition(eval_module, out_vars, setup_vars, core,
$(setup)
__evals = __params.evals
__gc_start = Base.gc_num()
__start_time = time_ns()
__start_realtime = BenchmarkTools.Timers.realtime()
__start_cputime = BenchmarkTools.Timers.cputime()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interpolate BenchmarkTools here.

Also, this is awesome!

const RUSAGE_SELF = Cint(0)
const RUSAGE_CHILDREN = Cint(-1)

struct timeval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and rusage should use the standard Julia naming convention of CamelCase for types. So this should be TimeVal and the other should be RUsage.

src/execution.jl Outdated
@@ -409,7 +412,7 @@ macro btime(args...)
$trialmin = $BenchmarkTools.minimum($trial)
$trialallocs = $BenchmarkTools.allocs($trialmin)
println(" ",
$BenchmarkTools.prettytime($BenchmarkTools.time($trialmin)),
$BenchmarkTools.prettytime($BenchmarkTools.realtime($trialmin)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we print cputime here as well? or cpuratio?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpuratio seems like a good idea

src/groups.jl Outdated
@@ -81,7 +81,10 @@ Base.median(group::BenchmarkGroup) = mapvals(median, group)
Base.min(groups::BenchmarkGroup...) = mapvals(min, groups...)
Base.max(groups::BenchmarkGroup...) = mapvals(max, groups...)

Base.time(group::BenchmarkGroup) = mapvals(time, group)
import Base: time
@deprecate time(group::BenchmarkGroup) realtime(group)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @deprecate should be moved to a central location

src/trials.jl Outdated
@@ -4,53 +4,59 @@

mutable struct Trial
params::Parameters
times::Vector{Float64}
realtimes::Vector{Float64} #TODO: rename to realtimes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove todo


err = ccall(:GetProccessTimes, Cint, (HANDLE, Ref{FileTime}, Ref{FileTime}, Ref{FileTime}, Ref{FileTime}),
proc, creation_time, exit_time, kernel_time, user_time)
# TODO: Error handling
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add error handling

end

@inline function realtime()
# counter = Ref{UInt64}()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete dead code

@inline function cputime()
ru = Ref{rusage}()
err = ccall(:getrusage, Cint, (Cint, Ref{rusage}), RUSAGE_SELF, ru)
# TODO error check
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add error handling

memory::Symbol
end

function TrialJudgement(r::TrialRatio)
ttol = params(r).time_tolerance
mtol = params(r).memory_tolerance
return TrialJudgement(r, judge(time(r), ttol), judge(memory(r), mtol))
return TrialJudgement(r, judge(realtime(r), ttol), judge(cputime(r), ttol), judge(memory(r), mtol))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrevels I think using the same tolerance for cputime and realtime should be ok, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the tolerance is a fudge number anyway.

@@ -10,7 +10,13 @@ if VERSION >= v"0.7.0-DEV.3052"
using Printf
end

const BENCHMARKTOOLS_VERSION = v"0.2.2"
const BENCHMARKTOOLS_VERSION = v"0.3.0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ararslan Note that I think this is a big enough change the we should bump the minor number, but I can remove this from this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to have in this PR. Even though this is recorded in the JSON stuff used by Nanosoldier, the version isn't actually used by anything AFAIK. Thanks for the heads up!

Base.show(io::IO, t::TrialEstimate) = print(io, "TrialEstimate(", prettytime(time(t)), ")")
Base.show(io::IO, t::TrialRatio) = print(io, "TrialRatio(", prettypercent(time(t)), ")")
Base.show(io::IO, t::TrialJudgement) = print(io, "TrialJudgement(", prettydiff(time(ratio(t))), " => ", time(t), ")")
Base.show(io::IO, t::Trial) = print(io, "Trial(", prettytime(realtime(t)), ")")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these show methods include cputime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe cpuratio

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On playing with it I think it is not necessary.

@vchuravy
Copy link
Member Author

The code in timers is mostly taken from google/benchmark and that is under Apache v2.0. The code is fairly basic, but in order to make sure I retained the copyright notice as mandated by the Apache License.
So the whole Timers module is under Apache v2

@ararslan
Copy link
Member

ararslan commented Jan 19, 2018

I don't think the Apache license mandates that derived works be licensed under Apache; as I understand it it's like MIT in that you just need to preserve the original license when distributing the original code (in whole or in part). So in this case I think it's worth acknowledging that the code is derived from Google stuff but I don't think it requires a separate license.

@vchuravy
Copy link
Member Author

The Apache license does mandate that derived works keep the license notice intact and include a copy of the license and mention all modifications...

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

a; You must give any other recipients of the Work or Derivative Works a copy of this License; and
b; You must cause any modified files to carry prominent notices stating that You changed the files; and
c; You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

So yeah, I would go for the conservative option and consider a translation of code a derivative work.

@ararslan
Copy link
Member

Right, we just need to include or link to their license, we don't have to license our own under Apache.

@vchuravy
Copy link
Member Author

That is contrarian to my understanding of the license.

I translated code under the Apache license to Julia, therefore creating a derivative work. According to the license I must a; keep the license notice, b; include a prominent notice that I changed the files and c; I must give any recipient a copy of the license.

So the code under src/timers is Apache licensed and everything else is MIT.

@vchuravy vchuravy changed the title [RFC/WIP] Measure cpu-time and real-time and report both Measure cpu-time and real-time and report both Jan 19, 2018
@ararslan
Copy link
Member

It's not a copyleft license; derivative works don't need to bear the same license. See https://www.whitesourcesoftware.com/whitesource-blog/top-10-apache-license-questions-answered/.

@vchuravy
Copy link
Member Author

Ok, changed the license to MIT, while complying with the Apache requirement of retaining the license notices.
This is pretty much ready to merge from my side.

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks Valentin, this is really exciting!

@vchuravy
Copy link
Member Author

I would appreciate if people on arcane platforms like Windows or Mac could give this I try. This should work on more reasonable platforms like a current FreeBSD.
But I personally have only tested this on Linux 64bit.

src/timers/windows.jl Outdated Show resolved Hide resolved
@vchuravy
Copy link
Member Author

Let's hold merging this for now. I got a report that this reports odd things on Windows.

doc/manual.md Outdated Show resolved Hide resolved
@ararslan
Copy link
Member

Confirmed that tests pass on current Julia master on FreeBSD 11.1.

@@ -96,11 +110,11 @@ returns the minimum time in seconds.

```julia
julia> @btime sin(1)
13.612 ns (0 allocations: 0 bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the GC percentage is worth showing when there are 0 allocations

This code is based on the `google/benchmarks` repository.

- `realtime` is a monotonic high-resolution clock provided by the kernel.
  In most cases this is equivalent to `time_ns` in `Base`.
- `cputime` is the time that the process was actually running on a processor.
  This excludes time spend in the kernel and time spend descheduled.

The `Timers` module includes a fair-amount of platform specific code so the
implementation is split between a generic *unix* implementation based on the
high-resoltion clocked provided through `clock_gettime`. On *darwin* prior to
10.12.0 the kernel only provides `CLOCK_MONOTONIC` making it less useful and
we have to fallback onto `getRUsage`.

We intentional don't do error handling for the timer calls. Which may or may
not be a wise choice.
Renames `times` and `time` to `realtime` and `realtimes` throughout and adds
`cputime` and `cputimes`. Real-time is the actual monotonic run time of the code
and cpu-time is the time spend in userspace running on processor.

```
julia> @Btime sin(1)
  7.572 ns [100.00% CPU, 0.00% GC] (0 allocations: 0 bytes)
0.8414709848078965

julia> @Btime sleep(1)
  1.002 s [0.01% CPU, 0.00% GC] (5 allocations: 192 bytes)
```
`:GetProcessTimes` has a resolution of `100ns` which is much worse than the resolution of the high-precision clock.
@vchuravy
Copy link
Member Author

vchuravy commented Jan 23, 2018

Ok hunted down the Windows issue and updated the docs. It turns out the accuracy of the windows counters is only 100ns and so the information becomes fairly useless for short functions.
Jarrett and I discussed a variety of possibilities to deal with this, but in the end we decided to simply turn off printing for cpu-time on Windows.

@vchuravy
Copy link
Member Author

I was informed that I missed https://msdn.microsoft.com/en-us/library/windows/desktop/ms684929(v=vs.85).aspx (https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx) and so there is hope that we get accurate timings on windows after all.

@vchuravy
Copy link
Member Author

vchuravy commented Jan 24, 2018

So I am currently blocked on converting the cycles to a reasonable time.

  • tsc_invariant/tsc_constant are necessary http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
    tsc_invariant is on Nehalem-and-later
  • Getting the TSC frequency is tricky.
  • Reading MSR_PLATFORM_INFO [15:8] through the rdmsr instruction, which can't be executed in user-space.
  • Reading through the Intel Manual I noticed that there is a section for cpuid Time Stamp Counter and Nominal Core Crystal Clock Information Leaf which has the necessary information sometimes...
function cpuid(infoType)
         info = Ref{NTuple{4, UInt32}}()
         ccall(:jl_cpuid, Cvoid, (Ref{NTuple{4, UInt32}}, UInt32), info, infoType)
         info[]
end
tsc_invariant() = cpuid(0x80000007)[4] == 0x00000100
function tsc_freq()
          info = cpuid(0x15)
          if info[1] == 0 || info[3] == 0
            return nothing
          else
             return info[2] / info[1] * info[3]
          end
end

and on my machine info[3] == 0
The manual specifies:

TSC_Value = (ART_Value * CPUID.15H:EBX[31:0] )/ CPUID.15H:EAX[31:0] + K

@jrevels
Copy link
Member

jrevels commented Jan 25, 2018

So I am currently blocked on converting the cycles to a reasonable time.

We can always merge this without the Windows-specific improvements (and just print N/A or w/e for cpu-time on Windows), and include the improvements in a future PR once they're ready.

a `Measurement` is a a given timestamp (which can measure multiple things)
a `MeasurementDelta` is the difference between two `Measurements` and
currently trackes real-time and the ratio of cpu-time/real-time.

All the conversion and calculation is part of MeasurementDelta so that it is 'cheap'
to obtain a Measurement. As an example currently we convert every `time_ns` to Float64
which we can delay using this notion of a MeasurementDelta.
@vchuravy
Copy link
Member Author

I have to double check that the frequency of QueryProcessCycleTime and QueryPerformanceCounter are the same, but otherwise I found a workable solution for Windows.

I will let this stew for a bit and focus other work.

function Measurement()
proc = currentprocess()
time = perfcounter()
cpu = cpucycles()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpucyles(proc)

@goretkin
Copy link

This feature is very exciting. Is there any technical issue that is preventing this PR from being merged?

It seems even as far as CPU Time goes, there may be multiple measures in certain OSes (e.g. https://linux.die.net/man/2/getrusage mentions RUSAGE_THREAD) for measuring per-thread times, too, which might suggest there's room for more features in future PRs.

@vchuravy
Copy link
Member Author

This feature is very exciting. Is there any technical issue that is preventing this PR from being merged?

I think I got annoyed at Windows, but I don't recall the details anymore.

@gdalle
Copy link
Collaborator

gdalle commented Jun 13, 2023

Hey @vchuravy, you made me a maintainer, now you deal with the fallout 😈
This is a pretty old PR with a lot of unresolved conversations, how mergeable do you think it is (provided we can update the branch)?
If the Windows version is not ready, I agree with what @jrevels said:

We can always merge this without the Windows-specific improvements (and just print N/A or w/e for cpu-time on Windows), and include the improvements in a future PR once they're ready.

@vchuravy
Copy link
Member Author

vchuravy commented Jun 14, 2023

I think having something like this would still be valuable, over the years I have been debating what the best level of integrations might be, experimenting with LIKWID.jl and LinuxPerf.jl (@carstenbauer ) might have opinions as well.

You can see this PR as an item on my to-do list (but its pretty much at the bottom of the priority queue), or as an invitation and initial stub for someone else to implement it.

@gdalle gdalle marked this pull request as draft June 14, 2023 05:27
@willow-ahrens
Copy link
Collaborator

What's the difference between Timers and https://github.com/schmrlng/CPUTime.jl ?

@willow-ahrens
Copy link
Collaborator

@Zentrik what do you think of this PR?

@vchuravy
Copy link
Member Author

vchuravy commented May 11, 2024

What's the difference between Timers and https://github.com/schmrlng/CPUTime.jl ?

The code here precedes that package, and looking at the details my old code here should have lower overhead than the code in CPUTime.jl

@willow-ahrens
Copy link
Collaborator

Would there be a maintainability difference? Should we contribute this code to that package?

@giordano
Copy link
Member

I think having something like this would still be valuable

For what is worth, these days I've been looking into bad scaling observed on certain system and using code from this PR to measure cpu-time vs real-time was very useful to gain more insights about was going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants