Event performance branch #4
base: core-8-5-branch
3571421 to 71888ad
4c20ec4 to 5a9dc5e
I rebased it over the branch.
The results of the tests and the comparison between both branches are attached below.
Note that you'll need several cups of coffee if you run the performance test.
Summaries of the performance test cases:
********************************************************************************
* exec single events: after / update, after / cancel, etc
********************************************************************************
Total 9 cases:
-26.977846 µs/# 3618786 # 822707.899 #/sec 4398.628 nett-ms
+4.932550 µs/# 11167487 # 2662542.096 #/sec 4194.295 nett-ms
Average:
-2.997538 µs/# 402087 # 822708 #/sec 488.736 nett-ms
+0.548061 µs/# 1240831 # 2662539 #/sec 466.033 nett-ms
Min:
-0.255511 µs/# 1763501 # 3913724 #/sec 450.594 nett-ms
+0.173639 µs/# 2487372 # 5759058 #/sec 431.906 nett-ms
Max:
-6.339246 µs/# 78527 # 157747 #/sec 497.802 nett-ms
+1.163969 µs/# 419694 # 859129 #/sec 488.511 nett-ms
********************************************************************************
* event random access: (5000 events)
********************************************************************************
Total 2 cases:
-38.890300 µs/# 54262 # 54342.536 #/sec 998.518 nett-ms
+1.242782 µs/# 1541390 # 1609295.848 #/sec 957.804 nett-ms
Average:
-19.445150 µs/# 27131 # 54343 #/sec 499.259 nett-ms
+0.621391 µs/# 770695 # 1609296 #/sec 478.902 nett-ms
Min:
-14.9376 µs/# 33411 # 66945.2 #/sec 499.080 nett-ms
+0.620512 µs/# 771739 # 1611573 #/sec 478.873 nett-ms
Max:
-23.9527 µs/# 20851 # 41748.9 #/sec 499.438 nett-ms
+0.622270 µs/# 769651 # 1607018 #/sec 478.931 nett-ms
********************************************************************************
* event random access: (by 50000 events)
********************************************************************************
Total 2 cases:
-633.135000 µs/# 3191 # 3188.010 #/sec 1000.938 nett-ms
+1.657629 µs/# 1168261 # 1206857.510 #/sec 968.019 nett-ms
Average:
-316.567500 µs/# 1595 # 3187 #/sec 500.469 nett-ms
+0.828815 µs/# 584130 # 1206855 #/sec 484.010 nett-ms
Min:
-286.314 µs/# 1748 # 3492.7 #/sec 500.476 nett-ms
+0.815207 µs/# 593414 # 1226682 #/sec 483.755 nett-ms
Max:
-346.821 µs/# 1443 # 2883.3 #/sec 500.462 nett-ms
+0.842422 µs/# 574847 # 1187052 #/sec 484.264 nett-ms
+********************************************************************************
+ conditional updates / vwait (suitable only in new version)
+********************************************************************************
+Total 8 cases:
+3.069509 µs/# 16129124 # 4532621.189 #/sec 3558.454 nett-ms
+Average:
+0.383689 µs/# 2016140 # 4532618 #/sec 444.807 nett-ms
+Min:
+0.097015 µs/# 4019592 # 10307703 #/sec 389.960 nett-ms
+Max:
+0.728454 µs/# 661526 # 1372771 #/sec 481.891 nett-ms
+********************************************************************************
+ NRT-capability test cases (suitable only in new version)
+********************************************************************************
+Total 23 cases:
+2614.448878 µs/# 5273324 # 464334.839 #/sec 11356.727 nett-ms
+Average:
+113.671690 µs/# 229274 # 464333 #/sec 493.771 nett-ms
+Min:
+0.572487 µs/# 833524 # 1746763 #/sec 477.182 nett-ms
+Max:
+1069.33 µs/# 468 # 935.16 #/sec 500.447 nett-ms
********************************************************************************
* in-between important event by amount of idle events
********************************************************************************
Total 2 cases:
-623384.000000 µs/# 4 # 3.208 #/sec 1246.768 nett-ms
+59872.100000 µs/# 34 # 33.405 #/sec 1017.826 nett-ms
Average:
-311692.000000 µs/# 2 # 3 #/sec 623.384 nett-ms
+29936.050000 µs/# 17 # 33 #/sec 508.913 nett-ms
Min:
-311474.5 µs/# 2 # 3.211 #/sec 622.949 nett-ms
+29932.2 µs/# 17 # 33.409 #/sec 508.848 nett-ms
Max:
-311909.5 µs/# 2 # 3.206 #/sec 623.819 nett-ms
+29939.9 µs/# 17 # 33.400 #/sec 508.978 nett-ms
********************************************************************************
* bulk generation / bulk update, etc *** 10000 events ***
********************************************************************************
Total 14 cases:
-12713501.000000 µs/# 14 # 1.101 #/sec 12713.501 nett-ms
+51240.000000 µs/# 14 # 273.224 #/sec 51.240 nett-ms
Average:
-908107.214286 µs/# 1 # 1 #/sec 908.107 nett-ms
+3660.000000 µs/# 1 # 273 #/sec 3.660 nett-ms
Min:
-12964.0 µs/# 1 # 77.137 #/sec 12.964 nett-ms
+2613.00 µs/# 1 # 382.70 #/sec 2.613 nett-ms
Max:
-2554426 µs/# 1 # 0.391 #/sec 2554.426 nett-ms
+5470.00 µs/# 1 # 182.82 #/sec 5.470 nett-ms
********************************************************************************
* bulk generation / bulk update, etc *** 20000 events ***
********************************************************************************
Total 14 cases:
-64758697.000000 µs/# 14 # 0.216 #/sec 64758.697 nett-ms
+105973.000000 µs/# 14 # 132.109 #/sec 105.973 nett-ms
Average:
-4625621.214286 µs/# 1 # 0 #/sec 4625.621 nett-ms
+7569.500000 µs/# 1 # 132 #/sec 7.569 nett-ms
Min:
-27952.0 µs/# 1 # 35.776 #/sec 27.952 nett-ms
+5428.00 µs/# 1 # 184.23 #/sec 5.428 nett-ms
Max:
-15404162 µs/# 1 # 0.065 #/sec 15404.162 nett-ms
+11231.0 µs/# 1 # 89.039 #/sec 11.231 nett-ms
********************************************************************************
* bulk generation / bulk update, etc *** 40000 events ***
********************************************************************************
Total 14 cases:
-336984486.000000 µs/# 14 # 0.042 #/sec 336984.486 nett-ms
+221911.000000 µs/# 14 # 63.088 #/sec 221.911 nett-ms
Average:
-24070320.428571 µs/# 1 # 0 #/sec 24070.320 nett-ms
+15850.785714 µs/# 1 # 63 #/sec 15.851 nett-ms
Min:
-54846.0 µs/# 1 # 18.233 #/sec 54.846 nett-ms
+10776.0 µs/# 1 # 92.799 #/sec 10.776 nett-ms
Max:
-84109407 µs/# 1 # 0.012 #/sec 84109.407 nett-ms
+22926.0 µs/# 1 # 43.619 #/sec 22.926 nett-ms
********************************************************************************
* bulk generation / bulk update, etc *** 60000 events ***
********************************************************************************
Total 14 cases:
-1015795596.000000 µs/# 14 # 0.014 #/sec 1015795.596 nett-ms
+341742.000000 µs/# 14 # 40.967 #/sec 341.742 nett-ms
Average:
-72556828.285714 µs/# 1 # 0 #/sec 72556.828 nett-ms
+24410.142857 µs/# 1 # 41 #/sec 24.410 nett-ms
Min:
-83546.0 µs/# 1 # 11.969 #/sec 83.546 nett-ms
+16272.0 µs/# 1 # 61.455 #/sec 16.272 nett-ms
Max:
-272914833 µs/# 1 # 0.004 #/sec 272914.833 nett-ms
+34446.0 µs/# 1 # 29.031 #/sec 34.446 nett-ms
********************************************************************************
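To put the totals above into perspective, the speed-up of a group is the old (-) time per iteration divided by the new (+) one. The following is only a minimal sketch of that arithmetic: the helper name speedup and the table totals are hypothetical, and the figures are copied from the "Total" lines of the summaries above.

```c
#include <assert.h>

/* Ratio of old (-) to new (+) time per iteration; a value above 1 means
 * the new branch is faster. Hypothetical helper, for illustration only. */
static double speedup(double old_us, double new_us)
{
    assert(new_us > 0.0);
    return old_us / new_us;
}

/* "Total" µs/# figures copied from the summaries above: {old, new}. */
static const double totals[][2] = {
    { 26.977846,    4.932550 },  /* exec single events                  */
    { 38.890300,    1.242782 },  /* event random access, 5000 events    */
    { 633.135000,   1.657629 },  /* event random access, 50000 events   */
    { 12713501.0,   51240.0  },  /* bulk generation/update, 10000 events */
    { 1015795596.0, 341742.0 },  /* bulk generation/update, 60000 events */
};
```

For example, speedup(totals[0][0], totals[0][1]) is roughly 5.5 for single events, while the same ratio for the 60000-event bulk group is close to 3000, so the gain grows with the size of the event queue.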
…x "vwait ?options? ?timeout? varname". some small improvements and fixing: - Tcl_DoOneEvent can wait for block time that was set with Tcl_SetMaxBlockTime outside an event source traversal, and stop waiting if Tcl_SetMaxBlockTime was called outside an event source (another event occurs and interrupt waiting loop), etc; - safer more precise pre-lookup by options (use TclObjIsIndexOfTable instead of simply comparison of type with tclIndexType); test cases extended to cover conditional "vwait" usage;
5a9dc5e to 1041322
Rebased to fossil sebres-8-5-event-perf-branch with full commit history (ignoring several merge points) and re-imported here.
…evel to avoid reset of block time by nested event-cycles (if Tcl_SetTimer does not create it), etc.
Fixed retarded events (using new retarded list, the invocation of the retarded events occurs only after checking of all event sources now).
Two opportunities to retard an event:
- lazy, using the same event-object: in the handler set event->proc to new (or the same) handler (fast, possible only if not entering new event-cycle in handler);
- create the event with new position "TCL_QUEUE_RETARDED";
New inline functions TclpQueueEventClientData / TclpQueueEventEx for fast creating resp. queuing of an event with extra data.
…waitForFiles (regardless of the too short timeout), e. g. test case "chan-io-53.8", etc.
…cle) and max blocking time was not set outside an event source traversal.
606f029 to d2ae8f3
I made a small review and several fixes after comparing with my forks of the 8.5/8.6 branches.
…eneration, example: tclsh -c "proc test {} {after 1000 test}; test; vwait infinite"
…ent-cycle (Tcl_DoOneEvent) produced in event-cycle without processing
…onflicts resolved, merge-points restored
…es gcc warning: suggest parentheses around '&&' within '||')
…everal fallbacks for platforms without monotonic time (older Darwin, MacOSX or xcode)
…rocPtr == NativeScaleTime) the relationship is 1:1 and nothing has to be done.
…nresolved external "TclpUSleep" bug).
…ince all other times used in calculation already scaled).
… (the actual time of usleep/nanosleep may be longer, due to system latencies and possible limitations in the timer resolution of the hardware); forces a context switch or yields the processor if the sleep time is too small.
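The idea in the last commit message above (yield instead of sleeping when the interval is too short) can be sketched in C as follows. This is not the TclpUSleep from the branch: the function name sketch_usleep and the threshold MIN_SLEEP_US are assumptions for illustration, and only the POSIX calls sched_yield and nanosleep are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Hypothetical threshold (in µs): below it a real sleep would overshoot
 * by far more than the requested time, so we only yield the processor. */
#define MIN_SLEEP_US 10

/* Sleep for about `us` microseconds. Returns 1 if nanosleep was used,
 * 0 if the interval was too short and the processor was only yielded. */
static int sketch_usleep(long us)
{
    struct timespec ts;

    assert(us >= 0);
    if (us < MIN_SLEEP_US) {
        sched_yield();              /* force a context switch, no timer */
        return 0;
    }
    ts.tv_sec = us / 1000000L;
    ts.tv_nsec = (us % 1000000L) * 1000L;
    nanosleep(&ts, NULL);           /* may take longer than requested */
    return 1;
}
```

A caller that needs an exact interval still has to re-check the clock afterwards, because nanosleep only guarantees a lower bound.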
Rebased again over the branch.
Related RFE and branches in the Fossil tcl-core repository:
…igned wide int (also fixes negative initial base for monotonic time and incorrect time distance by very large offsets)
Interim artificial PR.
Currently implemented (squashed and rebased) for the 8.5 branch, because I basically made it for a fork of this branch (that was just easier than doing it directly for 8.6).
Contains:
- "after at" for async trigger resp. sleep using absolute time (in seconds), which is also time-jump safe and, in contrast to "after $offs", uses absolute time as due-time:
    after at [clock scan "16:00:00"] {do_it_in_1600}
    after at [clock scan "+1 minute"]; # wait until the next minute
- "after" (and some internal interfaces) are faster now, because the timer-event is held in the internal representation of the object, so e. g. "after info", "after cancel" etc. don't need to search for an event in the event-lists anymore (and therefore don't block the list with a lock for a long time);
- … event->proc to the callback causes the event to be reattached to the end of the queue, in contrast to "return 0", which leaves the event at its current position and thus can repeat it too early);
- … TclpProlongTimerEvent, which can later be used as new sub-command "after prolong");
- "vwait" and "update" can control which events should be accepted:
    update -timer
    update -noidle
    vwait -async -timer x
- "vwait" can use an optional timeout:
    if {![vwait -timer 10 tmrVar]} continue; # do something else resp. try later
    if {![vwait 1000 evVar]} { error "timeout occurred" }
- "vwait" can work similar to "update", without waiting for events (process only events that have already occurred):
    if {[info exists evVar] || [vwait -nowait 10 evVar]} { puts "event already launched" }
    ## update and check we are done:
    while {![vwait -nowait 0 done]} { do something other }
- "after" and "vwait" are microsecond precise now (so NRT-capable, also accept time as double):
    after 0.01 [list accept $socket]; # do it in 10 microseconds
    ## wait 5 µs for "x" and if not yet ready, 25 µs for "y":
    if {![vwait 0.005 x]} {vwait 0.025 y}
- "clock monotonic" to provide monotonic time at tcl-level also;
Current commit-history:
partially back ported event-performance
after at: added a simple workaround for absolute timers/sleep ("after at real-time"): because all wait functions use monotonic time, block for at most 1 second while absolute timers are pending, so that an absolute timer can still be triggered without a too long wait if the system time jumps to the expected absolute time.
test-cases: time-jumps (TIP #302) test covered now.
Note: on some platforms it is only possible if the user has corresponding privileges to change system date and time.
Ex.: sudo LD_LIBRARY_PATH=. ./tclsh ../tests/timer.test -match timer-20.*
code review and small optimizations
fix check event source threshold (corresponds 100-ns ranges, if the wide-clicks supported);
because of variable width of 1 wide-click: windows - frequency dependent, unix - nanoseconds, darwin/osx - tb.numer / tb.denom nanoseconds.
unix: implements wide-clicks on unix (1 wide-click == 0.001 microseconds (1 nanosecond)), so more precise now (e. g. by time measurement etc.);
unix/configure: regenerated (autoconf)
[unix] fixes conditional-wait: timeout is monotonic based;
Introduced monotonic time as ultimate fix for time-jump issue (fixed for windows and unix now, TIP #302 fully implemented now);
Usage of monotonic time instead of adjustment via timeJump/timeJumpEpoch is more precise and effective.
New sub-command "clock monotonic" to provide monotonic time facility for tcl-level.
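Why a monotonic clock removes the need for time-jump adjustment can be shown with a small POSIX sketch: a due-time computed from CLOCK_MONOTONIC is unaffected by changes to the system date. The function names monotonic_us and remaining_us are hypothetical and are not taken from the branch; only clock_gettime and CLOCK_MONOTONIC are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds; it never jumps when the wall clock
 * is changed (hypothetical helper, for illustration only). */
static int64_t monotonic_us(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Time left until a due-time that was computed as
 * monotonic_us() + offset; never negative. */
static int64_t remaining_us(int64_t due_us)
{
    int64_t now = monotonic_us();

    return due_us > now ? due_us - now : 0;
}
```

A relative timer would store monotonic_us() + offset once and then only call remaining_us(), so no epoch counter or re-adjustment is needed after a time jump.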
don't cancel a scheduled event as long as the event list is not bidirectional (too slow with a large queue) - rewritten to cancel it lazily (when it gets executed).
fixed timer-marker handling: timer should be always executed after queued event (of the same generation), it was marked (be sure it marked to immediate execution in corresponding checkProc only).
tclIO: scheduled event rewritten using Tcl_Event instead of timer event (IO is not a timer, e. g. it is also executed when using "vwait -notimer ...", etc).
Merge branch 'fix-busy-prompt-timers' into event-8.5-perf-branch
Amend to timer-marker: dualize special state of timer-marker (to differentiate between timer generations), so:
INT2PTR(-1) - exec immediate (marker reached);
INT2PTR(-2) - check in the next-cycle (marker reached only if no other events available);
Avoids permanent busy execution of prompt-events (always busy in timer), if they regenerate itself continuously for waiting for other events (like writable/readable), see e. g. socket-2.12.
"after at" set factor to 1000000 (seconds), test cases fixed
revert dual lists (relative/absolute) back to single list (because of better handling, a bit faster, etc.)
don't use tolerance in vwait, because of dual usage, it causes canceling of wait before end-time, on small timeout values (like 0.5, etc.)
call TclWinResetTimerResolution at end of sleep resp. wait for event (no calibration thread anymore)
calibration cycle completely rewritten (no calibration thread needed, soft drifts within 250ms intervals, fewer discrepancy and fewer virtual time gradation, etc).
todo: implement resetting timer-resolution to original value (without calibration thread now).
extended performance test-cases (test-nrt-capability): RTS-near sleeps with very brief sleep-time.
chanio.test: optimize several tests cases running too long (shorten unwanted large sleeps)
bug fix: prevent setting a negative block-time if the initial wait-time is so small that it may expire immediately (for example "vwait 0.0001 test").
extended performance test-cases (test-nrt-capability): covering of brief wait-times and other RTS-near constructs.
[unix] optimized Tcl_WaitForEvent similar to windows changes (makes Tcl for *nix more "RTS" resp. NRT-capable): "timerate {vwait 0 a}" - 1.5µs now vs. 31.9µs before;
added performance test-cases to cover timer-events speed resp. event-driven tcl-handling
(cherry-picked and back-ported from tclSE-9)
fix sporadic errors on some fast cpu/platforms: because bgerror is executed in background and is an idle-event, give it enough time to be processed (resp. wait until the last idle event is done);
make timer test-case more precise and time-independent, ignores short tolerance (deviation by waiting);
several time-independent test-cases optimized (wait shorter now) + some new cases to cover more situations.
after info, after cancel: compare interpreter of the timer-events by direct retrieving via internal representation (ignore foreign events), test cases extended.
resolved some warnings / fixed unix resp. x64 compilation
code review + better usage of the waiting tolerance (fewer CPU-greedy now, avoid busy-wait if the rest of wait-time too small and can be neglected);
TMR_RES_TOLERANCE can be defined to use wait-tolerance on *nix platforms (currently windows only as relation resp. deviation between default timer resolution 15.600 in exact milliseconds, means 15600/15000 + small overhead);
Decreasing of TMR_RES_TOLERANCE (up to 0) makes tcl more "RTS" resp. NRT-capable (very precise wait-intervals, but more CPU-hungry).
[win] fallback to replace C++ keyword "inline" with C keyword "__inline"
Otherwise depending on the VC-version, context, include-order it can cause:
error C2054: expected '(' to follow 'inline'
[win32] use timer resolution handling in Tcl_Sleep also;
Use auto-reset event object (system automatically resets the event state to nonsignaled after wake-up), avoids unwanted reset if wake-up for some other reasons (timeout/aio/message).
optimization of Tcl_LimitExceeded by internal usage (tclInt header)
dynamic increase of timer resolution corresponding wait-time;
non-blocking wait for event - if block-time set outside an event source traversal, use it as timeout, so can return with result 0 (no events);
[enhancement] extend "vwait" with same options as "update", new syntax "vwait ?options? ?timeout? varname".
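some small improvements and fixing:
- Tcl_DoOneEvent can wait for block time that was set with Tcl_SetMaxBlockTime outside an event source traversal, and stop waiting if Tcl_SetMaxBlockTime was called outside an event source (another event occurs and interrupts the waiting loop), etc;
- safer and more precise pre-lookup by options (use TclObjIsIndexOfTable instead of a simple comparison of type with tclIndexType);
test cases extended to cover conditional "vwait" usage;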
interim commit: try to extend "vwait" with same options as "update"
[performance] do one event (update / event servicing) cycle optimized (introduced threshold to prevent sourcing resp. waiting for new events by no-wait).
[enhancement] new event type introduced: TCL_ASYNC_EVENTS; command "update" gets options to process only specified types, resp. to bypass some event types (including -idle/-noidle, which in contrast to "idletasks" do not include window events);
test cases extended.
command "vwait" extended with timeout argument (in ms), 0 could be used to process pending events only (without wait), negative value equivalent execution of "vwait" without timeout (infinite);
test cases fixed and extended;
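The three timeout cases described in this commit message can be summarized in a small C sketch. The enum WaitMode and the function wait_mode are hypothetical names used only for illustration; the branch's actual implementation is not shown in this PR text.

```c
#include <assert.h>

/* How a wait with the given timeout behaves (hypothetical names). */
typedef enum {
    WAIT_NONE,      /* 0: process pending events only, do not wait     */
    WAIT_TIMED,     /* > 0: wait at most the given number of ms        */
    WAIT_INFINITE   /* < 0: same as "vwait" without timeout            */
} WaitMode;

/* Map the "vwait" timeout argument (in ms) to one of the three modes. */
static WaitMode wait_mode(double timeout_ms)
{
    assert(timeout_ms == timeout_ms);   /* reject NaN */
    if (timeout_ms < 0) {
        return WAIT_INFINITE;
    }
    if (timeout_ms == 0) {
        return WAIT_NONE;
    }
    return WAIT_TIMED;
}
```

With this mapping, a timeout of 0 behaves like an "update" restricted to already pending events, and a fractional value such as 0.005 is still a timed wait.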
[performance] large performance increase by event servicing cycles (3x - 5x faster now);
[win] prevent listen using PeekMessage twice, and no wait anymore for too short timeouts (because windows can wait too long), compare 0µs with up-to 100µs overhead within MsgWaitForMultipleObjectsEx;
[bad behavior] process idle events only as long as no other events available (now TclPeekEventQueued will be used to check new events are available in service idle cycle);
[enhancement] new option "noidletasks" for command "update", so "update noidle" means "process all events but not idle";
[performance] much better handling for timer events within Tcl_ServiceEvent using timer marker in the queue and direct call of TclServiceTimerEvents if marker reached (instead of continuous adding handler event, polling it in the queue and removing hereafter);
this provides double performance increase in the service cycle;
[performance] introduced additional queue for prompt timer events (after 0) that should be executed immediately (no time);
normalizes timer, prompt and idle events structures using common TimerEntry structure for all types;
bug fix: wrong release of after-id tcl-object if it switch type (object leak)
[bug/stable fix] don't execute TimerSetupProc directly (may be unwanted, because changes the blocking time, also if TCL_TIMER_EVENTS|TCL_IDLE_EVENTS not set), so let do that within Tcl_DoOneEvent cycle only (we have registered an event source).
[performance] optimization for "after 0" as immediately execution without time (invoke as soon as possible) - generation and invocation of such timers twice faster now.
[performance] leave handler-event in the queue as long as pending timers still available (with expired time or immediate timers) by generation lock, resp. changed/not invalidated timer-queue) - so fewer event/allocations and guarantee to be executed within the next event cycle;
after-id: introduced object of type "afterObjType" as self-referenced weak pointer to timer/idle event, used for fast access to the "after" event (cancel, info etc.);
test cases extended to cover it additionally
rewrite interpreter limit handling using new timer event handling (with delete callback)
timer resp. idle events optimized: better handling using doubly linked lists, prevents allocating memory twice for the "after" events (use memory inside timer/idle event for the "after" structure), etc.
[performance] after-event list optimized (interp-assoc switched to doubly linked list, because requires handling from both ends of the list)
closes ticket [0520d17284500573d7c46aa88e0c6b4ebc9b6a02]
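The benefit of the doubly linked lists mentioned in the last commits can be illustrated with a short C sketch: if the caller already holds a pointer to the entry (as the after-id object does via its internal representation, according to the commit messages above), removal needs no search through the list. The types Entry and List and both functions are hypothetical and do not reproduce the TimerEntry structure of the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry; a real timer entry would carry time, proc, etc. */
typedef struct Entry {
    struct Entry *prev;
    struct Entry *next;
    int id;
} Entry;

typedef struct List {
    Entry *head;
    Entry *tail;
    size_t count;
} List;

/* Append at the tail in O(1). */
static void list_append(List *l, Entry *e)
{
    e->next = NULL;
    e->prev = l->tail;
    if (l->tail) {
        l->tail->next = e;
    } else {
        l->head = e;
    }
    l->tail = e;
    l->count++;
}

/* Unlink a known entry in O(1): no traversal, works from both ends. */
static void list_unlink(List *l, Entry *e)
{
    assert(l->count > 0);
    if (e->prev) {
        e->prev->next = e->next;
    } else {
        l->head = e->next;
    }
    if (e->next) {
        e->next->prev = e->prev;
    } else {
        l->tail = e->prev;
    }
    e->prev = e->next = NULL;
    l->count--;
}
```

With a singly linked list, the same cancel operation has to walk from the head to find the predecessor, which is the kind of cost that grows with the queue size in the bulk test cases.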