Clock scan - replacing regexp with scan #34
I think this bounty is closed. BTW, a complete implementation is impossible with pure scan: for example, to be backwards compatible one would need both greedy and non-greedy matching rules for parts with dynamic length, etc., and you'd almost end up implementing your own RE engine (which would then be slower than NRE using scan with dynamic coating).
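Here is a small Tcl illustration of the dynamic-length point. The input string and patterns are made up for this example and are not taken from the clock code. `scan` conversions consume characters greedily and never backtrack. `regexp`, by contrast, can give characters back so that a later fixed-width group still matches.

```tcl
# scan is greedy and never backtracks: %d swallows all five digits,
# so the following %2d has nothing left to convert and "second" stays unset.
set n [scan "12345" "%d%2d" first second]
puts "scan: $n conversion(s), first=$first, second set? [info exists second]"

# regexp backtracks: the greedy (\d+) gives back two digits so that
# the anchored (\d{2}) group can still match.
regexp {^(\d+)(\d{2})$} 12345 -> head tail
puts "regexp: head=$head, tail=$tail"
```

This should print `scan: 1 conversion(s), first=12345, second set? 0` followed by `regexp: head=123, tail=45`. Getting the second result with `scan` alone would mean trying alternative widths by hand. That is where a scan-only reimplementation starts turning into a matching engine of its own.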
My approach, so far, was not a complete rewrite: I still use the existing functions to parse various names, for example. My change only replaces … Another approach would be to continue using …
This bounty has been completed, and we are just waiting for the Tcl core team to pick it up. I will update the README.
Oh well, as a Tcl user I'm certainly glad to hear about this development, and I can't wait for the improvements to hit a Tcl release (and maybe be backported to 8.6 as well). For the record, here is the gist of my own changes. As I indicated, I replace the …

```diff
y { # Two-digit Gregorian year
- append re \\s*(\\d\\d?)
+ append re " %02d"
dict set fieldSet yearOfCentury [incr fieldCount]
- append postcode "dict set date yearOfCentury \[" "::scan \$field" [incr captureCount] " %d" "\]\n"
+ append postcode "dict set date yearOfCentury " "\$field" [incr captureCount] "\n"
}
```

And then, at the end:

```diff
@@ -340,13 +336,9 @@
}
- # Clean up any unfinished format groups
-
- append re $state \\s*\$
-
# Build the procedure
set procBody {}
append procBody "variable ::tcl::clock::TZData" \n
- append procBody "if \{ !\[ regexp -nocase [list $re] \$string ->"
+ append procBody "if \{ !\[ ::scan \$string {$re} "
for { set i 1 } { $i <= $captureCount } { incr i } {
append procBody " " field $i
```

On my humble i5, running FreeBSD-i386, I get the following timings (microseconds per iteration):
May this be a lesson to anyone using … (Oh, and making the regular-expression strings global didn't help at all. I guess the compilation results are cached with the procedure's bytecode when the regex is a known constant string.)
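The second hunk builds the scanner procedure's source as a string and then hands it to `proc`. The sketch below shows the same idea in miniature. `makeScanner` and `scanIsoDate` are hypothetical names. The sketch supports only `%Y`, `%m` and `%d`, and it is not the actual `::tcl::clock` code. `regexp` is used only once, at generation time, to split the format. The generated procedure itself calls only `::scan`.

```tcl
# Hypothetical, heavily simplified analogue of the clock scanner generator:
# it understands only %Y, %m and %d and emits a proc that calls ::scan.
proc makeScanner {name format} {
    # Map each supported format group to a scan conversion and a field name.
    set groups {%Y {%4d year} %m {%2d month} %d {%2d day}}
    # Split the format into groups and literal text (generation time only).
    set tokens [regexp -all -inline {%[Ymd]|[^%]+} $format]
    if {[join $tokens ""] ne $format} {
        error "unsupported format group in [list $format]"
    }
    set scanFmt ""
    set vars {}
    foreach token $tokens {
        if {[dict exists $groups $token]} {
            lassign [dict get $groups $token] conv var
            append scanFmt $conv
            lappend vars $var
        } else {
            # Literal text is copied as-is; in a scan format, whitespace
            # matches any amount of whitespace in the input.
            append scanFmt $token
        }
    }
    # Build the procedure body as a string, then create the proc from it.
    set body "if \{\[::scan \$string [list $scanFmt] $vars\] != [llength $vars]\} \{\n"
    append body "    error [list "input does not match $format"]\n"
    append body "\}\n"
    append body "return \[dict create"
    foreach var $vars {
        append body " $var \$$var"
    }
    append body "\]\n"
    proc $name {string} $body
}

makeScanner scanIsoDate %Y-%m-%d
puts [scanIsoDate 2019-05-02]   ;# prints: year 2019 month 5 day 2
```

In this sketch the caller creates the procedure once and then reuses it. The clock code similarly builds its scanner procedure the first time a format is used, which is why the first iteration is the slowest.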
The emphasis here is on the word "where". Again, it is not always possible, and I'm sure you'll see that no later than when you start running the test cases for your modified clock (like …). BTW, I use neither the …

```
-# original tcl
+# tclclockmod (or clock-perf-branch)
% timerate {clock scan "Thu May 02 19:18:21 2019" -format "%a %b %d %H:%M:%S %Y" -gmt 1 -locale en}
-167.844 µs/# 5957 # 5957.9 #/sec 999.846 nett-ms
+1.006097 µs/# 947962 # 993939 #/sec 953.742 nett-ms
```

Additionally, I cannot believe the …

```
% timerate {scan 02 "%02d"}
0.579908 µs/# 1598753 # 1724412 #/sec 927.129 nett-ms
% timerate {regexp {^\d\d?$} 02}
0.269321 µs/# 3175604 # 3713044 #/sec 855.256 nett-ms
```
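The two commands timed above are not doing the same job. `regexp {^\d\d?$} 02` only checks the shape of the string, while `scan` also converts it to an integer. The sketch below compares both ways of actually getting the integer, which is what the generated clock code needs. The variable names are illustrative. It uses plain `time`, like the other scripts in this thread.

```tcl
set input "09"

# regexp alone only validates; converting needs a second step. [scan %d]
# reads the digits as decimal, so the leading zero is harmless.
if {[regexp {^(\d\d?)$} $input -> field]} {
    set viaRegexp [scan $field %d]
}

# scan validates (loosely) and converts in one step.
set n [scan $input %2d viaScan]

puts "regexp+scan: $viaRegexp, scan: $viaScan ($n conversion)"

# Time both complete variants the same way.
puts [time {regexp {^(\d\d?)$} $input -> field; scan $field %d} 100000]
puts [time {scan $input %2d viaScan} 100000]
```

Both variants yield `9`. In Tcl 8.x a bare `09` would be rejected as an invalid octal number in `expr`. That is presumably why the old code piped the captured field through `::scan ... %d`.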
The i5 I used before was at 2 GHz, used inside a FreeBSD/i386 VM running inside a Windows guest. Not sure what your …

```tcl
puts [time {scan 11 %2d ns} 10000]
puts [time {regexp {^(\d{2})$} 11 nr} 10000]
puts "$ns vs. $nr"
```

Running this on a FreeBSD/amd64 system using the E5-1620 CPUs at 3.6 GHz, I get (with Tcl 8.6):

That is, … Now, the default compiler on FreeBSD is clang, so I built Tcl 8.7 with gcc8 and tried the same script again. Same thing:
You may be a bit wrong:
Anyway, even by your own timings, 0.7 vs. 1.1 is not really a lot faster in my opinion (a factor of about 1.5), and it does not explain the timings you cited before: 128 vs. 822 (a factor of about 6.5).
Interesting -- FreeBSD ports do not install it (neither for tcl86 nor for tcl87). I'll investigate...
OK, here is the updated script, taking your concerns into account: 100 times more iterations, and one separate iteration each to get the expression compiled:

```tcl
set scan {scan 11 %2d ns}
set regexp {regexp {^(\d{2})$} 11 nr}
eval $scan   ;# The first, throw-
eval $regexp ;# away evaluations.
puts [time $scan 1000000]
puts [time $regexp 1000000]
puts "$ns vs. $nr"
```

The per-iteration numbers are as before...
That's consistent with what I've got for the short strings above (like
These differences are what I've got for the longer expressions.
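To make the short-versus-long comparison explicit, here is a sketch that times both kinds of expression and prints the regexp/scan ratio directly. A value above 1 means `scan` was faster. The helper `perIteration` and the date-time test strings are illustrative assumptions. Like the script above, it does one throw-away evaluation per command before timing.

```tcl
# Hypothetical helper: per-iteration microseconds of a script, measured at
# global level after one throw-away run (so compilation happens beforehand).
proc perIteration {script count} {
    uplevel #0 $script
    return [lindex [uplevel #0 [list time $script $count]] 0]
}

# Pairs of equivalent scan/regexp commands: a short and a longer input.
set cases {
    short {
        {scan 11 %2d ns}
        {regexp {^(\d{2})$} 11 -> nr}
    }
    long {
        {scan "2019-05-02 19:18:21" "%4d-%2d-%2d %2d:%2d:%2d" y m d H M S}
        {regexp {^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$} "2019-05-02 19:18:21" -> y m d H M S}
    }
}

foreach {label pair} $cases {
    lassign $pair scanScript regexpScript
    set s [perIteration $scanScript 100000]
    set r [perIteration $regexpScript 100000]
    # double() avoids integer division if [time] reports whole microseconds.
    puts [format "%-5s scan %.3f us, regexp %.3f us, regexp/scan %.2f" \
              $label $s $r [expr {double($r) / $s}]]
}
```

The numbers are machine-dependent, so no expected output is given here.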
Note that it is merged, but not yet released. :) You must compile the newest sources from the fossil repository or its clones (e.g., on GitHub).
In competition with #4, I took a shot today at modifying `::tcl::clock::scan` to replace its use of `regexp` with `scan`. Obviously, the gains depend on the format-string complexity. The output below from my non-regression test shows the timings for both the current code (in Tcl 8.7a1) and mine for several formats.
In both cases the first iteration was ignored, because it takes the longest: that is when the format is parsed. The results show average timings over 1000 subsequent iterations, once the scanner procedure has been created.
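The actual non-regression test was not posted. Here is a minimal sketch of such a harness that uses only the public `clock scan` command. The format/input pairs are made up for illustration, although the third one matches the `timerate` example elsewhere in this thread.

```tcl
# Hypothetical harness: time [clock scan] for a few formats, ignoring the
# first call (which parses the format and creates the scanner) and
# averaging over 1000 subsequent iterations.
set samples {
    "%Y-%m-%d"             "2019-05-02"
    "%d.%m.%Y %H:%M"       "02.05.2019 19:18"
    "%a %b %d %H:%M:%S %Y" "Thu May 02 19:18:21 2019"
}

foreach {format input} $samples {
    # First (throw-away) call, also used to show the parsed result.
    set seconds [clock scan $input -format $format -gmt 1]
    set usec [lindex [time {clock scan $input -format $format -gmt 1} 1000] 0]
    puts [format "%-22s -> %d  (%.2f us/iteration)" $format $seconds $usec]
}
```

Running the same harness against the current and the modified `clock` implementation gives directly comparable per-format averages.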
With Tcl compiled for debugging (no optimization):
With Tcl compiled with -O2:
The default format, the longest of those tested, shows the biggest improvement (7-8 times; I hope the bounty will be based on this :-) ).
Should I continue working in this direction (making sure existing test cases all pass, etc.), or is @aidanhs already so far ahead of me that I may as well not bother?