Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clock scan - replacing regexp with scan #34

Closed
UnitedMarsupials-zz opened this issue May 3, 2019 · 9 comments
Closed

Clock scan - replacing regexp with scan #34

UnitedMarsupials-zz opened this issue May 3, 2019 · 9 comments

Comments

@UnitedMarsupials-zz
Copy link

In competition with #4, I took a shot today at modifying the ::tcl::clock::scan to replace usage of regexp with scan. Obviously, the gains are different depending on the format-string complexity.

The below output from my non-regression test shows the timings for both the current code (in Tcl-8.7a1) and mine for several formats.

In both cases the first iteration was ignored -- those take the longest to parse the format initially. The results show average timings for 1000 subsequent iterations, after the scanner-procedure is already created.

With TCL compiled for debugging (no optimization)

Thu May 02 19:18:21 2019:       1024.239 microseconds per iteration
Thu May 02 19:18:21 2019:       230.092 microseconds per iteration
Monday 2019:    412.82 microseconds per iteration
Monday 2019:    195.325 microseconds per iteration
Thu May 02 19:18:21 EDT 2019:   1780.403 microseconds per iteration
Thu May 02 19:18:21 EDT 2019:   246.275 microseconds per iteration
Thu 02 19:18:21 2019:   753.468 microseconds per iteration
Thu 02 19:18:21 2019:   252.612 microseconds per iteration
02/11/18:       273.898 microseconds per iteration
02/11/18:       188.197 microseconds per iteration

TCL compiled with -O2

Thu May 02 19:18:21 2019:       835.164 microseconds per iteration
Thu May 02 19:18:21 2019:       159.321 microseconds per iteration
Monday 2019:    266.272 microseconds per iteration
Monday 2019:    139.571 microseconds per iteration
Thu May 02 19:18:21 EDT 2019:   1396.415 microseconds per iteration
Thu May 02 19:18:21 EDT 2019:   168.879 microseconds per iteration
Thu 02 19:18:21 2019:   532.426 microseconds per iteration
Thu 02 19:18:21 2019:   273.006 microseconds per iteration
02/11/18:       188.973 microseconds per iteration
02/11/18:       134.863 microseconds per iteration

The default -- the longest of the tested -- format shows the biggest improvement (7-8 times -- I hope, the bounty would be based on this :-) ).

Should I continue working in this direction -- making sure, existing test-cases all pass, etc., or is @aidanhs so far ahead of me already, I may as well not bother?

@sebres
Copy link

sebres commented May 3, 2019

I think this bounty is closed.
Flightaware uses my module tclclockmod as long as it is still not released in tcl.core (branch "sebres-8-6-clock-speedup-cr2"), and looks like already accepted from TCT in large part.

BTW, the complete implementation is impossible with pure scan, for example because to be backwards compatible one would need greedy as well as non greedy matching rules for parts with dynamic length, etc pp and you'd almost end up with own implementing of RE-engine (which would be then slower as NRE using scan with dynamic coating).

@UnitedMarsupials-zz
Copy link
Author

My approach -- so far -- was not a complete rewrite, I still use the existing functions to parse various names, for example. My change only replaces regexp with scan, which is quite a bit faster. I expect it to be fully compatible with the existing implementation -- warts and all -- whatever that's worth.

Another approach would be to continue using regexp, but move the regular expression-strings outside of the scanner-procedures. I suspect, the expressions are currently recompiled anew every time the proc is called. Making them global (inside the namespace) would retain the first compilations' result for each one, I think.

@resuna
Copy link
Member

resuna commented May 3, 2019

This bounty has been completed and we are just waiting for the TCL core team to pick it up. I will update the readme.

@resuna resuna closed this as completed May 3, 2019
@UnitedMarsupials-zz
Copy link
Author

UnitedMarsupials-zz commented May 3, 2019

Oh well, as a TCL user I'm certainly glad to hear about this development and I can't wait for the improvements to hit a TCL release (and, maybe, be backported into 8.6 as well).

For the record, here is the gist of my own changes. As I indicated, I replace the regexp-call followed by multiple calls to scan with a single scan. For example, here are the changes for the handling of %y - changes for other for other formatting characters are very similar:

                    y {                 # Two-digit Gregorian year
-                       append re \\s*(\\d\\d?)
+                       append re " %02d"
                        dict set fieldSet yearOfCentury [incr fieldCount]
-                       append postcode "dict set date yearOfCentury \["  "::scan \$field" [incr captureCount] " %d"  "\]\n"
+                       append postcode "dict set date yearOfCentury "  "\$field" [incr captureCount] "\n"
                    }

And then, at the end:

@@ -340,13 +336,9 @@
     }

-    # Clean up any unfinished format groups
-
-    append re $state \\s*\$
-
     # Build the procedure

     set procBody {}
     append procBody "variable ::tcl::clock::TZData" \n
-    append procBody "if \{ !\[ regexp -nocase [list $re] \$string ->"
+    append procBody "if \{ !\[ ::scan \$string {$re} "
     for { set i 1 } { $i <= $captureCount } { incr i } {
        append procBody " " field $i

On my humble i5, running FreeBSD-i386, I get the following timings (microseconds per iteration):

Format String Current regexp-based implementation New scan-based implementation
%a %b %d %H:%M:%S %Y Thu May 02 19:18:21 2019 822.6088 128.5014
%A %Y Monday 2019 272.235 129.8344
%A %Y Monday 2011 275.4886 125.038
%a %b %d %H:%M:%S %Z %Y Thu May 02 19:18:21 EDT 2019 1344.6232 136.2222
%a %d %H:%M:%S %Y Thu 02 19:18:21 2019 536.1878 146.4762
%Z %Z EDT EST 227.217 135.1222
%w %y 3 11 172.109 123.082
%A %Y Wombat 2019 Both fail
%m/%d/%y 02/11/18 191.5808 117.0322

May this be a lesson to anyone using regexp, where scan would do... :)

(Oh, and making the regular-expression strings global didn't help at all. I guess, the compilation-results are cached with the procedure's bytecode, when the regex is a known constant string.)

@sebres
Copy link

sebres commented May 4, 2019

May this be a lesson to anyone using regexp, where scan would do... :)

The emphasis is here on the word "where". Again, it is not really always possible and I'm sure you'll see that not later than you would start a test-cases for you modified clock (like tclsh tests/clock.test).

BTW, I use neither the regexp nor the scan in my engine... (C-lang rules;).
Anyway on my humble i5, I don't see the times (822µs) you have measured by regexp (you may missed warming up or something else). Anyway here are my times:

-# original tcl
+# tclclockmod (or clock-perf-branch)
 % timerate {clock scan "Thu May 02 19:18:21 2019" -format "%a %b %d %H:%M:%S %Y" -gmt 1 -locale en}
-167.844 µs/# 5957 # 5957.9 #/sec 999.846 nett-ms
+1.006097 µs/# 947962 # 993939 #/sec 953.742 nett-ms

Additionally I cannot believe the regexp should be so slower as scan (vise versa I saw regexp many time faster sometimes). Like here in this example:

% timerate {scan 02 "%02d"}
0.579908 µs/# 1598753 # 1724412 #/sec 927.129 nett-ms
% timerate {regexp {^\d\d?$} 02}
0.269321 µs/# 3175604 # 3713044 #/sec 855.256 nett-ms

@UnitedMarsupials-zz
Copy link
Author

C-lang rules
It certainly does -- and the best illustration is you getting the benchmark down 160 times!

The i5 I used before was at 2GHz -- used inside a FreeBSD/i386 VM running inside a Windows guest.

Not sure, what your timerate does... Trying the below simple script:

puts [time {scan 11 %2d ns} 10000]
puts [time {regexp {^(\d{2})$} 11 nr} 10000]

puts "$ns vs. $nr"

on a FreeBSD/amd64 using the E5-1620 CPUs at 3.6GHz, I get (with tcl-8.6):

0.763 microseconds per iteration
1.1125 microseconds per iteration
11 vs. 11

That is, scan is a lot faster even for simple anchored expressions...

Now, the default compiler on FreeBSD is clang. So I built tcl8.7 with gcc8 and tried the same script again. Same thing:

0.7488 microseconds per iteration
1.1592 microseconds per iteration
11 vs. 11

@sebres
Copy link

sebres commented May 5, 2019

Not sure, what your timerate does...

timerate is a new measurement command for all newest Tcl-versions (since TIP 527 got merged in Tcl >= 8.5)

That is, scan is a lot faster even for simple anchored expressions...

You may be a bit wrong:

  • your number of iterations (10000) is too small - the time and overhead of the compilation of regex (occurred 1st time only) is too large compared to the whole execution time (in my example above using timerate it has reached 3175604/1598753 iterations by regexp/scan accordingly in 1 second); this is that what I meant as warming-up.
  • command time is not really suitable for a performance measurement (because the script body does not invoked as compiled inside single iterations, so the overhead for jit-related thing is simply too large, especially on very fast commands); this means also it is executed not as byte-compiled instructions (and then scan could be faster, due to simpler and faster "non compiled" execution).

Anyway, also by your times, 0.7 to 1.1 is not really a lot faster in my opinion (factor 1.5) and does not explain the times you meant before - 128 vs. 822 (factor 6.5).

@UnitedMarsupials-zz
Copy link
Author

timerate is a new measurement command for all newest Tcl-versions

Interesting -- FreeBSD ports do not install it (neither for tcl86 nor for tcl87). I'll investigate...

You may be a bit wrong:

Ok, here is the updated script, taking your concerns into account: 100-times more iterations, one separate iteration each to have the expression compiled:

set scan	{scan 11 %2d ns}
set regexp	{regexp {^(\d{2})$} 11 nr}

eval $scan	;# The first, throw-
eval $regexp	;# away evaluations.

puts [time $scan 1000000]
puts [time $regexp 1000000]

puts "$ns vs. $nr"

The per-iteration numbers are as before...

0.7 to 1.1 is not really a lot faster in my opinion (factor 1.5)

That's consistent with what I've got for the short strings above (like %Z %Z).

128 vs. 822 (factor 6.5)

These differences are what I've got for the longer expressions -- %a %b %d %H:%M:%S %Y.

@sebres
Copy link

sebres commented May 6, 2019

FreeBSD ports do not install it (neither for tcl86 nor for tcl87). I'll investigate...

Although it is merged but not yet released. :) You must compile newest sources from fossil-repository or clones (e. g. on github).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants