Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected performance boost if preallocate scratchpad ? #262

Open
mingodad opened this issue Jan 7, 2023 · 3 comments
Open

Unexpected performance boost if preallocate scratchpad ? #262

mingodad opened this issue Jan 7, 2023 · 3 comments

Comments

@mingodad
Copy link
Contributor

mingodad commented Jan 7, 2023

When doing tests with string concatenation I found that somehow if we preallocate a large scratchpad (I'm guessing here) the time needed to concatenate strings goes down to 1/4.
Without print(blob(1000000).write("hi"));:

/usr/bin/time sq joinStrings.nut 
strPlusStr took	840000	6.300323
strToArrayPush took	840000	0.0073100000000004
2.63user 3.67system 0:06.31elapsed 99%CPU (0avgtext+0avgdata 7036maxresident)k
0inputs+0outputs (0major+2507868minor)pagefaults 0swaps

With print(blob(1000000).write("hi"));:

/usr/bin/time sq joinStrings.nut 
2
strPlusStr took	840000	1.599949
strToArrayPush took	840000	0.007053
1.40user 0.20system 0:01.61elapsed 99%CPU (0avgtext+0avgdata 7392maxresident)k
0inputs+0outputs (0major+140519minor)pagefaults 0swaps
local str1 = "abc";
local str2 = "defghijklmnñopqrstuvwxyz";

// it seems that enlarging the scratchpad beforehand improve performance of string concatenation
print(blob(1000000).write("hi"));

function strPlusStr(times) {
  local str = "";
  for(local i =0; i < times; ++i) {
    str += str1 +  str2;
  }
  return str;
}

function strPlusStrBuf(times) {
  local str = blob();
  for(local i =0; i < times; ++i) {
    str.write(str1);
    str.write(str2);
  }
  return str.tostring();
}

function strToArrayPush(times) {
  local str = [];
  for(local i=0; i < times; ++i) {
    str.append(str1);
    str.append(str2);
  }
  return str.join("");
}

function test(times) {
  local time, str;
/*
  time = os.clock();
  str = strPlusStrBuf(times); //with this call before strPlusStr then strPlusStr takes 1/4 of time
  print("strPlusStrBuf took", str.len(), os.clock()-time)
*/
  time = os.clock();
  str = strPlusStr(times);
  print("strPlusStr took", str.len(), os.clock()-time)

  time = os.clock();
  str = strToArrayPush(times);
  print("strToArrayPush took", str.len(), os.clock()-time)

}

test(30000);
@zeromus
Copy link
Contributor

zeromus commented Jan 7, 2023

Two problems here

  1. SQSharedState::GetScratchPad needs to expand geometrically instead of only the the barely-needed size. If that happened, the "performance boost" would be so small that it would be hard to measure
  2. SQVM::StringCat could be optimized so that the scratchpad isn't needed. We would do this by hashing the source strings and combining it, then adding a new SQStringTable::Add() which can receive the hash already calculated. This would DRAMATICALLY speed up the usecase of concatenating characters one at a time, which is not an uncommon job

@albertodemichelis
Copy link
Owner

  1. is very environment specific, ask the ppl that run on a arm cortex M3 with 256Kb of ram and they will tell you something else.
  2. is a very interesting idea that could eliminate the need of the scratchpad in string concatenation

@albertodemichelis
Copy link
Owner

I did a very quick implementation of solution number 2 and the test suite passes, I'd have to do some benchmark but I'd assume is faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants