Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-added the string cache, disabled by default (bonus: also thread-safe) #172

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

vintagedave
Copy link
Contributor

I added a string cache to DAST in August last year, which kept down memory fragmentation when it was used repeatedly. It was never pulled into the main branch and that branch is now very hard to merge. I've re-implemented it.

It's controlled by a define, which is off by default, so behaviour is unchanged unless you explicitly turn it on. If you turn it on, attributes and valued syntax nodes share strings. This greatly reduces memory fragmentation and also memory usage. The actual changed code is quite small so it should be easy to maintain or keep in the product.

I investigated using injection or some similar "clean code" behaviour to control this, instead of a define, but it's hard when also keeping it performant. Since not much code is affected, ifdef-ing the required places is cheap and hidden from the external user.

The string cache also has basic thread safety added, where the Add and Get operations hold a critical section while modifying the values. This too is controlled by a define, but it's on by default. It hasn't been extensively reviewed, but I couldn't see any major logic errors.

vintagedave and others added 10 commits August 4, 2015 14:20
…lso very importantly memory fragmentation).

Added StringCache.pas.
Changed attributes to use the string cache.
Note required a define USESTRINGCACHE which is off by default. The cache is not threadsafe.
…ionary of attributes, but attributes no longer work as a dictionary.)
…en threadsafe, any individual instance has get/add method contents wrapped in a lock, to lock the internal structures. The instance is never cleared when the count of objects using it drops to 0, since that's a bit more complex to get right (should only clear while the refcount is 0, but have to lock that to ensure it's not changed while clearing is happening, which severely slows inc/decrementing the count. I couldn't think of a good compare-lock-exchange algorithm. So just don't bother; it's never cleared while alive, when threadsafe.)
…both operations; now holds it for both. Safe to enter a CS twice.)
This reverts commit 95233e9.
@RomanYankovsky
Copy link
Owner

Thanks, @vintagedave
Let's wait a couple of days, if everyone is happy, I'll merge.

Btw, I'm asking because I've never tried. You use class as a dictionary key. Does it work? I afraid it could use pointer, not a content.

@vintagedave
Copy link
Contributor Author

Great :)

I also have another branch (not done yet) for using a memory pool for all syntax nodes, using this technique: https://parnassus.co/custom-object-memory-allocation-in-delphi-bypassing-fastmm-for-fun-and-profit/ Is that of interest too?

Re dictionary - is that TStringCache.FStringToId? It would use the pointer, but I construct the dictionary with a TStringRecValueEqualityComparer instance which is use for the comparisons. It compares by the string value in each TStringRec only.

@RomanYankovsky
Copy link
Owner

Ah, OK. I see :)

@RomanYankovsky
Copy link
Owner

Do you think memory fragmentation is an issue?

@vintagedave
Copy link
Contributor Author

For me it is, but that only occurs when using DAST a lot, and under memory pressure, eg in the pre-XE8 IDE. It might be for you for FixInsight over a large project, for example. I think FI still runs DAST a lot less than Navigator or Bookmarks do, which basically re-parse after a short delay after every single code change.

@sglienke
Copy link
Contributor

Can't this be solved way easier?

type
  TStringInternPool = class
  private
    fStrings: TDictionary<string,string>;
  public
    constructor Create;
    destructor Destroy; override;
    function Intern(const s: string): string;
  end;

{ TStringInternPool }

constructor TStringInternPool.Create;
begin
  fStrings := TDictionary<string,string>.Create;
end;

destructor TStringInternPool.Destroy;
begin
  fStrings.Free;
  inherited;
end;

function TStringInternPool.Intern(const s: string): string;
begin
  MonitorEnter(fStrings);
  try
    if not fStrings.TryGetValue(s, Result) then
    begin
      Result := s;
      fStrings.Add(s, Result);
    end;
  finally
    MonitorExit(fStrings);
  end;
end;

@vintagedave
Copy link
Contributor Author

Yes, though the current code allows usage tracking - you can get stats out
about how effective it is.

On 28 April 2016 at 18:44, Stefan Glienke [email protected] wrote:

Can't this be solved way easier?

type
TStringInternPool = class
private
fStrings: TDictionary<string,string>;
public
constructor Create;
destructor Destroy; override;
function Intern(const s: string): string;
end;

{ TStringInternPool }

constructor TStringInternPool.Create;
begin
fStrings := TDictionary<string,string>.Create;
end;

destructor TStringInternPool.Destroy;
begin
fStrings.Free;
inherited;
end;

function TStringInternPool.Intern(const s: string): string;
begin
MonitorEnter(fStrings);
try
if not fStrings.TryGetValue(s, Result) then
begin
Result := s;
fStrings.Add(s, Result);
end;
finally
MonitorExit(fStrings);
end;
end;


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#172 (comment)

@RomanYankovsky
Copy link
Owner

Stefan's approach seems to be less intrusive for me. I think it can be improved by adding a method that will output statistics. We can read string's ref counter for usage tracking. What do you think @vintagedave ?

@vintagedave
Copy link
Contributor Author

Yes, I think it's much better. @sglienke told me he submitted a patch where strings are interned in the lexer itself, which means it's much lower-level too. I'd go with that option as much better than mine.

@RomanYankovsky
Copy link
Owner

I don't see his pull request :)

@sglienke
Copy link
Contributor

Look at the feature/stringinterning branch of my fork to see my approach - I did not submit a PR because this is just to show the concept.

@RomanYankovsky
Copy link
Owner

So everyone agree to take Stefan's implementation of string cache?

@vintagedave
Copy link
Contributor Author

For me, yes.

@sglienke
Copy link
Contributor

sglienke commented Aug 5, 2022

My string pool implementation was taken into master long ago - this can be closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants