Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# improvements #6

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

C# improvements #6

wants to merge 6 commits into from

Conversation

tedd
Copy link

@tedd tedd commented Oct 23, 2022

Huge PR, so here is some info:

Had a look at the C# version, and there were some obvious issues. So I forked, set up a benchmark project and added progressive improvements to it - since its fun to benchmark things. See TrannetVersions

Benchmark results: https://github.com/tedd/transit-lang-cmp/tree/main/Trannet.Benchmark
Also added changes + simplified webapi implementation here: https://github.com/tedd/transit-lang-cmp/tree/main/Trannet

@jeremylcarter
Copy link

Great job!

@MarkPflug
Copy link

Good choice of CSV library. I approve.

Copy link

@MarkPflug MarkPflug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few suggestions. The list/dictionary capacity might have a measurable impact on the benchmarks.

var filename = Path.Join(RootDir, "MBTA_GTFS", "/trips.txt");
var opts = new CsvDataReaderOptions { HasHeaders = true, Delimiter = ',', Culture = new System.Globalization.CultureInfo("en-US") };
var csv = CsvDataReader.Create(filename, opts);
csv.Read();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Read should be removed. When HasHeaders = true, the header row will already be consumed when Create is called. The Culture doesn't need to be set, it defaults to Invariant, which is usually what you want.


public static (List<StopTime>, Dictionary<string, List<int>>) LoadStopTimes()
{
var stopTimes = new List<StopTime>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying an initial capacity on the List can avoid growing/reallocating the internal buffer.

public static (List<StopTime>, Dictionary<string, List<int>>) LoadStopTimes()
{
var stopTimes = new List<StopTime>();
var stopTimesIxByTrip = new Dictionary<String, List<int>>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same is true for this dictionary.

var stopTimesIxByTrip = new Dictionary<String, List<int>>();
var filename = Path.Join(RootDir, "MBTA_GTFS", "/stop_times.txt");
var opts = new CsvDataReaderOptions { HasHeaders = true, Delimiter = ',', Culture = new System.Globalization.CultureInfo("en-US") };
var csv = CsvDataReader.Create(filename, opts);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using var csv =... to ensure things get closed if exceptions happen.

stopTimesIxByTrip.Add(tripID, new List<int> { stopTimes.Count - 1 });
}
}
csv.Close();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close is not needed when using is added above.

@MarkPflug
Copy link

MarkPflug commented Oct 24, 2022

Loading the stop_times.txt file also benefits from enabling string pooling. The data in that file is quite repetitive, and the string pooling allows only loading each unique string once. A quick test on my machine shows ~40% reduction in time to load the file. The StringPool comes from the Sylvan.Common nuget package.

var pool = new Sylvan.StringPool();

var opts = new CsvDataReaderOptions { 
    StringFactory = pool.GetString 
};

UPDATE: after some more experimentation I found this to be wrong. In ASP.NET, under the server GC, the difference, if any, is negligible. My original measurements were in a console app (non-server GC), where the performance difference was significant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants