Full Scale Performance: Single Process, Sharrow On, Explicit Chunking #19

aletzdy · 2024-06-06T14:03:46Z

This is the issue to report on memory usage and runtime performance...

data_dir: "data-full" full scale skims (24333 MAZs)
households_sample_size: 0 (full scale 100% sample of households)
sharrow: "require"
multiprocess: false single process
chunk_training_mode: explicit

aletzdy · 2024-06-06T15:02:49Z

Run with full sample, sharrow on, and single process (1 TB memory, Intel Xeon Gold 6342 @ 2.8GHz machine).

Runtime is 2316.9 mins (compared to 2177.9 mins for a similar run with NO chunking).

Chunking is set to explicit in settings_mp.yaml, and explicit chunking is enabled for the models whose memory usage was higher than 300 GB in a previous sharrow run with the same specs but with NO chunking. These models are:

compute_accessibility
mandatory_tour_scheduling
non_mandatory_tour_scheduling
trip_destination

For these models, the argument explicit_chunk was set to 0.5 in each respective model's yaml file, indicating the need for 2 (=1/0.5) chunks. Further chunks may be added in other tests and as required.
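To make the 1/0.5 = 2 arithmetic concrete, here is a minimal sketch of how a fractional explicit_chunk value maps to a chunk count and to row ranges. The helper names `n_chunks` and `split_rows` are hypothetical and this is not ActivitySim's actual chunking code; it only illustrates the fraction-to-chunks relationship described above.

```python
import math


def n_chunks(explicit_chunk: float) -> int:
    """Number of chunks implied by a fractional explicit_chunk value."""
    if not 0 < explicit_chunk <= 1:
        raise ValueError("this sketch only handles fractions in (0, 1]")
    return math.ceil(1 / explicit_chunk)


def split_rows(n_rows: int, explicit_chunk: float) -> list[range]:
    """Split n_rows row positions into consecutive chunks.

    Each chunk holds at most ceil(n_rows * explicit_chunk) rows,
    so the last chunk may be smaller than the others.
    """
    chunk_size = max(1, math.ceil(n_rows * explicit_chunk))
    return [
        range(start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]


# explicit_chunk: 0.5 -> 2 chunks; explicit_chunk: 0.25 -> 4 chunks
print(n_chunks(0.5), n_chunks(0.25))
print(split_rows(10, 0.5))
```

With explicit_chunk at 0.5 the rows are processed in two passes, so in principle only about half of the chooser rows need to be expanded in memory at once.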
timing_log.csv
activitysim.log
memory_profile.csv

Here's what the memory profile looks like:

Compare the above memory profile to the one with the same settings, except no chunking:

Here's a comparison table of max memory usage of each step, with a negative difference meaning the chunking run uses less memory:

event	full_rss_gb_chunking	full_rss_gb_no_chunking	difference_gb	% diff accounting for skims
preload_injectables	1.927602176	3.40840448	-1.4808	NA
initialize_proto_population	165.5326392	165.9972444	-0.46461	NA
compute_disaggregate_accessibility	167.7904036	168.6871695	-0.89677	-5.1%
initialize_landuse	146.0646502	148.4756828	-2.41103	#DIV/0!
initialize_households	163.6487004	167.2370094	-3.58831	-289.4%
compute_accessibility	245.1278234	344.963457	-99.8356	-55.8%
av_ownership	153.2263014	153.190015	0.036286	-0.3%
auto_ownership_simulate	165.8563215	171.3207624	-5.46444	-102.6%
work_from_home	153.5603507	161.8880635	-8.32771	202.7%
external_worker_identification	155.0026179	162.3008256	-7.29821	197.4%
external_workplace_location	155.2053207	162.4345354	-7.22921	202.9%
school_location	187.0748877	187.061289	0.013599	0.1%
workplace_location	218.1287363	218.731561	-0.60282	-1.1%
transit_pass_subsidy	164.6922015	166.9283922	-2.23619	-240.2%
transit_pass_ownership	166.2044733	167.5616952	-1.35722	-86.8%
vehicle_type_choice	164.520702	177.2367913	-12.7161	-113.1%
adjust_auto_operating_cost	145.6777503	157.0783519	-11.4006	127.8%
transponder_ownership	149.0361467	159.9991685	-10.963	182.8%
free_parking	156.8398828	156.3868774	0.453005	-4.7%
telecommute_frequency	156.5709025	155.5312804	1.039622	-9.9%
cdap_simulate	156.639617	156.9351352	-0.29552	3.3%
mandatory_tour_frequency	156.8663839	156.050772	0.815612	-8.2%
mandatory_tour_scheduling	335.6519547	417.5135498	-81.8616	-32.5%
school_escorting	159.3236521	156.8768901	2.446762	-26.8%
joint_tour_frequency_composition	167.3451192	168.112853	-0.76773	-36.3%
external_joint_tour_identification	164.685353	163.7067776	0.978575	-42.7%
joint_tour_participation	166.5587773	162.3630479	4.195729	-115.5%
joint_tour_destination	168.4934328	167.9516017	0.541831	27.7%
external_joint_tour_destination	161.2138127	157.831168	3.382645	-41.4%
joint_tour_scheduling	179.6869489	175.8333297	3.853619	39.2%
non_mandatory_tour_frequency	190.2081884	181.86061	8.347578	52.6%
external_non_mandatory_identification	172.139692	180.9013842	-8.76169	-58.8%
non_mandatory_tour_destination	240.3257958	240.2470052	0.078791	0.1%
external_non_mandatory_destination	190.4231711	191.110828	-0.68766	-2.7%
non_mandatory_tour_scheduling	227.1283814	261.411414	-34.283	-35.9%
vehicle_allocation	182.6668093	179.8298706	2.836939	20.5%
tour_mode_choice_simulate	183.2361574	180.0804557	3.155702	22.4%
atwork_subtour_frequency	163.4289213	158.4350372	4.993884	-66.0%
atwork_subtour_destination	203.2547389	198.2499512	5.004788	15.5%
atwork_subtour_scheduling	184.3515597	180.9987052	3.352855	22.4%
atwork_subtour_mode_choice	179.6341637	170.6008289	9.033335	196.2%
stop_frequency	189.1866747	191.2116716	-2.025	-8.0%
trip_purpose	165.47106	163.3115464	2.159514	-80.4%
trip_destination	326.5262551	324.2506404	2.275615	1.4%
trip_purpose_and_destination	236.8783278	178.4622653	58.41606	468.6%
trip_scheduling	239.4423788	171.3450885	68.09729	1273.4%
trip_mode_choice	247.602602	190.5106452	57.09196	232.9%
parking_location	268.5247119	250.8484936	17.67622	20.8%
write_data_dictionary	265.4285496	204.4660818	60.96247	158.5%
track_skim_usage	258.8279767	198.1093274	60.71865	189.1%
write_trip_matrices	291.1253258	262.6623529	28.46297	29.4%
write_tables	225.5399281	222.7437937	2.796134	4.9%
finalizing	197.1774505	193.4436598	3.733791	13.6%
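The thread does not state how the "% diff accounting for skims" column was computed. One reading that reproduces the rows checked below is to divide the difference by the no-chunking RSS above a fixed baseline. The sketch assumes that baseline is the no-chunking RSS at initialize_proto_population (about 166 GB); both the baseline choice and the function name are assumptions, not something confirmed by the author.

```python
# Assumption: skims and other resident data account for roughly the
# no-chunking RSS at initialize_proto_population (value from the table).
ASSUMED_BASELINE_GB = 165.9972444


def pct_diff_above_baseline(chunking_gb, no_chunking_gb,
                            baseline_gb=ASSUMED_BASELINE_GB):
    """Percent change in RSS, measured relative to memory above the baseline."""
    above_baseline = no_chunking_gb - baseline_gb
    if above_baseline == 0:
        return None  # undefined, like a spreadsheet #DIV/0!
    return 100 * (chunking_gb - no_chunking_gb) / above_baseline


# (chunking, no chunking) RSS in GB, copied from the table above
rows = {
    "compute_accessibility": (245.1278234, 344.963457),
    "mandatory_tour_scheduling": (335.6519547, 417.5135498),
    "non_mandatory_tour_scheduling": (227.1283814, 261.411414),
    "work_from_home": (153.5603507, 161.8880635),
}

for event, (chunked, unchunked) in rows.items():
    print(f"{event}: {pct_diff_above_baseline(chunked, unchunked):.1f}%")
```

Under this assumption, a step whose no-chunking RSS sits below the baseline gets a negative denominator, which would explain rows like work_from_home where memory dropped but the percentage is positive. The percentages are only meaningful for steps that use well over the baseline.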

Out of the models with explicit chunking turned on, I am not seeing any real difference between chunking and no chunking in trip_destination, but:

compute_accessibility shows a 56% reduction in RAM usage (-55.8% in the table),
mandatory_tour_scheduling shows a 33% reduction,
non_mandatory_tour_scheduling shows a 36% reduction

Small differences in other model steps may be expected, but some other steps show bigger differences than I expected.

I should note that the no-chunking run was from two weeks ago and did not set the numba and OpenBLAS environment variables to 0 (but the chunking run did), which may explain some of these differences.

dhensle · 2024-06-18T03:23:58Z

Performed again on an RSG machine and got similar results (compare the below memory profile to #6 (comment) which was run on the same machine immediately before.)

Set explicit_chunk: 0.25 for accessibilities, scheduling models, trip destination, parking location, and write trip matrices

Looks like some investigation on trip destination spikes is warranted as the peak did not drop much there.

Overall run time increased from 1267 mins without chunking to 1291 mins with chunking, or 21.1 to 21.5 hrs. Memory peak went from 442 GB to 321 GB.
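For a quick side-by-side of the trade-off, the relative changes can be worked out from the numbers quoted in this thread. This is just arithmetic on the reported figures; the helper name `pct_change` is made up for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Percent change going from `before` to `after`."""
    return 100 * (after - before) / before


# RSG machine, explicit_chunk: 0.25 (figures from the comment above)
rsg_runtime = pct_change(1267, 1291)      # minutes
rsg_peak_memory = pct_change(442, 321)    # GB

# First run in this issue, explicit_chunk: 0.5
first_runtime = pct_change(2177.9, 2316.9)  # minutes

print(f"RSG runtime:     {rsg_runtime:+.1f}%")      # +1.9%
print(f"RSG peak memory: {rsg_peak_memory:+.1f}%")  # -27.4%
print(f"First runtime:   {first_runtime:+.1f}%")    # +6.4%
```

So on the RSG machine the peak dropped by a bit more than a quarter for roughly a 2% runtime cost.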
