Full Scale Performance: Single Process, Sharrow On, Explicit Chunking #19

Open
aletzdy opened this issue Jun 6, 2024 · 2 comments

aletzdy commented Jun 6, 2024

This is the issue to report on memory usage and runtime performance...

data_dir: "data-full" (full-scale skims, 24333 MAZs)
households_sample_size: 0 (full scale, 100% sample of households)
sharrow: "require"
multiprocess: false (single process)
chunk_training_mode: explicit

aletzdy commented Jun 6, 2024

Ran with the full sample, sharrow on, in a single process, on a machine with 1 TB of memory and an Intel Xeon Gold 6342 @ 2.8 GHz.

  • Runtime is 2316.9 mins (compared to 2177.9 mins for a similar run with NO chunking)

Chunking is set to explicit in settings_mp.yaml, and explicit chunking is enabled for the models whose memory usage exceeded 300 GB in a previous sharrow run with the same specs but with NO chunking. These models are:

  • compute_accessibility
  • mandatory_tour_scheduling
  • non_mandatory_tour_scheduling
  • trip_destination

For these models, explicit_chunk was set to 0.5 in each model's YAML file, which splits the model run into 2 (= 1/0.5) chunks. More chunks may be added in later tests as required.
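
To make the 1/0.5 = 2 arithmetic concrete, here is a minimal sketch of how a fractional chunk setting could map onto row ranges. This is not ActivitySim's code: `chunk_bounds` is a hypothetical helper, and treating values of 1 or more as an absolute row count is an assumption.

```python
import math


def chunk_bounds(n_rows: int, explicit_chunk: float) -> list[tuple[int, int]]:
    """Return (start, stop) row ranges covering a table of n_rows.

    Assumptions (not taken from ActivitySim's source):
      * a value in (0, 1) is the fraction of rows handled per chunk
      * a value >= 1 is an absolute number of rows per chunk
      * a value <= 0 means no chunking
    """
    if explicit_chunk <= 0:
        return [(0, n_rows)]
    if explicit_chunk < 1:
        rows_per_chunk = math.ceil(n_rows * explicit_chunk)
    else:
        rows_per_chunk = int(explicit_chunk)
    rows_per_chunk = max(rows_per_chunk, 1)
    return [
        (start, min(start + rows_per_chunk, n_rows))
        for start in range(0, n_rows, rows_per_chunk)
    ]


# 24333 rows (the MAZ count above) at 0.5 gives two chunks.
print(chunk_bounds(24333, 0.5))
```

Under these assumptions 0.5 yields two chunks and 0.25 yields four; peak memory for a chunked step should then scale roughly with the largest chunk rather than the full table.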
timing_log.csv
activitysim.log
memory_profile.csv

Here's what the memory profile looks like:
image

Compare the above memory profile to the one with the same settings, except no chunking:

image

Here's a comparison table of max memory usage of each step, with a negative difference meaning the chunking run uses less memory:

| event | full_rss_gb_chunking | full_rss_gb_No_chunking | Difference (GB) | % Diff accounting for skims |
| --- | --- | --- | --- | --- |
| preload_injectables | 1.927602176 | 3.40840448 | -1.4808 | NA |
| initialize_proto_population | 165.5326392 | 165.9972444 | -0.46461 | NA |
| compute_disaggregate_accessibility | 167.7904036 | 168.6871695 | -0.89677 | -5.1% |
| initialize_landuse | 146.0646502 | 148.4756828 | -2.41103 | #DIV/0! |
| initialize_households | 163.6487004 | 167.2370094 | -3.58831 | -289.4% |
| compute_accessibility | 245.1278234 | 344.963457 | -99.8356 | -55.8% |
| av_ownership | 153.2263014 | 153.190015 | 0.036286 | -0.3% |
| auto_ownership_simulate | 165.8563215 | 171.3207624 | -5.46444 | -102.6% |
| work_from_home | 153.5603507 | 161.8880635 | -8.32771 | 202.7% |
| external_worker_identification | 155.0026179 | 162.3008256 | -7.29821 | 197.4% |
| external_workplace_location | 155.2053207 | 162.4345354 | -7.22921 | 202.9% |
| school_location | 187.0748877 | 187.061289 | 0.013599 | 0.1% |
| workplace_location | 218.1287363 | 218.731561 | -0.60282 | -1.1% |
| transit_pass_subsidy | 164.6922015 | 166.9283922 | -2.23619 | -240.2% |
| transit_pass_ownership | 166.2044733 | 167.5616952 | -1.35722 | -86.8% |
| vehicle_type_choice | 164.520702 | 177.2367913 | -12.7161 | -113.1% |
| adjust_auto_operating_cost | 145.6777503 | 157.0783519 | -11.4006 | 127.8% |
| transponder_ownership | 149.0361467 | 159.9991685 | -10.963 | 182.8% |
| free_parking | 156.8398828 | 156.3868774 | 0.453005 | -4.7% |
| telecommute_frequency | 156.5709025 | 155.5312804 | 1.039622 | -9.9% |
| cdap_simulate | 156.639617 | 156.9351352 | -0.29552 | 3.3% |
| mandatory_tour_frequency | 156.8663839 | 156.050772 | 0.815612 | -8.2% |
| mandatory_tour_scheduling | 335.6519547 | 417.5135498 | -81.8616 | -32.5% |
| school_escorting | 159.3236521 | 156.8768901 | 2.446762 | -26.8% |
| joint_tour_frequency_composition | 167.3451192 | 168.112853 | -0.76773 | -36.3% |
| external_joint_tour_identification | 164.685353 | 163.7067776 | 0.978575 | -42.7% |
| joint_tour_participation | 166.5587773 | 162.3630479 | 4.195729 | -115.5% |
| joint_tour_destination | 168.4934328 | 167.9516017 | 0.541831 | 27.7% |
| external_joint_tour_destination | 161.2138127 | 157.831168 | 3.382645 | -41.4% |
| joint_tour_scheduling | 179.6869489 | 175.8333297 | 3.853619 | 39.2% |
| non_mandatory_tour_frequency | 190.2081884 | 181.86061 | 8.347578 | 52.6% |
| external_non_mandatory_identification | 172.139692 | 180.9013842 | -8.76169 | -58.8% |
| non_mandatory_tour_destination | 240.3257958 | 240.2470052 | 0.078791 | 0.1% |
| external_non_mandatory_destination | 190.4231711 | 191.110828 | -0.68766 | -2.7% |
| non_mandatory_tour_scheduling | 227.1283814 | 261.411414 | -34.283 | -35.9% |
| vehicle_allocation | 182.6668093 | 179.8298706 | 2.836939 | 20.5% |
| tour_mode_choice_simulate | 183.2361574 | 180.0804557 | 3.155702 | 22.4% |
| atwork_subtour_frequency | 163.4289213 | 158.4350372 | 4.993884 | -66.0% |
| atwork_subtour_destination | 203.2547389 | 198.2499512 | 5.004788 | 15.5% |
| atwork_subtour_scheduling | 184.3515597 | 180.9987052 | 3.352855 | 22.4% |
| atwork_subtour_mode_choice | 179.6341637 | 170.6008289 | 9.033335 | 196.2% |
| stop_frequency | 189.1866747 | 191.2116716 | -2.025 | -8.0% |
| trip_purpose | 165.47106 | 163.3115464 | 2.159514 | -80.4% |
| trip_destination | 326.5262551 | 324.2506404 | 2.275615 | 1.4% |
| trip_purpose_and_destination | 236.8783278 | 178.4622653 | 58.41606 | 468.6% |
| trip_scheduling | 239.4423788 | 171.3450885 | 68.09729 | 1273.4% |
| trip_mode_choice | 247.602602 | 190.5106452 | 57.09196 | 232.9% |
| parking_location | 268.5247119 | 250.8484936 | 17.67622 | 20.8% |
| write_data_dictionary | 265.4285496 | 204.4660818 | 60.96247 | 158.5% |
| track_skim_usage | 258.8279767 | 198.1093274 | 60.71865 | 189.1% |
| write_trip_matrices | 291.1253258 | 262.6623529 | 28.46297 | 29.4% |
| write_tables | 225.5399281 | 222.7437937 | 2.796134 | 4.9% |
| finalizing | 197.1774505 | 193.4436598 | 3.733791 | 13.6% |
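
For anyone wanting to rebuild a table like this from two memory_profile.csv files, here is a rough sketch using only the standard library. The column names `event` and `full_rss`, and the assumption that `full_rss` is in bytes, are guesses about the file layout, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def peak_gb_by_event(csv_file, event_col="event", rss_col="full_rss"):
    """Return {event: peak RSS in GB} from an open memory profile CSV.

    Assumes one row per memory sample, with rss_col given in bytes.
    """
    peaks = defaultdict(float)
    for row in csv.DictReader(csv_file):
        gb = float(row[rss_col]) / 1e9
        peaks[row[event_col]] = max(peaks[row[event_col]], gb)
    return dict(peaks)


def difference(chunked, unchunked):
    """Chunked minus unchunked peak per event; negative means chunking used less."""
    return {e: chunked[e] - unchunked[e] for e in chunked if e in unchunked}


# Hypothetical sample data standing in for the two attached profiles.
chunked_csv = io.StringIO("event,full_rss\nstep_a,1000000000\nstep_a,3000000000\n")
unchunked_csv = io.StringIO("event,full_rss\nstep_a,2000000000\nstep_a,5000000000\n")

diff = difference(peak_gb_by_event(chunked_csv), peak_gb_by_event(unchunked_csv))
print(diff)
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.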

Of the models with explicit chunking turned on, trip_destination shows no real difference between chunking and no chunking. The others show clear reductions (figures are from the "% Diff accounting for skims" column):

  • compute_accessibility shows a 56% reduction in RAM usage,
  • mandatory_tour_scheduling shows a 33% reduction,
  • non_mandatory_tour_scheduling shows a 36% reduction
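
The table's "% Diff accounting for skims" values for the chunked models can be reproduced by dividing the difference by the no-chunking RSS above a fixed baseline. The baseline used below (the no-chunking RSS at initialize_proto_population, 165.9972444 GB) is an assumption inferred from the numbers, not something stated in the spreadsheet, and `pct_diff_above_baseline` is a hypothetical helper.

```python
# Assumption: memory held by skims and other loaded data is approximated by the
# no-chunking RSS at initialize_proto_population.
BASELINE_GB = 165.9972444

# (chunking GB, no-chunking GB) copied from the table above.
ROWS = {
    "compute_accessibility": (245.1278234, 344.963457),
    "mandatory_tour_scheduling": (335.6519547, 417.5135498),
    "non_mandatory_tour_scheduling": (227.1283814, 261.411414),
    "trip_destination": (326.5262551, 324.2506404),
}


def pct_diff_above_baseline(chunked_gb, unchunked_gb, baseline_gb=BASELINE_GB):
    """Percent change in RSS, measured relative to memory above the baseline."""
    return 100.0 * (chunked_gb - unchunked_gb) / (unchunked_gb - baseline_gb)


for event, (chunked, unchunked) in ROWS.items():
    print(f"{event}: {pct_diff_above_baseline(chunked, unchunked):+.1f}%")
```

This gives -55.8%, -32.5%, -35.9% and +1.4% for the four chunked models, matching the table. For steps whose RSS sits near or below the baseline the denominator becomes tiny or negative, which is likely why that column shows very large or sign-flipped percentages elsewhere.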

Small differences in other model steps are to be expected, but some steps show bigger differences than I expected.

I should note that the no-chunking run was from two weeks ago and did not set the numba and OpenBLAS environment variables to 0 (the chunking run did), which may explain some of these differences.

dhensle commented Jun 18, 2024

Ran this again on an RSG machine and got similar results. Compare the memory profile below to #6 (comment), which was run on the same machine immediately before.

Set explicit_chunk: 0.25 for accessibilities, scheduling models, trip destination, parking location, and write trip matrices

Looks like some investigation on trip destination spikes is warranted as the peak did not drop much there.

Overall run time increased from 1267 mins without chunking to 1291 mins with chunking, or 21.1 to 21.5 hrs. Memory peak went from 442 GB to 321 GB.
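
As a quick check on the trade-off, the relative changes can be worked out from the figures quoted above (`pct_change` is just a throwaway helper):

```python
def pct_change(before, after):
    """Percent change from before to after."""
    return 100.0 * (after - before) / before


runtime_change = pct_change(1267, 1291)   # minutes, no chunking -> chunking
peak_change = pct_change(442, 321)        # GB, no chunking -> chunking

print(f"runtime: {runtime_change:+.1f}% ({1267 / 60:.1f} h -> {1291 / 60:.1f} h)")
print(f"peak memory: {peak_change:+.1f}%")
```

That works out to roughly a 1.9% longer run for a 27.4% lower memory peak.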

image
