Handling of Mexico City survey data for scenario generation #26
base: master
Conversation
@rakow What do you think about merging this branch into master? To be able to handle the Mexican dataset I had to make some changes to the general scripts (preparation.py, __init__.py, ...). So it would take us / me some more work to make those changes configurable, or better said, to make the general data handling scripts more flexible, i.e. able to handle a wider spectrum of specific datasets which do not assume the application of German law (like MiD and SrV).
Thank you, I really like the idea of making the scripts more generally applicable. I will take a look at what you did in the coming weeks.
Whenever you find the time, feel free to contact me about this, as I already have some ideas about which segments to generalize.
@@ -17,11 +17,13 @@ def prepare_persons(hh, pp, tt, augment=5, max_hh_size=5, core_weekday=False, re

    # Augment data using p_weight
    if augment > 1:
        df = augment_persons(df, augment)
        # in the cdmx case we do not need to do p_weight * augment = 5 (see method augment_persons)
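For context, weight-based augmentation duplicates each survey respondent roughly in proportion to their person weight, so that the expanded sample approximates the population. The following is only an illustrative sketch of that idea; the real augment_persons implementation may differ, and only the column name p_weight is taken from the diff above.

    import numpy as np
    import pandas as pd

    def augment_by_weight(df: pd.DataFrame, augment: int = 5) -> pd.DataFrame:
        # Repeat each row about p_weight * augment times (at least once), so the
        # expanded table reflects the survey weights. Illustrative sketch only.
        repeats = np.maximum(1, np.rint(df["p_weight"] * augment)).astype(int).to_numpy()
        return df.loc[df.index.repeat(repeats)].reset_index(drop=True)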
prepare_persons should probably be split into multiple functions, so that you can use only the parts you need in your scenario.
Would you do this by defining sub-methods / -functions inside prepare_persons? I can try to do that if that's the way you want to go.
I will try to do it, as it requires changing the API and design a little bit.
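As an illustration of such a split (a hypothetical sketch only; the helper names, the hh_id column and the weekday filter are invented and do not reflect the actual refactoring), prepare_persons could delegate to smaller, composable steps so that a scenario like the CDMX one calls only the steps it needs:

    import pandas as pd

    def join_households(pp: pd.DataFrame, hh: pd.DataFrame) -> pd.DataFrame:
        # Attach household attributes to each person record (join column is an assumption).
        return pp.merge(hh, on="hh_id", how="left", suffixes=("", "_hh"))

    def filter_core_weekday(df: pd.DataFrame) -> pd.DataFrame:
        # Keep only persons reporting on a core weekday (Mon-Thu assumed here).
        return df[df["reporting_day"].between(1, 4)]

    def prepare_persons(hh, pp, tt, augment=5, max_hh_size=5, core_weekday=False):
        # Thin wrapper composing the individual steps; tt and max_hh_size are kept from
        # the original signature, their handling is omitted in this sketch.
        df = join_households(pp, hh)
        if core_weekday:
            df = filter_core_weekday(df)
        if augment > 1:
            df = augment_persons(df, augment)  # existing helper, see the diff above
        return df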
matsim/scenariogen/data/__init__.py (Outdated)
@@ -309,6 +313,7 @@ class Person:
    present_on_day: bool
    reporting_day: int
    n_trips: int
    home_district: str = ""
Shouldn't this belong to the household?
Household already has location and geometry. Is an additional attribute needed?
You are right, BUT this information is needed for the simple routing in the next activity-sampling step (because the survey data does not provide leg lengths). It is added to the persons because I do not want to read the whole households.csv in the next step just for one parameter (the persons / activities datasets are already huge files).
I see the problem, but I generally don't like duplicating information. CSV reading should be super fast; is it really a concern?
Yes, we are already talking about 4 GB combined just for persons.csv and activities.csv. Therefore I cannot run it on my hardware and have to run it on the math cluster, which is annoying for debugging and testing. You have to take into account that we are dealing with an area of about 20 million inhabitants, which is way above what we usually handle (e.g. Berlin-Brandenburg).
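For illustration, one way to avoid storing home_district on every Person would be to merge it in from households.csv only where it is needed, reading just the relevant columns and streaming the large persons file in chunks. This is a sketch under the assumption that both files share a household id column; the column name hh_id and the output file name are invented.

    import pandas as pd

    # Read only the two columns that are actually needed from the household file.
    hh_districts = pd.read_csv("households.csv", usecols=["hh_id", "home_district"])

    # Stream the large persons file in chunks and attach the district via a merge;
    # each enriched chunk is appended to an output file instead of kept in memory.
    first = True
    for chunk in pd.read_csv("persons.csv", chunksize=1_000_000):
        enriched = chunk.merge(hh_districts, on="hh_id", how="left")
        enriched.to_csv("persons_with_district.csv", mode="w" if first else "a",
                        header=first, index=False)
        first = False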
With this PR a new dataformat for surveys "eodmx" is added. The code, which uses the data formats is adapted, such that it can handle the new data format. The survey EOD2017 (Encuesta Origen Destino) is undertaken for the metropolitan area of Mexico City (ZMVM) by INEGI (Instituto Nacional de Estadística y Geografía), the mexican secretary for statistics and geography.
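As a rough illustration of what adding such a survey format typically involves (a hypothetical sketch only; the function name, the raw column names and the harmonized schema are invented and are not the actual eodmx implementation), a reader converts the raw EOD2017 tables into a shared household / person / trip layout so that downstream scripts can stay format-agnostic:

    import pandas as pd

    def read_eodmx(hh_path: str, pp_path: str, tt_path: str):
        # Load the raw survey tables (households, persons, trips).
        hh = pd.read_csv(hh_path)
        pp = pd.read_csv(pp_path)
        tt = pd.read_csv(tt_path)

        # Rename survey-specific columns to a shared schema; all names here are
        # placeholders and would have to match the real EOD2017 column names.
        hh = hh.rename(columns={"id_hogar": "hh_id", "distrito": "home_district"})
        pp = pp.rename(columns={"id_persona": "p_id", "id_hogar": "hh_id"})
        tt = tt.rename(columns={"id_viaje": "t_id", "id_persona": "p_id"})

        return hh, pp, tt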