Handling of mexico-city survey data for scenario generation #26

Draft
wants to merge 16 commits into
base: master
from

Conversation

simei94
Contributor

@simei94 simei94 commented Dec 7, 2023

This PR adds a new survey data format, "eodmx". The code that uses the data formats is adapted so that it can handle the new format. The EOD2017 survey (Encuesta Origen Destino) was conducted for the metropolitan area of Mexico City (ZMVM) by INEGI (Instituto Nacional de Estadística y Geografía), the Mexican national institute for statistics and geography.

@simei94 simei94 marked this pull request as draft December 7, 2023 18:43
@simei94
Contributor Author

simei94 commented Dec 7, 2023

@rakow What do you think about merging this branch into master? To handle the Mexican dataset I had to change some of the general scripts (preparation.py, __init__.py, ...). It would cost us / me some more work to make those changes configurable, or rather to make the general data handling scripts flexible enough to handle a wider range of datasets that, unlike MID and SrV, do not assume German law.
Another option would be to copy the code of this branch to the matsim-mexico-city scenario, which would then basically duplicate the code of this contrib. I personally find that rather ugly.

@simei94 simei94 changed the title from "Handling of mexico-city survex data for scenario generation" to "Handling of mexico-city survey data for scenario generation" Dec 7, 2023
@rakow
Collaborator

rakow commented Dec 11, 2023

Thank you, I really like the idea of making the scripts more generally applicable. I will take a look at what you did in the coming weeks.

@simei94
Contributor Author

simei94 commented Dec 12, 2023

Whenever you find the time, feel free to contact me about this, as I already have some ideas about which parts to generalize.

matsim/scenariogen/__main__.py Outdated
matsim/scenariogen/data/__init__.py
matsim/scenariogen/data/__init__.py Outdated
matsim/scenariogen/data/__init__.py Outdated
matsim/scenariogen/data/__init__.py Outdated
matsim/scenariogen/data/formats/eodmx.py
@@ -17,11 +17,13 @@ def prepare_persons(hh, pp, tt, augment=5, max_hh_size=5, core_weekday=False, re

# Augment data using p_weight
if augment > 1:
    df = augment_persons(df, augment)
# in the cdmx case we do not need to do p_weight * augment = 5 (see method augment_persons)
Collaborator

prepare_persons should probably be split into multiple functions, so that you can use only the parts you want in your scenario.

Contributor Author

Would you do this by defining sub-methods / sub-functions inside prepare_persons? I can try to do that if that's the way you want to go.

Collaborator

I will try to do it, as it requires changing the API and design a little bit.
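
As an illustration of the split discussed in this thread, here is a minimal sketch of composing independent preparation steps. It is not the design chosen for the contrib: the step functions `limit_household_size` and `keep_present_on_day`, the helper `run_steps`, and the `hh_size` field are hypothetical names, and plain dictionaries stand in for the real data frames.

```python
from functools import partial
from typing import Callable, Dict, List

Person = Dict[str, object]
Step = Callable[[List[Person]], List[Person]]


def limit_household_size(persons: List[Person], max_hh_size: int) -> List[Person]:
    """Keep only persons from households with at most max_hh_size members (hypothetical step)."""
    return [p for p in persons if p["hh_size"] <= max_hh_size]


def keep_present_on_day(persons: List[Person]) -> List[Person]:
    """Keep only persons who were present on their reporting day (hypothetical step)."""
    return [p for p in persons if p["present_on_day"]]


def run_steps(persons: List[Person], steps: List[Step]) -> List[Person]:
    """Apply each step in order; a scenario passes only the steps it needs."""
    for step in steps:
        persons = step(persons)
    return persons


persons = [
    {"p_id": 1, "hh_size": 2, "present_on_day": True},
    {"p_id": 2, "hh_size": 7, "present_on_day": True},
    {"p_id": 3, "hh_size": 3, "present_on_day": False},
]

# A scenario that only wants the household size filter.
only_size = run_steps(persons, [partial(limit_household_size, max_hh_size=5)])

# A scenario that wants both steps.
both = run_steps(
    persons,
    [partial(limit_household_size, max_hh_size=5), keep_present_on_day],
)
```

With this shape, a scenario such as the Mexico City one could leave out a step (for example the augmentation) simply by not listing it, instead of relying on a flag inside one large function.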

@@ -309,6 +313,7 @@ class Person:
    present_on_day: bool
    reporting_day: int
    n_trips: int
    home_district: str = ""
Collaborator

Shouldn't this belong to the household?

Collaborator

Household already has location and geometry. Is an additional attribute needed?

Contributor Author

You are right, but this information is needed for the simple routing in the next activity sampling step, because the survey data does not provide leg lengths. It is added to the persons because I do not want to read the whole households.csv in the next step just for one parameter (the persons / activities datasets are already huge files).

Collaborator

I see the problem, but I generally don't like duplicating information. CSV reading should be superfast, is it really a concern?

Contributor Author

Yes, persons.csv and activities.csv alone already add up to 4 GB. Because of that I cannot run it on my own hardware and have to run it on the math cluster, which is annoying for debugging and testing. Keep in mind that this is an area with about 20 million inhabitants, which is far more than what we usually handle (e.g. Berlin-Brandenburg).
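
For reference, one way to look up the home district without duplicating it on every person is to stream households.csv once and keep only a household id to district mapping in memory. The sketch below uses the standard library `csv` module and in-memory sample data; the column names `hh_id` and `home_district`, and both helper functions, are assumptions for illustration and may not match the actual files.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_home_districts(households_file: TextIO,
                        id_col: str = "hh_id",
                        district_col: str = "home_district") -> Dict[str, str]:
    """Stream the households CSV and keep only household id -> district."""
    reader = csv.DictReader(households_file)
    return {row[id_col]: row[district_col] for row in reader}


def persons_with_district(persons_file: TextIO,
                          districts: Dict[str, str],
                          id_col: str = "hh_id") -> Iterator[Dict[str, str]]:
    """Yield persons one row at a time, with the district looked up by household id."""
    for row in csv.DictReader(persons_file):
        # Persons whose household is unknown get an empty district.
        yield dict(row, home_district=districts.get(row[id_col], ""))


# Small in-memory stand-ins for households.csv and persons.csv (made-up values).
households = io.StringIO("hh_id,n_persons,home_district\nh1,2,002\nh2,4,015\n")
persons = io.StringIO("p_id,hh_id,n_trips\np1,h1,3\np2,h2,0\np3,h9,1\n")

districts = read_home_districts(households)
rows = list(persons_with_district(persons, districts))
```

Only the mapping is held in memory, never the full households table. If the pipeline stays on pandas, passing `usecols` to `pandas.read_csv` restricts parsing to the two needed columns in a similar way. Whether either variant is fast enough at this data size would have to be measured.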

matsim/scenariogen/data/__init__.py Outdated
matsim/scenariogen/data/__init__.py Outdated