Project Meeting 2024.09.12
Jeffrey Newman edited this page Sep 15, 2024
Purpose: Review the scope of, and open questions about, estimation improvements for the ActivitySim project, focusing on runtime and usability enhancements in estimation mode.
-
Estimation Mode Improvements:
-
Runtime Enhancements:
- Addition of multiprocessing to improve estimation speed.
- Reducing the amount of data written out, particularly for destination choice models, which previously wrote out all alternatives (zones/MAZs). The update aims to limit the output to sampled alternatives only.
- Changing file formats from CSV to more efficient formats such as Parquet and Pickle.
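The output-reduction idea above can be illustrated with a minimal sketch. It is not ActivitySim's implementation: the function `keep_sampled_alternatives`, the column names `chooser_id`, `alt_id`, and `distance`, and the toy data are all hypothetical, and the standard-library `pickle` module stands in for whichever binary format is eventually chosen.

```python
import pickle


def keep_sampled_alternatives(rows, sampled_by_chooser):
    """Keep only rows whose alternative was sampled for that chooser."""
    return [
        row
        for row in rows
        if row["alt_id"] in sampled_by_chooser.get(row["chooser_id"], set())
    ]


# Hypothetical estimation data: 2 choosers x 5 zones, one row per pair.
all_rows = [
    {"chooser_id": chooser, "alt_id": zone, "distance": 1.5 * zone}
    for chooser in (1, 2)
    for zone in range(1, 6)
]

# Hypothetical sample of alternatives drawn for each chooser.
sampled_by_chooser = {1: {2, 4}, 2: {1, 5}}

# Only the sampled (chooser, alternative) pairs are kept for writing.
sampled_rows = keep_sampled_alternatives(all_rows, sampled_by_chooser)

# A binary format such as pickle round-trips values with their types intact,
# whereas CSV stores everything as text that must be re-parsed on load.
restored_rows = pickle.loads(pickle.dumps(sampled_rows))
```

With thousands of zones per chooser, writing only the sampled rows shrinks the output roughly in proportion to the sample size, which is the motivation noted above for destination choice models.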
-
Usability Improvements:
- Tools to allow easier testing of model specifications.
- Intelligent error-checking and reporting for model specification issues.
- Introduction of a predict functionality that takes new coefficients and applies them to existing data.
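As a rough illustration of the error-checking and predict ideas above, the sketch below applies two coefficient sets to the same existing data using a plain multinomial logit formula. It is an assumption-laden example rather than ActivitySim's API: the functions `check_coefficients` and `predict_probabilities`, the variable names `time` and `cost`, and all numeric values are hypothetical.

```python
import math


def check_coefficients(alternatives, coefficients):
    """Raise a readable error naming every variable that lacks a coefficient."""
    missing = sorted(
        {
            name
            for variables in alternatives.values()
            for name in variables
            if name not in coefficients
        }
    )
    if missing:
        raise ValueError("Missing coefficients for: " + ", ".join(missing))


def predict_probabilities(alternatives, coefficients):
    """Apply coefficients to existing data and return logit probabilities."""
    check_coefficients(alternatives, coefficients)
    utilities = {
        alt: sum(coefficients[name] * value for name, value in variables.items())
        for alt, variables in alternatives.items()
    }
    # Subtract the maximum utility before exponentiating for numerical stability.
    max_utility = max(utilities.values())
    exp_utilities = {alt: math.exp(u - max_utility) for alt, u in utilities.items()}
    total = sum(exp_utilities.values())
    return {alt: value / total for alt, value in exp_utilities.items()}


# Hypothetical existing data: one chooser, three alternatives.
existing_data = {
    "drive": {"time": 20.0, "cost": 4.0},
    "transit": {"time": 35.0, "cost": 2.0},
    "walk": {"time": 60.0, "cost": 0.0},
}

# Hypothetical old and newly estimated coefficients.
old_coefficients = {"time": -0.05, "cost": -0.2}
new_coefficients = {"time": -0.03, "cost": -0.4}

old_probabilities = predict_probabilities(existing_data, old_coefficients)
new_probabilities = predict_probabilities(existing_data, new_coefficients)
```

Keeping the data fixed and swapping only the coefficients makes it easy to compare how a revised specification changes predicted shares, and the up-front check reports all missing names at once instead of failing on the first lookup.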
-
-
Comparing Models and Data:
-
SANDAG vs. MTC Data:
- SANDAG data: Real-world data from two survey sets (2016, 2022) with complex models. Larger zone system (20k+ MAZs) but fewer households.
- MTC data: Smaller synthetic dataset used for continuous integration (CI) testing, covering primarily San Francisco with around 190 zones.
- Discussion on advantages and limitations of using synthetic data (MTC) vs. real data (SANDAG), particularly for large-scale testing and error reporting.
-
Approach Moving Forward:
-
Initial Development:
- Focus on using MTC’s synthetic data for development due to ease of CI integration and scalability.
-
Testing on SANDAG:
- Once development is stable on MTC data, the proposal is to test on SANDAG data to ensure robustness for larger models and more complex real-world scenarios.
- Concerns were raised about ensuring that improvements scale properly to larger datasets like SANDAG.
-
Continuous Integration (CI) Discussion:
- MTC data can be used publicly for CI testing, but using real-world SANDAG data may raise privacy concerns (e.g., PII from smaller zones).
- Potential solution: Use real-world data for private testing and synthetic data for public CI testing.
- Discussion on whether to explore a private CI environment, though it would come with additional costs.
-
Budget Constraints:
- Current funds do not cover full processing of SANDAG data, particularly trip-level data. Any further development beyond the current scope would require additional budget.
- Joe Flood indicated that while additional funds are unlikely for FY25, they will explore the possibility for FY26.
- RSG team to develop a clear proposal for handling synthetic vs. real data testing to ensure comprehensive coverage and robust software testing.
- Joe Flood to follow up with Bhargav regarding potential additional funding for SANDAG data processing.