Using Awkward Array in a future version of ActivitySim could provide meaningful benefits because ActivitySim has to manipulate very large, hierarchical, and heterogeneous datasets of agents, tours, trips, and households. Awkward Array is designed specifically to make these types of nested, irregular data structures efficient and expressive, while retaining the performance advantages of NumPy-style array programming.
Here are the specific reasons why Awkward Array could be a good fit for ActivitySim:
1. Efficient Representation of Nested Structures
- ActivitySim’s data model is hierarchical: households → persons → tours → trips.
- Currently, these relationships are managed by separate flat tables (Pandas DataFrames), with foreign keys linking them.
- Awkward Array directly supports jagged/nested arrays, so a household array could contain variable-length lists of persons, each person a variable-length list of tours, and so on.
- This reduces the need for repeated joins/merges across flat tables, which are computationally expensive at ActivitySim’s scale (millions of agents).
2. Vectorized Operations on Irregular Data
- One of ActivitySim’s performance bottlenecks is applying vectorized choice models to irregularly nested data (e.g., each person’s set of available tours, or each trip’s possible modes).
- With Pandas/NumPy, you often have to “explode” or broadcast arrays to a rectangular shape before applying vectorized math, which increases memory use.
- Awkward Array lets you apply NumPy-style ufuncs directly to jagged arrays without exploding them, so calculations run on each agent’s unique set of alternatives and return a result with the same jagged structure.
- This would let ActivitySim express model steps more directly and run faster with less memory overhead.
3. Interoperability with NumPy, Numba, and Machine Learning Frameworks
- Awkward Arrays are NumPy-compatible, so some of ActivitySim’s existing code for mathematical operations might run unchanged.
- They are also Numba JIT–friendly, which aligns with ActivitySim’s emphasis on performance.
- They can be converted easily to Torch or TensorFlow tensors, opening the door for deep learning–based choice models to be integrated into the pipeline without extensive reshaping.
4. Memory Efficiency for Large-Scale Models
- Regional and statewide ActivitySim implementations already push memory limits when simulating tens of millions of trips.
- Awkward Array stores nested lists compactly in contiguous buffers, avoiding the overhead of Python objects or repeated index merges.
- This would make scaling ActivitySim to very large geographies more feasible on commodity hardware.
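To make the "contiguous buffers" point concrete, the following NumPy-only sketch mimics the offsets-plus-content idea behind jagged storage. It is a simplified illustration of the concept, not Awkward Array's internal API, and the numbers are invented.

```python
import numpy as np

# Three persons with 2, 0, and 3 trips respectively (hypothetical distances).
trip_distance = np.array([1.2, 3.4, 0.5, 2.0, 7.5])  # one contiguous content buffer
offsets = np.array([0, 2, 2, 5])  # person i owns content[offsets[i]:offsets[i + 1]]

trips_per_person = np.diff(offsets)

# Per-person totals via a cumulative sum, which also handles empty lists.
cumulative = np.concatenate(([0.0], np.cumsum(trip_distance)))
total_distance = cumulative[offsets[1:]] - cumulative[offsets[:-1]]

# Cells needed by a padded rectangular layout versus the jagged layout.
padded_cells = len(trips_per_person) * int(trips_per_person.max())
jagged_cells = len(trip_distance) + len(offsets)

print(trips_per_person, total_distance, padded_cells, jagged_cells)
```

With skewed list lengths (a few agents with many trips, most with few), the gap between the padded and jagged cell counts grows.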
5. Cleaner Expression of Model Logic
- Currently, ActivitySim users and developers have to juggle multiple DataFrames and index alignment. For example:
  - Get a household attribute, broadcast it to persons, join with tour-level data, then filter by trip.
- With Awkward Arrays, you can write something like `households.people.tours.trips.mode_choice` directly.
- This would make the model specification closer to how practitioners conceptually think about the hierarchy, improving readability and reducing bugs.
6. Support for Variable-Length Choice Sets
- Discrete choice models in ActivitySim often involve individualized choice sets (e.g., the set of transit alternatives available to a specific trip).
- Representing these in rectangular arrays requires padding or exploding, which is inefficient.
- Awkward Array naturally represents each agent’s unique choice set length, and lets you apply vectorized utilities and logit formulas across them without flattening.
7. Pathway to GPU Acceleration
- Because Awkward Array is designed to work with Arrow buffers and can integrate with libraries like CuPy, it provides a future path to GPU acceleration of ActivitySim’s most computationally intensive steps.