aperta.od_pairs

Build tiered origin-destination tables for a routable network.

The names cells and zones describe tier roles, not specific spatial units. A cell is the finest analysis unit (typically hectare-scale or smaller — H3 res-9/10 hexes, 100 m square grids, individual buildings); a zone is a coarser aggregation (typically neighbourhood / municipality scale — a traffic-analysis zone, a census tract, an H3 res-7/8 hex). The library imposes no constraint on what each tier represents in the real world; what matters is that cells ⊂ zones in a many-to-one sense. Pick whatever instantiations match the analytical question; the tier names just label the roles.

get_pairs returns a TieredODPairs with up to three OD dicts at three combinations of origin / dest resolution:

cells_to_cells: cell_node -> [cell_nodes]   # close  (cell origin, cell dest)
cells_to_zones: cell_node -> [zone_nodes]   # medium (cell origin, zone dest)
zones_to_zones: zone_node -> [zone_nodes]   # far    (zone origin, zone dest)

The three tiers trade off per-OD-pair precision against storage size:

  • cells_to_cells: highest precision (both endpoints at cell resolution). Most expensive per OD pair — kept only for close distances (d < r_cells).

  • cells_to_zones: preserves cell-origin precision (the cell asking matters) while aggregating dests at zone level. Cheaper to store than per-cell dest arrays. Used at medium distances (r_cells ≤ d < r_medium).

  • zones_to_zones: both endpoints at zone resolution. Cheapest: fewer origins (per-zone, not per-cell) and fewer dests per origin. Used at far distances (r_medium ≤ d < r_zones, where r_zones is the outer cutoff); pairs beyond r_zones are dropped.

The per-cell origin precision in cells_to_zones matters when zone diameter is meaningful relative to trip cost. At ~1 km zone diameter and ~10 km medium-tier radius, cell variation within a zone is ~10% of trip cost — not noise.

Tier rule (per ordered cell pair (c, c’) with parent zones Z, Z’):

d(c, c’) < r_cells             → cell tier (cells_to_cells)
r_cells ≤ d(c, c’) < r_medium  → cells_to_zones (cell c → zone(c’))
r_medium ≤ d(c, c’) < r_zones  → zones_to_zones (zone(c) → zone(c’))
else (beyond outer cutoff)     → drop
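
The rule reads as a chain of threshold checks. A minimal sketch, assuming the outer cutoff is the r_zones threshold documented for get_pairs; classify_tier is a hypothetical helper for illustration, not part of aperta:

```python
from typing import Optional


def classify_tier(d: float, r_cells: float, r_medium: float, r_zones: float) -> Optional[str]:
    """Return the tier name for distance d, or None when the pair is dropped.

    Hypothetical helper: it only mirrors the threshold rule described above.
    """
    if not (r_cells <= r_medium <= r_zones):
        raise ValueError("expected r_cells <= r_medium <= r_zones")
    if d < r_cells:
        return "cells_to_cells"   # close: cell origin, cell dest
    if d < r_medium:
        return "cells_to_zones"   # medium: cell origin, zone dest
    if d < r_zones:
        return "zones_to_zones"   # far: zone origin, zone dest
    return None                   # beyond the outer cutoff


# Thresholds in metres: 1 km cell tier, 10 km medium tier, 50 km outer cutoff.
tiers = [classify_tier(d, 1_000, 10_000, 50_000) for d in (500, 1_000, 25_000, 50_000)]
```

Each lower bound is inclusive and each upper bound exclusive, so a pair at exactly r_cells falls into cells_to_zones and a pair at exactly r_zones is dropped.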

Symmetry: distances are symmetric, so tier assignment is too.

Architectural note: aperta used to have a four-tier scheme with a separate zones_to_regions slot keyed by zones with region-level dests. Replaced 2026-05-27 by the cell-origin precision tier (cells_to_zones), which preserves per-cell origin variation in the medium-distance regime where it matters most. Drops one geo layer (regions) for a simpler API. See memory aperta-flat-refactor-plan for the design discussion.

class aperta.od_pairs.TieredODPairs(cells_to_cells=None, cells_to_zones=None, zones_to_zones=None)[source]

Bases: object

Abstract base for tiered origin-destination pair tables.

Holds three per-tier dict-of-arrays. The two concrete subclasses differ in what the dict keys represent; see TieredODNodePairs and TieredODGeoPairs below. Functions that don’t care about key space (e.g. make_mask, __repr__, describe) accept this base type.

Any tier may be None when the corresponding tier wasn’t requested or produced. cells_to_cells is populated by get_pairs in normal usage, but intermediate / utility constructors may leave it unset.

Parameters:
  • cells_to_cells (dict | None)

  • cells_to_zones (dict | None)

  • zones_to_zones (dict | None)

cells_to_cells: dict | None = None
cells_to_zones: dict | None = None
zones_to_zones: dict | None = None
describe()[source]

Print and return a richer per-tier summary than repr().

Always shows origin and destination counts. For tiers whose values are numeric (typical for cost / distance / weight TieredODPairs), also shows mean, median, 5-95th percentile, and min-max. Non-finite entries (np.inf for unreachable / masked-out, np.nan) are excluded from the stats and reported separately. For bool-typed tiers (masks), reports the True count and rate instead.

Goes via print (not logging) so it is always visible regardless of logging config, and returns the string so the caller can route it elsewhere (e.g. into a log file).

Return type:

str
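
The numeric branch of this summary can be approximated as follows. This is a sketch under assumptions, not aperta's implementation: summarize_tier and its output keys are hypothetical, and bool-typed tiers are not handled.

```python
import numpy as np


def summarize_tier(tier):
    """Summarise one numeric tier ({origin -> 1-D value array}). Hypothetical helper."""
    arrays = [np.asarray(v, dtype=float) for v in tier.values()]
    values = np.concatenate(arrays) if arrays else np.empty(0)
    finite = values[np.isfinite(values)]  # np.inf / np.nan are excluded from the stats
    summary = {
        "n_origins": len(tier),
        "n_pairs": int(values.size),
        "n_non_finite": int(values.size - finite.size),  # counted separately
    }
    if finite.size:
        p5, p95 = np.percentile(finite, [5, 95])
        summary.update(
            mean=float(finite.mean()),
            median=float(np.median(finite)),
            p5=float(p5),
            p95=float(p95),
            min=float(finite.min()),
            max=float(finite.max()),
        )
    return summary


costs = {1: np.array([1.0, 3.0, np.inf]), 2: np.array([2.0, np.nan])}
stats = summarize_tier(costs)
```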

class aperta.od_pairs.TieredODNodePairs(cells_to_cells=None, cells_to_zones=None, zones_to_zones=None)[source]

Bases: TieredODPairs

Tiered OD pairs keyed by network node IDs of one mode’s graph.

Dict keys at every tier are network node IDs; per-origin arrays carry destination node IDs (for get_pairs) or per-OD-pair values (costs, weights, masks, distances) aligned to those dest IDs.

Produced by get_pairs, routing.tiered_path_costs, routing.tiered_path_aggregate, dest_values, get_euclidean_dists, make_mask, overhead.add_node_overheads, utility.route_utility, utility.add_endpoint_utility. The default working representation for single-mode pipelines — lightweight, no fan-out.

Accessibility metrics consuming a TieredODNodePairs return node-indexed DataFrames. For per-cell accessibility output, per-cell origin overhead, or cross-modal aggregation, lift to TieredODGeoPairs via od_pairs.reindex_by_geo_unit.

Parameters:
  • cells_to_cells (dict | None)

  • cells_to_zones (dict | None)

  • zones_to_zones (dict | None)

class aperta.od_pairs.TieredODGeoPairs(cells_to_cells=None, cells_to_zones=None, zones_to_zones=None)[source]

Bases: TieredODPairs

Tiered OD pairs keyed by geo-unit IDs (cell_id / zone_id).

Dict keys are mode-agnostic geo-unit IDs:

cells_to_cells:    cell_id  ->  array of dest cell_ids
cells_to_zones:    cell_id  ->  array of dest zone_ids
zones_to_zones:    zone_id  ->  array of dest zone_ids

Created via od_pairs.reindex_by_geo_unit from a TieredODNodePairs + cells (+ optional zones). Required input to:

  • od_pairs.aggregate_across_modes for cross-modal accessibility,

  • accessibility metrics that should return cell/zone-indexed output,

  • add_geo_overheads / add_origin_cell_overhead for geo-unit-keyed overhead baking.

Heavier than TieredODNodePairs (multiple cells sharing a node fan out into per-cell entries), but mode-agnostic by construction: cell IDs are the same across modes, so per-mode geo-keyed ODMs align directly.

Parameters:
  • cells_to_cells (dict | None)

  • cells_to_zones (dict | None)

  • zones_to_zones (dict | None)

aperta.od_pairs.build_cell_to_zone_node_map(cells, zones, node_column)[source]

Build the {cell_node -> zone_node} lookup that tiered helpers use to find each origin cell’s parent zone (which keys zones_to_zones).

Cells without a mapped network node, or whose zone has no mapped network node, are omitted (they can’t participate in zone-tier sampling).

Return type:

dict
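
The omission rule can be illustrated with pandas. A minimal sketch, assuming cells carries a zone_id column (as in the get_pairs input contract) and zones is indexed by zone_id; cell_to_zone_node_map is a hypothetical stand-in, not the library function:

```python
import pandas as pd


def cell_to_zone_node_map(cells, zones, node_column):
    """Map each cell's network node to its parent zone's network node.

    Hypothetical stand-in; assumes cells has a zone_id column and zones is
    indexed by zone_id.
    """
    # Look up the zone node of every cell through the cell's zone_id.
    zone_nodes = cells["zone_id"].map(zones[node_column])
    # Keep only cells where both the cell node and the zone node are mapped.
    valid = cells[node_column].notna() & zone_nodes.notna()
    return dict(zip(cells.loc[valid, node_column], zone_nodes[valid]))


nan = float("nan")
cells = pd.DataFrame(
    {"zone_id": ["A", "A", "B", "C"], "node": [10.0, 11.0, nan, 13.0]},
    index=["c1", "c2", "c3", "c4"],
)
zones = pd.DataFrame({"node": [100.0, 200.0, nan]}, index=["A", "B", "C"])
mapping = cell_to_zone_node_map(cells, zones, "node")
```

Cell c3 has no node and cell c4 belongs to a zone without a node, so neither appears in the result. In this sketch, when several cells share a node the last one wins; how the library resolves that case is not stated here.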

aperta.od_pairs.make_mask(values, rule)[source]

Build a boolean-mask TieredODPairs by applying rule to every per-origin value array.

rule is a vectorized callable: it takes a 1-D numpy array and returns a bool array of the same length, e.g. lambda d: d < 50_000 to keep only pairs with distance under 50 km.

The returned TieredODPairs has the same structure as values (same origins, same per-origin array lengths) but with bool arrays. Pass it as mask= to routing.tiered_path_costs, traffic_flows.nested_node_sample, and other tiered helpers to ignore False entries.

Tiers that are None in values stay None in the result.

Return type:

TieredODPairs
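
What a rule does to each tier can be shown with plain dicts standing in for the tiers. This is an illustrative sketch, not the library code: apply_rule is a hypothetical helper and the tier container is a plain dict rather than a TieredODPairs.

```python
import numpy as np


def apply_rule(tier, rule):
    """Apply a vectorized rule to every per-origin array of one tier (hypothetical helper)."""
    if tier is None:
        return None  # tiers that are None stay None
    return {origin: np.asarray(rule(np.asarray(vals)), dtype=bool)
            for origin, vals in tier.items()}


# Distances in metres, keyed by origin node, one dict per tier.
dists = {
    "cells_to_cells": {1: np.array([200.0, 800.0])},
    "cells_to_zones": {1: np.array([4_000.0, 60_000.0])},
    "zones_to_zones": None,
}
# Keep only pairs closer than 50 km.
mask = {name: apply_rule(tier, lambda d: d < 50_000) for name, tier in dists.items()}
```

The mask keeps the same origins and per-origin array lengths as the input, so it stays position-aligned with the destination arrays.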

aperta.od_pairs.get_pairs(cells, r_cells, node_column, *, zones=None, r_zones=None, r_medium=None, zones_centroids=None, orig_cells=None, dest_cells=None, dest_zones=None)[source]

Build a tiered OD-pair table with cells_to_cells + cells_to_zones + zones_to_zones tiers.

Tier classification uses ZONE-PAIR distance (for clean mutual exclusion):

d(Z, Z') < r_cells            → cells_to_cells (close)
r_cells ≤ d(Z, Z') < r_medium → cells_to_zones (medium — cell origin, zone dest)
r_medium ≤ d(Z, Z') < r_zones → zones_to_zones (far — zone origin, zone dest)
else                          → drop

The middle tier preserves cell-origin precision (different cells in the same zone get separate per-origin dest arrays) while aggregating dests at zone level. Important when zone diameter is meaningful relative to the medium-tier radius — cell variation within a zone is ~10% of trip cost at ~10 km medium-tier distances.

Input contract: cells must have node_column (and zone_id if zones is given). zones, if provided, must also have node_column. NaN values in node_column mean the cell / zone contributes no destinations.

Parameters:
  • cells (GeoDataFrame) – cell-level GeoDataFrame.

  • r_cells (float) – per-zone-pair distance threshold (CRS units, typically metres) for the cell tier. Zone pairs closer than this emit per-cell OD pairs.

  • node_column (str) – column name on cells/zones holding the network node ID.

  • zones (GeoDataFrame | None) – optional zones GeoDataFrame to enable the medium and far tiers.

  • r_zones (float | None) – per-zone-pair OUTER distance threshold for the far (zones_to_zones) tier. Required iff zones is given.

  • r_medium (float | None) – per-zone-pair distance threshold separating the middle (cells_to_zones) and far (zones_to_zones) tiers. Must satisfy r_cells ≤ r_medium ≤ r_zones. Optional: when None, auto-inferred as min(r_cells * 10, r_zones) — a reasonable default for the typical case where the medium-tier sweet spot is ~10× the cell-tier radius (e.g. r_cells=1 km, r_medium=10 km, for car). Set explicitly to control the medium-vs-far storage trade-off.

  • zones_centroids (GeoSeries | None) – optional custom zone centroids (e.g. population-weighted). Falls back to zones.geometry.centroid.

  • orig_cells (Series | ndarray | None) – optional boolean mask (Series or numpy array) aligned with cells.index. When provided, only cells where the mask is True act as origins; cells where False contribute no OD pairs FROM them. None (default) treats every cell as an origin.

  • dest_cells (Series | ndarray | None) – optional boolean mask aligned with cells.index. When provided, only cells where True are emitted as cell-tier destinations (other cells can still be routed TO at zone tier). None = every cell is a valid cell-tier destination.

  • dest_zones (Series | ndarray | None) – optional boolean mask aligned with zones.index. When provided, only zones where True are emitted as middle- or far-tier destinations. None = every zone is valid.

The mask filters are critical for large-area analyses: when most of the area has no opportunity-of-interest (e.g. supermarkets exist in ~1% of cells), filtering to relevant destinations drops OD-pair counts by 1-2 orders of magnitude and routing time accordingly.

Returns:

TieredODPairs with cells_to_cells always populated, plus cells_to_zones and zones_to_zones if zones is given (either may be empty / absent if the corresponding annulus is empty — e.g. when r_medium == r_zones, the far tier is empty).

Return type:

TieredODPairs
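
The r_medium default and ordering constraint described above can be sketched as follows. resolve_r_medium is a hypothetical helper that follows the documented rule; the library's actual validation and error messages may differ.

```python
def resolve_r_medium(r_cells, r_zones, r_medium=None):
    """Infer or validate the medium-tier radius (hypothetical helper)."""
    if r_medium is None:
        # Documented default: ten times the cell-tier radius, capped at r_zones.
        r_medium = min(r_cells * 10, r_zones)
    if not (r_cells <= r_medium <= r_zones):
        raise ValueError("r_medium must satisfy r_cells <= r_medium <= r_zones")
    return r_medium


default_radius = resolve_r_medium(1_000, 50_000)          # 10 km
capped_radius = resolve_r_medium(1_000, 5_000)            # capped at r_zones
explicit_radius = resolve_r_medium(1_000, 50_000, 20_000)
```

When the cap applies (r_medium == r_zones), the far annulus is empty, which matches the note on the return value.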

aperta.od_pairs.node_values(column, df, node_column, node_list)[source]

Single-tier lookup of column for every node in node_list.

Parameters:
  • column (str) – column on df whose value to return for each node.

  • df (DataFrame) – source DataFrame.

  • node_column (str) – column on df holding the network node ID for each row.

  • node_list (Series | list | ndarray) – node IDs to look up. Returned array is position-aligned with this input.

Return type:

ndarray

aperta.od_pairs.dest_values(column, pairs, cells, node_column, zones=None, *, dtype=<class 'numpy.float32'>)[source]

Look up column for every destination in pairs, tier by tier.

Returns a TieredODPairs of value arrays paired position-wise with the input destination arrays. The middle tier (cells_to_zones) and the far tier (zones_to_zones) both look values up in zones[column] at the zone node IDs.

Conservation invariant: if column is additive (e.g. population) and is consistently aggregated cells → zones, then for each origin cell i the sum of values across all three tiers equals the total of cells[column] within the routing cutoff (no double-counting across tiers).

Parameters:
  • column (str) – see above.

  • pairs (TieredODPairs) – see above.

  • cells (DataFrame) – see above.

  • node_column (str) – see above.

  • zones (DataFrame | None) – see above.

  • dtype (dtype | type) – dtype of returned value arrays (default np.float32 — matches the default for routing.tiered_path_costs so that downstream accessibility arithmetic stays in FP32 end-to-end). Missing destinations get np.nan, so dtype must be a float dtype.

Return type:

TieredODPairs
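
The conservation invariant can be checked by hand on a toy case. The sketch below uses geo-unit IDs instead of node IDs for readability and assumes the origin cell is listed among its own cell-tier destinations; all names and numbers are made up for illustration.

```python
# Population per cell and the parent zone of each cell.
cell_pop = {"a1": 10, "a2": 20, "b1": 5, "b2": 15, "c1": 30, "c2": 40}
cell_zone = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}

# Consistent cells -> zones aggregation of the additive column.
zone_pop = {}
for cell, zone in cell_zone.items():
    zone_pop[zone] = zone_pop.get(zone, 0) + cell_pop[cell]

# Tier assignment for origin cell a1: zone A is close, B is medium, C is far.
cells_to_cells = {"a1": ["a1", "a2"]}
cells_to_zones = {"a1": ["B"]}
zones_to_zones = {"A": ["C"]}  # keyed by the origin's parent zone

origin = "a1"
total = (
    sum(cell_pop[c] for c in cells_to_cells[origin])
    + sum(zone_pop[z] for z in cells_to_zones[origin])
    + sum(zone_pop[z] for z in zones_to_zones[cell_zone[origin]])
)
```

Each destination zone appears in exactly one tier, so the three partial sums add up to the population of all cells inside the cutoff.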

aperta.od_pairs.reindex_by_geo_unit(pairs, odm, cells, *, cell_node_column, zones=None, zone_node_column=None, r_cells=None, r_medium=None, r_zones=None, zones_centroids=None)[source]

Convert a node-keyed (pairs, odm) pair into geo-unit-keyed form.

Keys at each tier become:

cells_to_cells : cell_id (from cells.index) → cell_id dest array
cells_to_zones : cell_id                    → zone_id dest array
zones_to_zones : zone_id (from zones.index) → zone_id dest array

Dest arrays are sorted by ID per origin — this canonical ordering enables cross-mode alignment in aggregate_across_modes (different modes produce different node-level snapping, but their geo-keyed forms align on the shared cell / zone ID universe).

Fan-out: each (origin_node, dest_node) entry in the input expands to |cells at origin_node| × |cells at dest_node| entries at cell tier (same pattern at zone tier). Memory cost scales with average units-per-node.

Per-cell tier filtering (strongly recommended when zones is given):

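Pass r_cells, r_medium, and (for the zones_to_zones tier) r_zones to apply each origin cell’s own zone-pair tier classification at reindex time. Below, c2c, c2z and z2z abbreviate cells_to_cells, cells_to_zones and zones_to_zones.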

Without this filter, a snap node N that serves cells across multiple zones inherits the union of those zones’ tier dest-sets in the node-keyed pairs (see get_pairs). At reindex, every cell on N receives that union — so the same destination zone can end up in both c2c (as individual cells) and c2z (as aggregated zone) for the same origin cell, double-counting it in cell-level logsums. The filter restricts each cell’s output to its own zone’s tier sets; nothing is lost (every kept destination was routed) and the c2c/c2z/z2z tiers become mutually disjoint per cell.

Parameters:
  • pairs (TieredODNodePairs) – node-keyed destination-ID table from get_pairs.

  • odm (TieredODNodePairs | None) – node-keyed cost / utility / value ODM aligned to pairs. None to reindex only pairs (returns (new_pairs, None)).

  • cells (DataFrame) – cell-level DataFrame, indexed by cell_id. Must have cell_node_column. When the per-cell tier filter is active (r_cells/r_medium given), must also have zone_id.

  • cell_node_column (str) – column on cells carrying the cell-tier network node ID.

  • zones (DataFrame | None) – optional zones DataFrame indexed by zone_id. Required iff pairs.cells_to_zones or pairs.zones_to_zones is set.

  • zone_node_column (str | None) – column on zones carrying the zone-tier network node ID. Required iff zones is given.

  • r_cells (float | None) – per-zone-pair cell-tier radius (CRS units). Pass the same value used in get_pairs. Enables per-cell tier filtering when given alongside r_medium.

  • r_medium (float | None) – per-zone-pair c2z outer radius (CRS units). Same value as in get_pairs. Required iff r_cells is given.

  • r_zones (float | None) – per-zone-pair z2z outer radius (CRS units). Same value as in get_pairs. Required to filter the z2z tier; if omitted while z2z is present, a warning is issued and z2z is reindexed unfiltered.

  • zones_centroids (GeoSeries | None) – optional custom zone centroids (e.g. population-weighted). Falls back to zones.geometry.centroid. Only used when the per-cell tier filter is active.

Returns:

(new_pairs, new_odm) — both TieredODGeoPairs (or (new_pairs, None) if odm was None). Tiers absent from pairs stay None.

Return type:

tuple[TieredODGeoPairs, TieredODGeoPairs | None]
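
The cell-tier fan-out and the canonical dest ordering can be sketched in a few lines. This is a simplified illustration, not the library routine: fan_out_cell_tier is hypothetical, it ignores the aligned odm values, and it applies no per-cell tier filter.

```python
from collections import defaultdict


def fan_out_cell_tier(node_pairs, cell_node):
    """Expand {origin_node -> [dest_nodes]} into {origin_cell -> sorted [dest_cells]}.

    Hypothetical helper; cell_node maps each cell_id to its snapped node.
    """
    cells_at = defaultdict(list)
    for cell, node in cell_node.items():
        cells_at[node].append(cell)

    geo_pairs = {}
    for origin_node, dest_nodes in node_pairs.items():
        # Every cell snapped to a dest node becomes its own destination entry,
        # sorted by ID so that different modes align on the same ordering.
        dest_cells = sorted(c for n in dest_nodes for c in cells_at.get(n, []))
        for origin_cell in cells_at.get(origin_node, []):
            geo_pairs[origin_cell] = list(dest_cells)
    return geo_pairs


cell_node = {"c1": 1, "c2": 1, "c3": 2}   # c1 and c2 share node 1
node_pairs = {1: [2], 2: [1]}
geo_pairs = fan_out_cell_tier(node_pairs, cell_node)
```

Here the node pair (1, 2) expands to 2 × 1 entries and (2, 1) to 1 × 2 entries, which is the units-per-node scaling described above.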

aperta.od_pairs.dest_values_geo(column, pairs, cells, zones=None, *, dtype=<class 'numpy.float32'>)[source]

Look up column for every destination in a geo-keyed pairs, per tier.

The geo-keyed twin of dest_values. Because destinations in TieredODGeoPairs are already individual geo-units (no node-level aggregation), each tier just looks up the value column on the matching DataFrame — no per-node summing. Structurally simpler and more honest than the node-keyed version: no implicit “many cells share a node, sum their values” assumption baked in.

Parameters:
  • column (str) – name of the value column to look up. Must be present on cells (and on zones for tiers that use it).

  • pairs (TieredODGeoPairs) – geo-keyed destination-ID table (typically from reindex_by_geo_unit).

  • cells (DataFrame) – cell-level DataFrame indexed by cell_id.

  • zones (DataFrame | None) – optional zones DataFrame indexed by zone_id. Required iff pairs.cells_to_zones or pairs.zones_to_zones is set.

  • dtype (dtype | type) – dtype of returned value arrays (default np.float32 — matches the default for routing.tiered_path_costs so downstream accessibility arithmetic stays in FP32 end-to-end). Missing destinations get np.nan, so dtype must be a float dtype.

Returns:

TieredODGeoPairs of value arrays, paired position-wise with the input destination arrays.

Return type:

TieredODGeoPairs

aperta.od_pairs.aggregate_across_modes(odms, *, aggregator='min', scale=1.0)[source]

Aggregate per-mode geo-keyed cost ODMs into a combined cost ODM.

Enables cross-modal accessibility metrics where the aggregation across modes happens inside the accessibility computation rather than externally to it. Inputs must be TieredODGeoPairs — different modes typically live on different graphs (different node ID universes), but their geo-unit IDs are shared, so alignment is only possible in geo-unit space. Use reindex_by_geo_unit to lift per-mode node-keyed ODMs first.

Each mode contributes (pairs, costs):

  • pairs: geo-keyed TieredODGeoPairs of dest unit IDs.

  • costs: geo-keyed TieredODGeoPairs of cost values aligned to pairs.

For each (origin, dest_unit) pair in the UNION across modes:

  • If a mode has the pair, use its cost.

  • If a mode is missing it (origin not in the mode, or dest not in the mode’s per-origin dest array), fill with +inf (“unreachable by this mode”).

Then apply the aggregator across modes to produce a single combined cost.

Three aggregator semantics:

  • 'min' (default): per OD pair, take the minimum cost across modes. Use case: “how reachable is each destination under the fastest available mode?” Combine with cumulative_opportunities for “destinations within X min by ANY mode”; with gravity or nearest_k for fastest-mode variants.

  • 'logsum': per OD pair, compute -scale * ln Σ_m exp(-cost_m / scale), the discrete-choice log-sum cost across modes. scale is the nest scale parameter (θ); the default of 1.0 gives the canonical -ln Σ exp(-U), where U is the per-mode cost read as a disutility (positive = worse). Combine with gravity(beta=1, family='exp') to produce the canonical cross-modal logsum accessibility.

  • Custom callable: takes a (n_modes, n_dests) numpy array and returns a (n_dests,) array. Use for any aggregator not covered above (weighted average, max, etc.).

Sign convention: per-mode costs should be positive disutilities (travel time, generalised cost, negated utility). For utility-as-benefit conventions (positive = attractive), negate before passing.

Parameters:
  • odms (dict[str, tuple[TieredODGeoPairs, TieredODGeoPairs]]) – {mode_name -> (pairs, costs)}. Must be non-empty. Both pairs and costs must be TieredODGeoPairs. Tier structure (which tiers are populated) must be consistent across modes.

  • aggregator (str | Callable) – ‘min’, ‘logsum’, or a callable.

  • scale (float) – nest scale parameter for ‘logsum’ aggregation. Ignored for other aggregators.

Returns:

(union_pairs, combined_costs) — both TieredODGeoPairs. The union_pairs carries the per-origin union of dest IDs across modes (sorted canonically); combined_costs is aligned to it. NaN/inf are handled per-aggregator (‘min’ skips NaN, treats inf as finite-worst; ‘logsum’ treats both as “mode contributes nothing to the sum”).

Return type:

tuple[TieredODGeoPairs, TieredODGeoPairs]
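
The union, +inf fill and aggregation steps can be sketched for a single origin. This is a minimal illustration under assumptions, not the library code: combine_origin is a hypothetical helper, it works on one origin of one tier, and it does not reproduce the NaN handling described above.

```python
import numpy as np


def combine_origin(per_mode, aggregator="min", scale=1.0):
    """Combine {mode -> (dest_ids, costs)} for one origin (hypothetical helper)."""
    # Per-origin union of destination IDs across modes, sorted canonically.
    union = sorted(set().union(*(ids for ids, _ in per_mode.values())))
    pos = {dest: i for i, dest in enumerate(union)}

    # (n_modes, n_dests) cost matrix; +inf marks "unreachable by this mode".
    stacked = np.full((len(per_mode), len(union)), np.inf)
    for m, (ids, costs) in enumerate(per_mode.values()):
        for dest, cost in zip(ids, costs):
            stacked[m, pos[dest]] = cost

    if aggregator == "min":
        combined = stacked.min(axis=0)
    elif aggregator == "logsum":
        # -scale * ln(sum_m exp(-cost_m / scale)), computed stably.
        combined = -scale * np.logaddexp.reduce(-stacked / scale, axis=0)
    elif callable(aggregator):
        combined = aggregator(stacked)
    else:
        raise ValueError(f"unknown aggregator: {aggregator!r}")
    return union, combined


per_mode = {
    "car": ([1, 2], [10.0, 20.0]),
    "bike": ([2, 3], [15.0, 5.0]),
}
dests, cost_min = combine_origin(per_mode, "min")
_, cost_logsum = combine_origin(per_mode, "logsum")
_, cost_max = combine_origin(per_mode, lambda a: a.max(axis=0))
```

A mode filled with +inf contributes exp(-inf) = 0 to the logsum, so destination 1 (car only) keeps its car cost, while destination 2 (both modes) ends up slightly below the cheaper of its two costs.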

aperta.od_pairs.get_euclidean_dists(nodes, pairs, *, dtype=<class 'numpy.float32'>)[source]

Euclidean origin→destination distance for every pair in pairs, per tier.

nodes must cover every node ID referenced anywhere in pairs (cell and zone nodes). Distance is in the units of nodes’ CRS.

Return type:

TieredODPairs
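
The per-tier computation amounts to a coordinate lookup followed by a vectorized norm. A sketch assuming node coordinates are available as a plain {node_id -> (x, y)} dict in a projected CRS; euclidean_tier is a hypothetical helper, and the real function takes a nodes table instead:

```python
import numpy as np


def euclidean_tier(tier, coords, dtype=np.float32):
    """Distances for one tier ({origin -> dest IDs}); hypothetical helper."""
    if tier is None:
        return None
    out = {}
    for origin, dests in tier.items():
        # Offsets from the origin to every destination, shape (n_dests, 2).
        delta = (np.array([coords[d] for d in dests], dtype=float).reshape(-1, 2)
                 - np.asarray(coords[origin], dtype=float))
        out[origin] = np.hypot(delta[:, 0], delta[:, 1]).astype(dtype)
    return out


coords = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
dists = euclidean_tier({1: [2, 3], 2: []}, coords)
```

The output arrays are position-aligned with the destination arrays, so they can be fed directly to a vectorized rule such as the one make_mask expects.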

aperta.od_pairs.max_cost(costs)[source]

Largest finite value across all tiers of costs.

Designed as a one-liner upper bound to pass as cutoff= to downstream routing helpers (routing.tiered_path_costs, traffic_flows.estimate_edge_flows, network_processing.get_nested_edge_betweenness, etc.) when the routing targets are guaranteed to live inside this costs ODM — e.g. the canonical traffic-flows workflow:

costs  = routing.tiered_path_costs(pairs, g, weight)
sample = traffic_flows.nested_node_sample(pairs, weights, costs, ...)
flows  = traffic_flows.estimate_edge_flows(g, weight, expected_km, sample,
                           cutoff=od_pairs.max_cost(costs))

Correctness-preserving: every (origin, dest) pair with a finite cost in costs is reachable within this value by definition, so a downstream csg.dijkstra(limit=cutoff) won’t clip any pair the caller cares about. The bound is looser than a per-origin or per-sampled-pair maximum, but it still captures the bulk of the cutoff speedup at zero plumbing cost.

Non-finite entries (np.inf for unreachable, np.nan) are ignored. Returns 0.0 if every tier is empty / all-unreachable — safe to pass as a cutoff (means “route nothing”).

Parameters:

costs (TieredODPairs)

Return type:

float
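
The reduction itself is small. A minimal sketch of the described behaviour, with tiers passed as a list of plain dicts (or None); largest_finite is a hypothetical name, not the library function:

```python
import numpy as np


def largest_finite(tiers):
    """Largest finite value across tiers, or 0.0 if there is none (hypothetical helper)."""
    best = None
    for tier in tiers:
        if not tier:  # skip None and empty tiers
            continue
        for values in tier.values():
            arr = np.asarray(values, dtype=float)
            finite = arr[np.isfinite(arr)]  # ignore np.inf and np.nan
            if finite.size:
                top = float(finite.max())
                best = top if best is None else max(best, top)
    return 0.0 if best is None else best


cutoff = largest_finite([
    {1: np.array([120.0, np.inf])},
    {1: np.array([900.0, np.nan])},
    None,
])
empty_cutoff = largest_finite([{1: np.array([np.inf])}, None])
```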