`aperta.routing`

Routing primitives over networkx.Graph (and its multi/directed variants). Two engines, picked by query shape:

one-to-many / many-to-many: scipy CSR Dijkstra (with optional cutoff), used by tiered_path_costs and tiered_path_aggregate. Scipy’s per-origin single-source shortest-path (SSSP) run is amortized across all targets sharing that origin, and its limit= argument bounds the frontier expansion.

one-to-one: networkx bidirectional Dijkstra, used by shortest_path_metrics_one_to_one (with edge-feature aggregation) and shortest_path_costs_one_to_one (cost-only, lean). Empirically faster than scipy CSR for per-trip queries with unique sources: bidirectional pruning roughly halves the explored node count, and there is no per-call C-boundary overhead.

The module also exposes the edge-weighting layer used by both engines: apply_edge_weights runs a user-supplied callable on each edge and writes the result to a named edge attribute. mask_excluded_edges wraps a weight callable to return cost = ∞ for edges flagged by routing_prep.prepare_network as non-traversable for a given mode.

Why no contraction-hierarchy backend (Pandana / OSRM): aperta’s routing workflow is one-shot Dijkstra on a live graph whose edge weights are routinely mutated (calibration loop, scenario comparison, time-of-day variants). A preprocess-once, reuse-many backend fights this pattern, because the preprocessing would have to be redone whenever the weights change.

A future RoutingProfile class will bundle the duration callable + parameters + graph into one object; for now, callers compose the pieces.

aperta.routing.apply_edge_weights(graph, weight_fn, weight, **fn_kwargs)

Apply weight_fn to each edge of graph and write the result to the edge attribute named by weight (mutates graph in place).

weight_fn receives the edge data dict plus any extra fn_kwargs. The dict supports row['key'] access just like a pandas Series, so callables written against a GeoDataFrame edge-row pattern work without modification.

Parameters:

graph (Graph)
weight_fn (Callable)
weight (str)

Return type:

None
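As a rough illustration of the behaviour described above, the following sketch re-implements the idea for a plain networkx.Graph. The names apply_edge_weights_sketch and walking_duration, and the speed_mps keyword, are hypothetical; the real function may treat multigraphs and input validation differently.

```python
import networkx as nx


def walking_duration(row, speed_mps=1.4):
    """Hypothetical weight callable: seconds needed to walk one edge."""
    return row["length"] / speed_mps


def apply_edge_weights_sketch(graph, weight_fn, weight, **fn_kwargs):
    """Store weight_fn(edge_data, **fn_kwargs) under edge attribute `weight`, in place."""
    for _u, _v, data in graph.edges(data=True):
        # `data` is the live attribute dict of the edge, so this mutates the graph.
        data[weight] = weight_fn(data, **fn_kwargs)


graph = nx.Graph()
graph.add_edge("a", "b", length=140.0)
graph.add_edge("b", "c", length=70.0)

apply_edge_weights_sketch(graph, walking_duration, "duration_walk_s", speed_mps=1.4)
print(graph["a"]["b"]["duration_walk_s"], graph["b"]["c"]["duration_walk_s"])
```

Because the callable only indexes row["length"], the same function would also accept a pandas Series row.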

aperta.routing.mask_excluded_edges(weight_fn, cost_excluded_flag)

Wrap weight_fn so edges flagged cost_excluded_flag=True get cost = ∞.

Companion to routing_prep.prepare_network, which writes the per-edge boolean flag based on a mode’s cost_excluded_tags. The flag name is recorded on PreparedGraph.cost_excluded_flag; pass it here to bake the mode-specific edge exclusion into a routing-weight callable.

Typical use:

    prepared = prepare_network(graph, "walk")
    weight_fn = mask_excluded_edges(duration_walk_fn, prepared.cost_excluded_flag)
    apply_edge_weights(prepared.graph, weight_fn, "duration_walk_s")

Edges with the flag missing or False flow through to weight_fn unchanged.

Parameters:

weight_fn (Callable)
cost_excluded_flag (str)

Return type:

Callable
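The wrapping behaviour can be sketched in a few lines. This is an assumption about the general shape, not the library's implementation; mask_excluded_edges_sketch, duration_walk_fn and the flag name walk_excluded are hypothetical.

```python
import math


def mask_excluded_edges_sketch(weight_fn, cost_excluded_flag):
    """Return a callable that yields inf for edges whose flag is truthy."""

    def masked_weight_fn(row, **fn_kwargs):
        # A missing or False flag falls through to the wrapped callable.
        if row.get(cost_excluded_flag, False):
            return math.inf
        return weight_fn(row, **fn_kwargs)

    return masked_weight_fn


def duration_walk_fn(row, speed_mps=1.4):
    """Hypothetical duration callable (seconds)."""
    return row["length"] / speed_mps


weight_fn = mask_excluded_edges_sketch(duration_walk_fn, "walk_excluded")

motorway = {"length": 500.0, "walk_excluded": True}   # flagged: not walkable
footpath = {"length": 70.0, "walk_excluded": False}   # explicitly allowed
unflagged = {"length": 140.0}                         # flag missing

print(weight_fn(motorway), weight_fn(footpath), weight_fn(unflagged))
```

An infinite edge cost keeps the edge in the graph while making any path through it infinitely expensive, so the graph topology does not need to be copied per mode.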

aperta.routing.shortest_path_metrics_one_to_one(graph, trip_ids, origins, destinations, weight, length_attr='length', edge_features=None)

Paired (origin, destination) routing via nx.bidirectional_dijkstra, with edge-feature aggregation along each realised path.

edge_features maps an edge attribute name to an aggregation:

'sum': element-wise sum along the path
'length_weighted': average weighted by edge length
'duration_weighted': average weighted by per-edge weight (i.e. the routing cost, typically duration)

Returns a DataFrame indexed by trip_id with columns:

distance (sum of length_attr along the path)
cost (sum of weight along the path)
one column per requested edge feature

Trips with no path (NetworkXNoPath) or missing endpoints (NodeNotFound) are silently dropped from the output (so output length ≤ input length). Self-pairs yield zero for every column.

Why bidirectional Dijkstra: aperta’s one-to-one workloads (calibration against ground-truth trips, validation against measured times) tend to have unique origins per trip, so origin-batching can’t be exploited. Bidirectional pruning roughly halves the explored node count per query, and the search runs entirely in Python without per-call C-boundary overhead. Empirically, it beats scipy CSR and igraph on country-scale graphs.

For cost-only callers, use shortest_path_costs_one_to_one — same routing engine, no path-walk or feature-aggregation overhead.

Raises DataError if 'distance' or 'cost' appears as a key in edge_features, since those names are reserved for the built-in output columns.

Parameters:

graph (Graph)
trip_ids (list | Series | ndarray)
origins (list | Series | ndarray)
destinations (list | Series | ndarray)
weight (str)
length_attr (str)
edge_features (dict[str, str] | None)

Return type:

DataFrame
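The three edge_features aggregations listed above reduce to simple arithmetic once a path's edges are known. The sketch below works on a plain list of edge dicts; aggregate_feature and the attribute names slope, crossings and duration_s are hypothetical, and returning 0.0 for a zero-weight path is an assumption made to match the documented zero result for self-pairs.

```python
def aggregate_feature(edges, attribute, how, weight, length_attr="length"):
    """Aggregate one edge attribute along a path given as a list of edge dicts."""
    values = [edge[attribute] for edge in edges]
    if how == "sum":
        return sum(values)
    if how == "length_weighted":
        weights = [edge[length_attr] for edge in edges]
    elif how == "duration_weighted":
        weights = [edge[weight] for edge in edges]
    else:
        raise ValueError(f"unknown aggregation: {how!r}")
    total = sum(weights)
    if total == 0:
        # Assumption: no edges (or zero total weight) yields 0.0.
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


path_edges = [
    {"length": 100.0, "duration_s": 50.0, "slope": 0.02, "crossings": 1},
    {"length": 300.0, "duration_s": 250.0, "slope": 0.06, "crossings": 2},
]

crossings = aggregate_feature(path_edges, "crossings", "sum", weight="duration_s")
slope_by_length = aggregate_feature(path_edges, "slope", "length_weighted", weight="duration_s")
slope_by_duration = aggregate_feature(path_edges, "slope", "duration_weighted", weight="duration_s")
print(crossings, slope_by_length, slope_by_duration)
```

The two weighted averages differ because the steeper edge accounts for 75% of the path length but about 83% of its duration.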

aperta.routing.shortest_path_costs_one_to_one(graph, trip_ids, origins, destinations, weight)

Cost-only one-to-one routing — lean variant of shortest_path_metrics_one_to_one.

Skips path reconstruction and per-edge feature aggregation entirely: uses the path-cost value nx.bidirectional_dijkstra already returns. ~2-3× faster end-to-end than the full-metrics function for callers that only want the routed cost (e.g. calibration / validation against measured durations).

Same per-trip drop semantics as the full-metrics function: trips with no path (NetworkXNoPath) or endpoints not in the graph (NodeNotFound) are silently absent from the output Series. Self-pairs (origin == dest) yield 0.0.

Parameters:

graph (Graph) – networkx graph; edges carry weight.
trip_ids (list | Series | ndarray) – per-trip identifier, becomes the output Series index.
origins (list | Series | ndarray) – per-trip origin node IDs. Must be the same length as trip_ids.
destinations (list | Series | ndarray) – per-trip destination node IDs. Must be the same length as trip_ids.
weight (str) – edge attribute name to use as routing cost.

Returns:

pd.Series[float] indexed by trip_id, named 'cost'. Output length ≤ input length (unreachable / unknown-endpoint trips dropped).

Return type:

Series
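The drop semantics can be illustrated with nx.bidirectional_dijkstra directly. This is a minimal sketch that returns a plain dict rather than a pandas Series; costs_one_to_one_sketch is a hypothetical name and the loop is only an assumption about how the per-trip handling might look.

```python
import networkx as nx


def costs_one_to_one_sketch(graph, trip_ids, origins, destinations, weight):
    """Return {trip_id: cost}, dropping trips that cannot be routed."""
    costs = {}
    for trip_id, origin, destination in zip(trip_ids, origins, destinations):
        try:
            # bidirectional_dijkstra returns (cost, path); only the cost is kept.
            cost, _path = nx.bidirectional_dijkstra(graph, origin, destination, weight=weight)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            continue  # dropped: unreachable or unknown endpoint
        costs[trip_id] = float(cost)
    return costs


graph = nx.Graph()
graph.add_edge("a", "b", duration_s=60.0)
graph.add_edge("b", "c", duration_s=30.0)
graph.add_node("island")  # present in the graph but disconnected

costs = costs_one_to_one_sketch(
    graph,
    trip_ids=[1, 2, 3, 4],
    origins=["a", "a", "a", "a"],
    destinations=["c", "a", "island", "missing"],
    weight="duration_s",
)
print(costs)
```

Trips 3 and 4 are absent from the result, so callers that need position-wise alignment with the input should reindex on trip_id afterwards.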

aperta.routing.tiered_path_costs(graph, pairs, weight, *, mask=None, cutoff=None, dtype=<class 'numpy.float32'>)

Shortest-path cost (sum of edge weight along the path) for every OD pair in pairs, across all tiers.

Single-process. This is the hot path for almost every aperta application — the closure-based inner loop is on purpose, not a refactor candidate (a module-level worker pattern adds per-origin dict lookups that measurably slow down single-process routing).

Every tier is routed across the same graph. All node IDs referenced anywhere in pairs — cell nodes (cells_to_cells keys + values), zone nodes (cells_to_zones values, zones_to_zones keys + values) — must therefore be present in graph.

Parameters:

graph (Graph) – networkx routable graph containing every node referenced in pairs. Converted internally to a scipy CSR matrix.
pairs (TieredODPairs) – TieredODPairs of destination IDs (typically from od_pairs.get_pairs).
weight (str) – edge attribute name used as the per-edge routing cost (e.g. 'duration_naive', 'duration_traffic_iterative').
mask (TieredODPairs | None) – optional boolean TieredODPairs (build via od_pairs.make_mask). Destinations where the mask is False are skipped and stored as np.inf in the output (same convention as unreachable). Output arrays keep the same length as the input pairs (position-wise alignment is preserved); use the mask itself to distinguish “masked-out” from “unreachable” if you care. Missing origins or missing tiers in the mask are treated as “no filter”.
cutoff (float | None) – optional network-distance cutoff in weight units (e.g. seconds for time-weighted edges, metres for length-weighted). Passed through to scipy.sparse.csgraph.dijkstra as limit=cutoff, truncating each per-origin Dijkstra at the cutoff. Big speed-up when the cutoff is small relative to graph diameter (e.g. walk accessibility on a country-scale graph). Destinations beyond cutoff are stored as np.inf — same convention as unreachable, so downstream metrics (cumulative_opportunities etc.) handle them naturally. Default None = no cutoff (limit=np.inf).
dtype (dtype | type) – dtype of returned cost arrays (default np.float32 — halves memory + on-disk size vs float64, with seconds-resolution precision more than sufficient for travel costs). Pass np.float64 if downstream arithmetic needs the extra range (e.g. logsum with very small scale parameter).

Returns:

TieredODPairs of cost arrays paired position-wise with pairs. Each unreachable, masked-out, or beyond-cutoff destination is stored as np.inf.

Return type:

TieredODPairs
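The effect of cutoff can be seen on a toy CSR matrix. The sketch below calls scipy.sparse.csgraph.dijkstra directly and leaves out the TieredODPairs and mask handling; the four-node graph and all variable names are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Line graph 0 - 1 - 2 - 3 with edge costs of 10, 20 and 30 seconds.
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])
seconds = np.array([10.0, 20.0, 30.0])
matrix = csr_matrix((seconds, (rows, cols)), shape=(4, 4))

origin = 0
targets = np.array([1, 2, 3])

# One single-source run serves every target that shares this origin.
full = dijkstra(matrix, directed=False, indices=origin)
# limit= stops the search at 35 s; anything farther is reported as inf.
bounded = dijkstra(matrix, directed=False, indices=origin, limit=35.0)

costs_full = full[targets].astype(np.float32)
costs_bounded = bounded[targets].astype(np.float32)
print(costs_full, costs_bounded)
```

Node 3 lies 60 s from the origin, so it comes back as inf in the bounded run, the same value an unreachable node would receive.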

class aperta.routing.PathAggregation(name, attribute, aggregator='sum')

Bases: NamedTuple

Named per-edge feature aggregation along realised shortest paths.

name labels the corresponding output column in tiered_path_aggregate’s return dict. attribute extracts a per-edge value; aggregator combines those values into one scalar per OD pair.

attribute:

str: name of an edge attribute on the graph; the per-edge value is edge_data[attribute].
Callable[(u, v, data) -> float]: arbitrary per-edge function.

aggregator:

'sum': sum across path edges (returns 0 for an empty path).
'mean': arithmetic mean (returns NaN for an empty path).
'min', 'max': respective extremes (NaN for an empty path).
Callable[(np.ndarray) -> float]: arbitrary callable on the per-edge value array.

Parameters:

name (str)
attribute (str | Callable)
aggregator (str | Callable)

name: str – Alias for field number 0

attribute: str | Callable – Alias for field number 1

aggregator: str | Callable – Alias for field number 2
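The empty-path rules for the built-in aggregators can be sketched with NumPy. PathAggregationSketch and reduce_values are hypothetical stand-ins written for this example; only the string form of attribute is handled here.

```python
from typing import Callable, NamedTuple, Union

import numpy as np


class PathAggregationSketch(NamedTuple):
    """Hypothetical stand-in with the same three fields as PathAggregation."""

    name: str
    attribute: Union[str, Callable]
    aggregator: Union[str, Callable] = "sum"


def reduce_values(values, aggregator):
    """Combine per-edge values into one scalar, following the rules listed above."""
    values = np.asarray(values, dtype=float)
    if callable(aggregator):
        return float(aggregator(values))
    if aggregator == "sum":
        return float(values.sum())  # the sum of an empty array is 0.0
    if values.size == 0:
        return float("nan")  # mean / min / max are undefined on an empty path
    return float({"mean": np.mean, "min": np.min, "max": np.max}[aggregator](values))


spec = PathAggregationSketch(name="max_slope", attribute="slope", aggregator="max")
edge_data = [{"slope": 0.02}, {"slope": 0.06}, {"slope": 0.01}]
values = [data[spec.attribute] for data in edge_data]

max_slope = reduce_values(values, spec.aggregator)
empty_sum = reduce_values([], "sum")
empty_mean = reduce_values([], "mean")
print(max_slope, empty_sum, empty_mean)
```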

class aperta.routing.NodeAggregation(name, attribute, aggregator='sum', include_endpoints=True)

Bases: NamedTuple

Named per-node feature aggregation along realised shortest paths.

Parallel to PathAggregation but for node attributes (e.g. counting traffic signals encountered, or finding the highest-elevation node along a route). The node sequence of a path is [u₀, u₁, …, uₙ]; include_endpoints controls whether the route’s origin (u₀) and destination (uₙ) nodes contribute.

name, aggregator: as in PathAggregation.

attribute:

str: name of a node attribute on the graph; the per-node value is node_data[attribute].
Callable[(node, data) -> float]: arbitrary per-node function.

include_endpoints:

True (default): all n+1 nodes contribute, including origin and destination. Risk: endpoints shared across many routes get amplified weight in cross-route counts.
False: interior nodes only (u₁ .. uₙ₋₁). Self-pair [u] and single-edge path [u, v] both yield an empty array → aggregator empty-path semantics apply ('sum' → 0; 'mean'/'min'/'max' → NaN).

Parameters:

name (str)
attribute (str | Callable)
aggregator (str | Callable)
include_endpoints (bool)

name: str – Alias for field number 0

attribute: str | Callable – Alias for field number 1

aggregator: str | Callable – Alias for field number 2

include_endpoints: bool – Alias for field number 3
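The effect of include_endpoints comes down to a slice of the node sequence. In this sketch, nodes_contributing and the signals lookup are hypothetical; they only illustrate which nodes would feed the aggregator.

```python
def nodes_contributing(path, include_endpoints=True):
    """Return the nodes of `path` that feed the aggregator."""
    return list(path) if include_endpoints else list(path[1:-1])


# Hypothetical node attribute: 1 where a traffic signal is present.
signals = {"a": 1, "b": 0, "c": 1, "d": 1}
path = ["a", "b", "c", "d"]

with_endpoints = sum(signals[n] for n in nodes_contributing(path, True))
interior_only = sum(signals[n] for n in nodes_contributing(path, False))
print(with_endpoints, interior_only)

# Self-pairs and single-edge paths have no interior nodes at all.
print(nodes_contributing(["a"], False), nodes_contributing(["a", "b"], False))
```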

aperta.routing.aggregate_along_paths(graph, paths, weight, *, edge_aggregations=(), node_aggregations=(), dtype=<class 'numpy.float32'>)

Walk realised paths and aggregate per-edge / per-node features along each.

Pure path walker — no routing involved. Use this directly when you already have a list of paths (Strava traces, prebuilt routes, calibration targets, etc.). tiered_path_aggregate is the wrapper that routes shortest paths on a TieredODPairs and scatters results back into per-tier TieredODPairs outputs.

For each path:

cost = sum of weight along the path’s edges
each PathAggregation reduces per-edge attribute values
each NodeAggregation reduces per-node attribute values

paths semantics:

[] → unreachable: cost = inf, all aggregations = NaN
[u] → self-pair: cost = 0; edge aggregations follow empty-array semantics, node aggregations follow each spec’s include_endpoints setting
[u, v, …] → multi-node path: cost and aggregations are walked normally

Parameters:

graph (Graph) – networkx graph used for edge / node attribute lookup. For MultiGraph / MultiDiGraph the min-weight parallel edge is used (matches the router’s choice).
paths (list[list]) – list of node-id sequences (lists). Node IDs must match graph keys.
weight (str) – edge attribute name used as the per-edge cost.
edge_aggregations (Sequence[PathAggregation]) – list of PathAggregation specs (per-edge).
node_aggregations (Sequence[NodeAggregation]) – list of NodeAggregation specs (per-node). At least one of edge_aggregations / node_aggregations must be non-empty. Names must be unique across both lists.
dtype (dtype | type) – dtype of returned arrays (default np.float32).

Returns:

costs: ndarray of shape (len(paths),). inf for unreachable; 0.0 for self-pairs.
aggregations_by_name: dict {name -> ndarray} with one entry per spec across both lists. Unreachable destinations are NaN.

Return type:

(costs, aggregations_by_name)
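The path semantics and the min-weight parallel-edge rule can be sketched without networkx. Here parallel_edges is a hypothetical stand-in for a multigraph's adjacency, and walk_path handles a single summed attribute only, so it is an illustration rather than the library's walker.

```python
import math

# Hypothetical multigraph stand-in: (u, v) -> list of parallel edge dicts.
parallel_edges = {
    ("a", "b"): [
        {"duration_s": 60.0, "lit": 1},
        {"duration_s": 45.0, "lit": 0},  # cheapest parallel edge, the one a router takes
    ],
    ("b", "c"): [{"duration_s": 30.0, "lit": 1}],
}


def walk_path(path, weight, attribute):
    """Return (cost, summed attribute) for one node sequence."""
    if len(path) == 0:
        return math.inf, math.nan  # unreachable
    cost, total = 0.0, 0.0  # a self-pair [u] has no edges, so both stay 0
    for u, v in zip(path[:-1], path[1:]):
        # Read attributes from the same parallel edge the cost came from.
        data = min(parallel_edges[(u, v)], key=lambda d: d[weight])
        cost += data[weight]
        total += data[attribute]
    return cost, total


print(walk_path(["a", "b", "c"], "duration_s", "lit"))
print(walk_path(["a"], "duration_s", "lit"))
print(walk_path([], "duration_s", "lit"))
```

Taking the attribute from the cheapest parallel edge keeps cost and features consistent: the a→b hop contributes 45 s and lit = 0, not a mix of the two edges.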

aperta.routing.tiered_path_aggregate(graph, pairs, weight, *, edge_aggregations=(), node_aggregations=(), mask=None, cutoff=None, dtype=<class 'numpy.float32'>)

Route shortest paths and aggregate per-edge / per-node features along each.

Wraps aggregate_along_paths with routing on every tier of pairs. Memory cost matches tiered_path_costs for the cost component — paths are processed per-origin and discarded.

For the cost-only case (no aggregations needed), use tiered_path_costs directly: it can skip path retrieval (more expensive than distance retrieval) and is faster.

Parameters:

pairs (TieredODPairs) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra(return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.
graph (Graph) – as in tiered_path_costs.
weight (str) – as in tiered_path_costs.
mask (TieredODPairs | None) – as in tiered_path_costs.
cutoff (float | None) – as in tiered_path_costs.
dtype (dtype | type) – as in tiered_path_costs.
edge_aggregations (Sequence[PathAggregation]) – list of PathAggregation specs (per-edge).
node_aggregations (Sequence[NodeAggregation]) – list of NodeAggregation specs (per-node). At least one of the two must be non-empty. Names must be unique across both lists.

Returns:

costs: TieredODPairs of routing costs (sum of weight along the realised path). Same shape and conventions as tiered_path_costs. Unreachable / masked-out / beyond-cutoff destinations are np.inf.
aggregations_by_name: dict[name -> TieredODPairs]. One entry per spec (edge + node), keyed by spec name. Unreachable / masked-out / beyond-cutoff destinations are np.nan (not inf, since aggregations may be signed or already use inf semantics).

Return type:

(costs, aggregations_by_name)

For OSMnx-style MultiDiGraphs with multiple parallel edges between the same (u, v) pair, the edge with the lowest weight is used for both cost computation and attribute extraction (matching the router’s choice).

For self-pairs (origin == destination, path length 0): cost is 0.0, edge aggregations follow each aggregator’s empty-array semantics ('sum' → 0.0; 'mean'/'min'/'max' → NaN), node aggregations depend on each spec’s include_endpoints setting.
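Walking the predecessor chain, as mentioned in the parameter notes, might look like the following. The sketch uses scipy.sparse.csgraph.dijkstra on a made-up four-node directed graph; rebuild_path is a hypothetical helper, not part of aperta.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Directed graph: 0 -> 1 -> 2 costs 10 + 10, the direct edge 0 -> 2 costs 50.
# Node 3 has no edges and is therefore unreachable.
matrix = csr_matrix(
    (np.array([10.0, 10.0, 50.0]), (np.array([0, 1, 0]), np.array([1, 2, 2]))),
    shape=(4, 4),
)

origin = 0
dist, pred = dijkstra(matrix, directed=True, indices=origin, return_predecessors=True)


def rebuild_path(pred, dist, origin, target):
    """Walk predecessors back from target to origin; [] means unreachable."""
    if not np.isfinite(dist[target]):
        return []
    path = [target]
    while path[-1] != origin:
        path.append(int(pred[path[-1]]))
    return path[::-1]


print(rebuild_path(pred, dist, origin, 2))
print(rebuild_path(pred, dist, origin, 3))
print(rebuild_path(pred, dist, origin, 0))
```

The three results line up with the paths conventions of aggregate_along_paths: a multi-node path, an empty list for an unreachable target, and a single-node list for a self-pair.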

aperta.routing.floor_intrazonal_costs(costs, min_cost)

Floor costs at min_cost, applied uniformly to every finite entry in every tier.

Routing on a graph returns 0 for the trivial origin-to-origin path. That’s fine for cumulative-opportunity output (cost 0 falls in the smallest bin), but degenerate for decay-based metrics like gravity: exp(-β·0) = 1 puts the maximum possible decay weight on the cell itself, and c^(-β) at c = 0 diverges outright.

The floor is applied uniformly to every entry, not just self-pairs. Setting only the self-pair to a non-zero floor would create an inconsistency: a cell would route to itself at, say, 120 s while a different (very close) cell could route at 60 s, implying you can travel further faster than you can travel zero distance. The min-cost interpretation is the physical floor on per-trip cost (no trip can take less than min_cost, regardless of distance), and it handles the intrazonal-cost-0 case as a side effect.

Non-finite entries (np.inf for unreachable destinations, np.nan for missing observations) are passed through unchanged — the floor is applied only to finite costs. Flooring inf would erase reachability information (an unreachable destination would become reachable in min_cost seconds), and flooring nan would silently invent data; both behaviours would be incorrect.

All three tiers are floored uniformly. Earlier revisions floored only cells_to_cells on the argument that the higher tiers can’t contain self-pairs — but that’s only true when all three tiers are populated. When a scenario uses only zones_to_zones (or has a zone→same-zone entry there), the same zero-self-cost degeneracy applies. Flooring uniformly is safer and matches the “physical floor on any trip” interpretation. Tiers that are None pass through unchanged.

Parameters:

costs (_TOD) – TieredODPairs of cost arrays.
min_cost (float | dict | Series) – floor value. Either a scalar float (same floor for every origin), a dict[origin_node -> float] (per-origin floor; origins absent from the dict get no floor and their costs pass through unchanged), or a pd.Series indexed by origin_node (same semantics as the dict form).

Returns:

New TieredODPairs with cost = max(cost, min_cost) applied per origin to finite entries in every tier; non-finite entries (inf, nan) pass through unchanged.

Return type:

_TOD
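For a single cost array and a scalar floor, the finite-only rule can be sketched with NumPy as follows. floor_finite is a hypothetical helper; the real function additionally handles TieredODPairs and per-origin floors, which are left out here, and the decay parameter beta is an arbitrary value chosen for illustration.

```python
import numpy as np


def floor_finite(costs, min_cost):
    """Raise finite costs to at least min_cost; leave inf and nan untouched."""
    costs = np.asarray(costs)
    finite = np.isfinite(costs)
    return np.where(finite, np.maximum(costs, min_cost), costs)


cell_costs = np.array([0.0, 60.0, 300.0, np.inf, np.nan], dtype=np.float32)
floored = floor_finite(cell_costs, 120.0)
print(floored)

# Effect on an exponential decay weight exp(-beta * cost) for the self-pair.
beta = 0.01
weight_unfloored = np.exp(-beta * cell_costs[0])  # 1.0, the maximum possible weight
weight_floored = np.exp(-beta * floored[0])       # exp(-1.2)
print(weight_unfloored, weight_floored)
```

The 60 s entry is raised to 120 s along with the self-pair, which is the uniform behaviour argued for above.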
