aperta.calibration
Iterative calibration of per-edge weights against observed trip-time data.
calibrate_edge_weights fits a linear model relating observed point-to-point trip times to features collected along the routed shortest path plus features at trip endpoints. The same feature set defines both the per-edge weight formula used for routing and the regression, which keeps the two consistent (letting them drift apart was a subtle pitfall in earlier ad-hoc calibration code).
Model:
time_trip = α · baseline_time
          + Σ_m coef_m · (baseline_time · length-weighted avg of m along path)
          + Σ_a coef_a · (sum of a along path)
          + Σ_e coef_e · (endpoint value of e)
          + constant
where features come in three classes (matching how they enter the per-edge duration formula in examples/swiss/prepare/4_edge_weights.ipynb):
multiplier: scales baseline speed (so it multiplies baseline time per edge — appears in the regression as baseline_time · feature_avg). Examples: local density, traffic flow.
additive_route: adds seconds per unit summed along the path. Examples: intersection counts (sec per intersection), elevation gain (sec per metre).
additive_endpoint: adds seconds based on the value of a node attribute at the origin and at the destination. Examples: snap distance, local density.
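To make the three feature classes concrete, here is a minimal sketch of how one regression row could be assembled for a single routed trip. The helper name trip_design_row, the endpoint_ column prefix, and the example attribute names (density, n_intersections, snap_dist) are illustrative assumptions, not part of aperta; the baseline time per edge is taken as length (m) divided by speed (km/h converted to m/s).

```python
def trip_design_row(path_edges, origin, dest, multiplier, additive_route,
                    additive_endpoint, speed_attr="speed_kph"):
    """Build one regression row for a routed trip (hypothetical helper)."""
    lengths = [e["length"] for e in path_edges]
    # Baseline seconds per edge: metres divided by metres-per-second.
    baseline_time = sum(e["length"] / (e[speed_attr] / 3.6) for e in path_edges)
    total_len = sum(lengths)

    row = {"baseline_time": baseline_time}
    for m in multiplier:
        # multiplier: baseline_time times the length-weighted average of m
        avg = sum(e[m] * l for e, l in zip(path_edges, lengths)) / total_len
        row[m] = baseline_time * avg
    for a in additive_route:
        # additive_route: plain sum of a along the path
        row[a] = sum(e[a] for e in path_edges)
    for f in additive_endpoint:
        # additive_endpoint: origin value plus destination value
        # (prefixed so a feature can also appear as a multiplier)
        row["endpoint_" + f] = origin[f] + dest[f]
    return row


edges = [
    {"length": 1000.0, "speed_kph": 36.0, "density": 0.2, "n_intersections": 2},
    {"length": 3000.0, "speed_kph": 108.0, "density": 0.6, "n_intersections": 1},
]
row = trip_design_row(edges, {"snap_dist": 20.0}, {"snap_dist": 30.0},
                      ["density"], ["n_intersections"], ["snap_dist"])
```

Stacking one such row per trip gives the design matrix; the fitted coefficient on baseline_time plays the role of α in the model.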
Iteration (option A from the design discussion): re-route after each OLS fit, since updated coefficients change edge weights and therefore the chosen path + feature aggregates. Cheap to repeat — usually converges in 2-3 passes.
This module does NOT compute betweenness / traffic flows itself. Treat the traffic estimate as just another per-edge attribute the caller supplies (e.g. via network_processing.get_nested_edge_betweenness). Then include it in multiplier_features (if it scales duration like density) or additive_route_features (if seconds-per-unit).
- class aperta.calibration.CalibrationResult(coefficients, r_squared, n_used, predicted_times, observed_times, routed_distances, rmse, rmse_by_distance, edge_duration_attr, iter_log)
Bases: object
Outcome of calibrate_edge_weights.
- Variables:
coefficients (pandas.DataFrame) – DataFrame indexed by feature name with columns kind (multiplier / additive_route / additive_endpoint / const / baseline), coef (fitted value), p (p-value), mean_effect (coef × mean of column, in seconds).
r_squared (float) – OLS R² on the held-in trips.
n_used (int) – number of ground-truth rows that survived snap + filter + successful routing.
predicted_times (pandas.Series) – per-trip predicted time (Series indexed by trip_id); comparable to ground_truth.loc[predicted_times.index, 'time_measured'].
observed_times (pandas.Series) – per-trip observed time (same index as predicted).
routed_distances (pandas.Series) – per-trip routed distance (m); useful for distance-band breakdowns.
rmse (float) – overall RMSE on the held-in trips, in seconds.
rmse_by_distance (pandas.Series) – Series of RMSE per distance band, indexed by band label ('< 10 km', '10-25 km', '>= 25 km').
edge_duration_attr (str) – name of the per-edge attribute written to graph by the final iteration (default 'duration_calibrated'). Downstream routing can use this as the cost.
iter_log (pandas.DataFrame) – DataFrame, one row per iteration, columns r_squared, rmse, n_used — useful to inspect convergence.
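As an illustration of what rmse and rmse_by_distance summarise, the sketch below computes an RMSE per distance band from plain lists. The band edges (10 km and 25 km) come from the band labels above; treating each band as left-inclusive is an assumption, and the function name is hypothetical rather than aperta's implementation.

```python
import math


def rmse_by_distance_band(predicted, observed, routed_m):
    """RMSE in seconds per routed-distance band (illustrative only)."""
    squared_errors = {"< 10 km": [], "10-25 km": [], ">= 25 km": []}
    for p, o, d in zip(predicted, observed, routed_m):
        if d < 10_000:
            label = "< 10 km"
        elif d < 25_000:
            label = "10-25 km"
        else:
            label = ">= 25 km"
        squared_errors[label].append((p - o) ** 2)
    # Bands with no trips are left out rather than reported as NaN.
    return {label: math.sqrt(sum(errs) / len(errs))
            for label, errs in squared_errors.items() if errs}


bands = rmse_by_distance_band(
    predicted=[600.0, 1200.0, 2400.0],
    observed=[630.0, 1160.0, 2400.0],
    routed_m=[5_000.0, 12_000.0, 30_000.0],
)
```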
- aperta.calibration.calibrate_edge_weights(graph, ground_truth, *, baseline_speed_attr='speed_kph', multiplier_features=None, additive_route_features=None, additive_endpoint_features=None, constant=True, n_iterations=3, snap_max_distance=300.0, min_trip_distance=3000.0, max_trip_distance=100000.0, max_dist_to_line_ratio=4.0, edge_duration_attr='duration_calibrated', eligible_node_ids=None, eligible_node_flag=None)
Iteratively calibrate per-edge durations against observed trip times.
- See module docstring for the model. Each iteration:
1. Writes per-edge duration to edge_duration_attr from the current coefficients (or initial guesses on iteration 1).
2. Routes each ground-truth trip on those weights, aggregating features along the path.
3. Fits an OLS model of time_measured ~ baseline_time + features + endpoint terms.
4. Updates coefficients to the OLS fit.
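The first step of each iteration can be sketched as follows. This is a reading of the module-level model applied to a single edge, not the library's code: the helper name edge_duration is hypothetical, and folding α into the multiplier term (so that α = 1 reproduces the 1 + Σ coef · feat form) is an assumption.

```python
def edge_duration(edge, alpha, multiplier_coefs, additive_coefs,
                  speed_attr="speed_kph"):
    """Seconds to traverse one edge under the current coefficients (sketch)."""
    # Baseline seconds: length in metres over speed converted to m/s.
    baseline = edge["length"] / (edge[speed_attr] / 3.6)
    # Multiplier features scale the baseline time.
    scale = alpha + sum(c * edge[f] for f, c in multiplier_coefs.items())
    # Additive route features contribute seconds per unit of the feature.
    extra = sum(c * edge[f] for f, c in additive_coefs.items())
    return baseline * scale + extra


edge = {"length": 1000.0, "speed_kph": 36.0, "density": 0.5, "n_intersections": 2}
seconds = edge_duration(edge, alpha=1.0,
                        multiplier_coefs={"density": 0.4},
                        additive_coefs={"n_intersections": 5.0})
```

Because routing in step 2 uses exactly these per-edge values, the features aggregated along the chosen path stay consistent with the regression columns fitted in step 3.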
- Parameters:
graph (MultiDiGraph) – routable networkx graph. Must carry length and baseline_speed_attr on every edge, plus every attribute named in the feature dicts.
ground_truth (DataFrame) – DataFrame with columns orig_x, orig_y, dest_x, dest_y, time_measured (seconds). Optional dist_measured enables the dist-ratio filter. Optional dist_line is computed from coords if not provided.
baseline_speed_attr (str) – per-edge speed in km/h (e.g. from osmnx.add_edge_speeds). Not modified by this function.
multiplier_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each scales the baseline duration (new_dur = old_dur · (1 + Σ coef · feat)). Use for density-like features.
additive_route_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each contributes coef · feat_value seconds per edge (summed along path). Use for intersection counts, elevation gain, etc.
additive_endpoint_features (dict[str, float] | None) – {node_attr: initial_coef}. Each adds coef · value_at_origin + coef · value_at_destination to total trip duration. Use for snap distance, local density at endpoints.
constant (bool) – include an intercept in the OLS fit.
n_iterations (int) – number of route-fit cycles. 2-3 usually converges.
snap_max_distance (float) – drop trips where origin or destination is farther than this from any network node (metres).
min_trip_distance (float) – drop trips with dist_line below this (metres).
max_trip_distance (float) – drop trips with dist_line above this (metres).
max_dist_to_line_ratio (float) – if dist_measured is present, drop trips where dist_measured / dist_line exceeds this (long detours are usually data noise).
edge_duration_attr (str) – name of the per-edge duration attribute written on graph (overwritten each iteration).
eligible_node_ids – optional set / list / Index of node IDs to restrict trip-endpoint snap targets to. Forwarded to snap_to_network_nodes. Typically prepared.snap_eligible_nodes from routing_prep.prepare_network — prevents trips from snapping to trapped nodes and contaminating the calibration fit.
eligible_node_flag (str | None) – alternative to eligible_node_ids — name of a per-node bool attribute on graph marking eligible snap targets (e.g., prepared.snap_eligible_flag). Ignored if eligible_node_ids is also given.
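The trip filters described by snap_max_distance, min_trip_distance, max_trip_distance and max_dist_to_line_ratio can be sketched as a single predicate. This is only an illustration of the documented rules: keep_trip is a hypothetical helper, the snap distances are assumed to be supplied by the caller, and coordinates are assumed to be in a projected CRS with metre units so that dist_line can be computed as a straight-line distance.

```python
import math


def keep_trip(trip, orig_snap_dist, dest_snap_dist, *,
              snap_max_distance=300.0, min_trip_distance=3000.0,
              max_trip_distance=100000.0, max_dist_to_line_ratio=4.0):
    """Return True if a ground-truth trip passes all filters (sketch)."""
    dist_line = trip.get("dist_line")
    if dist_line is None:
        # Fall back to the straight-line distance between the endpoints.
        dist_line = math.hypot(trip["dest_x"] - trip["orig_x"],
                               trip["dest_y"] - trip["orig_y"])
    if max(orig_snap_dist, dest_snap_dist) > snap_max_distance:
        return False
    if not (min_trip_distance <= dist_line <= max_trip_distance):
        return False
    dist_measured = trip.get("dist_measured")
    # The ratio filter only applies when a measured distance is available.
    if dist_measured is not None and dist_measured / dist_line > max_dist_to_line_ratio:
        return False
    return True


trip = {"orig_x": 0.0, "orig_y": 0.0, "dest_x": 3000.0, "dest_y": 4000.0,
        "time_measured": 420.0, "dist_measured": 6500.0}
ok = keep_trip(trip, orig_snap_dist=25.0, dest_snap_dist=80.0)
```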
- Returns:
CalibrationResult — see its docstring.
- Raises:
ValueError – if any required column is missing or every trip filters out before fitting.
- Return type:
CalibrationResult
- aperta.calibration.snap_counters_to_edges(counters, graph, *, search_radius=50.0, bearing_tol_deg=20.0, bearing_column='bearing_deg', eligible_edges=None, bidirectional=None)
Snap directional traffic counters to the correct network edges.
Counters typically sit next to several parallel candidate edges (opposite directions on the same road; service roads; frontage roads), so naïve nearest-line matching picks the wrong edge most of the time. This function adds a bearing tolerance filter — only edges whose local bearing matches the counter’s bearing_deg (within bearing_tol_deg) are eligible. For directed graphs the bearing comparison is directional (a counter at bearing 90° won’t snap to an edge pointing at 270°), which correctly assigns the two counters of a two-way road to the two directional edges.
Uses d['geometry'] from every edge, which consolidate_intersections guarantees. Edges without a geometry attribute (e.g. raw OSMnx graphs with simplify=True) are silently skipped; consolidate first or call osmnx.graph_to_gdfs(…, fill_edge_geometry=True).
- Parameters:
counters (GeoDataFrame) – GeoDataFrame of point geometries with a bearing_column (degrees, OSM/north-clockwise convention). Same CRS as the graph node coordinates.
graph (MultiDiGraph) – routable nx graph. Edge attributes must include geometry (LineString) and whatever eligible_edges reads.
search_radius (float | Series) – max cartesian distance for candidate edges (CRS units). Pass a scalar for one global radius or a pd.Series aligned to counters.index for per-counter radii (e.g. wider for highway counters which sit further from the carriageway).
bearing_tol_deg (float) – max angular difference between counter bearing and local edge bearing at the snap point.
bearing_column (str) – counter column holding the directional bearing (default 'bearing_deg').
eligible_edges (Callable[[Series, GeoDataFrame], GeoDataFrame] | None) – optional (counter_row, candidate_edges_gdf) -> subset callback. Use to restrict matches by class: typically a highway counter only matches highway edges, a local counter only matches local edges. Forwarded to geo_mapping.map_points_to_filtered_lines.
bidirectional (bool | None) – how to compare bearings. True collapses opposite bearings (counter at 90° matches edges at 90° AND 270°) — correct for undirected graphs where one edge represents both directions of a road. False is directional — correct for directed graphs (the default nx.MultiDiGraph). None auto-detects from graph.is_directed().
- Returns:
DataFrame indexed like counters with columns:
u, v, k: matched edge ID (or pd.NA if no acceptable match within radius);
snap_dist: cartesian distance counter → edge (or NaN);
dist_along: along-edge distance from edge start to nearest point on edge (or NaN).
Unmatched counters get all-NA rows; drop them with result.dropna(subset=['u']).
- Return type:
DataFrame
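The difference between directional and bidirectional bearing comparison can be illustrated with a small angular-difference function. This is a sketch of the idea under the north-clockwise degree convention described above, not the library's implementation; bearing_diff is a hypothetical name.

```python
def bearing_diff(counter_bearing, edge_bearing, bidirectional=False):
    """Smallest angular difference in degrees between two bearings (sketch)."""
    d = abs(counter_bearing - edge_bearing) % 360.0
    d = min(d, 360.0 - d)          # directional: result lies in [0, 180]
    if bidirectional:
        d = min(d, 180.0 - d)      # opposite bearings collapse: [0, 90]
    return d


def within_tolerance(counter_bearing, edge_bearing, bearing_tol_deg=20.0,
                     bidirectional=False):
    """True if the edge is an acceptable candidate for the counter."""
    return bearing_diff(counter_bearing, edge_bearing, bidirectional) <= bearing_tol_deg


directional = within_tolerance(90.0, 270.0)                      # opposite carriageway
collapsed = within_tolerance(90.0, 270.0, bidirectional=True)    # undirected road
```

With the directional comparison, a counter at 90° rejects the 270° edge, which is what lets the two counters of a two-way road land on separate directional edges.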
- aperta.calibration.evaluate_against_counters(modeled, counters, *, observed_column='traffic_cars')
Compare modeled per-edge AADT against snapped counter observations.
- Parameters:
modeled (Series) – per-edge modeled AADT, indexed by (u, v, k) tuples (the output of traffic_flows.nested_node_sample + betweenness + AADT scaling).
counters (DataFrame) – DataFrame with u, v, k columns (from snap_counters_to_edges) and an observed-AADT column. Rows with NA in u/v/k are dropped (unmatched counters).
observed_column (str) – name of the observed-AADT column (default 'traffic_cars', matching the Swiss counter schema).
- Returns:
Dict with:
r2: Pearson correlation² between modeled and observed; scale-invariant, so use this to pick distribution-shape params (lognormal σ, μ).
slope: slope from a no-intercept regression modeled = slope · observed. Tells you how to rescale absolute volumes, e.g. multiply trips_per_person_per_day by 1 / slope to bring the modeled total in line with counters.
rmse: root-mean-square error on the matched set, in counter-units (veh/day).
n_matched: number of counters used in the comparison.
merged: DataFrame with observed, modeled, (u, v, k) for every matched counter; convenient for scatter plots.
- Return type:
dict
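For readers who want to see what the r2, slope and rmse entries measure, the following sketch computes them from two aligned lists. It follows the definitions given above (squared Pearson correlation, no-intercept least-squares slope of modeled on observed, RMSE in veh/day); the function name compare_flows is hypothetical and this is not the library's code.

```python
import math


def compare_flows(modeled, observed):
    """Summary statistics for matched modeled vs. observed AADT (sketch)."""
    n = len(observed)
    mean_m = sum(modeled) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        # Squared Pearson correlation: unchanged if modeled is rescaled.
        "r2": cov ** 2 / (var_m * var_o),
        # Least-squares slope of modeled = slope * observed, no intercept.
        "slope": sum(m * o for m, o in zip(modeled, observed))
                 / sum(o * o for o in observed),
        "rmse": math.sqrt(sum((m - o) ** 2 for m, o in zip(modeled, observed)) / n),
        "n_matched": n,
    }


stats = compare_flows(modeled=[2000.0, 4000.0, 6000.0],
                      observed=[1000.0, 2000.0, 3000.0])
```

In this toy input the modeled volumes are exactly twice the observed ones, so the shape agreement is perfect while the slope signals that absolute volumes would need scaling by 1 / slope.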