The problem with straight-line distance
The drive time ML model was trained on OSRM-derived road times for ~1,800 airport pairs. When a requested route is not in that set, the predictor has to estimate the road distance some other way. The naive approach, using great-circle (haversine) distance as a proxy for road distance, works acceptably on flat terrain but fails badly in mountainous regions. Roads through the Appalachians, Rockies, or Ozarks wind far more than the straight-line path suggests. On a route like Roanoke VA → Pittsburgh PA (219 mi straight-line), the actual road distance is closer to 290 mi with a drive time of ~420 minutes. A haversine-based model would predict roughly 250 minutes, a 40% underestimate.
Fallback chain
For any route not found in the training route_stats, DriveTimePredictor follows this chain:
The same chain applies to both predict() and estimate().
Routes in route_stats (those seen during training) always use the stored road distance and the ML ensemble — OSRM is not queried.
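The lookup order can be sketched roughly as follows. This is a minimal illustration, not the actual DriveTimePredictor code: the function `resolve_road_distance`, the `osrm_lookup` callable, the `detour_factor` parameter, and the final straight-line step are all assumptions made for the example. Only the "stored distance first, OSRM for unseen routes" order comes from the text above.

```python
from typing import Callable, Mapping, Optional, Tuple

Route = Tuple[str, str]  # (origin airport code, destination airport code)


def resolve_road_distance(
    route: Route,
    route_stats: Mapping[Route, float],
    osrm_lookup: Callable[[Route], Optional[float]],
    straight_line_miles: float,
    detour_factor: float = 1.0,
) -> Tuple[float, str]:
    """Return (road_miles, source) for a route.

    Hypothetical helper: names and the last-resort step are assumptions.
    """
    # 1. Routes seen during training use the stored road distance.
    #    OSRM is not queried for them.
    if route in route_stats:
        return route_stats[route], "route_stats"

    # 2. Unseen routes ask OSRM for the real road distance.
    try:
        miles = osrm_lookup(route)
    except OSError:
        # Network failures (timeouts, DNS errors) are treated as "no answer".
        miles = None
    if miles is not None:
        return miles, "osrm"

    # 3. ASSUMPTION: when OSRM gives no answer, scale the straight-line
    #    distance by a caller-chosen detour factor as a last resort.
    return straight_line_miles * detour_factor, "straight_line"
```

Returning the source label alongside the distance makes it easy to log how often the lower-quality straight-line step is actually reached.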
OSRM public endpoint
The predictor uses the public OSRM API at routing.openstreetmap.de. No API key is required. Responses are cached in memory for the lifetime of the TransportPredictor instance, so repeated calls for the same pair incur only one network request.
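A cached lookup of this kind might look like the sketch below. It is an illustration under assumptions, not the predictor's own code: the `OsrmClient` class, the injectable `fetch` parameter, and the `routed-car` URL prefix are hypothetical choices for the example. The `/route/v1/driving/{lon},{lat};{lon},{lat}` path and the `routes[0].distance` (meters) and `routes[0].duration` (seconds) response fields follow the standard OSRM HTTP API.

```python
import json
import urllib.request
from typing import Callable, Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# ASSUMPTION: the exact path prefix on this host is a guess for illustration.
OSRM_BASE = "https://routing.openstreetmap.de/routed-car/route/v1/driving"
METERS_PER_MILE = 1609.344


def _http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (no API key needed)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


class OsrmClient:
    """Hypothetical client that caches results for the instance's lifetime."""

    def __init__(self, fetch: Callable[[str], dict] = _http_get_json) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[Coord, Coord], Tuple[float, float]] = {}

    def route(self, origin: Coord, dest: Coord) -> Tuple[float, float]:
        """Return (road_miles, drive_minutes) for an origin/destination pair."""
        key = (origin, dest)
        if key not in self._cache:
            # OSRM expects coordinates in longitude,latitude order.
            url = (
                f"{OSRM_BASE}/{origin[1]},{origin[0]};{dest[1]},{dest[0]}"
                "?overview=false"
            )
            best = self._fetch(url)["routes"][0]
            self._cache[key] = (
                best["distance"] / METERS_PER_MILE,  # meters to miles
                best["duration"] / 60.0,             # seconds to minutes
            )
        return self._cache[key]
```

Passing the fetch function in makes the cache behaviour testable without touching the network; a second call for the same pair never reaches `fetch`.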
West Appalachia accuracy
After this fix, drive time accuracy across West Appalachian routes:

| Route | Straight-line | OSRM truth | Before fix | After fix |
|---|---|---|---|---|
| ROA → PIT | 219 mi | 423 min | 248 min (−41%) | 423 min (0%) |
| TRI → CLT | 120 mi | 233 min | 135 min (−42%) | 233 min (0%) |
| LEX → CLT | 281 mi | 479 min | 334 min (−30%) | 479 min (0%) |
| CRW → PIT | 163 mi | 270 min | 273 min (+1%) | 273 min (+1%) |
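The percentages in the table are signed errors of the predicted drive time relative to the OSRM value. As a small check, the snippet below recomputes them from the minute values in the table; the helper name `signed_pct_error` is hypothetical and rounding to whole percent is assumed.

```python
def signed_pct_error(predicted_min: float, truth_min: float) -> int:
    """Signed error of a prediction relative to the OSRM drive time, in whole percent."""
    return round(100 * (predicted_min - truth_min) / truth_min)


# (OSRM truth, prediction before fix, prediction after fix), all in minutes,
# copied from the table above.
rows = {
    "ROA → PIT": (423, 248, 423),
    "TRI → CLT": (233, 135, 233),
    "LEX → CLT": (479, 334, 479),
    "CRW → PIT": (270, 273, 273),
}

for route, (truth, before, after) in rows.items():
    print(route, signed_pct_error(before, truth), signed_pct_error(after, truth))
```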
Air fare — sparse route handling
A related fix applies to AirFarePredictor. Routes with fewer than 10 DB1B tickets (too sparse to pass training filters) fall back to generic market defaults. Previously the default used a hardcoded distance = 1500 miles, which inflated fare estimates for short regional routes.
The fix resolves the actual great-circle distance between origin and destination and uses that instead.
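A great-circle distance of this kind is commonly computed with the haversine formula. The sketch below is an illustration under assumptions, not AirFarePredictor's code: the function names, the `coords` mapping of airport code to (latitude, longitude), and the choice to keep 1500 miles only when coordinates are missing are all hypothetical.

```python
import math
from typing import Mapping, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sparse_route_distance(
    origin: str,
    dest: str,
    coords: Mapping[str, Tuple[float, float]],
    legacy_default_miles: float = 1500.0,
) -> float:
    """Distance to feed the market-default fare for a sparse route.

    ASSUMPTION: the old hardcoded value is kept only as a last resort
    when an airport's coordinates are unknown.
    """
    if origin in coords and dest in coords:
        return haversine_miles(*coords[origin], *coords[dest])
    return legacy_default_miles
```

With real coordinates, a short regional hop resolves to its own distance rather than 1500 miles, so the default fare is no longer scaled as if it were a long-haul route.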