This project builds an end-to-end machine learning pipeline to predict delivery duration using structured SQL-based feature engineering and gradient boosting.
The workflow includes:

- SQL data cleaning and transformation using DuckDB
- Feature engineering focused on marketplace congestion
- Proper time-based train/test split (no leakage)
- Linear regression baseline
- Elastic Net regularization
- XGBoost gradient boosting
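
The time-based split in the workflow can be sketched in base R as follows. This is a minimal illustration, not the project's actual code: the column name `created_at`, the helper `time_split`, the 80/20 fraction, and the toy data are all assumptions made for the example.

```r
# Toy order table; `created_at` is a hypothetical timestamp column name.
orders <- data.frame(
  created_at = as.POSIXct("2015-02-01 00:00:00", tz = "UTC") + (0:9) * 3600,
  delivery_duration = c(1800, 2100, 1500, 2400, 2000, 2600, 1900, 2200, 3000, 1700)
)

# Split chronologically: the earliest `train_frac` of rows go to training,
# the most recent rows are held out for testing.
time_split <- function(df, time_col, train_frac = 0.8) {
  df <- df[order(df[[time_col]]), , drop = FALSE]
  n_train <- floor(nrow(df) * train_frac)
  list(
    train = df[seq_len(n_train), , drop = FALSE],
    test  = df[seq_len(nrow(df)) > n_train, , drop = FALSE]
  )
}

parts <- time_split(orders, "created_at")

# No training timestamp is later than any test timestamp.
stopifnot(max(parts$train$created_at) <= min(parts$test$created_at))
```

Splitting on time rather than at random keeps future orders out of the training set, which is what the "no leakage" note refers to.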
XGBoost reduced MAE by roughly 7% relative to the linear baseline, which is consistent with nonlinear congestion effects in delivery dynamics.
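For reference, a relative MAE improvement of this kind can be computed as shown below. The helper names `mae` and `mae_improvement_pct` are hypothetical, and the numbers are illustrative only; they are not results from this project.

```r
# Mean absolute error between observed and predicted values.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Relative MAE reduction of a candidate model over a baseline, in percent.
mae_improvement_pct <- function(mae_baseline, mae_candidate) {
  100 * (mae_baseline - mae_candidate) / mae_baseline
}

# Illustrative values only, not project results.
actual        <- c(1800, 2400, 3000, 2100)
pred_baseline <- c(2000, 2200, 2600, 2300)
pred_boosted  <- c(1900, 2300, 2700, 2200)

improvement <- mae_improvement_pct(
  mae(actual, pred_baseline),  # 250
  mae(actual, pred_boosted)    # 150
)
improvement  # 40
```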
Observations
orders_per_dasher is the dominant predictor, suggesting that marketplace congestion (supply-demand imbalance) drives delivery delays more than raw order volume.
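The exact definition of orders_per_dasher is not given here. One plausible construction, shown purely as an assumption, divides open orders by on-shift couriers; the column names `outstanding_orders` and `onshift_dashers` are hypothetical.

```r
# Hypothetical column names; the actual feature definition may differ.
market <- data.frame(
  outstanding_orders = c(30, 45, 12, 8),
  onshift_dashers    = c(15, 9, 0, 4)
)

# Ratio of open orders to available couriers. Rows with no dashers on shift
# get NA so that a zero denominator never yields Inf.
market$orders_per_dasher <- ifelse(
  market$onshift_dashers > 0,
  market$outstanding_orders / market$onshift_dashers,
  NA_real_
)

market$orders_per_dasher  # 2 5 NA 2
```

In this toy table the first and last rows have very different order volumes (30 vs 8) but the same ratio, which is the distinction between congestion and raw volume.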
estimated_store_to_consumer_driving_duration remains a structural driver, as expected in any last-mile delivery problem.
Time-based features such as order_hour indicate meaningful time-of-day effects.
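The presence of store_id among the important features suggests consistent store-level differences in preparation time. Tree models can capture these differences, but relying on a store identifier may hurt generalization to stores that are new or rarely seen in training.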
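One rough way to gauge that risk is to measure how many test rows come from stores absent from the training set. The sketch below assumes the two splits expose a store_id vector; the helper `unseen_store_share` and the toy identifiers are hypothetical.

```r
# Share of test rows whose store_id never appears in the training set.
unseen_store_share <- function(train_ids, test_ids) {
  mean(!(test_ids %in% train_ids))
}

# Toy identifiers for illustration only.
train_ids <- c(101, 102, 103, 101, 104)
test_ids  <- c(101, 105, 103, 106)

share <- unseen_store_share(train_ids, test_ids)
share  # 0.5
```

A high share would mean the model cannot lean on store_id for a large part of the test period.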
Overall, importance rankings align strongly with domain intuition, increasing confidence in model validity.
Predicted vs Actual (XGBoost)
    library(ggplot2)

    # Load test-set predictions (columns: actual, predicted)
    predictions <- read.csv("outputs/test_predictions.csv")

    ggplot(predictions, aes(x = actual, y = predicted)) +
      geom_point(alpha = 0.2) +
      # Reference line y = x: points on it are perfect predictions
      geom_abline(slope = 1, intercept = 0, color = "red", linewidth = 1) +
      theme_minimal() +
      labs(
        title = "Predicted vs Actual Delivery Time (XGBoost)",
        x = "Actual Delivery Duration (seconds)",
        y = "Predicted Delivery Duration (seconds)"
      )
Observations
Predictions cluster closely around the diagonal at lower delivery durations, indicating strong performance for typical orders.
As actual delivery time increases, the spread of points around the diagonal widens, meaning prediction errors grow for longer deliveries.
The model slightly underestimates very long deliveries, likely because outliers were trimmed and few extreme cases remain to learn from.
Overall, the model captures the central tendency well but remains less precise in the upper tail.
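The visual pattern can also be quantified by summarising errors per bucket of actual duration. The sketch below is an assumption-laden illustration: the helper `error_by_bucket`, the bucket boundaries, and the toy data are hypothetical, and only the column names `actual` and `predicted` come from the plotting code.

```r
# Summarise error by bucket of actual duration.
# `predictions` needs numeric columns `actual` and `predicted` (seconds).
error_by_bucket <- function(predictions, breaks) {
  bucket <- cut(predictions$actual, breaks = breaks,
                include.lowest = TRUE, dig.lab = 5)
  residual <- predictions$predicted - predictions$actual
  data.frame(
    bucket    = levels(bucket),
    n         = as.vector(table(bucket)),
    mae       = as.vector(tapply(abs(residual), bucket, mean)),
    mean_bias = as.vector(tapply(residual, bucket, mean))
  )
}

# Toy data for illustration only, not project results.
toy <- data.frame(
  actual    = c(1200, 1500, 2000, 2800, 4000, 5200, 6500),
  predicted = c(1250, 1450, 2100, 2600, 3600, 4500, 5400)
)

summary_tbl <- error_by_bucket(toy, breaks = c(0, 1800, 3600, 7200))
summary_tbl
```

A rising `mae` across buckets corresponds to the widening spread, and a negative `mean_bias` in the last bucket corresponds to underestimating long deliveries. To run it on the real output, pass the data frame read from outputs/test_predictions.csv; rows whose actual value falls outside `breaks` are dropped from the summary.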