A hybrid deep-learning framework that pairs 1D convolutional feature extraction with bidirectional recurrent sequence modeling, and decomposes SHAP attributions jointly across seasonal and diurnal axes — turning a black-box AQI forecaster into a policy-grade tool. Evaluated on six years of CPCB hourly data across Delhi, Mumbai, Kolkata, and Chennai.
Industrial air pollution in Indian metropolitan areas remains a serious public health problem, and AQI readings cross hazardous levels in cities like Delhi almost every winter. For policy interventions and citizen advisories to actually work, the underlying forecasts have to be both accurate and interpretable. This paper proposes a CNN-BiLSTM model paired with SHAP GradientExplainer, and decomposes the explainability output across season and hour-of-day to surface actionable intervention windows.
A 1D convolution extracts local temporal motifs; a stacked bidirectional LSTM models forward accumulation and backward dissipation. A small fully connected (FC) head closes the network.
SHAP GradientExplainer gives per-timestep, per-feature attribution. Aggregated globally, then stratified seasonally and diurnally.
Strong on Delhi and Mumbai; clearly weaker on Chennai because the current six-feature input cannot capture sea-breeze meteorology. We say so explicitly.
Three sequential stages, then a separate explainability pass.
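As a rough illustration of the three stages (convolution, stacked BiLSTM, FC head), here is a minimal PyTorch sketch. The six input features and the BatchNorm layer come from this page; every size (channel count, kernel width, hidden units, depth, the 24-step window) is an assumption for illustration, not the paper's configuration.

```python
import torch
from torch import nn


class CNNBiLSTM(nn.Module):
    """Conv1d -> BatchNorm -> stacked BiLSTM -> FC head (all sizes are assumed)."""

    def __init__(self, n_features=6, conv_channels=32, kernel_size=3,
                 lstm_hidden=64, lstm_layers=2):
        super().__init__()
        # "Same" padding keeps the window length unchanged after convolution.
        self.conv = nn.Conv1d(n_features, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(conv_channels)
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, num_layers=lstm_layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * lstm_hidden, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, x):
        # x: (batch, window, n_features); Conv1d expects (batch, channels, window).
        z = torch.relu(self.bn(self.conv(x.transpose(1, 2))))
        out, _ = self.lstm(z.transpose(1, 2))  # (batch, window, 2 * hidden)
        h = self.lstm.hidden_size
        # Forward direction summarises the window at its last step,
        # backward direction at its first step.
        summary = torch.cat([out[:, -1, :h], out[:, 0, h:]], dim=1)
        return self.head(summary).squeeze(-1)  # one scaled AQI value per window


model = CNNBiLSTM()
batch = torch.zeros(4, 24, 6)  # 4 windows, 24 hourly steps, 6 features
prediction = model(batch)      # shape: (4,)
```

Dropping `bidirectional=True`, the `bn` layer, or the `conv` layer gives variants analogous to the ablation rows further down.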
We compare against ARIMA, SVR, Random Forest, LSTM, GRU, and CNN-LSTM. All metrics on the inverse-scaled AQI; deep models share a 15% chronological test split.
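The evaluation protocol described here can be sketched in plain Python. This is an illustrative sketch: min-max scaling is an assumption (the page only says metrics are computed on inverse-scaled AQI), and the function names are hypothetical.

```python
import math


def chronological_split(series, test_frac=0.15):
    """Hold out the final test_frac of a time-ordered series (no shuffling)."""
    n_test = round(len(series) * test_frac)
    cut = len(series) - n_test
    return series[:cut], series[cut:]


def inverse_minmax(scaled, lo, hi):
    """Undo min-max scaling; lo/hi are the AQI min and max seen when fitting
    the scaler (assumed scaler type)."""
    return [s * (hi - lo) + lo for s in scaled]


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R² on AQI values in original units."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        # MAPE assumes strictly positive targets, which holds for AQI.
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Scoring after `inverse_minmax` keeps MAE and RMSE in AQI units, which is what makes the rows of the table comparable.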
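Walk-forward ARIMA is shown for completeness. Because it receives the most recent observed value before each one-step prediction and is scored on a 30-day subset, its errors are not directly comparable to the chronological deep-learning evaluation.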
| Model | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|
| ARIMA (2,1,2)† | 2.27 | 7.25 | 2.08 | 0.9336 |
| SVR (RBF) | 13.71 | 18.72 | 7.80 | 0.9770 |
| Random Forest | 13.74 | 17.80 | 9.52 | 0.9792 |
| LSTM | 14.41 | 18.27 | 9.29 | 0.9780 |
| GRU | 14.72 | 18.53 | 9.28 | 0.9774 |
| CNN-LSTM | 10.89 | 14.19 | 6.71 | 0.9868 |
| CNN-BiLSTM (Ours) | 9.74 | 13.83 | 5.99 | 0.9874 |
† ARIMA evaluated via walk-forward one-step-ahead validation on a 30-day subset, feeding the most recent observed value before each prediction.
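The walk-forward protocol in the footnote can be sketched as a short loop. To stay dependency-free, the sketch swaps in a naive persistence forecaster where an ARIMA(2,1,2) fit would go; `walk_forward` and `persistence` are hypothetical names, not the paper's code.

```python
def walk_forward(series, n_test, forecaster):
    """One-step-ahead walk-forward validation.

    Before each prediction the forecaster sees every value observed so far,
    including the true value of the step it has just been scored on.
    """
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    history = list(series[:-n_test])
    predictions = []
    for actual in series[-n_test:]:
        predictions.append(forecaster(history))
        history.append(actual)  # reveal the observation before the next step
    return predictions


def persistence(history):
    """Stand-in model: predict that the next value equals the last observed one."""
    return history[-1]


hourly_aqi = [10, 12, 11, 15, 14]  # toy values, not CPCB data
preds = walk_forward(hourly_aqi, n_test=2, forecaster=persistence)
```

Because the history is refreshed with ground truth at every step, errors cannot accumulate, which is one reason the ARIMA row is reported separately from the deep models.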
Chennai's R² of 0.68 is not a failure to hide. The current six-feature input doesn't capture sea-breeze meteorology, which is exactly what coastal Tamil Nadu air quality depends on. Adding wind direction and a sea-breeze index is a natural next step.
Climate diversity exposes the model's limits.
Mumbai's low absolute error reflects its lower AQI variance.
| City | MAE | RMSE | MAPE (%) | R² |
|---|---|---|---|---|
| Delhi | 9.74 | 13.83 | 5.99 | 0.9874 |
| Mumbai | 5.13 | 6.28 | 7.89 | 0.9388 |
| Kolkata | 8.87 | 10.86 | 16.40 | 0.8993 |
| Chennai | 15.47 | 21.69 | 18.91 | 0.6808 |
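The link between Mumbai's low absolute error and its AQI variance can be checked from the table alone. Assuming R² here is the standard coefficient of determination computed on the same test targets as RMSE, R² = 1 − RMSE² / Var(y), so the spread of the test targets can be backed out. The sketch below is a consistency check under that assumption, not a reported result.

```python
import math

# (RMSE, R²) per city, copied from the table above.
CITY_RESULTS = {
    "Delhi": (13.83, 0.9874),
    "Mumbai": (6.28, 0.9388),
    "Kolkata": (10.86, 0.8993),
    "Chennai": (21.69, 0.6808),
}


def implied_target_std(rmse, r2):
    """Solve R² = 1 - RMSE² / Var(y) for the standard deviation of y."""
    return rmse / math.sqrt(1.0 - r2)


implied_std = {
    city: implied_target_std(rmse, r2) for city, (rmse, r2) in CITY_RESULTS.items()
}
```

Under this assumption the implied test-set standard deviation is roughly 123 AQI points for Delhi and roughly 25 for Mumbai. That is why Mumbai can post the lowest MAE and RMSE while still scoring a lower R² than Delhi.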
A high-R² forecaster is not the new thing. The new thing is decomposing SHAP attributions jointly across season and hour-of-day, so a policymaker can read off when each pollutant matters most. Three findings stand out.
PM₂.₅ is the top global driver at φ̄ = 0.0033, followed by PM₁₀ at 0.0025. This is consistent with PM₂.₅'s heavy weight in the CPCB sub-index formula.
During the monsoon, PM₁₀ overtakes PM₂.₅ in attribution: rainfall preferentially scavenges fine PM₂.₅, leaving the coarse fraction dominant. Counterintuitive at first glance, clean once you know the chemistry.
Attributions for both PM fractions peak sharply at 08:00 IST, exactly when morning traffic is at its worst. NO₂ and O₃ peak between 10:00 and 14:00, in step with the photochemical cycle.
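The seasonal and diurnal breakdowns behind these findings amount to grouping per-sample attributions by a timestamp key and averaging |φ|. Here is a minimal sketch, with several assumptions: the month-to-season binning, the averaging of |φ| over the input window, and all function names are illustrative, since the page does not spell out the paper's exact choices.

```python
from collections import defaultdict
from datetime import datetime

# Assumed month-to-season binning; the paper's season boundaries may differ.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "summer", 4: "summer", 5: "summer",
    6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
    10: "post-monsoon", 11: "post-monsoon",
}


def season_of(ts):
    return SEASON_BY_MONTH[ts.month]


def hour_of(ts):
    return ts.hour


def stratified_mean_abs(phi, timestamps, feature_names, key_fn):
    """Mean |phi| per feature within each stratum.

    phi[i][t][f] is the attribution of feature f at window step t for
    sample i; timestamps[i] is the time the forecast for sample i refers to.
    """
    n_feat = len(feature_names)
    sums = defaultdict(lambda: [0.0] * n_feat)
    counts = defaultdict(int)
    for sample, ts in zip(phi, timestamps):
        key = key_fn(ts)
        for f in range(n_feat):
            # Collapse the window axis first (assumed: mean of |phi| over steps).
            sums[key][f] += sum(abs(step[f]) for step in sample) / len(sample)
        counts[key] += 1
    return {
        key: {name: sums[key][f] / counts[key] for f, name in enumerate(feature_names)}
        for key in sums
    }


# Toy attributions for two samples, two window steps, two features.
toy_phi = [[[0.2, -0.1], [0.4, 0.1]], [[-0.1, 0.3], [0.1, 0.3]]]
toy_times = [datetime(2023, 1, 5, 8), datetime(2023, 7, 5, 8)]
by_season = stratified_mean_abs(toy_phi, toy_times, ["PM2.5", "PM10"], season_of)
by_hour = stratified_mean_abs(toy_phi, toy_times, ["PM2.5", "PM10"], hour_of)
```

Passing `season_of` reproduces the seasonal view and `hour_of` the diurnal one; a key function returning `(season, hour)` would give the joint decomposition.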
Mean |φ| aggregated over the Delhi test set.
Notice the PM₂.₅ → PM₁₀ inversion during monsoon.
Hour-of-day SHAP attribution. Shaded morning rush window highlighted in the paper.
Component-level ablation on the Delhi test set. Removing the CNN block hurts the most — confirming that local convolutional feature extraction is doing the heavy lifting, not the recurrence alone.
Higher bar = component matters more.
| Configuration | MAE | RMSE | R² | ΔRMSE |
|---|---|---|---|---|
| CNN-BiLSTM (Full) | 9.74 | 13.83 | 0.9874 | — |
| w/o Bidirection | 10.85 | 14.00 | 0.9871 | +0.17 |
| w/o BatchNorm | 12.38 | 16.22 | 0.9827 | +2.39 |
| w/o CNN | 13.85 | 18.01 | 0.9787 | +4.18 |
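The ΔRMSE column is each ablated RMSE minus the full model's RMSE. A small sketch that recomputes it from the table and ranks the components (variable names are illustrative):

```python
FULL_RMSE = 13.83  # CNN-BiLSTM (Full), from the table above

ABLATED_RMSE = {
    "w/o Bidirection": 14.00,
    "w/o BatchNorm": 16.22,
    "w/o CNN": 18.01,
}


def delta_rmse(ablated, full=FULL_RMSE):
    """Increase in RMSE when a component is removed, rounded to two decimals."""
    return round(ablated - full, 2)


deltas = {name: delta_rmse(rmse) for name, rmse in ABLATED_RMSE.items()}
# Components ordered from most to least damaging when removed.
ranking = sorted(deltas, key=deltas.get, reverse=True)
```

The recomputed deltas match the table, and the ranking puts the CNN block first, then BatchNorm, then bidirectionality.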
BibTeX entry below.
@inproceedings{nath2026aqi,
  title     = {An Explainable Deep Learning Architecture for Forecasting
               Industrial Atmospheric Pollutants of Indian Metropolitan Cities},
  author    = {Nath, Akash and Baruah, Pragyat Jyoti and Paul, Arnab and
               Nath, Arun Jyoti and Borah, Tirthanka and Debnath, Kamalesh},
  booktitle = {Proceedings},
  year      = {2026}
}