What It Does
HurricAIne predicts where a hurricane will be 6 hours from now using a CNN trained on atmospheric reanalysis. I built the full pipeline: data acquisition, feature engineering, model training, and deployable inference with quantified error.
Bottom line: ~100–200 km mean absolute error on 6-hour position (best case ~50 km), comparable to interpolated operational guidance. Inference runs in under a second per prediction.
Results at a Glance
- Accuracy: 100–200 km MAE; best predictions ~50 km
- Model size: ~50k–100k parameters, ~5 MB; trains on CPU in 30–60 min
- Pipeline: Saved normalization stats and artifacts for one-step inference; full-storm track prediction with error aggregation
- Baseline: XGBoost on SHIPS (10+ features) as a fast, GPU-free check—CNN wins on accuracy
Data Pipeline
Two main sources, one window:
- ERA5 (ECMWF): gridded atmospheric fields, downloaded as monthly NetCDF files via the CDS API. An in-memory cache is cleared explicitly after each storm to avoid exhausting memory.
- HURDAT2 (NOAA): ground-truth storm positions every 6 hours.
- Windowing: 70° × 70° box around each storm, with robust handling of the longitude mismatch (ERA5 uses 0° to 360°, HURDAT2 uses −180° to 180°) and of date-line crossings.
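The longitude handling above can be sketched as follows. This is a minimal NumPy illustration, not the project's actual slicer: `hurdat_lon_to_signed`, `to_0_360`, and `extract_window` are hypothetical names, and the sketch assumes an evenly spaced ERA5-style 2-D grid with longitudes on 0° to 360° and latitudes descending from 90° to −90°. Wrapping column indices modulo the grid width keeps a window contiguous when it straddles the grid's seam.

```python
import numpy as np


def hurdat_lon_to_signed(token: str) -> float:
    """Parse a HURDAT2 longitude token such as '94.8W' into signed degrees east."""
    token = token.strip()
    value = float(token[:-1])
    return -value if token.endswith("W") else value


def to_0_360(lon):
    """Map signed longitudes (-180..180) to the ERA5 0..360 convention."""
    return np.mod(lon, 360.0)


def extract_window(field, grid_lons, grid_lats, center_lat, center_lon, half_width_deg):
    """Cut a square window centered on the storm from a global (n_lat, n_lon) field.

    Assumes grid_lons is evenly spaced on 0..360 and grid_lats is evenly spaced and
    descending (90 -> -90), as in ERA5. Column indices wrap modulo n_lon, so a window
    that crosses the grid seam stays contiguous instead of being cut in two.
    """
    res = float(grid_lons[1] - grid_lons[0])
    n_half = int(round(half_width_deg / res))
    n_lon = grid_lons.size

    # Nearest column to the storm center, measured from the first grid longitude.
    j0 = int(round((to_0_360(center_lon) - grid_lons[0]) / res)) % n_lon
    cols = np.arange(j0 - n_half, j0 + n_half + 1) % n_lon

    # Latitude does not wrap, so clip the rows to the grid instead.
    i0 = int(np.argmin(np.abs(grid_lats - center_lat)))
    rows = np.arange(max(i0 - n_half, 0), min(i0 + n_half + 1, grid_lats.size))

    return field[np.ix_(rows, cols)]
```

With a 0.25° grid, `half_width_deg=35` (a 70° box) gives a 281 × 281 window away from the poles, which is then resized to 40 × 40.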
Training: 2020 Atlantic season (31 tropical cyclones, 30 of them named; ~500 samples). SHIPS and NOAA OISST support the XGBoost baseline and future work.
Features & Input
Each timestep → one 40 × 40 × 17 tensor:
- 5 surface channels: 10 m wind (u, v), sea-level pressure, SST, total column water vapor
- 12 pressure-level channels: wind (u, v) and relative humidity at 200, 500, 700, and 850 hPa (3 variables × 4 levels)
Target: displacement in degrees, next_position - current_position. Input channels and targets are both normalized, and the statistics are saved for inference. Defensive checks: filter out zero-dimension slices, enforce storm-ID consistency, and replace NaNs over land before resizing.
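A minimal NumPy sketch of the target and normalization steps, under stated assumptions: the function names are hypothetical, NaNs are filled with each channel's mean over its valid pixels (the write-up does not say which fill value is used), the longitude difference is wrapped into [−180°, 180°) so a date-line crossing yields a small step, and the statistics are stored in an `.npz` file.

```python
import numpy as np


def make_target(current_pos, next_pos):
    """Displacement target [dlat, dlon] in degrees: next_position - current_position."""
    dlat = next_pos[0] - current_pos[0]
    # Assumption: wrap the longitude step into [-180, 180) so crossing the date line
    # produces a small displacement instead of a ~360-degree jump.
    dlon = (next_pos[1] - current_pos[1] + 180.0) % 360.0 - 180.0
    return np.array([dlat, dlon])


def fill_land_nans(x):
    """Replace NaNs in an (H, W, C) window (e.g. SST over land) with each channel's
    mean over its valid pixels. Channels that are entirely NaN fall back to 0."""
    valid = ~np.isnan(x)
    counts = valid.sum(axis=(0, 1))
    sums = np.where(valid, x, 0.0).sum(axis=(0, 1))
    ch_mean = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(valid, x, ch_mean)


def fit_stats(X, y, eps=1e-8):
    """Per-channel input stats over X (N, H, W, C) and per-component target stats over y (N, 2)."""
    return {
        "x_mean": X.mean(axis=(0, 1, 2)),
        "x_std": X.std(axis=(0, 1, 2)) + eps,
        "y_mean": y.mean(axis=0),
        "y_std": y.std(axis=0) + eps,
    }


def normalize_inputs(X, stats):
    return (X - stats["x_mean"]) / stats["x_std"]


def normalize_targets(y, stats):
    return (y - stats["y_mean"]) / stats["y_std"]


def save_stats(path, stats):
    """Persist stats for inference; np.savez appends '.npz' if the path lacks it."""
    np.savez(path, **stats)


def load_stats(path):
    with np.load(path) as data:
        return {key: data[key] for key in data.files}
```

Computing the statistics on the training set only, and reloading the exact same file at inference time, keeps the network's inputs and outputs on the scale it was trained on.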
CNN Architecture
Lightweight and attention-aware:
- Separable convolutions: far fewer parameters than standard convolutions while still capturing spatial structure
- Residual block: 1×1 shortcut plus two separable convs for better gradient flow
- Channel attention: squeeze-and-excitation, so the model learns which variables matter (e.g., SST vs. shear)
- Spatial attention: focuses on the relevant quadrant of the storm
- Head: GlobalAveragePooling → Dense(128) → Dense(64) → Dense(2) → [Δlat, Δlon]
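To make the architecture bullets concrete, here is a NumPy forward pass of squeeze-and-excitation channel attention, a simplified spatial gate, and a parameter-count helper. It is an illustration, not the model code: the weights are placeholders, the reduction ratio is an assumption, and the spatial gate mixes channel-mean and channel-max maps with scalar weights instead of the k × k convolution that CBAM-style spatial attention normally uses.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation channel attention on one (H, W, C) feature map.

    w1: (C, C // r) and w2: (C // r, C) form the bottleneck MLP with reduction ratio r.
    Returns the reweighted map and the per-channel gates, each in (0, 1).
    """
    s = x.mean(axis=(0, 1))                        # squeeze: global average per channel, (C,)
    gates = sigmoid(relu(s @ w1 + b1) @ w2 + b2)   # excitation: one gate per channel
    return x * gates, gates                        # rescale every channel by its gate


def spatial_gate(x, a, b, c):
    """Simplified spatial attention: pool across channels (mean and max), mix the two
    maps with scalars a and b plus bias c, and gate each pixel. CBAM-style blocks use a
    learned k x k convolution here instead of the scalar mix."""
    avg = x.mean(axis=-1, keepdims=True)           # (H, W, 1)
    mx = x.max(axis=-1, keepdims=True)             # (H, W, 1)
    gate = sigmoid(a * avg + b * mx + c)           # (H, W, 1), broadcast over channels
    return x * gate, gate


def conv_params(k, c_in, c_out, separable=False):
    """Weight count (biases excluded) of a k x k convolution from c_in to c_out channels."""
    if separable:
        return k * k * c_in + c_in * c_out         # depthwise k x k per channel + pointwise 1x1
    return k * k * c_in * c_out
```

For a 3 × 3 layer mapping 17 input channels to 32, `conv_params` gives 4,896 weights for a standard convolution and 697 for a separable one (153 depthwise + 544 pointwise), which is where the parameter savings come from.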
Training: Adam, MSE loss, MAE metric, ReduceLROnPlateau, EarlyStopping with best-weights restore.
Inference & Viz
- Load saved normalization stats and build the feature tensor for the current timestep.
- Forward pass → denormalize → add delta to current position.
- Compute error in km (degree-to-distance conversion).
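The inference steps can be sketched like this. The write-up only says degrees are converted to distance; this sketch assumes the haversine great-circle formula with a mean Earth radius of 6,371 km, and the helper names are hypothetical. `track_mae_km` shows one way to aggregate per-step errors into a storm-level MAE.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def denormalize(delta_norm, y_mean, y_std):
    """Undo target normalization using the stats saved at training time."""
    return np.asarray(delta_norm, dtype=float) * y_std + y_mean


def next_position(current, delta):
    """Add the predicted [dlat, dlon] to the current (lat, lon); keep lon in [-180, 180)."""
    lat = current[0] + delta[0]
    lon = (current[1] + delta[1] + 180.0) % 360.0 - 180.0
    return lat, lon


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def track_mae_km(predicted, actual):
    """Mean absolute position error in km over a storm's predicted vs. actual positions."""
    return float(np.mean([haversine_km(p, q) for p, q in zip(predicted, actual)]))
```

One degree of latitude comes out to about 111.2 km, so the ~112 km MAE reported below corresponds to roughly one degree of position error.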
Interactive Plotly map: actual vs. predicted tracks, marker size = storm radius, color = intensity (e.g. VMAX).
Challenges Solved
- Memory: ERA5 ~2–4 GB/month → explicit cache clear after each storm so full seasons run without OOM.
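A minimal sketch of the cache-and-clear pattern, assuming a hypothetical `loader(year, month)` that returns one month of ERA5 data (for example an opened xarray dataset) and a hypothetical `build_samples(storm, cache)` that creates the training samples for one storm. Clearing in a `finally` block means a failure partway through one storm still releases its months.

```python
from typing import Any, Callable, Dict, Tuple


class MonthlyFieldCache:
    """In-memory cache of monthly ERA5 data keyed by (year, month).

    `loader(year, month)` is a hypothetical callable that opens one monthly NetCDF
    file and returns the loaded object.
    """

    def __init__(self, loader: Callable[[int, int], Any]):
        self._loader = loader
        self._store: Dict[Tuple[int, int], Any] = {}

    def get(self, year: int, month: int) -> Any:
        key = (year, month)
        if key not in self._store:           # load each month at most once per storm
            self._store[key] = self._loader(year, month)
        return self._store[key]

    def clear(self) -> None:
        # Close objects that hold file handles, then drop every reference so the
        # arrays can be garbage-collected before the next storm is processed.
        for obj in self._store.values():
            close = getattr(obj, "close", None)
            if callable(close):
                close()
        self._store.clear()

    def __len__(self) -> int:
        return len(self._store)


def process_season(storms, cache, build_samples):
    """Build samples storm by storm; `build_samples(storm, cache)` is hypothetical."""
    samples = []
    for storm in storms:
        try:
            samples.extend(build_samples(storm, cache))
        finally:
            cache.clear()                    # runs even if a storm fails partway through
    return samples
```

The trade-off is that a month shared by two consecutive storms is read twice, in exchange for peak memory bounded by the months a single storm spans.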
- Coordinates: Converting both datasets to one longitude convention and rolling the grid across the date line in the slicer keeps ERA5 and HURDAT2 aligned in all basins.
- Robustness: Zero-dim checks and storm-ID verification before resize; NaN handling over land.
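The zero-dimension and storm-ID checks can be sketched as a filter applied before resizing. The tuple layout and function names are hypothetical; the IDs follow HURDAT2's `AL092020` style (basin, cyclone number, year).

```python
import numpy as np


def validate_window(window, sample_storm_id, track_storm_id):
    """Return True if a raw ERA5 window is safe to resize and matches its track record."""
    if window.ndim < 2 or 0 in window.shape:
        return False  # zero-dimension slice, e.g. a latitude range clipped to nothing
    if sample_storm_id != track_storm_id:
        return False  # sample and HURDAT2 row were paired with different storms
    return True


def filter_samples(samples):
    """samples: iterable of (sample_storm_id, track_storm_id, window) tuples (hypothetical layout).
    Returns the kept (storm_id, window) pairs and the number of samples dropped."""
    kept, dropped = [], 0
    for sample_id, track_id, window in samples:
        if validate_window(window, sample_id, track_id):
            kept.append((sample_id, window))
        else:
            dropped += 1
    return kept, dropped
```

Counting dropped samples rather than silently skipping them makes it easy to notice when a slicing bug starts discarding a large share of a season.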
Wrap-up
I built HurricAIne over winter break using real scientific data (hundreds of gigabytes of ERA5 reanalysis plus HURDAT2 tracks) because I wanted something ambitious and end-to-end. In testing, the model achieved a mean absolute error of about 112 km on 6-hour position, with some predictions as accurate as 20 km. Its attention mechanisms learn which atmospheric variables and which regions around a storm matter most for the prediction, so in a sense the model discovers what matters on its own.
Seeing predicted tracks on a map next to real hurricane paths was the payoff for all the pipeline debugging and long runtimes. What surprised me was how much of the work had nothing to do with the neural network: most of the time went into aligning position records with multi-dimensional weather data, handling coordinate systems (especially near the date line), and structuring everything correctly before the model ever saw it.
Next steps: expanding the dataset across multiple years and trying time-based models (e.g. recurrent layers) to improve accuracy. I’m happy to discuss the technical details or collaborate—reach out if you’d like to talk.