Wrocław. Predicting catastrophe bond coupons: machine learning models with climate-based features

In recent years, natural disasters have been occurring more frequently and with greater intensity, causing economic losses and increasing financial risks. This has encouraged financial markets to look more closely at instruments designed to transfer and manage catastrophe-related risk. Catastrophe (CAT) bonds are one such instrument, enabling insurers to shift a portion of potential losses from extreme weather events to investors, and serving as an alternative to traditional reinsurance.

A CAT bond is typically structured so that an insurer or reinsurer issues the bond through a special purpose vehicle (SPV). Investors receive above-average coupon payments provided that no predefined triggering event – such as a hurricane, earthquake, or flood – occurs during the bond’s term. If such an event does occur, part or all of the principal is transferred to cover the issuer’s losses, and investors bear the corresponding financial loss.
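The payoff mechanism described above can be illustrated with a minimal sketch. The function name, the annual coupon schedule, the single trigger event, and the `loss_fraction` parameter are all simplifying assumptions made for illustration; actual CAT bond terms vary by deal.

```python
def investor_cash_flows(principal, coupon_rate, years,
                        trigger_year=None, loss_fraction=1.0):
    """Annual cash flows to a CAT bond investor (simplified, hypothetical).

    trigger_year:  1-based year in which the triggering event occurs,
                   or None if no event occurs during the term.
    loss_fraction: share of the outstanding principal transferred to the
                   sponsor when the trigger is hit (1.0 = total loss).
    """
    flows = []
    outstanding = principal
    for year in range(1, years + 1):
        # Assumption: the event hits at the start of the year, before the coupon.
        if trigger_year is not None and year == trigger_year:
            outstanding *= (1.0 - loss_fraction)
        flow = outstanding * coupon_rate      # coupon on the remaining principal
        if year == years:
            flow += outstanding               # remaining principal repaid at maturity
        flows.append(flow)
    return flows


# No event: full coupons and principal are paid.
no_event = investor_cash_flows(100.0, 0.08, 3)
# Event in year 2 wiping out half of the principal.
partial_loss = investor_cash_flows(100.0, 0.08, 3, trigger_year=2, loss_fraction=0.5)
```

With these toy inputs, the investor in the second scenario receives reduced coupons from year 2 onward and only half of the principal at maturity.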

As climate-related disasters continue to grow in frequency and severity, accurately pricing CAT bond coupons has become increasingly important. Machine learning models, capable of capturing nonlinear relationships and identifying patterns in large, complex datasets, offer a promising approach to CAT bond pricing that can account for evolving climate risks.

While previous studies have examined financial, structural, and climate-related determinants of CAT bond spreads, our analysis in [2] extends this literature in two key ways. First, we investigate whether large-scale climate variability indices help explain CAT bond coupons. Second, we evaluate the predictive performance of a broad range of machine learning models and compare their performance to traditional approaches. Overall, our study provides new insights into how climate variability may shape pricing in the CAT bond market and assesses the value added by alternative modeling techniques.

Our dataset comes from the primary market and comprises 734 tranches issued between June 1997 and December 2020. Figure 1 presents the geographical distribution of bond issuance across countries. The dataset covers a wide range of geographic regions and captures both developed and emerging markets. While the majority of bonds are linked to catastrophe risk exposures in the United States, issuances tied to European and Asian exposures are also present.

Figure 1: Geographical distribution of bond issuance across countries. The map presents the number of tranches issued in the primary market between June 1997 and December 2020.

We examine whether large-scale climate variability indices carry explanatory power for CAT bond coupons, considering indices such as the Oceanic Niño Index (ONI), Arctic Oscillation (AO), North Atlantic Oscillation (NAO), Outgoing Longwave Radiation (OLR), the Pacific–North American (PNA) pattern, Pacific Decadal Oscillation (PDO), and the Southern Oscillation Index (SOI). Figure 2 presents the correlation coefficients between CAT bond coupons and lagged values of the climate indices. Each index is shifted backward in time by 2 to 18 months to capture potential delayed effects.

The ONI exhibits negative correlations with coupon values, with the most negative values at lags of 9 to 14 months. This suggests that warmer ENSO phases might be associated with lower CAT bond coupons. The AO shows positive correlations across most lags, peaking slightly at around 3-5 months and again at 14-16 months. The NAO displays positive correlations at shorter lags (around 3-5 months) and again at 12-13 months, with a peak correlation close to 0.12. The OLR stands out as one of the most consistently positively correlated indices, especially at higher lags between 11 and 16 months, where correlations approach or exceed 0.2. Since OLR is a proxy for tropical convection and rainfall, these findings may indicate that sustained tropical activity affects perceived risk in CAT bond markets. The PNA also shows a relatively stable and positive relationship with CAT bond coupons, with correlations ranging from 0.05 to over 0.2. In contrast, the PDO exhibits negative correlations across all lags, with particularly strong values at short lags between 2 and 4 months. Finally, the SOI shows broadly positive correlations, especially between 5 and 15 months, peaking above 0.2 at a lag of around 14 months.

Figure 2: Correlation between catastrophe bond coupon rates and lagged values of large-scale climate indices. The x-axis shows the number of months each climate index is shifted backward to capture delayed effects, while the y-axis presents the corresponding correlation coefficient.
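The lagged-correlation calculation behind a plot of this kind can be sketched as follows. This is a minimal illustration, not the study's code: it assumes that each bond's coupon is paired with the index value observed a given number of months before issuance, and the function names, the integer month counter, and the toy sine-wave index are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(issue_months, coupons, index_by_month, lags=range(2, 19)):
    """Correlate coupons with the index value `lag` months before issuance.

    issue_months:   integer month counters, one per bond (e.g. year * 12 + month)
    index_by_month: dict mapping month counter -> climate index value
    """
    result = {}
    for lag in lags:
        # Keep only bonds for which the lagged index value is available.
        pairs = [(index_by_month[m - lag], c)
                 for m, c in zip(issue_months, coupons)
                 if (m - lag) in index_by_month]
        xs, ys = zip(*pairs)
        result[lag] = pearson(xs, ys)
    return result


# Toy data (not real climate or bond data): coupons respond to the index
# with a 12-month delay, so the correlation should peak at lag 12.
index_by_month = {m: math.sin(m / 3.0) for m in range(60)}
issue_months = list(range(20, 60))
coupons = [0.05 + 0.01 * index_by_month[m - 12] for m in issue_months]

corr = lagged_correlations(issue_months, coupons, index_by_month)
best_lag = max(corr, key=corr.get)
```

On real data the correlations would be far weaker than in this toy setup, so a peak in such a curve is only a starting point for the regression analysis.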

Overall, the figure indicates that OLR, SOI, and PNA exhibit the strongest and most consistent positive correlations with CAT bond coupons, particularly at lags above 10 months. This suggests that large-scale climate variability, especially patterns linked to tropical convection and North American atmospheric circulation, may have delayed yet measurable effects on CAT bond pricing. In contrast, the PDO and ONI indices are negatively or weakly correlated with coupons. To assess whether these climate signals translate into genuine pricing information, we proceed to a regression analysis. We compare the explanatory power of two models: a benchmark replicating the specification proposed in [1], and an extended version that incorporates additional climate variability variables.

The benchmark model follows the original approach and includes a comprehensive set of bond-specific and structural variables. These include catastrophe bond fundamentals such as expected loss, size, term, and trigger type, as well as peril- and region-specific dummies (Wind, Earthquake, Multiterritory, US, Europe, Japan, and their interactions). It also controls for issuer quality (Swiss Re, investment grade) and market conditions (rate-on-line (ROL) index, BB spread).

The extended model builds upon this structure by incorporating additional variables. In particular, it includes the ROL index change, which represents the annual change in the ROL index. We also include two lagged climate indices: SOI shifted by 15 months and OLR shifted by 12 months. These additions aim to test whether climate-related variables carry additional explanatory power beyond traditional CAT bond characteristics.

We first apply linear regression, which allows us to directly compare the benchmark feature set [1] with our proposed feature set. In the next step, we extend this comparison by applying various machine learning models to both the benchmark feature set and our own, in order to evaluate how different algorithms perform across the two specifications. We apply seven models: Random Forest (RF), Bayesian Ridge Regression (BRR), Gradient Boosting Regression (GBR), Extremely Randomized Trees (ETR), Automatic Relevance Determination Regression (ARD), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGB).
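A comparison of this kind could be organised as in the sketch below, which uses scikit-learn. Everything in it is an assumption made for illustration: the data are synthetic stand-ins (three "benchmark" columns and two "climate" columns), the 80/20 train/test split and the hyperparameters are arbitrary, and `rmse_table` is a hypothetical helper. LGBM and XGB are omitted to keep the example dependency-light; `lightgbm.LGBMRegressor` and `xgboost.XGBRegressor` follow the same fit/predict interface and could be added to the `models` dictionary.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ARDRegression, BayesianRidge, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the study's dataset).
rng = np.random.default_rng(0)
n = 400
X_bench = rng.normal(size=(n, 3))   # placeholder for bond-specific features
X_clim = rng.normal(size=(n, 2))    # placeholder for lagged climate indices
y = 0.05 + 0.02 * X_bench[:, 0] + 0.01 * X_clim[:, 0] + rng.normal(scale=0.005, size=n)

feature_sets = {
    "benchmark": X_bench,
    "extended": np.hstack([X_bench, X_clim]),
}

# Factories so that each feature set gets a freshly initialised model.
models = {
    "OLS": lambda: LinearRegression(),
    "BRR": lambda: BayesianRidge(),
    "ARD": lambda: ARDRegression(),
    "RF": lambda: RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR": lambda: GradientBoostingRegressor(random_state=0),
    "ETR": lambda: ExtraTreesRegressor(n_estimators=100, random_state=0),
}


def rmse_table(feature_sets, models, y, seed=0):
    """Fit every model on every feature set and return test-set RMSEs."""
    table = {}
    for fs_name, X in feature_sets.items():
        # Same seed and sample size, so both feature sets share one split.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        for m_name, make_model in models.items():
            model = make_model().fit(X_tr, y_tr)
            mse = mean_squared_error(y_te, model.predict(X_te))
            table[(m_name, fs_name)] = float(np.sqrt(mse))
    return table


table = rmse_table(feature_sets, models, y)
for (m_name, fs_name), value in sorted(table.items()):
    print(f"{m_name:4s} {fs_name:10s} RMSE = {value:.5f}")
```

Holding the split fixed across feature sets means that any difference in RMSE for a given model reflects the added features rather than a different test sample.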

Table 1 reports the RMSE on the test set for the benchmark feature set and our proposed feature set across all evaluated models. These results correspond to point forecasts: each model provides a single predicted value of the CAT bond coupon for each observation. The baseline model (OLS) attains a lower error with our features (0.0168) than with the benchmark features (0.0184). The same pattern holds for every model, indicating that our feature set generally improves predictive accuracy. Among the tree-based ensembles, the ETR algorithm achieves the lowest overall error, with an RMSE of 0.0123 for our features against 0.0142 for the benchmark, an improvement of approximately 13.2%. The boosting methods (GBR, LGBM, and XGB) show consistent but smaller gains. The linear Bayesian models (BRR and ARD) also benefit from our features, though less than ETR. Overall, the comparison demonstrates that our feature set leads to systematically better results across a range of algorithms, with the largest relative gain observed for ETR.

Table 1. Values of the RMSE calculated for the predictions on the testing set, rounded to five significant figures.

Model   Benchmark model   Our model
OLS     0.018443          0.016823
RF      0.018844          0.017563
BRR     0.018447          0.016813
GBR     0.019155          0.017756
ETR     0.014161          0.012294
ARD     0.017709          0.017085
LGBM    0.019088          0.018858
XGB     0.018487          0.018163
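The relative improvements quoted in the text can be recomputed directly from Table 1. The short sketch below does so; the RMSE values are copied from the table, while the `relative_gain` helper and the variable names are illustrative.

```python
# Test-set RMSE from Table 1: (benchmark model, our model).
rmse = {
    "OLS": (0.018443, 0.016823),
    "RF": (0.018844, 0.017563),
    "BRR": (0.018447, 0.016813),
    "GBR": (0.019155, 0.017756),
    "ETR": (0.014161, 0.012294),
    "ARD": (0.017709, 0.017085),
    "LGBM": (0.019088, 0.018858),
    "XGB": (0.018487, 0.018163),
}


def relative_gain(benchmark, ours):
    """Fractional RMSE reduction of our feature set relative to the benchmark."""
    return (benchmark - ours) / benchmark


gains = {model: relative_gain(*values) for model, values in rmse.items()}
lowest_rmse_model = min(rmse, key=lambda m: rmse[m][1])   # best on our features
largest_gain_model = max(gains, key=gains.get)

for model, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{model:5s} {100 * gain:5.1f}%")
```

For ETR this gives (0.014161 - 0.012294) / 0.014161, or about 13.2%, matching the figure quoted in the text.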

Taken together, the evidence suggests that climate variability provides additional information relevant to CAT bond pricing and that machine learning methods can capture this information to improve forecasts. While no single model dominates every metric, Extremely Randomized Trees achieves the lowest test-set RMSE, making it a practical choice for forecasting CAT bond coupons.

[1] Braun, A.: Pricing in the primary market for CAT bonds: new empirical evidence. Journal of Risk and Insurance 83, 811–847 (2016).
[2] Kończal, J., Balcerek, M., Burnecki, K.: Machine learning models for predicting catastrophe bond coupons using climate data. https://arxiv.org/abs/2512.22660

By Julia Kończal (Wrocław University of Science and Technology)

This work was supported by the NCN grant No. 2022/47/B/HS4/02139.
