Climate risk modelling using a new parametrisation of the multivariate generalised Pareto distribution

The Department of Probability and Statistics organised a workshop on the hot topic of climate risk in November 2022. There were several interesting talks by experts from financial institutions such as Morgan Stanley and MSCI. The academic side was represented by Johannes Stroebel (NYU) and András Zempléni (ELTE); the latter spoke about a new approach to multivariate extreme values. His talk is sketched below.

Here the multivariate peaks-over-threshold approach was applied to climate forecasts. The “traditional” models use only those observations that exceed the threshold in all coordinates. This is rarely useful in practice, as very few data points are left, especially in higher dimensions. The Rootzén-Tajvidi (2006) model uses all observations that exceed the threshold in at least one coordinate. Its disadvantage is that the marginals are only conditionally GPD (generalised Pareto distributed), given X_i > u_i. There are different representations (Rootzén et al., 2018) that allow for effective estimation and simulation of these models.
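To make the difference between the two exceedance definitions concrete, here is a minimal sketch that counts how many simulated points survive each rule. The function name `count_exceedances`, the independent standard normal data and the threshold of 1.28 in every coordinate are illustrative assumptions, not part of the talk.

```python
import random

def count_exceedances(data, thresholds):
    """Return (n_all, n_any): the number of points above the threshold in
    every coordinate, and the number above it in at least one coordinate."""
    n_all = 0
    n_any = 0
    for point in data:
        flags = [x > u for x, u in zip(point, thresholds)]
        n_all += all(flags)
        n_any += any(flags)
    return n_all, n_any

# Illustration on hypothetical data: independent standard normal coordinates,
# with the same (assumed) threshold 1.28 in each coordinate.
rng = random.Random(2022)
for d in (2, 5):
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(10000)]
    n_all, n_any = count_exceedances(data, [1.28] * d)
    print(f"d={d}: all coordinates exceed: {n_all}, at least one exceeds: {n_any}")
```

With independent coordinates the "all coordinates" count shrinks quickly as the dimension grows, which is the practical problem described above; real climate indices are dependent, so the actual counts would differ.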

We used the T-parametrisation, T = E + U − max(U), where E is a unit exponential random variable and U is a suitably chosen d-dimensional random vector. The density of T can be computed explicitly for some important parametric families for U, such as the normal or the Gumbel. This makes it possible to fit the model by ordinary maximum likelihood.
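The construction T = E + U − max(U) is straightforward to simulate. The sketch below assumes that E is drawn independently of U and that the components of U are independent normals; both are simplifying assumptions made for illustration (the U used in the talk may well have dependent components), and the name `simulate_t` is hypothetical.

```python
import random

def simulate_t(n, means, sds, rng):
    """Draw n vectors T = E + U - max(U).

    E is unit exponential; U has independent normal components with the
    given means and standard deviations (an illustrative choice for U).
    """
    samples = []
    for _ in range(n):
        e = rng.expovariate(1.0)                      # E ~ Exp(1)
        u = [rng.gauss(m, s) for m, s in zip(means, sds)]
        u_max = max(u)
        # Shift U so that its largest component is 0, then add E.
        samples.append([e + (u_j - u_max) for u_j in u])
    return samples

rng = random.Random(2022)
for t in simulate_t(5, means=[0.0, 0.5], sds=[1.0, 1.0], rng=rng):
    print([round(x, 3) for x in t])
```

By construction the largest component of each simulated vector equals E, so it is always non-negative and unit exponential: on this standardised scale every simulated point exceeds the threshold 0 in at least one coordinate, while the other coordinates may fall below it.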

We illustrate the use of this approach by estimating future climatic extremes. The multivariate approach is relevant because the riskier events are often characterised by multivariate properties, like

  • a prolonged heat wave,
  • a wet period coupled with extreme daily precipitation etc.

Simulation is an important part of modelling, as the available data usually give no information on the uncertainty and are not sufficient to fit the models reliably.

There are quite a few data sets of projected “future climatic events” available. We used the Copernicus data set, which includes several meteorological indices estimated by different models (Nobakht et al.). The data we used are for 10-day periods over the years 2011-2040.

For simulating plausible scenarios, we used a GAN (generative adversarial network), a machine-learning method that can be trained to simulate authentic-looking data. GANs were originally used for images, but they can also be applied to tabular data with covariates (labels), in our case the year. For extremes further modifications are needed, but the method works, as our results show.

As an example, we present the coverage regions of the fitted distribution for the precipitation sum (RR) and the average daily precipitation on wet days (SDII), modelled for the grid point of Budapest.

Estimated coverage regions for 2011
Estimated coverage regions for 2040

The difference is visible: total precipitation extremes are higher and the dependence is weaker for 2040.


The Copernicus database and the help of the experts of 4-Xtra Technologies are greatly appreciated.


  1. Rootzén, H., Tajvidi, N.: Multivariate generalized Pareto distributions. Bernoulli, 12(5), 917–930 (2006)
  2. Rootzén, H. et al.: Multivariate peaks over thresholds models. Extremes, 21, 115–145 (2018)
  3. Nobakht, M. et al.: Agroclimatic indicators derived from climate projections, version 1.0. Copernicus Climate Change Service (C3S) Climate Data Store, Accessed on 11-11-2022,!/ dataset/sis-agroclimatic-indicators?tab=form

For more information, contact András Zempléni.
