Usage-based insurance

Holon Technologies problem for the First Estonian Study Group with Industry



Holon Technologies, one of the companies involved, works in the insurance sector. It uses telematics and GPS data to provide useful indicators to insurance companies that offer usage-based policies. Usage-based insurance has become a topic of interest in the last ten years, following the development and spread of big data and Internet-of-Things technologies, together with machine learning techniques.

There exist four main types of usage-based models:

  • Pay-As-You-Drive (PAYD): The insurance premium is computed dynamically according to how much the vehicle is driven.
  • Pay-How-You-Drive (PHYD): Similar to PAYD, but uses additional sensors.
  • Manage-How-You-Drive (MHYD): Drivers receive periodic reports about their driving style.
  • Try-Before-You-Buy (TBYB): Insurers offer a free app to potential customers to analyse their driving style.


The main questions that Holon Technologies posed to ESGI151 are the following:

  • Which methods and techniques could be used to classify the data?
  • How can a trend in drivers’ behaviour be found, and how can the probability that the trend continues over time be computed?
  • How can a model be defined that combines different aspects to determine the probability of damage, the damage frequency and the probable amount of the damage? Consequently, how should the premium be changed?

Three types of input data can be identified, distinguished by their source. Raw data are telematics data stored in each vehicle; they describe the individual behaviour of each vehicle and its drivers over the lifetime and usage of the vehicle. Accident data are provided by the insurance organisations and describe the characteristics of each single accident (insurance case). The third group consists of publicly available data provided by the state and other organisations, such as weather data and road conditions. Aggregation and analysis of the data were performed using Microsoft Office Excel, Jupyter Notebook (Python) and WEKA, an open-source software package for data mining and machine learning. The database sample was stored in Microsoft SQL Server. The structure of the obtained data set is illustrated in the table below.


We summarise here the most important techniques and approaches for tackling the usage-based insurance problem, answering the three original questions asked by Holon Technologies. We introduced a possible method based on probabilistic mixture models and clustering techniques to compute a driver’s risk of being involved in future accidents.

To enrich the currently available information, we strongly recommend retrieving global OpenStreetMap data, together with the accident history, to generate a baseline risk for each country or area of interest. Vehicles (and drivers) should be clustered based on their aggregated driving habits; for this we suggest adopting unsupervised learning methods, which should suffice for the purpose of Question 1.
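As an illustration of the clustering step, here is a minimal sketch in plain Python, assuming k-means as the unsupervised method (the study group did not prescribe a specific algorithm). The three features, the vehicle values and the choice of two clusters are all hypothetical and only serve to show the mechanics.

```python
import math
import random
import statistics


def standardise(rows):
    """Z-score each feature column so that no single unit dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each vehicle to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-vehicle aggregates (made-up numbers, not from the Holon data set):
# (mean speed in km/h, harsh braking events per 100 km, share of night driving)
vehicles = [
    (45.0, 0.5, 0.05),
    (48.0, 0.7, 0.10),
    (50.0, 0.6, 0.08),
    (85.0, 4.0, 0.40),
    (90.0, 4.5, 0.35),
    (88.0, 5.0, 0.45),
]

labels, centroids = kmeans(standardise(vehicles), k=2)
print(labels)  # one cluster label per vehicle
```

The cluster numbering itself is arbitrary; what matters is which vehicles end up grouped together. In practice the number of clusters and the feature set would have to be chosen from the real aggregated data.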


Probabilistic mixture models can be used to calculate the risk of a driver being involved in an accident, based on driving data. This risk index can be merged with the general risk of the areas the driver usually drives through. Global models should also be updated periodically (e.g., each month) as new data become available. The risk of accident calculated in answer to Questions 2 and 3 is essentially an index, so insurance companies can apply any function they prefer to turn it into a premium, exploiting or customising the index to their own needs.
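To make the idea of a mixture-based risk index concrete, the following sketch fits a two-component one-dimensional Gaussian mixture with the EM algorithm and reads the risk index as the posterior probability of the higher-mean component. This is only one possible reading of the approach: the single feature (harsh events per 100 km), the data values, the two-component choice and the `merge_with_area_risk` rule with its `area_weight` parameter are all assumptions made for illustration.

```python
import math
import statistics


def log_normal_pdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each component given x, computed in log space."""
    logs = [
        math.log(w) + log_normal_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_two_component_mixture(xs, iterations=100, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [statistics.fmean(ordered[:half]), statistics.fmean(ordered[half:])]
    variances = [max(statistics.pvariance(xs), min_var)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = [posteriors(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var,  # floor keeps a component from collapsing onto one point
            )
    return weights, means, variances


def risk_index(x, model):
    """Posterior probability that x belongs to the higher-mean ("riskier") component."""
    weights, means, variances = model
    high = means.index(max(means))
    return posteriors(x, weights, means, variances)[high]


def merge_with_area_risk(driver_risk, area_risk, area_weight=0.3):
    """Hypothetical merge rule: a convex combination of driver and area risk."""
    return (1 - area_weight) * driver_risk + area_weight * area_risk


# Made-up harsh events per 100 km for ten drivers (illustrative only).
harsh_events = [0.4, 0.6, 0.5, 0.7, 0.5, 0.3, 3.8, 4.2, 4.5, 4.0]
model = fit_two_component_mixture(harsh_events)

driver_risk = risk_index(4.2, model)
print(merge_with_area_risk(driver_risk, area_risk=0.2))
```

The output is an index between 0 and 1, which an insurer could then map to a premium with whatever function suits its pricing policy. A realistic model would use several driving features and would be refitted as new data arrive.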


