From data mining to excavating railway embankments

Mathematical logic emerged from the need to understand the foundations and limits of exact reasoning. Mathematics is by nature unambiguous and consistent, and it follows the principles of classical logic. However, the modern scientific community has come to understand that logic (in fact, many different logics) also has much to offer beyond the study of the foundations of mathematics. The world we live in is contradictory, multi-stranded, and in many ways incomplete, so we need to develop new logics to describe it.

At the University of Tampere, we cooperate with the University of Economics, Prague, in studying GUHA data mining logic and its applications. As the name suggests, GUHA (General Unary Hypotheses Automaton) generates hypotheses that are supported by a given large data matrix. The task is thus not to test known hypotheses; well-established statistical methods exist for that purpose. Parts of the GUHA logic are implemented in the software LISpMiner, which is maintained at the University of Economics, Prague. The user must have some insight into the content of the data in order to ask meaningful analytical questions about it. These questions can be encoded in the language of LISpMiner, and the software then automatically searches for answers to them.
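To make the idea of an automatic hypothesis search concrete, here is a minimal brute-force sketch in Python. It assumes a data matrix of Boolean attributes, tries every conjunction of up to two attributes as an antecedent, counts its four-fold table against a chosen succedent, and keeps the pairs that satisfy a quantifier. We use the classical GUHA "founded implication" quantifier (confidence a/(a+b) ≥ p and support a ≥ base) as the example. The data, the attribute names `A`, `B`, `C`, `S` and the functions `four_fold`, `founded_implication` and `search` are hypothetical illustrations, not LISpMiner's interface or internal algorithm.

```python
from itertools import combinations

# Hypothetical toy data matrix: each row is one object (for example a track
# segment) and each column a Boolean attribute. The attribute names A, B, C
# and the succedent S are invented for illustration; they are not real data.
ROWS = [
    {"A": True,  "B": True,  "C": False, "S": True},
    {"A": True,  "B": False, "C": False, "S": True},
    {"A": True,  "B": True,  "C": True,  "S": True},
    {"A": True,  "B": False, "C": True,  "S": False},
    {"A": False, "B": True,  "C": False, "S": False},
    {"A": False, "B": False, "C": True,  "S": False},
    {"A": False, "B": True,  "C": False, "S": True},
    {"A": False, "B": False, "C": False, "S": False},
]


def four_fold(rows, antecedent, succedent):
    """Count the four-fold table (a, b, c, d) of a conjunction vs. a succedent.

    a: antecedent and succedent both hold, b: antecedent only,
    c: succedent only, d: neither.
    """
    a = b = c = d = 0
    for row in rows:
        phi = all(row[attr] for attr in antecedent)  # conjunction of attributes
        psi = row[succedent]
        if phi and psi:
            a += 1
        elif phi:
            b += 1
        elif psi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a, b, c, d, p, base):
    """GUHA founded implication: a / (a + b) >= p and a >= base."""
    return a + b > 0 and a >= base and a / (a + b) >= p


def search(rows, attributes, succedent, quantifier, max_len=2):
    """Try every conjunction of up to max_len attributes as an antecedent and
    keep those whose four-fold table satisfies the quantifier."""
    found = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(attributes, k):
            table = four_fold(rows, antecedent, succedent)
            if quantifier(*table):
                found.append((antecedent, table))
    return found


if __name__ == "__main__":
    def quantifier(a, b, c, d):
        return founded_implication(a, b, c, d, p=0.7, base=2)

    for antecedent, table in search(ROWS, ["A", "B", "C"], "S", quantifier):
        print(" & ".join(antecedent), "=>", "S", table)
```

The quantifier is passed in as a plain function, so a different GUHA quantifier can be swapped in without changing the search. Exhaustive enumeration like this grows quickly with the number of attributes; real tools rely on far more efficient data representations.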

Our analysis of Finnish rail network data illustrates how the GUHA data mining method is applied. Finland has over 6,000 kilometers of railway, and a vast amount of measurement data on the structure and functional features of the network has been collected over the years. Structural data are available at intervals as short as a few tens of centimeters. In addition, track geometry measurements, which describe the functional condition of the track at different times, have been recorded for many years, several times a year in different seasons.

From the point of view of track maintenance, it is essential to find out why track settlement and other geometric deviations that affect operation and traffic safety do not occur uniformly along a given section of track. Indeed, they are unevenly distributed depending on the location, even though the traffic loads are the same. As research data we chose two track sections and asked the analytical question: What characteristics of the track structure cause above-average track geometry deviations?

LISpMiner found several answers, one of which surprised even the track experts: frost insulation boards. In Finland, the soil freezes in winter to a depth of a meter or more, which means that the surface layers of the soil can move several centimeters during a year. This must be taken into account in all construction work. Where the track structure is equipped with frost insulation boards to prevent frost penetration, the track geometry deteriorates faster than average. This was unexpected, because frost insulation boards are specifically intended to improve the stability of the track geometry by preventing damage caused by freezing soil.

We examined the matter more closely by excavating the railway embankment at selected points where frost insulation boards were known to be installed (in other words, we went out into the field to test a hypothesis produced by GUHA logic!). We found that the frost insulation boards act as a kind of water-retaining abrasive platform for the ballast above them. As trains repeatedly pass over the boards, the ballast slowly crumbles into sand due to internal movement. Over time this sand settles on the surface of the insulation boards, and the entire structure above the boards subsides. This phenomenon occurs to a lesser extent where there are no frost insulation boards. We are now studying the matter in more detail.

In general, data can be extracted, analyzed and modeled using different methods. In statistics, the starting point is to test given hypotheses, and similarly in regression analysis, the analyst must understand the content of the data before the analysis. Findings from these methods are often global: they describe general dependencies and trends in the data rather than details. Neural networks, on the other hand, have their own, quite versatile application areas, such as machine vision. However, they have limitations because they operate as black boxes: a given input produces a given response, but the network does not explain why the dependency is what it is.

Data mining based on GUHA logic is descriptive in nature, and the dependencies it finds are deterministic. The method is local in the sense that a small subset of the data, even a tiny one relative to the size of the whole data set, can reveal rare dependencies that statistical analysis or neural networks do not find. Frost insulation boards are a good example.

An important feature of GUHA logic is its deep logical foundation, in particular generalized quantifiers on finite models, such as the "above average" quantifier and many others. If we encounter a new kind of data issue in a practical data mining project, we can develop the GUHA logic theory accordingly and implement it in the LISpMiner software. We have recently been studying paraconsistent logic, which deals with conflicting data, and as a result of this work we added a new quantifier, called the Paraconsistent Separation quantifier, to LISpMiner. Practice feeds theory.
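A GUHA quantifier is evaluated on a finite data matrix through the four-fold table of an antecedent and a succedent: a, b, c and d count the objects that satisfy both, only the antecedent, only the succedent, and neither. One common formulation of the "above average" quantifier in the GUHA literature requires a/(a+b) ≥ (1+p)·(a+c)/(a+b+c+d), together with a minimum support a ≥ Base. The Python sketch below assumes that formulation; the function name `above_average`, the parameter values p = 0.2 and base = 10, and all example counts are our own hypothetical choices, and the code is not LISpMiner's implementation.

```python
def above_average(a, b, c, d, p=0.2, base=10):
    """GUHA 'above average' quantifier evaluated on a four-fold table.

    Holds when the relative frequency of the succedent among objects that
    satisfy the antecedent, a / (a + b), is at least (1 + p) times its
    relative frequency in the whole data, (a + c) / n, and a >= base.
    """
    n = a + b + c + d
    if a < base or a + b == 0 or n == 0:
        return False
    # Same inequality, cross-multiplied so that no division is needed.
    return a * n >= (1 + p) * (a + b) * (a + c)


# Hypothetical four-fold tables (a, b, c, d) over n = 1000 objects; the
# counts are invented for illustration and are not results of the study.
examples = {
    "clearly above average": (30, 20, 120, 830),
    "above average, but by less than p": (16, 84, 134, 766),
    "small subset, strong dependence": (12, 3, 138, 847),
    "too little support": (5, 0, 145, 850),
}

for label, table in examples.items():
    print(f"{label}: {above_average(*table)}")
```

The third example illustrates locality: only 15 of the 1000 hypothetical objects satisfy the antecedent, yet the quantifier holds because the succedent is far more frequent among them (80 %) than in the data as a whole (15 %).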

The LISpMiner software can be downloaded free of charge from https://lispminer.vse.cz/.

Esko Turunen, Professor, Tampere University of Technology

Mikko Sauni, PhD student

Photo by Toni Saarikoski. A fragment of a frost insulation board with sand crumbled from the track ballast on top of it.