By Mats Ehrnström

NTNU Norwegian University of Science and Technology
Can we rediscover a classical PDE from crude experimental data — just video recordings of solitons in a small wave flume — without assuming the equation in advance? In a recent collaboration we show that the answer is yes: two fundamentally different discovery methods recover the same governing evolution law from the same benchtop data, and the discovered equation predicts new experimental cases withheld from the learning process.
This post can be read alongside the DIScover overview published earlier. In the first post we set the scientific agenda (dynamical discovery under uncertainty), and in this post we give the soliton project as a concrete micro case: how to learn an interpretable PDE from noisy video data and validate it by forward prediction. (For the scientific collaboration, see https://www.ntnu.edu/imf/discover.)
Equation discovery resides in a productive middle ground between first‑principles modelling and black‑box prediction. The goal is parsimonious, interpretable evolution laws learned from data — equations that can be analysed, simulated, and understood. Shallow‑water waves are an unusually good testbed: they contain visible dynamics, classical theory for comparison, and experiments that can be performed with simple, everyday equipment.
A short background: why KdV is a standard
The Korteweg–de Vries (KdV) equation and its relatives describe a balance between nonlinear steepening and dispersion. In their simplest nondimensional forms, they are among the most transparent nonlinear PDEs in physics. You can write them down, understand the terms, and test predictions. That makes them ideal as rediscovery test beds: when a method can recover a transparent model in the KdV regime, it stands a good chance where governing laws are unknown, too.
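For reference, one common nondimensional normalization of KdV and its one-soliton solution is written out below. The scaling (the factor 6, the symbol u, the speed c and the offset x_0) is a textbook convention assumed here for illustration, and not necessarily the scaling used in the collaboration.

```latex
% KdV in a standard nondimensional form:
% nonlinear steepening (6 u u_x) balanced by dispersion (u_{xxx})
u_t + 6\,u\,u_x + u_{xxx} = 0

% One-soliton solution travelling at speed c > 0:
% taller solitons are narrower and faster
u(x,t) = \frac{c}{2}\,\operatorname{sech}^2\!\left(\frac{\sqrt{c}}{2}\,(x - c\,t - x_0)\right)
```

Substituting the travelling wave into the equation and integrating once gives u'' = c u - 3 u^2, which the sech^2 profile satisfies. The amplitude c/2 is tied to the speed c, which is the kind of prediction a video recording can test.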
The experiment: “GoPro physics” under realistic conditions
The setup was intentionally simple: soliton‑like waves were generated by a hand‑held piston in shallow water and recorded on video. From each frame, the free surface was extracted via edge detection and smoothing, producing spatio‑temporal fields η(x,t). This tests discovery methods under realistic imperfections: noise, imperfect boundaries, limited calibration, and trial‑to‑trial variability.
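The extraction step can be illustrated with a deliberately simplified sketch. It is not the pipeline used in the experiment: instead of a full edge detector, it locates the strongest vertical intensity jump in each pixel column of a synthetic frame and then applies a moving average. The function name `extract_surface`, the frame size, the contrast values and the window length are all assumptions made for this example.

```python
import numpy as np

def extract_surface(frame, window=5):
    """Row position of the strongest vertical intensity jump per column, smoothed."""
    grad = np.abs(np.diff(frame.astype(float), axis=0))   # shape (rows-1, cols)
    rows = np.argmax(grad, axis=0) + 0.5                  # jump lies between row i and i+1
    kernel = np.ones(window) / window
    padded = np.pad(rows, window // 2, mode="edge")       # avoid shrinking at the ends
    return np.convolve(padded, kernel, mode="valid")      # moving average, same length

# Synthetic frame (assumption): bright background above, dark water below,
# with a sech^2-shaped hump on the free surface. Image rows grow downward.
n_rows, n_cols = 120, 200
x = np.linspace(-10.0, 10.0, n_cols)
true_surface = 60.0 - 20.0 / np.cosh(x) ** 2              # surface position in pixel rows
row_idx = np.arange(n_rows)[:, None]
frame = np.where(row_idx > true_surface[None, :], 0.2, 0.9)
frame = frame + 0.02 * np.random.default_rng(0).standard_normal(frame.shape)

surface = extract_surface(frame)
eta_pixels = 60.0 - surface                               # elevation above still water, in pixels
max_err = np.max(np.abs(surface - true_surface))
print(f"largest deviation from the true surface: {max_err:.2f} pixels")
```

Repeating this for every frame and converting pixels to physical units would give a sampled field η(x,t). Real footage additionally needs calibration and handling of reflections, which this sketch ignores.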

What we try to learn from the data
We search for an evolution law of the form $\eta_t = F(\eta)$, where $F$ is built from spatial operators acting on $\eta$ and nonlinear combinations such as $\eta\,\eta_x$. The modelling constraint is interpretability. The right‑hand side should be readable as transport, nonlinearity and dispersion, not as an opaque black box. We then assess candidate equations both by regression fit and by forward prediction on cases withheld from discovery.
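To make the idea of a candidate library concrete, here is a minimal strong-form sketch on synthetic data. It samples two exact KdV solitons (in the normalization u_t + 6 u u_x + u_xxx = 0, an assumption of this example), builds a small library of candidate terms with FFT-based derivatives, and prunes it with sequentially thresholded least squares. The library, the threshold and the helper names are illustrative choices, not the ones used in the project.

```python
import numpy as np

def soliton(x, t, c):
    """Exact soliton of u_t + 6 u u_x + u_xxx = 0 travelling at speed c."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t)) ** 2

def spectral_derivative(u, k, order):
    """Differentiate a periodic sample with the FFT."""
    return np.real(np.fft.ifft((1j * k) ** order * np.fft.fft(u)))

def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coef) >= threshold
        coef[~keep] = 0.0
        if keep.any():
            coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

n_pts, length, dt = 512, 60.0, 1e-4
x = np.linspace(-length / 2, length / 2, n_pts, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n_pts, d=length / n_pts)

names = ["u_x", "u*u_x", "u_xxx", "u_xxxxx"]
blocks, targets = [], []
for c in (1.0, 4.0):   # two speeds: one soliton alone cannot tell u_t = -c u_x from KdV
    u = soliton(x, 0.0, c)
    u_t = (soliton(x, dt, c) - soliton(x, -dt, c)) / (2 * dt)   # centred difference in time
    u_x = spectral_derivative(u, k, 1)
    blocks.append(np.column_stack([u_x, u * u_x,
                                   spectral_derivative(u, k, 3),
                                   spectral_derivative(u, k, 5)]))
    targets.append(u_t)

coef = stlsq(np.vstack(blocks), np.concatenate(targets), threshold=0.1)
for name, value in zip(names, coef):
    print(f"{name:8s} {value: .4f}")
```

On this clean data the regression should keep only the u*u_x and u_xxx terms. The need for two different speeds also shows why several experimental cases are useful: a single travelling wave satisfies a pure transport equation just as well.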
Two discovery routes, chosen to fail differently
We apply two complementary routes. One of them is new, and agreement across methods is a strong indicator that the discovered model is real rather than artefactual.
WSINDy (weak‑form sparse regression): Rather than differentiating noisy data directly, a weak formulation shifts derivatives onto compactly supported test functions (integration by parts). A sparsity‑promoting regression then selects a parsimonious PDE from a candidate library. The ensemble over many random subdomains provides stability diagnostics and inclusion probabilities.
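The weak-form idea can be sketched in a few lines. For a model u_t = a u_x + b u u_x + d u_xxx and a compactly supported test function φ(x,t), integration by parts gives ⟨φ_t, u⟩ = a ⟨φ_x, u⟩ + b ⟨φ_x, u²/2⟩ + d ⟨φ_xxx, u⟩, so only the test function is differentiated. The example below is an illustration under stated assumptions, not the WSINDy implementation: the data are two synthetic KdV solitons with a small amount of added noise, the test functions are polynomial bumps (1 - z²)^p placed on a regular grid rather than on random subdomains, and plain least squares replaces the sparsity-promoting regression. All names and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def soliton(x, t, c):
    """Exact soliton of u_t + 6 u u_x + u_xxx = 0 travelling at speed c."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t)) ** 2

def bump_derivatives(grid, center, half_width, max_order, p=6):
    """phi(s) = (1 - z^2)^p with z = (s - center)/half_width, and its derivatives."""
    z = (grid - center) / half_width
    inside = np.abs(z) < 1.0
    bump = Polynomial([1.0, 0.0, -1.0]) ** p
    out = []
    for order in range(max_order + 1):
        poly = bump if order == 0 else bump.deriv(order)
        out.append(np.where(inside, poly(z), 0.0) / half_width ** order)
    return out

def weak_integral(time_part, space_part, field, dt, dx):
    """Approximate the space-time integral of phi_t-part * phi_x-part * field."""
    return dt * dx * (time_part @ field @ space_part)

x = np.linspace(-30.0, 30.0, 601)
t = np.linspace(0.0, 2.0, 201)
dx, dt = x[1] - x[0], t[1] - t[0]
rng = np.random.default_rng(1)

rows, lhs = [], []
for c in (1.0, 4.0):
    u = soliton(x[None, :], t[:, None], c)                # shape (n_t, n_x)
    u = u + 0.002 * rng.standard_normal(u.shape)          # small measurement noise
    for tc in np.linspace(0.5, 1.5, 5):
        phi_t = bump_derivatives(t, tc, 0.5, 1)
        for xc in np.linspace(-6.0, 14.0, 11):
            phi_x = bump_derivatives(x, xc, 3.0, 3)
            lhs.append(weak_integral(phi_t[1], phi_x[0], u, dt, dx))
            rows.append([weak_integral(phi_t[0], phi_x[1], u, dt, dx),
                         weak_integral(phi_t[0], phi_x[1], 0.5 * u ** 2, dt, dx),
                         weak_integral(phi_t[0], phi_x[3], u, dt, dx)])

coef = np.linalg.lstsq(np.array(rows), np.array(lhs), rcond=None)[0]
print(dict(zip(["u_x", "u*u_x", "u_xxx"], np.round(coef, 3))))
```

No derivative of the noisy field is ever formed; the noise only enters through integrals against smooth weights. For this synthetic setup the fitted coefficients should lie close to 0, -6 and -1.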
Fourier‑multiplier discovery: This is the new method. Assume $\eta_t = L\eta + N(\eta^2)$, where $L$ and $N$ are (possibly nonlocal) Fourier multiplier operators, acting on the linear and the quadratic term, respectively. We estimate their symbols in frequency space by least squares across the dataset. In this case, for comparison with WSINDy, we approximate the symbols by low‑order polynomials to obtain an explicit PDE.
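A minimal sketch of the symbol estimation, assuming a model of the form u_t = L u + N(u²) with symbols l(k) and n(k): for each wavenumber, the Fourier coefficients of u_t, u and u² from several snapshots are related by û_t(k) = l(k) û(k) + n(k) (u²)^(k), which is a small complex least-squares problem. The synthetic data (four KdV solitons of different speeds, in the normalization u_t + 6 u u_x + u_xxx = 0), the wavenumber band and the polynomial degrees below are assumptions made for this example. For that equation the exact symbols are l(k) = i k³ and n(k) = -3 i k.

```python
import numpy as np

def soliton(x, t, c, x0):
    """Exact soliton of u_t + 6 u u_x + u_xxx = 0 with speed c, centred at x0 + c t."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t - x0)) ** 2

n_pts, length, dt = 512, 60.0, 1e-4
x = np.linspace(-length / 2, length / 2, n_pts, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n_pts, d=length / n_pts)

u_hat, u2_hat, ut_hat = [], [], []
for c, x0 in [(1.0, -4.0), (2.0, -1.0), (3.0, 2.0), (4.0, 5.0)]:
    u = soliton(x, 0.0, c, x0)
    u_t = (soliton(x, dt, c, x0) - soliton(x, -dt, c, x0)) / (2 * dt)
    u_hat.append(np.fft.fft(u))
    u2_hat.append(np.fft.fft(u ** 2))
    ut_hat.append(np.fft.fft(u_t))
u_hat, u2_hat, ut_hat = np.array(u_hat), np.array(u2_hat), np.array(ut_hat)

# Estimate both symbols wavenumber by wavenumber, restricted to a band
# where the data carry enough energy.
band = np.flatnonzero((np.abs(k) > 0) & (np.abs(k) <= 2.0))
lin = np.zeros(band.size, dtype=complex)
quad = np.zeros(band.size, dtype=complex)
for j, idx in enumerate(band):
    design = np.column_stack([u_hat[:, idx], u2_hat[:, idx]])
    lin[j], quad[j] = np.linalg.lstsq(design, ut_hat[:, idx], rcond=None)[0]

# Low-order polynomial approximation of the (imaginary) symbols.
kb = k[band]
lin_poly = np.linalg.lstsq(np.column_stack([kb, kb ** 3, kb ** 5]), lin.imag, rcond=None)[0]
quad_poly = np.linalg.lstsq(np.column_stack([kb, kb ** 3]), quad.imag, rcond=None)[0]
print("Im l(k) ~ k, k^3, k^5 coefficients:", np.round(lin_poly, 4))
print("Im n(k) ~ k, k^3 coefficients:     ", np.round(quad_poly, 4))
```

Reading the polynomial coefficients back as derivatives turns the fitted symbols into an explicit PDE: a k³ term in Im l(k) corresponds to -u_xxx, and a k term in Im n(k) to a multiple of (u²)_x. Note that snapshots of a single soliton would not suffice here, since they give proportional rows at every wavenumber.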

What the data say
The outcome is striking in its convergence: both methods recover the same KdV‑type evolution law from the data, with the same term structure and almost identical coefficients. This clearly indicates that the equation is inherent in the data rather than an artefact of one algorithm.
We did not hard‑code the exact KdV coefficients, but searched among general equations of up to seventh order with quadratic terms of lower order. If desired, additional terms with very small coefficients may also be detectable. In our dataset, a persistent fifth‑order dispersive contribution appears alongside the classic KdV balance. The resulting model is known as the fifth‑order KdV equation, and it is in fact a next‑order approximation to the Euler equations. This suggests that an overly idealized KdV theory risks missing systematic effects in the experimental regime.
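Where a fifth-order dispersive term comes from can be seen from the linear dispersion relation of water waves on depth h. The expansion below is standard linear theory; the symbols g, h, k, ω and c_0 are notation assumed here, and the coefficients actually recovered from the data are not reproduced.

```latex
% Linear water-wave dispersion on depth h, with c_0 = \sqrt{g h}
\omega^2 = g\,k \tanh(k h)

% Long-wave expansion for right-going waves (k h \ll 1)
\omega \approx c_0 k \left[ 1 - \frac{(k h)^2}{6} + \frac{19\,(k h)^4}{360} \right]

% Matching plane waves e^{i(kx - \omega t)}, together with the classical KdV nonlinearity
\eta_t + c_0\,\eta_x + \frac{3 c_0}{2h}\,\eta\,\eta_x
      + \frac{c_0 h^2}{6}\,\eta_{xxx}
      + \frac{19\,c_0 h^4}{360}\,\eta_{xxxxx} = 0
```

Truncating after the (kh)² term gives the dispersive part of classical KdV, and keeping the next term adds the fifth derivative. A fully consistent next-order model also carries additional nonlinear terms, which are omitted in this sketch.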
Results at a glance (interpretable terms)
A useful feature of our formulation is that each term has an immediate modelling interpretation, and its coefficient can be stress‑tested: if you perturb the coefficient within its uncertainty range, how quickly do predictions drift? That is a kind of auditability that can be used when the same workflow is later applied to more complex flows or to biological data.
Validation: forward prediction on withheld solitons
A discovered model earns its keep by predicting data it has not seen. We therefore solve the recovered PDEs forward in time and compare against solitons withheld from discovery. Errors are tracked instantaneously and cumulatively. The dominant discrepancies are systematic and interpretable (phase speed and dispersion mismatch), and the overall errors over 30 seconds are small, on the order of 5%.
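The validation loop can be sketched as follows, with several stand-ins that are assumptions of this example: the "withheld" case is an exact KdV soliton rather than experimental data, the solver is a Fourier pseudo-spectral scheme with an integrating factor and fourth-order Runge-Kutta time stepping, the instantaneous error is a relative L2 norm, and the cumulative error is its running time integral. The second run perturbs the dispersion coefficient by 5% to show how a coefficient mismatch appears in these error curves.

```python
import numpy as np

def soliton(x, t, c, x0):
    """Exact soliton of u_t + 6 u u_x + u_xxx = 0 with speed c, centred at x0 + c t."""
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t - x0)) ** 2

def simulate(u0, k, dt, n_steps, save_every, nonlin=-6.0, disp=-1.0):
    """Integrate u_t = nonlin*u*u_x + disp*u_xxx on a periodic domain."""
    v = np.fft.fft(u0)
    half = np.exp(-disp * 1j * k ** 3 * dt / 2)   # exact linear flow over dt/2
    full = half ** 2
    g = 0.5 * nonlin * 1j * k * dt                # u*u_x = (u^2/2)_x in Fourier space
    square = lambda w: np.fft.fft(np.real(np.fft.ifft(w)) ** 2)
    frames = [u0.copy()]
    for step in range(1, n_steps + 1):
        a = g * square(v)
        b = g * square(half * (v + a / 2))
        c = g * square(half * v + b / 2)
        d = g * square(full * v + half * c)
        v = full * v + (full * a + 2 * half * (b + c) + d) / 6
        if step % save_every == 0:
            frames.append(np.real(np.fft.ifft(v)))
    return np.array(frames)

n_pts, length, dt, n_steps, save_every = 256, 40.0, 1e-3, 2000, 100
x = np.linspace(-length / 2, length / 2, n_pts, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n_pts, d=length / n_pts)
times = dt * save_every * np.arange(n_steps // save_every + 1)
reference = np.array([soliton(x, t, 2.0, -5.0) for t in times])   # stand-in for withheld data

def errors(model):
    """Instantaneous relative L2 error and its running time integral."""
    inst = np.linalg.norm(model - reference, axis=1) / np.linalg.norm(reference, axis=1)
    return inst, np.cumsum(inst) * (times[1] - times[0])

inst_base, cum_base = errors(simulate(reference[0], k, dt, n_steps, save_every))
inst_pert, cum_pert = errors(simulate(reference[0], k, dt, n_steps, save_every, disp=-1.05))
print(f"final error, exact coefficients:   {inst_base[-1]:.2e}")
print(f"final error, dispersion off by 5%: {inst_pert[-1]:.2e}")
```

With experimental data, the reference array would hold the measured surface of a soliton that was excluded from the regression, and the initial condition would be its first frame. A steadily growing error with a fixed sign of phase lag or lead then points to a speed or dispersion mismatch rather than to noise.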

What we learned about methods
The discovery routes have complementary strengths. WSINDy offers a statistical view: inclusion probabilities and coefficient variability reveal how stable the discovered terms are under changes in sampling and hyperparameters. The Fourier‑multiplier route is computationally light once the spectra are computed, and it can capture operator structure without committing to a fixed derivative library. In practice, using the two together provides a valuable cross‑check. A recurring pain point in equation discovery is that the discovery step is cheap but validation is expensive: you only know whether a model is useful after you solve it forward, often many times, and compare to withheld data. This makes simplicity a virtue.
Why this matters beyond water waves
This case study is mathematically modest, but it applies a new method to realistic experimental data. It captures a scalable workflow: (i) extracting usable fields from messy data, (ii) learning an interpretable evolution law, (iii) validating by forward prediction on withheld cases, and (iv) using systematic mismatch to refine the model and quantify uncertainty. The same logic applies in more difficult settings — video‑based surface flows, turbulent regimes, or neural population dynamics.
References
[1] K. S. Heinrich, D. S. Seth, M. Ehrnström & S. Å. Ellingsen, ‘Rediscovering shallow water equations from experimental data’, arXiv:2511.05486 (2025).
[2] S. L. Brunton, J. L. Proctor & J. N. Kutz, ‘Discovering governing equations from data (SINDy)’, PNAS 113 (2016).
[3] D. A. Messenger & D. M. Bortz, weak-form SINDy / WSINDy (see Journal of Computational Physics 443 (2021) and Multiscale Modeling & Simulation 19 (2021)).
[4] S. L. Brunton & J. N. Kutz, Data‑Driven Science and Engineering (Cambridge University Press, 2022).
[5] D. J. Korteweg & G. de Vries, ‘On the change of form of long waves advancing in a rectangular canal’, Philosophical Magazine 39 (1895).
[6] J. Canny, ‘A computational approach to edge detection’, IEEE TPAMI 8 (1986).
[7] E. Hairer & G. Wanner, Solving Ordinary Differential Equations II (Springer, 1996).
[8] S. L. Brunton & J. N. Kutz, ‘Promising directions of machine learning for partial differential equations’, Nature Computational Science 4 (2024).
