Finding structural changes in non-Gaussian cyclostationary signals

Our current research within the Weave-Unisono project concentrates on new pre-processing and segmentation algorithms for non-Gaussian cyclostationary models.

In many real-world systems, data exhibit both regular patterns and irregular disturbances. A particularly important class of such processes is cyclostationary signals, whose statistical properties repeat over time. These signals arise naturally in engineering, physics, medicine, and finance. However, classical approaches typically assume Gaussian behavior, an assumption often violated in practice by impulsive events. Many real-world signals exhibit non-Gaussian, impulsive behavior, e.g. vibration signals in condition monitoring of mechanical systems, abdominal recordings combining cardiac, respiratory, and gastrointestinal signals, and heart rate variability. Figure 1 illustrates example signals in which sudden spikes and bursts clearly deviate from Gaussian assumptions.

Figure 1: Left panel: vibration signal from a crushing machine. Right panel: abdominal sound recording with impulsive events.

Cyclostationary processes

Cyclostationary processes extend stationary models by allowing their statistical properties to vary periodically. Formally, a process is cyclostationary if its mean and autocovariance repeat with a fixed period T. This property reflects the presence of cyclic mechanisms, such as rotating machinery or biological rhythms. A commonly used representation is the Periodic Autoregressive (PAR) model:

X_t = s_1(t) X_{t-1} + Z_t,

where s_1(t) is a periodic function and Z_t denotes background noise. The periodic coefficient captures the deterministic cyclic behavior, while Z_t introduces random noise.
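As a minimal illustration, a PAR(1) trajectory can be simulated directly from the recursion above. The periodic coefficient and parameter values below are illustrative choices for this sketch, not those of our study:

```python
import numpy as np

def simulate_par1(n, s1, sigma=1.0, rng=None):
    """Simulate X_t = s_1(t) X_{t-1} + Z_t with Gaussian noise Z_t."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = s1(t) * x[t - 1] + sigma * rng.standard_normal()
    return x

# Illustrative periodic coefficient with period T = 12
s1 = lambda t: 0.5 + 0.3 * np.sin(2 * np.pi * t / 12)
x = simulate_par1(500, s1, rng=0)
```

Since |s_1(t)| stays below 1, the recursion is stable and the trajectory oscillates around zero with a periodically modulated variance.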

In many classical settings, Z_t is assumed to be Gaussian. However, real-world signals often contain impulsive disturbances, making this assumption unrealistic. To capture non-Gaussian effects, we model the noise component as follows:

Z = M + K \cdot W,

where M is Gaussian noise and W is a binary variable indicating whether an impulse occurs. The key parameters are K, which determines the amplitude of the impulses, and p = P(W=1), which controls their frequency:

  • p=0: purely Gaussian behavior,
  • small p: rare impulses,
  • large p: frequent impulsive disturbances.

This formulation allows for a smooth transition between Gaussian and highly non-Gaussian regimes, reflecting real signal characteristics.
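The noise model can be sketched in a few lines of Python. The values σ=0.25 and K=2 match those used later in the text; the function and variable names are our own:

```python
import numpy as np

def sample_noise(n, p, sigma=0.25, K=2.0, rng=None):
    """Sample Z = M + K*W: Gaussian background M plus Bernoulli(p) impulses W."""
    rng = np.random.default_rng(rng)
    m = sigma * rng.standard_normal(n)   # Gaussian component M
    w = rng.random(n) < p                # impulse indicator W ~ Bernoulli(p)
    return m + K * w

z_gauss = sample_noise(10_000, p=0.0, rng=1)   # p = 0: purely Gaussian
z_imp = sample_noise(10_000, p=0.1, rng=1)     # p = 0.1: ~10% of samples carry impulses
```

Varying p between 0 and 1 moves the sample smoothly from the Gaussian regime to a heavily impulsive one.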

Time-varying impulsiveness

In practical applications, the level of impulsiveness is not constant. Systems evolve over time, leading to changes in the statistical structure of the signal. In our analysis, we assume that the distribution of the background noise varies over time, with p being the time-varying parameter that controls the occurrence of large impulses. For simplicity, the remaining parameters, i.e. the standard deviation of the Gaussian component and the impulse amplitude, are assumed to be constant (\sigma=0.25 and K=2). In Figure 2 we present an exemplary trajectory of Z_t (panel A) and the corresponding X_t (panel B), constructed so that the parameter p changes every 100 observations, as indicated in the figure. In consequence, we have 7 segments corresponding to different values of p. The X_t trajectory follows the PAR(1) model with

s_1(t)= 0.5 + 0.3 \cdot \sin(2 \pi \cdot t / 12).

This figure clearly shows how the signal alternates between regimes of low and high impulsiveness, making the detection of structural changes a non-trivial task.
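A construction of this kind can be sketched as follows. The exact per-segment values of p in Figure 2 are not listed here, so the ones below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, K = 0.25, 2.0

# Seven 100-sample segments alternating between low and high impulsiveness
# (illustrative p values; the figure uses its own sequence)
p_per_segment = [0.0, 0.2, 0.0, 0.4, 0.0, 0.2, 0.0]
p_t = np.repeat(p_per_segment, 100)

# Noise with time-varying impulse probability: Z_t = M_t + K * W_t
z = sigma * rng.standard_normal(p_t.size) + K * (rng.random(p_t.size) < p_t)

# PAR(1) recursion with the periodic coefficient s_1(t)
s1 = lambda t: 0.5 + 0.3 * np.sin(2 * np.pi * t / 12)
x = np.zeros_like(z)
for t in range(1, z.size):
    x[t] = s1(t) * x[t - 1] + z[t]
```

Plotting z and x against time reproduces the qualitative picture of Figure 2: bursts of impulses in some segments, near-Gaussian behavior in others.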

Figure 2: A) exemplary trajectories of Zt with different segments corresponding to different frequency of impulses p; B) corresponding trajectories of Xt; C) the variance calculated based on many Monte Carlo simulations of Xt process.

What’s new?

The research addresses structural change detection (distributional shifts) in cyclostationary models, motivated by signals transitioning from Gaussian to impulsive noise. Two families of segmentation approaches are proposed: Hidden Markov Models (two variants) and distribution-distance-based methods (MIDAST, two variants).

The change point detection problem

Problem: Detecting structural changes in such signals is difficult due to:

  • temporal dependence (autoregressive structure),
  • periodic behavior,
  • non-Gaussian and evolving distributions.

Classical change point detection methods typically assume independence and Gaussianity, which limits their applicability.

Proposed solution: parametric methods based on prior knowledge of the underlying cyclostationary data-generating model. Alternatively, statistical methods may be employed to analyze the noise variability; in the latter case, preliminary data processing is required.

Approach 1: Hidden Markov Models (HMMs)

We assume that the changes of the parameter p are governed by an unobserved two-state Markov chain R_t, i.e.
p=\begin{cases} p_1, & \text{ if } R_t=1,\\ p_2, & \text{ if } R_t=2. \end{cases}
R_t is described by a transition matrix \mathbf{P} containing the probabilities \pi_{ij}=P(R_{t+1} = j\mid R_{t} = i) of switching from regime i at time t to regime j at time t+1. Detection of the change points is based on the Expectation-Maximization (EM) algorithm [1], a standard approach for estimating the parameters of HMMs. We consider two versions of the procedure:

  • (HMM_X): The PAR model structure is taken into account in the estimation procedure and no initial detection of periodicity is required.
  • (HMM_Z): The second version is designed for segmentation of the residual series. In this case, an additional preliminary step (removing the periodic dependence to obtain the residuals) is required.

This enables segmentation of the signal into regions with distinct statistical properties. Importantly, this method can be applied directly to the observed data.
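The regime-switching mechanism can be sketched as follows. The transition matrix and regime-specific values of p are illustrative assumptions, and the EM estimation step itself is omitted:

```python
import numpy as np

def simulate_regimes(n, P, rng=None):
    """Simulate a two-state Markov chain R_t from transition matrix P (rows sum to 1)."""
    rng = np.random.default_rng(rng)
    r = np.zeros(n, dtype=int)
    for t in range(1, n):
        r[t] = rng.choice(2, p=P[r[t - 1]])
    return r

# Illustrative transition matrix: both regimes are persistent (pi_11 = pi_22 = 0.98)
P = np.array([[0.98, 0.02],
              [0.02, 0.98]])
r = simulate_regimes(2000, P, rng=3)

# Regime-dependent impulse probability: p_1 = 0 (Gaussian), p_2 = 0.3 (impulsive)
p_t = np.where(r == 0, 0.0, 0.3)
```

Feeding p_t into the noise model above yields a signal whose impulsiveness switches at random, Markov-governed times; EM then recovers the hidden regime sequence from the observations.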

Approach 2: Distributional Distances

An alternative approach is based on comparing distances between distributions across segments; it is a univariate version of MIDAST [2]. If two neighboring segments differ significantly in terms of distributional distance, a change point is detected. Here, we use two versions of this method, which utilize the following statistical tests for equality of distributions:

  • (D_{ES}): the distributional-distance-based procedure that uses the two-sample Epps–Singleton test for equality of distributions [3],
  • (D_{CVM}): the distributional-distance-based procedure that uses the two-sample Cramér–von Mises test for equality of distributions [4].

These tests capture different aspects of distributional differences, from global shape to higher-order properties. While effective, this approach typically requires preprocessing to remove periodic dependencies.
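A minimal sketch of such a distance-based scan, using the two-sample tests available in SciPy (scipy.stats.epps_singleton_2samp and scipy.stats.cramervonmises_2samp). The window length, step, and significance level are illustrative, this is a simplification rather than the published MIDAST algorithm, and it assumes the residuals Z_t have already been extracted:

```python
import numpy as np
from scipy.stats import cramervonmises_2samp, epps_singleton_2samp

def detect_changes(z, window=100, alpha=0.01, test="cvm"):
    """Scan adjacent windows; flag points where the two-sample test
    rejects equality of distributions (a sketch, not full MIDAST)."""
    change_points = []
    for t in range(window, z.size - window, window // 2):
        left, right = z[t - window:t], z[t:t + window]
        if test == "cvm":
            pval = cramervonmises_2samp(left, right).pvalue
        else:
            pval = epps_singleton_2samp(left, right)[1]
        if pval < alpha:
            change_points.append(t)
    return change_points

# Residuals: Gaussian for the first half, impulsive (p = 0.3) afterwards
rng = np.random.default_rng(4)
z = np.concatenate([0.25 * rng.standard_normal(500),
                    0.25 * rng.standard_normal(500) + 2.0 * (rng.random(500) < 0.3)])
cps = detect_changes(z, test="cvm")
```

Both tests strongly reject equality for windows straddling the midpoint, so the scan flags candidate change points near t = 500.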

Comparison of methods

The performance of the different methods can be evaluated using simulation studies. Figure 3 presents the normalized error of change point estimation under various scenarios. In the simulations, we consider a data sequence of length N generated by a PAR(1) model with Gaussian noise for the first half of the data (impulse probability p=0). At the midpoint of the dataset, a structural change is introduced, after which the noise component exhibits impulses with p > 0.

Figure 3: Normalized MAE for different methods, sample size N and p.

HMM_X provides the highest change point localization accuracy and does not require periodic decomposition, but tends to over-detect changes for small p (see Figure 4). The methods HMM_Z, D_{ES}, and D_{CVM} require preliminary periodic decomposition but, depending on the sample size and p, offer a better trade-off between accuracy and robustness to false positives. The proposed framework enables unsupervised analysis of non-Gaussian cyclostationary processes, supporting better system understanding and applications in autonomous smart infrastructure.

Figure 4: Average number of detected change points (mean) and standard deviations (in parentheses) for different methods, parameter values p, and sample sizes N.

[1] A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. Journal of the Royal Statistical Society. Series B (Methodological) 39 (1977), 1–38.
[2] J. Witulska et al. “Identifying the temporal distribution structure in multivariate data for time-series segmentation based on two-sample test”. Information Fusion 125 (2026), 103445.
[3] T. W. Epps and K. J. Singleton. “An omnibus test for the two-sample problem using the empirical characteristic function”. Journal of Statistical Computation and Simulation 26 (1986), 177–203.
[4] T. W. Anderson. “On the distribution of the two-sample Cramér–von Mises criterion”. Annals of Mathematical Statistics 33 (1962), 1148–1159.

By Agnieszka Wylomańska and Justyna Witulska (Wrocław University of Science and Technology).
