
Simulation & Detection of Cortical Traveling Waves

I explored the fascinating topic of cortical traveling waves (TWs) during the summer of 2020 under the guidance of Prof. Sven Bestmann. I acquired many new skills and gained new insights into the nature of oscillatory signals. Below is my first experiment on the detection of spherical TWs at a macroscopic scale.

Update: For my second (and more formal) experiment on planar TWs at the mesoscopic scale, click here.

Simulation of Cortical Traveling Waves

In this project, I tried to simulate an electrophysiological recording dataset (say, EEG) that contains cortical traveling waves. I spent most of my time on two questions:

  1. What exactly is a "traveling wave" in non-periodic data, mathematically?
  2. How can we make the synthetic data continuous across the channels (just like real data) while enforcing certain TW directions?

Since the first problem might be too simple for experts, I'll put it in the discussion section and work on the second problem first.

In order to make the synthetic data more similar to real EEG recordings, we decided to construct the signal in the frequency domain, with a "reference" power spectrum (computed from a long resting-state EEG dataset) and a phase spectrum that we can manipulate.

Initially, we decided to simulate a wave that travels along a line, from the center (Cz) to the front/back/left/right (Fpz/Oz/T7/T8). To do so, we added a channel-specific constant phase lag to the reference phase spectrum.

Here the constant phase lag is proportional both to the distance between the channel and Cz and to the central frequency of each band; in other words, the waves travel at a fixed speed (about 5 m/s according to the literature).
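Below is a minimal MATLAB sketch of this construction. All names and numbers (fs, N, the distances d, the speed v) are illustrative assumptions, and a 1/f curve stands in for the real reference spectrum:

fs    = 250;                          % sampling rate (Hz), assumed
N     = 1024;                         % samples per segment
f     = (0:N/2-1)' * fs / N;          % one-sided frequency axis
v     = 5;                            % assumed wave speed (m/s)
d     = [0 0.02 0.04 0.06];           % channel-to-Cz distances (m), e.g. Cz->Fpz

refAmp  = 1 ./ max(f, 1);             % placeholder "reference" amplitude spectrum
basePhi = 2*pi*rand(N/2, 1);          % common random phase spectrum

x = zeros(N, numel(d));
for c = 1:numel(d)
    % the lag grows with both distance and frequency, so every
    % component travels at the same speed v (delay = d/v seconds)
    phi  = basePhi - 2*pi .* f .* d(c) / v;
    spec = zeros(N, 1);
    spec(1:N/2) = refAmp .* exp(1i*phi);
    x(:, c) = ifft(spec, 'symmetric'); % enforce a real-valued signal
end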

The codes can be downloaded here.

The simulation above is based on the assumption that the wave carries specific information that must be conveyed to a certain area. In practice, however, two-dimensional spherical traveling waves are more common in electrophysiological recordings:

[Figures: 1D traveling wave / 2D spherical traveling wave]

Therefore, we created another dataset with spherical TWs in the same way as described before, except that the constant phase lag became k·‖x − s‖, where k is the (normalized) spatial frequency, x is a channel's position, and s is the position of the wave's "source". We randomly picked several channels, and the source switched among them in every data segment (2 s). We also perturbed the phase lag with a (very small) Brownian-motion-based random phase fluctuation.
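A minimal sketch of this spherical-lag construction, with a hypothetical 9×9 channel grid standing in for the montage:

[gx, gy] = meshgrid(1:9, 1:9);
pos  = [gx(:) gy(:)];                 % 81 channel positions on a grid
srcs = pos(randi(81, 3, 1), :);       % a few randomly chosen candidate sources
k    = 0.2;                           % normalized spatial frequency (assumed)

src  = srcs(randi(size(srcs, 1)), :); % the source for this 2-s segment
dist = sqrt(sum((pos - src).^2, 2));  % distance of every channel to the source
lag  = k * dist;                      % constant phase lag per channel

nT     = 500;                         % samples per segment
jitter = cumsum(0.001 * randn(nT, 81), 1);  % tiny Brownian phase fluctuation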

The codes can be downloaded here.

Here is what we got:

The voltage distribution is mostly continuous. Here is the phase distribution:

Finally, let's compare the voltage sequence and oscillatory power of our data and the reference data:

This is a non-TW channel at the corner, and the simulated data look quite acceptable. Note that we only summed the delta/theta/alpha/beta bands (1-25 Hz), so the high-frequency content is weak.

Comparison of Different Algorithms

After simulating an EEG dataset with traveling waves, we analyzed it with different algorithms. We implemented the methods from (Alexander et al., PLoS One, 2016) and (Alexander et al., PLoS Comput. Biol., 2019). For each frequency of interest, the first method uses the k-means algorithm to cluster the phase evolution within every cycle (i.e., each input vector concatenates all channels' phases across one cycle), while the second uses PCA to find basis vectors of the phase distribution (the input matrix collects all channels' instantaneous phases at every time sample). We also tried using k-means instead of PCA. In the following sections, we compare their outputs with the ground truth.

K-means on Data Cycles

Following (Alexander et al., PLoS One, 2016), we want to cluster the individual data cycles so that each cluster represents a unique TW with a certain source s; a minimal sketch of the idea follows this paragraph.
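Here is a minimal sketch of the cycle-wise clustering idea; the variable names and sizes are hypothetical, and random data stands in for the extracted phases:

nChn = 81; nSamp = 4200; cycLen = 42;            % illustrative sizes
phi  = angle(hilbert(randn(nSamp, nChn))).';     % stand-in phases, nChn x nSamp

nCyc = floor(nSamp / cycLen);
X    = zeros(nCyc, nChn * cycLen);
for c = 1:nCyc
    seg     = phi(:, (c-1)*cycLen + (1:cycLen)); % one cycle, all channels
    X(c, :) = seg(:).';                          % flatten to a feature vector
end
idx = kmeans(X, 2);                              % one cluster per putative TW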

For each frequency (from 1 to 32, log-spaced) and each cycle:

Then:

First, we examined the results in the time domain, with channels (1-81) aligned by the estimated TW's initial phases in ascending order:

We can see that:

We also tried letting the temporal frequency be a free parameter, but not surprisingly the result was bad, since it overfitted the averaged pattern. In any case, the TWs were acceptable in the time domain. In the space domain, however, the result was problematic. Here are the initial phase distributions for a single cycle, the averaged pattern, and the estimation:

(Note: the ground truth is that there were two possible sources, (3, 2) as in cycle 312 and (6, 9) as in cycle 118, and the speed of the TW was 7 ms/chn; same below.)

Due to the clustering error, the phase pattern did not resemble a 6.06 Hz wave in time, so there was a large overall phase difference between the pattern and the underlying TW at every time point (e.g., the mean value of the estimated TW in the third row is completely different from that of the patterns in the second row). This makes it quite hard to correctly estimate the source and spatial frequency of the TW.

Fortunately, the averaged patterns are basically consistent with the individual cycles, so we may be able to find the correct TW by another linear regression after subtracting mean(m, 1) from m. But if so, why not just perform k-means on the data samples rather than on whole cycles? Therefore, we tried two different methods to find the structure from data samples directly.

PCA

As described in (Alexander et al., PLoS Comput. Biol., 2019), this method is quite straightforward. We organized the phases (unlike the original paper, we used real-valued phases rather than complex exponentials) into a matrix whose rows hold all channels' phases at one time sample, then calculated its basis vectors by Principal Component Analysis (PCA). The hypothesis is that these basis vectors can represent the TW patterns. In our experiments, we explored the influence of factor rotation and of removing the systematic phase lag.

A tricky problem is how to scale the principal components. We tried multiplying each component by the data samples' highest / 5% highest / mean projection onto it, and the second option seemed to approximate the real spatial frequency best. Besides, this time we constrained the spatial frequency to be positive in the TW model during the linear regression. The pipeline is sketched below.
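Here is a minimal sketch of this pipeline, with hypothetical names and random stand-in data (pca, rotatefactors and prctile require the Statistics and Machine Learning Toolbox):

Phi = angle(hilbert(randn(4000, 81)));        % stand-in phases, samples x channels

[coeff, ~] = pca(Phi);                        % columns of coeff = components
nComp   = 2;
rotated = rotatefactors(coeff(:, 1:nComp));   % VariMax rotation

Phi0 = Phi - mean(Phi, 1);                    % center the data
proj = Phi0 * rotated;                        % projections onto rotated comps
for j = 1:nComp
    scale = prctile(proj(:, j), 95);          % the "5% highest projection"
    tw    = scale * rotated(:, j);            % candidate TW phase pattern
    % ... then fit the TW model (source s, spatial frequency k >= 0)
    % to tw by constrained linear regression
end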

Here are the results for the theta band (the ground truth is still (3, 2) and (6, 9), at 7 ms/chn):

[Figure grid (theta band): rows = With Offset / Remove Offset; columns = Before Rotation / VariMax Rotation]

For the alpha band:

[Figure grid (alpha band): rows = With Offset / Remove Offset; columns = Before Rotation / VariMax Rotation]

For the beta band:

[Figure grid (beta band): rows = With Offset / Remove Offset; columns = Before Rotation / VariMax Rotation]

Generally speaking, it seems that:

However, when we tried three possible sources, PCA could hardly find the third source. Here are the results without removing the phase offset (the ground truth is that the sources were (2, 9), (3, 2) and (8, 6), and the TW speed was 7 ms/chn):

[Figure grid: rows = Theta / Alpha / Beta; columns = Before Rotation / VariMax Rotation]

Besides, removing the phase offset didn't help; e.g., for the alpha band:

[Figures (alpha band): Before Rotation / VariMax Rotation]

K-means over Data Samples

This method is actually very similar to so-called EEG microstate clustering. The algorithm is roughly the same as the previous one, except that the principal components are replaced by cluster centers. One advantage is that we no longer need to scale the component coefficients; a minimal sketch follows.
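A minimal sketch, again with hypothetical names and stand-in data:

Phi = angle(hilbert(randn(4000, 81)));       % stand-in phases, samples x channels

% optional: remove each sample's common phase offset across channels
PhiNoOff = Phi - mean(Phi, 2);

% cluster the samples; each centroid is a candidate TW phase pattern
[idx, centroids] = kmeans(PhiNoOff, 2, 'Replicates', 10);
% ... then fit the TW model (source s, spatial frequency k) to each centroid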

[Figure grid: rows = Theta / Alpha / Beta; columns = With Offset / Without Offset]

Both variants performed excellently, and it seems that removing the phase offset helps exclude spurious clusters.

Increasing the number of possible sources:

[Figure grid: rows = Theta / Alpha / Beta; columns = With Offset / Without Offset]

We can see that k-means could not find the third source (8, 6) either. After removing the phase lag, however, the result looks good. Note that removing the phase lag also increases the algorithm's sensitivity, so it may generate more clusters than necessary (e.g., for the alpha band).

Discussion

In this part we analyze the TW problem in a more mathematical way and try to explain why the algorithms did or did not perform well.

The Essence of Traveling Waves

This is what I had in mind about "traveling waves" before I began to actually simulate one:

Namely, the activity at each position x is a sine wave, and the phase decreases linearly with the distance between the source s and the current position: φ(x, t) = ω·t − k·‖x − s‖. Therefore, if I want to simulate a TW from Cz to Fpz, I just need to add phase lags like (FCz, -5°), (Fz, -10°), (AFz, -15°), (Fpz, -20°).

However, it becomes less clear when the signal is not a simple sine wave. For example, for a theta-band oscillation, what does it mean to describe its "phase" and add a "phase lag" to it?

It first reminded me of a well-known connectivity measure called the Phase Lag Index (PLI), so I reviewed the article and found that the "instantaneous" phase of a narrow-band, real-valued signal s(t) is defined via its analytic representation s_a(t) = s(t) + i·ŝ(t), where ŝ(t) is the Hilbert transform of s(t). So I googled concepts like instantaneous phase, analytic representation, Hilbert transform and so on. It finally became clear to me that:

(Here are some of my notes)

But I still didn't understand the relationship between the instantaneous phase and the phases of the frequency components. So I tried to derive what happens if I add a small lag Δφ to the instantaneous phase and extract the real part of the new "analytic" signal as s_lag(t) = Re[s_a(t)·e^{−iΔφ}]:

We know that if we represent a signal as a sum of cosine waves, s(t) = Σ_k a_k·cos(ω_k·t + φ_k), then its analytic representation is s_a(t) = Σ_k a_k·e^{i(ω_k·t + φ_k)}, and hence s_lag(t) = Σ_k a_k·cos(ω_k·t + φ_k − Δφ). In other words, the phase of every cosine component of s_lag(t) is shifted backward by Δφ compared with s(t), which is consistent with our intuitive understanding of "phase lag". Besides, s_a(t)·e^{−iΔφ} is indeed the analytic representation of s_lag(t).

Therefore, if we want to determine whether there is a consistent phase lag between signals A and B, we just need to band-pass-filter them, compute their analytic representations, and extract the instantaneous phase sequences. Alternatively, we can compute the derivative of each phase sequence and check whether the two signals have similar instantaneous angular frequencies (which indicates a consistent phase difference). A minimal sketch follows.
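Here is a minimal sketch of this recipe (bandpass requires the Signal Processing Toolbox; the two signals and all names are illustrative):

fs = 250;
t  = (0:2499)' / fs;
a  = cos(2*pi*6*t) + 0.1*randn(size(t));        % stand-in "channel A"
b  = cos(2*pi*6*t - pi/6) + 0.1*randn(size(t)); % "channel B", lagging by 30°

aTheta = bandpass(a, [4 8], fs);                % isolate the theta band
bTheta = bandpass(b, [4 8], fs);

phiA = angle(hilbert(aTheta));                  % instantaneous phases
phiB = angle(hilbert(bTheta));

dPhi = angle(exp(1i*(phiA - phiB)));            % wrapped phase difference
fprintf('mean lag: %.1f deg\n', rad2deg(mean(dPhi)));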

The Geometric Way to Look at TWs

For a given frequency or frequency band, what we try to find are unique TW patterns, each with a different source s (for planar waves, a different orientation). A "perfect" TW phase pattern should therefore be something like φ(x) = φ_0 − k·‖x − s‖, where only the common offset φ_0 changes from cycle to cycle. The data distribution in phase space is thus roughly one-dimensional: every sample of a given TW lies along φ_0·[1, 1, ..., 1] − k·d_s, where d_s is the vector of channel-to-source distances. Consequently, different clusters' distributions are roughly parallel.

K-means

If φ_0 has a large variance (relative to the difference between the phase patterns), k-means may not be suitable for this kind of clustering problem:

rng('default');
% two line-shaped clusters that differ only by a shift orthogonal to
% their common direction [1 1] (mimicking the shared phase offset)
C1 = [0 1] + rand(300, 1) * [2 2] + [randn(300, 1) randn(300, 1)] * 0.1;
C2 = [1 0] + rand(300, 1) * [2 2] + [randn(300, 1) randn(300, 1)] * 0.1;
C = [C1; C2];
C = C(randperm(length(C)), :);
% k-means splits the data along the long (offset) axis instead of
% separating the two parallel clusters
idx = kmeans(C, 2);
gscatter(C(:, 1), C(:, 2), idx);
title('Problem of k-means');

This is consistent with our results in the first and third experiments:

[Figures: With Phase Offset / Remove Phase Offset]

It's clear that with the common phase offset, the algorithm performed worse and the TW structure was distorted.

PCA

Actually, I don't think the principal components can represent the underlying TW structure, because PCs are the directions along which the data has the largest variance. If we want a PC to be something like "the distance vector between every channel and the source s", then the variance explained by that PC should represent a series of TWs with the same source but different spatial frequencies. This is not consistent with our data structure, where most of the variance should lie along [1, 1, 1, ..., 1] and along directions like TW1 − TW2 or TW3 − TW2:

rng('default');
% same two parallel line-shaped clusters as above
C1 = [0 1] + rand(300, 1) * [2 2] + [randn(300, 1) randn(300, 1)] * 0.1;
C2 = [1 0] + rand(300, 1) * [2 2] + [randn(300, 1) randn(300, 1)] * 0.1;
C = [C1; C2];
idx = randperm(length(C));
C = C(idx, :);
% the PCs align with the common direction [1 1] and with the offset
% between the clusters, not with the clusters themselves
x = pca(C);
gscatter(C(:, 1), C(:, 2), 1 + (idx > 300))
hold on;
% draw each principal component as a line through the data mean
X1 = mean(C) - x';
X2 = mean(C) + x';
for i = 1:2
    plot([X1(i,1) X2(i,1)], [X1(i,2), X2(i,2)])
end
legend({'Cluster 1', 'Cluster 2', 'Component 1', 'Component 2'})
title('PCA');
axis equal

When there are only a few clusters, we may be able to scale the PCs so that they happen to resemble the real TWs (especially when we scale them by the largest dot product between the PC and the data samples, as in our experiment). But when there are more clusters, it becomes hard to recover the TWs this way. After all, PCA is not a clustering algorithm.

Future Work

First, we need to try more algorithms. For example, we could simply average all samples within each cycle, approximate each of these averaged patterns by a TW, and then merge similar TWs by clustering, as sketched below. We also need to try other methods from the literature.
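A purely illustrative sketch of that idea, under the same hypothetical names as before:

nChn = 81; cycLen = 42;
phi  = angle(hilbert(randn(4200, nChn))).';      % stand-in phases, nChn x nSamp

nCyc = floor(size(phi, 2) / cycLen);
pat  = zeros(nCyc, nChn);
for c = 1:nCyc
    seg       = phi(:, (c-1)*cycLen + (1:cycLen));
    pat(c, :) = angle(mean(exp(1i*seg), 2)).';   % circular mean per channel
end
% fit a TW (source s, spatial frequency k) to each averaged pattern, then
% merge similar fitted TWs, e.g. by clustering their (s, k) parameters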

Second, we need to increase the complexity of the TW structures and the noise level. For example, the TWs may have a smaller spatial extent, more possible sources, or even overlap with each other. We will also reduce the SNR by adding pink noise.

Third, we may use some statistics to quantify the performance of different algorithms, so as to find out the optimal one.

Finally, we may apply these methods to experimental data.

For my second (and more formal) experiment about planar TWs at mesoscopic scale, click here.

 

