Observed or simulated multi-channel time series generally consist of a sum of different signals that can hardly be distinguished from one another, even when their origins are fundamentally different. Analysis methods that are able to extract the most coherent modes of variability generally help to identify signals of interest.

SpanLib currently focuses on linear analysis methods that rely on eigen solutions of covariance or correlation matrices.

This package provides an F90 library (as a module) containing a minimal collection of subroutines to perform Principal Component Analysis (PCA), Multi-channel Singular Spectrum Analysis (MSSA), reconstruction of components, and phase composites. The package also provides a Python module that calls the F90 library and gives the user a set of useful functions to perform analyses.

In future versions, SpanLib will also include other methods, such as Singular Value Decomposition or Principal Oscillation Pattern analysis.

1.2. Fundamentals

PCA is also known as Empirical Orthogonal Function (EOF) decomposition: it decomposes a space-time signal into pairs of spatial EOFs and temporal Principal Components (PCs) that are the eigen solutions of the covariance (or correlation) matrix of the initial signal. The first EOFs represent the dominant, purely spatial patterns of variability, and their associated PCs are the time-dependent coefficients that modulate these patterns.
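
The decomposition described above can be sketched with NumPy. This is a minimal illustration, not the SpanLib implementation: the function name pca_sketch, the (channel, time) array layout and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def pca_sketch(data, nmodes):
    """Hypothetical helper: PCA of a (nchannels, ntime) array.

    Returns EOFs (nchannels, nmodes), PCs (ntime, nmodes) and the
    fraction of variance explained by each retained mode.
    """
    # Remove the time mean of each channel
    anomalies = data - data.mean(axis=1, keepdims=True)
    ntime = anomalies.shape[1]
    # Covariance matrix between channels
    cov = anomalies @ anomalies.T / ntime
    # eigh returns eigenvalues in ascending order, so reorder them
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:nmodes]
    eofs = eigvecs[:, order]
    # PCs are the projections of the anomalies onto the EOFs
    pcs = anomalies.T @ eofs
    explained = eigvals[order] / eigvals.sum()
    return eofs, pcs, explained

# Synthetic example: one spatial pattern modulated by a sine, plus noise
rng = np.random.default_rng(0)
time = np.arange(200)
pattern = np.array([1.0, 0.5, -0.5, -1.0])
data = np.outer(pattern, np.sin(2 * np.pi * time / 25))
data += 0.1 * rng.standard_normal((4, 200))

eofs, pcs, explained = pca_sketch(data, nmodes=2)
```

In this synthetic case the first mode should capture most of the variance, with its EOF close to the imposed pattern (up to sign and normalisation).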

Note

In this document, "space" refers to the more general notion of "channel", in opposition to "time". In climate studies, the channel dimension generally coincides with space.

SSA (Singular Spectrum Analysis) is mathematically very similar to PCA: the input dataset now has only one channel, and the eigenmodes are computed from the lag-covariance matrix (instead of the cross-covariance matrix between channels). The EOFs have only a temporal dimension. SSA is therefore intended to provide information on a purely temporal signal, like a classical Fourier decomposition. However, SSA has several advantages over the latter method:

It removes incoherent noise (white noise): the noisy part of the signal takes the form of low order modes, identified as a "background" that can be easily neglected.
It naturally extracts regular oscillations (with a narrow spectral peak). These oscillations are identified as pairs of modes whose PCs and EOFs are in phase quadrature, and they can be intermittent.
Coherent nonlinear trends are identified as the lowest-frequency modes.
Compared with other methods, it is efficient on short signals.

The maximal lag (the only parameter of SSA) is known as the window.
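
As an illustration of the ideas above, the following sketch builds a lag-covariance matrix from a single noisy oscillation and diagonalises it. It is only one possible formulation (a trajectory-matrix estimate of the lag covariances); the name ssa_sketch and the test signal are assumptions, and SpanLib may compute the matrix differently.

```python
import numpy as np

def ssa_sketch(series, window):
    """Hypothetical helper: SSA of a 1D series with the given window."""
    series = np.asarray(series, dtype=float)
    series = series - series.mean()
    nlag = len(series) - window + 1
    # Trajectory matrix: row j holds the series delayed by j time steps
    traj = np.array([series[j:j + nlag] for j in range(window)])
    # Lag-covariance matrix of size (window, window)
    lagcov = traj @ traj.T / nlag
    eigvals, eigvecs = np.linalg.eigh(lagcov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eofs = eigvals[order], eigvecs[:, order]
    # Temporal PCs: projection of the lagged series onto the EOFs
    pcs = traj.T @ eofs
    return eigvals, eofs, pcs

# Noisy oscillation with a period of 20 time steps
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 20) + 0.3 * rng.standard_normal(300)

eigvals, eofs, pcs = ssa_sketch(series, window=60)
```

With this kind of signal, the oscillation is expected to show up as a leading pair of modes with similar eigenvalues, while the noise is spread over the remaining modes.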

MSSA is a combination of PCA and SSA: it is an SSA on several channels. The diagonalised matrix is built from covariances between channels (cross) and time segments (lag). Therefore, it has the advantage of PCA for extracting the dominant "spatial" patterns of variability, and it also has the spectral filtering capabilities of SSA. All identified modes have spatio-temporal properties. For example, oscillations are not constrained to a fixed spatial pattern, but can also have a propagative signature over their cycle. This advanced spatial and spectral filtering is helpful to identify the most coherent (and especially oscillatory) spatio-temporal modes in a short noisy signal.
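
A minimal sketch of how such a cross- and lag-covariance matrix can be assembled is given here. The function name mssa_covariance and the two-channel test signal are assumptions for illustration; this is not the sl_mssa code.

```python
import numpy as np

def mssa_covariance(data, window):
    """Hypothetical helper: covariance matrix over channels and lags.

    data has shape (nchannels, ntime); the result has shape
    (nchannels * window, nchannels * window).
    """
    data = np.asarray(data, dtype=float)
    nchan, ntime = data.shape
    anomalies = data - data.mean(axis=1, keepdims=True)
    nlag = ntime - window + 1
    # One row per (channel, lag) pair
    rows = [anomalies[c, j:j + nlag]
            for c in range(nchan) for j in range(window)]
    traj = np.array(rows)
    return traj @ traj.T / nlag

# Two channels carrying the same oscillation, shifted by a quarter period
time = np.arange(120)
data = np.array([np.sin(2 * np.pi * time / 12),
                 np.cos(2 * np.pi * time / 12)])

cov = mssa_covariance(data, window=24)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
```

The matrix size grows with the product of the number of channels and the window, which is why the eigen problem quickly becomes expensive.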

All these analysis methods act as linear filters. For each of them, it is possible to reconstruct part of the filtered signal. A reconstructed mode is the "multiplication" of its EOF by its PC, and it has the same dimensions as the initial dataset. This operation is necessary to go back from EOF space to physical space.
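
For PCA, this "multiplication" is an outer product of the EOF and the PC. The sketch below uses random data and a hypothetical reconstruct helper (not the SpanLib routine) to show that summing all reconstructed modes gives back the original anomalies.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((5, 40))          # (nchannels, ntime)
anomalies = data - data.mean(axis=1, keepdims=True)

# PCA through the covariance matrix between channels
cov = anomalies @ anomalies.T / anomalies.shape[1]
eigvals, eofs = np.linalg.eigh(cov)
eofs = eofs[:, ::-1]                         # most energetic mode first
pcs = anomalies.T @ eofs                     # (ntime, nmodes)

def reconstruct(eofs, pcs, modes):
    """Hypothetical helper: sum of EOF x PC outer products."""
    return sum(np.outer(eofs[:, k], pcs[:, k]) for k in modes)

first_mode = reconstruct(eofs, pcs, [0])
all_modes = reconstruct(eofs, pcs, range(eofs.shape[1]))
```

Each reconstructed mode has the shape of the input dataset, and selecting a subset of modes is what makes the method act as a filter.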

Finally, PCA may also be used simply to reduce the number of degrees of freedom (d-o-f) of a dataset. For example, you can keep the first PCs that explain 80% of the variance. These PCs are then used as the input dataset for another analysis. This approach is useful for MSSA, since solving the eigen problem may be very time consuming: it can reduce the number of channels from several hundred or thousand to fewer than 20.
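
Choosing how many PCs to keep for a target fraction of variance can be sketched as follows. The helper name modes_for_variance and the eigenvalues are made up for illustration.

```python
import numpy as np

def modes_for_variance(eigvals, fraction=0.8):
    """Hypothetical helper: smallest number of leading modes whose
    cumulative explained variance reaches the requested fraction."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first mode at which the target is reached, plus one
    return int(np.searchsorted(cumulative, fraction) + 1)

# Illustrative eigenvalues: cumulative fractions 0.60, 0.85, 0.95, 1.00
nkeep = modes_for_variance([6.0, 2.5, 1.0, 0.5], fraction=0.8)
```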

PCA decomposition is performed on spatio-temporal datasets. If the number of channels becomes large, PCA can use a lot of CPU since the size of the diagonalised matrix is the square of this number. It is possible to partly avoid this problem when the time dimension is smaller than the spatial dimension, by using a correlation matrix in time instead of in space. The F90 subroutine sl_pca of SpanLib lets you choose which of these approaches to use for PCA.
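
The equivalence between the two approaches can be checked with a short sketch. It uses a covariance (not correlation) matrix in time, and the name pca_time_covariance is an assumption; how sl_pca implements the choice is not shown here.

```python
import numpy as np

def pca_time_covariance(anomalies, nmodes):
    """Hypothetical helper: PCA via the (ntime, ntime) matrix.

    anomalies has shape (nchannels, ntime) with the time mean removed.
    """
    nchan, ntime = anomalies.shape
    tcov = anomalies.T @ anomalies / ntime
    eigvals, tvecs = np.linalg.eigh(tcov)
    order = np.argsort(eigvals)[::-1][:nmodes]
    eigvals, tvecs = eigvals[order], tvecs[:, order]
    # Map temporal eigenvectors to unit-norm spatial EOFs
    eofs = anomalies @ tvecs / np.sqrt(ntime * eigvals)
    pcs = anomalies.T @ eofs
    return eigvals, eofs, pcs

# Many channels, few time steps
rng = np.random.default_rng(2)
data = rng.standard_normal((50, 10))
anomalies = data - data.mean(axis=1, keepdims=True)

eigvals, eofs, pcs = pca_time_covariance(anomalies, nmodes=3)

# Reference: leading eigenvalues of the (50, 50) matrix in space
reference = np.linalg.eigvalsh(anomalies @ anomalies.T / 10)[::-1][:3]
```

Here a 10 x 10 matrix is diagonalised instead of a 50 x 50 one, and the leading eigenvalues agree with the reference.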

Weights

In some cases, not all channels have the same weight. For instance, for a gridded dataset, weights must be proportional to the grid cell area. Whereas common PCA implementations do not take these weights into account, it is possible to give optional weights to sl_pca. Using the Python module, it is easy to "attach" weights to a variable for use by pca.
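
One common way to account for such weights is to scale each channel by the square root of its weight before building the covariance matrix. The sketch below assumes a regular latitude grid, for which cell area is proportional to the cosine of latitude; the helper names are hypothetical and the way sl_pca applies its weights may differ.

```python
import numpy as np

def area_weights(latitudes_deg):
    """Hypothetical helper: normalised cos(latitude) weights."""
    w = np.cos(np.deg2rad(latitudes_deg))
    return w / w.sum()

def weighted_covariance(anomalies, weights):
    """Hypothetical helper: covariance of sqrt(weight)-scaled channels."""
    scaled = anomalies * np.sqrt(weights)[:, None]
    return scaled @ scaled.T / anomalies.shape[1]

lats = np.array([-60.0, -30.0, 0.0, 30.0, 60.0])
weights = area_weights(lats)

rng = np.random.default_rng(4)
data = rng.standard_normal((5, 100))
anomalies = data - data.mean(axis=1, keepdims=True)
cov = weighted_covariance(anomalies, weights)
```

With this scaling, the trace of the matrix equals the area-weighted sum of the channel variances, so high-latitude cells no longer dominate the total variance.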

Mask

Similarly, it is not useful to analyse masked points (for example, grid points situated on land when you analyse oceanic data). The F90 subroutine sl_pca assumes that no point is masked (all channels are analysed). However, as for the weights, it is possible to associate a spatial mask with a dataset in order to remove masked points when using the Python module. Then, spanlib.pack can be used to "pack" (compress) data before they are analysed.
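
The idea of packing can be illustrated with NumPy boolean indexing. The functions below are hypothetical stand-ins written for this example; they do not reproduce the signature or behaviour of spanlib.pack.

```python
import numpy as np

def pack_valid_points(field, mask):
    """Hypothetical helper: keep only valid points.

    field has shape (ntime, ny, nx); mask has shape (ny, nx) and is
    True where data are valid. Returns an array (nchannels, ntime).
    """
    return field[:, mask].T

def unpack_valid_points(packed, mask, fill=np.nan):
    """Hypothetical helper: put packed channels back on the grid."""
    ntime = packed.shape[1]
    out = np.full((ntime,) + mask.shape, fill)
    out[:, mask] = packed.T
    return out

field = np.arange(12, dtype=float).reshape(2, 2, 3)
mask = np.array([[True, False, True],
                 [False, True, True]])

packed = pack_valid_points(field, mask)       # 4 valid channels
restored = unpack_valid_points(packed, mask)  # masked points set to NaN
```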

Analysing several variables at the same time

You may be interested in analysing several variables at the same time. These variables may come from different regions or datasets, and may even be of a completely different nature. The essential problem of units may be solved using simple normalisations. The Python function spanlib.stackData can be used to "stack" the variables before they are analysed. Then, using spanlib.unStackData, you can unstack the results of your analysis. Raynaud et al. (2006) present an example where variables such as sea surface temperature, wind stress modulus and air-sea CO2 fluxes are analysed at the same time: the simultaneous variability of the variables is filtered and the dominant oscillations are extracted for each of these variables.
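
A simple normalisation and stacking scheme can be sketched as follows. The helpers stack_variables and unstack_variables, the choice of normalising each variable by the standard deviation of its anomalies, and the synthetic values are all assumptions for illustration; they are not the spanlib.stackData and spanlib.unStackData functions.

```python
import numpy as np

def stack_variables(variables):
    """Hypothetical helper: normalise and stack (nchannels_i, ntime) arrays."""
    blocks, norms, sizes = [], [], []
    for var in variables:
        anomalies = var - var.mean(axis=1, keepdims=True)
        norm = anomalies.std()       # one normalisation factor per variable
        blocks.append(anomalies / norm)
        norms.append(norm)
        sizes.append(var.shape[0])
    return np.vstack(blocks), norms, sizes

def unstack_variables(stacked, norms, sizes):
    """Hypothetical helper: split the channels and restore the units."""
    pieces = np.split(stacked, np.cumsum(sizes)[:-1], axis=0)
    return [piece * norm for piece, norm in zip(pieces, norms)]

# Two synthetic variables with very different magnitudes
rng = np.random.default_rng(5)
sst = 290.0 + 2.0 * rng.standard_normal((3, 50))
flux = 1e-3 * rng.standard_normal((4, 50))

stacked, norms, sizes = stack_variables([sst, flux])
restored = unstack_variables(stacked, norms, sizes)
```

After normalisation each variable contributes a comparable amount of variance per channel, so neither dominates the analysis simply because of its units.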

Reconstructions

Reconstructions (F90:sl_pcarec, Python:<SpAn_object>.reconstruct) are not necessarily the multiplication of an EOF by its associated PC. When PCA is used to reduce the d-o-f (see Section 1.2, “Fundamentals”), the original PCs are first filtered and then converted back to the original space using the saved EOFs.
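
This two-step reconstruction can be sketched as follows. The PCA is done here with an SVD for brevity, and a 5-point running mean is used purely as a stand-in for the filtering step (in practice this would be, for example, an MSSA reconstruction); none of this reproduces sl_pcarec or the Python reconstruct method.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal((30, 120))        # (nchannels, ntime)
anomalies = data - data.mean(axis=1, keepdims=True)

# Pre-PCA: keep the first EOFs and save them for later
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
nkeep = 5
saved_eofs = u[:, :nkeep]                    # (nchannels, nkeep)
pcs = saved_eofs.T @ anomalies               # (nkeep, ntime)

# Stand-in filter applied to each PC (assumption: running mean)
kernel = np.ones(5) / 5
filtered_pcs = np.array([np.convolve(pc, kernel, mode="same")
                         for pc in pcs])

# Convert the filtered PCs back to the original space
reconstruction = saved_eofs @ filtered_pcs   # (nchannels, ntime)
```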

1.3.2. ...about MSSA

The window parameter

This is the only, and essential, parameter of SSA and MSSA (F90:sl_mssa, Python:<SpAn_object>.mssa). It defines the maximal value of the lags used when building the covariance matrix. It acts as a spectral parameter: the spectral resolution is higher for periods shorter than the window. A standard value is one third of the time dimension.
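
The rule of thumb above, and its effect on the size of the matrix to diagonalise, can be written as a short sketch. Both helper names are hypothetical, and the matrix size assumes one row per (channel, lag) pair.

```python
def default_window(ntime):
    """Hypothetical helper: one third of the time dimension."""
    return max(1, ntime // 3)

def mssa_matrix_size(nchannels, window):
    """Hypothetical helper: side length of the MSSA covariance matrix."""
    return nchannels * window

window = default_window(120)            # 120 time steps
size = mssa_matrix_size(20, window)     # 20 channels, e.g. after a pre-PCA
```

For 120 time steps and 20 channels this gives a window of 40 and an 800 x 800 matrix, which shows why reducing the number of channels beforehand matters.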