Table of Contents
Observed or simulated multi-channel timeseries generally include a sum of different signals that can be hardly distinguished one another, even if their respective origin is fundamentally different. Analysis methods that are able to extract the most coherent modes of variability generally helps to identify signals of interests.
SpanLib currently focuses on the use of linear analysis methods that rely on eign solutions of covariance or correlations matrices
This package provides a F90 library (as a module) containing a minimal collection of subroutines to perform Principal Componant Analysis (PCA), Multi-channel Singular Spectrum Analysis (MSSA), reconstruction of components and phase composites. The package also provide a python module that calls the F90 library and gives the user a set of useful functions to perform analyses.
In its future version, SpanLib will also include others methods, such Singular Value Decomposition or Principal Oscillation Pattern analysis.
PCA is also know as Empirical Orthogonal Functions (EOFs) decomposition: it decomposes a space-time signal in pairs of spatial EOFs and temporal Principal Components (PCs) that are the eigen solutions of the covariance (or correlation) matrix of the initial signal. The first EOFs represent the dominant, pure spatial patterns of variability, and their associated PCs are the coefficients that regulate these patterns.
SSA (Singular Spectrum Analysis) is mathematically very similar to PCA: there is now only one channel as an input dataset, and eigenmodes are computed on the lag-covariance matrix (instead of on the cross -between channels- covriance matrix). The EOFs have only a temporal dimension. Therefore, SSA is intended to provides information on purely temporal signal, like a classical Fourier decomposition. However, SSA has many advantages on the latter method:
The maximal lag (the only parameter of SSA) is known as the window.
MSSA is a combination of PCA and SSA: it is an SSA on several channels. The diagonalized is built on covariances between channels (cross) and time segments (lag). Therefore, it has the advantage of PCA for extracting the dominant "spatial" patterns of the variability, and has also the spectral filtering capabilities of SSA. All identified modes have spatio-temporal properties. For example, oscillations are not constrained on a fixed spatial pattern, but can also have a propagative signature over their cycle. This advanced spatial and spectral filtering is helpful to identify the most coherent (and more especially oscillatory) spatio-temporal modes in a short noisy signal.
All these analysis methods act as a linear filter. For each of them, it is possible to reconstruct part of the filtered signal. A reconstructed mode is the "multiplication" of its EOF by its PC, and it has the same dimension of the initial dataset. Such operation is necessary to go back from the EOF space to the physical space.
Finally, PCA may be used also to simply reduce the number of degree-of-freedom (d-o-f) of a dataset. For example, you can keep the first PC that explain a 80% of the variance. These PCs are then used as an input dataset for other analysis. This methodology is useful for MSSA since the eigen problem solving may be very time consuming: we are now able, for example, to potentially reduce the number of channels from several hundred or thounsand, to less than 20.
PCA decomposition is performed on spatio-temporal datasets. If the number of channels becomes important, PCA can use a lot of CPU since the size of the diagonalised matrix if to the square of this number. It is possible to partly avoid this problem when the time dimension is lower than the spatial dimension, using a correlation matrix in time instead of in space. F90 subroutine sl_pca of SpanLib provides the ability to choose which of theses approaches to use for PCA.
In some case, not all channels have the same weight. For instance, for gridded dataset, weight must be proportional to the grid cell area. Whereas common PCA analysis does not take these weights into account it is possible to give optional weights to sl_pca. Using the python module, it is easy to "attach" weights to a variable for use by pca.
Similarly, it is not useful to analyse masked points (for example, gridded points situated on land when use analyse oceanic data). The F90 subroutine sl_pca makes the supposition that none of the masked (all channels are analysed). However, as well as for the weights, it is possible to associate an spatial mask to a dataset in order to remove masked points when using the python module. Then, spanlib.pack can be used to "pack" (compress) data before they are analysed.
One can be interested in analysing several variables ate the same time. These variables may come from different regions, datasets and may be even of completely different nature. The essential problem of units may be solved using simple normalisations. Python function spanlib.stackData can be used to "pack" (compress) data before they are analysed. Then, using spanlib.unStackData you can unpack results from you analysis. Raynaud et al (2006) presents an example of use where variables such as sea surface temperature, wind stress modulus and air-sea CO2 fluxes are analysed at the same time: the simultaneous variability of the variables is filtered and the dominant oscillations are extracted for each of these variables.
Reconstructions (F90:sl_pcarec, Python:<SpAn_object>.reconstruct) may not be necessary the multiplication of an EOF by its associated PC. When PCA is used for a reduction of d-o-f (see Section 1.2, “Fundamentals”), orginal PCs are first filtered and then converted back to the original space using saved EOFs.
This is the only and essential parameter of SSA and MSSA (F90:sl_mssa, Python:<SpAn_object>.mssa). It defines the maximal value of the lags use when building the covariance matrix. It acts as a spectral parameter: the spectral resolution is higher for periods lower than this period. A standard value is one third of the time dimension.
One of the most important interests of MSSA is to be able to extract intermittent space-time oscillations from the signal. At the first order, an oscillation is its "typical" cycle. sl_phasecomp (F90) and spanlib.computePhases (Python) perfom phases composites: it computes an averaged cycle and cut it an homegeneous parts (as one can do for the annual cycle in 12 months).
For more information about PCA and MSSA may be found in papers and on the web. See for example:
You can also browse Section 5, “Links”.
An html version of the source codes is available here:
Syntax is highlighted for both fortran and python codes.