Table of Contents
Observed or simulated multi-channel timeseries generally include a sum of different signals that can be hardly distinguished one another, even if their respective origin is fundamentally different. Analysis methods that are able to extract the most coherent modes of variability generally helps to identify signals of interests.
SpanLib currently focuses on the use of linear analysis methods that rely on eign solutions of covariance or correlations matrices
This package provides a F90 library (as a module) containing a minimal collection of subroutines to perform Principal Componant Analysis (PCA), Multi-channel Singular Spectrum Analysis (MSSA), reconstruction of components and phase composites. The package also provide a python module that calls the F90 library and gives the user a set of useful functions to perform analyses.
In its future version, SpanLib will also include others methods, such Singular Value Decomposition or Principal Oscillation Pattern analysis.
PCA is also know as Empirical Orthogonal Functions (EOFs) decomposition: it decomposes a space-time signal in pairs of spatial EOFs and temporal Principal Components (PCs) that are the eigen solutions of the covariance (or correlation) matrix of the initial signal. The first EOFs represent the dominant, pure spatial patterns of variability, and their associated PCs are the coefficients that regulate these patterns.
SSA (Singular Spectrum Analysis) is mathematically very similar to PCA: there is now only one channel as an input dataset, and eigenmodes are computed on the lag-covariance matrix (instead of on the cross -between channels- covriance matrix). The EOFs have only a temporal dimension. Therefore, SSA is intended to provides information on purely temporal signal, like a classical Fourier decomposition. However, SSA has many advantages on the latter method:
The maximal lag (the only parameter of SSA) is known as the window.
MSSA is a combination of PCA and SSA: it is an SSA on several channels. The diagonalized is built on covariances between channels (cross) and time segments (lag). Therefore, it has the advantage of PCA for extracting the dominant "spatial" patterns of the variability, and has also the spectral filtering capabilities of SSA. All identified modes have spatio-temporal properties. For example, oscillations are not constrained on a fixed spatial pattern, but can also have a propagative signature over their cycle. This advanced spatial and spectral filtering is helpful to identify the most coherent (and more espacially oscillatory) spatio-temporal modes in a short noisy signal.
All these analysis methods act as a linear filter. For each of them, it is possible to reconstruct part of the filtered signal. A reconstructed mode is the "multiplication" of its EOF by its PC, and it has the same dimension of the initial dataset. Such operation is necessary to go back from the EOF space to the physical space.
Finally, PCA may be used also to simply reduce the number of degree-of-freedom (d-o-f) of a dataset. For example, you can keep the first PC that explain a 80% of the variance. These PCs are then used as an input dataset for other analysis. This methodology is useful for MSSA since the eigen problem solving may be very time consuming: we are now able, for example, to potentially reduce the number of channels from several hundred or thounsand, to less than 20.
PCA decomposition is performed on spatio-temporal datasets. If the number of channels becomes important, PCA can use a lot of CPU since the size of the diagonalised matrix if to the square of this number. It is possible to partly avoid this problem when the time dimension is lower than the spatial dimension, using a correlation matrix in time instead of in space. F90 subroutine sl_pca of SpanLib provides the ability to choose which of theses approaches to use for PCA.
In some case, not all channels have the same weight. For instance, for gridded dataset, weight must be proportional to the grid cell area. Whereas common PCA analysis does not take these weights into account it is possible to give optional weights to sl_pca. Using the python module, it is easy to "attach" weights to a variable for use by pca.
Similarly, it is not useful to analyse masked points (for example, gridded points situated on land when use analyse oceanic data). The F90 subroutine sl_pca makes the supposition that none of the masked (all channels are analysed). However, as well as for the weights, it is possible to associate an spatial mask to a dataset in order to remove masked points when using the python module. Then, spanlib.pack can be used to "pack" (compress) data before they are analysed.
One can be interested in analysing several variables ate the same time. These variables may come from different regions, datasets and may be even of completely different nature. The essential problem of units may be solved using simple normalisations. Python function spanlib.stackData can be used to "pack" (compress) data before they are analysed. Then, using spanlib.unStackData you can unpack results from you analysis. Raynaud et al (2006) presents an example of use where variables such as sea surface temperature, wind stress modulus and air-sea CO2 fluxes are analysed at the same time: the simultaneous variability of the variables is filtered and the dominant oscillations are extracted for each of these variables.
Reconstructions (F90:sl_pcarec, Python:<SpAn_object>.reconstruct) may not be necessary the multiplication of an EOF by its associated PC. When PCA is used for a reduction of d-o-f (see Section 1.2, “Fundamentals”), orginal PCs are first filtered and then converted back to the original space using saved EOFs.
This is the only and essential parameter of SSA and MSSA (F90:sl_mssa, Python:<SpAn_object>.mssa). It defines the maximal value of the lags use when building the covariance matrix. It acts as a spectral parameter: the spectral resolution is higher for periods lower than this period. A standard value is one third of the time dimension.
One of the most important interests of MSSA is to be able to extract intermittent space-time oscillations from the signal. At the first order, an oscillation is its "typical" cycle. sl_phasecomp (F90) and spanlib.phases (Python) perfom phases composites: it computes an averaged cycle and cut it an homegeneous parts (as one can do for the annual cycle in 12 months).
For more information about PCA and MSSA may be found in papers and on the web. See for example:
You can also browse Section 5, “Links”.
An html version of the source codes is available here:
You need a F90 compiler to compile it, and BLAS/LAPACK libraries compilated using this F90 compiler to be able to link with library.
To run the F90 example, you need the F90 netcdf library.
You can download the sources of the package from the Sourceforge repositories of the project.
Detailed instructions can be found in the INSTALL file of the package.
First, compile it with:
<user> ./configure <user> make |
Second, install it as root:
<user> su <root> make install |
.
Here is an example of ./configure:
<raynaud> ./configure --with-blas-lib=/usr/local/install/lapack-3.0/lib \ --with-netcdf-lib=/usr/local/install/netcdf-3.6.1/lib \ --with-netcdf-inc=/usr/local/install/netcdf-3.6.1/include \ --prefix=$HOME --with-pythondir=$HOME/python FC=ifort |
The fortran 90 library provides a list of subroutines to perform PCA, MSSA, recontructions and phase composites.
Perform a decomposition of space-time field in a set of Empirical Orthogonal Functions (EOFs) and Principal components (PCs). The input data set can be optionally weighted in space. By default, the analysis computes "temporal" (T) or classical spatial (S) EOFs depending on if the space dimension is greater than the time dimension. This default behavior can be overridden.
Perform a reconstruction using a set of components previously computed with a PCA. All the reconstructed components are summed. A reconstructed component is simply the "product" of an EOF by its PC. The sum of all reconstructed component is the original field.
Perform a decomposition of space-time field in a set of space-time Empirical Orthogonal Functions (EOFs) and time Principal components (PCs), according to a window parameter.
Performs phase composites of S-T oscillatory field. This field is typically a reconstructed pair of MSSA modes. Composites are evaluated according to an index defined by the first PC of the input field and its derivative. Space weights can be optionally used to compute the PC. A minimal normalized amplitude can be also used: when the index is under value, data are not used to compute phases. It is also possible so specify the angle of the first phase in the 360 degrees phase diagram circle: zero means the the first phase conincides with the maximmum.
This is an example of how to use this the Fortran 90 component of Spanlib. In this example, an analysis of Sea surface Temperature anomalies using data stored in the netcdf format. The dominant oscillatory mode of the El Nino variability is then extracted and stored in a netcdf file.
Example 1. F90 example
You can run this example typing (from package directory):
<user> cd example && make |
This command will first try to download the dataset, then compile the following f90 program, run it, and finally try to visualise output data. See Section 2.1, “Requirements” if it fails.
The sources of this example is highlighted below, with comments inside.
! File: example.f90 ! ! This file is part of the SpanLib library. ! Copyright (C) 2006 Stephane Raynaud ! Contact: stephane dot raynaud at gmail dot com ! ! This library is free software; you can redistribute it and/or ! modify it under the terms of the GNU Lesser General Public ! License as published by the Free Software Foundation; either ! version 2.1 of the License, or (at your option) any later version. ! ! This library is distributed in the hope that it will be useful, ! but WITHOUT ANY WARRANTY; without even the implied warranty of ! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ! Lesser General Public License for more details. ! ! You should have received a copy of the GNU Lesser General Public ! License along with this library; if not, write to the Free Software ! Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA program example ! This simple example shows how to use all subroutines from this package. ! Warning: it requires netcdf for in/outputs. ! ! We start from longitude/latitude/time value of Pacific Sea Surface Temperature ! that include the El Nino Southern Oscillation signal. ! Input is the netcdf file data.cdf. ! We remove land points from the initial array according ! to the netcdf missing_value attribute of the analysed variable (data are "packed"). ! A PCA is used to reduce the degrees of freedoom before MSSA analysis. ! Weights for PCA are computed as a fonction of latitude. ! Then, we assume that we have already identified an oscillation (after tests). ! This oscillation, given by a pair of MSSA modes, is then ! reconstructed from the MSSA and PCA spaces. ! Finally, phase composites are computed from this reconstructed oscillation. ! The oscillation is outputed in a netcdf file (pair_1.cdf). ! ! The initial data set (data.cdf): ! - origin: updated Reynolds and Smith (1996) SST (netcdf file) ! - origin url: data selector from http://iridl.ldeo.columbia.edu ! - how to get it [10Mb]: http://stefdeperou.free.fr/pub/data.cdf ! - area of study: tropical pacific [130.5E:75.5W, 29.5S:29.5N] (155x60 grid points) ! - period of study: Jan1982:Dec2005 (288 time steps) ! ! Parameters: ! - Only the first 20 PCs are retainedand given to the MSSA ! - A window of 7 years (84 months) is chosen for the MSSA ! - Phase composites use 8 phases ! - An offset of 0.4 and is used for composites ! - The first phase of composites is set at 180 degrees (minimal value) use spanlib use netcdf implicit none ! Parameters ! ---------- integer,parameter :: nkeep_pca=5, nwindow=84, first_mode=1, nphases=8 real, parameter :: offset=0., first_phase=180., new_missing_value=-999. character(len=20), parameter :: input_nc_file="data2.cdf", output_nc_file="output.nc", & var_name='ssta' ! Other declarations ! ------------------ real, allocatable :: field(:,:,:), weights(:,:), lat(:), lon(:), time(:) real, allocatable :: reco(:,:,:), phasecomps(:,:,:) logical, allocatable :: mask(:,:) real, allocatable :: packed_field(:,:), packed_weights(:), & & packed_phasecomps(:,:), stphasecomps(:,:) real, allocatable :: eof(:,:), ev(:), pc(:,:), stpair(:,:), pair(:,:) real, allocatable :: steof(:,:),stpc(:,:),stev(:) character(len=20) :: dim_names(3), dim_name, lon_units, lat_units, var_units, & & lon_name, lat_name, time_name, time_units integer :: ncid, dimid, dimids(4), varid, dims(3), thisdim, & & lonid, latid, phaseid, timeid, phcoid, recoid, origid integer(kind=4) :: i, nspace, nlon, nlat, ntime real :: pi, missing_value ! Get the initial sst field from the netcdf file ! ---------------------------------------------- print*,'Reading inputs...' call err(nf90_open(input_nc_file, nf90_nowrite, ncid)) call err(nf90_inq_varid(ncid, var_name, varid)) call err(nf90_inquire_variable(ncid, varid, dimids=dimids(1:3))) call err(nf90_inquire_dimension(ncid, dimids(1), & & name=lon_name, len=nlon)) call err(nf90_inquire_dimension(ncid, dimids(2), & & name=lat_name, len=nlat)) call err(nf90_inquire_dimension(ncid, dimids(3), & & name=time_name, len=ntime)) allocate(field(nlon,nlat,ntime)) allocate(mask(nlon,nlat)) allocate(weights(nlon,nlat)) allocate(lon(nlon)) allocate(lat(nlat)) allocate(time(ntime)) call err(nf90_get_var(ncid, varid, field)) call err(nf90_get_att(ncid, varid, 'missing_value', missing_value)) call err(nf90_get_att(ncid, varid, 'units', var_units)) call err(nf90_inq_varid(ncid, lon_name, varid)) call err(nf90_get_var(ncid, varid, lon)) call err(nf90_get_att(ncid, varid, 'units', lon_units)) call err(nf90_inq_varid(ncid, lat_name, varid)) call err(nf90_get_var(ncid, varid, lat)) call err(nf90_get_att(ncid, varid, 'units', lat_units)) call err(nf90_inq_varid(ncid, time_name, varid)) call err(nf90_get_var(ncid, varid, time)) call err(nf90_get_att(ncid, varid, 'units', time_units)) call err(nf90_close(ncid)) ! Format (pack) data to have only one space dimension ! --------------------------------------------------- print*,'Packaging...' ! Compute weights proportional to grid point area pi = cos(-1.) do i=1,nlat weights(:,i) = cos(lat(i)*pi/180.) end do ! Now pack if(isnan(missing_value))then mask = not(isnan(field(:,:,1))) else mask = (field(:,:,1) /= missing_value) end if allocate(packed_field(count(mask), ntime)) do i=1, ntime packed_field(:,i) = pack(field(:,:,i), mask) end do allocate(packed_weights(count(mask))) packed_weights = pack(weights, mask) ! Perform a PCA to reduce the d.o.f ! --------------------------------- print*,'PCA...' call sl_pca(packed_field, nkeep=nkeep_pca, xeof=eof, & & pc=pc, weights=packed_weights) deallocate(packed_field) ! We send results from PCA to MSSA ! -------------------------------- print*,'MSSA...' call sl_mssa(transpose(pc), nwindow, nkeep=first_mode+1, & & steof=steof, stpc=stpc, ev=stev) ! We reconstruct modes [first_mode + first_mode+1] of MSSA ! -------------------------------------------------------- print*,'MSSAREC...' call sl_mssarec(steof, stpc, nwindow, stpair, & & istart=first_mode, iend=first_mode+1) deallocate(steof, stpc) ! We compute phases composites for the reconstructed oscillation ! --------------------------------------------------------------- print*,'PHASECOMP...' call sl_phasecomp(stpair, nphases, stphasecomps, & & weights=packed_weights, & & offset=offset, firstphase=first_phase) ! We go back to the physical space for ! the full oscillation AND its composites ! --------------------------------------- print*,'PCAREC...' call sl_pcarec(eof, transpose(stpair), pair) call sl_pcarec(eof, transpose(stphasecomps), packed_phasecomps) deallocate(stpair, eof, stphasecomps) ! Unpacking ! --------- print*,'Unpacking...' allocate(reco(nlon,nlat,ntime)) do i=1, ntime reco(:,:,i) = unpack(pair(:,i), mask, new_missing_value) where(mask == .false.) field(:,:,i) = new_missing_value end where end do allocate(phasecomps(nlon,nlat,nphases)) do i=1, nphases phasecomps(:,:,i) = unpack(packed_phasecomps(:,i), mask, new_missing_value) end do ! Write out the phase composites of the first oscillation ! ------------------------------------------------------- print*,'Writing out...' ! File call err(nf90_create(output_nc_file, nf90_write, ncid)) ! Dimensions call err(nf90_def_dim(ncid, 'lon', nlon, dimids(1))) call err(nf90_def_dim(ncid, 'lat', nlat, dimids(2))) call err(nf90_def_dim(ncid, 'time', ntime, dimids(3))) call err(nf90_def_dim(ncid, 'phase', nphases, dimids(4))) ! Variables call err(nf90_def_var(ncid, 'lon', nf90_float, dimids(1), lonid)) call err(nf90_put_att(ncid, lonid, 'long_name', 'Longitude')) call err(nf90_put_att(ncid, lonid, 'units', lon_units)) call err(nf90_def_var(ncid, 'lat', nf90_float, dimids(2), latid)) call err(nf90_put_att(ncid, latid, 'long_name', 'Latitude')) call err(nf90_put_att(ncid, latid, 'units', lat_units)) call err(nf90_def_var(ncid, 'time', nf90_float, dimids(3), timeid)) call err(nf90_put_att(ncid, timeid, 'long_name', 'Time')) call err(nf90_put_att(ncid, timeid, 'units', time_units)) call err(nf90_def_var(ncid, 'phase', nf90_float, dimids(4), phaseid)) call err(nf90_put_att(ncid, phaseid, 'long_name', 'Phase')) call err(nf90_put_att(ncid, phaseid, 'units', 'level')) call err(nf90_def_var(ncid, 'orig', nf90_float, dimids(1:3), origid)) call err(nf90_put_att(ncid, origid, 'long_name', 'SST anomaly / original field')) call err(nf90_put_att(ncid, origid, 'units', var_units)) call err(nf90_put_att(ncid, origid, 'missing_value', new_missing_value)) call err(nf90_def_var(ncid, 'reco1', nf90_float, dimids(1:3), recoid)) call err(nf90_put_att(ncid, recoid, 'long_name', 'SST anomaly / reconstruction of first pair')) call err(nf90_put_att(ncid, recoid, 'units', var_units)) call err(nf90_put_att(ncid, recoid, 'missing_value', new_missing_value)) call err(nf90_def_var(ncid, 'pair1', nf90_float, (/dimids(1),dimids(2),dimids(4)/), phcoid)) call err(nf90_put_att(ncid, phcoid, 'long_name', 'SST anomaly / phase composite of first pair')) call err(nf90_put_att(ncid, phcoid, 'units', var_units)) call err(nf90_put_att(ncid, phcoid, 'missing_value', new_missing_value)) ! Values call err(nf90_enddef(ncid)) call err(nf90_put_var(ncid, lonid, lon)) call err(nf90_put_var(ncid, latid, lat)) call err(nf90_put_var(ncid, timeid, time)) call err(nf90_put_var(ncid, phaseid, float((/(i,i=1,nphases)/)))) call err(nf90_put_var(ncid, origid, field)) call err(nf90_put_var(ncid, recoid, reco)) call err(nf90_put_var(ncid, phcoid, phasecomps)) call err(nf90_close(ncid)) end program example subroutine err(jstatus) use netcdf integer :: jstatus if (jstatus .ne. nf90_noerr) then print *, trim(nf90_strerror(jstatus)) stop end if end subroutine err |
The intent of the python module is to offer a scripting interface to the F90 subroutines. It takes advantage of the power of python and of the computational efficiency of fortran.
Thanks to the use of CDAT, the module easily handle gridded dataset, using masks and weights for example. It also allows the analysis of several datasets at the same time by stacking them. The module performs some tasks in a transparant way: packing and unpacking of data, PCA reduction before MSSA anaysis, etc.
This example shows how to use some of the available components of the python module. It uses the same dataset as for Section 3.3, “A Fortran 90 example”. However, is this example, two different areas are analysed at the same time. This is intented to mimics the used of two different datasets that are stacked, before being analysed. Such approach (see for example Raynaud et al (2006)) allows to find modes of variability in a arbitrary number of variables, provided you are careful with units (performing appropriate normalisations).
If you configurations allows it, you can run this example typing this from the installation package directory:
<user> cd example && make python1 |
Example 2. Python example
# File: example1.py # # This file is part of the SpanLib library. # Copyright (C) 2006 Charles Doutiraux, Stephane Raynaud # Contact: stephane dot raynaud at gmail dot com # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA ################################################################### # In this example, we analyse two different areas at the same time. # You can do the same with two completely different datasets, # except that they must have the same temporal grid. ################################################################### # Needed modules import cdms import spanlib import MV import vcs # We tell cdms that we have longitude, latitude and time cdms.axis.latitude_aliases.append('Y') cdms.axis.longitude_aliases.append('X') cdms.axis.time_aliases.append('T') # Simply open the netcdf file print "Open file" f=cdms.open('data2.cdf') # Get our two datasets print "Read two different regions" s2=f('ssta',latitude=(-10,10),longitude=(110,180)) s1=f('ssta',latitude=(-15,15),longitude=(210,250)) # Stack the two dataset to have only one dataset print "Stacking data" res = spanlib.stackData(s1,s2) # Create the analysis object print "Creating SpAn object" SP=spanlib.SpAn(MV.array(res[0]),weights=MV.array(res[1])) # Perform a preliminary PCA print "PCA..." eof,pc,ev = SP.pca() # Now perform a MSSA print "MSSA..." res3 = steof,stpc,stev = SP.mssa(pca=True) # Finally recontructed the filtered field ffrec = SP.reconstruct() res4 = spanlib.unStackData(ffrec,res[1],res[2],res[3]) # Plot a timeseries taken from our two # recontructed datasets x=vcs.init() x.plot(res4[1]) raw_input('ok?') x.clear() x.plot(res4[1][:,5,5]) x.plot(res4[0][:,5,5]) raw_input('ok?') |
[3] “Extracting qualitative dynamics from experimental data”. Physica D. Elsevier Science Publishers B. V.. 217–236. 1986.
[4] “Singular Spectrum Analysis in nonlinear dynamics,with application to paleoclimatic time series”. Physica D. 395-424. 1989.
[5] “ Spells of low-frequency oscillations and weather regimes in the northern hemisphere”. J. Atm. Sc.. 210-236. 1994.
[6] “Using MSSA to determine explicitly the oscillatory dynamics of weakly nonlinear climate systems”. Nonlin. Proc. Geophys.. 807–815. 2005. Download.
[7] “Interannual to decadal variability of air-sea CO2 fluxes in the North Atlantic”. Ocean Sci.. 43-60. 2006. Download.