SpanLib - Spectral Analysis Library

Stephane Raynaud

Charles Doutriaux

Version 1.0

Project hosted by:

This documentation was generated on 29 August 2006

Table of Contents

1. Presentation
1.1. Introduction
1.2. Fundamentals
1.3. Some technical details...
1.4. For more information
1.5. View sources
2. Installation
2.1. Requirements
2.2. Download
2.3. Compilation and installation
3. The Fortran 90 library
3.1. Introduction
3.2. F90 subroutines
3.3. A Fortran 90 example
4. The python module
4.1. Introduction
4.2. Python functions
4.3. A python example
5. Links
Bibliography

1. Presentation

1.1. Introduction

Observed or simulated multi-channel timeseries generally include a sum of different signals that can be hardly distinguished one another, even if their respective origin is fundamentally different. Analysis methods that are able to extract the most coherent modes of variability generally helps to identify signals of interests.

SpanLib currently focuses on the use of linear analysis methods that rely on eign solutions of covariance or correlations matrices

This package provides a F90 library (as a module) containing a minimal collection of subroutines to perform Principal Componant Analysis (PCA), Multi-channel Singular Spectrum Analysis (MSSA), reconstruction of components and phase composites. The package also provide a python module that calls the F90 library and gives the user a set of useful functions to perform analyses.

In its future version, SpanLib will also include others methods, such Singular Value Decomposition or Principal Oscillation Pattern analysis.

1.2. Fundamentals

PCA is also know as Empirical Orthogonal Functions (EOFs) decomposition: it decomposes a space-time signal in pairs of spatial EOFs and temporal Principal Components (PCs) that are the eigen solutions of the covariance (or correlation) matrix of the initial signal. The first EOFs represent the dominant, pure spatial patterns of variability, and their associated PCs are the coefficients that regulate these patterns.

Note

In this document, "space" refers to the more general notion of "channel", in opposition to "time". In climate studies, the channel dimension generally coincides with space.

SSA (Singular Spectrum Analysis) is mathematically very similar to PCA: there is now only one channel as an input dataset, and eigenmodes are computed on the lag-covariance matrix (instead of on the cross -between channels- covriance matrix). The EOFs have only a temporal dimension. Therefore, SSA is intended to provides information on purely temporal signal, like a classical Fourier decomposition. However, SSA has many advantages on the latter method:

  • It removes incoherent noise (white noise): the noisy part of the signal takes the form of low order modes, identified as a "background" that can be easily neglected.
  • It naturally extracts regular oscillations (with a narrow spectral peak). These oscillations are identified as pair of modes whose PCs and EOFs are in phase quadrature, that can be intermittent.
  • Coherent nonlinear trends are identified as the lower frequency modes.
  • Compared to others, this method is efficient on short signal.

The maximal lag (the only parameter of SSA) is known as the window.

MSSA is a combination of PCA and SSA: it is an SSA on several channels. The diagonalized is built on covariances between channels (cross) and time segments (lag). Therefore, it has the advantage of PCA for extracting the dominant "spatial" patterns of the variability, and has also the spectral filtering capabilities of SSA. All identified modes have spatio-temporal properties. For example, oscillations are not constrained on a fixed spatial pattern, but can also have a propagative signature over their cycle. This advanced spatial and spectral filtering is helpful to identify the most coherent (and more espacially oscillatory) spatio-temporal modes in a short noisy signal.

All these analysis methods act as a linear filter. For each of them, it is possible to reconstruct part of the filtered signal. A reconstructed mode is the "multiplication" of its EOF by its PC, and it has the same dimension of the initial dataset. Such operation is necessary to go back from the EOF space to the physical space.

Finally, PCA may be used also to simply reduce the number of degree-of-freedom (d-o-f) of a dataset. For example, you can keep the first PC that explain a 80% of the variance. These PCs are then used as an input dataset for other analysis. This methodology is useful for MSSA since the eigen problem solving may be very time consuming: we are now able, for example, to potentially reduce the number of channels from several hundred or thounsand, to less than 20.

1.3. Some technical details...

1.3.1. ...about PCA

CPU: space versus time

PCA decomposition is performed on spatio-temporal datasets. If the number of channels becomes important, PCA can use a lot of CPU since the size of the diagonalised matrix if to the square of this number. It is possible to partly avoid this problem when the time dimension is lower than the spatial dimension, using a correlation matrix in time instead of in space. F90 subroutine sl_pca of SpanLib provides the ability to choose which of theses approaches to use for PCA.

Weights

In some case, not all channels have the same weight. For instance, for gridded dataset, weight must be proportional to the grid cell area. Whereas common PCA analysis does not take these weights into account it is possible to give optional weights to sl_pca. Using the python module, it is easy to "attach" weights to a variable for use by pca.

Mask

Similarly, it is not useful to analyse masked points (for example, gridded points situated on land when use analyse oceanic data). The F90 subroutine sl_pca makes the supposition that none of the masked (all channels are analysed). However, as well as for the weights, it is possible to associate an spatial mask to a dataset in order to remove masked points when using the python module. Then, spanlib.pack can be used to "pack" (compress) data before they are analysed.

Analysing several variables at the same time

One can be interested in analysing several variables ate the same time. These variables may come from different regions, datasets and may be even of completely different nature. The essential problem of units may be solved using simple normalisations. Python function spanlib.stackData can be used to "pack" (compress) data before they are analysed. Then, using spanlib.unStackData you can unpack results from you analysis. Raynaud et al (2006) presents an example of use where variables such as sea surface temperature, wind stress modulus and air-sea CO2 fluxes are analysed at the same time: the simultaneous variability of the variables is filtered and the dominant oscillations are extracted for each of these variables.

Reconstructions

Reconstructions (F90:sl_pcarec, Python:<SpAn_object>.reconstruct) may not be necessary the multiplication of an EOF by its associated PC. When PCA is used for a reduction of d-o-f (see Section 1.2, “Fundamentals”), orginal PCs are first filtered and then converted back to the original space using saved EOFs.

1.3.2. ...about MSSA

The window parameter

This is the only and essential parameter of SSA and MSSA (F90:sl_mssa, Python:<SpAn_object>.mssa). It defines the maximal value of the lags use when building the covariance matrix. It acts as a spectral parameter: the spectral resolution is higher for periods lower than this period. A standard value is one third of the time dimension.

Phase composites

One of the most important interests of MSSA is to be able to extract intermittent space-time oscillations from the signal. At the first order, an oscillation is its "typical" cycle. sl_phasecomp (F90) and spanlib.phases (Python) perfom phases composites: it computes an averaged cycle and cut it an homegeneous parts (as one can do for the annual cycle in 12 months).

1.4. For more information

For more information about PCA and MSSA may be found in papers and on the web. See for example:

You can also browse Section 5, “Links”.

1.5. View sources

An html version of the source codes is available here:

2. Installation

2.1. Requirements

2.1.1. Fortran library

You need a F90 compiler to compile it, and BLAS/LAPACK libraries compilated using this F90 compiler to be able to link with library.

To run the F90 example, you need the F90 netcdf library.

2.1.2. Python module

You need the fortran library, and the pyfort and Numeric modules from CDAT.

To run python example, you needs the cdms and vcs modules (delivered with CDAT).

2.2. Download

You can download the sources of the package from the Sourceforge repositories of the project.

2.3. Compilation and installation

Detailed instructions can be found in the INSTALL file of the package.

  1. First, compile it with:

    <user> ./configure
    <user> make
    

  2. Second, install it as root:

    <user> su
    <root> make install
    

    .

Here is an example of ./configure:

<raynaud> ./configure --with-blas-lib=/usr/local/install/lapack-3.0/lib \
--with-netcdf-lib=/usr/local/install/netcdf-3.6.1/lib \
--with-netcdf-inc=/usr/local/install/netcdf-3.6.1/include \
--prefix=$HOME --with-pythondir=$HOME/python FC=ifort

3. The Fortran 90 library

3.1. Introduction

The fortran 90 library provides a list of subroutines to perform PCA, MSSA, recontructions and phase composites.

Note

The channel (spatial) dimension is always 1D. Therefore, as in the python module, multi-dimensional arrays must be packed before being analysed. You can use pack and unpack F90 subroutines for this task, and optionally give a mask to remove masked points. If you analyse several variables at the same time, concatenate normalised packed arrays before the analsysis.

3.2. F90 subroutines

3.2.1. Principal Component Analysis: sl_pca

Usage
call sl_pca(ff, nkeep=nkeep, xeof=xeof, pc=pc, ev=ev, weights=weights, useteof=useteof)
Description

Perform a decomposition of space-time field in a set of Empirical Orthogonal Functions (EOFs) and Principal components (PCs). The input data set can be optionally weighted in space. By default, the analysis computes "temporal" (T) or classical spatial (S) EOFs depending on if the space dimension is greater than the time dimension. This default behavior can be overridden.

Necessary arguments
  • ff [intent:input, type:real]: :: Space-time array
Optional arguments
  • nkeep [intent:input, type:integer]: :: Maximum number of modes to keep in outputs
  • xeof [intent:output, type:real]: :: Space-mode array of EOFs
  • pc [intent:output, type:real]: :: Time-mode array of PCs
  • ev [intent:output, type:real]: :: Mode array of eigen values (variances)
  • weights [intent:input, type:real]: :: Space array of weights
  • useteof [intent:input, type:integer]: :: To force the use of T or S EOFs [0 = T, 1 = S, -1 = default]
Dependencies

sl_diasym

3.2.2. Reconstruction of a set of PCA components: sl_pcarec

Usage
call sl_pcarec(xeof, pc, ffrec, istart=istart, iend=iend)
Description

Perform a reconstruction using a set of components previously computed with a PCA. All the reconstructed components are summed. A reconstructed component is simply the "product" of an EOF by its PC. The sum of all reconstructed component is the original field.

Necessary arguments
  • xeof [intent:input, type:real]: :: Space-mode array of EOFs
  • pc [intent:input, type:real]: :: Time-mode array of PCs
  • ffrec [intent:output, type:real]: :: Space-time array of the reconstructed field
Optional arguments
  • istart [intent:input, type:integer]: :: Index of the first component to use
  • iend [intent:input, type:integer]: :: Index of the last component to use

3.2.3. Multi-channel Singular Spectrum Analysis: sl_mssa

Usage
call sl_mssa(ff, nwindow, nkeep, steof=steof, stpc=stpc, ev=ev)
Description

Perform a decomposition of space-time field in a set of space-time Empirical Orthogonal Functions (EOFs) and time Principal components (PCs), according to a window parameter.

Necessary arguments
  • ff [intent:input, type:real]: :: Space-time array
  • nwindow [intent:input, type:integer]: :: Window size
  • nkeep [intent:input, type:integer]: :: Maximum number of modes to keep in outputs
Optional arguments
  • steof [intent:output, type:real]: :: SpaceXwindow-mode array of EOFs
  • stpc [intent:output, type:real]: :: Time-mode array of PCs
  • ev [intent:output, type:real]: :: Mode array of eigen values (variances)
Dependencies

sl_diasym

3.2.4. Reconstruction of a set of MSSA components: sl_mssarec

Usage
call sl_mssarec(steof, stpc, nwindow, ffrec, istart=istart, iend=iend)
Description

Same as for the reconstruction of PCA components, but for MSSA.

Necessary arguments
  • steof [intent:input, type:real]: :: SpaceXwindow-mode array of EOFs
  • stpc [intent:input, type:real]: :: Time-mode array of PCs
  • nwindow [intent:input, type:integer]: :: Window size
  • ffrec [intent:output, type:real]: :: Space-time array of the reconstructed field
Optional arguments
  • istart [intent:input, type:integer]: :: Index of the first component to use
  • iend [intent:input, type:integer]: :: Index of the last component to use

3.2.5. Phase composites: sl_phasecomp

Usage
call sl_phasecomp(ffrec, np, phases, weights=weights, offset=offset, firstphase=firstphase)
Description

Performs phase composites of S-T oscillatory field. This field is typically a reconstructed pair of MSSA modes. Composites are evaluated according to an index defined by the first PC of the input field and its derivative. Space weights can be optionally used to compute the PC. A minimal normalized amplitude can be also used: when the index is under value, data are not used to compute phases. It is also possible so specify the angle of the first phase in the 360 degrees phase diagram circle: zero means the the first phase conincides with the maximmum.

Necessary arguments
  • ffrec [intent:input, type:real]: :: Space-time array
  • np [intent:input, type:integer]: :: Number of requested phases over the 360 degrees cycle (default = 8)
  • phases [intent:output, type:real]: ::
Optional arguments
  • weights [intent:input, type:real]: :: Space array of weights
  • offset [intent:input, type:real]: :: Minimal normalized amplitude of the index (default = 0.)
  • firstphase [intent:input, type:real]: :: Value in degrees of the first phase (default = 0)
Dependencies

sl_pca

3.2.6. Diagonalisation of a symetric matrix: sl_diasym

Usage
call sl_diasym(a, eig)
Description

A simple interface to the ssyev diagonalisation subroutine from LAPACK.

Necessary arguments
  • a [intent:inoutput, type:real]: :: Input = symetric matrix, output = EOFs
  • eig [intent:output, type:real]: :: Eigen values
Dependencies

ssyev(LAPACK)

3.3. A Fortran 90 example

This is an example of how to use this the Fortran 90 component of Spanlib. In this example, an analysis of Sea surface Temperature anomalies using data stored in the netcdf format. The dominant oscillatory mode of the El Nino variability is then extracted and stored in a netcdf file.

Example 1. F90 example

You can run this example typing (from package directory):

<user> cd example && make

This command will first try to download the dataset, then compile the following f90 program, run it, and finally try to visualise output data. See Section 2.1, “Requirements” if it fails.

The sources of this example is highlighted below, with comments inside.

! File: example.f90
!
! This file is part of the SpanLib library.
! Copyright (C) 2006  Stephane Raynaud
! Contact: stephane dot raynaud at gmail dot com
!
! This library is free software; you can redistribute it and/or
! modify it under the terms of the GNU Lesser General Public
! License as published by the Free Software Foundation; either
! version 2.1 of the License, or (at your option) any later version.
!
! This library is distributed in the hope that it will be useful,
! but WITHOUT ANY WARRANTY; without even the implied warranty of
! MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
! Lesser General Public License for more details.
!
! You should have received a copy of the GNU Lesser General Public
! License along with this library; if not, write to the Free Software
! Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

program example

	! This simple example shows how to use all subroutines from this package.
	! Warning: it requires netcdf for in/outputs.
	!
	! We start from longitude/latitude/time value of Pacific Sea Surface Temperature
	! that include the El Nino Southern Oscillation signal.
	! Input is the netcdf file data.cdf.
	! We remove land points from the initial array according
	! to the netcdf missing_value attribute of the analysed variable (data are "packed").
	! A PCA is used to reduce the degrees of freedoom before MSSA analysis.
	! Weights for PCA are computed as a fonction of latitude.
	! Then, we assume that we have already identified an oscillation (after tests).
	! This oscillation, given by a pair of MSSA modes, is then
	! reconstructed from the MSSA and PCA spaces.
	! Finally, phase composites are computed from this reconstructed oscillation.
	! The oscillation is outputed in a netcdf file (pair_1.cdf).
	!
	! The initial data set (data.cdf):
	! - origin: updated Reynolds and Smith (1996) SST (netcdf file)
	! - origin url: data selector from http://iridl.ldeo.columbia.edu
	! - how to get it [10Mb]: http://stefdeperou.free.fr/pub/data.cdf
	! - area of study: tropical pacific [130.5E:75.5W, 29.5S:29.5N] (155x60 grid points)
	! - period of study: Jan1982:Dec2005 (288 time steps)
	!
	! Parameters:
	! - Only the first 20 PCs are retainedand given to the MSSA
	! - A window of 7 years (84 months) is chosen for the MSSA
	! - Phase composites use 8 phases
	! - An offset of 0.4 and is used for composites
	! - The first phase of composites is set at 180 degrees (minimal value)


	use spanlib
	use netcdf

	implicit none

	! Parameters
	! ----------
	integer,parameter :: nkeep_pca=5, nwindow=84, first_mode=1, nphases=8
	real, parameter :: offset=0., first_phase=180., new_missing_value=-999.
	character(len=20), parameter :: input_nc_file="data2.cdf", output_nc_file="output.nc", &
		var_name='ssta'

	! Other declarations
	! ------------------
	real, allocatable :: field(:,:,:), weights(:,:), lat(:), lon(:), time(:)
	real, allocatable :: reco(:,:,:), phasecomps(:,:,:)
	logical, allocatable :: mask(:,:)
	real, allocatable :: packed_field(:,:), packed_weights(:), &
		& packed_phasecomps(:,:), stphasecomps(:,:)
	real, allocatable :: eof(:,:), ev(:), pc(:,:), stpair(:,:), pair(:,:)
	real, allocatable :: steof(:,:),stpc(:,:),stev(:)
	character(len=20) :: dim_names(3), dim_name, lon_units, lat_units, var_units, &
		&	lon_name, lat_name, time_name, time_units
	integer :: ncid, dimid, dimids(4), varid, dims(3), thisdim, &
		& lonid, latid, phaseid, timeid, phcoid, recoid, origid
	integer(kind=4) :: i, nspace, nlon, nlat, ntime
	real :: pi, missing_value

	! Get the initial sst field from the netcdf file
	! ----------------------------------------------
	print*,'Reading inputs...'
	call err(nf90_open(input_nc_file, nf90_nowrite, ncid))
	call err(nf90_inq_varid(ncid, var_name, varid))
	call err(nf90_inquire_variable(ncid, varid, dimids=dimids(1:3)))
	call err(nf90_inquire_dimension(ncid, dimids(1), &
		&	name=lon_name, len=nlon))
	call err(nf90_inquire_dimension(ncid, dimids(2), &
		&	name=lat_name, len=nlat))
	call err(nf90_inquire_dimension(ncid, dimids(3), &
		&	name=time_name, len=ntime))
	allocate(field(nlon,nlat,ntime))
	allocate(mask(nlon,nlat))
	allocate(weights(nlon,nlat))
	allocate(lon(nlon))
	allocate(lat(nlat))
	allocate(time(ntime))
	call err(nf90_get_var(ncid, varid, field))
	call err(nf90_get_att(ncid, varid, 'missing_value', missing_value))
	call err(nf90_get_att(ncid, varid, 'units', var_units))
	call err(nf90_inq_varid(ncid, lon_name, varid))
	call err(nf90_get_var(ncid, varid, lon))
	call err(nf90_get_att(ncid, varid, 'units', lon_units))
	call err(nf90_inq_varid(ncid, lat_name, varid))
	call err(nf90_get_var(ncid, varid, lat))
	call err(nf90_get_att(ncid, varid, 'units', lat_units))
	call err(nf90_inq_varid(ncid, time_name, varid))
	call err(nf90_get_var(ncid, varid, time))
	call err(nf90_get_att(ncid, varid, 'units', time_units))
	call err(nf90_close(ncid))


	! Format (pack) data to have only one space dimension
	! ---------------------------------------------------
	print*,'Packaging...'

	! Compute weights proportional to grid point area
	pi = cos(-1.)
	do i=1,nlat
		weights(:,i) = cos(lat(i)*pi/180.)
	end do

	! Now pack
	if(isnan(missing_value))then
		mask = not(isnan(field(:,:,1)))
	else
		mask = (field(:,:,1) /= missing_value)
	end if
	allocate(packed_field(count(mask), ntime))
	do i=1, ntime
		packed_field(:,i) = pack(field(:,:,i), mask)
	end do
	allocate(packed_weights(count(mask)))
	packed_weights = pack(weights, mask)


	! Perform a PCA to reduce the d.o.f
	! ---------------------------------
	print*,'PCA...'
	call sl_pca(packed_field, nkeep=nkeep_pca, xeof=eof, &
		&	pc=pc, weights=packed_weights)
	deallocate(packed_field)

	! We send results from PCA to MSSA
	! --------------------------------
	print*,'MSSA...'
	call sl_mssa(transpose(pc), nwindow, nkeep=first_mode+1, &
		&	steof=steof, stpc=stpc, ev=stev)

	! We reconstruct modes [first_mode + first_mode+1] of MSSA
	! --------------------------------------------------------

	print*,'MSSAREC...'
	call sl_mssarec(steof, stpc, nwindow, stpair, &
		&	istart=first_mode, iend=first_mode+1)
	deallocate(steof, stpc)

	! We compute phases composites for the reconstructed oscillation
	! ---------------------------------------------------------------
	print*,'PHASECOMP...'
	call sl_phasecomp(stpair, nphases, stphasecomps, &
		&	weights=packed_weights, &
		&	offset=offset, firstphase=first_phase)

	! We go back to the physical space for
	! the full oscillation AND its composites
	! ---------------------------------------
	print*,'PCAREC...'
	call sl_pcarec(eof, transpose(stpair), pair)
	call sl_pcarec(eof, transpose(stphasecomps), packed_phasecomps)
	deallocate(stpair, eof, stphasecomps)


	! Unpacking
	! ---------
	print*,'Unpacking...'
	allocate(reco(nlon,nlat,ntime))
	do i=1, ntime
		reco(:,:,i) = unpack(pair(:,i), mask, new_missing_value)
		where(mask == .false.)
			field(:,:,i) = new_missing_value
		end where
	end do
	allocate(phasecomps(nlon,nlat,nphases))
	do i=1, nphases
		phasecomps(:,:,i) = unpack(packed_phasecomps(:,i), mask, new_missing_value)
	end do


	! Write out the phase composites of the first oscillation
	! -------------------------------------------------------
	print*,'Writing out...'
	! File
	call err(nf90_create(output_nc_file, nf90_write, ncid))
	! Dimensions
	call err(nf90_def_dim(ncid, 'lon', nlon, dimids(1)))
	call err(nf90_def_dim(ncid, 'lat', nlat, dimids(2)))
	call err(nf90_def_dim(ncid, 'time', ntime, dimids(3)))
	call err(nf90_def_dim(ncid, 'phase', nphases, dimids(4)))
	! Variables
	call err(nf90_def_var(ncid, 'lon', nf90_float, dimids(1), lonid))
	call err(nf90_put_att(ncid, lonid, 'long_name', 'Longitude'))
	call err(nf90_put_att(ncid, lonid, 'units', lon_units))
	call err(nf90_def_var(ncid, 'lat', nf90_float, dimids(2), latid))
	call err(nf90_put_att(ncid, latid, 'long_name', 'Latitude'))
	call err(nf90_put_att(ncid, latid, 'units', lat_units))
	call err(nf90_def_var(ncid, 'time', nf90_float, dimids(3), timeid))
	call err(nf90_put_att(ncid, timeid, 'long_name', 'Time'))
	call err(nf90_put_att(ncid, timeid, 'units', time_units))
	call err(nf90_def_var(ncid, 'phase', nf90_float, dimids(4), phaseid))
	call err(nf90_put_att(ncid, phaseid, 'long_name', 'Phase'))
	call err(nf90_put_att(ncid, phaseid, 'units', 'level'))
	call err(nf90_def_var(ncid, 'orig', nf90_float, dimids(1:3), origid))
	call err(nf90_put_att(ncid, origid, 'long_name', 'SST anomaly / original field'))
	call err(nf90_put_att(ncid, origid, 'units', var_units))
	call err(nf90_put_att(ncid, origid, 'missing_value', new_missing_value))
	call err(nf90_def_var(ncid, 'reco1', nf90_float, dimids(1:3), recoid))
	call err(nf90_put_att(ncid, recoid, 'long_name', 'SST anomaly / reconstruction of first pair'))
	call err(nf90_put_att(ncid, recoid, 'units', var_units))
	call err(nf90_put_att(ncid, recoid, 'missing_value', new_missing_value))
	call err(nf90_def_var(ncid, 'pair1', nf90_float, (/dimids(1),dimids(2),dimids(4)/), phcoid))
	call err(nf90_put_att(ncid, phcoid, 'long_name', 'SST anomaly / phase composite of first pair'))
	call err(nf90_put_att(ncid, phcoid, 'units', var_units))
	call err(nf90_put_att(ncid, phcoid, 'missing_value', new_missing_value))
	! Values
	call err(nf90_enddef(ncid))
	call err(nf90_put_var(ncid, lonid, lon))
	call err(nf90_put_var(ncid, latid, lat))
	call err(nf90_put_var(ncid, timeid, time))
	call err(nf90_put_var(ncid, phaseid, float((/(i,i=1,nphases)/))))
	call err(nf90_put_var(ncid, origid, field))
	call err(nf90_put_var(ncid, recoid, reco))
	call err(nf90_put_var(ncid, phcoid, phasecomps))
	call err(nf90_close(ncid))

end program example

subroutine err(jstatus)

	use netcdf

	integer :: jstatus

	if (jstatus .ne. nf90_noerr) then
		print *, trim(nf90_strerror(jstatus))
		stop
	end if

end subroutine err

4. The python module

4.1. Introduction

The intent of the python module is to offer a scripting interface to the F90 subroutines. It takes advantage of the power of python and of the computational efficiency of fortran.

Thanks to the use of CDAT, the module easily handle gridded dataset, using masks and weights for example. It also allows the analysis of several datasets at the same time by stacking them. The module performs some tasks in a transparant way: packing and unpacking of data, PCA reduction before MSSA anaysis, etc.

Note

The current version of this module only manage 3D dataset: first two dimensions for "space", the last dimension for time.

4.2. Python functions

4.2.1.  Takes several data files, of same time and stacks them up together: spanlib.stackData

Usage
dout, weights, masks, axes = spanlib.stackData(*data)
Necessary arguments
  • *data [intent:input] :: One or more data objects to stack. They must all have the same time length.
Outputs
  • dout [intent:output] :: Stacked data
  • weights [intent:output] :: Associated stacked weights
  • masks [intent:output] :: Associated stacked masks
  • axes [intent:output] :: Associated stacked axes

4.2.2.  Unstack data in the form returned from stackData: spanlib.unStackData

Usage
dout = spanlib.unStackData(din, weights, masks, axes)
Necessary arguments
  • din [intent:input] :: Stacked data (see stackData function)
  • weights [intent:input] :: Associated stacked weights
  • masks [intent:input] :: Associated stacked masks
  • axes [intent:input] :: Associated stacked axes
Outputs
  • dout [intent:output] :: List of unstacked data

4.2.3.  Computes weights and mask: spanlib.pack

Usage
packed_data, packed_weights, mask = spanlib.pack(data, weights=None)
Necessary arguments
  • data [intent:input] :: Flatten in space an [x,y,t] array by removing its masked point
Optional arguments
  • weights [intent:input, default:None] :: Weights to be flatten also
Outputs
  • packed_data [intent:output] :: Space-time packed array
  • packed_weights [intent:output] :: Packed weights that were guessed or used
  • mask [intent:output] :: Mask that were guessed or used

4.2.4.  Phase composites for oscillatory fields: spanlib.phases

Usage
phases = spanlib.phases(data, nphases=8, offset=.5, firstphase=0)
Necessary arguments
  • data [intent:input] :: Space-time data oscillatory in time
Optional arguments
  • nphases [intent:input, default:8] :: Number of phases (divisions of the cycle)
  • offset [intent:input, default:.5] :: Normalised offset to keep higher values only [default:
  • firstphase [intent:input, default:0] :: Position of the first phase in the 360 degree cycle
Outputs
  • phases [intent:output] :: Space-phase array

4.2.5.  Prepare the Spectral Analysis Object: spanlib.SpAn

Usage
<SpAn_object> = spanlib.SpAn(data, weights=None, npca=None, window=None, nmssa=None)
Necessary arguments
  • data [intent:input] :: Data on which to run the PC Analysis Last dimensions must represent the spatial dimensions. Analysis will be run on the first dimension.
Optional arguments
  • weights [intent:input, default:1. everywhere] :: If you which to apply weights on some points. Set weights to "0" where you wish to mask. The input data mask will be applied, using the union of all none spacial dimension mask. If the data are on a regular grid, area weights will be generated, if the cdutil (CDAT) module is available.
  • npca [intent:input, default:10] :: Number of principal components to return
  • window [intent:input, default:time_length/3.] :: MSSA window parameter
  • nmssa [intent:input, default:4] :: Number of MSSA modes retained
Outputs
  • <SpAn_object> [intent:output] :: Object created for further analysis

4.2.6.  Principal Components Analysis (PCA): <SpAn_object>.pca

Usage
eof, pc, ev = <SpAn_object>.pca(npca=None)
Optional arguments
  • npca [intent:input, default:None] :: Number of principal components to return, default will be 10
Outputs
  • eof [intent:output] :: EOF array
  • pc [intent:output] :: Principal Components array
  • ev [intent:output] :: Eigein Values array

4.2.7.  MultiChannel Singular Spectrum Analysis (MSSA): <SpAn_object>.mssa

Usage
eof, pc, ev = <SpAn_object>.mssa(nmssa=None, pca=False, window=None)
Optional arguments
  • nmssa [intent:input, default:None] :: Number of MSSA modes retained
  • pca [intent:input, default:False] :: If True, performs a preliminary PCA
  • window [intent:input, default:None] :: MSSA window parameter
Outputs
  • eof [intent:output] :: EOF array
  • pc [intent:output] :: Principal Components array
  • ev [intent:output] :: Eigen Values array

4.2.8.  Reconstruct results from mssa or pca: <SpAn_object>.reconstruct

Usage
ffrec = <SpAn_object>.reconstruct(start=1, end=None, mssa=True, pca=True)
Optional arguments
  • start [intent:input, default:1] :: First mode
  • end [intent:input, default:None] :: Last mode
  • mssa [intent:input, default:True] :: Reconstruct MSSA if True
  • pca [intent:input, default:True] :: Reconstruct PCA if True
Outputs
  • ffrec [intent:output] ::

4.3. A python example

This example shows how to use some of the available components of the python module. It uses the same dataset as for Section 3.3, “A Fortran 90 example”. However, is this example, two different areas are analysed at the same time. This is intented to mimics the used of two different datasets that are stacked, before being analysed. Such approach (see for example Raynaud et al (2006)) allows to find modes of variability in a arbitrary number of variables, provided you are careful with units (performing appropriate normalisations).

If you configurations allows it, you can run this example typing this from the installation package directory:

<user> cd example && make python1

Example 2. Python example

# File: example1.py
#
# This file is part of the SpanLib library.
# Copyright (C) 2006  Charles Doutiraux, Stephane Raynaud
# Contact: stephane dot raynaud at gmail dot com
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA


###################################################################
# In this example, we analyse two different areas at the same time.
# You can do the same with two completely different datasets,
# except that they must have the same temporal grid.
###################################################################

# Needed modules
import cdms
import spanlib
import MV
import vcs

# We tell cdms that we have longitude, latitude and time
cdms.axis.latitude_aliases.append('Y')
cdms.axis.longitude_aliases.append('X')
cdms.axis.time_aliases.append('T')

# Simply open the netcdf file
print "Open file"
f=cdms.open('data2.cdf')

# Get our two datasets
print "Read two different regions"
s2=f('ssta',latitude=(-10,10),longitude=(110,180))
s1=f('ssta',latitude=(-15,15),longitude=(210,250))

# Stack the two dataset to have only one dataset
print "Stacking data"
res = spanlib.stackData(s1,s2)

# Create the analysis object
print "Creating SpAn object"
SP=spanlib.SpAn(MV.array(res[0]),weights=MV.array(res[1]))

# Perform a preliminary PCA
print "PCA..."
eof,pc,ev = SP.pca()

# Now perform a MSSA
print "MSSA..."
res3 = steof,stpc,stev = SP.mssa(pca=True)

# Finally recontructed the filtered field
ffrec = SP.reconstruct()
res4 = spanlib.unStackData(ffrec,res[1],res[2],res[3])



# Plot a timeseries taken from our two
# recontructed datasets
x=vcs.init()
x.plot(res4[1])
raw_input('ok?')
x.clear()
x.plot(res4[1][:,5,5])
x.plot(res4[0][:,5,5])
raw_input('ok?')

5. Links

  • Home page of the project: http://spanlib.sourceforge.net
  • Sourceforge page of the project where you can find news, forums, where to report bugs, where to request features, files to download, etc: http://sourceforge.net/projects/spanlib/
  • An example of use of MSSA is available at http://stefdeperou.free.fr/meteo.php (french). This example shows an analysis of air temperatures at Orly airport, with different values of parameters. It also provides an example of prediction using the analysis results.
  • The SSA-MTM Toolkit is graphical interface to PCA, MSSA and other methods. It is useful to make comparisons of methods. This website provides a lot of exlplanations of how these analysis methods works.

Bibliography

[1] R. Preisendorfer. Principal Component Analysis in Meteorology and Oceanography”. Elsevier Sci.. 1988.

[2] D. Wilks. Statistical methods in the atmospheric sciences”. Cornell University. 1995.

[3] S. D. Broomhead and G. P. King. Extracting qualitative dynamics from experimental data”. Physica D. Elsevier Science Publishers B. V.. 20. 217–236. 1986.

[4] R. Vautard and M. Ghil. Singular Spectrum Analysis in nonlinear dynamics,with application to paleoclimatic time series”. Physica D. 35. 395-424. 1989.

[5] G. Plaut and R. Vautard. Spells of low-frequency oscillations and weather regimes in the northern hemisphere”. J. Atm. Sc.. 51. 210-236. 1994.

[6] S. Raynaud, P. Yiou, R. Kleeman, and S. Speich. Using MSSA to determine explicitly the oscillatory dynamics of weakly nonlinear climate systems”. Nonlin. Proc. Geophys.. 12. 807–815. 2005. Download.

[7] S. Raynaud, O. Aumont, K. Rodgers, and P. Yiou. Interannual to decadal variability of air-sea CO2 fluxes in the North Atlantic”. Ocean Sci.. 2. 43-60. 2006. Download.