Let N be the ensemble size, n the size of the state vector and m the observation space dimension. The best linear unbiased estimator (BLUE) of the model’s state vector given the model forecast {\mathbf{x}}^{f} with error covariance {\mathbf{P}}^{f} and the observation {\mathbf{y}}^{o} with error covariance \mathbf{R} is given by {\mathbf{x}}^{a}:
where \mathbf{H} is the observation operator extracting the observed part of the state vector and {\mathbf{P}}^{a} the error covariance of the analysis {\mathbf{x}}^{a}.
From the ensemble of forecast states {\mathbf{{x}^{f}}}^{(k)} where k = 1,\mathop{\mathop{…}},N one can compute the ensemble mean
\overline{{\mathbf{x}}^{f}} = {1\over
N}{\mathop{∑
}}_{k=1}^{N}{\mathbf{x}}^{{f}^{(k)}
}
| (5) |
and ensemble covariance:
We construct the columns of the matrix {\mathbf{S}}^{f} by:
{
\left ({\mathbf{S}}^{f}\right )}_{
k} = {{{\mathbf{x}}^{f}}^{(k)} −\overline{{\mathbf{x}}^{f}}\over
\sqrt{N − 1}}
| (7) |
where {\mathbf{S}}^{f} is a n × N matrix, which each column being the difference between each member its ensemble mean. Its mean over all columns it thus zero. As many other assimilation schemes (SEEK, RRSQRT, ESSE, EnKF), {\mathbf{P}}^{f} is decomposed in terms of this square root matrix {\mathbf{S}}^{f}:
{
\mathbf{P}}^{f} ={ \mathbf{S}}^{f}{{\mathbf{S}}^{f}}^{T }
| (8) |
Typically, the number of ensemble members N is much smaller than the state vector size n. We rewrite the Kalman Filter analysis, by avoiding any matrix of the size n × n:
Where the Sherman-Morison-Woodbury identity has been applied from (10) to (11). This identity can be expressed as:
with \mathbf{A} = \mathbf{I},
\mathbf{B} ={ \mathbf{HS}}^{f},
\mathbf{C} = \mathbf{R}.
That is, instead of performing the inverse in space of matrix
\mathbf{A} the inverse is done in the
space of the matrix \mathbf{C}. We also
substitute {\mathbf{P}}^{f} in the expression of
the analysis covariance error {\mathbf{P}}^{a}:
In order to avoid to form {\mathbf{P}}^{a} explicitly, we need to express {\mathbf{P}}^{a} also in terms of the square root matrices {\mathbf{S}}^{a}.
{
\mathbf{P}}^{a} ={ \mathbf{S}}^{a}{{\mathbf{S}}^{a}}^{T }
| (17) |
This is possible when the following eigenvalue decomposition is made :
{
\left (\mathbf{H}{\mathbf{S}}^{f}\right )}^{T }{\mathbf{R}}^{−1}\mathbf{H}{\mathbf{S}}^{f} = \mathbf{U}\mathbf{Λ}{\mathbf{U}}^{T }
| (18) |
where {\mathbf{U}}^{T }\mathbf{U} = \mathbf{I} and
where \mathbf{Λ} is
diagonal. \mathbf{U} and
\mathbf{Λ} are both of
the size N × N.
Using the decomposition (18) in equation (16) one obtains:
So we found a square root decomposition of {\mathbf{P}}^{a} in terms of {\mathbf{S}}^{f}\mathbf{U}{(\mathbf{I} + \mathbf{Λ})}^{−1∕2}. But in order to construct an ensemble from the columns of {\mathbf{S}}^{a}, its mean has to be zero. So we will transform {\mathbf{S}}^{a} so that the identity (26) is preserved. One way to do this is
{
\mathbf{S}}^{a} ={ \mathbf{S}}^{f}\mathbf{U}{(\mathbf{I} + \mathbf{Λ})}^{−1∕2}{\mathbf{U}}^{T }
| (27) |
The decomposition (18) can also be used in the computation of the Kalman gain \mathbf{K}: by:
For a linear observation operator, the sum of all columns of \mathbf{H}{\mathbf{S}}^{f} is zero. Thus {\mathbf{1}}_{N×1} is a (unnormalized) eigenvector of {\left (\mathbf{H}{\mathbf{S}}^{f}\right )}^{T }{\mathbf{R}}^{−1}\mathbf{H}{\mathbf{S}}^{f} with eigenvalue 0:
{
\left (\mathbf{H}{\mathbf{S}}^{f}\right )}^{T }{\mathbf{R}}^{−1}\mathbf{H}{\mathbf{S}}^{f}{\mathbf{1}}_{
N×1} = 0\kern 3.26288pt {\mathbf{1}}_{N×1}
| (31) |
If eigenvalues are sorted in \mathbf{Λ}, then {\mathbf{1}}_{N×1} is the smallest and last eigenvalue (as all eigenvalues positive).
where \vec{{e}}_{N} is the a vector with all elements equal to zero expect that last which is one. Therefore, it follows that
\mathbf{U}{(\mathbf{I} + \mathbf{Λ})}^{−1∕2}{\mathbf{U}}^{T }{\mathbf{1}}_{
N×1} ={ \mathbf{1}}_{N×1}
| (34) |
since the element {\mathbf{Λ}}_{N,N} is zero. Thus the mean of all columns of {\mathbf{S}}^{a} is zero.
{\mathbf{S}}^{a} is the square root of {\mathbf{P}}^{a}:
{
\mathbf{P}}^{a} ={ \mathbf{S}}^{a}{{\mathbf{S}}^{a}}^{T }
| (35) |
Based on {\mathbf{x}}^{a} and {\mathbf{S}}^{a}, an ensemble can be reconstructed:
{{
\mathbf{x}}^{a}}^{(k)} ={ \mathbf{x}}^{a} + \sqrt{N − 1}\kern 3.26288pt {{\mathbf{S}}^{a}}^{(k)}
| (36) |
The bias aware analysis scheme of Dee and Silva (1998) is also implemented. But the error space {\mathbf{S}}^{a} is not computed.
The initialisation file of the assimilation module is composed mainly by four sections: configuration of the model (model state vector, position of the individual variables, error space of the model), observations to assimilate (observations, their position, their error), eventual diagnostics of the analysis and miscellaneous flags.
The following code contains the definition of the multivariate state vector. The key
Model.variables is a vector of character strings attributing to each variable a user chosen
name. The keys Model.gridX, Model.gridY, Model.gridZ and Model.mask are vectors of file
names. The files in Model.gridX and Model.gridY are degenerated and give the longitude and
latitude of each variable. The files in Model.gridZ can be plain files and contains the depth.
The key Model.mask is used to determine the sea-land mask of each variable. The
exclusion value (or missing value or _FillValue in NetCDF terminology) marks a
land point all other values, a sea points. Every files assembled into a state vector
should have physical values where mask assumes a sea point. The shape of the arrays
in Model.gridX, Model.gridY, Model.gridZ and Model.mask must be the same.
The string in Model.path in prepended to each file names. Example:
For nested grids the variables of the same nested must be grouped and the groups must be orders according to the resolution started with the highest resolution one. To each model grid is associated a Model.gridnum: one for the highest resolution one, two of the next highest resolution one and so one.
Mandatory keys
Key | Type | Description |
ErrorSpace.dimension | integer | The dimension of the error space. |
ErrorSpace.init | vector of strings | Each string is a Fortran format containing an integer descriptor. The format is converted into a file name with an internal write. The integer is a number ranging from 1 to the dimension of the error space n. n vectors of file names are formed and represent a error mode in the state space. Their norm represent the importance of the error mode and thus they are in general not normed. Orthogonality is not necessary. |
Optional keys
Key | Type | Description |
ErrorSpace.path | string | The path is prepended to all file names specified in ErrorSpace.*. The current path is used by default. |
ErrorSpace.scale | real | Each error mode is multiplied by this real number. The default is 1. |
ErrorSpace.spaceScale | vector of strings | Each error mode is multiplied element by element by this vector. The default is a vector with all elements equal to 1. |
When the local version of the assimilation algorithm (schemetype = 1) is used, then the assimilation is performed in a number of zones independently. Zones are defined by specifying a partition vector which has the same number of variables as the model state vector and each variable has the same size as the corresponding Model.mask. This vector contains only integer values starting with one and represent labels: all elements in the state vector which have the same partition number belong to the same zone. For example, for a state vector with 5 elements and the partition vector \mathbf{p}:
\mathbf{x} = \left (\array{
{x}_{1}
\cr
{x}_{2}
\cr
{x}_{3}
\cr
{x}_{4}
\cr
{x}_{5}} \right )\qquad \mathbf{p} = \left (\array{
1\cr
1
\cr
2\cr
2
\cr
3} \right )
| (37) |
This partition vector defined three zones: the first zone contains elements {x}_{1} and {x}_{2}, the second zone {x}_{3} and {x}_{4} and the third zone {x}_{3}. There should be no gaps in the partition vector. For example the vector {(1, 1, 2, 2, 4)}^{T } would cause an error. In practice, the state vector is partitioned along water columns. The assimilation is performed independently in each zone using only observations within the search radius given by Zones.maxLength. The weight of the observations {1\over R'} is multiplied by a Gaussian function:
{1\over
R'} = {1\over
R'}\mathop{ exp}\nolimits (−{(d∕L)}^{2})
| (38) |
where d is the horizontal distance (in m) the first point of a zone and a single observation and L a length-scale (in m) given by Zones.corrLength. Zones.maxLength and Zones.corrLength have the same size as the model state vector. In most cases these values are constant can be specified by, e.g.:
Key | Type | Description |
Zones.partition | vector of strings | Each string is a file name containing the partition file for the given model variable |
Zones.corrLength | vector of strings | Each string is a file name containing the correlation length |
Zones.maxLength | vector of strings | Each string is a file name containing the maximum correlation length |
All set of simultaneous observation are ordered chronically and are attributed to a time index starting with 001 (written always with three digits). In the following keys “XXX” have to be replaced by the time index.
Mandatory keys
Key | Type | Description |
ObsXXX.time | ’yyyy-mm-ddTHH:MM:ss’ | yyyy=year (minimum 1 digit integer) mm=month (2 digits integer) dd=day (2 digits integer) HH=hour (2 digits integer) MM=min (2 dig-ids integer) SS=second (minimum 1 digit integer or real) |
ObsXXX.value | vector of strings | Each string is a file name containing the actual values of the observations |
ObsXXX.rmse | vector of strings | Each string is a file name containing the root mean square error of the observations. |
ObsXXX.mask | vector of strings | Each string is a file name containing the binary mask of the observations. Values where the mask is different from 1 are rejected. |
Optional keys
Key | Type | Description |
ObsXXX.variables | vector of strings | The names must correspond to the names chosen in Model.variables. Unknown names are treated as ”out of the grid” and are not assimilated. |
ObsXXX.names | vector of strings | Each string is a description of the data type of the observations. You can choose any name meaningful to you. These names are only used for the log-file. The default names are Var01, Var02,... |
ObsXXX.gridX | vector of strings | Each string is a file name containing the longitude of the observations. |
ObsXXX.gridY | vector of strings | Each string is a file name containing the latitude of the observations. |
ObsXXX.gridZ | vector of strings | Each string is a file name containing the depth of the observations. |
ObsXXX.HperObs | vector of strings | The observation operator stored in a sparse matrix form per observations |
ObsXXX.operator | string | The observation operator stored in a sparse matrix form. |
ObsXXX.path | string | The path is prepended to all file names specified in ObsXXX.*. The current path is used by default. |
The optional keys are used to create the observation operator. If it is applied to the state vector, it extracts the observed variables at the location of the measurements. Several ways exist to specify the observation operator.
Note that the individual arrays in ObsXXX.value, ObsXXX.rmse, ObsXXX.mask, ObsXXX.gridX, ObsXXX.gridY and ObsXXX.gridZ should have the same size.
Format of the observation operator
Only the non-zero elements of the observation operator are specified in the 9 × n matrix (in column-major order) where n is the number of non-zero elements. Each column has the following structure:
Observations | Model |
| ||||||
var. index | i-index | j-index | k-index | var. index | i-index | j-index | k-index | Inter-polation coefficient |
The first integer value is related to the observation. The index of the variable is the position
where the observed variable appears in ObsXXX.value and i,j,k-index are the three spatial
indexes of a single scalar observation.
The integers in column 5 to 8 are related to the model state vector. Again the index of the variable is the position where the observed variable appears in Model.variables and i,j,k-index are the three spatial indexes of a single scalar model forecast. If one of the model indexes is -1 the corresponding observation is treated ”out of grid” and the associated weight will be zero.
The column 9 is a real value between 0 and 1 in the case of a simple trilinear interpolation. The observation operator can be generated offline using a trilinear interpolation with the tool ”genobsoper”.
All diagnostics are optional and the corresponding files are output.
Key | Type | Description |
DiagXXX.xf | vector of strings | the model forecast (ensemble mean) |
DiagXXX.Hxf | vector of strings | the observed part of the model forecast |
DiagXXX.Sf | vector of strings | Each string is a Fortran format. For the conversion into file names see the key ErrorSpace.init. The files represent the error modes of the model forecast. |
DiagXXX.Ef | vector of strings | Each string is a Fortran format. For the conversion into file names see the key ErrorSpace.init. The files represent the forecast ensemble. |
DiagXXX.diagPf | vector of strings | The diagonal elements of error covariance of the model forecast. |
DiagXXX.diagHPfHT | vector of strings | The diagonal elements of error covariance of the observed part of the model forecast |
DiagXXX.stddevxf | vector of strings | Standard deviation of the error of the model forecast. |
DiagXXX.stddevHxf | vector of strings | Standard deviation of the error of the observed part of the model forecast. |
DiagXXX.path | string | The path is prepended to all file names specified in DiagXXX.*. The current path is used by default. |
DiagXXX.xa | vector of strings | the analysis (ensemble mean) |
DiagXXX.Hxa | vector of strings | the observed part of the analysis |
DiagXXX.Sa | vector of strings | Each string is a Fortran format. For the conversion into filenames see the key ErrorSpace.init. The files represent the error modes of the analysis. |
DiagXXX.Ea | vector of strings | Each string is a Fortran format. For the conversion into file names see the key ErrorSpace.init. The files represent the analysis ensemble. |
DiagXXX.diagPa | vector string | The diagonal elements of error covariance of the analysis. |
DiagXXX.diagHPaHT | vector of strings | The diagonal elements of error covariance of the observed part of the analysis |
DiagXXX.stddevxa | vector of strings | Standard deviation of the error of the analysis. |
DiagXXX.stddevHxa | vector of strings | Standard deviation of the error of the observed part of the analysis. |
DiagXXX.H | strings | the observation operator |
DiagXXX.yo | vector of strings | The observations. |
DiagXXX.invsqrtR | vector of strings | The inverse of the root mean square error of the observations. If a scalar observation point has been eliminated (out of the model grid for example) its weight is zero. |
DiagXXX.xa-xf | vector of strings | The analysis increment |
DiagXXX.yo-Hxf | vector of strings | the observation minus the model forecast at the observation points |
DiagXXX.yo-Hxa | vector of strings | the observation minus the model analysis at the observation points |
DiagXXX.Hxa-Hxf | vector of strings | analysis increment at the observation points |
DiagXXX.path | string | The path is prepended to all filenames specified in DiagXXX.*. The current path is used by default. |
Key | Type | Description |
nbnest | integer | Number of nested grids |
assimnum | integer | Number between 1 and nbnest different for each model. The model with assimnum does the assimilation |
runtype | integer | possible values of runtype are:
|
schemetype | integer | possible values of schemetype are:
|
moderrtype | integer | possible values of moderrtype are:
|
biastype | integer | possible values of biastype are:
|
Bias.gamma | real | fraction of the error with is systematic |
Bias.init | vector of string | the initial estimation of the bias |
joinvectors | integer | If joinvectors is 1 then the variables of the nested grids will be assembled to one multigrid state vector |
logfile | string | File contains simple diagnostics such as rmse with observations |
debugfile | string | File contains debugging information is the code was compiled with the flag -DDEBUG |