In this section, we develop a practical real-time quality control procedure for the SRIF, based on the study by Shi et al. (2008) and dedicated to the processing of huge GNSS networks, with an emphasis on improving its computational efficiency. The new approach follows the well-known procedure with three sequential steps: detection, identification, and adaptation (Teunissen, 1990, 1998, 2018; Yang et al., 2010), and is presented in detail hereafter.
Detection
The first step of the SRIF QC is to detect whether any undetected cycle slips or outliers remain in the observation vector. Starting with the measurement update model of Eq. (23), which includes the information after the time update and the observations at the current epoch, Eq. (24) is obtained by applying the orthogonal transformation \(T\). In Eq. (24), \(e_{i}\) is the corresponding vector of the posterior residuals at the current epoch, which theoretically satisfies \(e_{i} \sim N\left( {0,I} \right)\), and the variance of unit weight can be calculated as:
$$ \hat{\sigma }^{2} = \frac{{e_{i}^{T} e_{i} }}{m} $$
(31)
where \(e_{i}^{T} e_{i}\) follows the Chi-square distribution with \(m\) degrees of freedom.
In principle, we can test either the individual residuals or the variance of unit weight of Eq. (31) to detect whether there is any outlier. In this study, we check both by constructing the following simple hypothesis test:
$$ H_{0} :\left| {e_{i} } \right|_{\max } < k_{1} \;{\text{and}}\;\hat{\sigma } < k_{2} $$
(32)
where \(k_{1}\) and \(k_{2}\) are the thresholds for the residuals and the unit-weight STD, respectively; theoretically, they can be set according to the corresponding distributions. In this study, for the multi-GNSS clock estimation, we use the empirical values of 5.0 and 1.5, respectively, which are based on our long-term operational processing experience at the GFZ IGS real-time analysis center. It is worth mentioning that the unit-weight STD depends on the a priori STD of the observations, which should be fine-tuned for each station so that the unit-weight STD is on average close to 1. This affects the selection of the aforesaid thresholds. For example, a too optimistic a priori STD for the observations results in larger residuals and consequently a larger unit-weight STD, so that more false outliers are detected. In contrast, with a too large a priori STD for the observations, we may fail to detect real outliers.
If \(H_{0}\) is accepted, there is no problematic observation at this epoch; otherwise, usually at least one outlier exists in the observation vector. It should be noted that the rejection of \(H_{0}\) can also be caused by the inaccurate modeling of the state parameters. However, in this contribution, we focus only on the quality control for observation blunders.
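The detection test of Eqs. (31) and (32) can be sketched in a few lines of Python. This is a minimal illustration, not the operational implementation; the function name `detect` and its return layout are assumptions made here, while the default thresholds are the empirical values quoted above.

```python
import numpy as np

def detect(e, k1=5.0, k2=1.5):
    """Evaluate H0 of Eq. (32) for the residual vector e of one epoch.

    Returns (h0_accepted, max_abs_residual, sigma_hat).
    """
    e = np.asarray(e, dtype=float)
    m = e.size                          # degrees of freedom used in Eq. (31)
    sigma_hat = np.sqrt(e @ e / m)      # unit-weight STD
    max_abs = np.max(np.abs(e))         # largest absolute residual
    return bool(max_abs < k1 and sigma_hat < k2), max_abs, sigma_hat
```

Since the residuals are scaled by the a priori STD of the observations, halving that STD roughly doubles both test statistics, which is how an overly optimistic a priori STD leads to more rejections of \(H_{0}\).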
Identification
After \(H_{0}\) is rejected, the second step of the SRIF quality control is to find out which observations are contaminated by undetected cycle slips or blunders. The basic idea is to extend the observation model to include the possible outliers and then check whether \(H_{0}\) can be accepted under the extended model. We first present the approach and then discuss how to select the outlier candidates from thousands of observations.
Assuming that there are \(n_{b}\) possible outliers with the observation indices ip(k), \(k = 1,2, \ldots ,n_{b}\), the functional model of Eq. (23) can be extended by introducing the corresponding outlier parameters \(\Delta\) with a priori values of zero and variance matrix \(D_{\Delta } = R_{\Delta }^{ - 1} R_{\Delta }^{ - T}\) as:
$$ \left[ \begin{gathered} \tilde{R}_{i} \;\;\;\;0 \hfill \\ 0\;\;\;\;\;R_{\Delta } \hfill \\ A_{i} \;\;\;\;\theta_{\Delta } \hfill \\ \end{gathered} \right]\left[ \begin{gathered} x \hfill \\ \Delta \hfill \\ \end{gathered} \right] = \left[ \begin{gathered} \tilde{z}_{i} \hfill \\ 0 \hfill \\ z_{i} \hfill \\ \end{gathered} \right] $$
(33)
where \(\theta_{\Delta }\) is an \(m \times n_{b}\) matrix whose columns are unit vectors: in its k-th column, only the ip(k)-th element equals 1. By applying the same orthogonal transformation as that used for Eq. (23) in the measurement update at epoch i, we obtain
$$ \left[ \begin{gathered} \hat{R}_{i} \;\;\;\hat{R}_{\Delta ,i} \hfill \\ 0\;\;\;\;\;R_{\Delta } \hfill \\ 0\;\;\;\;\;S_{\Delta ,i} \hfill \\ \end{gathered} \right]\left[ \begin{gathered} x \hfill \\ \Delta \hfill \\ \end{gathered} \right] = \left[ \begin{gathered} \hat{z}_{i} \hfill \\ 0 \hfill \\ e_{i} \hfill \\ \end{gathered} \right] $$
(34)
with
$$ \begin{gathered} \hat{R}_{\Delta ,i} { = }\left[ {\hat{R}_{{\Delta_{1} }} ,\hat{R}_{{\Delta_{2} }} , \cdot \cdot \cdot \hat{R}_{{\Delta_{{n_{b} }} }} } \right] \hfill \\ S_{\Delta ,i} { = }\left[ {S_{{\Delta_{1} }} ,S_{{\Delta_{2} }} , \cdot \cdot \cdot S_{{\Delta_{{n_{b} }} }} } \right] \hfill \\ \end{gathered} $$
(35)
where \(S_{\Delta ,i}\) can be considered as the sensitivity matrix of the residual vector \(e_{i}\) with respect to the outliers \(\Delta\). This means that the magnitude of the undetected cycle slips or blunders is mapped into the a posteriori residual vector through \(S_{\Delta ,i}\) as:
$$ e_{i} = S_{\Delta ,i} \Delta $$
(36)
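The mapping of Eq. (36) can be checked numerically. The sketch below is an illustration under assumptions: random numbers stand in for \(\tilde{R}_{i}\) and \(A_{i}\), and the transformation \(T\) is taken as the transpose of the orthogonal factor returned by NumPy's QR decomposition instead of the Householder sequence of the filter. It verifies that adding blunders \(\Delta\) to the observations changes the transformed residual vector by exactly \(S_{\Delta ,i} \Delta\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 8                                   # state size, observations per epoch

R_tilde = np.triu(rng.normal(size=(n, n))) + 3.0 * np.eye(n)   # a priori SRI
A = rng.normal(size=(m, n))                   # whitened design matrix
x_true = rng.normal(size=n)
z_tilde = R_tilde @ x_true
z_clean = A @ x_true + 0.1 * rng.normal(size=m)

ip = [2, 5]                                   # indices of the blundered observations
delta = np.array([4.0, -6.0])                 # blunder sizes
theta = np.zeros((m, len(ip)))
theta[ip, np.arange(len(ip))] = 1.0           # k-th column: unit vector at row ip(k)

# Orthogonal transformation of the measurement update, T = Q^T.
Q, _ = np.linalg.qr(np.vstack([R_tilde, A]), mode="complete")
T = Q.T

def residual_part(z):
    """Lower m entries of T [z_tilde; z], i.e. the vector e_i of Eq. (24)."""
    return (T @ np.concatenate([z_tilde, z]))[n:]

# Lower block of the transformed outlier columns: S_{Delta,i} of Eq. (34).
S = (T @ np.vstack([np.zeros((n, len(ip))), theta]))[n:]

e_clean = residual_part(z_clean)
e_blunder = residual_part(z_clean + theta @ delta)
print(np.allclose(e_blunder - e_clean, S @ delta))
```

Because the transformation is linear, the check holds for any blunder sizes; with noise present, \(e_{i}\) contains the transformed noise in addition to \(S_{\Delta ,i} \Delta\).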
Since the residual vector \(e_{i}\) is a combination of outliers and observation noise, the outlier parameters can be solved from Eq. (36) using least-squares adjustment, and the solution reads
$$ \left\{ \begin{gathered} \hat{\Delta }{ = }\left( {S_{\Delta ,i}^{T} S_{\Delta ,i} } \right)^{ - 1} S_{\Delta ,i}^{T} e_{i} \hfill \\ \hat{v}_{\Delta } = e_{i} - S_{\Delta ,i} \hat{\Delta } \hfill \\ \hat{\sigma }_{0}^{2} = \frac{{\hat{v}_{\Delta }^{T} \hat{v}_{\Delta } }}{{m - n_{b} }} \hfill \\ \end{gathered} \right. $$
(37)
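As a worked special case, added here for illustration, take a single candidate, \(n_{b} = 1\), so that \(S_{\Delta ,i}\) reduces to one column \(s\) (a symbol introduced only for this example). The first two lines of Eq. (37) then simplify to

$$ \hat{\Delta } = \frac{{s^{T} e_{i} }}{{s^{T} s}},\quad \hat{v}_{\Delta }^{T} \hat{v}_{\Delta } = e_{i}^{T} e_{i} - \frac{{\left( {s^{T} e_{i} } \right)^{2} }}{{s^{T} s}} $$

that is, the estimated blunder follows from projecting the residual vector onto the sensitivity vector, and the sum of squared residuals decreases by the squared length of that projection.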
Then the same hypothesis test \(H_{0}\) can be conducted with the residuals and STD of the above solution. If the test is passed, the current set of outliers is accepted as the finally identified one; otherwise, an additional problematic observation is selected and added, and the procedure from Eq. (33) to Eq. (37) is repeated until \(H_{0}\) is accepted. For the positioning of a single station, only tens of observations are involved per epoch, and hence it is possible to test all possible outlier combinations to identify the right ones. However, this is not feasible for real-time clock estimation using a global network, as the number of observations per epoch reaches several thousand. Therefore, the selection of the outlier candidates is critical for balancing product reliability and computational efficiency.
Considering the situation of a global network with about 100 stations, there are enough redundant observations to detect the outliers, which should show the largest residuals in the least-squares adjustment. Therefore, we developed an empirical approach to select the potential outlier candidates according to the magnitude of the residuals. The residuals are sorted according to their absolute values, and the largest one is selected as the most likely outlier in the associated adjustment. Then the above-mentioned identification procedure, that is, Eq. (33) to Eq. (37), is carried out. The whole identification stops if the hypothesis \(H_{0}\) is accepted. Otherwise, the updated residuals of Eq. (37) are used to find the next most likely outlier according to their absolute values, which is added to the already selected ones for the next iteration. The procedure is repeated until \(H_{0}\) is accepted or a certain number of observations has been marked as outliers, for example 100 in our operational data processing.
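The candidate selection loop described above can be sketched as follows. The sketch rests on two assumptions that are not spelled out in the text: entry \(j\) of the residual vector is associated with observation \(j\), and the sensitivity vector of any observation can be requested through a callback (`sensitivity_column`, a hypothetical interface). The function name and return layout are likewise illustrative.

```python
import numpy as np

def identify_outliers(e, sensitivity_column, k1=5.0, k2=1.5, max_outliers=100):
    """Greedy identification following Eqs. (33)-(37).

    e                  : residual vector of the current epoch (length m)
    sensitivity_column : callable j -> sensitivity vector (length m) of a
                         blunder in observation j (hypothetical interface)
    Returns (indices, delta_hat, v, h0_accepted).
    """
    e = np.asarray(e, dtype=float)
    m = e.size
    selected, columns = [], []
    v, delta_hat = e.copy(), np.zeros(0)

    def h0(res, dof):
        # Same test as Eq. (32), applied to the current residuals.
        sigma = np.sqrt(res @ res / dof)
        return np.max(np.abs(res)) < k1 and sigma < k2

    if h0(v, m):
        return selected, delta_hat, v, True

    while len(selected) < min(max_outliers, m - 1):
        # Candidate: largest absolute residual among unflagged observations.
        magnitude = np.abs(v)
        if selected:
            magnitude[selected] = -1.0
        j = int(np.argmax(magnitude))
        selected.append(j)
        columns.append(np.asarray(sensitivity_column(j), dtype=float))

        # Eq. (37): least-squares estimate of all selected blunders at once.
        S = np.column_stack(columns)
        delta_hat, *_ = np.linalg.lstsq(S, e, rcond=None)
        v = e - S @ delta_hat
        if h0(v, m - len(selected)):
            return selected, delta_hat, v, True

    # Candidate limit reached without passing the test.
    return selected, delta_hat, v, False
```

Each iteration re-estimates all selected blunders jointly from the original residual vector, so an early candidate whose estimate was biased by a later one is corrected automatically. The cap `max_outliers` mirrors the limit of 100 mentioned above.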
It must be pointed out that a sophisticated pre-processing should be implemented to find as many problematic observations as possible, particularly the large outliers, so that only a few small outliers remain undetected. This reduces the computational cost of the quality control and avoids numerical instability caused by large outliers. We have designed a channel filter for each satellite-station pair to identify outliers by checking the length of data gaps, the variation of raw phase and range observations, and especially the Melbourne-Wübbena and ionosphere-delay combinations for jumps. In our operational results, very few outliers remain after the channel filtering, as will be shown in the experimental validation.
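As an illustration of one of the checks listed above, the sketch below computes the Melbourne-Wübbena combination and flags epoch-to-epoch jumps. It is not the channel filter itself: the GPS L1/L2 frequencies, the function names, and the 0.5-cycle threshold are assumptions chosen for the example, and an operational filter would typically smooth the noisy combination before testing it.

```python
import numpy as np

C = 299_792_458.0                 # speed of light (m/s)
F1, F2 = 1575.42e6, 1227.60e6     # GPS L1/L2 carrier frequencies (Hz)
LAMBDA_WL = C / (F1 - F2)         # wide-lane wavelength, about 0.86 m

def melbourne_wubbena(L1, L2, P1, P2):
    """MW combination in wide-lane cycles.

    L1, L2 are carrier phases and P1, P2 pseudoranges, all in metres.
    """
    L1, L2, P1, P2 = (np.asarray(a, dtype=float) for a in (L1, L2, P1, P2))
    wide_lane_phase = (F1 * L1 - F2 * L2) / (F1 - F2)
    narrow_lane_range = (F1 * P1 + F2 * P2) / (F1 + F2)
    return (wide_lane_phase - narrow_lane_range) / LAMBDA_WL

def flag_jumps(mw_cycles, threshold=0.5):
    """Indices of epochs whose MW value jumps by more than `threshold`
    cycles relative to the previous epoch (threshold is hypothetical)."""
    jumps = np.abs(np.diff(mw_cycles)) > threshold
    return (np.flatnonzero(jumps) + 1).tolist()
```

The combination removes the geometry and the first-order ionospheric delay, so for a continuous arc it stays close to the constant wide-lane ambiguity, and a cycle slip on either frequency shows up as a step.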
Another critical issue is the computation of the sensitivity vector \(S\), because there are thousands of observations and each could be an outlier. Theoretically, we could compute and save the sensitivity vectors for all observations; however, this would be not only too time-consuming but would also require a huge amount of memory. According to Eq. (18), where the matrix multiplication is replaced by vector addition, the sensitivity vector \(S\) for a single observation can be calculated by applying the orthogonal transformation to the corresponding unit vector. Therefore, only the sensitivity vectors for the observations selected as outlier candidates have to be calculated, and their number is usually very small. For example, in the operational processing, the epochs with one outlier account for only about 15.58%, and those with two account for about 5.28%.
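The on-demand computation can be sketched with stored Householder reflectors: the measurement update keeps the reflector vectors, and a sensitivity vector is obtained later by applying them to a single unit vector, without ever forming the full transformation matrix. The code is a minimal sketch under assumptions; the function names are hypothetical, and random numbers stand in for \(\tilde{R}_{i}\) and \(A_{i}\).

```python
import numpy as np

def householder_triangularize(M):
    """Triangularize M and return (R, reflectors); reflector k acts on rows k onward."""
    R = np.array(M, dtype=float)
    reflectors = []
    for k in range(R.shape[1]):
        x = R[k:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        x[0] -= alpha                       # Householder vector x - alpha * e_1
        norm = np.linalg.norm(x)
        u = x / norm if norm > 0.0 else np.zeros_like(x)
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])
        reflectors.append(u)
    return R, reflectors

def apply_reflectors(reflectors, vec):
    """Apply the stored transformation to one vector using vector operations only."""
    out = np.array(vec, dtype=float)
    for k, u in enumerate(reflectors):
        out[k:] -= 2.0 * u * (u @ out[k:])
    return out

def sensitivity_vector(reflectors, n, m, j):
    """Transformed unit vector of observation j, split into the upper part
    (one column of R_hat_Delta) and the lower part (one column of S_Delta)."""
    unit = np.zeros(n + m)
    unit[n + j] = 1.0
    t = apply_reflectors(reflectors, unit)
    return t[:n], t[n:]

# Usage sketch with random numbers standing in for R_tilde and A.
rng = np.random.default_rng(1)
n, m = 3, 7
stacked = np.vstack([np.triu(rng.normal(size=(n, n))) + 3.0 * np.eye(n),
                     rng.normal(size=(m, n))])
R, reflectors = householder_triangularize(stacked)
R_hat_col, S_col = sensitivity_vector(reflectors, n, m, j=4)
```

In this sketch, storage grows with the number of reflectors (one per state parameter) and each sensitivity vector costs one pass over them, so the work scales with the number of candidates actually tested.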
Adaptation
Once the outliers are identified, their negative impact must be removed from the filter. As it is very difficult to down-weight or negatively weight the corresponding observations, we adapt the filter by extending the functional model as is done in the identification step. This is also because the most important effect comes from the cycle slips, which can only be mitigated by adding an outlier parameter. The extended functional model for the finally identified outliers is already available with the help of the sensitivity vectors.
Assuming that there are \(n_{b}\) outliers found at the last step of the identification, the adapted model with the \(n_{b}\) outliers is already available, i.e., Eq. (34), where \(\hat{R}_{i}\) is obtained in the measurement update, and \(\hat{R}_{\Delta ,i}\) and \(S_{\Delta ,i}\) are already calculated as the sensitivity vectors in the identification step.
We can simply apply the same type of triangularization to the last \(n_{b}\) columns to obtain the SRI for further data processing:
$$ \left[ \begin{gathered} \hat{R}_{i} \;\;\;\hat{R}_{\Delta ,i} \hfill \\ 0\;\;\;\;\;\hat{R}_{\Delta } \hfill \\ 0\;\;\;\;\;0 \hfill \\ \end{gathered} \right]\left[ \begin{gathered} x \hfill \\ \Delta \hfill \\ \end{gathered} \right] = \left[ \begin{gathered} \hat{z}_{i} \hfill \\ z_{\Delta } \hfill \\ \hat{e}_{i} \hfill \\ \end{gathered} \right] $$
(38)
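The step from Eq. (34) to Eq. (38) only touches the two lower blocks of the outlier columns. A minimal sketch, assuming NumPy's QR decomposition as a stand-in for the filter's own triangularization and using the hypothetical function name `adapt`:

```python
import numpy as np

def adapt(R_hat, R_hat_delta, R_delta, S_delta, z_hat, e):
    """Triangularize the last n_b columns of Eq. (34) to obtain Eq. (38).

    Returns (R_ext, z_ext, e_hat): the extended upper-triangular SRI matrix,
    its right-hand side, and the remaining residual vector.
    """
    n, n_b = R_hat.shape[0], R_delta.shape[0]

    # Only the blocks [R_Delta; S_Delta] and [0; e_i] are transformed.
    block = np.vstack([R_delta, S_delta])
    rhs = np.concatenate([np.zeros(n_b), e])
    Q, R_full = np.linalg.qr(block, mode="complete")
    rhs_t = Q.T @ rhs

    R_delta_hat = R_full[:n_b]              # upper-triangular n_b x n_b block
    z_delta, e_hat = rhs_t[:n_b], rhs_t[n_b:]

    R_ext = np.vstack([np.hstack([R_hat, R_hat_delta]),
                       np.hstack([np.zeros((n_b, n)), R_delta_hat])])
    z_ext = np.concatenate([z_hat, z_delta])
    return R_ext, z_ext, e_hat
```

Solving `R_ext @ y = z_ext` by back substitution then yields the adapted state estimate together with the estimated outlier parameters; the first \(n\) rows are reused unchanged from the measurement update.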
Furthermore, the outlier parameters should be eliminated as soon as possible. For a range outlier, the introduced parameter can be removed directly, whereas the parameter for a phase observation could represent either a cycle slip or an outlier. A cycle-slip parameter can be replaced by the difference between the ambiguities before and after the cycle slip; the previous ambiguity is then eliminated and only the new ambiguity is kept.
Let the ambiguities before and after the cycle slip be \(n_{1}\) and \(n_{2}\), with \(\Delta n_{12} = n_{2} - n_{1}\), so that \(n_{1} = n_{2} - \Delta n_{12}\). If \(n_{1}\) in Eq. (34) is replaced by \(n_{2} - \Delta n_{12}\), only the last \(n_{b}\) columns have to be triangularized; otherwise, the triangularization must start from the column of \(n_{1}\). After the triangularization of the last \(n_{b}\) columns, the outlier parameters can be eliminated, and the information equation is ready for deriving an adapted solution and for the processing of the next epoch.
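One way to see why the substitution confines the work to the last columns is to write out the contribution of the old ambiguity and of the cycle-slip parameter to Eq. (34). This is a sketch; the column symbols \(r_{n_{1} }\) and \(r_{\Delta }\), denoting the columns of the coefficient matrix that multiply \(n_{1}\) and the slip parameter, are notation introduced here:

$$ r_{{n_{1} }} n_{1} + r_{\Delta } \Delta n_{12} = r_{{n_{1} }} \left( {n_{2} - \Delta n_{12} } \right) + r_{\Delta } \Delta n_{12} = r_{{n_{1} }} n_{2} + \left( {r_{\Delta } - r_{{n_{1} }} } \right)\Delta n_{12} $$

The column that used to multiply \(n_{1}\) now multiplies \(n_{2}\) without any change, so the triangular block \(\hat{R}_{i}\) is preserved, and only the outlier column is modified before the last \(n_{b}\) columns are triangularized.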