  • Original Article
  • Open access

A novel global grid model for soil moisture retrieval considering geographical disparity in spaceborne GNSS-R

Abstract

Spaceborne Global Navigation Satellite System-Reflectometry (GNSS-R) has become an effective technique for Soil Moisture (SM) retrieval. However, the accuracy of global SM retrieval using a single model is limited by the complexity of the land surface. Introducing redundant ancillary data may also result in over-reliance problems. Therefore, we propose a method for SM retrieval that considers geographical disparities, using observations from Cyclone GNSS (CYGNSS) and the Soil Moisture Active and Passive (SMAP) product. Based on the CYGNSS effective reflectivity and the ancillary datasets of SMAP, we establish five models with different parameters for each grid to achieve global SM retrieval. Subsequently, the optimal model for each grid, determined by a performance indicator, is used for SM retrieval. With SMAP SM as the reference, the improved method yields a root mean square error \(S_{\mathrm{RMSE}}\) of 0.040 cm3/cm3, which is 9.1% lower than that of the single Reflectivity-Temperature-Vegetation (R-T-V) method. Additionally, with the in-situ SM of the International Soil Moisture Network (ISMN) as the reference, the overall correlation coefficient \(R\) and \(S_{\mathrm{RMSE}}\) of the improved method are 0.80 and 0.064 cm3/cm3, respectively. The average \(R\) of the chosen sites is increased by 22.7%, and the average \(S_{\mathrm{RMSE}}\) is decreased by 8.7%. The results indicate that the improved method can better retrieve SM at both global and local scales without redundant auxiliary data.

Introduction

Soil Moisture (SM) is a crucial parameter in the water cycle because it links the atmosphere with the land. Accurate SM estimation is essential for advancing research on water cycle dynamics and crop growth (Holzman & Rivas, 2016; Schlüter et al., 2022). Traditional methods for estimating large-scale or global SM rely primarily on microwave remote sensing. However, achieving SM retrievals with high spatial and temporal resolution remains a significant challenge for active and passive microwave remote sensing techniques (Kuenzer et al., 2013). Recently, Global Navigation Satellite System-Reflectometry (GNSS-R), in which GNSS signals reflected from the Earth’s surface are utilized in a forward bistatic radar configuration, has emerged as an effective remote sensing technique for estimating Earth surface geophysical parameters (Rodriguez-Alvarez et al., 2011, 2019; Wigneron et al., 2008). GNSS-R operates with L-band signals, which can effectively penetrate the atmosphere, vegetation, and rain (Alonso-Arroyo et al., 2016; Balasubramaniam & Ruf, 2020; Camps et al., 2020). Moreover, with hundreds of GNSS satellites in orbit, GNSS-R benefits from many signal sources (Bu et al., 2020; Kim & Park, 2021), which enables accurate SM estimation with high spatial and temporal resolution.

Remote sensing observations of the physical parameters of the Earth’s surface using GNSS-reflected signals can be traced back to the late 1980s (Jin et al., 2024; Pan et al., 2020). Martin-Neira (1993) was the first to use GNSS-reflected signals to retrieve sea-surface heights. Subsequently, GNSS-R has demonstrated the capacity for surface geophysical parameter retrieval on ground-based and airborne platforms (Yu et al., 2014), including soil moisture (Larson et al., 2008; Wu et al., 2021; Yan et al., 2022), sea level (Liu et al., 2022; Rajabi et al., 2021; Wang et al., 2019), water-level monitoring (Ichikawa et al., 2019; Wang et al., 2021), sea surface wind speed (Dong & Jin, 2019; Li et al., 2021), and snow depth (Jin et al., 2016; Yu et al., 2015; Zhou et al., 2019). Jin et al. (2024) summarized the progress of GNSS-R technology in various applications and discussed the current challenges and development prospects in multiple fields. The GNSS-R platform has expanded from ground-based and airborne to spaceborne (Zavorotny et al., 2014). Owing to its ability to rapidly obtain surface physical information over a large area, spaceborne GNSS-R has broader application prospects, such as forest biomass retrieval (Carreno-Luengo et al., 2020; Chen et al., 2021) and flood monitoring (Chew & Small, 2020; Zhang et al., 2021).

Spaceborne GNSS-R has a great capacity for SM retrieval (Al-Khaldi and Johnson, 2021a; Nan et al., 2022). Chew et al. (2016) found a high correlation between SM and the observable derived from TechDemoSat-1. Camps et al. (2018a) also revealed a high correlation between SM and the peak power of the Delay Doppler Map (DDM). Building on these works, Chew and Small (2018) established a linear relationship between the Cyclone GNSS (CYGNSS) effective reflectivity and the SM derived from Soil Moisture Active and Passive (SMAP). However, owing to the complex surface environment, the CYGNSS effective reflectivity is influenced not only by SM but also by vegetation, surface roughness, soil surface temperature, and other factors (Dong et al., 2023; Izadgoshasb et al., 2024; Pierdicca et al., 2014). Clarizia et al. (2019) proposed a Reflectivity-Vegetation-Roughness (R-V-R) ternary linear regression algorithm to account for the influences of vegetation and roughness on the CYGNSS effective reflectivity. Eroglu et al. (2019) established an SM retrieval method that uses an artificial neural network to learn this complex relationship. Additionally, the soil surface temperature also influences the effective reflectivity (Wigneron et al., 2008). Thus, Zhu et al. (2022) proposed a Reflectivity-Temperature-Vegetation (R-T-V) method to estimate SM by analyzing the impact of surface temperature on the CYGNSS effective reflectivity. Yan et al. (2020) proposed CYGNSS observables that can resolve the contributions of SM and surface roughness. Camps et al. (2018b) found that the relationship between the spaceborne GNSS-R effective reflectivity and other factors was affected by geographical disparities. Moreover, Jia et al. (2024) proposed an advanced SM retrieval method based on Geographically Weighted Regression (GWR), which encompasses various spatial weights. It can preserve local spatial relationships and patterns while providing fine-resolution SM estimates. Therefore, it is necessary to consider geographical disparities for global SM retrieval. Additionally, introducing redundant parameters may result in over-reliance problems with heavy-loaded ancillary data (Yan et al., 2020; Yang et al., 2024).

Following these studies and the limitations they identified, we developed an SM retrieval method for spaceborne GNSS-R that considers geographical disparities. This work aims to mitigate the impact of geographical disparities on SM estimates and to avoid using redundant ancillary data. In this work, the CYGNSS effective reflectivity accounts for the effects of SM, surface roughness, vegetation, and soil surface temperature. Additionally, the relationship between the auxiliary parameters included in the model and geographical disparities is investigated. This work can help separate the effects of SM, vegetation, and other factors on the CYGNSS effective reflectivity. Compared with previous studies, this paper proposes a simple and effective method for SM retrieval. The remainder of this paper is organized as follows. The collocation of CYGNSS, SMAP, and International Soil Moisture Network (ISMN) data is described in Section “Datasets”. The development of the improved method is presented in Section “Methodology”. The results are presented in Section “Results and discussion”. The concluding remarks are given in Section “Conclusion”.

Datasets

The data utilized in this study are derived from CYGNSS, SMAP, and ISMN. The CYGNSS and SMAP observations from January 2019 to December 2019 are collocated on the Equal-Area Scalable Earth (EASE) 2.0 grid with a cell size of 36 km × 36 km. Note that the SMAP product does not provide the original data from Day of the Year (DOY) 171 to DOY 203. The SM data of SMAP and ISMN are used for validation.

CYGNSS

CYGNSS, a constellation of eight miniature satellites launched by the National Aeronautics and Space Administration (NASA) in 2016, is designed to monitor the global surface within a latitude range of ± 38° (Wu et al., 2020; Zavorotny & Voronovich, 2000). Each CYGNSS satellite simultaneously records four GNSS signals, with mean and median revisit times of 7 and 3 h, respectively (Jia et al., 2021). In this study, the observations derived from CYGNSS version 3.0 (L1) data are the latitude and longitude of the specular reflection point, the distances from the specular point to the CYGNSS spacecraft and to the Global Positioning System (GPS) satellite, the Equivalent Isotropically Radiated Power (EIRP) of GPS, the antenna gain, the DDM, and the incident angle.

SMAP

SMAP, an Earth observation satellite launched by NASA in 2015, was initially designed to provide global SM levels and freeze–thaw classification using radar technology (Chew & Small, 2018). Although the active radar malfunctioned in July 2015, the microwave radiometer is still operational and provides important data for SM research and applications. By evaluating SMAP products, Colliander et al. (2019a, 2019b) demonstrated that the L-band microwave radiometer data achieved the accuracy expected in the satellite design. In this study, the SM dataset derived from the SMAP L3 Radiometer Global Daily 36-km EASE-Grid (version 8) product is used as the reference dataset. This SM dataset, \(P_{\mathrm{SM}}\), consists of descending-orbit data. The other auxiliary datasets utilized in this study, shown in Fig. 1, are Surface Roughness (SR) \(P_{\mathrm{SR}}\), Vegetation Optical Depth (VOD) \(P_{\mathrm{VOD}}\), Soil surface Temperature (ST) \(P_{\mathrm{ST}}\), and Vegetation Water Content (VWC) \(P_{\mathrm{VWC}}\).

Fig. 1 The spatial maps of SMAP yearly mean \(P_{\mathrm{SR}}\), \(P_{\mathrm{VOD}}\), \(P_{\mathrm{ST}}\), and \(P_{\mathrm{VWC}}\), respectively

ISMN

The ISMN was established in 2009 to maintain a global in-situ SM database. It is a centralized data-hosting facility that supports the calibration and validation of global satellite products (Dorigo et al., 2011). With numerous operational and experimental SM networks worldwide, the global in-situ SM database serves as a valuable resource for validating SM retrieval. In this study, the in-situ SM data of ISMN are aggregated daily for field validation. The ISMN data at a depth of 5 cm are used because of the limited penetration depth of L-band signals.
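The daily aggregation of in-situ records can be sketched as follows. This is only an illustration: the text does not state which statistic is used, so a daily arithmetic mean is assumed here, and the function name `daily_mean` and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean(records):
    """Aggregate (timestamp, soil_moisture) pairs to one value per day.

    Assumption: the daily value is the arithmetic mean of all valid
    readings on that day; missing readings are passed as None.
    """
    buckets = defaultdict(list)
    for stamp, value in records:
        if value is None:          # skip missing sensor readings
            continue
        buckets[stamp.date()].append(value)
    # Return the days in chronological order.
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical 5 cm readings in cm3/cm3.
readings = [
    (datetime(2019, 1, 1, 0), 0.20),
    (datetime(2019, 1, 1, 12), 0.30),
    (datetime(2019, 1, 2, 6), 0.10),
    (datetime(2019, 1, 2, 7), None),
]
daily = daily_mean(readings)
```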

Methodology

In this section, the improved SM retrieval method for CYGNSS is investigated. The method consists of five linear models, similar to the R-V-R method proposed by Clarizia et al. (2019).

CYGNSS effective reflectivity

Previous studies found a strong relationship between CYGNSS effective reflectivity and SM (Senyurek et al., 2020a, 2020b). Loria et al. (2023) demonstrated that land-surface DDMs exhibit scattering behaviors ranging from pure coherent reflection to pure incoherent scattering, as well as a combination of both. In regions with dense vegetation or large topographic variation and roughness, the coherent component of the reflected signal is weaker than the incoherent component (Jin et al., 2024; Ruf et al., 2018; Zavorotny et al., 2014). Al-Khaldi et al. (2019) found that CYGNSS land observations were dominated by the coherent component, with the incoherent component having minimal impact on soil moisture retrieval. Following previous literature, this study also assumes that coherent reflection dominates across the land surface. Thus, the CYGNSS effective reflectivity (\(\varGamma_{\mathrm{CYGNSS}}^{{{\mathrm{coh}}}}\)) can be calculated using the following formula (Al-Khaldi et al., 2021b):

$$\varGamma_{\mathrm{CYGNSS}}^{\mathrm{coh}} = \frac{P_{r} \cdot (4\pi)^{2} \cdot \left(R_{ts} + R_{rs}\right)^{2}}{P_{t} \cdot G_{t} \cdot G_{r} \cdot \lambda^{2}}$$
(1)

where \(\varGamma_{\mathrm{CYGNSS}}^{{{\mathrm{coh}}}}\) is the CYGNSS effective reflectivity; \(P_{t}\) is the transmitting power of the GPS satellite; \(P_{r}\) is the peak value of the simulated scattering power DDM; \(G_{t}\) and \(G_{r}\) are the gains of the transmitting and receiving antennas, respectively; \(P_{t} G_{t}\) can be expressed by the Equivalent Isotropically Radiated Power (EIRP) of the GPS transmitter at the specular reflection point; \(R_{ts}\) and \(R_{rs}\) are the distances from the GPS signal transmitter and from the CYGNSS receiver to the specular reflection point, respectively; and \(\lambda\) is the wavelength of the GPS L1 signal.
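Equation (1) can be evaluated as in the minimal sketch below. It assumes that all inputs are in linear units (watts, metres, linear gain) rather than dB, and that \(P_{t} G_{t}\) is supplied directly as the EIRP. The function name and the numerical inputs are hypothetical and serve only to exercise the formula.

```python
import math

GPS_L1_FREQ_HZ = 1575.42e6        # GPS L1 carrier frequency
SPEED_OF_LIGHT = 299_792_458.0    # m/s
WAVELENGTH_M = SPEED_OF_LIGHT / GPS_L1_FREQ_HZ   # roughly 0.19 m


def effective_reflectivity(p_r, eirp, g_r, r_ts, r_rs, wavelength=WAVELENGTH_M):
    """Coherent effective reflectivity of Eq. (1), in linear units.

    p_r        : peak DDM power (W)
    eirp       : P_t * G_t of the GPS transmitter (W)
    g_r        : receiver antenna gain (linear, not dB)
    r_ts, r_rs : transmitter and receiver distances to the specular point (m)
    """
    numerator = p_r * (4.0 * math.pi) ** 2 * (r_ts + r_rs) ** 2
    denominator = eirp * g_r * wavelength ** 2
    return numerator / denominator


# Hypothetical inputs, not taken from any CYGNSS file.
gamma = effective_reflectivity(p_r=1.0e-18, eirp=500.0, g_r=10.0,
                               r_ts=2.0e7, r_rs=5.2e5)
gamma_db = 10.0 * math.log10(gamma)   # reflectivity is often handled in dB
```

Because gains and EIRP are frequently stored in dB or dBW, a conversion to linear units would be needed before calling such a function.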

Data quality control

Data quality control is performed as follows. CYGNSS data with incident angles exceeding 65° are excluded to reduce DDM noise. Observations with a Signal-to-Noise Ratio (SNR) of less than 2 dB, as well as those with an SNR equal to or greater than the receiver antenna gain plus 14 dB, are also excluded. Sampling points with poor accuracy are eliminated according to the quality flag variable extracted with the data. SMAP data with SM values lower than 0.01 cm3/cm3 and those with VWC values higher than 18 kg/m2 are removed to reduce the errors caused by low SM values and by dense vegetation (Camps et al., 2020).
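The thresholds above can be expressed as boolean masks, as in the following sketch. The function names are hypothetical, and `quality_ok` stands for a boolean array that the user has already derived from the product quality flags; the specific flag bits are not given in the text and are therefore not modelled here.

```python
import numpy as np


def cygnss_quality_mask(incidence_deg, snr_db, rx_gain_db, quality_ok):
    """Return True for CYGNSS observations that pass the stated thresholds."""
    incidence_deg = np.asarray(incidence_deg, dtype=float)
    snr_db = np.asarray(snr_db, dtype=float)
    rx_gain_db = np.asarray(rx_gain_db, dtype=float)
    quality_ok = np.asarray(quality_ok, dtype=bool)

    keep = incidence_deg <= 65.0             # drop incident angles above 65 deg
    keep &= snr_db >= 2.0                    # drop SNR below 2 dB
    keep &= snr_db < rx_gain_db + 14.0       # drop SNR >= receiver gain + 14 dB
    keep &= quality_ok                       # drop samples flagged as poor
    return keep


def smap_quality_mask(sm, vwc):
    """Return True for SMAP cells with SM >= 0.01 cm3/cm3 and VWC <= 18 kg/m2."""
    sm = np.asarray(sm, dtype=float)
    vwc = np.asarray(vwc, dtype=float)
    return (sm >= 0.01) & (vwc <= 18.0)


# Hypothetical observations: only the first one passes every test.
mask = cygnss_quality_mask(
    incidence_deg=[30, 70, 40, 40, 40],
    snr_db=[5, 5, 1, 20, 5],
    rx_gain_db=[3, 3, 3, 3, 3],
    quality_ok=[True, True, True, True, False],
)
```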

Development of the improved method

Owing to the complex surface environment, the influences of vegetation, water, SM, and other factors on spaceborne GNSS-R SM retrieval are difficult to define precisely (Camps et al., 2018b). Thus, analyzing the geographical disparities among regions is necessary for improving SM retrieval. Because the SMAP product incorporates the global land surface types of the International Geosphere-Biosphere Programme (IGBP), the grids can be labeled with land cover categories to account for geographical disparities. Figure 2 shows the 16 land cover categories globally. The distribution of land cover categories makes the global geographical disparity evident.

Fig. 2 Global land cover categories of IGBP from SMAP product (40°S-40°N)

Here, the auxiliary parameters (\(P_{\mathrm{SR}}\), \(P_{\mathrm{VOD}}\), \(P_{\mathrm{ST}}\), and \(P_{\mathrm{VWC}}\)) are used to compensate for the CYGNSS effective reflectivity in spaceborne GNSS-R SM retrieval. Note that \(P_{\mathrm{VOD}}\) corresponds to the ‘tau’ parameter of the ‘tau-omega’ model normalized by the cosine of the incidence angle, with the incidence angle set to a fixed value (40°). To account for the impact of vegetation at different incident angles, we recalculate the \(P_{\mathrm{VOD}}\) parameter using the incident angle at the specular reflection point. In addition, the \(P_{\mathrm{VWC}}\) parameter, which does not depend on the incident angle, is introduced to compensate for reflectivity. A significance test shows that \(P_{\mathrm{VOD}}\) and \(P_{\mathrm{VWC}}\) differ at the 5% significance level.
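One possible reading of the \(P_{\mathrm{VOD}}\) recalculation is sketched below. It assumes that the SMAP value equals tau divided by cos(40°), so that the value at the specular point is obtained by undoing that normalization and dividing by the cosine of the actual incident angle. This interpretation and the function name `rescale_vod` are assumptions; the text does not give the exact expression.

```python
import math

SMAP_FIXED_INCIDENCE_DEG = 40.0   # fixed angle stated in the text


def rescale_vod(p_vod_smap, incidence_deg):
    """Re-express a 40-degree slant-path VOD at another incident angle.

    Assumed relation: p_vod_smap = tau / cos(40 deg), hence
    p_vod(theta) = tau / cos(theta).
    """
    tau = p_vod_smap * math.cos(math.radians(SMAP_FIXED_INCIDENCE_DEG))
    return tau / math.cos(math.radians(incidence_deg))


# Hypothetical value: the slant path, and thus the VOD, grows with the angle.
vod_at_60 = rescale_vod(0.30, 60.0)
```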

The specific SM retrieval method is illustrated in Fig. 3. As mentioned previously, introducing redundant auxiliary parameters may result in over-reliance problems. Yan et al. (2024) implemented a variable importance analysis by sequentially excluding input data and measuring the decrease in the accuracy of the results retrieved by each model. Thus, removing input variables is one option for sensitivity analysis and for addressing the coupling problem. A similar approach is used here to pair the auxiliary parameters and combine each pair with the quality-controlled CYGNSS effective reflectivity, so that each model contains two auxiliary parameters. Five ternary linear models are then established, as shown in Table 1. In addition to the two existing models (R-S-V and R-T-V), three new models are established (R-S-T, R-S-W, and R-T-W). To obtain a more stable model, the data are divided randomly into a training set and a validation set. The training set comprises 70% of the data, while the validation set accounts for 30%. The regression coefficients of each linear model are calculated during model training. The five ternary linear models are fitted simultaneously within each grid, and each model has its own regression coefficients. To maintain the generalization ability of the models, the training and validation sets are the same for each model. Additionally, only the results with similar accuracy on the training set and the validation set are recorded.
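The per-grid fitting step can be sketched as follows. Several points are assumptions rather than statements from the text: each model is taken to have the form SM = a·Γ + b·P1 + c·P2 + d, the letters S, T, V, and W are read as SR, ST, VOD, and VWC, ordinary least squares is used, and all function names are hypothetical. The exact equations are those of Table 1.

```python
import numpy as np

# Assumed pairing of auxiliary parameters for the five ternary linear models.
MODELS = {
    "R-S-V": ("SR", "VOD"),
    "R-T-V": ("ST", "VOD"),
    "R-S-T": ("SR", "ST"),
    "R-S-W": ("SR", "VWC"),
    "R-T-W": ("ST", "VWC"),
}


def split_indices(n, train_frac=0.7, seed=0):
    """Random 70/30 split shared by all five models of a grid."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return order[:n_train], order[n_train:]


def design_matrix(gamma, aux, pair):
    """Columns: reflectivity, first auxiliary, second auxiliary, intercept."""
    return np.column_stack([gamma, aux[pair[0]], aux[pair[1]], np.ones_like(gamma)])


def fit_grid_models(gamma, aux, sm, train_idx):
    """Fit every model on the same training samples of one grid."""
    coeffs = {}
    aux_train = {name: values[train_idx] for name, values in aux.items()}
    for name, pair in MODELS.items():
        X = design_matrix(gamma[train_idx], aux_train, pair)
        beta, *_ = np.linalg.lstsq(X, sm[train_idx], rcond=None)
        coeffs[name] = beta
    return coeffs


# Synthetic grid whose SM truly depends on ST and VWC (illustration only).
rng = np.random.default_rng(1)
n = 200
gamma = rng.normal(size=n)
aux = {key: rng.normal(size=n) for key in ("SR", "VOD", "ST", "VWC")}
sm = 0.5 * gamma + 0.1 * aux["ST"] - 0.2 * aux["VWC"] + 0.05
train_idx, val_idx = split_indices(n)
coeffs = fit_grid_models(gamma, aux, sm, train_idx)
```

Sharing one split across the five models mirrors the statement that the training and validation sets are the same for each model.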

Fig. 3 Diagram of data processing and flowchart of the improved method

Table 1 Five models with corresponding equations in the improved method

The root-mean-square error \(I_{\mathrm{RMSE}}\), correlation coefficient \(I_{\mathrm{R}}\), and coefficient of determination \(I_{\mathrm{D}}\) of grid fitting are used to assess the performance of the proposed models. These indices are obtained using the validation set for each linear model. Because the values of \(I_{\mathrm{RMSE}}\), \(I_{\mathrm{R}}\), and \(I_{\mathrm{D}}\) differ among the models, it is difficult to judge the optimal model in each grid from any single index. Therefore, we propose a performance indicator, which is defined as:

$$I = I_{\mathrm{RMSE}} + \left(1 - I_{\mathrm{R}}\right) + \left(1 - I_{\mathrm{D}}\right)$$
(2)

where \(I\) is the performance indicator. The model with the minimum \(I\) is selected as the optimal model for the grid. Meanwhile, the auxiliary parameters used by the optimal model are considered the optimal parameters for the grid.
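The selection rule of Eq. (2) can be sketched as below. The sketch assumes that \(I_{\mathrm{R}}\) is the Pearson correlation coefficient and that \(I_{\mathrm{D}}\) is computed as one minus the ratio of the residual to the total sum of squares on the validation set; the function names and sample values are hypothetical.

```python
import numpy as np


def performance_indicator(sm_true, sm_pred):
    """Eq. (2): I = I_RMSE + (1 - I_R) + (1 - I_D); smaller is better."""
    sm_true = np.asarray(sm_true, dtype=float)
    sm_pred = np.asarray(sm_pred, dtype=float)
    resid = sm_pred - sm_true
    i_rmse = float(np.sqrt(np.mean(resid ** 2)))
    i_r = float(np.corrcoef(sm_true, sm_pred)[0, 1])
    ss_tot = float(np.sum((sm_true - sm_true.mean()) ** 2))
    i_d = 1.0 - float(np.sum(resid ** 2)) / ss_tot   # assumed definition of I_D
    return i_rmse + (1.0 - i_r) + (1.0 - i_d)


def select_optimal_model(sm_true, predictions):
    """Return the model name with the minimum indicator, plus all scores."""
    scores = {name: performance_indicator(sm_true, pred)
              for name, pred in predictions.items()}
    return min(scores, key=scores.get), scores


# Hypothetical validation-set values for one grid (cm3/cm3).
truth = np.array([0.10, 0.20, 0.30, 0.40])
preds = {
    "R-T-V": truth + np.array([0.02, -0.02, 0.02, -0.02]),
    "R-T-W": truth + np.array([0.01, -0.01, 0.01, -0.01]),
}
best, scores = select_optimal_model(truth, preds)
```

Note that the three terms are summed without weights, so \(I_{\mathrm{RMSE}}\), which is typically of the order of 0.01 to 0.1 cm3/cm3, contributes less to \(I\) than the two dimensionless terms whenever the fit is poor.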

Results and discussion

In this section, the relationship of the optimal model in different grids with characteristic regions is investigated. Subsequently, the SM retrieval performance of the proposed method is evaluated using SM from SMAP and ISMN.

Analysis of geographical disparities using the optimal model in a grid

Here, the optimal model, as well as the relationship between the CYGNSS effective reflectivity and the auxiliary parameters in different land cover categories and characteristic regions, is presented. The average correlation coefficient is used to assess the sensitivity of the CYGNSS effective reflectivity to the influencing factors. Because the relationship between the reflectivity and the influencing factors can be either positive or negative, the absolute value of the correlation coefficient in each grid is used. Figure 4 illustrates these sensitivities in different land cover categories. Apart from \(P_{\mathrm{SR}}\), the auxiliary parameters exhibit a higher correlation with reflectivity than the other influencing factors. As previously mentioned, the sensitivities of \(P_{\mathrm{VWC}}\) and \(P_{\mathrm{VOD}}\) to the CYGNSS effective reflectivity are different.
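The averaging of absolute per-grid correlations can be sketched as follows. The function name and data layout (one time series per grid, plus one land cover label per grid) are hypothetical, and grids where the correlation is undefined are simply skipped.

```python
import numpy as np


def mean_abs_correlation(gamma_by_grid, factor_by_grid, land_cover):
    """Average |Pearson r| between reflectivity and one factor per category.

    gamma_by_grid, factor_by_grid : sequences of 1-D arrays, one per grid
    land_cover                    : category label of each grid
    """
    sums, counts = {}, {}
    for gamma, factor, category in zip(gamma_by_grid, factor_by_grid, land_cover):
        if np.std(gamma) == 0.0 or np.std(factor) == 0.0:
            continue                      # correlation undefined for constants
        r = np.corrcoef(gamma, factor)[0, 1]
        sums[category] = sums.get(category, 0.0) + abs(r)
        counts[category] = counts.get(category, 0) + 1
    return {category: sums[category] / counts[category] for category in sums}


# Two hypothetical grids of category 16 with opposite-sign relationships.
gamma_series = [np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0])]
factor_series = [np.array([2.0, 4.0, 6.0]), np.array([3.0, 2.0, 1.0])]
result = mean_abs_correlation(gamma_series, factor_series, [16, 16])
```

Without the absolute value, the two grids in this example would average to zero and hide a strong sensitivity, which is the reason given in the text for using absolute values.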

Fig. 4 Correlations between the influencing factors and CYGNSS effective reflectivity in different land cover categories. The \(P_{\mathrm{SM}}\), \(P_{\mathrm{SR}}\), \(P_{\mathrm{ST}}\), \(P_{\mathrm{VOD}}\), and \(P_{\mathrm{VWC}}\) represent the SM, SR, ST, VOD, and VWC, respectively

Because of the large number of grids worldwide, the grids in six characteristic regions (Southeast China hills, Sahara Desert, Great Artesian Basin, Himalayas, Congo Basin, and Deccan Plateau) are used for further analysis. As shown in Table 2, the average correlation coefficients for \(P_{\mathrm{ST}}\), \(P_{\mathrm{VOD}}\), and \(P_{\mathrm{VWC}}\) are higher than those for \(P_{\mathrm{SR}}\) across all land cover categories. A higher correlation between \(P_{\mathrm{VWC}}\) and CYGNSS effective reflectivity is observed in the Himalayas and Deccan Plateau. Furthermore, the averaged correlation coefficient of \(P_{\mathrm{VWC}}\) in these regions is higher than that in the Sahara Desert region. Additionally, the sensitivities of \(P_{\mathrm{VWC}}\) and \(P_{\mathrm{VOD}}\) differ. In the Sahara Desert, the average correlation coefficient of \(P_{\mathrm{VOD}}\) reaches 0.101, while that of \(P_{\mathrm{VWC}}\) is 0.018. The specific distributions of the models in these regions are shown in Figs. 5, 6, and 7. Although model 1 (R-T-V model) is widely distributed in the Sahara Desert, other models are also identified as the optimal choice in certain grids. From these findings and those in Table 2, one can conclude that an auxiliary parameter with a high correlation value does not necessarily provide the most accurate compensation for the CYGNSS effective reflectivity. For instance, \(P_{\mathrm{VWC}}\) exhibits a higher correlation with CYGNSS effective reflectivity than \(P_{\mathrm{VOD}}\), but model 1 (R-T-V model) is still determined to be the optimal model in the grids of the Deccan Plateau. These results demonstrate that it is insufficient to rely solely on the correlation between the auxiliary parameters and the CYGNSS effective reflectivity to determine the most suitable auxiliary parameter for SM retrieval.

Table 2 Number of matching grids and the averaged correlation coefficient between CYGNSS effective reflectivity and the influencing factors in the characteristic regions

Fig. 5 Distribution and models in the characteristic regions of Southeast China hills and Himalayas

Fig. 6 Distribution and models in the characteristic regions of Sahara Desert and Congo Basin

Fig. 7 Distribution and models in the characteristic regions of Great Artesian Basin and Deccan Plateau

Figure 8 illustrates that model 4 (R-T-W model) is the most frequently selected model globally. Model 1 (R-T-V model) is selected predominantly in arid regions, such as the Arabian Peninsula and the Sahara Desert, which are characterized by small surface fluctuations and sparse vegetation. According to Table 3, the number of grids whose optimal model includes \(P_{\mathrm{SR}}\) is one order of magnitude lower than the number whose optimal model does not. This result reflects the limitations of using the static variable \(P_{\mathrm{SR}}\) to compensate for the CYGNSS effective reflectivity in most regions in global SM retrieval. Therefore, the optimal model and the compensation parameters in each grid can be determined by a comprehensive comparison of multiple linear models. Moreover, this approach avoids introducing heavy-loaded auxiliary parameters.

Fig. 8 Specific distribution of the models used in global SM retrieval

Table 3 Number of the global grids for each model used

Global SM retrieval results

The global distributions of the correlation coefficient \(R\) and the Root Mean Square Error (RMSE) \(S_{\mathrm{RMSE}}\) for the improved method are shown in Figs. 9 and 10, respectively. The figures reveal notable differences among regions. The \(R\) values in most land regions are greater than 0.6, and those in regions with small surface fluctuations and sparse vegetation are greater than 0.8. Furthermore, the \(S_{\mathrm{RMSE}}\) is generally less than 0.06 cm3/cm3, with lower values observed in most regions, such as Africa. One should also note that the \(R\) values in the Indian Peninsula are greater than 0.8, but the \(S_{\mathrm{RMSE}}\) there is poorer than in some regions with lower \(R\) values. Similarly, the performance of the SM retrievals in central Australia is better than that in the eastern regions surrounded by water and vegetation, which exhibit lower correlation values. Therefore, the coupling effect of water and vegetation can decrease the accuracy of SM retrieval.

Fig. 9 Distribution of \(R\) for the improved method in global SM retrieval

Fig. 10 Distribution of \(S_{\mathrm{RMSE}}\) for the improved method in global SM retrieval

From Fig. 11, the scattered points of the retrievals are mostly distributed along the diagonal line, with \(R\) = 0.923 and \(S_{\mathrm{RMSE}}\) = 0.040 cm3/cm3. Moreover, the fitting performance is better in the areas where the SM values are lower than 0.15 cm3/cm3. The results indicate that CYGNSS tends to underestimate the SMAP SM, especially in the regions with high SM values.

Fig. 11 Density scatterplot, \(R\), and \(S_{\mathrm{RMSE}}\) of the soil moisture retrieval results using the improved method

The R-T-V model, which exhibits the best SM retrieval performance among the five linear models (see Table 4), is used as the reference. Compared with it, the improved method decreases the \(S_{\mathrm{RMSE}}\) and the Mean Absolute Error (MAE) \(S_{\mathrm{MAE}}\) by 9.1% and 7.1%, respectively, and increases \(R\) and the coefficient of determination \(R^{2}\) by 1.6% and 3.2%, respectively. As shown in Fig. 12, the improvement varies widely across regions. Figures 8 and 12 show that the improvements in some regions, such as northern Africa and the Arabian Peninsula, are insignificant. However, significant improvements are observed in some regions where the R-T-W model is selected, such as the Niger River Basin, where the \(S_{\mathrm{RMSE}}\) is improved by 30%. Except for the grids in arid regions, the R-T-W model is optimal for most grids. These results demonstrate that the compensation effect of introducing \(P_{\mathrm{ST}}\) and \(P_{\mathrm{VWC}}\) in these regions is better than that of other auxiliary parameters.
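The percentage changes can be checked with the global values reported in the Conclusion (\(S_{\mathrm{RMSE}}\) from 0.044 to 0.040 cm3/cm3 and \(R\) from 0.908 to 0.923). The helper name below is hypothetical. Because the inputs are rounded to three decimals, the result for \(R\) differs slightly from the 1.6% quoted in the text.

```python
def relative_change_percent(reference, improved):
    """Signed percentage change of `improved` relative to `reference`."""
    return 100.0 * (improved - reference) / reference


# Global statistics quoted in the Conclusion (R-T-V model vs improved method).
rmse_change = relative_change_percent(0.044, 0.040)   # negative: error is reduced
r_change = relative_change_percent(0.908, 0.923)      # positive: correlation rises
```

With these rounded inputs, the RMSE change is about -9.1% and the change in \(R\) is about +1.65%.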

Table 4 Error statistics of the five models and the improved method

Fig. 12 The improvement percentages of \(S_{\mathrm{RMSE}}\) for the SM retrieval results compared with the R-T-V model

Comparison between the retrievals in different land cover categories

To analyze the SM retrieval performance in different land cover categories, the average values of \(R\) and \(S_{\mathrm{RMSE}}\) are shown in Fig. 13. Note that there are no valid data in land cover categories 3 and 15. Figure 13 shows that \(R\) and \(S_{\mathrm{RMSE}}\) vary among land cover categories. Moreover, the improved method performs better than the R-T-V model, with the lowest \(S_{\mathrm{RMSE}}\) = 0.024 cm3/cm3 observed in land cover category 16 (the Barren or sparsely vegetated region). The improvement in \(S_{\mathrm{RMSE}}\) is evident in land cover categories 4, 5, 8, 9, and 14. The \(S_{\mathrm{RMSE}}\) values in vegetated areas are higher than those in bare soil areas. However, introducing other vegetation parameters into the models for compensation yields little improvement in SM retrieval performance for land cover categories 1, 2, and 6. These results suggest that the R-T-V model already performs comparatively well in densely vegetated regions.

Fig. 13 Performance of \(R\) and \(S_{\mathrm{RMSE}}\) in each land cover category. The I-M and R-T-V are the improved method and the R-T-V model, respectively

These results demonstrate that the proposed method not only maintains a good SM retrieval performance in the regions with small surface fluctuations or sparse vegetation, but also enhances retrieval performance in the regions with large surface fluctuations or dense vegetation. One can conclude that the proposed method can reduce the errors caused by geographical disparities.

Field validation with ISMN

In this section, 19 sites from five networks (ARM, OZNET, SCAN, TxSON, and USCRN) within ISMN are used for field validation. The SM estimated in the grid nearest to each site is used for analysis. The performance of the improved method and the R-T-V model at the 19 ISMN sites is listed in Table 5. The main land cover categories provided by SMAP at the sites are 7, 8, 10, 12, 13, 14, and 16. Sites in some land cover categories are not analyzed because of the relatively scattered global distribution of ISMN stations and the removal of some data. Table 5 shows that the improved method performs better than the R-T-V model over all sites, with \(R\) increased by 21.0%, \(S_{\mathrm{RMSE}}\) decreased by 6.9%, and the unbiased Root Mean Square Error (ubRMSE) \(S_{\mathrm{ubRMSE}}\) decreased by 11.1%. Furthermore, several sites have the same precision indices for both models, such as the Eulo, Kemole_Gulch, and Bodega_6_WSW sites listed in Table 5. A likely reason is that the optimal model at these sites is model 1 (R-T-V model). Therefore, the comparisons for these sites are not shown here.
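The site-level statistics can be computed as in the sketch below. It uses the common definition ubRMSE = sqrt(RMSE² − bias²); the text does not spell out its formula, so this is an assumption, as are the function name and the sample values.

```python
import numpy as np


def validation_metrics(sm_insitu, sm_retrieved):
    """R, RMSE, ubRMSE, and bias between in-situ and retrieved SM.

    Days on which either series is missing (NaN) are ignored.
    """
    obs = np.asarray(sm_insitu, dtype=float)
    est = np.asarray(sm_retrieved, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(est))
    obs, est = obs[valid], est[valid]

    diff = est - obs
    bias = float(diff.mean())
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Guard against tiny negative values caused by floating-point rounding.
    ubrmse = float(np.sqrt(max(rmse ** 2 - bias ** 2, 0.0)))
    r = float(np.corrcoef(obs, est)[0, 1])
    return {"R": r, "RMSE": rmse, "ubRMSE": ubrmse, "bias": bias}


# Hypothetical daily series (cm3/cm3): a retrieval with a constant wet bias.
site_obs = np.array([0.10, 0.20, 0.30, np.nan])
site_est = np.array([0.15, 0.25, 0.35, 0.40])
metrics = validation_metrics(site_obs, site_est)
```

In this example the retrieval tracks the in-situ dynamics exactly, so the RMSE is entirely due to bias and the ubRMSE is essentially zero, which illustrates why both statistics are reported.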

Table 5 Performance of the improved method and R-T-V model in 19 ISMN sites

The data for each site are divided into categories according to the surface classification provided by SMAP. As presented in Fig. 14f, the field data and the global SM retrieval results show good consistency, with \(R\) = 0.80 and \(S_{\mathrm{RMSE}}\) = 0.064 cm3/cm3. In the scatter distribution of each land cover category, the points of the improved method lie closer to the 1:1 line. The improvements at each site are illustrated in Figs. 15 and 16. Compared with the R-T-V model, the improved method performs better, with increases in \(R\) ranging from 2.9% to 92.0%, decreases in \(S_{\mathrm{RMSE}}\) ranging from 1.0% to 25.0%, and decreases in \(S_{\mathrm{ubRMSE}}\) ranging from 1.1% to 25.0%.

Fig. 14 Scatter distribution in different land cover categories (7, 8, 10, 12, and 14) for the improved method and the R-T-V model. The I-M represents the improved method

Fig. 15 The \(R\) of the improved method and the R-T-V model at 16 sites. The I-M represents the improved method

Fig. 16 The \(S_{\mathrm{RMSE}}\) of the improved method and the R-T-V model at 16 sites. The I-M represents the improved method

Figure 17 shows that the improvements differ among the vegetated regions. The average \(R\) is increased by 22.7%, and the average \(S_{\mathrm{RMSE}}\) is decreased by 8.7%. Moreover, the \(R\) values of some sites, such as Yankee_Reservoir, Asheville_8_SSW, and Asheville_13_S, exhibit larger increases. Examining the CYGNSS observation areas and the surroundings of the sites with low correlation reveals that these sites are close to water bodies or vegetation. The Pawhuska, Lovell_Summit, and Asheville_8_SSW sites are in densely vegetated areas, whereas the Kemole_Gulch and Batesville_8_WNW sites are on an island and in cropland, respectively. Notably, because the CYGNSS observation areas of these sites correspond to EASE 36 km × 36 km grid cells, they include vegetation, water bodies, and other environmental features close to the sites. Therefore, the CYGNSS effective reflectivity at these sites is influenced by the surrounding environment. Furthermore, sudden precipitation or paddy irrigation near a site can also result in low correlation. The \(S_{\mathrm{RMSE}}\) values of five sites (Watkinsville_#1, WTARS, Asheville_8_SSW, Asheville_13_S, and Batesville_8_WNW) are decreased by more than 8.7%, with the largest decrease of 25% at the Watkinsville_#1 site. Compared with the R-T-V method in global SM retrieval, the improved method can better retrieve SM in regions with complex surface conditions.

Fig. 17 Improvements of \(R\) and \(S_{\mathrm{RMSE}}\) for the improved method compared with the R-T-V model

Discussion

In this paper, a gridded SM retrieval method that considers geographical differences is proposed. This method compensates for the attenuation of the CYGNSS effective reflectivity using the auxiliary parameters provided by SMAP. However, the method is subject to several sources of uncertainty. The first is the uncertainty and internal error of the auxiliary data. Because SM estimation that relies entirely on CYGNSS data has not yet been achieved, auxiliary data cannot be eliminated; their use can only be minimized while maintaining accuracy. Introducing more auxiliary data will decrease the robustness and stability of the model. Furthermore, the comparison of the combined models indicates that the same type of data has different compensation effects on the reflectivity.

The second is the influence of seasonal variation in SM, vegetation, and other factors. Seasonal variations can affect the surface reflectivity derived from spaceborne GNSS-R, which limits the accuracy of SM retrieval. As shown in Fig. 18, apart from the regions with low SM values (such as the Sahara Desert), there are obvious differences in CYGNSS SM between seasons, especially in vegetated regions. The performance of SM retrieval is closely related to the SM variation, as indicated by the red boxes in Figs. 18 and 19. The performance of the SM retrieval method decreases gradually as SM increases due to seasonal variation. Furthermore, seasonal variations in environmental factors such as vegetation and soil temperature may also affect the SM retrieval (Colliander et al., 2019a, 2019b; Jin et al., 2024).

Fig. 18

The spatial maps of mean CYGNSS soil moisture in four seasons (spring, summer, autumn, and winter)

Fig. 19

The spatial maps of \(S_{\mathrm{RMSE}}\) for the improved method in four seasons (spring, summer, autumn, and winter)
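
A seasonal error breakdown of the kind mapped in Fig. 19 can be sketched for a single grid cell as follows. This is an illustration only: the month-to-season mapping assumes northern-hemisphere meteorological seasons, which the paper does not specify, and the function name `seasonal_rmse` and all sample values are hypothetical.

```python
import math
from collections import defaultdict

# Assumption: northern-hemisphere meteorological seasons (not defined in the source).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_rmse(months, retrieved, reference):
    """Group squared errors by season and return the RMSE of each season."""
    squared_errors = defaultdict(list)
    for month, est, ref in zip(months, retrieved, reference):
        squared_errors[SEASON_BY_MONTH[month]].append((est - ref) ** 2)
    return {
        season: math.sqrt(sum(errs) / len(errs))
        for season, errs in squared_errors.items()
    }


# Hypothetical samples for one grid cell (cm3/cm3), wetter in summer.
months = [1, 1, 4, 4, 7, 7, 10, 10]
retrieved = [0.10, 0.12, 0.20, 0.24, 0.30, 0.36, 0.18, 0.20]
reference = [0.11, 0.11, 0.22, 0.22, 0.33, 0.33, 0.19, 0.19]

rmse_by_season = seasonal_rmse(months, retrieved, reference)
for season, value in rmse_by_season.items():
    print(f"{season}: {value:.3f}")
```

In this made-up sample the summer error is the largest, which mirrors the qualitative pattern described above, where retrieval performance degrades as SM rises.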

In addition, bias in SM retrieval can originate from differences in spatial scale among the various data sources. The mismatch between the measurement depth of the in-situ SM sensors and the penetration depth of the microwave signal can also lead to biases.

Conclusion

To address the limitations imposed by land surface complexity and the over-reliance on auxiliary data, a method for global SM retrieval that considers geographical disparities is developed. The CYGNSS data and the auxiliary parameters \(P_{\mathrm{SR}}\), \(P_{\mathrm{VOD}}\), \(P_{\mathrm{ST}}\), and \(P_{\mathrm{VWC}}\) provided by SMAP are used to develop the improved method. The SMAP and ISMN SM are used as references. Additionally, the sensitivities of the CYGNSS effective reflectivity to the introduced auxiliary parameters in different land cover categories and characteristic regions are presented.

The improved method consists of five linear models that account for the influence of geographical disparity on the CYGNSS effective reflectivity while avoiding redundant auxiliary data. The optimal model in each grid is determined by the performance indicator and then used for SM retrieval. The results show that the improved method performs well at both global and local scales. In global SM retrieval, the improved method outperforms the R-T-V method, with the correlation coefficient \(R\) increased from 0.908 to 0.923 and the \(S_{\mathrm{RMSE}}\) decreased from 0.044 to 0.040 cm3/cm3. The improved method also outperforms the R-T-V method in local regions, with the lowest \(S_{\mathrm{RMSE}}\) of 0.024 cm3/cm3 in the barren or sparsely vegetated region. Furthermore, the results for different land cover categories reveal that the performance is maintained in areas with small surface fluctuation and sparse vegetation and is improved in areas with large surface fluctuation and dense vegetation; among these, the Niger River Basin shows the largest improvement in \(S_{\mathrm{RMSE}}\), reaching 30%. In the field validation against ISMN, the overall \(R\) and \(S_{\mathrm{RMSE}}\) are 0.80 and 0.064 cm3/cm3, respectively. The average \(S_{\mathrm{RMSE}}\) of the chosen sites is decreased by 8.7%.
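
The per-grid selection among candidate linear models can be illustrated with a short sketch. Several points are assumptions rather than details taken from the paper: the five feature combinations in `CANDIDATE_MODELS` are hypothetical (this section does not list them), the performance indicator is taken to be the RMSE against the reference SM, and a simple hold-out split is used for scoring. The data are synthetic.

```python
import numpy as np

# Hypothetical feature subsets; the paper's actual five combinations are not listed here.
CANDIDATE_MODELS = {
    "R": ["refl"],
    "R-T": ["refl", "st"],
    "R-V": ["refl", "vod"],
    "R-T-V": ["refl", "st", "vod"],
    "R-T-V-S": ["refl", "st", "vod", "sr"],
}


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def select_model_for_grid(features, sm_ref, train_idx, test_idx):
    """Fit every candidate in one grid cell and keep the lowest hold-out RMSE."""
    scores = {}
    for name, cols in CANDIDATE_MODELS.items():
        X = np.column_stack([features[c] for c in cols])
        coef = fit_linear(X[train_idx], sm_ref[train_idx])
        scores[name] = rmse(predict(coef, X[test_idx]), sm_ref[test_idx])
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data for one grid cell: SM depends on reflectivity and VOD only.
rng = np.random.default_rng(0)
n = 200
features = {k: rng.uniform(0.0, 1.0, n) for k in ("refl", "st", "vod", "sr")}
sm_ref = 0.2 + 0.5 * features["refl"] - 0.1 * features["vod"] + rng.normal(0.0, 0.01, n)

best, scores = select_model_for_grid(features, sm_ref, slice(0, 150), slice(150, None))
print(best, {k: round(v, 4) for k, v in scores.items()})
```

Running the same selection independently for every grid cell yields a map of optimal models, so that each cell uses only the auxiliary parameters that actually reduce its error.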

The SM retrieval results indicate that the improved method can obtain better SM retrieval results at both global and local scales without redundant auxiliary data. Moreover, the findings of this paper offer a novel approach that accounts for the impact of geographical disparity in global and local SM retrieval. The physical mechanism coupling multiple factors needs to be further analyzed in future studies.

Availability of data and materials

The CYGNSS observations can be accessed at https://podaac.jpl.nasa.gov. The SMAP data are available at https://nsidc.org/data/smap/smap-data.html. The ISMN data are obtained from https://ismn.earth/en/.

Acknowledgements

The authors would like to thank the CYGNSS, SMAP, and ISMN teams for providing experimental data.

Funding

The research is supported by the Natural Science and Technology Planning Foundation of Guangxi (guikeAD23026257), the National Natural Science Foundation of China (42064002 and 42074029), and the “Ba Gui Scholars” program of the provincial government of Guangxi.

Author information

Contributions

LH, AP, and FC: conceptualization, methodology, and formal analysis; LH, AP, and FC: software; LH, AP, and FC: validation; LL: investigation; LL and FG: resources; FG and HL: data curation; LH, AP, and FC: writing—original draft preparation; LH and FC: writing—review and editing; LH and FC: funding acquisition. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Fade Chen.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Huang, L., Pan, A., Chen, F. et al. A novel global grid model for soil moisture retrieval considering geographical disparity in spaceborne GNSS-R. Satell Navig 5, 29 (2024). https://doi.org/10.1186/s43020-024-00150-9

  • DOI: https://doi.org/10.1186/s43020-024-00150-9