Inversion of surface vegetation water content based on GNSS-IR and MODIS data fusion

Obtaining high-precision, long-term sequences of vegetation water content (VWC) is of great significance for assessing surface vegetation growth, soil moisture, and fire risk. In recent years, the global navigation satellite system-interferometric reflection (GNSS-IR) has become a new type of remote sensing technology with low cost, all-weather capability, and a high temporal resolution. It has been widely used in the fields of snow depth, sea level, soil moisture content, and vegetation water content. The normalized microwave reflectance index (NMRI) based on GNSS-IR technology has been proven to be effective in monitoring changes in VWC. This paper considers the advantages and disadvantages of remote sensing technology and GNSS-IR technology in estimating VWC. A pointsurface fusion method of GNSS-IR and MODIS data based on the GA–BP neural network is proposed to improve the accuracy of VWC estimation. The vegetation index products (NDVI, GPP, LAI) and the NMRI that unified the temporal and spatial resolution were used as the input and output data of the training model, and the GA–BP neural network was used for training and modeling. Finally, a spatially continuous NMRI product was generated. Taking a particular area of the United States as a research object, experiments show that (1) a neural network can realize the effective fusion of GNSS-IR and MODIS products. By comparing the GA–BP neural network, BP neural network, and multiple linear regression (MLR), the three models fusion effect. The results show that the GA–BP neural network has the best modeling effect, and the r and RMSE between the model estimation result and the reference value are 0.778 and 0.0332, respectively; this network is followed by the BP neural network, in which the r and RMSE are 0.746 and 0.0465, respectively. MLR has the poorest effect, with r and RMSE values of 0.500 and 0.0516, respectively. (2) The spatiotemporal variation in the 16 days/500 m resolution NMRI product obtained by GA–BP neural network fusion is consistent with that in the experimental area. Through the testing of GNSS stations that did not participate in the modeling, the r between the estimated value of the NMRI and the reference value is greater than 0.87, and the RMSE is less than 0.049. Therefore, the method proposed in this paper is optional and effective. The spatially continuous NMRI products obtained by fusion can reflect the changes in VWC in the experimental area more intuitively.


Introduction
Surface vegetation is one of the critical components of terrestrial ecosystems, and the water content in the vegetation canopy is approximately 40-80% (Elvidge 1990), playing an essential role in soil formation and environmental changes. Vegetation water content (VWC) is one of the main factors controlling plant photosynthesis, respiration, primary productivity, and biomass. It plays an essential role in plant physiological status, vegetation function, drought, and fire risk assessment (Peñuelas et al. 1993). Because a plant is covered with soil, its water content will affect the monitoring of soil moisture, and the correct estimation of VWC can improve the accuracy

Open Access
Satellite Navigation https://satellite-navigation.springeropen.com/ *Correspondence: renchao@glut.edu.cn 1 College of Geomatics and Geoinformation, Guilin University of Technology, Guilin 541006, China Full list of author information is available at the end of the article of soil moisture inversion (Jin et al. 2017b). Therefore, obtaining high-precision, long-term sequences of VWC is of considerable significance to the study of surface vegetation and soil moisture.
In 1980, the European Space Agency (ESA) first proposed that GPS L-band signals could be used as ocean scatterometers (Hall 1988). Subsequently, in 1993, ESA's Martin-Neria first introduced the concept of PARIS and, for the first time, used GPS-reflected signals to achieve ocean height measurements (Martin-Neira 1993). Since then, foreign scholars have realized that GPS-reflected signals might serve as a new remote sensing method (Liu et al. 2007). Therefore, two new GNSS remote sensing technologies have been proposed: Global Navigation Satellite System-Reflection (GNSS-R) remote sensing technology and Global Navigation Satellite System-Interferometric Reflection (GNSS-IR) remote sensing technology. However, GNSS-R technology needs to receive both direct and reflected signals from the satellite at the same time. Therefore, it requires an instrument with both a left-handed circularly polarized (LHCP) antenna and a right-handed circularly polarized (RHCP) antenna. Therefore, GNSS-R technology has higher requirements for hardware and a higher cost; thus, it has limited its development and promotion to a certain extent. GNSS-IR technology requires only an ordinary geodetic receiver to perform experiments and has the characteristics of being low cost, useful in all weather, high accuracy, and high spatiotemporal resolution.
GNSS-IR technology was first proposed by Professor Larson of the University of Colorado. Larson and others separated the direct and reflected components from the GPS signal-to-noise ratio (SNR) observations and studied the reflected components and the surrounding environment of the station. The results show that the amplitude of the reflected components can reflect the overall change in the surrounding soil moisture (Larson et al. 2008a). Subsequent research found that the reflected signal could well reflect the evolution of soil moisture within 5 cm of the soil surface. There is a certain correlation between the reflected signal phase, satellite height angle change, and soil moisture content; however, when the soil moisture content is less than 10%, the correlation weakens (Larson et al. 2008b. Considering that GNSS-IR technology can realize the estimation of surface environmental parameters, Larson and others extended GNSS-IR technology to snow depth monitoring and provided a series of results for the development of GNSS-IR technology (Larson et al. 2009;Larson and Nievinski 2013;Larson 2016). Since then, GNSS-IR technology has been widely used to monitor changes in the surface environment, such as soil moisture (Larson 2016;Liang et al. 2019;Ren et al. 2019), sea level Jin et al. 2017a), and snow depth (Zhang et al. 2017(Zhang et al. , 2018. In terms of vegetation detection, Small and others used the GPS noise statistic MP 1 root mean square (RMS) for the first time to qualitatively estimate the growth of plants. They noted that the SNR data in the reflected signal could reflect the growth of vegetation ). Chew and others used a model for soil moisture inversion to quantitatively analyze VWC and SNR and the actual reflection surface height. The results show that when VWC does not exceed 1.5 kg/m 2 , the vegetation water content and SNR amplitude have a linear relationship (Chew et al. 2014). Chen and others proposed a method based on the amplitude and frequency analysis of the interference pattern of the SNR to eliminate the influence of VWC in soil moisture retrieval and achieved excellent results (Chen et al. 2016). In terms of VWC, Larson and others defined a daily VWC metric based on the amplitude of the reflected signal based on the relationship between the SNR amplitude and VWC, which is the normalized microwave reflection index (NMRI) . In 2004, verification was performed at four grassland sites in Montana, and the results showed that under the conditions of similar vegetation and climate, the NMRI and VWC have a strong correlation, and the VWC can be more accurately retrieved by the NMRI . At present, the Plate Boundary Observatory (PBO) in the United States provides daily NMRI data. However, because the NMRI data are based on GPS stations, only VWC changes within 1000 m 2 around the station are monitored. Additionally, the interval between GNSS stations of the PBO observation network is large, and spatial continuity cannot be achieved. Therefore, the application of the NMRI product is further limited.
In recent years, with the rapid development of remote sensing technology and imaging spectroscopy technology, it is relatively easy to obtain large-scale and long-term VWC data using remote sensing technology. Due to the different sensors, the current remote sensing technology can be divided into optical remote sensing and microwave remote sensing. Among them, the normalized difference vegetation index (NDVI), leaf area index (LAI), gross primary productivity (GPP) and other vegetation indexes provided by MODIS satellites belonging to optical remote sensing are widely used in VWC research (Tucker et al. 2005). Although these vegetation indexes have a high spatial resolution, due to the defects of optical sensors, the vegetation index images are easily affected by clouds and smog, resulting in the loss of information. Additionally, the NDVI is primarily regarded as an indicator of vegetation greenness. These indexes are used to infer the biomass and LAI and other vegetation indexes (Gutman and Ignatov 1998;Paruelo et al. 1997;Wylie et al. 2002). Because environmental factors such as plant type and hydrometeorology have different effects on VWC and "greenness, " the NDVI cannot be used to accurately estimate the VWC. At present, microwave remote sensing has also been used to successfully estimate VWC (Brakke et al. 1981;Owe et al. 2001). Although microwave remote sensing is not affected by clouds and fog, its spatial resolution is low. Some scholars have proposed a method of fusing the effects of optical remote sensing images and microwave remote sensing (Dasgupta and Qu 2006). However, due to the significant difference in the spatial resolution between the two, the accuracy and spatial resolution of the fusion results are weak in practical applications. Therefore, using remote sensing technology to estimate VWC has certain flaws.
At present, how to make full use of multisource data to achieve the fusion of multisource data has become a research hotspot. Multisource data fusion has a complex nonlinear relationship, and machine learning technology can better solve nonlinear problems. Therefore, many scholars have conducted multisource data fusion and research using machine learning technology and have achieved good results (Li et al. 2017;Xu et al. 2018;Yuan et al. 2020). In summary, the NMRI products based on GNSS-IR can better reflect the changes in VWC. The vegetation index products based on remote sensing technology have the characteristics of being large-scale and having a long-term series. In this paper, a fusion inversion method of GNSS-IR and remote sensing data based on the GA-BP neural network is proposed. The spatial resolution of optical remote sensing is better than that of microwave remote sensing. Therefore, this article selects MODIS vegetation index products for experiments. The GA-BP neural network is used to achieve effective fusion of the two and fully exploit their advantages. Finally, a spatially continuous NMRI product map is established. To compensate for the limitations of the original NMRI product space, this product is used to better estimate VWC.

GNSS-IR and NMRI principles
The core observational measurement of GNSS-IR is SNR data. SNR is a composite signal of direct component A d and reflected component A r . The commonly used receivers are currently designed to be right-handed circularly polarized, resulting in A d ≫ A r . Because A d is much larger than A r , the direct signal A d is a trend term. It determines the overall change in SNR. The reflected signal A d appears as a local periodic fluctuation. Because A d and A r differ significantly, they can be separated by using low-order polynomials. There is a fixed frequency sine (cosine) function relationship between the separated reflection components A r and sin(θ) (Chew et al. 2013). The reflection component can be expressed as follows: In the formula, θ, λ, and h represent the satellite height angle, carrier wavelength, and vertical distance from the antenna phase center to the reflection point, respectively, and A and φ represent the amplitude and phase of the reflection component, respectively. If t = sin(θ) and f = 2 h/λ, then Equation (1) can be expressed as follows: The NMRI is a comprehensive index used to evaluate the amplitude change of the reflected signal. The core is to calculate the RMS of the pseudo-range multipath index MP 1 on the L1 carrier. MP1 is defined as (Estey and Meertens 1999): In the equation, P 1 is the pseudo-range observation on the L 1 carrier; f 1 and f 2 are the carrier frequencies of L 1 and L 2 , f 1 = 1575.42 MHz, f 2 = 1227.6 MHz; λ 1 and λ 2 are the carrier wavelengths of L 1 and L 2 , λ 1 = 0.19 m, and λ 2 = 0.24 m; φ 1 and φ 2 are the L 1 and L 2 carrier phase observations. The calculation of the NMRI is based on the RMS value of MP 1 , and its calculation method is as follows: In the equation, RMS MP 1 is the RMS value of MP 1 on a single day, and max RMS MP 1 is the average value of RMS MP 1 in the top 5% after the RMS MP 1 values are ranked from large to small.

GA-BP neural network
Considering the time and space complexity of GNSS-IR and MODIS data, it is difficult to achieve effective fusion using only linear methods. Therefore, this article attempts to use the artificial neural network (ANN) of the typical BP neural network model for experimental analysis. The BP neural network was proposed by McClelland and Rumelhart (Rummelhart et al. 1986;Mcclland et al. 1986). It is a multilayer feedforward neural network that can better handle nonlinear problems. Its main structure is composed of an input layer, a hidden layer, and an output layer. Each layer is formed by several neurons, and the output value of each neuron is determined by the input value, action function, and threshold. The learning process of the BP neural network includes two processes: forward information propagation and error backward propagation. In the forward propagation process, the input information is transmitted from the input layer through the hidden layer to the output layer, and the output value and the expected value are compared. If there is an error, the error is propagated back along the original connection path, and the weights of the neurons in each layer are modified layer by layer to reduce the error. This cycle is repeated until the output results meet the accuracy requirements (Alpaydin 2004). The BP neural network topology is shown in Fig. 1. In Fig. 1, x 1 , x 2 , …, x n is the input value of the BP neural network; y m is the output value of the neural network; w ij is the connection weight of input layer neurons and hidden layer neurons; w jk is the connection weight of hidden layer neurons and output layer neurons.
Although the BP neural network can better address nonlinear problems, it still has flaws (Srinivas and Deb 1994). The first is related to the BP neural network modeling process. To reduce network errors and improve accuracy, the number of hidden layer neurons must be appropriately selected. However, there is no specific calculation method for the optimal number of hidden layer neurons. Second, the initial weights and thresholds of the BP neural network are randomly generated. Each node and weight of the BP neural network will affect the output. Therefore, the adaptation process and the global approximation process are time-consuming, resulting in a slow convergence of the network. Finally, the BP neural network uses a gradient descent method, which has local optimization problems.
To solve the shortcomings of the above BP neural network, this paper uses the genetic algorithm (GA) to optimize the BP neural network. The GA is a parallel random search optimization method formed by simulating genetic mechanisms and biological evolution in nature (Holland 1975). This method is based on the sample fitness function and selects, intersects, and mutates the initial population to guide learning and determine the search direction. Because it uses the population to search, it can use random methods to find the optimal solution in multiple regions of the global solution space, making it particularly suitable for large-scale parallel processing (Goldberg 1989). Considering the defects of the BP neural network and the advantages of the GA, the two are combined to construct the GA-BP neural network algorithm. First, the GA is used to optimize the weights and thresholds of the BP network to reduce the range of weights and thresholds. Subsequently, the optimized weights and thresholds are inputted to the BP neural network and solved accurately to improve the training accuracy and speed of the BP neural network. The GA-BP model algorithm flow is shown in Fig. 2.

Methodology
The methodologies of GNSS-IR and MODIS fusion inversion based on the GA-BP neural network are as follows: "GA-BP neural network" section to train the GA-BP model. 5. Model testing and accuracy inspection. Extract the NDVI, GPP, LAI, latitude and longitude of each pixel in the vegetation index image of the predicted date. These data are input into the trained model, and finally, a spatially continuous 16 days/500 m resolution NMRI product is generated. Finally, the accuracy test and evaluation are carried out through GNSS stations that are not involved in modeling.
The inversion process is shown in Fig. 3.

Study area
PBO is currently the only observing network based on the GNSS-IR principle. This network is based on GNSS-IR technology, and the daily estimates of vegetation moisture content, soil moisture, and snow depth are widely used in GNSS-IR technology research. Figure 4 is a schematic diagram of GNSS stations deployed by PBO in the western United States (30° N-50° N, 95° W-125° W). GNSS-IR and MODIS products have significant differences in temporal and spatial resolution. Therefore, it is considered that if the distance between GNSS stations is large and the distribution of the stations is sparse, it may result in poor local modeling results. The terrain around the station, regional climate change, and other factors may affect the effectiveness of the model. Therefore, this paper considers the above three factors comprehensively and selects the blue area (39° N-44.5° N, 111° W-114° W) as the experimental area. The GNSS station distribution in this area is relatively uniform, the terrain is diverse, and climate change is significant, which can better train the model. The DEM map and Landsat image of this area are shown in Fig. 5.   Figure 5b shows that there are 37 stations in this area, with different distribution locations and densities. Among them, 34 circles represent the stations involved in modeling. Three triangles (P112, P354, P359) represent the stations that were not included in the modeling, as they were used as later verification data. With reference to Figs. 4 and 5 and analysis of existing data, the upper half of the study area is southeast Idaho, and the lower half is northern Utah; Idaho and Utah both have a temperate continental climate. Because it is far from the ocean and blocked by the terrain, the humid air mass is difficult to reach, so it is dry and less rainy. It has the characteristics of hot and humid summers and cold and dry winters. Among them, the blue box in Fig. 5b is the famous Great Salt Lake Desert in the United States, and the red box is the Great Salt Lake. The area contains more minerals, so the vegetation is thinner. Therefore, the VWC of the region selected in this paper will have significant differences in time and space to verify the accuracy of the inversion results later.

Results and analysis
In this paper, the correlation coefficients (r) of the NMRI, NDVI, GPP, and LAI vegetation indexes from 37 stations in the experimental area from July 28, 2010, to October 16, 2010, are calculated using the linear regression principle, and the results are shown in Fig. 6. It can be seen from Fig. 6 that there are 30 stations whose correlation coefficient between the NMRI and NDVI exceeded 0.6, accounting for 81% of the total stations. There are 29 stations whose correlation coefficient between the NMRI and GPP exceeded 0.6, accounting for 78% of the total stations. There are 25 stations with a correlation coefficient between the NMRI and LAI greater than 0.6, accounting for 68% of the total stations. The NMRI, NDVI, GPP, and LAI all have strong correlations, and the selected NDVI, GPP, and LAI can participate in modeling.
To better verify the modeling effect of the GA-BP model, this paper compares the GA-BP neural network, BP neural network, and multivariable linear regression model (MLR). First, determine the number of hidden neurons in the model according to the empirical formula m = √ n + l + a . In the formula, m is the number of hidden layer nodes; n and l are the input layer and output layer nodes, respectively; and a is a positive integer between 1 and 10. Because the number of neurons in the input layer is five (i.e., the NDVI, GPP, LAI, Lon, and Lat) and the number of neurons in the output layer is one (i.e., the NMRI), the value range of the neurons in the hidden layer is m = [3,14]. Subsequently, this paper uses a cross-validation method that is randomly run 500 times to select the best number of hidden layers. The results are shown in Table 2. Table 2 shows that the nine hidden layer neurons appear most often. The average root mean square error (RMSE) value of the test at this time is relatively small. Subsequently, this paper randomly shuffles the dataset established in step 3 and divides it into training and test sets and inputs three models at the same time, running 50 times in a loop. Compare the modeling effects of the three models by testing whether the r and RMSE of the set are less than the threshold. The results are shown in Table 3.
From the table above, we can see that, after using the three models of BP, GA-BP, and MLR, the number of r values of the test set that are greater than 0.6 is 22, 23, and 13, respectively; the number of RMSEs in the test set that are less than the threshold is 35, 35, and 34, respectively. Based on the analysis of the numbers of r and RMSE that are less than the threshold, we can see that both the BP neural network and the GA-BP neural network are better than the MRL result, and the GA-BP neural network is slightly better than the BP neural network. To further compare the modeling effects of the three models, this paper uses K-fold cross-validation to further analyze the models. First, a part of the data is separated from the dataset established in step 3 as a test set, and the remaining data are divided into ten parts as a training set. Subsequently, the training model was repeated ten times, nine training datasets were selected for training each time, and the remaining dataset was used for validation. The test data were used to test the model effect. Finally, the test results obtained by running the three models ten times were averaged. The results are shown in Fig. 7.
We can see from Fig. 7 that, among the three models, the GA-BP neural network has the best modeling effect, and the r and RMSE values of the estimated results are 77.8% and 0.0332, respectively. Following the BP neural network, the estimated r and RMSE values are 74.6% and 0.0465, respectively. The MLR model has the worst modeling effect, with r and RMSE estimated at 50% and 0.0516, respectively. This result is consistent with the results shown in Table 3. Based on the above results, both the BP neural network and the GA-BP neural network are relatively stable and better than the MLR results. However, GA-BP neural network modeling works better. Therefore, the GA-BP neural network adopted in this paper is used for modeling. In this paper, the established dataset is randomly divided into 70%, 15%, and 15%, which are used as the training, confirmation, and test sets, respectively. According to the algorithm flow of Fig. 2, the GA-BP model is trained, and the model with the best training effect is selected after multiple pieces of training. Figure 8 shows the change in the reciprocal of sum-squared errors of the genetic algorithm and the change in the fitness function. The model training accuracy is shown in Fig. 9.
It can be seen from Fig. 8 that when the number of iterations is less than 10, the sum-squared errors and fitness functions tend to be stable. The GA finds the gene chain with the smallest error energy as much as possible at the global level, reducing the possibility that the simple BP neural network is easy to oscillate and does not converge (or locally converge). It can be seen from Fig. 9 that the r values of the training and testing parts have reached 0.86 and 0.64, respectively, and the overall modeling accuracy has reached 0.79. The test part RMSE was calculated as 0.031. Therefore, the trained model is feasible and effective. Thus, using the model trained above to perform inversion, six spatially continuous 16 days/500 m resolution NMRI product maps were obtained. Figure 10 shows six spatially continuous NMRI product maps. Figure 11 shows the NDVI, GPP, and LAI images on July 28 and October 16.
By comparing the NDVI, GPP, and LAI images at the same time, it was found that the spatial distribution of the NMRI in the test area was relatively consistent with the NDVI, GPP, and LAI. From the spatial distribution of the NMRI in Fig. 10, it can be seen that the NMRI values in the eastern and northwestern mountains of the experimental area are high, and the NMRI values in the Great Salt Lake and Great Salt Lake Desert areas in the southwest are low. This result is consistent with the geographical characteristics of the experimental area described in "Study area" section. From the time distribution, comparing Figs. 10 and 11, we found that as the weather became colder, the vegetation growth in most areas of the experimental area decreased significantly, and the water content of the vegetation decreased significantly. This change agreed with the climate characteristics of cold and dry winters in the experimental area. Through    Fig. 12. It can be seen from Fig. 12 that the r between the inversion value and the actual value of the three stations is higher than 0.87, which shows that there is a strong correlation between the inversion results and the actual value; for the RMSE, P359 is the largest among the three stations, reaching 0.049. The other two stations are less than 0.03; for MAX, P359 is the largest among the three stations, reaching 0.082, and the other two stations are less than 0.046. It can be seen that the errors obtained at the three stations are small, and there are no gross errors; thus, the inversion results obtained are valid and accurate.
It can be seen from Figs. 10 and 11 that there are still some inconsistencies in the generated spatially continuous NMRI map. For example, in the southwestern part of the map, this region has a low resolution of vegetation moisture content, and the NMRI value is relatively close, which cannot accurately reflect the changes in vegetation moisture content in this region. In the images in Fig. 10e, f, it is obvious that there is an overestimation of the vegetation water content in this area. Combining Fig. 5b, we see there is no PBO H2O station in this area. Therefore, this may be because no GNSS stations from this area are included in modeling. Ultimately, the effect and accuracy of local modeling were affected, resulting in an abnormal NMRI value in this area. Therefore, there may be a large relationship between the fusion effect and the distribution of GNSS stations. For regions with GNSS distribution, it is feasible and effective to use GA-BP neural network to establish a point-plane fusion model to finally obtain a spatially continuous NMRI product.

Conclusion and future research
The accurate and long-term monitoring of VWC is of great significance to environmental scientific research. Given the current limitations of using GNSS-IR technology and remote sensing technology to monitor VWC changes. Based on the idea of multisource data fusion, this paper proposes a method of point-surface fusion using neural networks to integrate MODIS three vegetation indexes: the NDVI, GPP, LAI, and NMRI products based on GNSS-IR. Theoretical analysis and experimental results show that (1) the NDVI, GPP, and LAI all have strong correlations with the NMRI.
(2) We compared the effect of point-surface fusion of three models: the GA-BP neural network, BP neural network, and MLR. The results show that the GA-BP neural network has the best modeling effect, and the r and RMSE values between the model estimation result and the reference value are 0.778 and 0.0332, respectively; it was followed by the BP neural network, with r and RMSE values of 0.746 and 0.0465, respectively; MLR had the poorest effect, with r and RMSE values of 0.500 and 0.0516, respectively. Therefore, the GA-BP neural network can be used to achieve a productive fusion of GNSS-IR and MODIS data. (3) The spatiotemporal variation in the 16 days/500 m resolution NMRI product obtained by point-surface fusion is consistent with that in the experimental area. The estimated results of the six spatially continuous NMRI products were extracted and compared with the reference values. The r values of both were higher than 0.87. The RMSE values were less than 0.049, and the MAX values were less than 0.082. In summary, this paper proposes that the GNSS-IR and MODIS point-surface fusion model based on the GA-BP neural network is feasible and effective, and the resulting spatially continuous NMRI products can be used to better, and more intuitively, represent the VWC changes in the region. It was found through experiments that the distribution distance between GNSS stations in the experimental area had a great impact on the modeling results. If the number of GNSS stations in the study area is small or the distance between the stations is considerable, it will affect the modeling effect of the local area. Therefore, to further verify the method of this paper, the impact of modeling after encrypting GNSS stations by the technique of spatial interpolation will be discussed in depth later. In addition, more VWC-related remote sensing products will be incorporated for modeling.