Original Article | Open Access
M2C-GVIO: motion manifold constraint aided GNSS-visual-inertial odometry for ground vehicles
Satellite Navigation, volume 4, Article number: 13 (2023)
Abstract
Visual-Inertial Odometry (VIO) has been developed from Simultaneous Localization and Mapping (SLAM) as a low-cost and versatile sensor fusion approach and has attracted increasing attention in ground vehicle positioning. However, VIOs usually suffer degraded performance in challenging environments and degenerate motion scenarios. In this paper, we propose a ground-vehicle VIO algorithm based on the Multi-State Constraint Kalman Filter (MSCKF) framework. Based on a unified motion manifold assumption, we derive the measurement model of manifold constraints, including velocity, rotation, and translation constraints. We then present a robust filter-based algorithm dedicated to ground vehicles, whose key is real-time manifold noise estimation and adaptive measurement update. In addition, GNSS position measurements are loosely coupled into our approach, where the transformation between the GNSS and VIO frames is optimized online. Finally, we theoretically analyze the system observability matrix and observability measures. Our algorithm is tested in simulation and on public datasets, including the Brno Urban and Kaist Urban datasets. We compare the performance of our algorithm with classical VIO algorithms (MSCKF, VINS-Mono, R-VIO, ORB_SLAM3) and GNSS-VIO algorithms (GNSS-MSCKF, VINS-Fusion). The results demonstrate that our algorithm is more robust than the compared algorithms, with competitive position accuracy and computational efficiency.
Introduction
Accurate pose estimation is essential for many robotic applications, such as autonomous driving (Xiong et al., 2021), human navigation (Li et al., 2020), unmanned drone delivery, and automatic inspection. Vehicles are usually positioned with a Global Navigation Satellite System (GNSS) module or with LiDARs, which are also widely applied in robot navigation as useful range sensors (Nüchter et al., 2007). Among these sensors, multi-layer LiDARs are heavy and expensive. In comparison, Visual-Inertial Odometry (VIO) is more popular because it uses a small, lightweight sensor package and works well in environments where GNSS signals are denied (Sun et al., 2018).
Although VIOs generally achieve high accuracy in indoor environments, achieving the same performance for ground vehicles in outdoor environments such as urban areas is difficult. Firstly, outdoor environments typically lack reliable features, and rapidly moving vehicles present a significant challenge for feature matching. Secondly, considerable noise can be generated while driving due to the ground's unevenness and the vehicle's vibration. In addition, the restricted motion on the ground may suffer from additional unobservable Degrees of Freedom (DOF) (Wu et al., 2017). To address the inadequacy of visual and inertial sensors in the ground vehicle environment, additional observations are required to constrain the vehicle state and reduce the divergence of the positioning error.
Integrating a GNSS sensor is an alternative sensor fusion scheme for eliminating the accumulated errors (Li et al., 2021). Utilizing the derived positions from GNSS, GNSS-VIO can become fully observable and realize globally drift-free localization. However, integrating the additional sensor may increase the complexity of the navigation system, especially for a graph optimization-based framework (Gong et al., 2020; Cioffi & Scaramuzza, 2020). Some Kalman filter-based methods such as the Multi-State Constraint Kalman Filter (MSCKF) (Mourikis et al., 2007) have demonstrated both precision and computational efficiency, so we adopt a filter-based scheme for sensor fusion.
The kinematic constraint is another effective auxiliary source of update information for improving error estimation, based on the fact that the robot moves on the ground manifold (Li et al., 2012). It requires no additional sensors and has strong autonomy (Ning et al., 2021). There are various descriptions of kinematic constraints, such as the Non-Holonomic Constraint (NHC) and the plane constraint, whose models depend on specific motion manifolds, but they lack a unified representation that can easily be extended to specific constraints. Meanwhile, although the measurement model can be derived theoretically, real-world scenes may not satisfy the model assumption. Notably, the noise of motion manifold constraints is highly correlated with the vehicle's motion: the model assumption may not hold when the vehicle bumps or turns. Therefore, adaptive filtering is a viable approach to tackling this challenge. Furthermore, the effect of a new measurement on the system observability should be investigated to guarantee that it is beneficial.
This paper focuses on the fusion of VIO, GNSS, and motion manifold constraints to cope with the challenges of accuracy and time efficiency. Our contributions are as follows:

We provide a GNSS-aided filter-based VIO approach, which fuses multiple measurements including visual measurements, GNSS positions, and motion manifold constraints.

We provide a unified motion manifold measurement model, including rotation, velocity, and translation constraints, and propose an adaptive filtering algorithm to improve measurement robustness.

By defining the local observability matrix, we analyze the impact of multiple constraints on the whole system.
The remainder of the paper is organized as follows: the second section discusses related work. Then, we describe the overall framework of the proposed algorithm, detailing the motion manifold constraint-based filtering method. The fourth section analyzes the system's observability under an ideal model. In the evaluation section, our work is compared with different VIOs and GNSS-VIOs in simulation and on urban datasets. The final section concludes the paper.
Related works
Visual-inertial odometry
As a multi-sensor fusion approach, VIO is applied in harsh scenarios such as texturelessness, motion blur, and occlusion. In addition, it solves the scale estimation problem of monocular visual Simultaneous Localization and Mapping (SLAM) (Campos et al., 2021b). In general, VIO updates poses by filter-based methods (Ribeiro, 2004; Wan, 2000) or graph optimization-based methods (Grisetti et al., 2011).
Several graph optimization-based VIO algorithms significantly improve the accuracy. Representative approaches include OKVIS (Leutenegger et al., 2013) and VINS-Mono (Qin et al., 2018), which develop an Inertial Measurement Unit (IMU) pre-integration technique (Forster et al., 2015), and ORB_SLAM3 (Campos et al., 2021b), which introduces the ORB corner extraction method.
As a tightly coupled approach, MSCKF inherits the EKF filtering framework while avoiding the excessive state dimension growth of EKF-SLAM, offering both accurate positioning and light computation. MSCKF was further refined by deriving a closed-form IMU state transition equation and applying the First Estimate Jacobian (FEJ) to improve consistency (Li & Mourikis, 2013). MSCKF has also been extended to a stereo version (Sun et al., 2018), which is applied to Micro Aerial Vehicles (MAVs).
There are several works on GNSS-VIO fusion. Qin et al. (2019) propose VINS-Fusion, a loosely-coupled estimator fusing GNSS relative poses based on VINS-Mono. Gong et al. (2020) and Cioffi and Scaramuzza (2020) adopt a loosely coupled graph optimization-based approach using GNSS position and velocity. Cao et al. (2022) and Liu et al. (2021), however, tightly couple raw GNSS data into the visual-inertial system. Li et al. (2022) utilize Precise Point Positioning (PPP) in a factor graph framework and improve the accuracy through high-precision carrier phase measurements. Liu et al. propose gMSCKF and emphasize its observability-aware advantage, while Lee et al. (2022) optimize the spatiotemporal calibration between the IMU and GNSS in their proposed GAINS. These filter-based works show performance comparable to optimization-based methods, yet they do not take manifold constraints into account in challenging environments.
Kinematic constraints in vehicle localization
Pose estimation can be optimized for vehicle localization with prior constraint information on the vehicle body and the ground. Kinematic constraints based on the Ackermann steering model are combined in filter-based (Ma et al., 2019) and optimization-based (Yu et al., 2021) VIO. In LARVIO, proposed by Xiaochen et al. (2020), the application of the Zero Velocity Update (ZUPT) is judged by the movement of visual image pixels. ZUPT can effectively prevent the divergence of filtering, but it only works when the vehicle is stationary. Tian et al. (2021) recover the scale by optimizing the height difference between the vehicle and the ground. Zhang et al. (2021) reconstruct the full three-dimensional angular velocity through mathematical derivation with the constraints of a wheel odometer and the ground manifold.
Another kinematic constraint, namely the NHC, is applicable in Inertial Navigation Systems (INS) for land vehicles (Sukkarieh, 2000; Shin et al., 2002). 3D Auxiliary Velocity Updates (AVUs) encompassing the NHC and odometer-derived velocity are used to improve the accuracy when GNSS signals are blocked (Niu et al., 2007). Zhang et al. (2021) propose a novel algorithm to meet the positioning needs of all-wheel-steering robots, which extends the application of kinematic constraints. However, these works are often combined with GNSS sensors and are rarely associated with VIO. Apart from the NHC, some works enhance localization accuracy by imposing plane constraints (Wu et al., 2017; Panahandeh et al., 2012), which can correct the system solution effectively. Nevertheless, these kinematic constraints have not been incorporated into a unified manifold representation, and most of them do not consider the adjustment of measurement noise, which greatly influences the robustness of positioning in real scenarios.
Filter description
Before describing the overall filtering algorithm, we follow the notation in (Sun et al., 2018). The coordinate frames and notations involved are clarified in Table 1. The IMU frame and body frame are treated as equivalent in the following discussion. Our system overview is illustrated in Fig. 1.
State definition
The state vector of the system is defined as:

\(\textbf{X} = \begin{bmatrix} \textbf{X}_I^T&\textbf{X}_C^T \end{bmatrix}^T, \qquad \textbf{X}_I = \begin{bmatrix} {}^I_G\textbf{q}^T&\textbf{b}_g^T&\textbf{b}_a^T&{}^G\textbf{v}_I^T&{}^G\textbf{p}_I^T&{}^I_C\textbf{q}^T&{}^I\textbf{p}_C^T&{}^N_G\textbf{q}^T&{}^N\textbf{p}_G^T \end{bmatrix}^T\)
where \({}^I_G\textbf{q}\) is the rotation from the global frame to the IMU frame, \(\textbf{b}_g\) and \(\textbf{b}_a\) are the biases of the measured angular velocity and linear acceleration from the IMU, and \({}^G\textbf{v}_I\) and \({}^G\textbf{p}_I\) are the velocity and position of the IMU frame in the global frame. \({}^I_C \textbf{q}, {}^I\textbf{p}_C\) are the extrinsic parameters for online calibration, and \({}^N_G\textbf{q}\) and \({}^N\textbf{p}_G\) represent the transformation from the global frame to the ENU frame. \(\textbf{X}_C\) is the augmented state following (Mourikis et al., 2007), and M is the length of the sliding window. For accuracy and convenience, we introduce the error state vector in the description of the filter:

\(\tilde{\textbf{X}} = \begin{bmatrix} \tilde{\textbf{X}}_I^T&\tilde{\textbf{X}}_C^T \end{bmatrix}^T\)
where \(\tilde{\textbf{X}}_{I}\) and \(\tilde{\textbf{X}}_C\) are the IMU error state vector and camera error state vector, respectively. \(\tilde{\varvec{\theta }}\) comes from the first three dimensions of the error quaternion:

\(\delta \textbf{q} \approx \begin{bmatrix} \frac{1}{2}\tilde{\varvec{\theta }}^T&1 \end{bmatrix}^T\)
where the quaternion \(\delta \textbf{q}\) describes the small rotation that makes the true and estimated attitude coincide.
State propagation
For a low-cost IMU, the continuous kinematic model is given by Eq. (4), ignoring the earth rotation effect.
where \(\boldsymbol{\omega }_m\) and \(\textbf{a}_m\) are the angular velocity and linear acceleration in the local IMU frame, \({}^G\textbf{g}\) is the gravity, \(\textbf{n}_{g}\) and \(\textbf{n}_{a}\) are the white Gaussian noises of the gyroscope and accelerometer measurements, and \(\textbf{n}_{wg}\) and \(\textbf{n}_{wa}\) are the random walk rates of the gyroscope and accelerometer measurement biases (Sun et al., 2018). \(\mathbf{\Omega} (\boldsymbol{\omega })\) is defined as:

\(\mathbf{\Omega }(\boldsymbol{\omega }) = \begin{bmatrix} -\lfloor \boldsymbol{\omega }\times \rfloor&\boldsymbol{\omega } \\ -\boldsymbol{\omega }^T&0 \end{bmatrix}\)
After linearizing Eq. (4), we can obtain the error state propagation equation:

\(\dot{\tilde{\textbf{X}}}_{I} = \textbf{F}\tilde{\textbf{X}}_{I} + \textbf{G}\textbf{n}_{I}\)
where \(\textbf{n}_I=\begin{bmatrix}\textbf{n}_g^T&\textbf{n}_{wg}^T&\textbf{n}_a^T&\textbf{n}_{wa}^T\end{bmatrix}^T\) is the process noise, \(\textbf{F}\) is the continuous-time state transition matrix, and \(\textbf{G}\) is the input noise Jacobian, as shown in Eq. (8). The state covariance is propagated with the discrete-time state transition matrix \(\varvec{\Phi }_k\) and the noise covariance matrix \(\textbf{Q}_k\), obtained from \(\textbf{F}\) and \(\textbf{G}\). The propagated covariance is:

\(\textbf{P}_{k+1|k} = \varvec{\Phi }_k \textbf{P}_{k} \varvec{\Phi }_k^T + \textbf{Q}_k\)
where \(\textbf{P}_k\) is the state covariance matrix at time step k, and \(\textbf{P}_{k+1|k}\) is the predicted covariance matrix at time step \(k+1\) given the IMU inputs at time step k.
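As a concrete illustration, the covariance propagation step above can be sketched in a few lines (a minimal numpy sketch; the function name and the explicit re-symmetrization are our additions, not the authors' implementation):

```python
import numpy as np

def propagate_covariance(P_k, Phi_k, Q_k):
    """EKF covariance propagation: P_{k+1|k} = Phi_k P_k Phi_k^T + Q_k.

    P_k   : (n, n) state covariance at time step k
    Phi_k : (n, n) discrete-time state transition matrix
    Q_k   : (n, n) discrete-time process noise covariance
    """
    P_pred = Phi_k @ P_k @ Phi_k.T + Q_k
    # Re-symmetrize: floating-point error slowly destroys symmetry over many steps
    return 0.5 * (P_pred + P_pred.T)
```

In a real filter, \(\varvec{\Phi }_k\) and \(\textbf{Q}_k\) would be obtained by discretizing \(\textbf{F}\) and \(\textbf{G}\) over the IMU sampling interval.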
State augmentation
When a new image is detected, we add the new camera error state \(\textbf{X}_{C_{M+1}}\) to the sliding window. The covariance matrix is also augmented with the Jacobian matrix \(\textbf{J}\) of the new camera state with respect to the current state:

\(\textbf{P} \leftarrow \begin{bmatrix} \textbf{I} \\ \textbf{J} \end{bmatrix} \textbf{P} \begin{bmatrix} \textbf{I} \\ \textbf{J} \end{bmatrix}^T\)
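The block structure of the augmented covariance can be sketched as follows (a hedged numpy sketch; the function name is ours, and we assume \(\textbf{J}\) maps the full error state to the cloned camera error state):

```python
import numpy as np

def augment_covariance(P, J):
    """Covariance augmentation when a camera state is cloned.

    P : (n, n) current error-state covariance
    J : (m, n) Jacobian of the new camera error state w.r.t. the full error state
    Returns the (n+m, n+m) matrix [[P, P J^T], [J P, J P J^T]].
    """
    top = np.hstack([P, P @ J.T])        # existing state rows
    bottom = np.hstack([J @ P, J @ P @ J.T])  # new camera state rows
    return np.vstack([top, bottom])
```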
Visual measurement model
After initializing the three-dimensional coordinates of multiple observed features by triangulation, the two-dimensional projection \(\textbf{z}\) of a feature point \({}^G\textbf{p}_f\) on the normalized image plane is given by Eq. (10).
where \({}^C\textbf{p}_f = \begin{bmatrix}{}^Cx_f&{}^Cy_f&{}^Cz_f\end{bmatrix}^T\) denotes the feature position in the camera frame. Once the feature projection is estimated, the linear model of the reprojection error can be derived:
where \(\textbf{n}\) is the observation Gaussian noise. To eliminate the influence of \(\textbf{p}_f\) in measurement, we define a unitary matrix \(\textbf{V}\) whose columns form the basis of the left nullspace of \(\textbf{H}_f\) (Mourikis et al., 2007), and transform Eq. (11) into:
where \(\textbf{r}_c\) is the projected residual. The measurement update of the Kalman filter can then be performed using Eq. (12).
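The nullspace projection can be implemented with an SVD of \(\textbf{H}_f\). Below is a hedged numpy sketch (function name and rank tolerance are our choices, not the paper's code) of how the residual and the state Jacobian might be projected onto the left nullspace of \(\textbf{H}_f\):

```python
import numpy as np

def project_left_nullspace(r, H_x, H_f):
    """Eliminate the feature-position error by projecting onto the left
    nullspace of H_f: r_c = V^T r, H_c = V^T H_x (cf. Mourikis et al., 2007).

    r   : (m,)   stacked reprojection residuals
    H_x : (m, n) measurement Jacobian w.r.t. the state
    H_f : (m, 3) measurement Jacobian w.r.t. the feature position
    """
    # Full SVD: the columns of U beyond rank(H_f) span its left nullspace
    U, s, _ = np.linalg.svd(H_f, full_matrices=True)
    rank = int(np.sum(s > 1e-9))
    V = U[:, rank:]            # (m, m - rank) orthonormal nullspace basis
    return V.T @ r, V.T @ H_x
```

Since \(\textbf{V}\) has orthonormal columns, the projected noise stays white with the same covariance scale.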
In the front-end implementation, we adopt Contrast Limited Adaptive Histogram Equalization (CLAHE) (Zuiderveld, 1994) to preprocess the image. This method not only enhances the image's contrast, so that the algorithm can be applied under weak light conditions, but also reduces noise, which benefits feature matching. Similar methods are applied in (Campos et al., 2021b; Qin et al., 2018). Some VIOs use the FAST detector (Sun et al., 2018; Bloesch et al., 2015) to save computing resources; however, feature points such as ORB corners can achieve more accurate estimation if the vehicle has aggressive motion in outdoor scenes. Finally, the KLT optical flow algorithm (Lucas & Kanade, 1997) and the RANSAC method (Fischler & Bolles, 1981) are adopted to track the features and eliminate outliers.
Motion manifold measurement model
To describe the pose constraints of a robot on the ground, we model the ground as the general motion manifold (Zhang et al., 2021):
where \(\textbf{p}\) is the position of the ground robot. Meanwhile, when the ground robot is on the manifold, the z-axis in the global frame is collinear with the gradient of the motion manifold. Therefore the following equation holds:
Based on the above definitions and assumptions, some state constraints can be obtained.
Velocity constraints
When the ground vehicle is in close contact with the ground, the following equations can be derived from Eqs. (13) and (14):
A similar equation \({}^bv_y = 0\) holds since the vehicle also stays on the motion manifold in the cross-track direction (y-axis). Assuming that the constraint in Eq. (15) is stochastic because of deviations in the real world, the velocity measurement model can be obtained:
where \(\textbf{r}_{vel}\) and \(\textbf{H}_{vel}\) are the residual and measurement Jacobian, respectively, \({\varvec{\Lambda}} _1 = \begin{bmatrix} \textbf{e}_2&\textbf{e}_3\end{bmatrix}^T\), \(\textbf{n}_n\) is the observation noise, and the body frame is equivalent to the IMU frame. The noise setting of the velocity constraints depends on the current motion, which is elaborated in the adaptive filtering section.
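The residual of this constraint can be sketched as follows (a minimal numpy sketch; the function name is ours, and we assume, as in the text, that the body x-axis is the forward direction so that \({\varvec{\Lambda}}_1\) selects the y and z components):

```python
import numpy as np

# Selection matrix Lambda_1 = [e_2 e_3]^T, picking the lateral (y) and
# vertical (z) body-axis velocity components
LAMBDA_1 = np.array([[0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

def velocity_constraint_residual(R_IG, v_G):
    """Residual of the manifold velocity constraint ^b v_y = ^b v_z = 0.

    R_IG : (3, 3) rotation from the global frame to the IMU/body frame
    v_G  : (3,)   velocity expressed in the global frame
    Returns the 2-vector residual r_vel = 0 - Lambda_1 (R_IG v_G).
    """
    v_b = R_IG @ v_G          # velocity expressed in the body frame
    return -LAMBDA_1 @ v_b
```

The corresponding Jacobian blocks w.r.t. attitude and velocity errors are omitted here for brevity.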
A special scenario occurs when no motion is detected, which means \(\textbf{p}\) in Eq. (13) is constant and its first derivative (velocity) and second derivative (acceleration) are zero. Thus we apply Eq. (17) as a pseudo-measurement of the robot velocity and IMU acceleration bias:
where \(\textbf{r}_z\) is the velocity and acceleration bias residual, and \(\textbf{a}_m\) is the IMU acceleration measurement. Following Eq. (17), the corresponding measurement Jacobians are given by:
In the proposed scheme, we detect the static scene by the average moving pixel distance of features and the relative displacement of GNSS position measurements between two image frames. The constraint is enabled only when the calculated values are less than certain thresholds, such as 0.1 pixels for visual features or 0.005 m for GNSS measurements, which depend on the image size and the noise of the GNSS measurements.
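This static-scene test can be sketched directly from the two criteria above (a minimal numpy sketch; the function name is ours, and the thresholds are the example values quoted in the text):

```python
import numpy as np

PIXEL_THRESH = 0.1    # average feature displacement threshold (pixels)
GNSS_THRESH = 0.005   # GNSS displacement threshold between frames (metres)

def is_static(prev_pts, curr_pts, gnss_prev, gnss_curr):
    """Detect a static scene from tracked features and GNSS positions.

    prev_pts, curr_pts  : (K, 2) matched feature pixel coordinates
    gnss_prev, gnss_curr: (3,)   consecutive GNSS position fixes (metres)
    """
    avg_pix = np.mean(np.linalg.norm(curr_pts - prev_pts, axis=1))
    gnss_disp = np.linalg.norm(gnss_curr - gnss_prev)
    return bool(avg_pix < PIXEL_THRESH and gnss_disp < GNSS_THRESH)
```

When it returns true, the zero-velocity pseudo-measurement of Eq. (17) would be applied.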
Rotation & translation constraints
Assuming that the vehicle is running on the plane \(\pi\) (Wu et al., 2017), the motion manifold can be specified as:
where \(\textbf{A}= \begin{bmatrix}a_1&a_2&a_3\end{bmatrix}\). Furthermore, if \(\pi\) is set as the initial x-y plane of the global frame (\(a_0 = a_1 = a_2 = 0\)), the following equations hold:
where \(\Lambda _2 = \begin{bmatrix} \textbf{e}_1&\textbf{e}_2\end{bmatrix}^T\). Thus a rotational roll-pitch constraint and a translation constraint on the z-axis can be obtained:
where \(\textbf{z}_{rot}\) is the estimated roll and pitch, and \(z_{tran}\) is the estimated translation on the z-axis. The corresponding measurement Jacobians are given by:
where \({}^\pi _G\textbf{R}\) is an identity matrix if \(\pi\) is the initial x-y plane of the global frame.
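These two residuals can be sketched as follows (a hedged numpy sketch, assuming \(\pi\) is the initial x-y plane; the function name and the roll/pitch extraction convention, a Z-Y-X Euler decomposition, are our choices):

```python
import numpy as np

def plane_constraint_residuals(R_GI, p_G):
    """Residuals of the plane rotation and translation constraints.

    R_GI : (3, 3) rotation from the IMU frame to the global frame
    p_G  : (3,)   IMU position in the global frame
    On the plane, roll, pitch, and the z-coordinate should all be zero.
    """
    # Roll and pitch from the third row of R_GI (Z-Y-X Euler convention)
    roll = np.arctan2(R_GI[2, 1], R_GI[2, 2])
    pitch = -np.arcsin(np.clip(R_GI[2, 0], -1.0, 1.0))
    r_rot = -np.array([roll, pitch])   # measurement z_rot is zero on the plane
    r_tran = -p_G[2]                   # z-axis translation should be zero
    return r_rot, r_tran
```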
GNSS position measurement model
Before constructing the GNSS measurement model, the frame transformation \({}_G^N\textbf{R}\) and \({}^N\textbf{p}_G\) are required. We initialize the transformation with a simple method.
We start from the GNSS position measurement:
The origin of the ENU frame is set as the first position solution from the GNSS sensor. We collect a sequence of VIO positions {\({}^G\textbf{p}_{I,1}, {}^G\textbf{p}_{I,2},..., {}^G\textbf{p}_{I,n}\)} and GNSS positions {\({}^N\textbf{p}_{\rm{gnss,1}}, {}^N\textbf{p}_{\rm{gnss,2}},..., {}^N\textbf{p}_{\rm{gnss,n}}\)} over a suitable traveling distance. By differencing the positions, we can eliminate \({}^N\textbf{p}_{G}\):
We initialize the rotation \({}^N_G\textbf{R}\) with the yaw angle by solving Eq. (25), and then the translation \({}^N\textbf{p}_G\) can be initialized following Eq. (24). Note that we do not require a least-squares method as Lee et al. (2020) do, since this simple initialization is reliable enough for ground vehicles, as demonstrated in the experiment section.
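The yaw initialization by position differencing can be sketched as follows (a hedged numpy sketch; the function name and the per-step heading averaging are our interpretation of the procedure described above):

```python
import numpy as np

def init_yaw(p_vio, p_gnss):
    """Initialize the yaw between the VIO (global) frame and the ENU frame
    by differencing two synchronized horizontal position sequences.

    p_vio, p_gnss : (n, 2) horizontal positions sampled at the same times
    Returns the yaw angle (rad) of the rotation from the global to ENU frame.
    """
    d_vio = np.diff(p_vio, axis=0)    # differencing eliminates ^N p_G
    d_gnss = np.diff(p_gnss, axis=0)
    # Per-step heading difference between the two trajectories
    yaw = (np.arctan2(d_gnss[:, 1], d_gnss[:, 0])
           - np.arctan2(d_vio[:, 1], d_vio[:, 0]))
    # Wrap to (-pi, pi] before averaging over the window
    yaw = np.arctan2(np.sin(yaw), np.cos(yaw))
    return float(np.mean(yaw))
```

With the yaw fixed, the translation \({}^N\textbf{p}_G\) would follow from any corresponding position pair, as in Eq. (24).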
The subsequent position measurements are converted from the Earth-Centered Earth-Fixed (ECEF) frame to the ENU frame to construct the hybrid measurements. We loosely couple the GNSS position output in the measurement model. The measurement residual calculated in the ENU frame is given by:
with the measurement Jacobian matrix:
In Eq. (26), the global location is directly used as the position measurement at each time step. The pose estimation will be corrected by the motion manifold constraints in GNSS-denied environments where GNSS measurements may be inaccurate.
Adaptive filtering
For various scenarios, setting a fixed manifold constraint noise matrix in Eq. (16) is unrealistic and may worsen the pose estimation. We adopt an adaptive strategy to exploit the information in the manifold constraints. Note from Eq. (15) that the derivative of the motion manifold is equivalent to the velocity in the body frame. Given a sequence of body velocities \({}^b\textbf{v}_i, i = 1,2,...,n\), we define the real-time observation noise matrix as:
where \(\textbf{z}_{n} = [{}^bv_y \quad {}^bv_z]^T\). In Eq. (28), \({}^bv_y\) and \({}^bv_z\) are assumed to be independent. We use the Root Mean Square (RMS) as the criterion for evaluating \(\hat{\textbf{R}}_k\), assuming that the expectations of \({}^b v_y\) and \({}^b v_z\) are zero:
Therefore, we propose an adaptive filtering algorithm for the motion manifold, as shown in Algorithm 1. For each iteration, we check four conditions, \(C_1\) to \(C_4\). \(C_1\) means that the manifold constraints are not required under low-speed motion, and \(C_2\) determines whether the current state conforms to the manifold assumption in Eq. (15). \(C_3\) is the most critical: we take the RMS of \({}^G\textbf{v}_b\) computed in Eq. (29) as the evaluation standard of the filtering effect. In \(C_4\) we take the feature matching ratio \(r_f\) in the front-end as the criterion for the quality of visual observations; manifold constraints are enabled when \(r_f\) is low enough, which indicates that the visual observations may have large deviations. These four conditions guarantee both the effectiveness and precision of the pose estimation. Intuitively, the RMS of \({}^G\textbf{v}_b\) should be reduced after the measurement update is completed. In the next stage of the hybrid measurement update (visual update and manifold constraint), the RMS is set as the new observation noise, as shown in Fig. 2. A Mahalanobis distance test (chi-square test) is employed for all the measurements to detect and eliminate potential outliers.
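The RMS-based noise estimate at the heart of this strategy can be sketched as follows (a hedged numpy sketch; the class name, window size, and variance floor are our assumptions, and the \(C_1\)-\(C_4\) gating and chi-square test are omitted):

```python
import numpy as np
from collections import deque

class AdaptiveManifoldNoise:
    """Sliding-window RMS estimate of the manifold constraint noise.

    Keeps recent body-frame velocities and sets the observation variance
    of [v_y, v_z] to their mean squared value, treating the two
    components as independent and zero-mean (cf. Eqs. (28)-(29)).
    """

    def __init__(self, window=50, min_var=1e-4):
        self.buf = deque(maxlen=window)
        self.min_var = min_var   # floor keeping R_k positive definite

    def update(self, v_body):
        # Store only the constrained components [v_y, v_z]
        self.buf.append(np.asarray(v_body, dtype=float)[1:3])

    def noise_matrix(self):
        z = np.array(self.buf)
        var = np.maximum(np.mean(z ** 2, axis=0), self.min_var)
        return np.diag(var)
```

After each hybrid update, the recomputed RMS would replace the previous observation noise for the next manifold constraint update.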
Observability analysis
Observability matrix
Observability reveals whether the information provided by the sensor measurements is sufficient to estimate the states without ambiguity (Huai & Huang, 2018). Since MSCKF and EKF-SLAM use the same visual measurements and linearization operation, we can perform the observability analysis from the perspective of EKF-SLAM, which has the same observability properties as MSCKF. Among the estimated states of the proposed algorithm, the IMU biases can be proved observable (Kelly & Sukhatme, 2011). Considering that the measurement model in Eq. (17) only takes effect in static scenes, we ignore it in the following analysis. Lee et al. (2020) proved that GNSS-VIO has the same four unobservable directions as VIO if the states are estimated in the VIO frame (i.e., the global frame), while it is fully observable if the states are estimated in the ENU frame. Hence we focus on the impact of the motion manifold constraints on observability. For simplicity, we only analyze the three state variables of rotation \({}^I_G\textbf{q}\), velocity \({}^G\textbf{v}_I\), and translation \({}^G\textbf{p}_I\). The observability matrix over the period [\(m,m+n\)] is defined as:
where \(\textbf{H}_k\) is the measurement matrix at time step k, and \(\varvec{\Phi }_k\) is the state transition matrix from k to \(k+1\). At time step k, the state vector \(\textbf{X}_I\) is defined as in Eq. (31) if there are K features observed by the camera in the time interval [\(m, m+n\)] (Fig. 3):
The observation of the system includes visual measurements and motion manifold constraints. We denote \(\textbf{H}_v = \textbf{H}_{vel}\) and \(\textbf{H}_p = \begin{bmatrix}\textbf{H}_{rot}^T&\textbf{H}_{tran}^T\end{bmatrix}^T\) for convenience. The measurement matrix is a stack of three Jacobian matrices:
The Jacobian \(\textbf{H}_{c,k}\) contains K block rows, the ith of which has the form:
while \(\textbf{H}_{v,k}\) and \(\textbf{H}_{p,k}\) are given in Eq. (34) and Eq. (35):
Similarly, the block row \(\varvec{\varTheta}_{k}\) of the observability matrix encompasses three parts:
\({\varvec{\varTheta}}_{c,{k}}\) follows the conclusion in (Li & Mourikis, 2013), while \({\varvec{\varTheta}}_{v,k}\) and \({\varvec{\varTheta}}_{p,{k}}\) are derived in Eq. (37):
where \({}^G\textbf{v}_g={}^{G}\textbf{v}_{I_m}+{}^{G}\textbf{g}\Delta t\), \(\Delta \textbf{p}={}^G\textbf{p}_{I_k}-{}^G\textbf{p}_{I_m}-{}^G\textbf{v}_{I_m}\Delta t-\frac{1}{2}{}^G\textbf{g}\Delta t^2\), and \(\Delta t = \Delta t_{k-1}+...+\Delta t_m\). According to (Li et al., 2012), the nullspace matrix of \(\varvec{\varTheta}_{c,k}\) is given by:
It can be verified that \(\mathcal {N}\) is also the nullspace of \(\varvec{\varTheta}_{v,k}\), i.e., \(\varvec{\varTheta}_{v,k}\cdot \mathcal {N}=\textbf{0}\), which means the velocity constraints ideally do not change the unobservable dimensions. For the rotation and translation constraints, however, \(\varvec{\varTheta}_{p,k}\cdot \mathcal {N} \ne \textbf{0}\); specifically, the z-axis position becomes observable because of the plane assumption.
Observability measure
By establishing a local observability matrix over a short time window, we can obtain the observability and the observable dimensions of the system (Butcher et al., 2017). The previous analysis indicates that the observable dimension of the system does not change, but the observability of the originally observable dimensions can still be measured quantitatively. The Gramian matrix, a common measure of observability, is introduced to evaluate the observability of the EKF system with motion manifold constraints.
The Gramian matrix measures the sensitivity of the output with respect to the initial condition. The discrete-time Gramian matrix is defined as:
We take the observability index (OI) in Eq. (40) as the observability measure:
Motion manifold constraints can enhance observability, which will be verified in the following section.
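The construction of the local Gramian can be sketched as follows (a hedged numpy sketch; the function names are ours, and since Eq. (40) is not reproduced here we use the smallest singular value of the Gramian as an illustrative scalar index rather than the paper's exact OI):

```python
import numpy as np

def observability_gramian(Phis, Hs):
    """Discrete-time local observability Gramian over a short window:
    W = sum_k Phi_{k,0}^T H_k^T H_k Phi_{k,0}.

    Phis : list of (n, n) one-step state transition matrices Phi_k
    Hs   : list of (m_k, n) measurement matrices H_k
    """
    n = Phis[0].shape[0]
    W = np.zeros((n, n))
    Phi_k0 = np.eye(n)                 # Phi_{0,0} = I
    for Phi, H in zip(Phis, Hs):
        W += Phi_k0.T @ H.T @ H @ Phi_k0
        Phi_k0 = Phi @ Phi_k0          # accumulate Phi_{k+1,0}
    return W

def observability_index(W):
    """Smallest singular value of the Gramian; larger values indicate
    better-conditioned (stronger) observability of the weakest direction."""
    return float(np.linalg.svd(W, compute_uv=False)[-1])
```

Stacking the manifold constraint Jacobians into \(\textbf{H}_k\) enlarges the Gramian, which is consistent with the higher observability index reported for MSCKF+MC in the evaluation.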
Evaluation
To validate the effectiveness of our proposed approach, we conduct simulation and real-world tests on different public datasets. Our experiments are conducted on a laptop with an Intel(R) Core(TM) i7-10710U CPU @ 1.10 GHz and 16 GB RAM.
Simulation
In the simulation test, we assume that the car moves along a circle on a plane. A total of 200 landmarks are scattered around the true trajectory, as shown in Fig. 4. The car loops three times around a circle with a radius of 100 m. The landmarks are generated along an inside cylinder wall with a radius of 90 m and an outside cylinder wall with a radius of 110 m. The camera captures landmarks in its field of view within a range of 20 m. We generate sensor measurements with noise according to the trajectory and landmarks. The yaw angle between the ENU and global frames is set to \(10^{\circ }\). Specific configurations are shown in Table 2.
Position accuracy
The forward direction of the car corresponds to the y-axis of the IMU frame and the camera frame, so the body velocity along the x-axis and z-axis is constrained. In the simulation test, four algorithms are compared: standard MSCKF (benchmark), MSCKF+MC (manifold constraints), MSCKF+GNSS, and MSCKF+GNSS+MC. The pose errors and trajectories are illustrated in Figs. 5 and 7. From Fig. 7a and b, we notice that the drift of the pose error is bounded in MSCKF+GNSS and MSCKF+GNSS+MC due to the global measurements. The pose error is reduced when introducing manifold constraints, especially in the yaw and z-axis position estimation (see Fig. 7c and d), which confirms the effectiveness of the manifold constraint. Table 3 lists the total Average RMSE (ARMSE) and the Reduction Rate (RR) compared to MSCKF. MSCKF+GNSS+MC achieves the highest reduction rate for the position error, outperforming the algorithms that combine only GNSS or only manifold constraints.
Observability measures
We compare the observability index in Eq. (40) by calculating the Gramian matrix at each time step; the result is shown in Fig. 6. The observability index of MSCKF+MC is higher than that of MSCKF, indicating better observability.
Real world test
VIO+Manifold constraints
We compare the performance of motion manifold constraint-aided VIO without GNSS measurements. Both filter-based (MSCKF, R-VIO (Huai & Huang, 2018), and our algorithm without the adaptive filtering strategy) and optimization-based methods (VINS-Mono, ORB_SLAM3) are included in our comparison test on the Brno Urban Dataset (Ligocki et al., 2020), which provides data from four WUXGA RGB cameras with 1920\(\times\)1200 pixels and a 400 Hz IMU. We intercept parts of the dataset lasting about three minutes each, including parking, turning, loops, bumps, and other scenes. The scenarios of sequences Brno2_1_10_1 and Brno2_1_10_3 are at night, as shown in Fig. 8. For each trial, the estimated trajectories are aligned with the ground truth trajectories using Umeyama's method (Umeyama, 1991). Table 4 summarizes the errors of the aligned trajectories on the Brno Urban datasets, and Fig. 9 shows the estimated trajectories on a typical sequence, Brno1_2_1_2.
It can be noted that motion manifold constraints effectively reduce the divergence of pose estimation under special scenarios. In night scenarios (Brno2_1_10_1, Brno2_1_10_3) with scarce visual information, VINS-Mono and ORB_SLAM3 perform worse due to the difficulty in converging the visual error, while our work compensates well for this disadvantage. In the textureless and weak-light scenario (Brno1_2_6_1), our method also achieves a small gain with the assistance of motion manifold constraints compared with MSCKF. Some sequences (Brno1_1_6_1, Brno2_1_10_1) incorporate drastic driving motions like bumps or fast turns, where a fixed kinematic noise cannot guarantee a positive constraint effect and adopting an adaptive strategy is more reasonable. In general, our algorithm handles different types of motion better and reports more accurate results than the other algorithms in most cases.
VIO+GNSS+Manifold constraints
We use the Kaist Urban dataset (Jeong et al., 2019) to evaluate the performance of VIO with GNSS measurements. The Kaist Urban dataset is collected with a 100 Hz IMU, a 10 Hz stereo camera, a commercial GPS sensor measuring positions, and a high-precision RTK receiver. We select the left camera, IMU, and GPS sensors for our test, as shown in Fig. 11. The ground truth is originally generated by graph SLAM; we transform its local coordinates into the ECEF frame by aligning it with RTK data in fixed status and further transform the coordinates to the ENU frame. We compare our algorithm with pure GNSS, VINS-Mono, MSCKF with GNSS, and VINS-Fusion (Qin et al., 2018) on five Kaist Urban sequences, among which Kaist 31 and 38 cover long distances of 10.7 km and 11 km, respectively. Note that we select the first static scene as the starting point of the trajectories because of the requirement for static initialization in MSCKF. Figure 10 shows an example of challenging sections and feature extraction in the Kaist Urban dataset. For the time efficiency comparison, we count the total back-end processing time of the algorithms for each trial and calculate the average time per image frame. The trajectories and numerical results are reported in Fig. 12 and Table 5.
It is noted that the VIO pose estimation achieves higher accuracy with GNSS measurements. Compared with VINS-Fusion, which does not explicitly initialize the ENU-to-VIO frame transformation, our algorithm demonstrates a comparative effect on both accuracy and time efficiency. In some cases (Kaist 30, Kaist 31), VINS-Fusion's performance is much worse than pure GNSS because a large deviation in the VIO pose drags down the fusion accuracy with GNSS. Compared with GNSS-MSCKF, our proposed algorithm shows a moderate decrease in RMSE on Kaist 31 and Kaist 38, which demonstrates the effectiveness of the motion manifold constraints. In terms of back-end processing time, optimization methods like VINS-Mono and VINS-Fusion are much more time-consuming than the filtering methods (GNSS-MSCKF and our proposed method). Although our method consumes more time (about 1.2 ms per frame) than GNSS-MSCKF, the extra time is acceptable. To sum up, our algorithm strikes a better balance between accuracy and time efficiency.
Conclusion
This paper presents a GNSS-aided VIO which loosely couples the GNSS position measurements. We formulate a unified framework on the motion manifold to represent multiple motion manifold constraints, which are specified under additional conditions. To address the challenge of time-varying noise for ground vehicles, we propose an adaptive filtering method for motion manifold constraints and derive the observability of the system, including the observability matrix for the motion manifold constraints. Simulation and real-world tests demonstrate the effectiveness of our system. In the future, we will introduce more robust visual estimation methods and achieve better scenario adaptability.
Availability of data and materials
The datasets used and analysed in this study are available from the corresponding author on reasonable request.
References
Bloesch, M., Omari, S., Hutter, M., & Siegwart, R. (2015). Robust visual inertial odometry using a direct EKF-based approach. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 298–304).
Butcher, E. A., Wang, J., & Lovell, T. A. (2017). On Kalman filtering and observability in nonlinear sequential relative orbit estimation. Journal of Guidance, Control, and Dynamics, 40(9), 2167–2182.
Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M., & Tardós, J. D. (2021). ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6), 1874–1890.
Cao, S., Lu, X., & Shen, S. (2022). GVINS: Tightly coupled GNSS-visual-inertial fusion for smooth and consistent state estimation. IEEE Transactions on Robotics, 38(4), 2004–2021.
Choi, Y., Kim, N., Hwang, S., Park, K., Yoon, J. S., An, K., & Kweon, I. S. (2018). KAIST multi-spectral day/night data set for autonomous and assisted driving. IEEE Transactions on Intelligent Transportation Systems, 19(3), 934–948.
Cioffi, G., & Scaramuzza, D. (2020). Tightly-coupled fusion of global positional measurements in optimization-based visual-inertial odometry. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5089–5095). IEEE.
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Forster, C., Carlone, L., Dellaert, F., & Scaramuzza, D. (2015). IMU preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation. Georgia Institute of Technology.
Gong, Z., Liu, P., Wen, F., Ying, R., Ji, X., Miao, R., & Xue, W. (2020). Graph-based adaptive fusion of GNSS and VIO under intermittent GNSS-degraded environment. IEEE Transactions on Instrumentation and Measurement, 70, 1–16.
Grisetti, G., Kümmerle, R., Strasdat, H., & Konolige, K. (2011). g2o: A general framework for (hyper) graph optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 9–13).
Huai, Z., & Huang, G. (2018). Robocentric visual-inertial odometry. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 6319–6326). IEEE.
Jeong, J., Cho, Y., Shin, Y.-S., Roh, H., & Kim, A. (2019). Complex urban dataset with multi-level sensors from highly diverse urban environments. The International Journal of Robotics Research, 38(6), 642–657.
Kelly, J., & Sukhatme, G. S. (2011). Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration. The International Journal of Robotics Research, 30(1), 56–79.
Lee, W., Eckenhoff, K., Geneva, P., & Huang, G. (2020). Intermittent GPS-aided VIO: Online initialization and calibration. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 5724–5731). IEEE.
Lee, W., Geneva, P., Yang, Y., & Huang, G. (2022). Tightly-coupled GNSS-aided visual-inertial localization. In 2022 International Conference on Robotics and Automation (ICRA) (pp. 9484–9491). IEEE.
Leutenegger, S., Furgale, P., Rabaud, V., Chli, M., & Siegwart, R. (2013). Keyframe-based visual-inertial SLAM using nonlinear optimization. In Proceedings of Robotics: Science and Systems.
Li, J., Pei, L., Zou, D., Xia, S., Wu, Q., Li, T., Sun, Z., & Yu, W. (2020). Attention-SLAM: A visual monocular SLAM learning from human gaze. IEEE Sensors Journal, 21(5), 6408–6420.
Li, M., & Mourikis, A. I. (2013). High-precision, consistent EKF-based visual-inertial odometry. The International Journal of Robotics Research, 32(6), 690–711.
Li, T., Pei, L., Xiang, Y., Yu, W., & Truong, T.-K. (2022). P³-VINS: Tightly-coupled PPP/INS/visual SLAM based on optimization approach. IEEE Robotics and Automation Letters, 7(3), 7021–7027.
Li, X., Wang, X., Liao, J., Li, X., Li, S., & Lyu, H. (2021). Semi-tightly coupled integration of multi-GNSS PPP and S-VINS for precise positioning in GNSS-challenged environments. Satellite Navigation, 2(1), 1–14.
Li, Y., Niu, X., Zhang, Q., Cheng, Y., & Shi, C. (2012). Observability analysis of non-holonomic constraints for land-vehicle navigation systems. In Proceedings of the 25th International Technical Meeting of the Satellite Division of the Institute of Navigation (ION GNSS 2012) (pp. 1521–1529).
Ligocki, A., Jelinek, A., & Zalud, L. (2020). Brno Urban Dataset: The new data for self-driving agents and mapping tasks. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3284–3290). IEEE.
Liu, J., Gao, W., & Hu, Z. (2021). Optimization-based visual-inertial SLAM tightly coupled with raw GNSS measurements. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 11612–11618). IEEE.
Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence.
Ma, F., Shi, J., Yang, Y., Li, J., & Dai, K. (2019). ACK-MSCKF: Tightly-coupled Ackermann multi-state constraint Kalman filter for autonomous vehicle localization. Sensors, 19(21), 4816.
Mourikis, A. I., & Roumeliotis, S. I. (2007). A multi-state constraint Kalman filter for vision-aided inertial navigation. In ICRA, vol. 2, p. 6.
Ning, Y., Sang, W., Yao, G., Bi, J., & Wang, S. (2021). GNSS/MIMU tightly coupled integrated with improved multi-state ZUPT/DZUPT constraints for a land vehicle in GNSS-denied environments. International Journal of Image and Data Fusion, 12(3), 226–241.
Niu, X., Nassar, S., & El-Sheimy, N. (2007). An accurate land-vehicle MEMS IMU/GPS navigation system using 3D auxiliary velocity updates. Navigation, 54(3), 177–188.
Nüchter, A., Lingemann, K., Hertzberg, J., & Surmann, H. (2007). 6D SLAM: 3D mapping outdoor environments. Journal of Field Robotics, 24(8–9), 699–722.
Panahandeh, G., Zachariah, D., & Jansson, M. (2012). Exploiting ground plane constraints for visual-inertial navigation. In Proceedings of the 2012 IEEE/ION Position, Location and Navigation Symposium (pp. 527–534). IEEE.
Qin, T., Li, P., & Shen, S. (2018). VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 34(4), 1004–1020.
Qin, T., Cao, S., Pan, J., & Shen, S. (2019). A general optimization-based framework for global pose estimation with multiple sensors. arXiv preprint arXiv:1901.03642.
Ribeiro, M. I. (2004). Kalman and extended Kalman filters: Concept, derivation and properties. Institute for Systems and Robotics, 43, 46.
Shin, E.-H., & El-Sheimy, N. (2002). Accuracy improvement of low cost INS/GPS for land applications. In Proceedings of the 2002 National Technical Meeting of the Institute of Navigation (pp. 146–157).
Sukkarieh, S. (2000). Low cost, high integrity, aided inertial navigation systems for autonomous land vehicles. PhD thesis, The University of Sydney, Australia.
Sun, K., Mohta, K., Pfrommer, B., Watterson, M., Liu, S., Mulgaonkar, Y., Taylor, C. J., & Kumar, V. (2018). Robust stereo visual inertial odometry for fast autonomous flight. IEEE Robotics and Automation Letters, 3(2), 965–972.
Tian, R., Zhang, Y., Zhu, D., Liang, S., Coleman, S., & Kerr, D. (2021). Accurate and robust scale recovery for monocular visual odometry based on plane geometry. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 5296–5302). https://doi.org/10.1109/ICRA48506.2021.9561215
Umeyama, S. (1991). Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 376–380.
Wan, E. A., & Van Der Merwe, R. (2000). The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373) (pp. 153–158). IEEE.
Wu, K. J., Guo, C. X., Georgiou, G., & Roumeliotis, S. I. (2017). VINS on wheels. In 2017 IEEE International Conference on Robotics and Automation (ICRA) (pp. 5155–5162). IEEE.
Xiaochen, Q., Zhang, H., & Wenxing, F. (2020). Lightweight hybrid visual-inertial odometry with closed-form zero velocity update. Chinese Journal of Aeronautics, 33(12), 3344–3359.
Xiong, L., Kang, R., Zhao, J., Zhang, P., Xu, M., Ju, R., Ye, C., & Feng, T. (2021). G-VIDO: A vehicle dynamics and intermittent GNSS-aided visual-inertial state estimator for autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23(8), 11845–11861.
Yu, Z., Zhu, L., & Lu, G. (2021). VINS-Motion: Tightly-coupled fusion of VINS and motion constraint. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 7672–7678). IEEE.
Zhang, M., Zuo, X., Chen, Y., Liu, Y., & Li, M. (2021). Pose estimation for ground robots: On manifold representation, integration, reparameterization, and optimization. IEEE Transactions on Robotics, 37(4), 1081–1099.
Zhang, Z., Niu, X., Tang, H., Chen, Q., & Zhang, T. (2021). GNSS/INS/ODO/wheel angle integrated navigation algorithm for an all-wheel steering robot. Measurement Science and Technology, 32(11), 11512.
Zuiderveld, K. (1994). Contrast limited adaptive histogram equalization. In Graphics Gems IV (pp. 474–485).
Acknowledgements
We express our thanks to Tao Li and Jie Yin, who provided the dataset, and to the Shanghai Key Laboratory of Navigation and Location Based Service, Shanghai Jiao Tong University.
Funding
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62273229, and in part by the Equipment Pre-Research Field Foundation under Grant No. 80913010303.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Hua, T., Pei, L., Li, T. et al. M2C-GVIO: motion manifold constraint aided GNSS-visual-inertial odometry for ground vehicles. Satell Navig 4, 13 (2023). https://doi.org/10.1186/s43020-023-00102-9
Keywords
 Sensor fusion
 Visual-inertial odometry
 Motion manifold constraint