Outdoor location services have matured steadily with the rapid development of the Global Navigation Satellite System (GNSS) (Liu et al., 2020). However, GNSS cannot provide indoor positioning because its signals are obstructed and attenuated by buildings. Meanwhile, indoor positioning has become increasingly important in people's daily activities, such as shopping, parking, and health monitoring. Accordingly, many scholars have conducted considerable research on indoor positioning with various techniques, such as Wi-Fi, Bluetooth, geomagnetic localization, Radio Frequency Identification (RFID), ultra-wideband, wireless local area network, computer vision, visible light communication, and Pedestrian Dead Reckoning (PDR) assisted by accelerometers and gyroscopes (He & Chan, 2016; Naser & Li, 2021; Zhuang et al., 2018; Yang et al., 2015; El-Sheimy & Li, 2021; El-Sheimy & Youssef, 2020).
Among these techniques, Wi-Fi positioning has become a research hotspot because of its mature hardware and software ecosystem, its low cost, and the fact that it requires no additional infrastructure deployment. The main Wi-Fi positioning algorithms include Access Point (AP) proximity-aware positioning (Hodes et al., 1997), fingerprint-based positioning (Zhuang et al., 2016), and trilateration based on a signal propagation model (Bahl & Padmanabhan, 2000). Of these, fingerprinting is the most widely used because it generally achieves the highest positioning accuracy.
Currently, neighbor point mismatching is a major problem in Wi-Fi fingerprint-based positioning. The traditional solution calculates the similarity between the fingerprint Received Signal Strength (RSS) vector and the observed RSS vector using various indices, such as the Euclidean distance (Kaemarungsi & Krishnamurthy, 2004), cosine similarity (Han et al., 2015), the Pearson correlation coefficient (Li et al., 2019), and others (Machaj et al., 2011). Most of these methods compute the similarity directly from the raw RSS vectors, which makes it difficult to describe the complex nonlinear relationship between signal vectors accurately. Therefore, many scholars have recently turned to Machine Learning (ML) and Deep Learning (DL) for neighbor point matching. These methods can be broadly divided into two groups. The first is supervised learning, which uses classification methods such as Random Forest (RF) (Lee et al., 2019), Decision Tree (DT) (Chanama & Wongwirat, 2018), Bayes (Chen et al., 2013), Support Vector Machine (SVM), Neural Network (NN) (Zhang et al., 2013; Esmond & Bernard, 2013), Convolutional Neural Network (CNN) (Shao et al., 2018), and other classification algorithms (Feng et al., 2014; Li et al., 2021). The second is unsupervised learning, which uses clustering methods such as K-Means (Chen et al., 2015), fuzzy clustering (Bi et al., 2018), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Deng et al., 2018).
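To make the traditional similarity indices concrete, the following Python sketch computes the Euclidean distance, cosine similarity, and Pearson correlation coefficient between two RSS vectors, and ranks reference points by Euclidean distance to select neighbor points. It is a minimal illustration under stated assumptions, not the implementation used in the cited works: the RSS vectors are assumed to be aligned to the same AP order, and the helper names (`euclidean_distance`, `cosine_similarity`, `pearson_coefficient`, `nearest_neighbors`), coordinates, and RSS values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Distance between two aligned RSS vectors (dBm); smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two RSS vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pearson_coefficient(a, b):
    """Pearson correlation of two RSS vectors; returns 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0 or std_b == 0:
        return 0.0
    return cov / (std_a * std_b)


def nearest_neighbors(observation, fingerprints, k=3):
    """Return the k reference points whose fingerprints are closest in Euclidean distance.

    `fingerprints` is a hypothetical dict mapping reference-point coordinates
    to aligned RSS vectors.
    """
    ranked = sorted(fingerprints.items(),
                    key=lambda item: euclidean_distance(observation, item[1]))
    return [point for point, _ in ranked[:k]]


# Hypothetical three-AP fingerprint database and one observed scan.
fingerprints = {(0, 0): [-45, -70, -80], (0, 3): [-60, -55, -75], (3, 0): [-72, -68, -50]}
observation = [-47, -69, -82]
print(nearest_neighbors(observation, fingerprints, k=2))  # [(0, 0), (0, 3)]
```

Each index is a fixed formula applied directly to the raw vectors, so none of them can learn a nonlinear relationship between signal vectors; this is the limitation that motivates the ML and DL approaches.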
Each of these two groups of methods has obvious strengths and weaknesses. Classification algorithms place high demands on both sample quality and sample quantity. Considering the time and labor costs of fingerprint collection, classification, especially multi-class classification, may not be suitable for fingerprint-based positioning, because training the classifier requires a large number of samples for every category. Therefore, many sample augmentation methods have been proposed, for example, crowdsourced data collection (Guo & Pun, 2019), interpolation to create new samples (Kolakowski, 2020), and DL-based sample generation, the most common form of which is the Generative Adversarial Network (GAN) (Liu & Wang, 2020; Zou et al., 2020). However, the data generated by GANs are often of poor quality, and the generative model is hard to train to convergence.
Clustering algorithms also have problems. First, their computational complexity is too high for real-time positioning. Second, clustering is better suited to zone-level localization, and its point-level localization accuracy is generally low. In addition, most clustering algorithms require the number of clusters and the initial cluster centers to be specified in advance, which makes them hard to use in practice. Finally, abnormal data affect the final result of clustering more strongly than they affect other methods.
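The need to fix the number of clusters and the initial centers in advance can be seen in a plain Lloyd's K-Means sketch. This is a generic textbook version, not the specific method of Chen et al. (2015); the function name `kmeans`, the iteration cap, and the toy two-AP RSS data are hypothetical. Each iteration compares every sample with every center, so one pass costs on the order of O(nkd) operations for n samples, k clusters, and d APs.

```python
import math


def kmeans(points, initial_centers, iterations=20):
    """Plain Lloyd's K-Means on RSS vectors (generic sketch).

    The number of clusters k is len(initial_centers): both k and the starting
    centers must be chosen by the caller before clustering begins.
    """
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (n * k distance calls).
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the arithmetic mean of its members.
        new_centers = []
        for j, center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(center)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical scans from two zones, each with RSS from two APs (dBm).
points = [[-40, -80], [-42, -78], [-75, -45], [-77, -43]]
labels, centers = kmeans(points, initial_centers=[[-40, -80], [-75, -45]])
print(labels)  # [0, 0, 1, 1]
```

Because each center is updated to the mean of its members, a single outlying scan assigned to a cluster can pull that center away from the zone it represents, consistent with the sensitivity to abnormal data noted above.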
All of the above ML-based and DL-based methods face the same problem. They use the APs' RSS values as input features, but these values are strongly tied to the location where the fingerprint was collected. As a result, a classifier trained on fingerprint data from one building cannot be used in another building, and sometimes not even on another floor. Each building or floor must therefore train and manage its own classifier, which causes several problems. The first and foremost is model deployment on servers in practical applications: a huge number of models must be deployed and updated periodically, which is costly. Another is model management: when many models reside on the cloud, they are hard to maintain effectively, and operating the whole cloud platform consumes substantial resources.
To solve the above problems, this paper proposes a novel method that exploits the differences among samples. To make full use of these differences, we adopt relative features, such as the repeated APs and the signal similarity, rather than the commonly used absolute features. The boosting algorithms eXtreme Gradient Boosting (XGBoost) and Gradient Boosting Decision Tree (GBDT) are used to train binary rather than multi-class classification models, because they are widely used in binary classification and often outperform other classifiers. Experiments show that the classifier performs well on the test datasets whether it is trained on data from the same building or from a different building.
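As an illustration of how relative features decouple the training data from any particular building's AP set, the following Python sketch turns pairs of labeled scans into feature vectors with binary labels. The exact feature set of the proposed method is not specified here, so everything in the sketch is an assumption: "repeated APs" is taken to mean APs observed in both scans, missing APs are filled with a hypothetical floor of -100 dBm, and a pair is labeled as neighbors (1) when its reference points lie within a hypothetical 2 m threshold. The names `relative_features`, `pair_label`, `build_pairs`, and `MISSING_RSS_DBM` are hypothetical.

```python
import math

# Hypothetical RSS floor used for an AP that appears in only one of the two scans.
MISSING_RSS_DBM = -100.0


def relative_features(scan_a, scan_b):
    """Hypothetical relative features for two scans (dict: AP id -> RSS in dBm)."""
    common = set(scan_a) & set(scan_b)
    union = sorted(set(scan_a) | set(scan_b))
    repeated_ap_count = len(common)
    repeated_ap_ratio = repeated_ap_count / len(union) if union else 0.0
    # Align both scans over the union of APs, filling gaps with the floor value.
    va = [scan_a.get(ap, MISSING_RSS_DBM) for ap in union]
    vb = [scan_b.get(ap, MISSING_RSS_DBM) for ap in union]
    if union:
        dot = sum(x * y for x, y in zip(va, vb))
        cosine = dot / (math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb)))
        mean_abs_diff = sum(abs(x - y) for x, y in zip(va, vb)) / len(union)
    else:
        cosine, mean_abs_diff = 0.0, 0.0
    # No feature refers to a specific AP id or coordinate, so it is building-agnostic.
    return [repeated_ap_count, repeated_ap_ratio, cosine, mean_abs_diff]


def pair_label(pos_a, pos_b, threshold_m=2.0):
    """1 if two reference points are within the (hypothetical) threshold, else 0."""
    return 1 if math.dist(pos_a, pos_b) <= threshold_m else 0


def build_pairs(samples, threshold_m=2.0):
    """Build a binary training set from a list of (position, scan) samples."""
    X, y = [], []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (pos_a, scan_a), (pos_b, scan_b) = samples[i], samples[j]
            X.append(relative_features(scan_a, scan_b))
            y.append(pair_label(pos_a, pos_b, threshold_m))
    return X, y


# Hypothetical reference samples: (x, y) in meters and a scan keyed by AP id.
samples = [
    ((0.0, 0.0), {"ap1": -40, "ap2": -70}),
    ((1.0, 0.0), {"ap1": -42, "ap2": -68, "ap3": -90}),
    ((10.0, 0.0), {"ap3": -50, "ap4": -60}),
]
X, y = build_pairs(samples)
print(y)  # [1, 0, 0]
```

The resulting `X` and `y` could then be passed to a gradient-boosting binary classifier, for example `xgboost.XGBClassifier` or scikit-learn's `GradientBoostingClassifier`, through their `fit(X, y)` methods. Because the features describe the relationship between two scans rather than absolute RSS values at named APs, a model trained on pairs from one building can, in principle, be applied to pairs from another.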