CCF DL Focus, Issue 31: Spatio-temporal Data Management

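Advances and Trends in Spatio-temporal Data Management

A Survey of Traffic Prediction: From Spatio-temporal Data to Intelligent Transportation

Haitao Yuan & Guoliang Li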

Intelligent transportation (e.g., intelligent traffic lights) makes our travel more convenient and efficient. With the development of mobile Internet and positioning technologies, it is reasonable to collect spatio-temporal data and then leverage these data to achieve the goal of intelligent transportation, in which traffic prediction plays an important role. In this paper, we provide a comprehensive survey on traffic prediction, spanning from the spatio-temporal data layer to the intelligent transportation application layer. We first split the whole research scope into four parts from the bottom up: spatio-temporal data, preprocessing, traffic prediction and traffic application. We then review existing work on the four parts. First, we summarize traffic data into five types according to their differences in the spatial and temporal dimensions. Second, we focus on four significant data preprocessing techniques: map-matching, data cleaning, data storage and data compression. Third, we focus on three kinds of traffic prediction problems (i.e., classification, generation and estimation/forecasting). In particular, we summarize the challenges and discuss how existing methods address these challenges. Fourth, we list five typical traffic applications. Lastly, we discuss emerging research challenges and opportunities. We believe that the survey can help practitioners understand existing traffic prediction problems and methods, which can further encourage them to tackle their own intelligent transportation applications.

Format: Article

Key Database Technologies for Spatio-temporal Data Management

POI Preference-Aware Top-k Optimal Sequenced Route Queries

Huaijie Zhu, Wenbin Li, Wei Liu, Jian Yin & Jianliang Xu

The optimal sequenced route (OSR) query, a popular problem in route planning for smart cities, searches for a minimum-distance route that passes through several POIs in a specific order from a starting position. In reality, POIs are usually rated, which helps users make decisions. Existing OSR queries neglect the fact that POIs in the same category can have different scores, which may affect users’ route choices. In this paper, we study a novel variant of the OSR query, namely the Rating Constrained Optimal Sequenced Route query (RCOSR), in which the rating score of each POI in the optimal sequenced route should exceed the query threshold. To efficiently process RCOSR queries, we first extend the existing TD-OSR algorithm to propose a baseline method, called MTDOSR. To tackle the shortcomings of MTDOSR, we design a new RCOSR algorithm, namely the Optimal Subroute Expansion (OSE) algorithm. To enhance the OSE algorithm, we propose a Reference Node Inverted Index (RNII) to accelerate the distance computation of POI pairs in OSE and quickly retrieve the POIs of each category. To make full use of OSE and RNII, we further propose a new efficient RCOSR algorithm, called Recurrent Optimal Subroute Expansion (ROSE), which recurrently utilizes OSE to compute the current optimal route as the guiding path and updates the distances of POI pairs to guide the expansion. Then, we extend our techniques to handle a variation of the RCOSR query, namely the RCkOSR query. The experimental results demonstrate that the proposed algorithm significantly outperforms the existing approaches.
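To make the RCOSR definition concrete, the following is a minimal sketch and not the MTDOSR, OSE or ROSE algorithms from the paper. It assumes straight-line (Euclidean) distances rather than a road network, and the `POI` tuple and the `rcosr` function are hypothetical names introduced only for illustration. It keeps the POIs whose rating exceeds the threshold and then runs a layer-by-layer dynamic program over the category sequence.

```python
import math
from typing import NamedTuple


class POI(NamedTuple):
    name: str
    category: str
    x: float
    y: float
    rating: float


def rcosr(start, pois, categories, threshold):
    """Shortest route from `start` visiting one POI per category, in order.

    Only POIs with rating > threshold are allowed (assumption: Euclidean
    distance stands in for road-network distance).
    Returns (distance, [POI names]) or None if some category has no valid POI.
    """
    # Each frontier entry: (distance so far, current position, route names).
    frontier = [(0.0, start, [])]
    for cat in categories:
        layer = [p for p in pois if p.category == cat and p.rating > threshold]
        if not layer:
            return None
        next_frontier = []
        for p in layer:
            pos = (p.x, p.y)
            # Best way to reach p from any POI of the previous layer.
            dist, _, route = min(
                ((d + math.dist(q, pos), q, r) for d, q, r in frontier),
                key=lambda t: t[0],
            )
            next_frontier.append((dist, pos, route + [p.name]))
        frontier = next_frontier
    dist, _, route = min(frontier, key=lambda t: t[0])
    return dist, route
```

Raising the threshold can only shrink the set of admissible POIs, so the returned distance never decreases as the threshold grows.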

Format: Article

Optimal Route Queries for Group Meetup

Bo Chen, Huaijie Zhu, Wei Liu, Jian Yin, Wang-Chien Lee & Jianliang Xu

Motivated by location-based social networks, which allow people to access location-based services as a group, we study a novel variant of optimal sequenced route (OSR) queries: optimal sequenced route for group meetup (OSR-G) queries. An OSR-G query aims to find the optimal meeting POI (point of interest) such that the maximum route distance of any user to the meeting POI is minimized after each user visits a number of POIs of specific categories (e.g., gas stations, restaurants, and shopping malls) in a particular order. To process OSR-G queries, we first propose an OSR-Based (OSRB) algorithm as our baseline, which examines every POI in the meeting category and utilizes an existing OSR algorithm (called E-OSR) to compute the optimal route for each user to the meeting POI. To address the shortcoming of OSRB (i.e., the need to examine every POI in the meeting category), we propose an upper-bound-based filtering algorithm, called the circle filtering (CF) algorithm, which exploits the circle property to filter out unpromising meeting POIs. In addition, we propose a lower-bound-based pruning (LBP) algorithm, namely LBP-SP, which exploits a shortest-path lower bound to prune unqualified meeting POIs and reduce the search space. Furthermore, we develop an approximate algorithm, namely APS, to accelerate OSR-G queries with a good approximation ratio. Finally, the experimental results show that both CF and LBP-SP outperform the OSRB algorithm and have high pruning rates. Moreover, the proposed approximate algorithm runs faster than the exact OSR-G algorithms and has a good approximation ratio.
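The min-max objective of an OSR-G query can be illustrated with a brute-force sketch in the spirit of the OSRB baseline: try every meeting POI and compute each user's best sequenced route to it. This is not the authors' implementation. It assumes Euclidean distances, replaces E-OSR with a simple dynamic program, and requires every category layer to be non-empty; the function names `osr_distance` and `osr_g_bruteforce` are hypothetical.

```python
import math


def osr_distance(start, layers, end):
    """Length of the shortest route start -> one point per layer (in order) -> end.

    `layers` is a list of non-empty lists of (x, y) points, one list per category.
    """
    frontier = {start: 0.0}  # point -> shortest distance found so far
    for layer in layers:
        frontier = {
            p: min(d + math.dist(q, p) for q, d in frontier.items())
            for p in layer
        }
    return min(d + math.dist(q, end) for q, d in frontier.items())


def osr_g_bruteforce(users, meeting_pois):
    """users: list of (start, layers). Returns (meeting POI, max user distance)."""
    best = None
    for m in meeting_pois:
        # The cost of a meeting POI is the longest route among all users.
        worst = max(osr_distance(start, layers, m) for start, layers in users)
        if best is None or worst < best[1]:
            best = (m, worst)
    return best
```

Every meeting POI is evaluated in full here, which is exactly the cost that the filtering (CF) and pruning (LBP-SP) ideas described above aim to avoid.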

Format: Article

Efficient Indexing of Top-k Entities in Systems of Engagement with Extensions for Geo-tagged Entities

Anirban Mondal, Ayaan Kakkar, Nilesh Padhariya & Mukesh Mohania

Next-generation enterprise management systems are beginning to be developed based on the Systems of Engagement (SOE) model. We visualize an SOE as a set of entities. Each entity is modeled by a single parent document with dynamic embedded links (i.e., child documents) that contain multi-modal information about the entity from various networks. Since entities in an SOE are generally queried using keywords, our goal is to efficiently retrieve the top-k entities related to a given keyword-based query by considering the relevance scores of both their parent and child documents. Furthermore, we extend the aforementioned problem to incorporate the case where the entities are geo-tagged. The main contributions of this work are four-fold. First, it proposes an efficient bitmap-based approach for quickly identifying the candidate set of entities whose parent documents contain all queried keywords. A variant of this approach is also proposed to reduce memory consumption by exploiting skews in keyword popularity. Second, it proposes the two-tier HI-tree index, which uses both hashing and inverted indexes, for efficient document relevance score lookups. Third, it proposes an R-tree-based approach to extend the aforementioned approaches for the case where the entities are geo-tagged. Fourth, it performs comprehensive experiments with both real and synthetic datasets to demonstrate that our proposed schemes are indeed effective in providing good top-k result recall performance within acceptable query response times.
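The general idea behind bitmap-based candidate identification can be sketched as follows. This is an illustration under stated assumptions, not the paper's data structure: each keyword gets one bitmap (a Python integer) with bit `i` set when the parent document of entity `i` contains that keyword, and the candidate set is the bitwise AND of the bitmaps of all query keywords. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def build_keyword_bitmaps(parent_docs):
    """parent_docs: list of keyword sets; the list index is the entity id.

    Returns a dict mapping keyword -> integer bitmap over entity ids.
    """
    bitmaps = defaultdict(int)
    for entity_id, keywords in enumerate(parent_docs):
        for kw in keywords:
            bitmaps[kw] |= 1 << entity_id
    return dict(bitmaps)


def candidate_entities(bitmaps, query_keywords, num_entities):
    """Ids of entities whose parent document contains all query keywords."""
    result = (1 << num_entities) - 1  # start with every entity as a candidate
    for kw in query_keywords:
        result &= bitmaps.get(kw, 0)  # an unknown keyword matches no entity
    return [i for i in range(num_entities) if (result >> i) & 1]
```

One bitmap per keyword costs memory proportional to the number of entities even for rare keywords, which is the kind of overhead a popularity-aware variant would try to reduce.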

Format: Article

A Tweet Geolocation Prediction Method Combining CNN and BiLSTM

Rhea Mahajan & Vibhakar Mansotra

Twitter is one of the most popular micro-blogging and social networking platforms, where users post their opinions, preferences, activities, thoughts, views, etc., in the form of tweets within a limit of 280 characters. In order to study and analyse the social behavior and activities of users across a region, it becomes necessary to identify the location of a tweet. This paper aims to predict the city-level geolocation of real-time tweets collected over a period of 30 days by combining a convolutional neural network with a bidirectional long short-term memory network, extracting features within the tweets and features associated with the tweets. We have also compared our results with previous baseline models, and the findings of our experiment show a significant improvement over the baseline methods, achieving an accuracy of 92.6 with a median error of 22.4 km at city-level prediction.

Format: Article

Top-k Competitive Location Selection over Moving Objects

Ping Liu, Meng Wang, Jiangtao Cui & Hui Li

The location selection (LS) problem identifies an optimal site to place a new facility such that its influence on given objects is maximized. With the proliferation of GPS-enabled mobile devices, LS studies have made progress for moving objects. However, the state-of-the-art LS techniques over moving objects assume the new facility has no competitor, which is too restrictive and unrealistic for real-world business. In this paper, we study Competitive Location Selection over Moving objects (CLS-M), which takes into account competition against existing facilities in mobile scenarios. We present a competition-based influence score model to evaluate the influence of a candidate. To solve the problem, we propose an influence pruning algorithm to prune objects that are either influenced by inferior candidates or affected by no candidate. An experimental study over two real-world datasets demonstrates that the proposed algorithm outperforms state-of-the-art LS techniques in terms of efficiency.

Format: Article

Spatio-temporal Data Mining and Analysis

OODIDA: On-board/Off-board Distributed Real-Time Data Analytics for Connected Vehicles

Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson & Mats Jirstrand

A fleet of connected vehicles easily produces many gigabytes of data per hour, making centralized (off-board) data processing impractical. In addition, there is the issue of distributing tasks to on-board units in vehicles and processing them efficiently. Our solution to this problem is On-board/Off-board Distributed Data Analytics (OODIDA), a platform that tackles both task distribution to connected vehicles and concurrent execution of tasks on arbitrary subsets of edge clients. Its message-passing infrastructure has been implemented in Erlang/OTP, while the end points use a language-independent JSON interface. Computations can be carried out in arbitrary programming languages. The message-passing infrastructure of OODIDA is highly scalable, facilitating the execution of large numbers of concurrent tasks.

Format: Article

An Evaluation of Modern Spatial Data Processing Libraries

Varun Pandey, Alexander van Renen, Andreas Kipf & Alfons Kemper

Many applications today, such as Uber, Yelp, and Tinder, rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications and services depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four different spatial queries using two real-world datasets. We also compare these libraries with an open-source implementation of the Vantage Point Tree, an index structure that has been well studied in image retrieval and nearest-neighbor search algorithms for high-dimensional data. We found that Vantage Point Trees are very competitive and even outperform the aforementioned libraries in two queries.
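For readers unfamiliar with the Vantage Point Tree mentioned above, here is a minimal sketch of the structure and a nearest-neighbor search. It is not the open-source implementation evaluated in the paper. It assumes 2D points with Euclidean distance, and the names `VPNode`, `build` and `nearest` are hypothetical. Each node picks a vantage point, splits the remaining points at the median distance to it, and the search skips a subtree whenever the triangle inequality rules out a closer point there.

```python
import math
import random


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance used as split threshold
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build(points, rng=None):
    """Build a VP-tree from a list of coordinate tuples."""
    if rng is None:
        rng = random.Random(0)
    points = list(points)
    if not points:
        return None
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [math.dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vp, radius, build(inside, rng), build(outside, rng))


def nearest(node, query, best=(math.inf, None)):
    """Return (distance, point) of the nearest stored point to `query`."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        best = nearest(node.inside, query, best)
        # Outside points are at least (radius - d) away from the query.
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, best)
    else:
        best = nearest(node.outside, query, best)
        # Inside points are more than (d - radius) away from the query.
        if d - best[0] <= node.radius:
            best = nearest(node.inside, query, best)
    return best
```

Because the tree only needs a distance function that satisfies the triangle inequality, the same structure applies to other metric spaces, which is why it is also used for high-dimensional data.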

Format: Article

Spatio-temporal Data Representation Learning

Spatio-temporal Representation Learning with Social Ties for Personalized POI Recommendation

Shaojie Dai, Yanwei Yu, Hao Fan & Junyu Dong

erence due to separate embedding learning or network modeling. To this end, we propose a novel unified spatio-temporal neural network framework, named PPR, which leverages users’ check-in records and social ties to recommend personalized POIs for querying users through joint embedding and sequential modeling. Specifically, PPR first learns user and POI representations by jointly modeling user-POI relations, sequential patterns, geographical influence, and social ties in a heterogeneous graph, and then models users' personalized sequential patterns using a purpose-built spatio-temporal neural network based on the LSTM model for personalized POI recommendation. Furthermore, we extend PPR to an end-to-end recommendation model by jointly learning node representations and modeling users' personalized sequential preferences. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art baselines for successive POI recommendation in terms of Accuracy, Precision, Recall and NDCG. The source code is available at: https://www.anonymous.4open.science/r/DSE-1BEC.

Format: Article