Editor's Note

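Spatio-temporal data is inseparable from the development of human society. In fields such as smart cities, 5G, the Internet of Things, and digital twins, applications such as positioning and navigation, transportation, social activities, and recommender systems keep emerging. In recent years, China has called for "promoting the deep integration of the Internet, big data, and artificial intelligence with the real economy, and building a digital China and a smart society." Urban development is an important marker of economic growth and social progress. At present, the development and construction of city information models (CIM) pose challenges for traditional spatio-temporal data management in both physical and virtual space, and research must keep refining temporal granularity and extending spatial coverage. Spatio-temporal data management mainly addresses data modeling, storage, indexing, and querying; spatio-temporal data analysis and mining involves a large body of analysis algorithms for spatio-temporal data. Using artificial intelligence to manage, analyze, and mine massive spatio-temporal data efficiently can greatly improve the efficiency of city operations and link the ubiquitous urban neural network, thereby driving the construction of smart cities and promoting the sharing and governance of spatio-temporal big data. The methods and technologies of spatio-temporal data management hold enormous social and economic value and are increasingly a research focus in academia and industry. Efficient multi-dimensional, multi-scale management, analysis, and mining of massive, multi-source, heterogeneous spatio-temporal data is therefore an important foundation of the Digital China strategy. For this issue of 数图焦点, we have selected 9 papers from China and abroad on spatio-temporal data management to share with you. We hope you enjoy reading them.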

Contents

Progress and Trends in Spatio-temporal Data Management

A Survey of Traffic Prediction: From Spatio-temporal Data to Intelligent Transportation

Intelligent transportation (e.g., intelligent traffic lights) makes our travel more convenient and efficient. With the development of mobile Internet and positioning technologies, it is reasonable to collect spatio-temporal data and then leverage these data to achieve the goal of intelligent transportation, and here traffic prediction plays an important role. In this paper, we provide a comprehensive survey on traffic prediction, ranging from the spatio-temporal data layer to the intelligent transportation application layer. We first split the whole research scope into four parts from the bottom up: spatio-temporal data, preprocessing, traffic prediction, and traffic application. We then review existing work on the four parts. First, we summarize traffic data into five types according to their differences in the spatial and temporal dimensions. Second, we focus on four significant data preprocessing techniques: map-matching, data cleaning, data storage, and data compression. Third, we focus on three kinds of traffic prediction problems (i.e., classification, generation, and estimation/forecasting). In particular, we summarize the challenges and discuss how existing methods address them. Fourth, we list five typical traffic applications. Lastly, we present emerging research challenges and opportunities. We believe that the survey can help practitioners understand existing traffic prediction problems and methods, which can further encourage them to tackle their own intelligent transportation applications.

Format: Article

Key Database Technologies for Spatio-temporal Data Management

POI Preference-Aware Top-k Optimal Sequenced Route Queries

The optimal sequenced route (OSR) query, a popular problem in route planning for smart cities, searches for a minimum-distance route that passes through several POIs in a specific order from a starting position. In reality, POIs are usually rated, which helps users make decisions. Existing OSR queries neglect the fact that POIs in the same category can have different scores, which may affect users’ route choices. In this paper, we study a novel variant of the OSR query, namely the Rating Constrained Optimal Sequenced Route (RCOSR) query, in which the rating score of each POI in the optimal sequenced route should exceed the query threshold. To process RCOSR queries efficiently, we first extend the existing TD-OSR algorithm to obtain a baseline method, called MTDOSR. To tackle the shortcomings of MTDOSR, we design a new RCOSR algorithm, namely the Optimal Subroute Expansion (OSE) algorithm. To enhance the OSE algorithm, we propose a Reference Node Inverted Index (RNII) that accelerates the distance computation of POI pairs in OSE and quickly retrieves the POIs of each category. To make full use of OSE and RNII, we further propose a new efficient RCOSR algorithm, called Recurrent Optimal Subroute Expansion (ROSE), which recurrently uses OSE to compute the current optimal route as the guiding path and updates the distances of POI pairs to guide the expansion. We then extend our techniques to handle a variation of the RCOSR query, namely the RCkOSR query. The experimental results demonstrate that the proposed algorithm significantly outperforms existing approaches.
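
To make the RCOSR query definition concrete, here is a minimal brute-force sketch in Python. It is not the MTDOSR, OSE, or ROSE algorithm from the paper: it simply enumerates every combination of qualifying POIs. Euclidean distance stands in for road-network distance, and the `POI` tuple, the function name `rcosr_brute_force`, and the sample data are all hypothetical.

```python
import math
from itertools import product
from typing import NamedTuple


class POI(NamedTuple):
    name: str
    category: str
    x: float
    y: float
    rating: float


def rcosr_brute_force(start, categories, pois, min_rating):
    """Return (distance, route) for the shortest route that visits one POI per
    category in the given order, using only POIs rated above min_rating.

    Assumption: Euclidean distance stands in for road-network distance.
    """
    # Keep only POIs whose rating exceeds the query threshold, per category.
    candidates = [
        [p for p in pois if p.category == c and p.rating > min_rating]
        for c in categories
    ]
    best = (math.inf, None)
    # Enumerate every way of picking one qualifying POI per category.
    for route in product(*candidates):
        pos, total = start, 0.0
        for p in route:
            total += math.dist(pos, (p.x, p.y))
            pos = (p.x, p.y)
        if total < best[0]:
            best = (total, route)
    return best


# Hypothetical sample data: two gas stations and two cafes.
pois = [
    POI("gas_a", "gas", 1.0, 0.0, 3.0),
    POI("gas_b", "gas", 3.0, 0.0, 4.5),
    POI("cafe_a", "cafe", 1.0, 1.0, 4.8),
    POI("cafe_b", "cafe", 3.0, 1.0, 4.2),
]
dist, route = rcosr_brute_force((0.0, 0.0), ["gas", "cafe"], pois, min_rating=4.0)
print(dist, [p.name for p in route])
```

In this toy data the nearest gas station is rated below the threshold, so the rating constraint forces a longer route than a plain OSR query would return.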

Format: Article

Optimal Sequenced Route Queries for Group Meetup

Motivated by location-based social networks, which allow people to access location-based services as a group, we study a novel variant of optimal sequenced route (OSR) queries: optimal sequenced route for group meetup (OSR-G) queries. An OSR-G query aims to find the optimal meeting POI (point of interest) such that the maximum route distance of any user to the meeting POI is minimized after each user visits a number of POIs of specific categories (e.g., gas stations, restaurants, and shopping malls) in a particular order. To process OSR-G queries, we first propose an OSR-Based (OSRB) algorithm as our baseline, which examines every POI in the meeting category and uses an existing OSR algorithm (called E-OSR) to compute the optimal route for each user to the meeting POI. To address the shortcoming of OSRB (i.e., having to examine every POI in the meeting category), we propose an upper-bound-based filtering algorithm, called the circle filtering (CF) algorithm, which exploits the circle property to filter out unpromising meeting POIs. In addition, we propose a lower-bound-based pruning (LBP) algorithm, namely LBP-SP, which exploits a shortest-path lower bound to prune unqualified meeting POIs and reduce the search space. Furthermore, we develop an approximate algorithm, namely APS, to accelerate OSR-G queries with a good approximation ratio. Finally, the experimental results show that both CF and LBP-SP outperform the OSRB algorithm and have high pruning rates. Moreover, the proposed approximate algorithm runs faster than the exact OSR-G algorithms and has a good approximation ratio.
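
The min-max objective of the OSR-G query can be illustrated with a small exhaustive sketch. It follows the spirit of the baseline described above (examine every meeting POI and compute each user's best route to it), but brute-force enumeration stands in for an actual OSR algorithm, and none of the CF, LBP-SP, or APS techniques are implemented. Euclidean distance, a category sequence shared by all users, and the function names are assumptions made for illustration.

```python
import math
from itertools import product


def shortest_sequenced_route(start, stops_by_category, end):
    """Length of the shortest path: start -> one stop per category (in order) -> end."""
    best = math.inf
    for stops in product(*stops_by_category):
        path = [start, *stops, end]
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        best = min(best, length)
    return best


def osr_g_brute_force(user_starts, stops_by_category, meeting_pois):
    """Return (max_distance, meeting_poi) that minimizes the longest user route."""
    best = (math.inf, None)
    for m in meeting_pois:
        # The cost of a meeting POI is the route length of the worst-off user.
        worst = max(
            shortest_sequenced_route(s, stops_by_category, m) for s in user_starts
        )
        if worst < best[0]:
            best = (worst, m)
    return best


# Hypothetical data: two users, one intermediate category, two meeting candidates.
users = [(0.0, 0.0), (10.0, 0.0)]
gas_stations = [(2.0, 0.0), (8.0, 0.0)]
restaurants = [(5.0, 0.0), (9.0, 0.0)]  # meeting category
result = osr_g_brute_force(users, [gas_stations], restaurants)
print(result)
```

The restaurant at (9.0, 0.0) is very close for the second user but far for the first, so the min-max objective prefers the one in the middle.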

Format: Article

Scalable and Efficient Indexing of Top-k Geo-tagged Entities in Systems of Engagement

Next-generation enterprise management systems are beginning to be developed based on the Systems of Engagement (SOE) model. We view an SOE as a set of entities. Each entity is modeled by a single parent document with dynamic embedded links (i.e., child documents) that contain multi-modal information about the entity from various networks. Since entities in an SOE are generally queried using keywords, our goal is to efficiently retrieve the top-k entities related to a given keyword-based query by considering the relevance scores of both their parent and child documents. Furthermore, we extend the aforementioned problem to the case where the entities are geo-tagged. The main contributions of this work are four-fold. First, it proposes an efficient bitmap-based approach for quickly identifying the candidate set of entities whose parent documents contain all queried keywords. A variant of this approach is also proposed to reduce memory consumption by exploiting skews in keyword popularity. Second, it proposes the two-tier HI-tree index, which uses both hashing and inverted indexes, for efficient document relevance score lookups. Third, it proposes an R-tree-based approach to extend the aforementioned approaches to the case where the entities are geo-tagged. Fourth, it performs comprehensive experiments with both real and synthetic datasets to demonstrate that the proposed schemes are indeed effective in providing good top-k result recall within acceptable query response times.
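
The general idea of a bitmap-based candidate filter can be sketched as follows. This is only an illustration of the technique named in the abstract, not the authors' data structure: it keeps one bitmap per keyword (stored as a Python integer) and intersects the bitmaps of the queried keywords. Whitespace tokenization, the function names, and the sample documents are assumptions.

```python
from collections import defaultdict


def build_keyword_bitmaps(parent_docs):
    """Map each keyword to an int whose bit i is set when entity i's parent
    document contains that keyword."""
    bitmaps = defaultdict(int)
    for entity_id, text in enumerate(parent_docs):
        for word in set(text.lower().split()):
            bitmaps[word] |= 1 << entity_id
    return bitmaps


def candidate_entities(bitmaps, query_keywords, num_entities):
    """Entity ids whose parent documents contain all queried keywords."""
    result = (1 << num_entities) - 1  # start with every entity as a candidate
    for kw in query_keywords:
        # A keyword that never occurs has an all-zero bitmap.
        result &= bitmaps.get(kw.lower(), 0)
    return [i for i in range(num_entities) if (result >> i) & 1]


# Hypothetical parent documents, one per entity.
docs = ["acme cloud storage", "acme retail stores", "cloud retail analytics"]
bitmaps = build_keyword_bitmaps(docs)
candidates = candidate_entities(bitmaps, ["acme", "cloud"], len(docs))
print(candidates)
```

Only the surviving candidates would then need relevance scoring, which is where an index such as the HI-tree described above would come in.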

Format: Article

A Tweet Geolocation Prediction Method Combining CNN and BiLSTM

Twitter is one of the most popular micro-blogging and social networking platforms, where users post their opinions, preferences, activities, thoughts, views, etc., in the form of tweets within a limit of 280 characters. To study and analyse the social behavior and activities of users across a region, it becomes necessary to identify the location of each tweet. This paper aims to predict the city-level geolocation of real-time tweets collected over a period of 30 days by combining a convolutional neural network with a bidirectional long short-term memory network, extracting features within the tweets and features associated with the tweets. We also compare our results with previous baseline models, and the findings of our experiment show a significant improvement over the baseline methods, achieving an accuracy of 92.6 with a median error of 22.4 km at city-level prediction.

Format: Article

Top-k Competitive Location Selection over Moving Objects

The location selection (LS) problem identifies an optimal site for a new facility such that its influence on given objects is maximized. With the proliferation of GPS-enabled mobile devices, LS studies have made progress for moving objects. However, the state-of-the-art LS techniques over moving objects assume that the new facility has no competitor, which is too restrictive and unrealistic for real-world business. In this paper, we study Competitive Location Selection over Moving objects (CLS-M), which takes into account competition against existing facilities in mobile scenarios. We present a competition-based influence score model to evaluate the influence of a candidate. To solve the problem, we propose an influence pruning algorithm that prunes objects that are either influenced by inferior candidates or affected by no candidate. An experimental study over two real-world datasets demonstrates that the proposed algorithm outperforms state-of-the-art LS techniques in terms of efficiency.

Format: Article

Spatio-temporal Data Mining and Analysis

OODIDA: On-board/Off-board Distributed Real-Time Data Analytics for Connected Vehicles

A fleet of connected vehicles easily produces many gigabytes of data per hour, making centralized (off-board) data processing impractical. In addition, there is the issue of distributing tasks to on-board units in vehicles and processing them efficiently. Our solution to this problem is On-board/Off-board Distributed Data Analytics (OODIDA), a platform that tackles both task distribution to connected vehicles and concurrent execution of tasks on arbitrary subsets of edge clients. Its message-passing infrastructure has been implemented in Erlang/OTP, while the end points use a language-independent JSON interface. Computations can be carried out in arbitrary programming languages. The message-passing infrastructure of OODIDA is highly scalable, facilitating the execution of large numbers of concurrent tasks.

Format: Article

An Evaluation of Modern Spatial Data Processing Libraries

Many applications today, such as Uber, Yelp, and Tinder, rely on spatial data or locations from their users. These applications and services either build their own spatial data management systems or rely on existing solutions. JTS Topology Suite (JTS), its C++ port GEOS, Google S2, ESRI Geometry API, and Java Spatial Index (JSI) are some of the spatial processing libraries that these systems build upon. These applications and services depend on the indexing capabilities available in these libraries for high-performance spatial query processing. In this work, we compare these libraries qualitatively and quantitatively based on four different spatial queries using two real-world datasets. We also compare these libraries with an open-source implementation of the Vantage Point Tree, an index structure that has been well studied in image retrieval and nearest-neighbor search algorithms for high-dimensional data. We found that Vantage Point Trees are very competitive and even outperform the aforementioned libraries in two queries.
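
For readers unfamiliar with the Vantage Point Tree, the following is a minimal sketch of the structure and its nearest-neighbor search. It is not the open-source implementation benchmarked in the paper; the names `VPNode`, `build`, and `nearest`, the use of Euclidean distance on 2-D points, and the random choice of vantage points are all assumptions made for illustration.

```python
import math
import random


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build(points, rng):
    """Recursively split points by their distance to a randomly chosen vantage point."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = sorted(math.dist(vp, p) for p in points)
    radius = dists[len(dists) // 2]  # median distance to the vantage point
    inside = [p for p in points if math.dist(vp, p) <= radius]
    outside = [p for p in points if math.dist(vp, p) > radius]
    return VPNode(vp, radius, build(inside, rng), build(outside, rng))


def nearest(node, query, best=(math.inf, None)):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if d < best[0]:
        best = (d, node.point)
    # Search the side of the boundary that contains the query first.
    if d <= node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, query, best)
    # By the triangle inequality, the far side can only hold a closer point
    # if the current best ball crosses the boundary.
    if abs(d - node.radius) < best[0]:
        best = nearest(far, query, best)
    return best


rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(pts, rng)
q = (0.5, 0.5)
print(nearest(tree, q))
```

Because the tree relies only on a distance function and the triangle inequality, the same structure works for any metric, which is why it is popular for high-dimensional nearest-neighbor search.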

Format: Article

Spatio-temporal Data Representation Learning

Personalized POI Recommendation: Spatio-temporal Representation Learning with Social Ties

erence due to separate embedding learning or network modeling. To this end, we propose a novel unified spatio-temporal neural network framework, named PPR, which leverages users’ check-in records and social ties to recommend personalized POIs for querying users through joint embedding and sequential modeling. Specifically, PPR first learns user and POI representations by jointly modeling the User-POI relation, sequential patterns, geographical influence, and social ties in a heterogeneous graph, and then models users' personalized sequential patterns using a spatio-temporal neural network based on the LSTM model for personalized POI recommendation. Furthermore, we extend PPR to an end-to-end recommendation model by jointly learning node representations and modeling users' personalized sequential preferences. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art baselines for successive POI recommendation in terms of Accuracy, Precision, Recall, and NDCG. The source code is available at: https://www.anonymous.4open.science/r/DSE-1BEC.

Format: Article

Editorial Board for This Issue

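Li Zhanhuai (李战怀)

Chair, CCF Technical Committee on Databases; Northwestern Polytechnical University

Cui Bin (崔斌)

Vice Chair, CCF Technical Committee on Databases; Peking University

Wang Xiaoli (王晓黎)

Associate Professor, Xiamen University

Li Bohan (李博涵)

Associate Professor, Nanjing University of Aeronautics and Astronautics

Fan Ju (范举)

Professor, Renmin University of China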

Previous Issues