Dense Node Representation for Geolocation

Tommaso Fornaciari and Dirk Hovy
Bocconi University


Prior research has shown that geolocation can be substantially improved by augmenting text with user network information. While effective for geolocation, network information suffers from the curse of dimensionality: networks are usually represented as sparse adjacency matrices of connections, whose size grows quadratically with the number of users. To incorporate this information, we therefore need to reduce the network dimensions, which in turn limits performance and risks sample bias. In this paper, we address these limitations by using dense network representations instead. We explore two methods to learn continuous node representations: 1) node2vec (Grover and Leskovec, 2016), which relies on the skip-gram model (Mikolov et al., 2013) over node neighborhoods, and 2) doc2vec (Le and Mikolov, 2014), which represents nodes as document embeddings over user mentions. Our methods enable us to encode information from arbitrarily large networks in a fixed-length vector, without reducing the volume of interactions. We combine both methods with textual input in an attention-based convolutional neural network and evaluate the contribution of each component to geolocation performance. Although it is not a network method in the strict sense, our approach improves the performance of state-of-the-art text-based models by exploiting information about user interactions.
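To make the contrast between sparse and dense node representations concrete, here is a minimal sketch of the node2vec idea. It is written under several assumptions and is not the authors' implementation. It assumes that the network is an undirected graph built from (user, mentioned_user) pairs. It uses uniform random walks, which is node2vec with return and in-out parameters p = q = 1 (equivalent to DeepWalk). It trains a plain-Python skip-gram with negative sampling, drawing negatives uniformly rather than from the smoothed unigram distribution. The function names (`build_graph`, `random_walks`, `skipgram_pairs`, `train_embeddings`) and all hyperparameter values are hypothetical. The point it illustrates is that a row of the adjacency matrix has one entry per user, while each learned node vector has a fixed length `dim` however large the network grows.

```python
import math
import random
from collections import defaultdict


def sigmoid(x):
    # Clamp to avoid math.exp overflow on extreme scores.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def build_graph(mentions):
    """Undirected adjacency lists from (user, mentioned_user) pairs (assumed input format)."""
    graph = defaultdict(set)
    for u, v in mentions:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def random_walks(graph, walks_per_node, walk_length, rng):
    """Uniform random walks, i.e. node2vec with p = q = 1."""
    walks = []
    nodes = sorted(graph)
    for _ in range(walks_per_node):
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window):
    """(center, context) pairs that co-occur within `window` steps of a walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


def train_embeddings(graph, dim=8, walks_per_node=10, walk_length=6,
                     window=2, negatives=2, epochs=5, lr=0.05, seed=0):
    """Skip-gram with negative sampling over random-walk contexts (hypothetical defaults)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    # Input vectors start small and random; output (context) vectors start at zero, as in word2vec.
    emb = {n: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for n in nodes}
    ctx = {n: [0.0] * dim for n in nodes}
    pairs = skipgram_pairs(random_walks(graph, walks_per_node, walk_length, rng), window)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # One positive target plus `negatives` uniformly drawn non-context nodes.
            candidates = [n for n in nodes if n != context]
            targets = [(context, 1.0)] + [(rng.choice(candidates), 0.0) for _ in range(negatives)]
            grad_center = [0.0] * dim
            for target, label in targets:
                score = sum(a * b for a, b in zip(emb[center], ctx[target]))
                g = lr * (label - sigmoid(score))
                for k in range(dim):
                    grad_center[k] += g * ctx[target][k]
                    ctx[target][k] += g * emb[center][k]
            for k in range(dim):
                emb[center][k] += grad_center[k]
    return emb


# Toy mention network: two loosely connected groups of users.
mentions = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"),
            ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),
            ("carol", "dave")]
graph = build_graph(mentions)
emb = train_embeddings(graph)
print(len(emb), len(emb["alice"]))  # prints: 6 8
```

In a geolocation model along the lines described above, each user's fixed-length vector could then be concatenated with, or attended over alongside, the text representation. The doc2vec variant would instead treat each user's list of mentioned users as a "document", which this sketch does not cover.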