Kenneth Joseph, Kathleen Carley, and Jason I Hong
ACM Transactions on Intelligent Systems and Technology (TIST)
In previous work, we used Latent Dirichlet Allocation (LDA), a technique commonly applied to document collections to discover latent themes, to cluster Foursquare users in New York City. Although the feature set was agnostic of geospatial location, time, users’ friends on social networking sites, and venue function, we found qualitative evidence that the model could uncover groups of people of different types (e.g., tourists), communities (e.g., users tightly clustered in space), and interests (e.g., people who enjoy athletics). In the present work, we use the same feature set and a similar methodology, but extend these efforts by seeking a more quantitative understanding of why groups of users frequent certain venues. Specifically, we develop metrics to test the cohesiveness in time, space, and function of the sets of venues uncovered by LDA, where each set consists of venues at which similar users check in. We find that nearly all venue sets the model uncovers are more cohesive than we would expect by chance along one or more of these metrics, supporting previous work in a variety of domains. In addition, we discover a significant negative correlation between the spread of venue sets in space and their spread in function, suggesting a “neighborhood” effect that has also been observed in other recent work. Finally, we show that the model captures distinct “micro-cultures” within the city, and we discuss how these can be understood through the notion of self-representation and by leveraging latent connections between users. These findings are intended to support and inform social science research on how location-based services can help us understand community and human behavior in the urban environment.
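The idea of a venue set being "more cohesive than we would expect by chance" can be illustrated with a small permutation-style test. The sketch below is not the metric defined in the paper; it assumes, purely for illustration, that venues are planar (x, y) points, that spatial cohesion is the mean pairwise Euclidean distance, and that the chance baseline is a random sample of the same size drawn from all venues. The names `mean_pairwise_distance` and `cohesion_p_value` are hypothetical.

```python
import math
import random


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of (x, y) points."""
    total = 0.0
    pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs


def cohesion_p_value(venue_set, all_venues, trials=2000, seed=0):
    """Estimate how often a random venue sample is at least as tight.

    Assumption (not from the paper): the null model draws len(venue_set)
    venues uniformly without replacement from all_venues.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(venue_set)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(all_venues, len(venue_set))
        if mean_pairwise_distance(sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Toy data: a 10 x 10 grid of venues and one tightly clustered venue set.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
p_tight = cohesion_p_value(tight, grid)
```

A small p-value here would indicate that the venue set is spatially tighter than most random sets of the same size. Analogous tests for time or function would swap in a different spread measure while keeping the same sampling baseline.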