Geotagging Named Entities in Web Pages
Abstract
We study the problem of geotagging named entities, where the goal is to identify the most relevant location of a named entity based on the content of the Web pages in which the entity is mentioned. We hypothesize a relationship between the mentions of an entity in Web pages and its geo-center, and propose a framework that explores this hypothesis and provides a model that produces a ranked list of locations for an entity at different location granularities. We further study the problem of dispersion, and show that the dispersion of a name can be estimated and that its geo-center can be detected at the right dispersion level. Two key features of our approach are: (i) minimal assumptions are made about the structure of the mentions, so the approach can be applied to a diverse and heterogeneous set of Web pages; and (ii) the approach is unsupervised, leveraging shallow English linguistic features and large gazetteers. We evaluate our methods under different settings and with different categories of named entities. Our evaluation reveals that the geo-center of a name can be estimated with good accuracy from simple statistics of its mentions, and that the estimation accuracy varies across name categories.
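The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea of ranking locations at several granularities from simple mention statistics. It rests on several assumptions that are not from the source. It uses a toy hierarchical gazetteer (`GAZETTEER`, hypothetical). It takes page-level co-occurrence counts of gazetteer place names as the "simple statistics". It uses a share threshold (`threshold=0.6`, hypothetical) as a stand-in for dispersion estimation. It is not the method developed in this thesis.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gazetteer: place name -> (city, state, country).
# A real system would use a large gazetteer with many more entries.
GAZETTEER: Dict[str, Tuple[str, str, str]] = {
    "Edmonton": ("Edmonton", "Alberta", "Canada"),
    "Calgary": ("Calgary", "Alberta", "Canada"),
    "Toronto": ("Toronto", "Ontario", "Canada"),
}
LEVELS = ("city", "state", "country")


def rank_locations(pages: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Rank locations per granularity by counting gazetteer place names
    in pages that mention the entity (assumed page-level co-occurrence)."""
    counts = {level: Counter() for level in LEVELS}
    for text in pages:
        # Naive tokenization: strip commas/periods, split on whitespace.
        for token in text.replace(",", " ").replace(".", " ").split():
            if token in GAZETTEER:
                # Roll each match up the hierarchy: city -> state -> country.
                for level, value in zip(LEVELS, GAZETTEER[token]):
                    counts[level][value] += 1
    return {level: counts[level].most_common() for level in LEVELS}


def dispersion_level(
    ranked: Dict[str, List[Tuple[str, int]]], threshold: float = 0.6
) -> Optional[Tuple[str, str]]:
    """Hypothetical dispersion proxy: the finest level whose top location
    holds at least `threshold` of that level's counts, with that location."""
    for level in LEVELS:
        items = ranked[level]
        total = sum(count for _, count in items)
        if total and items[0][1] / total >= threshold:
            return level, items[0][0]
    return None


pages = [
    "Acme Foods opened a new store in Edmonton.",
    "Acme Foods, based in Edmonton, expands to Calgary.",
    "A Toronto review of Acme Foods.",
]
ranked = rank_locations(pages)
print(ranked["state"])            # [('Alberta', 3), ('Ontario', 1)]
print(dispersion_level(ranked))   # ('state', 'Alberta')
```

Here the city-level counts are too spread out (Edmonton holds only 2 of 4 mentions), so the sketch backs off to the state level, where Alberta dominates. This mirrors, in a simplified way, the abstract's point that a name's geo-center should be reported at a level matching its dispersion.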
