Geotagging Named Entities in Web Pages

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Abstract

We study the problem of geotagging named entities, where the goal is to identify the most relevant location of a named entity based on the content of the web pages in which the entity is mentioned. We hypothesize a relationship between the mentions of an entity in web pages and its geo-center, and propose a framework that explores this hypothesis and provides a model that returns a ranked list of locations for an entity at different location granularities. We further study the problem of dispersion, and show that the dispersion of a name can be estimated and that a geo-center can be detected at an exact dispersion level. Two key features of our approach are: (i) minimal assumptions are made about the structure of the mentions, so the approach can be applied to a diverse and heterogeneous set of web pages, and (ii) the approach is unsupervised, leveraging shallow English linguistic features and large gazetteers. We evaluate our methods under different settings and with different categories of named entities. Our evaluation reveals that the geo-center of a name can be estimated with good accuracy from some simple statistics of the mentions, and that the accuracy of the estimation varies with the category of the name.

Item Type

http://purl.org/coar/resource_type/c_46ec

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
