Semantic Annotation of Numerical Data in Web Tables

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Abstract

A large portion of the quantitative information about entities mentioned on Web pages is expressed as Web tables. These tables often lack a proper schema and annotations, which makes querying and further analysis difficult. In this thesis, we study the problem of annotating the numerical columns of Web tables by linking them to properties in a knowledge graph.

Some approaches in the literature rely on contextual information (such as column headers and captions), which can be missing or unreliable, or on labeled data for model training, which can be difficult to obtain. Our approach, by contrast, relies only on the semantic information readily available in knowledge graphs. We show that it can reliably detect both semantic types (e.g., height) and unit labels (e.g., centimeters) when the semantic type is present in the knowledge graph.
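To make the idea of linking a numerical column to a knowledge-graph property and a unit concrete, here is a minimal sketch. It is not the method of the thesis, which the abstract does not spell out. As a stand-in, it compares the column's values, converted under each candidate unit, with a property's known values using a two-sample Kolmogorov-Smirnov distance. The names `TOY_KG`, `TO_CANONICAL`, `ks_distance`, and `annotate`, and all the data values, are hypothetical.

```python
from bisect import bisect_right

# Hypothetical toy knowledge graph: property -> (canonical unit, known values).
TOY_KG = {
    "height": ("metre", [1.60, 1.68, 1.75, 1.80, 1.85, 1.91, 1.98, 2.06]),
    "mass": ("kilogram", [55.0, 62.0, 70.0, 78.0, 85.0, 93.0, 102.0, 110.0]),
}

# Conversion factors: multiply a column value by the factor to express it
# in the canonical unit. The set of candidate units is an assumption.
TO_CANONICAL = {
    "metre": {"metre": 1.0, "centimetre": 0.01, "foot": 0.3048},
    "kilogram": {"kilogram": 1.0, "pound": 0.45359237, "gram": 0.001},
}


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def annotate(column):
    """Return (distance, property, unit) for the closest candidate pair."""
    candidates = []
    for prop, (canonical, kg_values) in TOY_KG.items():
        for unit, factor in TO_CANONICAL[canonical].items():
            converted = [value * factor for value in column]
            candidates.append((ks_distance(converted, kg_values), prop, unit))
    return min(candidates)


# A header-less column of numbers, as it might appear in a Web table.
column = [172, 181, 165, 190, 178, 201]
distance, prop, unit = annotate(column)
print(prop, unit, round(distance, 3))
```

On this toy data the sketch selects height in centimetres, with mass in pounds as the runner-up, which illustrates why the semantic type and the unit need to be resolved together: the same raw numbers are plausible under more than one (property, unit) pair.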

Our evaluation on real-world Web table data shows that our method outperforms some state-of-the-art semantic labeling approaches in terms of precision and F1 score. The evaluation also offers insight into the precision of unit detection; to the best of our knowledge, no previous work has explored this problem.
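For reference, precision and F1 for column annotation can be computed as in the following sketch. It uses the standard definitions and assumes that a method may abstain on a column (a prediction of `None`); the abstract does not state the exact evaluation protocol, and the function name, column identifiers, and labels are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Score column annotations.

    predicted, gold: dicts mapping a column id to a label. A predicted
    label of None means the method abstained on that column.
    """
    attempted = {col: label for col, label in predicted.items() if label is not None}
    correct = sum(1 for col, label in attempted.items() if gold.get(col) == label)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and predictions for four numerical columns.
gold = {"t1.c2": "height", "t1.c3": "mass", "t2.c1": "population", "t3.c4": "area"}
predicted = {"t1.c2": "height", "t1.c3": "height", "t2.c1": "population", "t3.c4": None}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three attempted columns are correct, so precision is 2/3; recall is 2/4 because one column was left unlabeled, and F1 is their harmonic mean, 4/7.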

Item Type

http://purl.org/coar/resource_type/c_46ec

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
