A tightness continuum measure of Chinese semantic units, and its application to information retrieval

Author

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Master's

Degree

Master of Science

Department

Department of Computing Science

Examining Committee Member(s) and Their Department(s)

Citation for Previous Publication

Link to Related Item

Abstract

Chinese differs markedly from alphabetic languages such as English in that Chinese text has no delimiters between words. Segmentation is therefore an important step for most Chinese natural language processing (NLP) tasks. We propose a tightness continuum for Chinese semantic units, constructed from statistical information. Based on this continuum, sequences can be segmented dynamically, and that information can then be exploited in a number of information retrieval (IR) tasks. To show that the tightness continuum is useful for NLP tasks, we propose two methods of exploiting it within IR systems. The first method refines the output of a general Chinese word segmenter. The second method embeds the tightness value into IR score functions. Experimental results show that our tightness measure is reasonable and improves the performance of IR systems.

Item Type

http://purl.org/coar/resource_type/c_46ec

Alternative

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en

Location

Time Period

Source