Public Health Applications Using Big Data and Machine Learning Methods: Name- and Location-based Aboriginal Ethnicity Classification and Sentiment Analysis of Breast Cancer Screening in the United States Using Twitter

Institution

http://id.loc.gov/authorities/names/n79058482

Degree Level

Doctoral

Degree

Doctor of Philosophy

Department

School of Public Health

Specialization

Epidemiology

Citation for Previous Publication

Wong, K. O., Davis, F. G., Zaïane, O. R., & Yasui, Y. (2016). Sentiment analysis of breast cancer screening in the United States using Twitter. In KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval (Vol. 1, pp. 265-274). SciTePress.

Abstract

Applications using big data and machine learning techniques are transforming how people live in the 21st century; however, they are generally underutilized in public health compared to other domains. We proposed and conducted two independent studies to investigate how big data and machine learning techniques can help address different public health challenges in North America.
In Name- and Location-based Aboriginal Ethnicity Classification, we developed a machine learning method to predict individuals’ Aboriginal status using name and location information from the 1901 Canadian census, and tested its classification performance. Our automated approach yielded good classification results, especially for the all-inclusive Aboriginal status and for sub-Aboriginal statuses such as First Nations, Algonquian, and Kootenay. For these four Aboriginal groupings, classification performance in the validation sets ranged between 0.99-1.00 in accuracy, 0.99-1.00 in ROC, 0.63-0.65 in sensitivity, 0.99-1.00 in specificity, 0.78-0.86 in PPV, and 0.99-1.00 in NPV. The demonstrated application illustrated that using high decision boundary values resulted in predicted First Nations-specific prevalence statistics that closely approximated the true underlying prevalence.
In Sentiment Analysis of Breast Cancer Screening in the United States Using Twitter, we slightly modified the existing VADER sentiment classifier to automatically classify the sentiment of breast cancer screening-related tweets as neutral, positive, or negative. Extensive data visualization was conducted to illustrate the temporal (via time-series plot), geospatial (via point, hot spot, and quintile maps), and thematic (via word clouds) patterns of breast cancer screening sentiment in the U.S. The ecological associations between the averaged sentiment scores and the percentage of breast cancer screening uptake at the state level were examined, and significant inverse relationships (p<0.05) were found between negative sentiment and recent uptake of mammograms and clinical breast exams.

Item Type

http://purl.org/coar/resource_type/c_46ec

License

Other License Text / Link

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

en
