Leveraging Natural language Processing and Machine Learning Techniques to find Frailty Deficits from Clinical Dataset
Abstract
Introduction: Frailty is a syndrome that is often associated with aging. It can be identified through specific frailty scales or a comprehensive assessment by a healthcare provider. In Alberta, there appear to be no specific billing or diagnostic codes for frailty, so healthcare providers may instead rely on assessments or codes for related conditions, such as muscle weakness or decreased physical activity, to identify it.

Purpose: This project aims to use Natural Language Processing (NLP) algorithms to extract frailty keywords from structured and unstructured clinical datasets, identify frailty deficits, and classify patients as frail or non-frail using machine learning algorithms.

Methods: The dataset covered 450 patients over the age of 60 and included medical information related to diseases and clinical frailty scales. We first clean the medical notes using NLP techniques and remove negation terms, then extract keywords from the clinical notes and the structured data. Finally, we apply resampling techniques to address class imbalance and feed the extracted keywords into machine learning classifiers to classify patients as frail or not frail.

Results: Several machine learning classifiers were evaluated for this task. Random Forest and Decision Tree, with a score of 0.95, performed better than logistic regression (LR), k-nearest neighbors (KNN), naive Bayes (NB), support vector machine (SVM), and neural network models.

Conclusion: NLP algorithms can effectively extract frailty keywords from Electronic Medical Record (EMR) notes. Moreover, a comparison of the results shows that using both structured and unstructured data gives better results than using structured data alone.
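
The text-processing and resampling steps named in the Methods can be illustrated with a minimal sketch. It is not the thesis implementation: the keyword list, the negation cues, the three-token negation window, and all function names are assumptions made for illustration, and simple random oversampling stands in for whichever resampling technique the study actually used.

```python
import random
import re
from collections import Counter

# Hypothetical keyword list and negation cues, for illustration only;
# the abstract does not list the vocabulary used in the study.
FRAILTY_KEYWORDS = ["fall", "weakness", "weight loss", "fatigue"]
NEGATION_CUES = {"no", "denies", "without", "not"}
NEGATION_WINDOW = 3  # number of tokens before a keyword checked for a cue


def extract_keywords(note):
    """Return the frailty keywords that appear in a note and are not negated."""
    tokens = re.findall(r"[a-z]+", note.lower())
    found = set()
    for keyword in FRAILTY_KEYWORDS:
        parts = keyword.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] != parts:
                continue
            preceding = tokens[max(0, i - NEGATION_WINDOW):i]
            if not NEGATION_CUES.intersection(preceding):
                found.add(keyword)
    return sorted(found)


def to_features(keywords):
    """Encode extracted keywords as a binary vector over FRAILTY_KEYWORDS."""
    return [1 if k in keywords else 0 for k in FRAILTY_KEYWORDS]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels
```

The balanced feature vectors would then be passed to a classifier such as a Random Forest. Note that the fixed token window ignores sentence boundaries and matches only exact word forms, so a real pipeline would likely need sentence splitting and lemmatization before this step.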
