Directly Learning Predictors on Missing Data with Neural Networks
Abstract
The problem of missing data is omnipresent in a wide range of real-world datasets. When learning and predicting on such data with neural networks, the typical strategy is to fill in the missing values before fitting a model, an approach known as impute-then-regress. Far less common is to learn neural networks directly on the missing data, without imputation; one such approach, NeuMiss, introduces a novel layer into the network but can be finicky to train. In this work, we explore two simple augmentations that allow standard neural network architectures to be used directly: concatenating a missingness indicator to the input, and introducing synthetic missingness. Synthetic missingness masks additional input attributes during training; this simple data augmentation technique expands the dataset but, surprisingly, has not been explored. We show that both augmentations improve prediction performance across several datasets and levels of missingness.
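The two augmentations described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the thesis's implementation: the function names, the zero placeholder for missing entries, and the masking rate are all assumptions made here for the example.

```python
import numpy as np

def augment_with_indicator(x):
    """Concatenate a binary missingness indicator to the input.

    Missing entries (NaN) are replaced with a zero placeholder, and a
    0/1 mask marking which entries were missing is appended feature-wise.
    (Hypothetical helper, not taken from the thesis.)
    """
    mask = np.isnan(x).astype(x.dtype)       # 1 where a value is missing
    x_filled = np.nan_to_num(x, nan=0.0)     # placeholder for missing entries
    return np.concatenate([x_filled, mask], axis=-1)

def synthetic_missingness(x, rate=0.2, rng=None):
    """Randomly mask additional observed attributes by setting them to NaN.

    Applied per training batch, this expands the effective dataset with
    extra missingness patterns. (Hypothetical helper; `rate` is assumed.)
    """
    rng = rng or np.random.default_rng()
    x = x.copy()
    drop = (rng.random(x.shape) < rate) & ~np.isnan(x)
    x[drop] = np.nan
    return x
```

In an impute-free pipeline, `synthetic_missingness` would be applied to each training batch before `augment_with_indicator`, and the concatenated result fed to an otherwise standard network.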
