Distributional Losses for Regression
Abstract
In this thesis, we introduce the Histogram Loss, a new loss function for regression. In sequential decision making, there is evidence that estimating the full distribution of the return yields a considerable gain in performance, even though only the mean of that distribution is used to make decisions. A parallel line of research in classification has found that replacing hard one-hot targets with soft targets (distributions that encode relationships between classes or ambiguity in the label) can improve accuracy. These findings have raised questions about the underlying mechanisms that remain unanswered. Our proposed loss draws on both ideas: it learns the conditional distribution of the target variable by minimizing the KL divergence between a target distribution and a flexible histogram prediction. Experiments on four datasets show that the Histogram Loss often outperforms commonly used regression losses. We then carry out theoretical and empirical analyses to determine why and when this performance gain appears and how the different components of the loss contribute to it. This investigation also yields additional insights into open questions and hypotheses posed in prior work.
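The core computation described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: it assumes a Gaussian target distribution truncated to the histogram's support, and the function names, bin layout, and smoothing parameter `sigma` are illustrative choices. Because the target distribution is fixed given the label, minimizing the KL divergence to the predicted histogram reduces to a cross-entropy with soft targets.

```python
import math
import numpy as np

def soft_targets(y, bin_edges, sigma=0.1):
    """Probability mass that a Gaussian N(y, sigma^2) assigns to each bin,
    renormalized over the histogram's support (truncation). `sigma` is an
    assumed smoothing parameter, not a value from the thesis."""
    z = (bin_edges - y) / (sigma * math.sqrt(2.0))
    cdf = 0.5 * (1.0 + np.array([math.erf(v) for v in z]))
    p = np.diff(cdf)                 # per-bin mass from CDF differences
    return p / p.sum()               # renormalize after truncation

def histogram_loss(logits, y, bin_edges, sigma=0.1):
    """Cross-entropy between the soft target and the predicted histogram
    (softmax over logits); equals KL(target || prediction) up to a constant
    that does not depend on the logits."""
    p = soft_targets(y, bin_edges, sigma)
    m = logits.max()                 # numerically stable log-softmax
    log_q = logits - (m + np.log(np.exp(logits - m).sum()))
    return -np.dot(p, log_q)
```

With uniform logits the predicted histogram is uniform over the bins, so the loss equals log(number of bins) regardless of the target, which is a quick sanity check on the implementation.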
