CANOR COACH: Towards Noise-Robust Human-in-the-Loop Reinforcement Learning
Abstract
Reinforcement learning has been widely applied to control tasks, but its performance is often limited by low sample efficiency. Introducing human prior knowledge is a common remedy, through approaches such as behaviour cloning, learning from advice, and inverse reinforcement learning. Learning from feedback is one such way of exploiting human knowledge: the agent learns from binary feedback that expresses the teacher's approval or disapproval of its actions. Compared with traditional learning-from-demonstration methods, learning from feedback does not require expert-level knowledge. This, however, can also be a drawback, since non-expert feedback inevitably contains noise. In this thesis, we investigate how, and to what extent, noise impacts learning performance. We also propose a series of methods to de-noise the feedback data online and achieve noise-robust human-in-the-loop reinforcement learning under different amounts of prior knowledge.
