Reinforcement Learning based Controller Design for Nonlinear Process Control
Abstract
Reinforcement learning (RL) has recently received wide attention across many fields. Model-free RL offers data-driven solutions that learn a control strategy directly from interaction with the process, without requiring a process model. This is especially beneficial for nonlinear processes, where an accurate process model may not be readily available, and it circumvents the need for a model identification step. A model-free agent can also be re-trained to maintain performance under process shifts or process noise. In contrast, traditional model-based control methods require an explicit process model, and the performance of parametric models deteriorates over time under process shifts or unmeasured disturbances. However, even with learning schemes such as the deep deterministic policy gradient (DDPG), deep Q-networks (DQN), and actor-critic methods, convergence to an optimal policy in process control remains a persistent challenge.
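The model-free idea above can be sketched with a tabular Q-learning loop on a toy, discretized level-control problem. The plant in `env_step` below is a hypothetical stand-in that the agent never consults directly; the thesis itself uses continuous-space deep RL methods rather than this tabular simplification.

```python
import numpy as np

# Toy sketch of model-free RL: tabular Q-learning on a discretized
# level-control loop. The agent learns only from (state, action,
# reward) interactions; env_step is a hypothetical plant model that
# the learner never inspects.
n_levels = 11                  # discretized level 0..10, setpoint at 5
actions = [-1, 0, +1]          # decrease, hold, increase inflow
Q = np.zeros((n_levels, len(actions)))
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

def env_step(level, a):
    nxt = int(np.clip(level + actions[a], 0, n_levels - 1))
    return nxt, -abs(nxt - 5)  # reward penalizes setpoint deviation

for _ in range(800):           # episodes from random initial levels
    s = int(rng.integers(n_levels))
    for _ in range(20):
        # epsilon-greedy exploration, then a standard Q-learning update
        a = int(rng.integers(len(actions))) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = env_step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# Greedy policy: raise the level below the setpoint, hold at it, lower it above.
greedy = [int(np.argmax(Q[s])) for s in range(n_levels)]
print(greedy)
```

No model identification step appears anywhere: the update rule needs only observed transitions, which is the property the paragraph above attributes to model-free RL.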
This thesis focuses on integrating RL based methods into the process control domain. The first part of the thesis addresses multivariate control of chemical processes with continuous state and action spaces. A parallel learning architecture is utilized to improve control quality and convergence to an optimal policy through better exploration of the state and action spaces. As an example, a centralized RL agent successfully learns an effective policy for servo tracking control of a quadruple tank system, learning directly from interactions with the process while ensuring that the process remains operational.
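One way to picture the parallel learning idea is several exploration policies feeding updates into shared value estimates, so the state-action space is covered faster than a single learner could manage. The tiny chain problem and worker settings below are illustrative assumptions, and the "workers" are simulated sequentially here; this is not the thesis architecture itself.

```python
import numpy as np

# Hedged sketch of parallel learning: three exploration policies with
# different epsilon values all update one shared Q-table on a toy
# chain MDP (illustrative assumption, simulated sequentially).
rng = np.random.default_rng(2)
n_s, n_a = 5, 2                      # states 0..4; actions: left, right
Q = np.zeros((n_s, n_a))             # shared value estimates

def env_step(s, a):
    s2 = int(np.clip(s + (1 if a == 1 else -1), 0, n_s - 1))
    return s2, (1.0 if s2 == n_s - 1 else 0.0)  # reward at the right end

workers_eps = [0.5, 0.2, 0.05]       # diverse exploration rates per worker
for _ in range(300):
    for eps in workers_eps:          # each worker runs its own short episode
        s = int(rng.integers(n_s))
        for _ in range(10):
            a = int(rng.integers(n_a)) if rng.random() < eps else int(np.argmax(Q[s]))
            s2, r = env_step(s, a)
            Q[s, a] += 0.3 * (r + 0.9 * Q[s2].max() - Q[s, a])
            s = s2

# Greedy policy learned from the shared table: move right in every state.
print(np.argmax(Q, axis=1))
```

The high-epsilon worker supplies broad coverage while the low-epsilon worker refines the greedy trajectory, which is the exploration benefit the paragraph above attributes to the parallel architecture.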
The second part of the thesis develops a hierarchical RL based constrained controller for higher-level optimization of the Primary Separation Vessel (PSV). A supervisory RL agent improves the bitumen recovery rate by manipulating the interface level setpoint, while a lower-level RL agent controls the froth-middlings interface level despite the nonlinear nature of the process and the unpredictable ore composition of the slurry fed into the PSV. This unpredictability also necessitates regulating the tailings density below a sanding threshold, which is achieved by a non-interacting sanding prevention RL agent that manipulates the tailings flowrate. For the interface level control loop, a behavioral cloning based two-phase learning scheme is also proposed to promote stable exploration of the state space. Based on simulation results, the behavioral cloning scheme ensured improved convergence to a near-optimal policy. The proposed hierarchical structure improves the bitumen recovery rate by manipulating the interface level while preventing sanding, demonstrating the feasibility of such approaches for chemical processes.
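The two-phase idea can be caricatured as follows: first clone an existing controller from demonstrations to obtain a safe initial policy, then fine-tune that policy from interaction. Everything below, including the first-order plant, the proportional-controller "expert", and the hill-climbing loop standing in for the RL fine-tuning phase, is an illustrative assumption rather than the thesis design.

```python
import numpy as np

# Schematic two-phase learning: (1) behavioral cloning of a simple
# existing controller, (2) interaction-driven fine-tuning. The plant,
# expert, and hill-climbing fine-tuner are all illustrative stand-ins.
rng = np.random.default_rng(1)

def plant(x, u):
    return 0.8 * x + 0.2 * u      # hypothetical first-order level model

def expert(err):
    return 2.0 * err              # stand-in for an existing P controller

# Phase 1: behavioral cloning -- fit a policy gain k to expert
# demonstrations by least squares, giving a safe starting policy.
errs = rng.uniform(-1, 1, 200)
k = float(np.linalg.lstsq(errs[:, None], expert(errs), rcond=None)[0][0])

def episode_cost(gain, sp=1.0):
    """Closed-loop tracking cost over one 30-step episode."""
    x, cost = 0.0, 0.0
    for _ in range(30):
        x = plant(x, gain * (sp - x))
        cost += (sp - x) ** 2
    return cost

# Phase 2: fine-tune k from interaction; simple hill climbing here
# acts as a surrogate for the RL fine-tuning phase.
for _ in range(100):
    k_try = k + 0.1 * rng.standard_normal()
    if episode_cost(k_try) < episode_cost(k):
        k = k_try

print(round(k, 2))  # fine-tuned gain, at least as good as the cloned one
```

Because phase 1 starts the search at a gain that already stabilizes the loop, phase 2 never has to explore from a blindly initialized policy, which mirrors the stable-exploration motivation stated above.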
