Computational Laboratory for Energy And Nanoscience

Manuscript Summary

Active Measure Reinforcement Learning for Observation Cost Minimization

C. Bellinger, R. Coles, M. Crowley, I. Tamblyn

Canadian Conference on AI, 37, 2021L10 (2021)

In many real-world scientific and engineering applications, measuring the state of a system after each action is costly or time-consuming, yet standard reinforcement learning (RL) algorithms assume free access to state observations at every time step. In this work, we introduce Active Measure Reinforcement Learning (Amrl), a framework in which the agent can freely explore the relationship between actions and rewards but is charged each time it measures the next state. Our Amrl-Q agent learns to weigh the value of an accurate state measurement against its cost, progressively shifting from costly direct measurements to a learned transition model of the environment. In our experiments, the approach outperforms standard Q-learning, Dyna-Q, and POMCP planning baselines, demonstrating that an intelligent observation strategy can substantially reduce measurement costs without sacrificing policy quality.
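The summary does not spell out how Amrl-Q works internally, so the following is only a minimal Python sketch of the general idea, written under our own assumptions and not taken from the authors' implementation. It assumes a hypothetical deterministic six-state chain task (`env_step`), a fixed per-measurement cost (`MEASURE_COST = 0.05`), tabular Q-learning over joint (action, measure?) choices with optimistic initial values, and a simple count-based transition model that stands in for the learned state estimator. Following the summary, rewards (and here also episode termination) arrive for free, while observing the next state is charged. All names (`AmrlQSketch`, `run_episode`, `predict`) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not from the paper): a deterministic 1-D chain.
N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)      # move left / move right
MEASURE_COST = 0.05     # assumed fixed cost per measurement


def env_step(s, a):
    """True dynamics: clamp to the chain; reward 1.0 on reaching GOAL."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


class AmrlQSketch:
    """Hypothetical sketch, not the authors' Amrl-Q.

    Tabular Q-learning over joint (action, measure) choices. Measuring
    reveals the true next state but costs MEASURE_COST; skipping the
    measurement trusts a count-based transition model instead.
    """

    def __init__(self, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
        self.q = defaultdict(lambda: 1.0)                     # optimistic init drives exploration
        self.counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {next_state: count}
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(seed)
        # Ties favour not measuring because (a, False) precedes (a, True).
        self.choices = [(a, m) for a in ACTIONS for m in (False, True)]

    def predict(self, s, a):
        """Most frequently measured next state, or None if (s, a) was never measured."""
        seen = self.counts[(s, a)]
        return max(seen, key=seen.get) if seen else None

    def act(self, s):
        """Epsilon-greedy choice of (action, measure?)."""
        if self.rng.random() < self.eps:
            return self.rng.choice(self.choices)
        return max(self.choices, key=lambda c: self.q[(s, *c)])

    def update(self, s, a, m, r, s2, done):
        """One-step Q-learning update on the agent's believed states."""
        best_next = 0.0 if done else max(self.q[(s2, *c)] for c in self.choices)
        key = (s, a, m)
        self.q[key] += self.alpha * (r + self.gamma * best_next - self.q[key])


def run_episode(agent, max_steps=50):
    """Return (costed return, number of measurements) for one episode."""
    s_true = s_belief = 0
    costed_return, n_measure = 0.0, 0
    for _ in range(max_steps):
        a, m = agent.act(s_belief)
        s_true, r, done = env_step(s_true, a)   # reward and termination are free (assumption)
        guess = agent.predict(s_belief, a)
        if m or guess is None:                  # measure; forced when the model has no prediction
            m = True
            agent.counts[(s_belief, a)][s_true] += 1
            s_next, r_costed = s_true, r - MEASURE_COST
            n_measure += 1
        else:                                   # skip the measurement and trust the model
            s_next, r_costed = guess, r
        agent.update(s_belief, a, m, r_costed, s_next, done)
        costed_return += r_costed
        s_belief = s_next
        if done:
            break
    return costed_return, n_measure


agent = AmrlQSketch()
history = [run_episode(agent) for _ in range(300)]
early = sum(n for _, n in history[:20])
late = sum(n for _, n in history[-20:])
print(f"measurements: first 20 episodes = {early}, last 20 episodes = {late}")
```

Because this toy chain is deterministic, the count-based model becomes exact after a single measurement of each (state, action) pair, so the agent can drop almost all paid measurements once its Q-values settle; the remaining ones come from epsilon-greedy exploration. In stochastic or partially observed settings, a richer state estimator and belief tracking would be needed.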
