Guiding Autonomous Agents to Better Behaviors through Human Advice

Published in Thirteenth IEEE International Conference on Data Mining (ICDM'13), Dallas, TX, 2013

Recommended citation: G. Kunapuli, P. Odom, J. W. Shavlik and S. Natarajan. Guiding Autonomous Agents to Better Behaviors through Human Advice. Thirteenth IEEE International Conference on Data Mining (ICDM'13), Dallas, TX, December 7-10, 2013. http://gkunapuli.github.io/files/13kbirlICDM.pdf

Inverse Reinforcement Learning (IRL) is an approach to domain-reward discovery from demonstration, in which an agent recovers the reward function of a Markov decision process by observing an expert acting in the domain. In the standard setting, it is assumed that the expert acts (nearly) optimally and that a large number of trajectories, i.e., training examples, are available for reward discovery (and, consequently, for learning domain behavior). Neither assumption is practical: trajectories are often noisy, and there can be a paucity of examples. Our novel approach incorporates advice-giving into the IRL framework to address these issues. Inspired by preference elicitation, a domain expert provides advice on states and actions (features) by stating preferences over them. We evaluate our approach on several domains and show that, with small amounts of targeted preference advice, learning is possible from noisy demonstrations and requires far fewer trajectories than learning from trajectories alone.
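To give a flavor of the idea, the sketch below shows one hypothetical way preference advice could constrain reward learning: weights are fit toward the feature counts observed in (possibly noisy) demonstrations, while each piece of advice "feature a is preferred over feature b" is enforced as a margin constraint. This is a toy illustration under our own assumptions, not the authors' actual algorithm; all function and variable names are invented for exposition.

```python
# Toy sketch (hypothetical): reward-weight fitting with preference advice.
# Not the paper's method -- a minimal illustration of advice as constraints.

def fit_reward_weights(feature_counts, advice, margin=0.1, lr=0.05, steps=200):
    """feature_counts: average feature counts from (noisy) expert trajectories.
    advice: list of (preferred_idx, dispreferred_idx) pairs over features.
    Returns weights that track the demonstrations while respecting advice."""
    n = len(feature_counts)
    w = [0.0] * n
    for _ in range(steps):
        # Gradient step pulling weights toward the demonstrated feature counts.
        for i in range(n):
            w[i] += lr * (feature_counts[i] - w[i])
        # Projection step enforcing each advice constraint:
        # w[pref] >= w[disp] + margin.
        for pref, disp in advice:
            gap = w[pref] - w[disp] - margin
            if gap < 0:  # constraint violated; split the correction evenly
                w[pref] -= gap / 2
                w[disp] += gap / 2
    return w

# Usage: noisy demonstrations suggest feature 1 matters most, but the
# expert's advice states feature 0 is preferred over feature 1.
weights = fit_reward_weights([0.2, 0.8, 0.5], advice=[(0, 1)])
```

In this toy setup, the advice overrides what the noisy demonstrations alone would suggest, which mirrors the intuition that targeted preferences can compensate for noisy or scarce trajectories.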

[BibTeX]