thesis

Integrating motion planning into reinforcement learning to solve hard exploration problems

Defense date:

Nov. 18, 2020

Institution:

Sorbonne Université

Disciplines:

Abstract EN:

Motion planning can solve robotics problems much faster than any reinforcement learning algorithm by efficiently searching for a viable trajectory. Indeed, while the main object of interest in reinforcement learning is the behavior of an agent, motion planning is concerned with the geometry and properties of the state space, and uses a different set of primitives to achieve more efficient exploration. Some of these primitives require a model of the system and are not studied in this work; others, such as reset-anywhere, are only available in simulated environments. In contrast, motion planning approaches do not benefit from the same generalization properties as the policies produced by reinforcement learning. In this thesis, we study how techniques inspired by motion planning can speed up the solving of hard exploration problems in reinforcement learning without sacrificing the advantages of model-free learning and generalization. We identify a deadlock that can occur when applying reinforcement learning to seemingly trivial sparse-reward problems, and contribute an exploration algorithm inspired by motion planning but designed specifically for reinforcement learning environments, as well as a framework for using the collected data to train a reinforcement learning algorithm in previously intractable scenarios.
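To make the contrast concrete, the following is a minimal sketch of the kind of motion-planning exploration the abstract refers to: an RRT-style tree search over a toy 2D state space, grown using the reset-anywhere primitive (teleporting to any previously visited state), which, as noted above, is only available in simulation. This is an illustrative toy under assumed dynamics (a bounded step in a unit square-free plane), not the algorithm contributed by the thesis.

```python
import math
import random

def rrt_explore(start, goal, step=0.5, goal_tol=0.5, max_iters=2000, seed=0):
    """Grow a tree of visited states toward a goal, RRT-style.

    Relies on reset-anywhere: each iteration restarts the system from the
    visited state nearest to a random target, something a model-free RL
    agent interacting with a real environment cannot do.
    """
    rng = random.Random(seed)
    tree = [start]  # all visited states; any of them can be resumed from
    for _ in range(max_iters):
        # Sample a random target, with a small bias toward the goal.
        if rng.random() < 0.05:
            target = goal
        else:
            target = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        # Reset to the nearest visited state (the motion-planning primitive).
        nearest = min(tree, key=lambda s: math.dist(s, target))
        d = math.dist(nearest, target)
        if d == 0.0:
            continue
        # Take one bounded step from that state toward the target.
        new = (nearest[0] + step * (target[0] - nearest[0]) / d,
               nearest[1] + step * (target[1] - nearest[1]) / d)
        tree.append(new)
        if math.dist(new, goal) < goal_tol:
            return tree  # a viable trajectory exists; states could seed RL
    return None

states = rrt_explore(start=(0.0, 0.0), goal=(9.0, 9.0))
```

Note that the search never uses a reward signal: it covers the state space geometrically, which is why such methods reach sparse-reward goals quickly but produce no generalizing policy.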

Abstract FR:

Dans cette thèse, nous étudions les façons dont des techniques inspirées de la planification de mouvement peuvent accélérer la résolution de problèmes d'exploration difficiles pour l'apprentissage par renforcement, sans sacrifier la généralisation ni les avantages de l'apprentissage sans modèle. Nous identifions une impasse qui peut advenir lorsqu'on applique l'apprentissage par renforcement à des problèmes apparemment triviaux mais à récompense éparse. De plus, nous contribuons un algorithme d'exploration inspiré de la planification de mouvement mais conçu spécifiquement pour des environnements d'apprentissage par renforcement, ainsi qu'un cadre pour utiliser les données collectées afin d'entraîner un algorithme d'apprentissage par renforcement dans des scénarios auparavant trop complexes.