thesis

Computational modelling of the variability and regulation of reinforcement learning and the associated dopaminergic signalling in Rats

Defense date:

July 8, 2019

Institution:

Sorbonne université

Disciplines:

Abstract EN:

In this work, I will discuss two main topics concerning learnt behaviour in rats: firstly, meta-learning, i.e. the regulation of learning and decision-making parameters; secondly, inter-individual variability in the strategies used in a simple Pavlovian conditioning experiment. In both cases, I will adopt a computational standpoint, using reinforcement learning algorithms to model experimental data while also attending to related dopamine functions in the rat brain.

If environmental access to food and reproductive opportunities evolves at a relatively stable pace, the learning abilities of an organism should be sufficient to keep track of this evolution and enable appropriate behaviour in response to these changes. If the environment changes unexpectedly, however, an additional process of meta-learning might be required to cope with this change. In particular, two mechanisms could constitute powerful meta-learning strategies when faced with a volatile environment: controlling the learning rate, i.e. the speed with which state, stimulus or action values are updated in response to discrete environmental feedback, and balancing exploitation of what seems to be the best option against exploration of potentially better ones.

I will start my investigation of meta-learning by analysing the results of a three-armed bandit task with pharmacological inhibition of dopamine, a neurotransmitter that Humphries et al. (2012) suggested regulates the exploration-exploitation trade-off. After this, I will assess how well different models with meta-learning mechanisms regulating either the learning rate or the exploration-exploitation trade-off can explain long-term changes in behaviour observed between the control sessions of the same three-armed bandit task.

Finally, in a Pavlovian conditioning task in which the appearance of a lever predicts food delivery, it is well known that two kinds of behaviour can appear in a rat population (Flagel et al., 2007). So-called sign-trackers become strongly attracted to the lever, which they approach and nibble, whereas goal-trackers prefer to go immediately to the site of reward delivery. In parallel, the associated dopamine signals differ. Sign-trackers present a classical reward prediction error pattern, i.e. a burst of phasic activity which shifts from the time of reward delivery in the early stages of the task to the appearance of the lever-CS in later stages, whereas the dopamine signals of goal-trackers remain mostly stable throughout the task. A model attempting to explain these behavioural and neurological results was previously proposed by Lesaint et al. (2014a), and I will apply this model to new experimental findings based on a task with different inter-trial interval durations. This will result in adjustments to the previous model and proposals for future work.
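To make the two quantities targeted by meta-learning concrete, the following is a minimal sketch of a generic reinforcement learning agent in a three-armed bandit. It is not the model used in the thesis: the delta-rule update, the softmax choice rule, the function names and the reward probabilities are all illustrative assumptions. The parameter `alpha` stands for the learning rate and `beta` (inverse temperature) for the exploration-exploitation trade-off; a meta-learning mechanism would adjust one of these over time rather than keep it fixed as done here.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Choice probabilities from action values.

    beta is the inverse temperature: a high beta favours exploitation,
    a low beta favours exploration.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def update_value(q, reward, alpha):
    """Delta rule: move q toward the reward by a fraction alpha
    of the reward prediction error (reward - q)."""
    return q + alpha * (reward - q)


def run_bandit(reward_probs, alpha, beta, n_trials, seed=0):
    """Simulate an agent on a bandit with Bernoulli rewards.

    reward_probs are hypothetical per-arm reward probabilities,
    not values taken from the experiment.
    """
    rng = random.Random(seed)
    q_values = [0.0] * len(reward_probs)
    choices = []
    for _ in range(n_trials):
        probs = softmax_probs(q_values, beta)
        arm = rng.choices(range(len(q_values)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        q_values[arm] = update_value(q_values[arm], reward, alpha)
        choices.append(arm)
    return q_values, choices


if __name__ == "__main__":
    # Illustrative settings only (assumed, not from the thesis).
    q, picks = run_bandit([0.8, 0.1, 0.1], alpha=0.1, beta=5.0, n_trials=200)
    print("final values:", q)
    print("choices per arm:", [picks.count(a) for a in range(3)])
```

In this sketch, raising `alpha` makes values track recent feedback more closely, which helps after an unexpected change but amplifies noise, while lowering `beta` spreads choices across the arms.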

Abstract FR (translated into English):

In this thesis, I address two main topics concerning learning behaviour in rats: first, what is known as meta-learning, that is, the capacity to self-regulate the behavioural parameters that determine decision-making; second, inter-individual variability in the choice of strategy in a Pavlovian conditioning experiment. For each of these two themes, I adopt a computational standpoint grounded in reinforcement learning techniques in order to model experimental data, while drawing parallels with the dopaminergic functions thought to be associated with these processes.