Journal Article FZJ-2017-00771

Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond

Morita, Kenji ; Jitsev, Jenia ; Morrison, Abigail

2016
Elsevier Amsterdam

Behavioural brain research 311, 110 - 121 (2016) [10.1016/j.bbr.2016.05.017]


Please use a persistent id in citations: doi:10.1016/j.bbr.2016.05.017

Abstract: Value-based action selection has been suggested to be realized in corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. Striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize “Winner-Take-All (WTA)” selection of the maximum-valued action (i.e., the ‘max’ operation). Although this view has been challenged by findings that lateral inhibition is weak, sparse, and asymmetric, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic “soft-max” selection. The striatal “max” circuit and the cortical “soft-max” circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate the reward prediction error. As for the more complex dynamics suggested for striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, the computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action encoded by the whole trajectory, or (3) probabilistic sampling of states/actions.
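A minimal Python sketch may make the contrast in the abstract concrete: Winner-Take-All ("max") versus soft-max selection, and Q-learning versus SARSA updates driven by a reward prediction error. This is an illustration of the textbook operations under assumed conventions, not code from the reviewed article; the toy bandit task, the parameter names (alpha, beta, gamma), and all function names are hypothetical.

  import numpy as np

  rng = np.random.default_rng(0)

  def wta_select(q_values):
      # "Max" operation: Winner-Take-All selection of the highest-valued
      # action, classically attributed to striatal lateral inhibition.
      return int(np.argmax(q_values))

  def softmax_select(q_values, beta=3.0):
      # "Soft-max": probabilistic selection with inverse temperature beta,
      # attributed here to recurrent cortical dynamics.
      p = np.exp(beta * (q_values - np.max(q_values)))  # shift for stability
      p /= p.sum()
      return int(rng.choice(len(q_values), p=p))

  def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
      # Q-learning: the reward prediction error (RPE) bootstraps from the
      # maximum next-action value, i.e. it needs a "max" operation.
      rpe = r + gamma * np.max(q[s_next]) - q[s, a]
      q[s, a] += alpha * rpe
      return rpe

  def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
      # SARSA: the RPE bootstraps from the action actually executed next,
      # which is why sustained activity representing the executed action
      # would be needed to compute it.
      rpe = r + gamma * q[s_next, a_next] - q[s, a]
      q[s, a] += alpha * rpe
      return rpe

  # Toy two-armed bandit (a single state, so gamma=0) to exercise the pieces.
  q = np.zeros((1, 2))
  p_reward = np.array([0.3, 0.7])            # illustrative reward probabilities
  for _ in range(1000):
      a = softmax_select(q[0])               # cortical soft-max selection
      r = float(rng.random() < p_reward[a])  # Bernoulli reward
      q_learning_update(q, 0, a, r, 0, gamma=0.0)
  print("learned Q:", q[0], "-> greedy (WTA) choice:", wta_select(q[0]))

With gamma set to zero the bandit reduces to simple reward averaging; in a multi-state task, the difference between bootstrapping from the maximum next value and from the value of the action actually executed next is exactly what separates the two update rules above.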


Contributing Institute(s):
  1. Computational and Systems Neuroscience (INM-6)
  2. Theoretical Neuroscience (IAS-6)
Research Program(s):
  1. 574 - Theory, modelling and simulation (POF3-574)
  2. 571 - Connectivity and Activity (POF3-571)
  3. SMHB - Supercomputing and Modelling for the Human Brain (HGF-SMHB-2013-2017)
  4. RL-BRD-J - Neural network mechanisms of reinforcement learning (BMBF-01GQ1343)
  5. W2Morrison - W2/W3 Professorinnen Programm der Helmholtzgemeinschaft (B1175.01.12)

Appears in the scientific report 2016
Database coverage:
Medline ; BIOSIS Previews ; Current Contents - Life Sciences ; Ebsco Academic Search ; IF < 5 ; JCR ; NCBI Molecular Biology Database ; Nationallizenz ; No Authors Fulltext ; SCOPUS ; Science Citation Index ; Science Citation Index Expanded ; Thomson Reuters Master Journal List ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > IAS > IAS-6
Institute Collections > INM > INM-6
Workflow collections > Public records
Publications database

 Record created 2017-01-20, last modified 2024-03-13

