Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond

Morita, Kenji; Morrison, Abigail; Jitsev, Jenia
doi:10.1016/j.bbr.2016.05.017
000826548 001__ 826548
000826548 005__ 20240313103124.0
000826548 0247_ $$2doi$$a10.1016/j.bbr.2016.05.017
000826548 0247_ $$2ISSN$$a0166-4328
000826548 0247_ $$2ISSN$$a1872-7549
000826548 0247_ $$2WOS$$aWOS:000380418200012
000826548 0247_ $$2altmetric$$aaltmetric:7315845
000826548 0247_ $$2pmid$$apmid:27173430
000826548 037__ $$aFZJ-2017-00771
000826548 082__ $$a610
000826548 1001_ $$0P:(DE-HGF)0$$aMorita, Kenji$$b0$$eCorresponding author
000826548 245__ $$aCorticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond
000826548 260__ $$aAmsterdam$$bElsevier$$c2016
000826548 3367_ $$2DRIVER$$aarticle
000826548 3367_ $$2DataCite$$aOutput Types/Journal article
000826548 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1568969660_22608
000826548 3367_ $$2BibTeX$$aARTICLE
000826548 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000826548 3367_ $$00$$2EndNote$$aJournal Article
000826548 520__ $$aValue-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize “Winner-Take-All (WTA)” selection of the maximum-valued action (i.e., ‘max’ operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic “soft-max” selection. The striatal “max” circuit and the cortical “soft-max” circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action.
000826548 536__ $$0G:(DE-HGF)POF3-574$$a574 - Theory, modelling and simulation (POF3-574)$$cPOF3-574$$fPOF III$$x0
000826548 536__ $$0G:(DE-HGF)POF3-571$$a571 - Connectivity and Activity (POF3-571)$$cPOF3-571$$fPOF III$$x1
000826548 536__ $$0G:(DE-Juel1)HGF-SMHB-2013-2017$$aSMHB - Supercomputing and Modelling for the Human Brain (HGF-SMHB-2013-2017)$$cHGF-SMHB-2013-2017$$fSMHB$$x2
000826548 536__ $$0G:(DE-Juel1)BMBF-01GQ1343$$aRL-BRD-J - Neural network mechanisms of reinforcement learning (BMBF-01GQ1343)$$cBMBF-01GQ1343$$x3
000826548 536__ $$0G:(DE-HGF)B1175.01.12$$aW2Morrison - W2/W3 Professorinnen Programm der Helmholtzgemeinschaft (B1175.01.12)$$cB1175.01.12$$x4
000826548 588__ $$aDataset connected to CrossRef
000826548 7001_ $$0P:(DE-Juel1)158080$$aJitsev, Jenia$$b1
000826548 7001_ $$0P:(DE-Juel1)151166$$aMorrison, Abigail$$b2
000826548 773__ $$0PERI:(DE-600)2013604-3$$a10.1016/j.bbr.2016.05.017$$gVol. 311, p. 110 - 121$$p110 - 121$$tBehavioural brain research$$v311$$x0166-4328$$y2016
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.pdf$$yRestricted
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.gif?subformat=icon$$xicon$$yRestricted
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-1440$$xicon-1440$$yRestricted
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-180$$xicon-180$$yRestricted
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-640$$xicon-640$$yRestricted
000826548 8564_ $$uhttps://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.pdf?subformat=pdfa$$xpdfa$$yRestricted
000826548 909CO $$ooai:juser.fz-juelich.de:826548$$pVDB
000826548 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)158080$$aForschungszentrum Jülich$$b1$$kFZJ
000826548 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)151166$$aForschungszentrum Jülich$$b2$$kFZJ
000826548 9131_ $$0G:(DE-HGF)POF3-574$$1G:(DE-HGF)POF3-570$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lDecoding the Human Brain$$vTheory, modelling and simulation$$x0
000826548 9131_ $$0G:(DE-HGF)POF3-571$$1G:(DE-HGF)POF3-570$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lDecoding the Human Brain$$vConnectivity and Activity$$x1
000826548 9141_ $$y2016
000826548 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000826548 915__ $$0StatID:(DE-HGF)1030$$2StatID$$aDBCoverage$$bCurrent Contents - Life Sciences
000826548 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search
000826548 915__ $$0StatID:(DE-HGF)0550$$2StatID$$aNo Authors Fulltext
000826548 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bBEHAV BRAIN RES : 2015
000826548 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000826548 915__ $$0StatID:(DE-HGF)0110$$2StatID$$aWoS$$bScience Citation Index
000826548 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000826548 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000826548 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC
000826548 915__ $$0StatID:(DE-HGF)0310$$2StatID$$aDBCoverage$$bNCBI Molecular Biology Database
000826548 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews
000826548 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000826548 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz
000826548 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000826548 920__ $$lyes
000826548 9201_ $$0I:(DE-Juel1)INM-6-20090406$$kINM-6$$lComputational and Systems Neuroscience$$x0
000826548 9201_ $$0I:(DE-Juel1)IAS-6-20130828$$kIAS-6$$lTheoretical Neuroscience$$x1
000826548 980__ $$ajournal
000826548 980__ $$aVDB
000826548 980__ $$aI:(DE-Juel1)INM-6-20090406
000826548 980__ $$aI:(DE-Juel1)IAS-6-20130828
000826548 980__ $$aUNRESTRICTED
000826548 981__ $$aI:(DE-Juel1)IAS-6-20130828