Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond

Morita, Kenji; Jitsev, Jenia; Morrison, Abigail

doi:10.1016/j.bbr.2016.05.017

Marc 21

001			826548
005			20240313103124.0
024	7	_	|a 10.1016/j.bbr.2016.05.017 |2 doi
024	7	_	|a 0166-4328 |2 ISSN
024	7	_	|a 1872-7549 |2 ISSN
024	7	_	|a WOS:000380418200012 |2 WOS
024	7	_	|a altmetric:7315845 |2 altmetric
024	7	_	|a pmid:27173430 |2 pmid
037	_	_	|a FZJ-2017-00771
082	_	_	|a 610
100	1	_	|a Morita, Kenji |0 P:(DE-HGF)0 |b 0 |e Corresponding author
245	_	_	|a Corticostriatal circuit mechanisms of value-based action selection: Implementation of reinforcement learning algorithms and beyond
260	_	_	|a Amsterdam |c 2016 |b Elsevier
336	7	_	|a article |2 DRIVER
336	7	_	|a Output Types/Journal article |2 DataCite
336	7	_	|a Journal Article |b journal |m journal |0 PUB:(DE-HGF)16 |s 1568969660_22608 |2 PUB:(DE-HGF)
336	7	_	|a ARTICLE |2 BibTeX
336	7	_	|a JOURNAL_ARTICLE |2 ORCID
336	7	_	|a Journal Article |0 0 |2 EndNote
520	_	_	|a Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize “Winner-Take-All (WTA)” selection of the maximum-valued action (i.e., “max” operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic “soft-max” selection. The striatal “max” circuit and the cortical “soft-max” circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might similarly serve other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate the reward-prediction error. Regarding the more complex dynamics suggested for striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, their computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action.
536	_	_	|a 574 - Theory, modelling and simulation (POF3-574) |0 G:(DE-HGF)POF3-574 |c POF3-574 |f POF III |x 0
536	_	_	|a 571 - Connectivity and Activity (POF3-571) |0 G:(DE-HGF)POF3-571 |c POF3-571 |f POF III |x 1
536	_	_	|a SMHB - Supercomputing and Modelling for the Human Brain (HGF-SMHB-2013-2017) |0 G:(DE-Juel1)HGF-SMHB-2013-2017 |c HGF-SMHB-2013-2017 |f SMHB |x 2
536	_	_	|a RL-BRD-J - Neural network mechanisms of reinforcement learning (BMBF-01GQ1343) |0 G:(DE-Juel1)BMBF-01GQ1343 |c BMBF-01GQ1343 |x 3
536	_	_	|a W2Morrison - W2/W3 Professorinnen Programm der Helmholtzgemeinschaft (B1175.01.12) |0 G:(DE-HGF)B1175.01.12 |c B1175.01.12 |x 4
588	_	_	|a Dataset connected to CrossRef
700	1	_	|a Jitsev, Jenia |0 P:(DE-Juel1)158080 |b 1
700	1	_	|a Morrison, Abigail |0 P:(DE-Juel1)151166 |b 2
773	_	_	|a 10.1016/j.bbr.2016.05.017 |g Vol. 311, p. 110 - 121 |0 PERI:(DE-600)2013604-3 |p 110 - 121 |t Behavioural brain research |v 311 |y 2016 |x 0166-4328
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.pdf |y Restricted
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.gif?subformat=icon |x icon |y Restricted
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-1440 |x icon-1440 |y Restricted
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-180 |x icon-180 |y Restricted
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.jpg?subformat=icon-640 |x icon-640 |y Restricted
856	4	_	|u https://juser.fz-juelich.de/record/826548/files/Corticostriatal%20circuit%20mechanisms%20of%20value-based%20action%20selection%3A%20Implementation%20of%20reinforcement%20learning%20algorithms%20and%20beyond.pdf?subformat=pdfa |x pdfa |y Restricted
909	C	O	|p VDB |o oai:juser.fz-juelich.de:826548
910	1	_	|a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)158080
910	1	_	|a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)151166
913	1	_	|a DE-HGF |b Key Technologies |l Decoding the Human Brain |1 G:(DE-HGF)POF3-570 |0 G:(DE-HGF)POF3-574 |2 G:(DE-HGF)POF3-500 |v Theory, modelling and simulation |x 0 |4 G:(DE-HGF)POF |3 G:(DE-HGF)POF3
913	1	_	|a DE-HGF |b Key Technologies |l Decoding the Human Brain |1 G:(DE-HGF)POF3-570 |0 G:(DE-HGF)POF3-571 |2 G:(DE-HGF)POF3-500 |v Connectivity and Activity |x 1 |4 G:(DE-HGF)POF |3 G:(DE-HGF)POF3
914	1	_	|y 2016
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0200 |2 StatID |b SCOPUS
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)1030 |2 StatID |b Current Contents - Life Sciences
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0600 |2 StatID |b Ebsco Academic Search
915	_	_	|a No Authors Fulltext |0 StatID:(DE-HGF)0550 |2 StatID
915	_	_	|a JCR |0 StatID:(DE-HGF)0100 |2 StatID |b BEHAV BRAIN RES : 2015
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0150 |2 StatID |b Web of Science Core Collection
915	_	_	|a WoS |0 StatID:(DE-HGF)0110 |2 StatID |b Science Citation Index
915	_	_	|a WoS |0 StatID:(DE-HGF)0111 |2 StatID |b Science Citation Index Expanded
915	_	_	|a IF < 5 |0 StatID:(DE-HGF)9900 |2 StatID
915	_	_	|a Peer Review |0 StatID:(DE-HGF)0030 |2 StatID |b ASC
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0310 |2 StatID |b NCBI Molecular Biology Database
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)1050 |2 StatID |b BIOSIS Previews
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0300 |2 StatID |b Medline
915	_	_	|a Nationallizenz |0 StatID:(DE-HGF)0420 |2 StatID
915	_	_	|a DBCoverage |0 StatID:(DE-HGF)0199 |2 StatID |b Thomson Reuters Master Journal List
920	_	_	|l yes
920	1	_	|0 I:(DE-Juel1)INM-6-20090406 |k INM-6 |l Computational and Systems Neuroscience |x 0
920	1	_	|0 I:(DE-Juel1)IAS-6-20130828 |k IAS-6 |l Theoretical Neuroscience |x 1
980	_	_	|a journal
980	_	_	|a VDB
980	_	_	|a I:(DE-Juel1)INM-6-20090406
980	_	_	|a I:(DE-Juel1)IAS-6-20130828
980	_	_	|a UNRESTRICTED
981	_	_	|a I:(DE-Juel1)IAS-6-20130828