A too-good-to-be-true prior to reduce shortcut reliance

Dagaev, Nikolay; Roads, Brett D.; Luo, Xiaoliang; Barry, Daniel N.; Patil, Kaustubh R.; Love, Bradley C.
doi:10.1016/j.patrec.2022.12.010
001024830 001__ 1024830
001024830 005__ 20250203103155.0
001024830 0247_ $$2doi$$a10.1016/j.patrec.2022.12.010
001024830 0247_ $$2ISSN$$a0167-8655
001024830 0247_ $$2ISSN$$a1872-7344
001024830 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-02496
001024830 0247_ $$2pmid$$a37915616
001024830 0247_ $$2WOS$$aWOS:000935348300001
001024830 037__ $$aFZJ-2024-02496
001024830 082__ $$a004
001024830 1001_ $$0P:(DE-HGF)0$$aDagaev, Nikolay$$b0
001024830 245__ $$aA too-good-to-be-true prior to reduce shortcut reliance
001024830 260__ $$aAmsterdam [u.a.]$$bElsevier$$c2023
001024830 3367_ $$2DRIVER$$aarticle
001024830 3367_ $$2DataCite$$aOutput Types/Journal article
001024830 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1712669802_18043
001024830 3367_ $$2BibTeX$$aARTICLE
001024830 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001024830 3367_ $$00$$2EndNote$$aJournal Article
001024830 520__ $$aDespite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts": superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN’s predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
001024830 536__ $$0G:(DE-HGF)POF4-5251$$a5251 - Multilevel Brain Organization and Variability (POF4-525)$$cPOF4-525$$fPOF IV$$x0
001024830 536__ $$0G:(DE-HGF)POF4-5254$$a5254 - Neuroscientific Data Analytics and AI (POF4-525)$$cPOF4-525$$fPOF IV$$x1
001024830 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001024830 7001_ $$0P:(DE-HGF)0$$aRoads, Brett D.$$b1
001024830 7001_ $$0P:(DE-HGF)0$$aLuo, Xiaoliang$$b2
001024830 7001_ $$0P:(DE-HGF)0$$aBarry, Daniel N.$$b3
001024830 7001_ $$0P:(DE-Juel1)172843$$aPatil, Kaustubh R.$$b4
001024830 7001_ $$0P:(DE-HGF)0$$aLove, Bradley C.$$b5$$eCorresponding author
001024830 773__ $$0PERI:(DE-600)1466342-9$$a10.1016/j.patrec.2022.12.010$$gVol. 166, p. 164 - 171$$p164 - 171$$tPattern recognition letters$$v166$$x0167-8655$$y2023
001024830 8564_ $$uhttps://www.sciencedirect.com/science/article/pii/S0167865522003841?via%3Dihub
001024830 8564_ $$uhttps://juser.fz-juelich.de/record/1024830/files/1-s2.0-S0167865522003841-main-1.pdf$$yOpenAccess
001024830 8564_ $$uhttps://juser.fz-juelich.de/record/1024830/files/1-s2.0-S0167865522003841-main-1.gif?subformat=icon$$xicon$$yOpenAccess
001024830 8564_ $$uhttps://juser.fz-juelich.de/record/1024830/files/1-s2.0-S0167865522003841-main-1.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001024830 8564_ $$uhttps://juser.fz-juelich.de/record/1024830/files/1-s2.0-S0167865522003841-main-1.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001024830 8564_ $$uhttps://juser.fz-juelich.de/record/1024830/files/1-s2.0-S0167865522003841-main-1.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001024830 909CO $$ooai:juser.fz-juelich.de:1024830$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001024830 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172843$$aForschungszentrum Jülich$$b4$$kFZJ
001024830 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)172843$$aHHU Düsseldorf$$b4
001024830 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aDepartment of Experimental Psychology, University College London, London, United Kingdom$$b5
001024830 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5251$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
001024830 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5254$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x1
001024830 9141_ $$y2024
001024830 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology$$d2023-08-22
001024830 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001024830 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bPATTERN RECOGN LETT : 2022$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001024830 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bPATTERN RECOGN LETT : 2022$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2023-08-22
001024830 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz$$d2023-08-22$$wger
001024830 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2023-08-22
001024830 9201_ $$0I:(DE-Juel1)INM-7-20090406$$kINM-7$$lGehirn & Verhalten$$x0
001024830 980__ $$ajournal
001024830 980__ $$aVDB
001024830 980__ $$aUNRESTRICTED
001024830 980__ $$aI:(DE-Juel1)INM-7-20090406
001024830 9801_ $$aFullTexts