000905696 001__ 905696
000905696 005__ 20220131120340.0
000905696 0247_ $$2Handle$$a2128/30478
000905696 037__ $$aFZJ-2022-00923
000905696 1001_ $$0P:(DE-HGF)0$$aSchuhmann, Christoph$$b0
000905696 1112_ $$aNeurIPS Workshop Datacentric AI$$conline$$d2021-12-14 - 2021-12-14$$gDCAI2021$$wonline
000905696 245__ $$aLAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
000905696 260__ $$c2021
000905696 300__ $$a5 p.
000905696 3367_ $$2ORCID$$aCONFERENCE_PAPER
000905696 3367_ $$033$$2EndNote$$aConference Paper
000905696 3367_ $$2BibTeX$$aINPROCEEDINGS
000905696 3367_ $$2DRIVER$$aconferenceObject
000905696 3367_ $$2DataCite$$aOutput Types/Conference Paper
000905696 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1642841436_7299
000905696 520__ $$aMulti-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) have recently surged in popularity, showing remarkable capability to perform zero- or few-shot learning and transfer even in the absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search.
000905696 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000905696 588__ $$aDataset connected to DataCite
000905696 7001_ $$0P:(DE-HGF)0$$aVencu, Richard$$b1
000905696 7001_ $$0P:(DE-HGF)0$$aBeaumont, Romain$$b2
000905696 7001_ $$0P:(DE-HGF)0$$aKaczmarczyk, Robert$$b3
000905696 7001_ $$0P:(DE-HGF)0$$aMullis, Clayton$$b4
000905696 7001_ $$0P:(DE-HGF)0$$aKatta, Aarush$$b5
000905696 7001_ $$0P:(DE-HGF)0$$aCoombes, Theo$$b6
000905696 7001_ $$0P:(DE-Juel1)158080$$aJitsev, Jenia$$b7
000905696 7001_ $$0P:(DE-HGF)0$$aKomatsuzaki, Aran$$b8
000905696 8564_ $$uhttps://arxiv.org/abs/2111.02114
000905696 8564_ $$uhttps://juser.fz-juelich.de/record/905696/files/159_CameraReady_Workshop_Submission_LAION_400M__Public_Dataset_with_CLIP_Filtered_400M_Image_Text_Pairs-1.pdf$$yOpenAccess
000905696 909CO $$ooai:juser.fz-juelich.de:905696$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000905696 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)158080$$aForschungszentrum Jülich$$b7$$kFZJ
000905696 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000905696 9141_ $$y2021
000905696 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000905696 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
000905696 980__ $$acontrib
000905696 980__ $$aVDB
000905696 980__ $$aUNRESTRICTED
000905696 980__ $$aI:(DE-Juel1)JSC-20090406
000905696 9801_ $$aFullTexts