001     905696
005     20220131120340.0
024 7 _ |a 2128/30478
|2 Handle
037 _ _ |a FZJ-2022-00923
100 1 _ |a Schuhmann, Christoph
|0 P:(DE-HGF)0
|b 0
111 2 _ |a NeurIPS Workshop Datacentric AI
|g DCAI2021
|c online
|d 2021-12-14 - 2021-12-14
|w online
245 _ _ |a LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
260 _ _ |c 2021
300 _ _ |a 5 p.
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a INPROCEEDINGS
|2 BibTeX
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1642841436_7299
|2 PUB:(DE-HGF)
520 _ _ |a Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) gained a recent surge, showing remarkable capability to perform zero- or few-shot learning and transfer even in absence of per-sample labels on target image data. Despite this trend, to date there has been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release for public LAION-400M, a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search.
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 0
588 _ _ |a Dataset connected to DataCite
700 1 _ |a Vencu, Richard
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Beaumont, Romain
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Kaczmarczyk, Robert
|0 P:(DE-HGF)0
|b 3
700 1 _ |a Mullis, Clayton
|0 P:(DE-HGF)0
|b 4
700 1 _ |a Katta, Aarush
|0 P:(DE-HGF)0
|b 5
700 1 _ |a Coombes, Theo
|0 P:(DE-HGF)0
|b 6
700 1 _ |a Jitsev, Jenia
|0 P:(DE-Juel1)158080
|b 7
700 1 _ |a Komatsuzaki, Aran
|0 P:(DE-HGF)0
|b 8
856 4 _ |u https://arxiv.org/abs/2111.02114
856 4 _ |u https://juser.fz-juelich.de/record/905696/files/159_CameraReady_Workshop_Submission_LAION_400M__Public_Dataset_with_CLIP_Filtered_400M_Image_Text_Pairs-1.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:905696
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 7
|6 P:(DE-Juel1)158080
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 0
914 1 _ |y 2021
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts


Marc 21