LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

Schuhmann, Christoph; Kaczmarczyk, Robert; Komatsuzaki, Aran; Katta, Aarush; Vencu, Richard; Beaumont, Romain; Jitsev, Jenia; Coombes, Theo; Mullis, Clayton

Marc 21

001			905696
005			20220131120340.0
024	7	_	\|a 2128/30478 \|2 Handle
037	_	_	\|a FZJ-2022-00923
100	1	_	\|a Schuhmann, Christoph \|0 P:(DE-HGF)0 \|b 0
111	2	_	\|a NeurIPS Workshop Datacentric AI \|g DCAI2021 \|c online \|d 2021-12-14 - 2021-12-14 \|w online
245	_	_	\|a LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
260	_	_	\|c 2021
300	_	_	\|a 5 p.
336	7	_	\|a CONFERENCE_PAPER \|2 ORCID
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a Output Types/Conference Paper \|2 DataCite
336	7	_	\|a Contribution to a conference proceedings \|b contrib \|m contrib \|0 PUB:(DE-HGF)8 \|s 1642841436_7299 \|2 PUB:(DE-HGF)
520	_	_	\|a Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) gained a recent surge, showing remarkable capability to perform zero- or few-shot learning and transfer even in absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search.
536	_	_	\|a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5112 \|c POF4-511 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to DataCite
700	1	_	\|a Vencu, Richard \|0 P:(DE-HGF)0 \|b 1
700	1	_	\|a Beaumont, Romain \|0 P:(DE-HGF)0 \|b 2
700	1	_	\|a Kaczmarczyk, Robert \|0 P:(DE-HGF)0 \|b 3
700	1	_	\|a Mullis, Clayton \|0 P:(DE-HGF)0 \|b 4
700	1	_	\|a Katta, Aarush \|0 P:(DE-HGF)0 \|b 5
700	1	_	\|a Coombes, Theo \|0 P:(DE-HGF)0 \|b 6
700	1	_	\|a Jitsev, Jenia \|0 P:(DE-Juel1)158080 \|b 7
700	1	_	\|a Komatsuzaki, Aran \|0 P:(DE-HGF)0 \|b 8
856	4	_	\|u https://arxiv.org/abs/2111.02114
856	4	_	\|u https://juser.fz-juelich.de/record/905696/files/159_CameraReady_Workshop_Submission_LAION_400M__Public_Dataset_with_CLIP_Filtered_400M_Image_Text_Pairs-1.pdf \|y OpenAccess
909	C	O	\|o oai:juser.fz-juelich.de:905696 \|p openaire \|p open_access \|p VDB \|p driver \|p dnbdelivery
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 7 \|6 P:(DE-Juel1)158080
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5112 \|x 0
914	1	_	\|y 2021
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a contrib
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	1	_	\|a FullTexts
