TY - CONF
AU - Schuhmann, Christoph
AU - Vencu, Richard
AU - Beaumont, Romain
AU - Kaczmarczyk, Robert
AU - Mullis, Clayton
AU - Katta, Aarush
AU - Coombes, Theo
AU - Jitsev, Jenia
AU - Komatsuzaki, Aran
TI - LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
M1 - FZJ-2022-00923
SP - 5 p.
PY - 2021
AB - Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) gained a recent surge, showing remarkable capability to perform zero- or few-shot learning and transfer even in absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search.
T2 - NeurIPS Workshop Datacentric AI
CY - online
Y2 - 14 Dec 2021
M2 - online
LB - PUB:(DE-HGF)8
UR - https://juser.fz-juelich.de/record/905696
ER -