Contribution to a conference proceedings FZJ-2022-00923

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

2021

NeurIPS Workshop Datacentric AI, DCAI2021, online, 14 Dec 2021, 5 p.

Abstract: Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) gained a recent surge, showing remarkable capability to perform zero- or few-shot learning and transfer even in absence of per-sample labels on target image data. Despite this trend, to date there has been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release for public LAION-400M, a dataset with CLIP-filtered 400 million image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2021
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2022-01-20, last modified 2022-01-31

