Contribution to a conference proceedings FZJ-2022-00923

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs


2021

NeurIPS Workshop Datacentric AI, DCAI2021, online, 14 Dec 2021, 5 p.


Abstract: Multi-modal language-vision models trained on hundreds of millions of image-text pairs (e.g. CLIP, DALL-E) have recently surged in popularity, showing a remarkable capability to perform zero- or few-shot learning and transfer even in the absence of per-sample labels on target image data. Despite this trend, to date there have been no publicly available datasets of sufficient scale for training such models from scratch. To address this issue, in a community effort we build and release to the public LAION-400M, a dataset with 400 million CLIP-filtered image-text pairs, their CLIP embeddings, and kNN indices that allow efficient similarity search.
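Illustration (not part of the record): the abstract mentions filtering image-text pairs by CLIP and searching embeddings by nearest neighbours. The following is a minimal sketch of both ideas, assuming the filter is a cosine-similarity cutoff between an image embedding and its caption embedding. The function names, the toy 2-D vectors, and the threshold of 0.5 are all hypothetical and are not taken from the record. A dataset of this size would use an approximate index rather than the brute-force search shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_pairs(pairs, threshold):
    """Keep (image_emb, text_emb) pairs whose similarity meets the threshold."""
    return [p for p in pairs if cosine(p[0], p[1]) >= threshold]

def knn(query, embeddings, k):
    """Indices of the k embeddings most similar to the query (brute force)."""
    order = sorted(range(len(embeddings)),
                   key=lambda i: cosine(query, embeddings[i]),
                   reverse=True)
    return order[:k]

# Toy 2-D embeddings standing in for real CLIP vectors (hypothetical values).
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),   # image and caption well aligned
    ([1.0, 0.0], [0.0, 1.0]),   # orthogonal, dropped by the filter
    ([0.6, 0.8], [0.5, 0.9]),   # image and caption well aligned
]
kept = filter_pairs(pairs, threshold=0.5)      # threshold is an assumption
image_embeddings = [p[0] for p in kept]
nearest = knn([0.7, 0.7], image_embeddings, k=1)
print(len(kept), nearest)
```

The brute-force search costs one similarity computation per stored embedding, which is why precomputed kNN indices matter at the scale of hundreds of millions of pairs.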


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2021
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Events > Contributions to proceedings
Workflow collections > Public records
Institute collections > JSC
Publications database
Open Access

 Record created 2022-01-20, last modified 2022-01-31

