Contribution to a conference proceedings/Contribution to a book FZJ-2024-00372

LAION-5B: An open large-scale dataset for training next generation image-text models


2022
Curran Associates, Inc. Red Hook, NY
ISBN: 9781713871088

NeurIPS 2022, New Orleans, Louisiana, USA, 28 Nov 2022 - 9 Dec 2022.
Red Hook, NY: Curran Associates, Inc.
Advances in Neural Information Processing Systems 35, 25278 - 25294.
DOI: 10.34734/FZJ-2024-00372


Please use a persistent id in citations: doi:10.34734/FZJ-2024-00372

Abstract: Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on the expensive accurate labels used in standard unimodal supervised vision learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B - a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale. Additionally, we provide several nearest-neighbor indices, an improved web interface for dataset exploration and subset generation, and detection scores for watermark, NSFW, and toxic content.


Note: Also on arXiv: https://doi.org/10.48550/arXiv.2210.08402

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) (POF4-511)

Appears in the scientific report 2023
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Document types > Books > Contribution to a book
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2024-01-10, last modified 2024-02-26


OpenAccess: fulltext PDF available
External link: fulltext