% IMPORTANT: The following is UTF-8 encoded. This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.
@INPROCEEDINGS{Schuhmann:905696,
author = {Schuhmann, Christoph and Vencu, Richard and Beaumont,
Romain and Kaczmarczyk, Robert and Mullis, Clayton and
Katta, Aarush and Coombes, Theo and Jitsev, Jenia and
Komatsuzaki, Aran},
title = {{LAION}-400{M}: {O}pen {D}ataset of {CLIP}-{F}iltered 400
{M}illion {I}mage-{T}ext {P}airs},
reportid = {FZJ-2022-00923},
pages = {5 p.},
year = {2021},
abstract = {Multi-modal language-vision models trained on hundreds of
            millions of image-text pairs (e.g. CLIP, DALL-E) have recently
            surged, showing a remarkable capability to perform zero- or
            few-shot learning and transfer even in the absence of per-sample
            labels on target image data. Despite this trend, to date there
            have been no publicly available datasets of sufficient scale for
            training such models from scratch. To address this issue, in a
            community effort we build and release to the public LAION-400M,
            a dataset with CLIP-filtered 400 million image-text pairs, their
            CLIP embeddings and kNN indices that allow efficient similarity
            search.},
month = {Dec},
date = {2021-12-14},
organization = {NeurIPS Workshop on Data-Centric AI, online, 14 Dec 2021},
cin = {JSC},
cid = {I:(DE-Juel1)JSC-20090406},
pnm = {5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs)
and Research Groups (POF4-511)},
pid = {G:(DE-HGF)POF4-5112},
typ = {PUB:(DE-HGF)8},
url = {https://juser.fz-juelich.de/record/905696},
}
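% A minimal usage sketch for citing this entry, kept as a comment so the file
% remains a valid .bib database. Assumptions (not part of the record): the file
% is saved as "laion400m.bib" (hypothetical name) and biblatex with the biber
% backend is used, as recommended in the header comment above.
%
%   \documentclass{article}
%   \usepackage[utf8]{inputenc}          % default on current LaTeX; harmless otherwise
%   \usepackage[backend=biber]{biblatex} % biber handles the UTF-8 author names
%   \addbibresource{laion400m.bib}
%   \begin{document}
%   LAION-400M \cite{Schuhmann:905696} provides 400 million CLIP-filtered
%   image-text pairs.
%   \printbibliography
%   \end{document}
%
% Typical compile sequence: pdflatex main, then biber main, then pdflatex main
% twice to resolve the citation.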