Big Data in HPC, Hadoop, and HDFS - Part Two

Riedel, Morris
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Riedel:187261,
      author       = {Riedel, Morris},
      title        = {{B}ig {D}ata in {HPC}, {H}adoop, and {HDFS} - {P}art {T}wo},
      reportid     = {FZJ-2015-00934},
      year         = {2015},
      abstract     = {One of the solutions to enable scalable 'big data' analysis
                      and analytics is to take advantage of parallelization
                      techniques. The talk differentiates between two paradigms:
                      on the one hand, the massively parallel paradigm known in
                      High Performance Computing (HPC), using techniques such as
                      the Message Passing Interface (MPI) and OpenMP, and on the
                      other hand, the map-reduce paradigm, using rather
                      pleasantly parallel approaches. The first part of the talk
                      focusses on 'Big Data in HPC' using two concrete codes as
                      examples: (1) clustering using a parallel and scalable
                      DBSCAN implementation and (2) classification using a
                      parallel and scalable Support Vector Machine (SVM)
                      implementation. The second part focusses on 'Big Data in
                      Hadoop (based on the map-reduce processing paradigm) and its
                      Hadoop Distributed File System (HDFS)' using known examples
                      from text analysis such as wordcount. Throughout the
                      material, comparisons are given, such as distributed
                      filesystems vs. parallel filesystems or configuration
                      elements important for HPC administrators. The talk ends
                      by offering future topics in the context of big data
                      analytics (e.g. in-situ analytics in exascale computing) or
                      big data management challenges for reproducibility of HPC
                      \& map-reduce runs required for future publications based
                      on open referenceable data.},
      month        = {Jan},
      date         = {2015-01-19},
      organization = {Cy-Tera/LinkSCEEM HPC Administrator
                      Workshop, Nicosia (Cyprus), 19 Jan 2015
                      - 21 Jan 2015},
      subtyp       = {Invited},
      cin          = {JSC},
      cid          = {I:(DE-Juel1)JSC-20090406},
      pnm          = {512 - Data-Intensive Science and Federated Computing
                      (POF3-512)},
      pid          = {G:(DE-HGF)POF3-512},
      typ          = {PUB:(DE-HGF)6},
      url          = {https://juser.fz-juelich.de/record/187261},
}