Conference Presentation (Invited) FZJ-2015-00934

Big Data in HPC, Hadoop, and HDFS - Part Two



2015

Cy-Tera/LinkSCEEM HPC Administrator Workshop, Nicosia, Cyprus, 19 Jan 2015 - 21 Jan 2015


Abstract: One way to enable scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known from High Performance Computing (HPC), which uses techniques such as the Message Passing Interface (MPI) and OpenMP; on the other hand, the map-reduce paradigm, which uses rather pleasingly parallel approaches. The first part of the talk focuses on 'Big Data in HPC' using two concrete codes as examples: (1) clustering using a parallel and scalable DBSCAN implementation and (2) classification using a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS)' using well-known examples from text analysis such as wordcount. Interspersed throughout the material are comparisons, such as distributed filesystems vs. parallel filesystems, and configuration elements important for HPC administrators. The talk ends by offering future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) and big data management challenges for the reproducibility of HPC and map-reduce runs, as required for future publications based on open, referenceable data.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 512 - Data-Intensive Science and Federated Computing (POF3-512)

Appears in the scientific report 2015
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Presentations > Conference Presentations
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2015-01-27, last modified 2021-01-29

