Big Data in HPC, Hadoop, and HDFS - Part Two

Riedel, Morris

Marc 21

001			187261
005			20210129214958.0
024	7	_	\|a 2128/8313 \|2 Handle
037	_	_	\|a FZJ-2015-00934
041	_	_	\|a English
100	1	_	\|a Riedel, Morris \|0 P:(DE-Juel1)132239 \|b 0 \|u fzj
111	2	_	\|a Cy-Tera/LinkSCEEM HPC Administrator Workshop \|c Nicosia \|d 2015-01-19 - 2015-01-21 \|w Cyprus
245	_	_	\|a Big Data in HPC, Hadoop, and HDFS - Part Two
260	_	_	\|c 2015
336	7	_	\|a Conference Presentation \|b conf \|m conf \|0 PUB:(DE-HGF)6 \|s 1422430384_8068 \|2 PUB:(DE-HGF) \|x Invited
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a Other \|2 DataCite
336	7	_	\|a LECTURE_SPEECH \|2 ORCID
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
520	_	_	\|a One of the solutions to enable scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known in High Performance Computing (HPC), using techniques such as the Message Passing Interface (MPI) and OpenMP; on the other hand, the map-reduce paradigm, using rather pleasingly parallel approaches. The first part of the talk focuses on 'Big Data in HPC' using two concrete codes as examples: (1) clustering using a parallel and scalable DBSCAN implementation and (2) classification using a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop' (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS), using known examples from text analysis such as wordcount. In between, comparisons are given, such as distributed filesystems vs. parallel filesystems, or configuration elements important for HPC administrators. The talk ends by offering future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) and big data management challenges for the reproducibility of HPC and map-reduce runs, as required for future publications based on open, referenceable data.
536	_	_	\|a 512 - Data-Intensive Science and Federated Computing (POF3-512) \|0 G:(DE-HGF)POF3-512 \|c POF3-512 \|x 0 \|f POF III
773	_	_	\|y 2015
856	4	_	\|u http://morrisriedel.de/sites/default/files/share/2015-01-21-BigData-PartTwo-Riedel-Small-v1.pdf
856	4	_	\|u https://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.pdf \|y OpenAccess
856	4	_	\|u https://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-144 \|x icon-144 \|y OpenAccess
856	4	_	\|u https://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-180 \|x icon-180 \|y OpenAccess
856	4	_	\|u https://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-640 \|x icon-640 \|y OpenAccess
909	C	O	\|o oai:juser.fz-juelich.de:187261 \|p openaire \|p open_access \|p VDB \|p driver
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)132239
913	0	_	\|a DE-HGF \|b Schlüsseltechnologien \|l Supercomputing \|1 G:(DE-HGF)POF2-410 \|0 G:(DE-HGF)POF2-412 \|2 G:(DE-HGF)POF2-400 \|v Grid Technologies and Infrastructures \|x 0
913	1	_	\|a DE-HGF \|b Key Technologies \|1 G:(DE-HGF)POF3-510 \|0 G:(DE-HGF)POF3-512 \|2 G:(DE-HGF)POF3-500 \|v Data-Intensive Science and Federated Computing \|x 0 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF3 \|l Supercomputing & Big Data
914	1	_	\|y 2015
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
920	_	_	\|l yes
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a conf
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a FullTexts
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	1	_	\|a FullTexts
