000187261 001__ 187261
000187261 005__ 20210129214958.0
000187261 0247_ $$2Handle$$a2128/8313
000187261 037__ $$aFZJ-2015-00934
000187261 041__ $$aEnglish
000187261 1001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b0$$ufzj
000187261 1112_ $$aCy-Tera/LinkSCEEM HPC Administrator Workshop$$cNicosia$$d2015-01-19 - 2015-01-21$$wCyprus
000187261 245__ $$aBig Data in HPC, Hadoop, and HDFS - Part Two
000187261 260__ $$c2015
000187261 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1422430384_8068$$xInvited
000187261 3367_ $$033$$2EndNote$$aConference Paper
000187261 3367_ $$2DataCite$$aOther
000187261 3367_ $$2ORCID$$aLECTURE_SPEECH
000187261 3367_ $$2DRIVER$$aconferenceObject
000187261 3367_ $$2BibTeX$$aINPROCEEDINGS
000187261 520__ $$aOne way to enable scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known from High Performance Computing (HPC), which uses techniques such as the Message Passing Interface (MPI) and OpenMP, and on the other hand, the map-reduce paradigm, which relies on rather pleasantly parallel approaches. The first part of the talk focuses on 'Big Data in HPC' using two concrete codes as examples: (1) clustering using a parallel and scalable DBSCAN implementation and (2) classification using a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop' (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS), using well-known examples from text analysis such as wordcount. Interspersed throughout the material are comparisons, such as distributed filesystems vs. parallel filesystems, and configuration elements important for HPC administrators. The talk ends by outlining future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) and big data management challenges for the reproducibility of HPC & map-reduce runs, as required for future publications based on open, referenceable data.
000187261 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000187261 773__ $$y2015
000187261 8564_ $$uhttp://morrisriedel.de/sites/default/files/share/2015-01-21-BigData-PartTwo-Riedel-Small-v1.pdf
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.pdf$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-144$$xicon-144$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000187261 909CO $$ooai:juser.fz-juelich.de:187261$$pdriver$$pVDB$$popen_access$$popenaire
000187261 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000187261 9130_ $$0G:(DE-HGF)POF2-412$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vGrid Technologies and Infrastructures$$x0
000187261 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000187261 9141_ $$y2015
000187261 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000187261 920__ $$lyes
000187261 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000187261 980__ $$aconf
000187261 980__ $$aVDB
000187261 980__ $$aUNRESTRICTED
000187261 980__ $$aFullTexts
000187261 980__ $$aI:(DE-Juel1)JSC-20090406
000187261 9801_ $$aFullTexts