000187260 001__ 187260
000187260 005__ 20210129214958.0
000187260 0247_ $$2Handle$$a2128/8312
000187260 037__ $$aFZJ-2015-00933
000187260 041__ $$aEnglish
000187260 1001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b0$$eCorresponding Author$$ufzj
000187260 1112_ $$aCy-Tera/LinkSCEEM HPC Administrator Workshop$$cNicosia$$d2015-01-19 - 2015-01-21$$wCyprus
000187260 245__ $$aBig Data in HPC, Hadoop, and HDFS - Part One
000187260 260__ $$c2015
000187260 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1422430335_8069$$xInvited
000187260 3367_ $$033$$2EndNote$$aConference Paper
000187260 3367_ $$2DataCite$$aOther
000187260 3367_ $$2ORCID$$aLECTURE_SPEECH
000187260 3367_ $$2DRIVER$$aconferenceObject
000187260 3367_ $$2BibTeX$$aINPROCEEDINGS
000187260 520__ $$aOne of the solutions for enabling scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known from High Performance Computing (HPC), using techniques such as the Message Passing Interface (MPI) and OpenMP; on the other hand, the map-reduce paradigm, which uses rather pleasantly parallel approaches. The first part of the talk focuses on 'Big Data in HPC', using two concrete codes as examples: (1) clustering with a parallel and scalable DBSCAN implementation and (2) classification with a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS)', using well-known text-analysis examples such as wordcount. In between, the material offers comparisons, such as distributed filesystems vs. parallel filesystems, and covers configuration elements important for HPC administrators. The talk ends by outlining future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) and big data management challenges for the reproducibility of HPC and map-reduce runs, as required for future publications based on openly referenceable data.
000187260 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000187260 773__ $$y2015
000187260 8564_ $$uhttp://morrisriedel.de/sites/default/files/share/2015-01-21-BigData-PartOne-Riedel-Small-v1.pdf
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.pdf$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-144$$xicon-144$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000187260 909CO $$ooai:juser.fz-juelich.de:187260$$pdriver$$pVDB$$popen_access$$popenaire
000187260 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000187260 9130_ $$0G:(DE-HGF)POF2-412$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vGrid Technologies and Infrastructures$$x0
000187260 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000187260 9141_ $$y2015
000187260 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000187260 920__ $$lyes
000187260 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000187260 980__ $$aconf
000187260 980__ $$aVDB
000187260 980__ $$aUNRESTRICTED
000187260 980__ $$aFullTexts
000187260 980__ $$aI:(DE-Juel1)JSC-20090406
000187260 9801_ $$aFullTexts