000187261 001__ 187261
000187261 005__ 20210129214958.0
000187261 0247_ $$2Handle$$a2128/8313
000187261 037__ $$aFZJ-2015-00934
000187261 041__ $$aEnglish
000187261 1001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b0$$ufzj
000187261 1112_ $$aCy-Tera/LinkSCEEM HPC Administrator Workshop$$cNicosia$$d2015-01-19 - 2015-01-21$$wCyprus
000187261 245__ $$aBig Data in HPC, Hadoop, and HDFS - Part Two
000187261 260__ $$c2015
000187261 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1422430384_8068$$xInvited
000187261 3367_ $$033$$2EndNote$$aConference Paper
000187261 3367_ $$2DataCite$$aOther
000187261 3367_ $$2ORCID$$aLECTURE_SPEECH
000187261 3367_ $$2DRIVER$$aconferenceObject
000187261 3367_ $$2BibTeX$$aINPROCEEDINGS
000187261 520__ $$aOne of the solutions to enable scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known in High Performance Computing (HPC), using techniques such as the Message Passing Interface (MPI) and OpenMP, and on the other hand, the map-reduce paradigm, which uses rather pleasantly parallel approaches. The first part of the talk focuses on 'Big Data in HPC' using two concrete codes as examples: (1) clustering using a parallel and scalable DBSCAN implementation and (2) classification using a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop' (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS), using known examples from text analysis such as wordcount. Throughout the material, comparisons are given, such as distributed filesystems vs. parallel filesystems, or configuration elements important for HPC administrators. The talk ends by offering future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) or big data management challenges for the reproducibility of HPC and map-reduce runs required for future publications based on open referenceable data.
000187261 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000187261 773__ $$y2015
000187261 8564_ $$uhttp://morrisriedel.de/sites/default/files/share/2015-01-21-BigData-PartTwo-Riedel-Small-v1.pdf
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.pdf$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-144$$xicon-144$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000187261 8564_ $$uhttps://juser.fz-juelich.de/record/187261/files/FZJ-2015-00934.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000187261 909CO $$ooai:juser.fz-juelich.de:187261$$pdriver$$pVDB$$popen_access$$popenaire
000187261 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000187261 9130_ $$0G:(DE-HGF)POF2-412$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vGrid Technologies and Infrastructures$$x0
000187261 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000187261 9141_ $$y2015
000187261 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000187261 920__ $$lyes
000187261 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000187261 980__ $$aconf
000187261 980__ $$aVDB
000187261 980__ $$aUNRESTRICTED
000187261 980__ $$aFullTexts
000187261 980__ $$aI:(DE-Juel1)JSC-20090406
000187261 9801_ $$aFullTexts