000187260 001__ 187260
000187260 005__ 20210129214958.0
000187260 0247_ $$2Handle$$a2128/8312
000187260 037__ $$aFZJ-2015-00933
000187260 041__ $$aEnglish
000187260 1001_ $$0P:(DE-Juel1)132239$$aRiedel, Morris$$b0$$eCorresponding Author$$ufzj
000187260 1112_ $$aCy-Tera/LinkSCEEM HPC Administrator Workshop$$cNicosia$$d2015-01-19 - 2015-01-21$$wCyprus
000187260 245__ $$aBig Data in HPC, Hadoop, and HDFS - Part One
000187260 260__ $$c2015
000187260 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1422430335_8069$$xInvited
000187260 3367_ $$033$$2EndNote$$aConference Paper
000187260 3367_ $$2DataCite$$aOther
000187260 3367_ $$2ORCID$$aLECTURE_SPEECH
000187260 3367_ $$2DRIVER$$aconferenceObject
000187260 3367_ $$2BibTeX$$aINPROCEEDINGS
000187260 520__ $$aOne of the solutions to enable scalable 'big data' analysis and analytics is to take advantage of parallelization techniques. The talk differentiates between two paradigms: on the one hand, the massively parallel paradigm known in High Performance Computing (HPC), using techniques such as the Message Passing Interface (MPI) and OpenMP; on the other hand, the map-reduce paradigm, using rather pleasingly parallel approaches. The first part of the talk focuses on 'Big Data in HPC' using two concrete codes as examples: (1) clustering using a parallel and scalable DBSCAN implementation and (2) classification using a parallel and scalable Support Vector Machine (SVM) implementation. The second part focuses on 'Big Data in Hadoop (based on the map-reduce processing paradigm) and its Hadoop Distributed File System (HDFS)' using well-known examples from text analysis such as wordcount. In between, the material gives comparisons such as distributed filesystems vs. parallel filesystems, as well as configuration elements important for HPC administrators. The talk ends by offering future topics in the context of big data analytics (e.g. in-situ analytics in exascale computing) or big data management challenges for the reproducibility of HPC and map-reduce runs, as required for future publications based on open, referenceable data.
000187260 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000187260 773__ $$y2015
000187260 8564_ $$uhttp://morrisriedel.de/sites/default/files/share/2015-01-21-BigData-PartOne-Riedel-Small-v1.pdf
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.pdf$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-144$$xicon-144$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000187260 8564_ $$uhttps://juser.fz-juelich.de/record/187260/files/FZJ-2015-00933.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000187260 909CO $$ooai:juser.fz-juelich.de:187260$$pdriver$$pVDB$$popen_access$$popenaire
000187260 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132239$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000187260 9130_ $$0G:(DE-HGF)POF2-412$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vGrid Technologies and Infrastructures$$x0
000187260 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000187260 9141_ $$y2015
000187260 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000187260 920__ $$lyes
000187260 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000187260 980__ $$aconf
000187260 980__ $$aVDB
000187260 980__ $$aUNRESTRICTED
000187260 980__ $$aFullTexts
000187260 980__ $$aI:(DE-Juel1)JSC-20090406
000187260 9801_ $$aFullTexts