Scientific Workflow Optimization for Improved Peptide and Protein Identification

Holl, Sonja; Mohammed, Yassene; Zimmermann, Olav; Palmblad, Magnus

doi:10.1186/s12859-015-0714-x

Marc 21

001			280903
005			20210129221437.0
024	7	_	\|a 10.1186/s12859-015-0714-x \|2 doi
024	7	_	\|a 2128/9723 \|2 Handle
024	7	_	\|a WOS:000360426000008 \|2 WOS
024	7	_	\|a altmetric:4467684 \|2 altmetric
024	7	_	\|a pmid:26335531 \|2 pmid
037	_	_	\|a FZJ-2016-00614
041	_	_	\|a English
082	_	_	\|a 004
100	1	_	\|a Holl, Sonja \|0 P:(DE-Juel1)132139 \|b 0 \|u fzj
245	_	_	\|a Scientific Workflow Optimization for Improved Peptide and Protein Identification
260	_	_	\|a London \|c 2015 \|b BioMed Central
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1453206659_25282 \|2 PUB:(DE-HGF)
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a article \|2 DRIVER
520	_	_	\|a Background: Peptide-spectrum matching is a common step in most data processing workflows for mass spectrometry-based proteomics. Many algorithms and software packages, both free and commercial, have been developed to address this task. However, these algorithms typically require the user to select instrument- and sample-dependent parameters, such as mass measurement error tolerances and the number of missed enzymatic cleavages. Selecting the best algorithm and parameter set for a particular dataset requires in-depth knowledge of both the data and the algorithms themselves. Most researchers therefore tend to use default parameters, which are not necessarily optimal. Results: We have applied a new optimization framework for the Taverna scientific workflow management system (http://ms-utils.org/Taverna_Optimization.pdf) to find the best combination of parameters for a given scientific workflow performing peptide-spectrum matching. The optimizations themselves are non-trivial, as demonstrated by several phenomena that can be observed when allowing for larger mass measurement errors in sequence database searches. On-the-fly parameter optimization embedded in scientific workflow management systems enables experts and non-experts alike to extract the maximum amount of information from the data. The same workflows could be used to explore the parameter space and compare algorithms, not only for peptide-spectrum matching but also for other tasks, such as retention time prediction. Conclusion: Using the optimization framework, we were able to learn about how the data were acquired as well as about the algorithms explored. We observed a phenomenon in which many ammonia-loss b-ion spectra were identified as peptides with N-terminal pyroglutamate and a large precursor mass measurement error. These insights could only be gained by extending the mass measurement error tolerance parameters explored by the optimization framework beyond their commonly used range.
536	_	_	\|a 511 - Computational Science and Mathematical Methods (POF3-511) \|0 G:(DE-HGF)POF3-511 \|c POF3-511 \|f POF III \|x 0
536	_	_	\|a 512 - Data-Intensive Science and Federated Computing (POF3-512) \|0 G:(DE-HGF)POF3-512 \|c POF3-512 \|f POF III \|x 1
700	1	_	\|a Mohammed, Yassene \|0 P:(DE-HGF)0 \|b 1
700	1	_	\|a Zimmermann, Olav \|0 P:(DE-Juel1)132307 \|b 2 \|u fzj
700	1	_	\|a Palmblad, Magnus \|0 P:(DE-HGF)0 \|b 3 \|e Corresponding author
773	_	_	\|a 10.1186/s12859-015-0714-x \|0 PERI:(DE-600)2041484-5 \|p 284 \|t BMC bioinformatics \|v 16 \|y 2015 \|x 1471-2105
856	4	_	\|y OpenAccess \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.pdf
856	4	_	\|y OpenAccess \|x icon \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.gif?subformat=icon
856	4	_	\|y OpenAccess \|x icon-1440 \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.jpg?subformat=icon-1440
856	4	_	\|y OpenAccess \|x icon-180 \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.jpg?subformat=icon-180
856	4	_	\|y OpenAccess \|x icon-640 \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.jpg?subformat=icon-640
856	4	_	\|y OpenAccess \|x pdfa \|u https://juser.fz-juelich.de/record/280903/files/art%253A10.1186%252Fs12859-015-0714-x.pdf?subformat=pdfa
909	C	O	\|o oai:juser.fz-juelich.de:280903 \|p openaire \|p open_access \|p driver \|p VDB \|p dnbdelivery
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)132139
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 2 \|6 P:(DE-Juel1)132307
910	1	_	\|a External Institute \|0 I:(DE-HGF)0 \|k Extern \|b 3 \|6 P:(DE-HGF)0
913	1	_	\|a DE-HGF \|b Key Technologies \|1 G:(DE-HGF)POF3-510 \|0 G:(DE-HGF)POF3-511 \|2 G:(DE-HGF)POF3-500 \|v Computational Science and Mathematical Methods \|x 0 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF3 \|l Supercomputing & Big Data
913	1	_	\|a DE-HGF \|b Key Technologies \|1 G:(DE-HGF)POF3-510 \|0 G:(DE-HGF)POF3-512 \|2 G:(DE-HGF)POF3-500 \|v Data-Intensive Science and Federated Computing \|x 1 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF3 \|l Supercomputing & Big Data
914	1	_	\|y 2015
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1050 \|2 StatID \|b BIOSIS Previews
915	_	_	\|a Creative Commons Attribution CC BY 4.0 \|0 LIC:(DE-HGF)CCBY4 \|2 HGFVOC
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b BMC BIOINFORMATICS : 2014
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0500 \|2 StatID \|b DOAJ
915	_	_	\|a IF < 5 \|0 StatID:(DE-HGF)9900 \|2 StatID
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0310 \|2 StatID \|b NCBI Molecular Biology Database
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0111 \|2 StatID \|b Science Citation Index Expanded
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Thomson Reuters Master Journal List
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	1	_	\|a UNRESTRICTED
980	1	_	\|a FullTexts
