Journal Article PreJuSER-54088

Classification of Highly Unbalanced CYP450 Data of Drugs Using Cost Sensitive Machine Learning Techniques

2007

Journal of Chemical Information and Modeling 47, 92-103 (2007) [10.1021/ci6002619]

Please use a persistent id in citations: doi:10.1021/ci6002619

Abstract: In this paper, we study the classification of unbalanced data sets of drugs. As an example, we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of this data, we built classifiers based on machine learning methods. When the class distribution of a data set is unbalanced, conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain classifiers that are sensitive but also accurate, we combine machine learning and feature selection methods with techniques that address unbalanced classification, such as oversampling and threshold moving. We used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable effective high-throughput in silico classification of potential drug candidates.
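
The two rebalancing techniques named in the abstract, oversampling and threshold moving, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy labels, and the scores are all hypothetical, and the sketch assumes a binary problem in which label 1 marks the minority class (for example, inhibitors) and the classifier returns a score between 0 and 1.

```python
import random


def oversample_minority(X, y, minority_label=1, seed=0):
    """Random oversampling: duplicate minority rows until both classes match in size.

    Hypothetical helper for illustration; assumes binary labels.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority or len(minority) >= len(majority):
        return list(X), list(y)  # nothing to rebalance
    # Draw extra minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def classify_with_threshold(scores, threshold=0.5):
    """Threshold moving: predict the minority class when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity(y_true, y_pred):
    """Fraction of true minority-class samples that were predicted as such."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy data (made up): 8 majority samples, 2 minority samples.
    y = [0] * 8 + [1] * 2
    X = [[float(i)] for i in range(len(y))]
    X_bal, y_bal = oversample_minority(X, y)
    print("class counts after oversampling:", y_bal.count(0), y_bal.count(1))

    # Made-up classifier scores; lowering the threshold trades
    # specificity for sensitivity on the minority class.
    scores = [0.10, 0.20, 0.15, 0.30, 0.05, 0.25, 0.35, 0.40, 0.45, 0.70]
    for threshold in (0.5, 0.42):
        pred = classify_with_threshold(scores, threshold)
        print("threshold", threshold, "sensitivity", sensitivity(y, pred))
```

Oversampling changes the training data, whereas threshold moving changes only the decision rule applied to an already trained classifier, so the two can be combined or evaluated separately.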

Keyword(s): Algorithms (MeSH) ; Artificial Intelligence (MeSH) ; Costs and Cost Analysis (MeSH) ; Cytochrome P-450 Enzyme System (MeSH) ; Databases, Factual: classification (MeSH) ; Drug Evaluation, Preclinical: methods (MeSH) ; Pharmaceutical Preparations (MeSH)


Note: Record converted from VDB: 12.11.2012

Contributing Institute(s):
  1. Zentralinstitut für Angewandte Mathematik (ZAM)
Research Program(s):
  1. Scientific Computing (P41)

Appears in the scientific report 2007

The record appears in these collections:
Document types > Articles > Journal Article
Workflow collections > Public records
Institute Collections > JSC
Publications database

 Record created 2012-11-13, last modified 2018-02-11


