Effect of training sample size and classification difficulty on the accuracy of genomic predictors

Publication domains > Medical sciences + Publication types > Scientific journal article

Authors: V. Popovici, W. Chen, B.G. Gallas, C. Hatzis, W. Shi, F.W. Samuelson, Y. Nikolsky, M. Tsyganova, A. Ishkin, T. Nikolskaya, K.R. Hess, V. Valero, D. Booser, M. Delorenzi, G.N. Hortobagyi, L. Shi, W.F. Symmans, L. Pusztai

Published in: Breast Cancer Research, 12, p.R5, 2010.

Abstract:

Introduction

As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints.

Methods

We used gene expression data from 230 breast cancers (grouped into training and independent validation sets) and we examined 40 predictors (five univariate feature selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set using two different resampling methods and compared with the accuracy observed in the independent validation set.

Results

A ranking of the three classification problems was obtained and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than the cross-validation estimates. The required sample size for each endpoint was estimated and both gene-level and pathway-level analyses were performed on the obtained models.

Conclusions

We showed that genomic predictor accuracy is largely determined by an interplay between sample size and classification difficulty. Variations on univariate feature selection methods and choice of classification algorithm have only a modest impact on predictor performance and several statistically equally good predictors can be developed for any given classification problem.
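
The model counts quoted in the Methods and Results (40 predictors per endpoint, 120 models overall) follow from crossing five feature selection methods with eight classifiers for each of three endpoints. A minimal sketch of that enumeration is given below. The abstract states only the counts, so every label in the code (`feature_selection_1`, `classifier_1`, `endpoint_1`, and so on) is a hypothetical placeholder and not a name used by the authors.

```python
from itertools import product

# Placeholder labels (assumptions): the abstract gives only the counts,
# not the names of the methods, classifiers, or endpoints.
FEATURE_SELECTORS = [f"feature_selection_{i}" for i in range(1, 6)]  # 5 univariate methods
CLASSIFIERS = [f"classifier_{i}" for i in range(1, 9)]               # 8 classifiers
ENDPOINTS = [f"endpoint_{i}" for i in range(1, 4)]                   # 3 clinical endpoints


def build_model_grid():
    """Return one (endpoint, feature_selector, classifier) tuple per model."""
    return list(product(ENDPOINTS, FEATURE_SELECTORS, CLASSIFIERS))


def predictors_per_endpoint(grid):
    """Count how many predictors the grid contains for each endpoint."""
    counts = {}
    for endpoint, _selector, _classifier in grid:
        counts[endpoint] = counts.get(endpoint, 0) + 1
    return counts


grid = build_model_grid()
print(len(grid))                      # total number of models in the grid
print(predictors_per_endpoint(grid))  # number of predictors per endpoint
```

Each tuple in `grid` stands for one model that would be trained, assessed by resampling on the training set, and then checked against the independent validation set.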

Keywords: sample size, classifier, prediction, breast cancer, MAQC

URL: http://breast-cancer-research.com/content/12/1/R5

Vlad Popovici
January 18, 2010