Monthly
288 pp. per issue
6 x 9, illustrated
ISSN
0899-7667
E-ISSN
1530-888X
2014 Impact factor:
2.21

Neural Computation

November 2006, Vol. 18, No. 11, Pages 2762-2776
(doi: 10.1162/neco.2006.18.11.2762)
© 2006 Massachusetts Institute of Technology
On the Consistency of Bayesian Variable Selection for High Dimensional Binary Regression and Classification
Article PDF (130.02 KB)
Abstract

Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression functions that are consistently estimating the truth, if the true regression model is sparse in the sense that the aggregated size of the regression coefficients are bounded. The estimated regression functions therefore can also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These provide theoretical justifications for some recent empirical successes in microarray data analysis.