BestReg(BestReg) | R Documentation |
An efficient algorithm is used to find the combination of inputs which produces a multiple linear regression with the best AIC, BIC or UBIC.
BestReg(Xy.df, Criterion = "UBIC", verbose = TRUE)
Xy.df | data frame containing the design matrix X and the output variable y. The output variable must be in the last column. All columns must be named. |
Criterion | one of "AIC", "BIC" or "UBIC" |
verbose | if TRUE, print extra information; if FALSE, run silently. |
An efficient branch-and-bound optimization algorithm is used to find, for each k = 1, ..., p, the multiple linear regression model with k inputs that has the smallest residual sum of squares. Here p is the total number of covariates, that is, the number of columns in Xy.df minus one. All models are assumed to include an intercept term. Because models of the same size differ only in their residual sum of squares, the best model under AIC, BIC or UBIC is then easily found among these p candidates. Finally, the best model with at least one input is compared with the model containing no inputs, that is, just a mean. If the mean-only model is better according to the chosen criterion, it is selected instead.
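The two-stage search described above can be sketched in base R. This is only an illustration under stated assumptions: it enumerates every subset with combn() instead of using branch-and-bound, so it is practical only for small p, and it uses stats::BIC as the criterion because this page does not give the UBIC formula. The name best_subset_sketch is hypothetical and is not part of the package.

```r
# Illustrative sketch: exhaustive enumeration stands in for branch-and-bound.
# best_subset_sketch is a hypothetical helper, not the package's implementation.
best_subset_sketch <- function(Xy.df, criterion = BIC) {
  p <- ncol(Xy.df) - 1             # number of candidate inputs
  yname <- names(Xy.df)[p + 1]     # output variable is assumed to be the last column
  xnames <- names(Xy.df)[seq_len(p)]

  # Start from the model with no inputs (intercept only, i.e. just a mean)
  best <- lm(as.formula(paste(yname, "~ 1")), data = Xy.df)

  for (k in seq_len(p)) {
    # Stage 1: among all models with k inputs, keep the one with smallest RSS
    subsets <- combn(xnames, k, simplify = FALSE)
    fits <- lapply(subsets, function(s) {
      lm(as.formula(paste(yname, "~", paste(s, collapse = " + "))), data = Xy.df)
    })
    rss <- vapply(fits, function(f) sum(resid(f)^2), numeric(1))
    candidate <- fits[[which.min(rss)]]

    # Stage 2: compare the size-k winner with the best model found so far
    if (criterion(candidate) < criterion(best)) best <- candidate
  }
  best
}

# Usage: mtcars with the output variable mpg moved to the last column
fit <- best_subset_sketch(mtcars[, c(2:11, 1)])
print(formula(fit))
```

Passing criterion = AIC selects by AIC instead. Comparing only one model per size is enough because, for a fixed number of inputs, both AIC and BIC increase with the residual sum of squares.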
lm object for the best fitting model
An information message is printed which indicates which inputs were selected. This is separate from the value.
A.I. McLeod
Chen, J. and Chen, Z. (2007). Extended Bayesian Information Criteria for Model Selection with Large Model Space. Preprint.
Furnival, G. M. and Wilson, R. W. (1974). Regressions by Leaps and Bounds. Technometrics, 16, 499–511.
Miller, A. J. (1990). Subset Selection in Regression. London: Chapman and Hall.
#Example 1
#prostate data example
data(prostate)
BestReg(prostate)

#Example 2
#using the built-in dataset "mtcars" (see its documentation)
#The output variable, mpg, is in the first column so we need to re-order.
#There are 11 columns, put mpg last
data(mtcars)
mtcars.df <- mtcars[, c(2:11, 1)]
ans1 <- BestReg(mtcars.df)
summary(ans1)
ans2 <- BestReg(mtcars.df, Criterion = "BIC")
summary(ans2)
ans3 <- BestReg(mtcars.df, Criterion = "AIC")
summary(ans3)

#Example 3
#white noise test. UBIC selects none. AIC selects one.
set.seed(32179)
p <- 10   #number of inputs
n <- 100  #number of observations
X <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)
Xy.df <- as.data.frame(cbind(X, y))
names(Xy.df) <- c(paste("X", 1:p, sep = ""), "y")
ansUBIC <- BestReg(Xy.df)
ansAIC <- BestReg(Xy.df, Criterion = "AIC")