BestReg {BestReg}                                        R Documentation

Best All-Subsets Regression Using AIC, BIC or UBIC

Description

An efficient algorithm is used to find the combination of inputs that yields the multiple linear regression with the best (smallest) AIC, BIC or UBIC.
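For reference, for a Gaussian linear model with an intercept, k inputs, n observations and residual sum of squares RSS_k, AIC and BIC take the following standard forms, up to additive constants that do not depend on k and therefore do not change which model is best. The exact parameter count used by the package is assumed here; shifting it by a constant does not affect the ranking. UBIC is not written out: it presumably adds a further penalty for the size of the model space, in the spirit of the extended BIC of Chen and Chen (2007).

```latex
\mathrm{AIC}_k = n\log\!\left(\frac{\mathrm{RSS}_k}{n}\right) + 2k,
\qquad
\mathrm{BIC}_k = n\log\!\left(\frac{\mathrm{RSS}_k}{n}\right) + k\log n .
```

The model with the smallest criterion value is preferred.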

Usage

BestReg(Xy.df, Criterion = "UBIC", verbose = TRUE)

Arguments

Xy.df      data frame containing the design matrix X followed by the output variable y, which must be the last column. All columns must be named.
Criterion  one of "AIC", "BIC" or "UBIC". The default is "UBIC".
verbose    if TRUE, extra information is printed; if FALSE, the function runs silently.
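Because y must be the last column of Xy.df, a data frame whose response sits elsewhere has to be re-ordered first. The sketch below uses a hypothetical helper, move_response_last, which is not part of BestReg; it only builds a suitably ordered data frame and checks that every column is named.

```r
# Hypothetical helper (not part of BestReg): return a copy of df with the
# response column `yname` moved to the last position, as BestReg expects.
move_response_last <- function(df, yname) {
  stopifnot(is.data.frame(df), yname %in% names(df))
  # BestReg requires every column to be named
  if (any(is.na(names(df)) | names(df) == "")) stop("all columns must be named")
  df[, c(setdiff(names(df), yname), yname)]
}

mtcars.df <- move_response_last(mtcars, "mpg")
names(mtcars.df)[ncol(mtcars.df)]   # the response, "mpg", is now last
ncol(mtcars.df) - 1                 # p, the number of candidate inputs
```

For mtcars this gives the same column order as mtcars[, c(2:11, 1)] in Example 2.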

Details

An efficient branch-and-bound optimization algorithm is used to find, for each k = 1, ..., p, the multiple linear regression model with k inputs that has the smallest residual sum of squares. Here p, the total number of candidate inputs (covariates), equals the number of columns in Xy.df minus one. All models are assumed to include an intercept term. Because each criterion depends on a model only through its residual sum of squares and its number of inputs k, the best model under AIC, BIC or UBIC is found simply by comparing these p models. Finally, the best model with at least one input is compared with the model containing no inputs, that is, the intercept-only (mean) model; if the intercept-only model is better, it is selected.
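The selection logic can be sketched in base R. This is not the package's code: it swaps the branch-and-bound search for a plain exhaustive search over all 2^p - 1 non-empty subsets, so it is only practical for small p, and it implements AIC and BIC only (not UBIC). The function name best_subset_sketch is hypothetical, and the input names are assumed to be syntactic R names.

```r
# Exhaustive all-subsets search: a stand-in for BestReg's branch-and-bound.
# Assumes the response is the last column of Xy.df; AIC and BIC only.
best_subset_sketch <- function(Xy.df, criterion = c("BIC", "AIC")) {
  criterion <- match.arg(criterion)
  p <- ncol(Xy.df) - 1                    # number of candidate inputs
  n <- nrow(Xy.df)
  X <- as.matrix(Xy.df[, 1:p, drop = FALSE])
  y <- Xy.df[[p + 1]]                     # response is the last column
  # Smallest RSS and its subset for each size k = 0, ..., p
  best_rss <- numeric(p + 1)
  best_set <- vector("list", p + 1)
  best_rss[1] <- sum((y - mean(y))^2)     # k = 0: intercept-only model
  best_set[[1]] <- integer(0)
  for (k in 1:p) {
    subsets <- combn(p, k, simplify = FALSE)
    rss <- vapply(subsets, function(s) {
      fit <- lm.fit(cbind(1, X[, s, drop = FALSE]), y)  # intercept always included
      sum(fit$residuals^2)
    }, numeric(1))
    best_rss[k + 1] <- min(rss)
    best_set[[k + 1]] <- subsets[[which.min(rss)]]
  }
  # Criterion for the best model of each size (constants dropped)
  penalty <- if (criterion == "AIC") 2 else log(n)
  crit <- n * log(best_rss / n) + penalty * (0:p)
  # Including k = 0 here compares the best non-empty model with the mean model
  s <- best_set[[which.min(crit)]]
  terms_used <- if (length(s) == 0) "1" else names(Xy.df)[s]
  lm(reformulate(terms_used, response = names(Xy.df)[p + 1]), data = Xy.df)
}
```

For example, best_subset_sketch(mtcars[, c(2:11, 1)], "BIC") fits all 1023 non-empty subsets of the 10 mtcars inputs and returns an lm object, mirroring the kind of value BestReg returns.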

Value

An lm object for the best-fitting model.
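Since the value is an ordinary lm object, the usual extractor functions apply to it. The sketch below does not call BestReg: a plain lm fit on mtcars stands in for a returned model, and the chosen inputs (wt, qsec, am) are purely illustrative, not a claim about what BestReg selects.

```r
# Stand-in for a BestReg result: any lm object behaves the same way.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)   # illustrative model only
selected <- attr(terms(fit), "term.labels")      # names of the inputs in the model
selected                                         # character(0) for the mean-only model
coef(fit)                                        # intercept plus one slope per input
predict(fit, newdata = mtcars[1:3, ])            # fitted values for new rows
```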

Note

An informational message indicating which inputs were selected is printed. This message is separate from the returned value.

Author(s)

A.I. McLeod

References

Chen, J. and Chen, Z. (2007). Extended Bayesian Information Criteria for Model Selection with Large Model Space. Preprint.

Furnival, G. M. and Wilson, R. W. (1974). Regressions by Leaps and Bounds. Technometrics, 16, 499–511.

Miller, A. J. (1990). Subset Selection in Regression. London: Chapman and Hall.

See Also

lm, leaps

Examples

#Example 1
#prostate data example
data(prostate)
BestReg(prostate)

#Example 2
#Using the built-in dataset mtcars (see its documentation, ?mtcars).
#The output variable, mpg, is in the first column, so the columns must be re-ordered.
#There are 11 columns; put mpg last.
data(mtcars)
mtcars.df<-mtcars[,c(2:11, 1)]
ans1<-BestReg(mtcars.df)
summary(ans1)
ans2<-BestReg(mtcars.df, Criterion="BIC")
summary(ans2)
ans3<-BestReg(mtcars.df, Criterion="AIC")
summary(ans3)

#Example 3
#White-noise test: y is unrelated to all inputs. UBIC selects none; AIC selects one.
set.seed(32179)
p<-10   #number of inputs
n<-100  #number of observations
X<-matrix(rnorm(n*p), ncol=p)
y<-rnorm(n)
Xy.df<-as.data.frame(cbind(X,y))
names(Xy.df)<-c(paste("X",1:p,sep=""),"y")
ansUBIC<-BestReg(Xy.df)
ansAIC<-BestReg(Xy.df, Criterion="AIC")

[Package BestReg version 1.1 Index]