OptReg (OptReg)    R Documentation

Best Subset Regression Based on AIC, BIC, EBIC, BICq, and BICq with CVd

Description

An efficient algorithm is used to find the combination of inputs which produces a multiple linear regression with the minimum information criterion (IC): AIC, BIC, EBIC, or BICq.

Usage

OptReg(Xy=Xy,intercept=TRUE,IC = c("AIC","BIC","EBIC","BICq"),
method=c("exhaustive", "backward", "forward", "seqrep"), g=1, q=0.25, verbose = TRUE)

Arguments

Xy Data frame containing the design matrix X and the output variable y. The output variable must be in the last column. All columns must be named.
intercept If TRUE, an intercept term is included. If FALSE, the model is fitted without an intercept.
IC One or more information criteria: "AIC", "BIC", "EBIC", or "BICq".
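method Method used to search for the best subset: "exhaustive", "backward", "forward", or "seqrep" (sequential replacement).
g Parameter gamma in EBIC. EBIC with g=0 is the same as BIC.
q Parameter q in BICq. BICq with q=0.5 is the same as BIC.
verbose If TRUE, extra information is printed. If FALSE, the function runs silently.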
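
The roles of g and q can be made concrete with a short sketch. The helper below, ic_values (a hypothetical name, not part of OptReg), computes the four criteria for a Gaussian linear model from its residual sum of squares. It assumes the usual forms from the references, with additive constants dropped. How OptReg itself counts the intercept in k is not stated on this page, so this is an illustration rather than the package's implementation.

```r
# Hypothetical helper, not part of OptReg: information criteria for a
# Gaussian linear model with k selected inputs out of K candidates.
# Assumed forms (additive constants dropped):
#   AIC  = n*log(RSS/n) + 2*k
#   BIC  = n*log(RSS/n) + k*log(n)
#   EBIC = BIC + 2*g*log(choose(K, k))
#   BICq = n*log(RSS/n) + k*(log(n) - 2*log(q/(1 - q)))
ic_values <- function(rss, n, k, K, g = 1, q = 0.25) {
  dev <- n * log(rss / n)
  c(AIC  = dev + 2 * k,
    BIC  = dev + k * log(n),
    EBIC = dev + k * log(n) + 2 * g * lchoose(K, k),
    BICq = dev + k * (log(n) - 2 * log(q / (1 - q))))
}

# With g = 0 and q = 0.5 the EBIC and BICq entries equal the BIC entry
ic_values(rss = 50, n = 100, k = 3, K = 10, g = 0, q = 0.5)
```

Under these assumed forms, any g > 0 or q < 0.5 adds a penalty on top of BIC, which favours smaller models.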

Details

An efficient branch-and-bound optimization algorithm is used to find, for each k = 1, ..., K, the multiple linear regression model with k inputs that has the smallest residual sum of squares. Here K is the total number of covariates, that is, the number of columns in Xy minus one. The best model among these is then selected using the chosen information criterion: AIC, BIC, EBIC, or BICq.
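
The two-stage procedure described above can be sketched in base R. The sketch substitutes brute-force enumeration with combn for the branch-and-bound search, so it is only practical for small K. It uses BIC in the form n*log(RSS/n) + k*log(n) as the selection criterion. The function name best_by_size and the simulated data are illustrative assumptions, not part of OptReg.

```r
# Illustrative sketch only: brute-force replacement for branch-and-bound.
# Xy: data frame with the inputs first and the output in the last column.
best_by_size <- function(Xy) {
  K <- ncol(Xy) - 1                       # number of candidate inputs
  n <- nrow(Xy)
  y <- Xy[[K + 1]]
  X <- as.matrix(Xy[, 1:K, drop = FALSE])
  out <- vector("list", K)
  for (k in 1:K) {
    subsets <- combn(K, k)                # each column is one subset of size k
    rss <- apply(subsets, 2, function(idx) {
      fit <- lm.fit(cbind(1, X[, idx, drop = FALSE]), y)  # with intercept
      sum(fit$residuals^2)
    })
    j <- which.min(rss)                   # stage 1: smallest RSS for this k
    out[[k]] <- list(vars = colnames(X)[subsets[, j]], rss = rss[j],
                     bic = n * log(rss[j] / n) + k * log(n))
  }
  out
}

set.seed(1)
X <- matrix(rnorm(100 * 5), ncol = 5, dimnames = list(NULL, paste0("X", 1:5)))
Xy <- data.frame(X, y = 2 * X[, 1] - X[, 3] + rnorm(100))
cand <- best_by_size(Xy)
bic <- sapply(cand, function(m) m$bic)    # stage 2: compare sizes by BIC
cand[[which.min(bic)]]$vars               # inputs of the BIC-best model
```

The sketch considers only k = 1, ..., K, as in the description above, so the model with no inputs is not a candidate here.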

Value

A list of lm objects, one for each requested information criterion:

object[[k]] : lm object for the best-fitting model under the k-th IC

Note

An informational message is printed indicating which inputs were selected. This message is separate from the returned value.

Author(s)

A.I. McLeod and C. Xu

References

Xu, C. and McLeod, A. I. (2009). Another Extended Bayesian Information Criteria for Model Selection. Preprint.

Chen, J. and Chen, Z. (2008). Extended Bayesian Information Criteria for Model Selection with Large Model Space. Biometrika, 95, 759–771.

Furnival, G. M. and Wilson, R. W. (1974). Regressions by Leaps and Bounds. Technometrics, 16, 499–511.

Miller, A. J. (1990). Subset Selection in Regression. London: Chapman and Hall.

See Also

lm, leaps

Examples

#Example 1
#prostate data example
data(prostate)
names(prostate)
prostate=prostate[,-ncol(prostate)]
OptReg(prostate)

#Example 2
#using dataset: "mtcars"
#The output variable, mpg, is in the first column. We need to re-order.
#There are 11 columns, put mpg last
data(mtcars)
mtcars.df<-mtcars[,c(2:11, 1)]
ans<-OptReg(mtcars.df)
summary(ans[[1]])
summary(ans[[2]])

#Example 3
#White noise test: AIC and BIC select one variable, while EBIC and BICq select none. 
set.seed(32179)
p<-10   #number of inputs
n<-100  #number of observations
X<-matrix(rnorm(n*p), ncol=p)
y<-rnorm(n)
Xy<-as.data.frame(cbind(X,y))
names(Xy)<-c(paste("X",1:p,sep=""),"y")
best=OptReg(Xy)

#Example 4 
require(MASS)
K=20; n=100; rho=0.2

set.seed(77777)
mu=rep(0,K)
Sigma <- matrix(rep(rho,K*K),K,K)+diag(rep(1-rho,K))
X=mvrnorm(n, mu, Sigma) 

#AIC and EBIC do not select the true variables.
#BIC and BICq with q=0.25 select the true variables.
k=15
bs=rep(1,k)
y=as.vector(tcrossprod(t(bs),X[,1:k]))+rnorm(n)  #no intercept term
colnames(X)=paste("X",1:K,sep="")
Xy=data.frame(X,y=y) 
best=OptReg(Xy,intercept=FALSE)
#best=OptReg(Xy,IC=c("AIC","BIC","EBIC","BICq"), intercept=FALSE, q=c(0.15,0.25,0.75,0.8))

#Example 5
#subsets=BestSubsets(Xy)
#subsets=BestSubsets(Xy,intercept=FALSE)
#subsets
#qk(Xy, subsets=subsets, output.R2rms=TRUE)
#d=round(n*(1-1/(log(n)-1)))
#CVd(Xy,subsets[1,],leave.d=c(1,nrow(Xy)/3,nrow(Xy)/2))
#CVd.subsets(Xy,subsets,leave.d=c(1,nrow(Xy)/2))

[Package OptReg version 4.0 Index]