diff --git a/CVE/DESCRIPTION b/CVE/DESCRIPTION
index 04f5c23..7b32961 100644
--- a/CVE/DESCRIPTION
+++ b/CVE/DESCRIPTION
@@ -2,7 +2,7 @@ Package: CVE
 Type: Package
 Title: Conditional Variance Estimator for Sufficient Dimension Reduction
 Version: 0.2
-Date: 2019-11-13
+Date: 2019-12-20
 Author: Daniel Kapla , Lukas Fertl
 Maintainer: Daniel Kapla
 Description: Implementation of the Conditional Variance Estimation (CVE) method.
diff --git a/CVE/R/CVE.R b/CVE/R/CVE.R
index e7fdc7b..d676ca1 100644
--- a/CVE/R/CVE.R
+++ b/CVE/R/CVE.R
@@ -2,16 +2,13 @@
 #'
 #' Conditional Variance Estimation (CVE) is a novel sufficient dimension
 #' reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
-#' where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
+#' where \eqn{B'X} is a lower dimensional projection of the predictors and
+#' \eqn{Y} is a univariate response. CVE,
 #' similarly to its main competitor, the mean average variance estimation
 #' (MAVE), is not based on inverse regression, and does not require the
 #' restrictive linearity and constant variance conditions of moment based SDR
 #' methods. CVE is data-driven and applies to additive error regressions with
-#' continuous predictors and link function. The effectiveness and accuracy of
-#' CVE compared to MAVE and other SDR techniques is demonstrated in simulation
-#' studies. CVE is shown to outperform MAVE in some model set-ups, while it
-#' remains largely on par under most others.
-#' Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
+#' continuous predictors and link function. Let \eqn{X} be a real
 #' \eqn{p}-dimensional covariate vector. We assume that the dependence of
 #' \eqn{Y} and \eqn{X} is modelled by
 #' \deqn{Y = g(B'X) + \epsilon}
@@ -20,11 +17,11 @@
 #' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
 #' is an unknown, continuous non-constant function,
 #' and \eqn{B = (b_1, ..., b_k)} is
-#' a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
+#' a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
 #' Without loss of generality \eqn{B} is assumed to be orthonormal.
 #'
 #' @author Daniel Kapla, Lukas Fertl, Bura Efstathia
-#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
 #' Estimation for Sufficient Dimension Reduction. Working Paper.
 #'
 #' @docType package
@@ -33,7 +30,10 @@
 
 #' Conditional Variance Estimator (CVE).
 #'
-#' @inherit CVE-package description
+#' This is the main function in the \code{CVE} package. It creates objects of
+#' class \code{"cve"} to estimate the mean subspace. Helper functions that
+#' require a \code{"cve"} object can then be applied to the output from this
+#' function.
 #'
 #' @param formula an object of class \code{"formula"} which is a symbolic
 #' description of the model to be fitted like \eqn{Y\sim X}{Y ~ X} where
@@ -46,13 +46,41 @@
 #' @param method This character string specifies the method of fitting. The
 #' options are
 #' \itemize{
-#' \item "simple" implementation as described in the paper.
+#' \item "simple" implementation,
 #' \item "weighted" variation with adaptive weighting of slices.
 #' }
-#' see paper.
+#' see Fertl, L. and Bura, E. (2019).
 #' @param max.dim upper bounds for \code{k}, (ignored if \code{k} is supplied).
 #' @param ... optional parameters passed on to \code{cve.call}.
 #'
+#'
+#' Conditional Variance Estimation (CVE) is a sufficient dimension reduction
+#' (SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
+#' expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
+#' function provides methods for estimating the dimension and the subspace
+#' spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
+#' rank \eqn{k} such that
+#' \deqn{%
+#' E(Y|X) = E(Y|B'X) %
+#' }
+#' or, equivalently,
+#' \deqn{%
+#' Y = g(B'X) + \epsilon %
+#' }
+#' where \eqn{X} is independent of \eqn{\epsilon} with positive definite
+#' variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
+#' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
+#' is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
+#' is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
+#'
+#' Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
+#' CVE method makes very few assumptions.
+#'
+#' A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
+#' of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
+#' The primary output from this method is a set of orthonormal vectors,
+#' \eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.
+#'
 #' @return an S3 object of class \code{cve} with components:
 #' \describe{
 #' \item{X}{design matrix of predictor vector used for calculating
@@ -130,7 +158,7 @@
 #'
 #' @seealso For a detailed description of \code{formula} see
 #' \code{\link{formula}}.
-#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
 #' Estimation for Sufficient Dimension Reduction. Working Paper.
 #'
 #' @importFrom stats model.frame
@@ -159,8 +187,8 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
 #' @inherit cve title
 #' @inherit cve description
 #'
-#' @param X Design matrix with dimension \eqn{n\times p}{n x p}.
-#' @param Y numeric array of length \eqn{n} of Responses.
+#' @param X Design predictor matrix.
+#' @param Y \eqn{n}-dimensional vector of responses.
 #' @param h bandwidth or function to estimate bandwidth, defaults to internaly
 #' estimated bandwidth.
 #' @param nObs parameter for choosing bandwidth \code{h} using
@@ -193,7 +221,7 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
 #' @param V.init Semi-orthogonal matrix of dimensions `(ncol(X), ncol(X) - k)
 #' used as starting value in the optimization. (If supplied,
 #' \code{attempts} is set to 0 and \code{k} to match dimension).
-#' @param logger a logger function (only for advanced user, slows down the
+#' @param logger a logger function (only for advanced users, slows down the
 #' computation).
 #'
 #' @inherit cve return
@@ -209,11 +237,11 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
 #' # Y = f(B'X) + err
 #' # with f(x1) = x1 and err ~ N(0, 0.25^2)
 #' Y <- X %*% B + 0.25 * rnorm(100)
-#'
+#'
 #' # calculate cve with method 'simple' for k = 1
 #' set.seed(21)
 #' cve.obj.simple1 <- cve(Y ~ X, k = 1)
-#'
+#'
 #' # same as
 #' set.seed(21)
 #' cve.obj.simple2 <- cve.call(X, Y, k = 1)
diff --git a/CVE/R/coef.R b/CVE/R/coef.R
index 5bc63c9..8545d19 100644
--- a/CVE/R/coef.R
+++ b/CVE/R/coef.R
@@ -1,14 +1,14 @@
-#' Gets estimated SDR basis.
+#' Extracts estimated SDR basis.
 #'
 #' Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
-#' cve-estimate with dimension \eqn{p\times k}{p x k}.
+#' cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
 #'
-#' @param object instance of \code{cve} as output from \code{\link{cve}} or
-#' \code{\link{cve.call}}.
+#' @param object an object of class \code{"cve"}, usually, a result of a call to
+#' \code{\link{cve}} or \code{\link{cve.call}}.
 #' @param k the SDR dimension.
-#' @param ... ignored.
+#' @param ... ignored (no additional arguments).
 #'
-#' @return dir the matrix of CS or CMS of given dimension
+#' @return The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
 #'
 #' @examples
 #' # set dimensions for simulation model
@@ -19,7 +19,7 @@
 #' b1 <- rep(1 / sqrt(p), p)
 #' b2 <- (-1)^seq(1, p) / sqrt(p)
 #' B <- cbind(b1, b2)
-#'
+#'
 #' set.seed(21)
 #' # creat predictor data x ~ N(0, I_p)
 #' x <- matrix(rnorm(n * p), n, p)
@@ -31,7 +31,7 @@
 #' cve.obj <- cve(y ~ x, max.dim = 5)
 #' # get cve-estimate for B with dimensions (p, k = 2)
 #' B2 <- coef(cve.obj, k = 2)
-#'
+#'
 #' # Projection matrix on span(B)
 #' # equivalent to `B %*% t(B)` since B is semi-orthonormal
 #' PB <- B %*% solve(t(B) %*% B) %*% t(B)
diff --git a/CVE/R/directions.R b/CVE/R/directions.R
index ba648e5..23f1b8b 100644
--- a/CVE/R/directions.R
+++ b/CVE/R/directions.R
@@ -1,15 +1,17 @@
 #' @export
-directions <- function(dr, k) {
+directions <- function(object, k, ...) {
     UseMethod("directions")
 }
 
 #' Computes projected training data \code{X} for given dimension `k`.
 #'
-#' Projects the dimensional design matrix \eqn{X} on the columnspace of the
-#' cve-estimate for given dimension \eqn{k}.
+#' Returns \eqn{B'X}. That is the dimensional design matrix \eqn{X} on the
+#' columnspace of the cve-estimate for given dimension \eqn{k}.
 #'
-#' @param dr Instance of \code{'cve'} as returned by \code{\link{cve}}.
+#' @param object an object of class \code{"cve"}, usually, a result of a call to
+#' \code{\link{cve}} or \code{\link{cve.call}}.
 #' @param k SDR dimension to use for projection.
+#' @param ... ignored (no additional arguments).
 #'
 #' @return the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
 #' is the cve-estimate for dimension \eqn{k}.
@@ -32,12 +34,14 @@ directions <- function(dr, k) {
 #' # plot y against projected data
 #' plot(x.proj, y)
 #'
+#' @seealso \code{\link{cve}}
+#'
 #' @method directions cve
 #' @aliases directions directions.cve
 #' @export
-directions.cve <- function(dr, k) {
-    if (!(k %in% names(dr$res))) {
+directions.cve <- function(object, k, ...) {
+    if (!(k %in% names(object$res))) {
         stop("SDR directions for requested dimension `k` not computed.")
     }
-    return(dr$X %*% dr$res[[as.character(k)]]$B)
+    return(object$X %*% object$res[[as.character(k)]]$B)
 }
diff --git a/CVE/R/estimateBandwidth.R b/CVE/R/estimateBandwidth.R
index bc671c9..d0a3326 100644
--- a/CVE/R/estimateBandwidth.R
+++ b/CVE/R/estimateBandwidth.R
@@ -7,14 +7,14 @@
 #' h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
 #' Alternative version 2 is used for dimension prediction which is given by
 #' \deqn{%
-#' h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
+#' h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
 #' h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
-#' with \eqn{n} the sample size, \eqn{p} its dimension and the
-#' covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample
-#' covariance estimate.
+#' with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
+#' \eqn{\Sigma} is \eqn{(n - 1) / n} times the sample covariance matrix of
+#' \eqn{X}.
 #'
-#' @param X a \eqn{n\times p}{n x p} matrix with samples in its rows.
-#' @param k Dimension of lower dimensional projection.
+#' @param X the \eqn{n\times p}{n x p} matrix of predictor values.
+#' @param k the SDR dimension.
 #' @param nObs number of points in a slice, only for version 2.
 #' @param version either \code{1} or \code{2}.
 #'
diff --git a/CVE/R/plot.R b/CVE/R/plot.R
index df3a43c..087d4b6 100644
--- a/CVE/R/plot.R
+++ b/CVE/R/plot.R
@@ -1,13 +1,16 @@
-#' Loss distribution elbow plot.
+#' Elbow plot of the loss function.
 #'
 #' Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
 #' \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
-#' to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of
-#' \eqn{L_n(V)}, for further details see the paper.
+#' to \eqn{L_n(V, X_i)} where \eqn{V} is a Stiefel manifold element as
+#' minimizer of
+#' \eqn{L_n(V)}, for further details see Fertl, L. and Bura, E. (2019).
 #'
-#' @param x Object of class \code{"cve"} (result of [\code{\link{cve}}]).
+#' @param x an object of class \code{"cve"}, usually, a result of a call to
+#' \code{\link{cve}} or \code{\link{cve.call}}.
 #' @param ... Pass through parameters to [\code{\link{plot}}] and
 #' [\code{\link{lines}}]
+#'
 #' @examples
 #' # create B for simulation
 #' B <- cbind(rep(1, 6), (-1)^seq(6)) / sqrt(6)
@@ -34,7 +37,7 @@
 #' # elbow plot
 #' plot(cve.obj.simple)
 #'
-#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
 #' Estimation for Sufficient Dimension Reduction. Working Paper.
 #'
 #' @seealso see \code{\link{par}} for graphical parameters to pass through
diff --git a/CVE/R/predict.R b/CVE/R/predict.R
index 703395f..10f6049 100644
--- a/CVE/R/predict.R
+++ b/CVE/R/predict.R
@@ -1,15 +1,15 @@
 #' Predict method for CVE Fits.
 #'
-#' Predict response using projected data where the forward model \eqn{g(B'X)}
-#' is estimated using \code{\link{mars}}.
+#' Predict response using projected data \eqn{B'C} by fitting
+#' \eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
 #'
-#' @param object instance of class \code{cve} (result of \code{cve},
-#' \code{cve.call}).
-#' @param newdata Matrix of the new data to be predicted.
-#' @param dim dimension of SDR space to be used for data projecition.
+#' @param object an object of class \code{"cve"}, usually, a result of a call to
+#' \code{\link{cve}} or \code{\link{cve.call}}.
+#' @param newdata Matrix of new predictor values, \eqn{C}.
+#' @param k dimension of SDR space to be used for data projection.
 #' @param ... further arguments passed to \code{\link{mars}}.
 #'
-#' @return prediced response of data \code{newdata}.
+#' @return predicted response at \code{newdata}.
 #'
 #' @examples
 #' # create B for simulation
@@ -44,11 +44,11 @@
 #' @importFrom mda mars
 #' @method predict cve
 #' @export
-predict.cve <- function(object, newdata, dim, ...) {
+predict.cve <- function(object, newdata, k, ...) {
     if (missing(newdata)) {
         stop("No data supplied.")
     }
-    if (missing(dim)) {
+    if (missing(k)) {
         stop("No dimension supplied.")
     }
 
@@ -56,7 +56,7 @@ predict.cve <- function(object, newdata, dim, ...) {
         newdata <- matrix(newdata, nrow = 1L)
     }
 
-    B <- object$res[[as.character(dim)]]$B
+    B <- object$res[[as.character(k)]]$B
     model <- mda::mars(object$X %*% B, object$Y)
 
     predict(model, newdata %*% B)
diff --git a/CVE/R/predict_dim.R b/CVE/R/predict_dim.R
index e127e97..3c8e4fa 100644
--- a/CVE/R/predict_dim.R
+++ b/CVE/R/predict_dim.R
@@ -126,19 +126,18 @@ predict_dim_wilcoxon <- function(object, p.value = 0.05) {
 #'
 #' This function estimates the dimension of the mean dimension reduction space,
 #' i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
-#' performs cross-validation using \code{mars}. Given
+#' performs l.o.o cross-validation using \code{mars}. Given
 #' \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
-#' performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where
-#' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given
-#' \eqn{k}. The estimated SDR dimension is the \eqn{k} where the
-#' cross-validation mean squared error is the lowest. The method \code{'elbow'}
-#' estimates the dimension via \eqn{k = argmin_k L_n(V_{p − k})} where
-#' \eqn{V_{p − k}} is the CVE estimate of the orthogonal columnspace of
-#' \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds
-#' the minimum using the wilcoxon-test.
+#' performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
+#' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
+#' estimated SDR dimension is the \eqn{k} where the
+#' cross-validation mean squared error is minimal. The method \code{'elbow'}
+#' estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
+#' \eqn{V_{p - k}} is space that is orthogonal to the columns-space of the CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
+#' but finds the minimum using the wilcoxon-test.
 #'
-#' @param object instance of class \code{cve} (result of \code{\link{cve}},
-#' \code{\link{cve.call}}).
+#' @param object an object of class \code{"cve"}, usually, a result of a call to
+#' \code{\link{cve}} or \code{\link{cve.call}}.
 #' @param method This parameter specify which method will be used in dimension
 #' estimation. It provides three methods \code{'CV'} (default), \code{'elbow'},
 #' and \code{'wilcoxon'} to estimate the dimension of the SDR.
diff --git a/CVE/man/CVE-package.Rd b/CVE/man/CVE-package.Rd
index 6ee7bb4..5a5bedb 100644
--- a/CVE/man/CVE-package.Rd
+++ b/CVE/man/CVE-package.Rd
@@ -8,16 +8,13 @@
 \description{
 Conditional Variance Estimation (CVE) is a novel sufficient dimension
 reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
-where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
+where \eqn{B'X} is a lower dimensional projection of the predictors and
+\eqn{Y} is a univariate response. CVE,
 similarly to its main competitor, the mean average variance estimation
 (MAVE), is not based on inverse regression, and does not require the
 restrictive linearity and constant variance conditions of moment based SDR
 methods. CVE is data-driven and applies to additive error regressions with
-continuous predictors and link function. The effectiveness and accuracy of
-CVE compared to MAVE and other SDR techniques is demonstrated in simulation
-studies. CVE is shown to outperform MAVE in some model set-ups, while it
-remains largely on par under most others.
-Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
+continuous predictors and link function. Let \eqn{X} be a real
 \eqn{p}-dimensional covariate vector. We assume that the dependence of
 \eqn{Y} and \eqn{X} is modelled by
 \deqn{Y = g(B'X) + \epsilon}
@@ -26,11 +23,11 @@ variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
 zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
 is an unknown, continuous non-constant function,
 and \eqn{B = (b_1, ..., b_k)} is
-a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
+a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
 Without loss of generality \eqn{B} is assumed to be orthonormal.
 }
 \references{
-Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+Fertl, L. and Bura, E. (2019), Conditional Variance
 Estimation for Sufficient Dimension Reduction. Working Paper.
 }
 \author{
diff --git a/CVE/man/coef.cve.Rd b/CVE/man/coef.cve.Rd
index c3c4cff..f02f9e8 100644
--- a/CVE/man/coef.cve.Rd
+++ b/CVE/man/coef.cve.Rd
@@ -2,24 +2,24 @@
 % Please edit documentation in R/coef.R
 \name{coef.cve}
 \alias{coef.cve}
-\title{Gets estimated SDR basis.}
+\title{Extracts estimated SDR basis.}
 \usage{
 \method{coef}{cve}(object, k, ...)
 }
 \arguments{
-\item{object}{instance of \code{cve} as output from \code{\link{cve}} or
-\code{\link{cve.call}}.}
+\item{object}{an object of class \code{"cve"}, usually, a result of a call to
+\code{\link{cve}} or \code{\link{cve.call}}.}
 
 \item{k}{the SDR dimension.}
 
-\item{...}{ignored.}
+\item{...}{ignored (no additional arguments).}
 }
 \value{
-dir the matrix of CS or CMS of given dimension
+The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
 }
 \description{
 Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
-cve-estimate with dimension \eqn{p\times k}{p x k}.
+cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
 }
 \examples{
 # set dimensions for simulation model
@@ -30,7 +30,7 @@ n <- 200 # samplesize
 b1 <- rep(1 / sqrt(p), p)
 b2 <- (-1)^seq(1, p) / sqrt(p)
 B <- cbind(b1, b2)
-
+
 set.seed(21)
 # creat predictor data x ~ N(0, I_p)
 x <- matrix(rnorm(n * p), n, p)
@@ -42,7 +42,7 @@ y <- (x \%*\% b1)^2 + 2 * (x \%*\% b2) + 0.25 * rnorm(100)
 cve.obj <- cve(y ~ x, max.dim = 5)
 # get cve-estimate for B with dimensions (p, k = 2)
 B2 <- coef(cve.obj, k = 2)
-
+
 # Projection matrix on span(B)
 # equivalent to `B \%*\% t(B)` since B is semi-orthonormal
 PB <- B \%*\% solve(t(B) \%*\% B) \%*\% t(B)
diff --git a/CVE/man/cve.Rd b/CVE/man/cve.Rd
index 94f500c..1b4fd42 100644
--- a/CVE/man/cve.Rd
+++ b/CVE/man/cve.Rd
@@ -20,14 +20,42 @@ the environment from which \code{cve} is called.}
 \item{method}{This character string specifies the method of fitting. The
 options are
 \itemize{
- \item "simple" implementation as described in the paper.
+ \item "simple" implementation,
  \item "weighted" variation with adaptive weighting of slices.
 }
-see paper.}
+see Fertl, L. and Bura, E. (2019).}
 
 \item{max.dim}{upper bounds for \code{k}, (ignored if \code{k} is supplied).}
 
-\item{...}{optional parameters passed on to \code{cve.call}.}
+\item{...}{optional parameters passed on to \code{cve.call}.
+
+
+Conditional Variance Estimation (CVE) is a sufficient dimension reduction
+(SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
+expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
+function provides methods for estimating the dimension and the subspace
+spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
+rank \eqn{k} such that
+\deqn{%
+ E(Y|X) = E(Y|B'X) %
+}
+or, equivalently,
+\deqn{%
+ Y = g(B'X) + \epsilon %
+}
+where \eqn{X} is independent of \eqn{\epsilon} with positive definite
+variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
+zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
+is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
+is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
+
+Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
+CVE method makes very few assumptions.
+
+A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
+of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
+The primary output from this method is a set of orthonormal vectors,
+\eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.}
 }
 \value{
 an S3 object of class \code{cve} with components:
@@ -56,28 +84,10 @@ an S3 object of class \code{cve} with components:
 }
 }
 \description{
-Conditional Variance Estimation (CVE) is a novel sufficient dimension
-reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
-where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
-similarly to its main competitor, the mean average variance estimation
-(MAVE), is not based on inverse regression, and does not require the
-restrictive linearity and constant variance conditions of moment based SDR
-methods. CVE is data-driven and applies to additive error regressions with
-continuous predictors and link function. The effectiveness and accuracy of
-CVE compared to MAVE and other SDR techniques is demonstrated in simulation
-studies. CVE is shown to outperform MAVE in some model set-ups, while it
-remains largely on par under most others.
-Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
-\eqn{p}-dimensional covariate vector. We assume that the dependence of
-\eqn{Y} and \eqn{X} is modelled by
-\deqn{Y = g(B'X) + \epsilon}
-where \eqn{X} is independent of \eqn{\epsilon} with positive definite
-variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
-zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
-is an unknown, continuous non-constant function,
-and \eqn{B = (b_1, ..., b_k)} is
-a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
-Without loss of generality \eqn{B} is assumed to be orthonormal.
+This is the main function in the \code{CVE} package. It creates objects of
+class \code{"cve"} to estimate the mean subspace. Helper functions that
+require a \code{"cve"} object can then be applied to the output from this
+function.
 }
 \examples{
 # set dimensions for simulation model
@@ -131,7 +141,7 @@ norm(PB - PB.w, type = 'F')
 
 }
 \references{
-Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+Fertl, L. and Bura, E. (2019), Conditional Variance
 Estimation for Sufficient Dimension Reduction. Working Paper.
 }
 \seealso{
diff --git a/CVE/man/cve.call.Rd b/CVE/man/cve.call.Rd
index f3bef08..5282136 100644
--- a/CVE/man/cve.call.Rd
+++ b/CVE/man/cve.call.Rd
@@ -10,9 +10,9 @@ cve.call(X, Y, method = "simple", nObs = sqrt(nrow(X)), h = NULL,
   max.iter = 50L, attempts = 10L, logger = NULL)
 }
 \arguments{
-\item{X}{Design matrix with dimension \eqn{n\times p}{n x p}.}
+\item{X}{Design predictor matrix.}
 
-\item{Y}{numeric array of length \eqn{n} of Responses.}
+\item{Y}{\eqn{n}-dimensional vector of responses.}
 
 \item{method}{specifies the CVE method variation as one of
 \itemize{
@@ -60,7 +60,7 @@ used as starting value in the optimization. (If supplied,
 out \code{attempts} times with starting values drawn from the invariant
 measure on the Stiefel manifold (see \code{\link{rStiefel}}).}
 
-\item{logger}{a logger function (only for advanced user, slows down the
+\item{logger}{a logger function (only for advanced users, slows down the
 computation).}
 }
 \value{
@@ -90,28 +90,10 @@ an S3 object of class \code{cve} with components:
 }
 }
 \description{
-Conditional Variance Estimation (CVE) is a novel sufficient dimension
-reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
-where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
-similarly to its main competitor, the mean average variance estimation
-(MAVE), is not based on inverse regression, and does not require the
-restrictive linearity and constant variance conditions of moment based SDR
-methods. CVE is data-driven and applies to additive error regressions with
-continuous predictors and link function. The effectiveness and accuracy of
-CVE compared to MAVE and other SDR techniques is demonstrated in simulation
-studies. CVE is shown to outperform MAVE in some model set-ups, while it
-remains largely on par under most others.
-Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
-\eqn{p}-dimensional covariate vector. We assume that the dependence of
-\eqn{Y} and \eqn{X} is modelled by
-\deqn{Y = g(B'X) + \epsilon}
-where \eqn{X} is independent of \eqn{\epsilon} with positive definite
-variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
-zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
-is an unknown, continuous non-constant function,
-and \eqn{B = (b_1, ..., b_k)} is
-a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
-Without loss of generality \eqn{B} is assumed to be orthonormal.
+This is the main function in the \code{CVE} package. It creates objects of
+class \code{"cve"} to estimate the mean subspace. Helper functions that
+require a \code{"cve"} object can then be applied to the output from this
+function.
 }
 \examples{
 # create B for simulation (k = 1)
diff --git a/CVE/man/directions.cve.Rd b/CVE/man/directions.cve.Rd
index acc8164..3d359f7 100644
--- a/CVE/man/directions.cve.Rd
+++ b/CVE/man/directions.cve.Rd
@@ -5,20 +5,23 @@
 \alias{directions}
 \title{Computes projected training data \code{X} for given dimension `k`.}
 \usage{
-\method{directions}{cve}(dr, k)
+\method{directions}{cve}(object, k, ...)
 }
 \arguments{
-\item{dr}{Instance of \code{'cve'} as returned by \code{\link{cve}}.}
+\item{object}{an object of class \code{"cve"}, usually, a result of a call to
+\code{\link{cve}} or \code{\link{cve.call}}.}
 
 \item{k}{SDR dimension to use for projection.}
+
+\item{...}{ignored (no additional arguments).}
 }
 \value{
 the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
 is the cve-estimate for dimension \eqn{k}.
 }
 \description{
-Projects the dimensional design matrix \eqn{X} on the columnspace of the
-cve-estimate for given dimension \eqn{k}.
+Returns \eqn{B'X}. That is the dimensional design matrix \eqn{X} on the
+columnspace of the cve-estimate for given dimension \eqn{k}.
 }
 \examples{
 # create B for simulation (k = 1)
@@ -39,3 +42,6 @@ x.proj <- directions(cve.obj.simple, k = 1)
 plot(x.proj, y)
 
 }
+\seealso{
+\code{\link{cve}}
+}
diff --git a/CVE/man/estimate.bandwidth.Rd b/CVE/man/estimate.bandwidth.Rd
index d7f6cbe..953335d 100644
--- a/CVE/man/estimate.bandwidth.Rd
+++ b/CVE/man/estimate.bandwidth.Rd
@@ -7,9 +7,9 @@
 estimate.bandwidth(X, k, nObs, version = 1L)
 }
 \arguments{
-\item{X}{a \eqn{n\times p}{n x p} matrix with samples in its rows.}
+\item{X}{the \eqn{n\times p}{n x p} matrix of predictor values.}
 
-\item{k}{Dimension of lower dimensional projection.}
+\item{k}{the SDR dimension.}
 
 \item{nObs}{number of points in a slice, only for version 2.}
 
@@ -26,11 +26,11 @@ defaults to using the following formula (version 1)
  h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
 Alternative version 2 is used for dimension prediction which is given by
 \deqn{%
- h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
+ h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
  h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
-with \eqn{n} the sample size, \eqn{p} its dimension and the
-covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample
-covariance estimate.
+with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
+\eqn{\Sigma} is \eqn{(n - 1) / n} times the sample covariance matrix of
+\eqn{X}.
 }
 \examples{
 # set dimensions for simulation model
diff --git a/CVE/man/plot.cve.Rd b/CVE/man/plot.cve.Rd
index 18593ee..ef456e8 100644
--- a/CVE/man/plot.cve.Rd
+++ b/CVE/man/plot.cve.Rd
@@ -2,12 +2,13 @@
 % Please edit documentation in R/plot.R
 \name{plot.cve}
 \alias{plot.cve}
-\title{Loss distribution elbow plot.}
+\title{Elbow plot of the loss function.}
 \usage{
 \method{plot}{cve}(x, ...)
 }
 \arguments{
-\item{x}{Object of class \code{"cve"} (result of [\code{\link{cve}}]).}
+\item{x}{an object of class \code{"cve"}, usually, a result of a call to
+\code{\link{cve}} or \code{\link{cve.call}}.}
 
 \item{...}{Pass through parameters to [\code{\link{plot}}] and
 [\code{\link{lines}}]}
@@ -15,8 +16,9 @@
 \description{
 Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
 \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
-to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of
-\eqn{L_n(V)}, for further details see the paper.
+to \eqn{L_n(V, X_i)} where \eqn{V} is a Stiefel manifold element as
+minimizer of
+\eqn{L_n(V)}, for further details see Fertl, L. and Bura, E. (2019).
 }
 \examples{
 # create B for simulation
@@ -46,7 +48,7 @@ plot(cve.obj.simple)
 
 }
 \references{
-Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
+Fertl, L. and Bura, E. (2019), Conditional Variance
 Estimation for Sufficient Dimension Reduction. Working Paper.
 }
 \seealso{
diff --git a/CVE/man/predict.cve.Rd b/CVE/man/predict.cve.Rd
index 95813e6..64759a1 100644
--- a/CVE/man/predict.cve.Rd
+++ b/CVE/man/predict.cve.Rd
@@ -4,24 +4,24 @@
 \alias{predict.cve}
 \title{Predict method for CVE Fits.}
 \usage{
-\method{predict}{cve}(object, newdata, dim, ...)
+\method{predict}{cve}(object, newdata, k, ...)
 }
 \arguments{
-\item{object}{instance of class \code{cve} (result of \code{cve},
-\code{cve.call}).}
+\item{object}{an object of class \code{"cve"}, usually, a result of a call to
+\code{\link{cve}} or \code{\link{cve.call}}.}
 
-\item{newdata}{Matrix of the new data to be predicted.}
+\item{newdata}{Matrix of new predictor values, \eqn{C}.}
 
-\item{dim}{dimension of SDR space to be used for data projecition.}
+\item{k}{dimension of SDR space to be used for data projection.}
 
 \item{...}{further arguments passed to \code{\link{mars}}.}
 }
 \value{
-prediced response of data \code{newdata}.
+predicted response at \code{newdata}.
 }
 \description{
-Predict response using projected data where the forward model \eqn{g(B'X)}
-is estimated using \code{\link{mars}}.
+Predict response using projected data \eqn{B'C} by fitting
+\eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
 }
 \examples{
 # create B for simulation
diff --git a/CVE/man/predict_dim.Rd b/CVE/man/predict_dim.Rd
index 45f4353..50475f6 100644
--- a/CVE/man/predict_dim.Rd
+++ b/CVE/man/predict_dim.Rd
@@ -7,8 +7,8 @@
 predict_dim(object, ..., method = "CV")
 }
 \arguments{
-\item{object}{instance of class \code{cve} (result of \code{\link{cve}},
-\code{\link{cve.call}}).}
+\item{object}{an object of class \code{"cve"}, usually, a result of a call to
+\code{\link{cve}} or \code{\link{cve.call}}.}
 
 \item{...}{ignored.}
 
@@ -26,16 +26,15 @@ list with
 \description{
 This function estimates the dimension of the mean dimension reduction space,
 i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
-performs cross-validation using \code{mars}. Given
+performs l.o.o cross-validation using \code{mars}. Given
 \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
-performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where
-\eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given
-\eqn{k}. The estimated SDR dimension is the \eqn{k} where the
-cross-validation mean squared error is the lowest. The method \code{'elbow'}
-estimates the dimension via \eqn{k = argmin_k L_n(V_{p − k})} where
-\eqn{V_{p − k}} is the CVE estimate of the orthogonal columnspace of
-\eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds
-the minimum using the wilcoxon-test.
+performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
+\eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
+estimated SDR dimension is the \eqn{k} where the
+cross-validation mean squared error is minimal. The method \code{'elbow'}
+estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
+\eqn{V_{p - k}} is space that is orthogonal to the columns-space of the CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
+but finds the minimum using the wilcoxon-test.
 }
 \examples{
 # create B for simulation