% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/CVE.R
\name{cve}
\alias{cve}
\title{Conditional Variance Estimator (CVE).}
\usage{
cve(formula, data, method = "mean", max.dim = 10L, ...)
}
\arguments{
\item{formula}{an object of class \code{"formula"} which is a symbolic
description of the model to be fitted like \eqn{Y\sim X}{Y ~ X} where
\eqn{Y} is an \eqn{n}-dimensional vector of the response variable and
\eqn{X} is an \eqn{n\times p}{n x p} matrix of the predictors.}

\item{data}{an optional data frame containing the data for the formula, if
supplied like \code{data <- data.frame(Y, X)} with dimension
\eqn{n \times (p + 1)}{n x (p + 1)}. By default the variables are taken from
the environment from which \code{cve} is called.}

\item{method}{This character string specifies the method of fitting. The
options are
\itemize{
  \item \code{"mean"} method to estimate the mean subspace, see [1].
  \item \code{"central"} ensemble method to estimate the central
    subspace, see [2].
  \item \code{"weighted.mean"} variation of \code{"mean"} method with
    adaptive weighting of slices, see [1].
  \item \code{"weighted.central"} variation of \code{"central"} method with
    adaptive weighting of slices, see [2].
}}

\item{max.dim}{upper bound for \code{k} (ignored if \code{k} is supplied).}

\item{...}{optional parameters passed on to \code{\link{cve.call}}.}
}
\value{
an S3 object of class \code{cve} with components:
\describe{
  \item{X}{design matrix of the predictors used for calculating the
    cve-estimate,}
  \item{Y}{\eqn{n}-dimensional vector of responses used for calculating the
    cve-estimate,}
  \item{method}{name of the method used,}
  \item{call}{the matched call,}
  \item{res}{list of components \code{V, L, B, loss, h} for each
    \code{k = min.dim, ..., max.dim}. If \code{k} was supplied in the call
    \code{min.dim = max.dim = k}.
    \itemize{
      \item \code{B} is the cve-estimate with dimension
        \eqn{p\times k}{p x k}.
      \item \code{V} is the orthogonal complement of \eqn{B}.
      \item \code{L} is the loss for each sample separately such that
        its mean is \code{loss}.
      \item \code{loss} is the value of the target function that is
        minimized, evaluated at \eqn{V}.
      \item \code{h} bandwidth parameter used to calculate
        \code{B, V, loss, L}.
    }
  }
}
}
\description{
This is the main function in the \code{CVE} package. It creates objects of
class \code{"cve"} to estimate the mean subspace. Helper functions that
require a \code{"cve"} object can then be applied to the output from this
function.

Conditional Variance Estimation (CVE) is a sufficient dimension reduction
(SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
function provides methods for estimating the dimension and the subspace
spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
rank \eqn{k} such that

\deqn{E(Y|X) = E(Y|B'X)}

or, equivalently,

\deqn{Y = g(B'X) + \epsilon}

where \eqn{X} has a positive definite variance-covariance matrix
\eqn{Var(X) = \Sigma_X} and is independent of \eqn{\epsilon},
\eqn{\epsilon} is a mean zero random variable with finite
\eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g} is an unknown, continuous
non-constant function, and \eqn{B = (b_1,..., b_k)} is a real
\eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.

Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
CVE method makes very few assumptions.

A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
of \eqn{\hat{B}}{Bhat} is close to the mean subspace \eqn{span(B)}. The
primary output from this method is a set of orthonormal vectors,
\eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.

The \code{"central"} method implements the Ensemble Conditional Variance
Estimation (ECVE) as described in [2]. It augments the CVE method by
applying an ensemble of functions (parameter \code{func_list}) to the
response to estimate the central subspace.
This corresponds to the generalization

\deqn{F(Y|X) = F(Y|B'X)}

or, equivalently,

\deqn{Y = g(B'X, \epsilon)}

where \eqn{F} is the conditional cumulative distribution function.
}
\examples{
# set dimensions for simulation model
p <- 5
k <- 2
# create B for simulation
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
# sample size
n <- 100
set.seed(21)

# create predictor data x ~ N(0, I_p)
x <- matrix(rnorm(n * p), n, p)
# simulate response variable
#     y = f(B'x) + err
# with f(x1, x2) = x1^2 + 2 * x2 and err ~ N(0, 0.25^2)
y <- (x \%*\% b1)^2 + 2 * (x \%*\% b2) + 0.25 * rnorm(n)

# calculate cve with method 'mean' for k unknown in 1, ..., max.dim = 2
cve.obj.s <- cve(y ~ x, max.dim = 2) # default method 'mean'
# calculate cve with method 'weighted.mean' for k = 2
cve.obj.w <- cve(y ~ x, k = 2, method = 'weighted.mean')
B2 <- coef(cve.obj.s, k = 2)

# get projected X data (same as cve.obj.s$X \%*\% B2)
proj.X <- directions(cve.obj.s, k = 2)
# plot y against projected data
plot(proj.X[, 1], y)
plot(proj.X[, 2], y)

# create 10 new x points and y according to model
x.new <- matrix(rnorm(10 * p), 10, p)
y.new <- (x.new \%*\% b1)^2 + 2 * (x.new \%*\% b2) + 0.25 * rnorm(10)
# predict y.new
yhat <- predict(cve.obj.s, x.new, 2)
plot(y.new, yhat)

# projection matrix on span(B)
# (general formula, since b1 and b2 are not orthogonal for odd p)
PB <- B \%*\% solve(t(B) \%*\% B) \%*\% t(B)
# cve estimates for B with mean and weighted method
B.s <- coef(cve.obj.s, k = 2)
B.w <- coef(cve.obj.w, k = 2)
# same as B.s \%*\% t(B.s) since B.s is semi-orthogonal (same for B.w)
PB.s <- B.s \%*\% solve(t(B.s) \%*\% B.s) \%*\% t(B.s)
PB.w <- B.w \%*\% solve(t(B.w) \%*\% B.w) \%*\% t(B.w)
# compare estimation accuracy of mean and weighted cve estimate by
# Frobenius norm of difference of projections.
norm(PB - PB.s, type = 'F')
norm(PB - PB.w, type = 'F')
}
\references{
[1] Fertl, L. and Bura, E. (2021) "Conditional Variance
     Estimation for Sufficient Dimension Reduction"

[2] Fertl, L. and Bura, E.
(2021) "Ensemble Conditional Variance
     Estimation for Sufficient Dimension Reduction"
}
\seealso{
For a detailed description of \code{formula} see
  \code{\link{formula}}.
}