
- wip: reviewing and fixing doc.

Daniel Kapla 2019-12-20 09:40:46 +01:00
parent b71898a5bc
commit 5d3a0ca18d
17 changed files with 196 additions and 166 deletions

@ -2,7 +2,7 @@ Package: CVE
Type: Package Type: Package
Title: Conditional Variance Estimator for Sufficient Dimension Reduction Title: Conditional Variance Estimator for Sufficient Dimension Reduction
Version: 0.2 Version: 0.2
Date: 2019-11-13 Date: 2019-12-20
Author: Daniel Kapla <daniel@kapla.at>, Lukas Fertl <lukas.fertl@chello.at> Author: Daniel Kapla <daniel@kapla.at>, Lukas Fertl <lukas.fertl@chello.at>
Maintainer: Daniel Kapla <daniel@kapla.at> Maintainer: Daniel Kapla <daniel@kapla.at>
Description: Implementation of the Conditional Variance Estimation (CVE) method. Description: Implementation of the Conditional Variance Estimation (CVE) method.

@ -2,16 +2,13 @@
#' #'
#' Conditional Variance Estimation (CVE) is a novel sufficient dimension #' Conditional Variance Estimation (CVE) is a novel sufficient dimension
#' reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)}, #' reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
#' where \eqn{B'X} is a lower dimensional projection of the predictors. CVE, #' where \eqn{B'X} is a lower dimensional projection of the predictors and
#' \eqn{Y} is a univariate responce. CVE,
#' similarly to its main competitor, the mean average variance estimation #' similarly to its main competitor, the mean average variance estimation
#' (MAVE), is not based on inverse regression, and does not require the #' (MAVE), is not based on inverse regression, and does not require the
#' restrictive linearity and constant variance conditions of moment based SDR #' restrictive linearity and constant variance conditions of moment based SDR
#' methods. CVE is data-driven and applies to additive error regressions with #' methods. CVE is data-driven and applies to additive error regressions with
#' continuous predictors and link function. The effectiveness and accuracy of #' continuous predictors and link function. Let \eqn{X} be a real
#' CVE compared to MAVE and other SDR techniques is demonstrated in simulation
#' studies. CVE is shown to outperform MAVE in some model set-ups, while it
#' remains largely on par under most others.
#' Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
#' \eqn{p}-dimensional covariate vector. We assume that the dependence of #' \eqn{p}-dimensional covariate vector. We assume that the dependence of
#' \eqn{Y} and \eqn{X} is modelled by #' \eqn{Y} and \eqn{X} is modelled by
#' \deqn{Y = g(B'X) + \epsilon} #' \deqn{Y = g(B'X) + \epsilon}
@ -20,11 +17,11 @@
#' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g} #' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
#' is an unknown, continuous non-constant function, #' is an unknown, continuous non-constant function,
#' and \eqn{B = (b_1, ..., b_k)} is #' and \eqn{B = (b_1, ..., b_k)} is
#' a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}. #' a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
#' Without loss of generality \eqn{B} is assumed to be orthonormal. #' Without loss of generality \eqn{B} is assumed to be orthonormal.
#' #'
#' @author Daniel Kapla, Lukas Fertl, Bura Efstathia #' @author Daniel Kapla, Lukas Fertl, Bura Efstathia
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance #' @references Fertl, L. and Bura, E. (2019), Conditional Variance
#' Estimation for Sufficient Dimension Reduction. Working Paper. #' Estimation for Sufficient Dimension Reduction. Working Paper.
#' #'
#' @docType package #' @docType package
@ -33,7 +30,10 @@
#' Conditional Variance Estimator (CVE). #' Conditional Variance Estimator (CVE).
#' #'
#' @inherit CVE-package description #' This is the main function in the \code{CVE} package. It creates objects of
#' class \code{"cve"} to estimate the mean subspace. Helper functions that
#' require a \code{"cve"} object can then be applied to the output from this
#' function.
#' #'
#' @param formula an object of class \code{"formula"} which is a symbolic #' @param formula an object of class \code{"formula"} which is a symbolic
#' description of the model to be fitted like \eqn{Y\sim X}{Y ~ X} where #' description of the model to be fitted like \eqn{Y\sim X}{Y ~ X} where
@ -46,13 +46,41 @@
#' @param method This character string specifies the method of fitting. The #' @param method This character string specifies the method of fitting. The
#' options are #' options are
#' \itemize{ #' \itemize{
#' \item "simple" implementation as described in the paper. #' \item "simple" implementation,
#' \item "weighted" variation with adaptive weighting of slices. #' \item "weighted" variation with adaptive weighting of slices.
#' } #' }
#' see paper. #' see Fertl, L. and Bura, E. (2019).
#' @param max.dim upper bounds for \code{k}, (ignored if \code{k} is supplied). #' @param max.dim upper bounds for \code{k}, (ignored if \code{k} is supplied).
#' @param ... optional parameters passed on to \code{cve.call}. #' @param ... optional parameters passed on to \code{cve.call}.
#' #'
#'
#' Conditional Variance Estimation (CVE) is a sufficient dimension reduction
#' (SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
#' expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
#' function provides methods for estimating the dimension and the subspace
#' spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
#' rank \eqn{k} such that
#' \deqn{%
#' E(Y|X) = E(Y|B'X) %
#' }
#' or, equivalently,
#' \deqn{%
#' Y = g(B'X) + \epsilon %
#' }
#' where \eqn{X} is independent of \eqn{\epsilon} with positive definite
#' variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
#' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
#' is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
#' is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
#'
#' Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
#' CVE method makes very few assumptions.
#'
#' A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
#' of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
#' The primary output from this method is a set of orthonormal vectors,
#' \eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.
#'
#' @return an S3 object of class \code{cve} with components: #' @return an S3 object of class \code{cve} with components:
#' \describe{ #' \describe{
#' \item{X}{design matrix of predictor vector used for calculating #' \item{X}{design matrix of predictor vector used for calculating
@ -130,7 +158,7 @@
#' #'
#' @seealso For a detailed description of \code{formula} see #' @seealso For a detailed description of \code{formula} see
#' \code{\link{formula}}. #' \code{\link{formula}}.
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance #' @references Fertl, L. and Bura, E. (2019), Conditional Variance
#' Estimation for Sufficient Dimension Reduction. Working Paper. #' Estimation for Sufficient Dimension Reduction. Working Paper.
#' #'
#' @importFrom stats model.frame #' @importFrom stats model.frame
@ -159,8 +187,8 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
#' @inherit cve title #' @inherit cve title
#' @inherit cve description #' @inherit cve description
#' #'
#' @param X Design matrix with dimension \eqn{n\times p}{n x p}. #' @param X Design predictor matrix.
#' @param Y numeric array of length \eqn{n} of Responses. #' @param Y \eqn{n}-dimensional vector of responces.
#' @param h bandwidth or function to estimate bandwidth, defaults to internaly #' @param h bandwidth or function to estimate bandwidth, defaults to internaly
#' estimated bandwidth. #' estimated bandwidth.
#' @param nObs parameter for choosing bandwidth \code{h} using #' @param nObs parameter for choosing bandwidth \code{h} using
@ -193,7 +221,7 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
#' @param V.init Semi-orthogonal matrix of dimensions `(ncol(X), ncol(X) - k) #' @param V.init Semi-orthogonal matrix of dimensions `(ncol(X), ncol(X) - k)
#' used as starting value in the optimization. (If supplied, #' used as starting value in the optimization. (If supplied,
#' \code{attempts} is set to 0 and \code{k} to match dimension). #' \code{attempts} is set to 0 and \code{k} to match dimension).
#' @param logger a logger function (only for advanced user, slows down the #' @param logger a logger function (only for advanced users, slows down the
#' computation). #' computation).
#' #'
#' @inherit cve return #' @inherit cve return
@ -209,11 +237,11 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
#' # Y = f(B'X) + err #' # Y = f(B'X) + err
#' # with f(x1) = x1 and err ~ N(0, 0.25^2) #' # with f(x1) = x1 and err ~ N(0, 0.25^2)
#' Y <- X %*% B + 0.25 * rnorm(100) #' Y <- X %*% B + 0.25 * rnorm(100)
#' #'
#' # calculate cve with method 'simple' for k = 1 #' # calculate cve with method 'simple' for k = 1
#' set.seed(21) #' set.seed(21)
#' cve.obj.simple1 <- cve(Y ~ X, k = 1) #' cve.obj.simple1 <- cve(Y ~ X, k = 1)
#' #'
#' # same as #' # same as
#' set.seed(21) #' set.seed(21)
#' cve.obj.simple2 <- cve.call(X, Y, k = 1) #' cve.obj.simple2 <- cve.call(X, Y, k = 1)

@ -1,14 +1,14 @@
#' Gets estimated SDR basis. #' Extracts estimated SDR basis.
#' #'
#' Returns the SDR basis matrix for dimension \code{k}, i.e. returns the #' Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
#' cve-estimate with dimension \eqn{p\times k}{p x k}. #' cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
#' #'
#' @param object instance of \code{cve} as output from \code{\link{cve}} or #' @param object an object of class \code{"cve"}, usually, a result of a call to
#' \code{\link{cve.call}}. #' \code{\link{cve}} or \code{\link{cve.call}}.
#' @param k the SDR dimension. #' @param k the SDR dimension.
#' @param ... ignored. #' @param ... ignored (no additional arguments).
#' #'
#' @return dir the matrix of CS or CMS of given dimension #' @return The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
#' #'
#' @examples #' @examples
#' # set dimensions for simulation model #' # set dimensions for simulation model
@ -19,7 +19,7 @@
#' b1 <- rep(1 / sqrt(p), p) #' b1 <- rep(1 / sqrt(p), p)
#' b2 <- (-1)^seq(1, p) / sqrt(p) #' b2 <- (-1)^seq(1, p) / sqrt(p)
#' B <- cbind(b1, b2) #' B <- cbind(b1, b2)
#' #'
#' set.seed(21) #' set.seed(21)
#' # creat predictor data x ~ N(0, I_p) #' # creat predictor data x ~ N(0, I_p)
#' x <- matrix(rnorm(n * p), n, p) #' x <- matrix(rnorm(n * p), n, p)
@ -31,7 +31,7 @@
#' cve.obj <- cve(y ~ x, max.dim = 5) #' cve.obj <- cve(y ~ x, max.dim = 5)
#' # get cve-estimate for B with dimensions (p, k = 2) #' # get cve-estimate for B with dimensions (p, k = 2)
#' B2 <- coef(cve.obj, k = 2) #' B2 <- coef(cve.obj, k = 2)
#' #'
#' # Projection matrix on span(B) #' # Projection matrix on span(B)
#' # equivalent to `B %*% t(B)` since B is semi-orthonormal #' # equivalent to `B %*% t(B)` since B is semi-orthonormal
#' PB <- B %*% solve(t(B) %*% B) %*% t(B) #' PB <- B %*% solve(t(B) %*% B) %*% t(B)
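The comment in the example above states that the projection matrix on span(B) reduces to `B %*% t(B)` when B is semi-orthonormal. A minimal base-R sketch of that identity follows; it calls no CVE functions, and the variable names `PB_general` and `PB_short` are illustrative only.

```r
# B with two orthonormal columns (unit length, mutually orthogonal).
B <- cbind(rep(1, 6), (-1)^seq(6)) / sqrt(6)

# General projection matrix onto span(B).
PB_general <- B %*% solve(t(B) %*% B) %*% t(B)

# Shortcut, valid only because t(B) %*% B is the identity matrix.
PB_short <- B %*% t(B)

# Both checks pass silently when the identity holds.
stopifnot(isTRUE(all.equal(t(B) %*% B, diag(2))))
stopifnot(isTRUE(all.equal(PB_general, PB_short)))
```

If the columns of B were not orthonormal, only the general form would give the projection.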

@ -1,15 +1,17 @@
#' @export #' @export
directions <- function(dr, k) { directions <- function(object, k, ...) {
UseMethod("directions") UseMethod("directions")
} }
#' Computes projected training data \code{X} for given dimension `k`. #' Computes projected training data \code{X} for given dimension `k`.
#' #'
#' Projects the dimensional design matrix \eqn{X} on the columnspace of the #' Returns \eqn{B'X}. That is the dimensional design matrix \eqn{X} on the
#' cve-estimate for given dimension \eqn{k}. #' columnspace of the cve-estimate for given dimension \eqn{k}.
#' #'
#' @param dr Instance of \code{'cve'} as returned by \code{\link{cve}}. #' @param object an object of class \code{"cve"}, usually, a result of a call to
#' \code{\link{cve}} or \code{\link{cve.call}}.
#' @param k SDR dimension to use for projection. #' @param k SDR dimension to use for projection.
#' @param ... ignored (no additional arguments).
#' #'
#' @return the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B} #' @return the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
#' is the cve-estimate for dimension \eqn{k}. #' is the cve-estimate for dimension \eqn{k}.
@ -32,12 +34,14 @@ directions <- function(dr, k) {
#' # plot y against projected data #' # plot y against projected data
#' plot(x.proj, y) #' plot(x.proj, y)
#' #'
#' @seealso \code{\link{cve}}
#'
#' @method directions cve #' @method directions cve
#' @aliases directions directions.cve #' @aliases directions directions.cve
#' @export #' @export
directions.cve <- function(dr, k) { directions.cve <- function(object, k, ...) {
if (!(k %in% names(dr$res))) { if (!(k %in% names(object$res))) {
stop("SDR directions for requested dimension `k` not computed.") stop("SDR directions for requested dimension `k` not computed.")
} }
return(dr$X %*% dr$res[[as.character(k)]]$B) return(object$X %*% object$res[[as.character(k)]]$B)
} }
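To illustrate the renamed signature, the sketch below repeats the new generic and method from this hunk and applies them to a hand-built list standing in for a fitted object. The stand-in `fake_fit` and its contents are assumptions made for illustration; a real object would come from `cve` or `cve.call` and carries more components.

```r
# Generic and method as in the new side of the hunk above.
directions <- function(object, k, ...) {
    UseMethod("directions")
}
directions.cve <- function(object, k, ...) {
    if (!(k %in% names(object$res))) {
        stop("SDR directions for requested dimension `k` not computed.")
    }
    return(object$X %*% object$res[[as.character(k)]]$B)
}

# Hypothetical stand-in for a fitted object: only the fields the method reads.
X <- matrix(1:6, nrow = 3)           # 3 x 2 predictor matrix
B1 <- matrix(c(1, 0), ncol = 1)      # 2 x 1 basis picking the first column
fake_fit <- structure(
    list(X = X, res = list("1" = list(B = B1))),
    class = "cve"
)

# Projected data X B, a 3 x 1 matrix equal to the first column of X.
x_proj <- directions(fake_fit, k = 1)
```

Requesting a dimension that is not stored in `res`, such as `k = 2` here, triggers the `stop` branch.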

@ -7,14 +7,14 @@
#' h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2} #' h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
#' Alternative version 2 is used for dimension prediction which is given by #' Alternative version 2 is used for dimension prediction which is given by
#' \deqn{% #' \deqn{%
#' h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{% #' h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
#' h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))} #' h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
#' with \eqn{n} the sample size, \eqn{p} its dimension and the #' with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
#' covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample #' \eqn{\Sigma} is \eqn{(n - 1) / n} times the sample covariance matrix of
#' covariance estimate. #' \eqn{X}.
#' #'
#' @param X a \eqn{n\times p}{n x p} matrix with samples in its rows. #' @param X the \eqn{n\times p}{n x p} matrix of predictor values.
#' @param k Dimension of lower dimensional projection. #' @param k the SDR dimension.
#' @param nObs number of points in a slice, only for version 2. #' @param nObs number of points in a slice, only for version 2.
#' @param version either \code{1} or \code{2}. #' @param version either \code{1} or \code{2}.
#' #'
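The two bandwidth formulas documented above can be sketched in base R as follows. This is not the package's implementation: the function name `bandwidth_sketch` is hypothetical, and reading \chi_k^{-1} as the chi-squared quantile function with k degrees of freedom (`qchisq`) is an assumption.

```r
# Hypothetical helper following the documented formulas; not the package code.
bandwidth_sketch <- function(X, k, nObs = sqrt(nrow(X)), version = 1L) {
    n <- nrow(X)
    p <- ncol(X)
    # Sigma is (n - 1) / n times the sample covariance matrix of X.
    X_centered <- scale(X, center = TRUE, scale = FALSE)
    Sigma <- crossprod(X_centered) / n
    tr_Sigma <- sum(diag(Sigma))
    if (version == 1L) {
        (2 * tr_Sigma / p) * (1.2 * n^(-1 / (4 + k)))^2
    } else {
        # Assumption: chi_k^{-1} is the chi-squared quantile function.
        (2 * tr_Sigma / p) * stats::qchisq((nObs - 1) / (n - 1), df = k)
    }
}

# Tiny example where tr(Sigma) = 2 and p = 2, so the leading factor is 2.
X <- cbind(c(-1, 1), c(-1, 1))
h1 <- bandwidth_sketch(X, k = 1)
h2 <- bandwidth_sketch(X, k = 1, nObs = 1.5, version = 2L)
```

Version 1 depends on the data only through n, p and tr(\Sigma), while version 2 also depends on `nObs`.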

@ -1,13 +1,16 @@
#' Loss distribution elbow plot. #' Elbow plot of the loss function.
#' #'
#' Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from #' Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
#' \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds #' \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
#' to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of #' to \eqn{L_n(V, X_i)} where \eqn{V} is a stiefel manifold element as
#' \eqn{L_n(V)}, for further details see the paper. #' minimizer of
#' \eqn{L_n(V)}, for further details see Fertl, L. and Bura, E. (2019).
#' #'
#' @param x Object of class \code{"cve"} (result of [\code{\link{cve}}]). #' @param x an object of class \code{"cve"}, usually, a result of a call to
#' \code{\link{cve}} or \code{\link{cve.call}}.
#' @param ... Pass through parameters to [\code{\link{plot}}] and #' @param ... Pass through parameters to [\code{\link{plot}}] and
#' [\code{\link{lines}}] #' [\code{\link{lines}}]
#'
#' @examples #' @examples
#' # create B for simulation #' # create B for simulation
#' B <- cbind(rep(1, 6), (-1)^seq(6)) / sqrt(6) #' B <- cbind(rep(1, 6), (-1)^seq(6)) / sqrt(6)
@ -34,7 +37,7 @@
#' # elbow plot #' # elbow plot
#' plot(cve.obj.simple) #' plot(cve.obj.simple)
#' #'
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance #' @references Fertl, L. and Bura, E. (2019), Conditional Variance
#' Estimation for Sufficient Dimension Reduction. Working Paper. #' Estimation for Sufficient Dimension Reduction. Working Paper.
#' #'
#' @seealso see \code{\link{par}} for graphical parameters to pass through #' @seealso see \code{\link{par}} for graphical parameters to pass through

@ -1,15 +1,15 @@
#' Predict method for CVE Fits. #' Predict method for CVE Fits.
#' #'
#' Predict response using projected data where the forward model \eqn{g(B'X)} #' Predict response using projected data \eqn{B'C} by fitting
#' is estimated using \code{\link{mars}}. #' \eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
#' #'
#' @param object instance of class \code{cve} (result of \code{cve}, #' @param object an object of class \code{"cve"}, usually, a result of a call to
#' \code{cve.call}). #' \code{\link{cve}} or \code{\link{cve.call}}.
#' @param newdata Matrix of the new data to be predicted. #' @param newdata Matrix of new predictor values, \eqn{C}.
#' @param dim dimension of SDR space to be used for data projecition. #' @param k dimension of SDR space to be used for data projection.
#' @param ... further arguments passed to \code{\link{mars}}. #' @param ... further arguments passed to \code{\link{mars}}.
#' #'
#' @return prediced response of data \code{newdata}. #' @return prediced response at \code{newdata}.
#' #'
#' @examples #' @examples
#' # create B for simulation #' # create B for simulation
@ -44,11 +44,11 @@
#' @importFrom mda mars #' @importFrom mda mars
#' @method predict cve #' @method predict cve
#' @export #' @export
predict.cve <- function(object, newdata, dim, ...) { predict.cve <- function(object, newdata, k, ...) {
if (missing(newdata)) { if (missing(newdata)) {
stop("No data supplied.") stop("No data supplied.")
} }
if (missing(dim)) { if (missing(k)) {
stop("No dimension supplied.") stop("No dimension supplied.")
} }
@ -56,7 +56,7 @@ predict.cve <- function(object, newdata, dim, ...) {
newdata <- matrix(newdata, nrow = 1L) newdata <- matrix(newdata, nrow = 1L)
} }
B <- object$res[[as.character(dim)]]$B B <- object$res[[as.character(k)]]$B
model <- mda::mars(object$X %*% B, object$Y) model <- mda::mars(object$X %*% B, object$Y)
predict(model, newdata %*% B) predict(model, newdata %*% B)

@ -126,19 +126,18 @@ predict_dim_wilcoxon <- function(object, p.value = 0.05) {
#' #'
#' This function estimates the dimension of the mean dimension reduction space, #' This function estimates the dimension of the mean dimension reduction space,
#' i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'} #' i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
#' performs cross-validation using \code{mars}. Given #' performs l.o.o cross-validation using \code{mars}. Given
#' \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is #' \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
#' performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where #' performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
#' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given #' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
#' \eqn{k}. The estimated SDR dimension is the \eqn{k} where the #' estimated SDR dimension is the \eqn{k} where the
#' cross-validation mean squared error is the lowest. The method \code{'elbow'} #' cross-validation mean squared error is minimal. The method \code{'elbow'}
#' estimates the dimension via \eqn{k = argmin_k L_n(V_{p k})} where #' estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
#' \eqn{V_{p k}} is the CVE estimate of the orthogonal columnspace of #' \eqn{V_{p - k}} is space that is orthogonal to the columns-space of the CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
#' \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds #' but finds the minimum using the wilcoxon-test.
#' the minimum using the wilcoxon-test.
#' #'
#' @param object instance of class \code{cve} (result of \code{\link{cve}}, #' @param object an object of class \code{"cve"}, usually, a result of a call to
#' \code{\link{cve.call}}). #' \code{\link{cve}} or \code{\link{cve.call}}.
#' @param method This parameter specify which method will be used in dimension #' @param method This parameter specify which method will be used in dimension
#' estimation. It provides three methods \code{'CV'} (default), \code{'elbow'}, #' estimation. It provides three methods \code{'CV'} (default), \code{'elbow'},
#' and \code{'wilcoxon'} to estimate the dimension of the SDR. #' and \code{'wilcoxon'} to estimate the dimension of the SDR.

@ -8,16 +8,13 @@
\description{ \description{
Conditional Variance Estimation (CVE) is a novel sufficient dimension Conditional Variance Estimation (CVE) is a novel sufficient dimension
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)}, reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE, where \eqn{B'X} is a lower dimensional projection of the predictors and
\eqn{Y} is a univariate responce. CVE,
similarly to its main competitor, the mean average variance estimation similarly to its main competitor, the mean average variance estimation
(MAVE), is not based on inverse regression, and does not require the (MAVE), is not based on inverse regression, and does not require the
restrictive linearity and constant variance conditions of moment based SDR restrictive linearity and constant variance conditions of moment based SDR
methods. CVE is data-driven and applies to additive error regressions with methods. CVE is data-driven and applies to additive error regressions with
continuous predictors and link function. The effectiveness and accuracy of continuous predictors and link function. Let \eqn{X} be a real
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
studies. CVE is shown to outperform MAVE in some model set-ups, while it
remains largely on par under most others.
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
\eqn{p}-dimensional covariate vector. We assume that the dependence of \eqn{p}-dimensional covariate vector. We assume that the dependence of
\eqn{Y} and \eqn{X} is modelled by \eqn{Y} and \eqn{X} is modelled by
\deqn{Y = g(B'X) + \epsilon} \deqn{Y = g(B'X) + \epsilon}
@ -26,11 +23,11 @@ variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g} zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
is an unknown, continuous non-constant function, is an unknown, continuous non-constant function,
and \eqn{B = (b_1, ..., b_k)} is and \eqn{B = (b_1, ..., b_k)} is
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}. a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
Without loss of generality \eqn{B} is assumed to be orthonormal. Without loss of generality \eqn{B} is assumed to be orthonormal.
} }
\references{ \references{
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance Fertl, L. and Bura, E. (2019), Conditional Variance
Estimation for Sufficient Dimension Reduction. Working Paper. Estimation for Sufficient Dimension Reduction. Working Paper.
} }
\author{ \author{

@ -2,24 +2,24 @@
% Please edit documentation in R/coef.R % Please edit documentation in R/coef.R
\name{coef.cve} \name{coef.cve}
\alias{coef.cve} \alias{coef.cve}
\title{Gets estimated SDR basis.} \title{Extracts estimated SDR basis.}
\usage{ \usage{
\method{coef}{cve}(object, k, ...) \method{coef}{cve}(object, k, ...)
} }
\arguments{ \arguments{
\item{object}{instance of \code{cve} as output from \code{\link{cve}} or \item{object}{an object of class \code{"cve"}, usually, a result of a call to
\code{\link{cve.call}}.} \code{\link{cve}} or \code{\link{cve.call}}.}
\item{k}{the SDR dimension.} \item{k}{the SDR dimension.}
\item{...}{ignored.} \item{...}{ignored (no additional arguments).}
} }
\value{ \value{
dir the matrix of CS or CMS of given dimension The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
} }
\description{ \description{
Returns the SDR basis matrix for dimension \code{k}, i.e. returns the Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
cve-estimate with dimension \eqn{p\times k}{p x k}. cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
} }
\examples{ \examples{
# set dimensions for simulation model # set dimensions for simulation model
@ -30,7 +30,7 @@ n <- 200 # samplesize
b1 <- rep(1 / sqrt(p), p) b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p) b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2) B <- cbind(b1, b2)
set.seed(21) set.seed(21)
# creat predictor data x ~ N(0, I_p) # creat predictor data x ~ N(0, I_p)
x <- matrix(rnorm(n * p), n, p) x <- matrix(rnorm(n * p), n, p)
@ -42,7 +42,7 @@ y <- (x \%*\% b1)^2 + 2 * (x \%*\% b2) + 0.25 * rnorm(100)
cve.obj <- cve(y ~ x, max.dim = 5) cve.obj <- cve(y ~ x, max.dim = 5)
# get cve-estimate for B with dimensions (p, k = 2) # get cve-estimate for B with dimensions (p, k = 2)
B2 <- coef(cve.obj, k = 2) B2 <- coef(cve.obj, k = 2)
# Projection matrix on span(B) # Projection matrix on span(B)
# equivalent to `B \%*\% t(B)` since B is semi-orthonormal # equivalent to `B \%*\% t(B)` since B is semi-orthonormal
PB <- B \%*\% solve(t(B) \%*\% B) \%*\% t(B) PB <- B \%*\% solve(t(B) \%*\% B) \%*\% t(B)

@ -20,14 +20,42 @@ the environment from which \code{cve} is called.}
\item{method}{This character string specifies the method of fitting. The \item{method}{This character string specifies the method of fitting. The
options are options are
\itemize{ \itemize{
\item "simple" implementation as described in the paper. \item "simple" implementation,
\item "weighted" variation with adaptive weighting of slices. \item "weighted" variation with adaptive weighting of slices.
} }
see paper.} see Fertl, L. and Bura, E. (2019).}
\item{max.dim}{upper bounds for \code{k}, (ignored if \code{k} is supplied).} \item{max.dim}{upper bounds for \code{k}, (ignored if \code{k} is supplied).}
\item{...}{optional parameters passed on to \code{cve.call}.} \item{...}{optional parameters passed on to \code{cve.call}.
Conditional Variance Estimation (CVE) is a sufficient dimension reduction
(SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
function provides methods for estimating the dimension and the subspace
spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
rank \eqn{k} such that
\deqn{%
E(Y|X) = E(Y|B'X) %
}
or, equivalently,
\deqn{%
Y = g(B'X) + \epsilon %
}
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
CVE method makes very few assumptions.
A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
The primary output from this method is a set of orthonormal vectors,
\eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.}
} }
\value{ \value{
an S3 object of class \code{cve} with components: an S3 object of class \code{cve} with components:
@ -56,28 +84,10 @@ an S3 object of class \code{cve} with components:
} }
} }
\description{ \description{
Conditional Variance Estimation (CVE) is a novel sufficient dimension This is the main function in the \code{CVE} package. It creates objects of
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)}, class \code{"cve"} to estimate the mean subspace. Helper functions that
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE, require a \code{"cve"} object can then be applied to the output from this
similarly to its main competitor, the mean average variance estimation function.
(MAVE), is not based on inverse regression, and does not require the
restrictive linearity and constant variance conditions of moment based SDR
methods. CVE is data-driven and applies to additive error regressions with
continuous predictors and link function. The effectiveness and accuracy of
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
studies. CVE is shown to outperform MAVE in some model set-ups, while it
remains largely on par under most others.
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
\eqn{p}-dimensional covariate vector. We assume that the dependence of
\eqn{Y} and \eqn{X} is modelled by
\deqn{Y = g(B'X) + \epsilon}
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
is an unknown, continuous non-constant function,
and \eqn{B = (b_1, ..., b_k)} is
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
Without loss of generality \eqn{B} is assumed to be orthonormal.
} }
\examples{ \examples{
# set dimensions for simulation model # set dimensions for simulation model
@ -131,7 +141,7 @@ norm(PB - PB.w, type = 'F')
} }
\references{ \references{
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance Fertl, L. and Bura, E. (2019), Conditional Variance
Estimation for Sufficient Dimension Reduction. Working Paper. Estimation for Sufficient Dimension Reduction. Working Paper.
} }
\seealso{ \seealso{

@ -10,9 +10,9 @@ cve.call(X, Y, method = "simple", nObs = sqrt(nrow(X)), h = NULL,
max.iter = 50L, attempts = 10L, logger = NULL) max.iter = 50L, attempts = 10L, logger = NULL)
} }
\arguments{ \arguments{
\item{X}{Design matrix with dimension \eqn{n\times p}{n x p}.} \item{X}{Design predictor matrix.}
\item{Y}{numeric array of length \eqn{n} of Responses.} \item{Y}{\eqn{n}-dimensional vector of responces.}
\item{method}{specifies the CVE method variation as one of \item{method}{specifies the CVE method variation as one of
\itemize{ \itemize{
@ -60,7 +60,7 @@ used as starting value in the optimization. (If supplied,
out \code{attempts} times with starting values drawn from the invariant out \code{attempts} times with starting values drawn from the invariant
measure on the Stiefel manifold (see \code{\link{rStiefel}}).} measure on the Stiefel manifold (see \code{\link{rStiefel}}).}
\item{logger}{a logger function (only for advanced user, slows down the \item{logger}{a logger function (only for advanced users, slows down the
computation).} computation).}
} }
\value{ \value{
@ -90,28 +90,10 @@ an S3 object of class \code{cve} with components:
} }
} }
\description{ \description{
Conditional Variance Estimation (CVE) is a novel sufficient dimension This is the main function in the \code{CVE} package. It creates objects of
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)}, class \code{"cve"} to estimate the mean subspace. Helper functions that
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE, require a \code{"cve"} object can then be applied to the output from this
similarly to its main competitor, the mean average variance estimation function.
(MAVE), is not based on inverse regression, and does not require the
restrictive linearity and constant variance conditions of moment based SDR
methods. CVE is data-driven and applies to additive error regressions with
continuous predictors and link function. The effectiveness and accuracy of
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
studies. CVE is shown to outperform MAVE in some model set-ups, while it
remains largely on par under most others.
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
\eqn{p}-dimensional covariate vector. We assume that the dependence of
\eqn{Y} and \eqn{X} is modelled by
\deqn{Y = g(B'X) + \epsilon}
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
is an unknown, continuous non-constant function,
and \eqn{B = (b_1, ..., b_k)} is
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
Without loss of generality \eqn{B} is assumed to be orthonormal.
} }
\examples{ \examples{
# create B for simulation (k = 1) # create B for simulation (k = 1)


@ -5,20 +5,23 @@
\alias{directions} \alias{directions}
\title{Computes projected training data \code{X} for given dimension `k`.} \title{Computes projected training data \code{X} for given dimension `k`.}
\usage{ \usage{
\method{directions}{cve}(dr, k) \method{directions}{cve}(object, k, ...)
} }
\arguments{ \arguments{
\item{dr}{Instance of \code{'cve'} as returned by \code{\link{cve}}.} \item{object}{an object of class \code{"cve"}, usually, a result of a call to
\code{\link{cve}} or \code{\link{cve.call}}.}
\item{k}{SDR dimension to use for projection.} \item{k}{SDR dimension to use for projection.}
\item{...}{ignored (no additional arguments).}
} }
\value{ \value{
the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B} the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
is the cve-estimate for dimension \eqn{k}. is the cve-estimate for dimension \eqn{k}.
} }
\description{ \description{
Projects the dimensional design matrix \eqn{X} on the columnspace of the Returns \eqn{B'X}, that is, the projection of the design matrix \eqn{X} onto the
cve-estimate for given dimension \eqn{k}. column space of the cve-estimate for the given dimension \eqn{k}.
} }
\examples{ \examples{
# create B for simulation (k = 1) # create B for simulation (k = 1)
@ -39,3 +42,6 @@ x.proj <- directions(cve.obj.simple, k = 1)
plot(x.proj, y) plot(x.proj, y)
} }
\seealso{
\code{\link{cve}}
}


@ -7,9 +7,9 @@
estimate.bandwidth(X, k, nObs, version = 1L) estimate.bandwidth(X, k, nObs, version = 1L)
} }
\arguments{ \arguments{
\item{X}{a \eqn{n\times p}{n x p} matrix with samples in its rows.} \item{X}{the \eqn{n\times p}{n x p} matrix of predictor values.}
\item{k}{Dimension of lower dimensional projection.} \item{k}{the SDR dimension.}
\item{nObs}{number of points in a slice, only for version 2.} \item{nObs}{number of points in a slice, only for version 2.}
@ -26,11 +26,11 @@ defaults to using the following formula (version 1)
h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2} h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
Alternative version 2 is used for dimension prediction which is given by Alternative version 2 is used for dimension prediction which is given by
\deqn{% \deqn{%
h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{% h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))} h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
with \eqn{n} the sample size, \eqn{p} its dimension and the with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample \eqn{\Sigma} is \eqn{(n - 1) / n} times the sample covariance matrix of
covariance estimate. \eqn{X}.
} }
\examples{ \examples{
# set dimensions for simulation model # set dimensions for simulation model


@ -2,12 +2,13 @@
% Please edit documentation in R/plot.R % Please edit documentation in R/plot.R
\name{plot.cve} \name{plot.cve}
\alias{plot.cve} \alias{plot.cve}
\title{Loss distribution elbow plot.} \title{Elbow plot of the loss function.}
\usage{ \usage{
\method{plot}{cve}(x, ...) \method{plot}{cve}(x, ...)
} }
\arguments{ \arguments{
\item{x}{Object of class \code{"cve"} (result of [\code{\link{cve}}]).} \item{x}{an object of class \code{"cve"}, usually, a result of a call to
\code{\link{cve}} or \code{\link{cve.call}}.}
\item{...}{Pass through parameters to [\code{\link{plot}}] and \item{...}{Pass through parameters to [\code{\link{plot}}] and
[\code{\link{lines}}]} [\code{\link{lines}}]}
@ -15,8 +16,9 @@
\description{ \description{
Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
\code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of to \eqn{L_n(V, X_i)} where \eqn{V}, an element of the Stiefel manifold, is the
\eqn{L_n(V)}, for further details see the paper. minimizer of
\eqn{L_n(V)}; for further details see Fertl, L. and Bura, E. (2019).
} }
\examples{ \examples{
# create B for simulation # create B for simulation
@ -46,7 +48,7 @@ plot(cve.obj.simple)
} }
\references{ \references{
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance Fertl, L. and Bura, E. (2019), Conditional Variance
Estimation for Sufficient Dimension Reduction. Working Paper. Estimation for Sufficient Dimension Reduction. Working Paper.
} }
\seealso{ \seealso{


@ -4,24 +4,24 @@
\alias{predict.cve} \alias{predict.cve}
\title{Predict method for CVE Fits.} \title{Predict method for CVE Fits.}
\usage{ \usage{
\method{predict}{cve}(object, newdata, dim, ...) \method{predict}{cve}(object, newdata, k, ...)
} }
\arguments{ \arguments{
\item{object}{instance of class \code{cve} (result of \code{cve}, \item{object}{an object of class \code{"cve"}, usually, a result of a call to
\code{cve.call}).} \code{\link{cve}} or \code{\link{cve.call}}.}
\item{newdata}{Matrix of the new data to be predicted.} \item{newdata}{Matrix of new predictor values, \eqn{C}.}
\item{dim}{dimension of SDR space to be used for data projecition.} \item{k}{dimension of SDR space to be used for data projection.}
\item{...}{further arguments passed to \code{\link{mars}}.} \item{...}{further arguments passed to \code{\link{mars}}.}
} }
\value{ \value{
prediced response of data \code{newdata}. predicted response at \code{newdata}.
} }
\description{ \description{
Predict response using projected data where the forward model \eqn{g(B'X)} Predict response using projected data \eqn{B'C} by fitting
is estimated using \code{\link{mars}}. \eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
} }
\examples{ \examples{
# create B for simulation # create B for simulation


@ -7,8 +7,8 @@
predict_dim(object, ..., method = "CV") predict_dim(object, ..., method = "CV")
} }
\arguments{ \arguments{
\item{object}{instance of class \code{cve} (result of \code{\link{cve}}, \item{object}{an object of class \code{"cve"}, usually, a result of a call to
\code{\link{cve.call}}).} \code{\link{cve}} or \code{\link{cve.call}}.}
\item{...}{ignored.} \item{...}{ignored.}
@ -26,16 +26,15 @@ list with
\description{ \description{
This function estimates the dimension of the mean dimension reduction space, This function estimates the dimension of the mean dimension reduction space,
i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'} i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
performs cross-validation using \code{mars}. Given performs leave-one-out cross-validation using \code{mars}. Given
\code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
\eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
\eqn{k}. The estimated SDR dimension is the \eqn{k} where the estimated SDR dimension is the \eqn{k} where the
cross-validation mean squared error is the lowest. The method \code{'elbow'} cross-validation mean squared error is minimal. The method \code{'elbow'}
estimates the dimension via \eqn{k = argmin_k L_n(V_{p k})} where estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
\eqn{V_{p k}} is the CVE estimate of the orthogonal columnspace of \eqn{V_{p - k}} is the space that is orthogonal to the column space of the CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
\eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds but finds the minimum using the Wilcoxon test.
the minimum using the wilcoxon-test.
} }
\examples{ \examples{
# create B for simulation # create B for simulation