- wip: reviewing and fixing doc.
parent
b71898a5bc
commit
5d3a0ca18d
|
@ -2,7 +2,7 @@ Package: CVE
|
|||
Type: Package
|
||||
Title: Conditional Variance Estimator for Sufficient Dimension Reduction
|
||||
Version: 0.2
|
||||
Date: 2019-11-13
|
||||
Date: 2019-12-20
|
||||
Author: Daniel Kapla <daniel@kapla.at>, Lukas Fertl <lukas.fertl@chello.at>
|
||||
Maintainer: Daniel Kapla <daniel@kapla.at>
|
||||
Description: Implementation of the Conditional Variance Estimation (CVE) method.
|
||||
|
|
62
CVE/R/CVE.R
|
@ -2,16 +2,13 @@
|
|||
#'
|
||||
#' Conditional Variance Estimation (CVE) is a novel sufficient dimension
|
||||
#' reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
|
||||
#' where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
|
||||
#' where \eqn{B'X} is a lower dimensional projection of the predictors and
|
||||
#' \eqn{Y} is a univariate response. CVE,
|
||||
#' similarly to its main competitor, the mean average variance estimation
|
||||
#' (MAVE), is not based on inverse regression, and does not require the
|
||||
#' restrictive linearity and constant variance conditions of moment based SDR
|
||||
#' methods. CVE is data-driven and applies to additive error regressions with
|
||||
#' continuous predictors and link function. The effectiveness and accuracy of
|
||||
#' CVE compared to MAVE and other SDR techniques is demonstrated in simulation
|
||||
#' studies. CVE is shown to outperform MAVE in some model set-ups, while it
|
||||
#' remains largely on par under most others.
|
||||
#' Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
|
||||
#' continuous predictors and link function. Let \eqn{X} be a real
|
||||
#' \eqn{p}-dimensional covariate vector. We assume that the dependence of
|
||||
#' \eqn{Y} and \eqn{X} is modelled by
|
||||
#' \deqn{Y = g(B'X) + \epsilon}
|
||||
|
@ -20,11 +17,11 @@
|
|||
#' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
#' is an unknown, continuous non-constant function,
|
||||
#' and \eqn{B = (b_1, ..., b_k)} is
|
||||
#' a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
|
||||
#' a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
|
||||
#' Without loss of generality \eqn{B} is assumed to be orthonormal.
|
||||
#'
|
||||
#' @author Daniel Kapla, Lukas Fertl, Bura Efstathia
|
||||
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
#' Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
#'
|
||||
#' @docType package
|
||||
|
@ -33,7 +30,10 @@
|
|||
|
||||
#' Conditional Variance Estimator (CVE).
|
||||
#'
|
||||
#' @inherit CVE-package description
|
||||
#' This is the main function in the \code{CVE} package. It creates objects of
|
||||
#' class \code{"cve"} to estimate the mean subspace. Helper functions that
|
||||
#' require a \code{"cve"} object can then be applied to the output from this
|
||||
#' function.
|
||||
#'
|
||||
#' @param formula an object of class \code{"formula"} which is a symbolic
|
||||
#' description of the model to be fitted like \eqn{Y\sim X}{Y ~ X} where
|
||||
|
@ -46,13 +46,41 @@
|
|||
#' @param method This character string specifies the method of fitting. The
|
||||
#' options are
|
||||
#' \itemize{
|
||||
#' \item "simple" implementation as described in the paper.
|
||||
#' \item "simple" implementation,
|
||||
#' \item "weighted" variation with adaptive weighting of slices.
|
||||
#' }
|
||||
#' see paper.
|
||||
#' see Fertl, L. and Bura, E. (2019).
|
||||
#' @param max.dim upper bound for \code{k} (ignored if \code{k} is supplied).
|
||||
#' @param ... optional parameters passed on to \code{cve.call}.
|
||||
#'
|
||||
#'
|
||||
#' Conditional Variance Estimation (CVE) is a sufficient dimension reduction
|
||||
#' (SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
|
||||
#' expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
|
||||
#' function provides methods for estimating the dimension and the subspace
|
||||
#' spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
|
||||
#' rank \eqn{k} such that
|
||||
#' \deqn{%
|
||||
#' E(Y|X) = E(Y|B'X) %
|
||||
#' }
|
||||
#' or, equivalently,
|
||||
#' \deqn{%
|
||||
#' Y = g(B'X) + \epsilon %
|
||||
#' }
|
||||
#' where \eqn{X} is independent of \eqn{\epsilon} with positive definite
|
||||
#' variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
|
||||
#' zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
#' is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
|
||||
#' is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
|
||||
#'
|
||||
#' Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
|
||||
#' CVE method makes very few assumptions.
|
||||
#'
|
||||
#' A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
|
||||
#' of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
|
||||
#' The primary output from this method is a set of orthonormal vectors,
|
||||
#' \eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.
|
||||
#'
|
||||
#' @return an S3 object of class \code{cve} with components:
|
||||
#' \describe{
|
||||
#' \item{X}{design matrix of predictor vector used for calculating
|
||||
|
@ -130,7 +158,7 @@
|
|||
#'
|
||||
#' @seealso For a detailed description of \code{formula} see
|
||||
#' \code{\link{formula}}.
|
||||
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
#' Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
#'
|
||||
#' @importFrom stats model.frame
|
||||
|
@ -159,8 +187,8 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
|
|||
#' @inherit cve title
|
||||
#' @inherit cve description
|
||||
#'
|
||||
#' @param X Design matrix with dimension \eqn{n\times p}{n x p}.
|
||||
#' @param Y numeric array of length \eqn{n} of Responses.
|
||||
#' @param X Design predictor matrix.
|
||||
#' @param Y \eqn{n}-dimensional vector of responses.
|
||||
#' @param h bandwidth or function to estimate bandwidth, defaults to internally
|
||||
#' estimated bandwidth.
|
||||
#' @param nObs parameter for choosing bandwidth \code{h} using
|
||||
|
@ -193,7 +221,7 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
|
|||
#' @param V.init Semi-orthogonal matrix of dimensions `(ncol(X), ncol(X) - k)`
|
||||
#' used as starting value in the optimization. (If supplied,
|
||||
#' \code{attempts} is set to 0 and \code{k} to match dimension).
|
||||
#' @param logger a logger function (only for advanced user, slows down the
|
||||
#' @param logger a logger function (only for advanced users, slows down the
|
||||
#' computation).
|
||||
#'
|
||||
#' @inherit cve return
|
||||
|
@ -209,11 +237,11 @@ cve <- function(formula, data, method = "simple", max.dim = 10L, ...) {
|
|||
#' # Y = f(B'X) + err
|
||||
#' # with f(x1) = x1 and err ~ N(0, 0.25^2)
|
||||
#' Y <- X %*% B + 0.25 * rnorm(100)
|
||||
#'
|
||||
#'
|
||||
#' # calculate cve with method 'simple' for k = 1
|
||||
#' set.seed(21)
|
||||
#' cve.obj.simple1 <- cve(Y ~ X, k = 1)
|
||||
#'
|
||||
#'
|
||||
#' # same as
|
||||
#' set.seed(21)
|
||||
#' cve.obj.simple2 <- cve.call(X, Y, k = 1)
|
||||
|
|
16
CVE/R/coef.R
|
@ -1,14 +1,14 @@
|
|||
#' Gets estimated SDR basis.
|
||||
#' Extracts estimated SDR basis.
|
||||
#'
|
||||
#' Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
|
||||
#' cve-estimate with dimension \eqn{p\times k}{p x k}.
|
||||
#' cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
|
||||
#'
|
||||
#' @param object instance of \code{cve} as output from \code{\link{cve}} or
|
||||
#' \code{\link{cve.call}}.
|
||||
#' @param object an object of class \code{"cve"}, usually, a result of a call to
|
||||
#' \code{\link{cve}} or \code{\link{cve.call}}.
|
||||
#' @param k the SDR dimension.
|
||||
#' @param ... ignored.
|
||||
#' @param ... ignored (no additional arguments).
|
||||
#'
|
||||
#' @return dir the matrix of CS or CMS of given dimension
|
||||
#' @return The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
|
||||
#'
|
||||
#' @examples
|
||||
#' # set dimensions for simulation model
|
||||
|
@ -19,7 +19,7 @@
|
|||
#' b1 <- rep(1 / sqrt(p), p)
|
||||
#' b2 <- (-1)^seq(1, p) / sqrt(p)
|
||||
#' B <- cbind(b1, b2)
|
||||
#'
|
||||
#'
|
||||
#' set.seed(21)
|
||||
#' # create predictor data x ~ N(0, I_p)
|
||||
#' x <- matrix(rnorm(n * p), n, p)
|
||||
|
@ -31,7 +31,7 @@
|
|||
#' cve.obj <- cve(y ~ x, max.dim = 5)
|
||||
#' # get cve-estimate for B with dimensions (p, k = 2)
|
||||
#' B2 <- coef(cve.obj, k = 2)
|
||||
#'
|
||||
#'
|
||||
#' # Projection matrix on span(B)
|
||||
#' # equivalent to `B %*% t(B)` since B is semi-orthonormal
|
||||
#' PB <- B %*% solve(t(B) %*% B) %*% t(B)
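The comment in this example states that the general projection formula reduces to `B %*% t(B)` when `B` is semi-orthonormal. A minimal base R check of that claim, assuming `p = 6` (the value of `p` is not visible in this hunk) and reusing the `b1`, `b2` construction from the example:

```r
# Assumption: p = 6; any even p keeps b1 and b2 orthogonal.
p <- 6
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)

# B has orthonormal columns, so t(B) %*% B is the 2 x 2 identity.
gram_error <- max(abs(crossprod(B) - diag(2)))

# General projection onto span(B) versus the shortcut for orthonormal columns.
PB_general <- B %*% solve(t(B) %*% B) %*% t(B)
PB_short <- B %*% t(B)
max_diff <- max(abs(PB_general - PB_short))
```

Both `gram_error` and `max_diff` should be zero up to floating point error. For an estimate whose columns are not exactly orthonormal, only the general formula is a projection.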
|
||||
|
|
|
@ -1,15 +1,17 @@
|
|||
#' @export
|
||||
directions <- function(dr, k) {
|
||||
directions <- function(object, k, ...) {
|
||||
UseMethod("directions")
|
||||
}
|
||||
|
||||
#' Computes projected training data \code{X} for given dimension `k`.
|
||||
#'
|
||||
#' Projects the dimensional design matrix \eqn{X} on the columnspace of the
|
||||
#' cve-estimate for given dimension \eqn{k}.
|
||||
#' Returns \eqn{B'X}, that is, the design matrix \eqn{X} projected onto the
|
||||
#' column space of the cve-estimate for the given dimension \eqn{k}.
|
||||
#'
|
||||
#' @param dr Instance of \code{'cve'} as returned by \code{\link{cve}}.
|
||||
#' @param object an object of class \code{"cve"}, usually, a result of a call to
|
||||
#' \code{\link{cve}} or \code{\link{cve.call}}.
|
||||
#' @param k SDR dimension to use for projection.
|
||||
#' @param ... ignored (no additional arguments).
|
||||
#'
|
||||
#' @return the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
|
||||
#' is the cve-estimate for dimension \eqn{k}.
|
||||
|
@ -32,12 +34,14 @@ directions <- function(dr, k) {
|
|||
#' # plot y against projected data
|
||||
#' plot(x.proj, y)
|
||||
#'
|
||||
#' @seealso \code{\link{cve}}
|
||||
#'
|
||||
#' @method directions cve
|
||||
#' @aliases directions directions.cve
|
||||
#' @export
|
||||
directions.cve <- function(dr, k) {
|
||||
if (!(k %in% names(dr$res))) {
|
||||
directions.cve <- function(object, k, ...) {
|
||||
if (!(k %in% names(object$res))) {
|
||||
stop("SDR directions for requested dimension `k` not computed.")
|
||||
}
|
||||
return(dr$X %*% dr$res[[as.character(k)]]$B)
|
||||
return(object$X %*% object$res[[as.character(k)]]$B)
|
||||
}
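The lookup in `directions.cve` can be tried without fitting a model. The following is a sketch only: `directions_sketch` and the hand-built `mock` object are hypothetical and merely mimic the `X` and `res[[k]]$B` components that the method above reads.

```r
# Hypothetical stand-in that mirrors the body of directions.cve().
directions_sketch <- function(object, k) {
    key <- as.character(k)  # results are stored under character keys
    if (!(key %in% names(object$res))) {
        stop("SDR directions for requested dimension `k` not computed.")
    }
    object$X %*% object$res[[key]]$B
}

set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)
# Mock object: only k = 1 is "computed", with B the first unit vector.
mock <- list(X = X, res = list("1" = list(B = matrix(c(1, 0, 0), 3, 1))))

proj <- directions_sketch(mock, k = 1)  # n x k matrix X B
missing_k <- try(directions_sketch(mock, k = 2), silent = TRUE)
```

With this choice of `B` the projection is just the first column of `X`, and requesting a dimension that was not computed raises the error from the method.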
|
||||
|
|
|
@ -7,14 +7,14 @@
|
|||
#' h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
|
||||
#' Alternative version 2 is used for dimension prediction which is given by
|
||||
#' \deqn{%
|
||||
#' h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
|
||||
#' h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
|
||||
#' h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
|
||||
#' with \eqn{n} the sample size, \eqn{p} its dimension and the
|
||||
#' covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample
|
||||
#' covariance estimate.
|
||||
#' with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
|
||||
#' \eqn{\Sigma} equal to \eqn{(n - 1) / n} times the sample covariance matrix of
|
||||
#' \eqn{X}.
|
||||
#'
|
||||
#' @param X a \eqn{n\times p}{n x p} matrix with samples in its rows.
|
||||
#' @param k Dimension of lower dimensional projection.
|
||||
#' @param X the \eqn{n\times p}{n x p} matrix of predictor values.
|
||||
#' @param k the SDR dimension.
|
||||
#' @param nObs number of points in a slice, only for version 2.
|
||||
#' @param version either \code{1} or \code{2}.
|
||||
#'
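The two bandwidth formulas documented above can be written out in base R. This is a sketch under stated assumptions, not the package implementation: `bandwidth_sketch` is a hypothetical name, and \eqn{\chi_k^{-1}} is read here as the chi-squared quantile function with `k` degrees of freedom (`qchisq`).

```r
# Hypothetical re-statement of the documented formulas, base R only.
bandwidth_sketch <- function(X, k, nObs, version = 1L) {
    n <- nrow(X)
    p <- ncol(X)
    # Sigma is (n - 1) / n times the sample covariance matrix of X.
    Sigma <- ((n - 1) / n) * cov(X)
    scale <- 2 * sum(diag(Sigma)) / p
    if (version == 1L) {
        scale * (1.2 * n^(-1 / (4 + k)))^2
    } else {
        # Assumption: chi_k^{-1} denotes the chi-squared quantile function.
        scale * qchisq((nObs - 1) / (n - 1), df = k)
    }
}

set.seed(42)
X <- matrix(rnorm(200 * 5), 200, 5)
h1 <- bandwidth_sketch(X, k = 1, nObs = sqrt(nrow(X)))
h2 <- bandwidth_sketch(X, k = 1, nObs = sqrt(nrow(X)), version = 2L)
```

The `nObs` argument is only used by version 2, matching the parameter description.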
|
||||
|
|
13
CVE/R/plot.R
|
@ -1,13 +1,16 @@
|
|||
#' Loss distribution elbow plot.
|
||||
#' Elbow plot of the loss function.
|
||||
#'
|
||||
#' Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
|
||||
#' \code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
|
||||
#' to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of
|
||||
#' \eqn{L_n(V)}, for further details see the paper.
|
||||
#' to \eqn{L_n(V, X_i)} where \eqn{V} is an element of the Stiefel manifold that
|
||||
#' minimizes
|
||||
#' \eqn{L_n(V)}; for further details see Fertl, L. and Bura, E. (2019).
|
||||
#'
|
||||
#' @param x Object of class \code{"cve"} (result of [\code{\link{cve}}]).
|
||||
#' @param x an object of class \code{"cve"}, usually, a result of a call to
|
||||
#' \code{\link{cve}} or \code{\link{cve.call}}.
|
||||
#' @param ... Pass through parameters to [\code{\link{plot}}] and
|
||||
#' [\code{\link{lines}}]
|
||||
#'
|
||||
#' @examples
|
||||
#' # create B for simulation
|
||||
#' B <- cbind(rep(1, 6), (-1)^seq(6)) / sqrt(6)
|
||||
|
@ -34,7 +37,7 @@
|
|||
#' # elbow plot
|
||||
#' plot(cve.obj.simple)
|
||||
#'
|
||||
#' @references Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
#' @references Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
#' Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
#'
|
||||
#' @seealso see \code{\link{par}} for graphical parameters to pass through
|
||||
|
|
|
@ -1,15 +1,15 @@
|
|||
#' Predict method for CVE Fits.
|
||||
#'
|
||||
#' Predict response using projected data where the forward model \eqn{g(B'X)}
|
||||
#' is estimated using \code{\link{mars}}.
|
||||
#' Predict response using projected data \eqn{B'C} by fitting
|
||||
#' \eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
|
||||
#'
|
||||
#' @param object instance of class \code{cve} (result of \code{cve},
|
||||
#' \code{cve.call}).
|
||||
#' @param newdata Matrix of the new data to be predicted.
|
||||
#' @param dim dimension of SDR space to be used for data projecition.
|
||||
#' @param object an object of class \code{"cve"}, usually, a result of a call to
|
||||
#' \code{\link{cve}} or \code{\link{cve.call}}.
|
||||
#' @param newdata Matrix of new predictor values, \eqn{C}.
|
||||
#' @param k dimension of SDR space to be used for data projection.
|
||||
#' @param ... further arguments passed to \code{\link{mars}}.
|
||||
#'
|
||||
#' @return prediced response of data \code{newdata}.
|
||||
#' @return predicted response at \code{newdata}.
|
||||
#'
|
||||
#' @examples
|
||||
#' # create B for simulation
|
||||
|
@ -44,11 +44,11 @@
|
|||
#' @importFrom mda mars
|
||||
#' @method predict cve
|
||||
#' @export
|
||||
predict.cve <- function(object, newdata, dim, ...) {
|
||||
predict.cve <- function(object, newdata, k, ...) {
|
||||
if (missing(newdata)) {
|
||||
stop("No data supplied.")
|
||||
}
|
||||
if (missing(dim)) {
|
||||
if (missing(k)) {
|
||||
stop("No dimension supplied.")
|
||||
}
|
||||
|
||||
|
@ -56,7 +56,7 @@ predict.cve <- function(object, newdata, dim, ...) {
|
|||
newdata <- matrix(newdata, nrow = 1L)
|
||||
}
|
||||
|
||||
B <- object$res[[as.character(dim)]]$B
|
||||
B <- object$res[[as.character(k)]]$B
|
||||
|
||||
model <- mda::mars(object$X %*% B, object$Y)
|
||||
predict(model, newdata %*% B)
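The flow of `predict.cve` is: project the training data with `B`, fit a forward model, then apply the same projection to `newdata`. The sketch below illustrates that flow with a deliberately swapped-in technique: `lm.fit` from base R replaces `mda::mars`, so it only suits a linear link. `predict_sketch` is a hypothetical helper, and the `is.matrix` test guarding the reshape is an assumption, since that condition is not visible in this hunk.

```r
# Hypothetical stand-in for predict.cve(): lm.fit() replaces mda::mars().
predict_sketch <- function(X, Y, B, newdata) {
    # Assumption: a single observation passed as a vector becomes a 1 x p matrix.
    if (!is.matrix(newdata)) {
        newdata <- matrix(newdata, nrow = 1L)
    }
    # Fit a linear forward model (with intercept) on the projected training data.
    fit <- lm.fit(cbind(1, X %*% B), as.numeric(Y))
    # Project the new predictors with the same B before predicting.
    drop(cbind(1, newdata %*% B) %*% fit$coefficients)
}

set.seed(21)
B <- matrix(rep(1 / sqrt(6), 6), 6, 1)
X <- matrix(rnorm(100 * 6), 100, 6)
Y <- X %*% B + 0.25 * rnorm(100)  # Y = B'X + err, linear link
pred <- predict_sketch(X, Y, B, newdata = rep(1, 6))
```

For this simulated model the true mean at `newdata = rep(1, 6)` is `sqrt(6)`, so `pred` should land close to that value.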
|
||||
|
|
|
@ -126,19 +126,18 @@ predict_dim_wilcoxon <- function(object, p.value = 0.05) {
|
|||
#'
|
||||
#' This function estimates the dimension of the mean dimension reduction space,
|
||||
#' i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
|
||||
#' performs cross-validation using \code{mars}. Given
|
||||
#' performs leave-one-out (l.o.o.) cross-validation using \code{mars}. Given
|
||||
#' \code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
|
||||
#' performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where
|
||||
#' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given
|
||||
#' \eqn{k}. The estimated SDR dimension is the \eqn{k} where the
|
||||
#' cross-validation mean squared error is the lowest. The method \code{'elbow'}
|
||||
#' estimates the dimension via \eqn{k = argmin_k L_n(V_{p − k})} where
|
||||
#' \eqn{V_{p − k}} is the CVE estimate of the orthogonal columnspace of
|
||||
#' \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds
|
||||
#' the minimum using the wilcoxon-test.
|
||||
#' performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
|
||||
#' \eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
|
||||
#' estimated SDR dimension is the \eqn{k} where the
|
||||
#' cross-validation mean squared error is minimal. The method \code{'elbow'}
|
||||
#' estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
|
||||
#' \eqn{V_{p - k}} spans the orthogonal complement of the column space of the
#' CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
|
||||
#' but finds the minimum using the Wilcoxon test.
|
||||
#'
|
||||
#' @param object instance of class \code{cve} (result of \code{\link{cve}},
|
||||
#' \code{\link{cve.call}}).
|
||||
#' @param object an object of class \code{"cve"}, usually, a result of a call to
|
||||
#' \code{\link{cve}} or \code{\link{cve.call}}.
|
||||
#' @param method This parameter specifies which method will be used in dimension
|
||||
#' estimation. It provides three methods \code{'CV'} (default), \code{'elbow'},
|
||||
#' and \code{'wilcoxon'} to estimate the dimension of the SDR.
|
||||
|
|
|
@ -8,16 +8,13 @@
|
|||
\description{
|
||||
Conditional Variance Estimation (CVE) is a novel sufficient dimension
|
||||
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
|
||||
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
|
||||
where \eqn{B'X} is a lower dimensional projection of the predictors and
|
||||
\eqn{Y} is a univariate response. CVE,
|
||||
similarly to its main competitor, the mean average variance estimation
|
||||
(MAVE), is not based on inverse regression, and does not require the
|
||||
restrictive linearity and constant variance conditions of moment based SDR
|
||||
methods. CVE is data-driven and applies to additive error regressions with
|
||||
continuous predictors and link function. The effectiveness and accuracy of
|
||||
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
|
||||
studies. CVE is shown to outperform MAVE in some model set-ups, while it
|
||||
remains largely on par under most others.
|
||||
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
|
||||
continuous predictors and link function. Let \eqn{X} be a real
|
||||
\eqn{p}-dimensional covariate vector. We assume that the dependence of
|
||||
\eqn{Y} and \eqn{X} is modelled by
|
||||
\deqn{Y = g(B'X) + \epsilon}
|
||||
|
@ -26,11 +23,11 @@ variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
|
|||
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
is an unknown, continuous non-constant function,
|
||||
and \eqn{B = (b_1, ..., b_k)} is
|
||||
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
|
||||
a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
|
||||
Without loss of generality \eqn{B} is assumed to be orthonormal.
|
||||
}
|
||||
\references{
|
||||
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
}
|
||||
\author{
|
||||
|
|
|
@ -2,24 +2,24 @@
|
|||
% Please edit documentation in R/coef.R
|
||||
\name{coef.cve}
|
||||
\alias{coef.cve}
|
||||
\title{Gets estimated SDR basis.}
|
||||
\title{Extracts estimated SDR basis.}
|
||||
\usage{
|
||||
\method{coef}{cve}(object, k, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{instance of \code{cve} as output from \code{\link{cve}} or
|
||||
\code{\link{cve.call}}.}
|
||||
\item{object}{an object of class \code{"cve"}, usually, a result of a call to
|
||||
\code{\link{cve}} or \code{\link{cve.call}}.}
|
||||
|
||||
\item{k}{the SDR dimension.}
|
||||
|
||||
\item{...}{ignored.}
|
||||
\item{...}{ignored (no additional arguments).}
|
||||
}
|
||||
\value{
|
||||
dir the matrix of CS or CMS of given dimension
|
||||
The matrix \eqn{B} of dimensions \eqn{p\times k}{p x k}.
|
||||
}
|
||||
\description{
|
||||
Returns the SDR basis matrix for dimension \code{k}, i.e. returns the
|
||||
cve-estimate with dimension \eqn{p\times k}{p x k}.
|
||||
cve-estimate of \eqn{B} with dimension \eqn{p\times k}{p x k}.
|
||||
}
|
||||
\examples{
|
||||
# set dimensions for simulation model
|
||||
|
@ -30,7 +30,7 @@ n <- 200 # samplesize
|
|||
b1 <- rep(1 / sqrt(p), p)
|
||||
b2 <- (-1)^seq(1, p) / sqrt(p)
|
||||
B <- cbind(b1, b2)
|
||||
|
||||
|
||||
set.seed(21)
|
||||
# create predictor data x ~ N(0, I_p)
|
||||
x <- matrix(rnorm(n * p), n, p)
|
||||
|
@ -42,7 +42,7 @@ y <- (x \%*\% b1)^2 + 2 * (x \%*\% b2) + 0.25 * rnorm(100)
|
|||
cve.obj <- cve(y ~ x, max.dim = 5)
|
||||
# get cve-estimate for B with dimensions (p, k = 2)
|
||||
B2 <- coef(cve.obj, k = 2)
|
||||
|
||||
|
||||
# Projection matrix on span(B)
|
||||
# equivalent to `B \%*\% t(B)` since B is semi-orthonormal
|
||||
PB <- B \%*\% solve(t(B) \%*\% B) \%*\% t(B)
|
||||
|
|
|
@ -20,14 +20,42 @@ the environment from which \code{cve} is called.}
|
|||
\item{method}{This character string specifies the method of fitting. The
|
||||
options are
|
||||
\itemize{
|
||||
\item "simple" implementation as described in the paper.
|
||||
\item "simple" implementation,
|
||||
\item "weighted" variation with adaptive weighting of slices.
|
||||
}
|
||||
see paper.}
|
||||
see Fertl, L. and Bura, E. (2019).}
|
||||
|
||||
\item{max.dim}{upper bound for \code{k} (ignored if \code{k} is supplied).}
|
||||
|
||||
\item{...}{optional parameters passed on to \code{cve.call}.}
|
||||
\item{...}{optional parameters passed on to \code{cve.call}.
|
||||
|
||||
|
||||
Conditional Variance Estimation (CVE) is a sufficient dimension reduction
|
||||
(SDR) method for regressions studying \eqn{E(Y|X)}, the conditional
|
||||
expectation of a response \eqn{Y} given a set of predictors \eqn{X}. This
|
||||
function provides methods for estimating the dimension and the subspace
|
||||
spanned by the columns of a \eqn{p\times k}{p x k} matrix \eqn{B} of minimal
|
||||
rank \eqn{k} such that
|
||||
\deqn{%
|
||||
E(Y|X) = E(Y|B'X) %
|
||||
}
|
||||
or, equivalently,
|
||||
\deqn{%
|
||||
Y = g(B'X) + \epsilon %
|
||||
}
|
||||
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
|
||||
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
|
||||
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
is an unknown, continuous non-constant function, and \eqn{B = (b_1,..., b_k)}
|
||||
is a real \eqn{p \times k}{p x k} matrix of rank \eqn{k \leq p}{k <= p}.
|
||||
|
||||
Both the dimension \eqn{k} and the subspace \eqn{span(B)} are unknown. The
|
||||
CVE method makes very few assumptions.
|
||||
|
||||
A kernel matrix \eqn{\hat{B}}{Bhat} is estimated such that the column space
|
||||
of \eqn{\hat{B}}{Bhat} should be close to the mean subspace \eqn{span(B)}.
|
||||
The primary output from this method is a set of orthonormal vectors,
|
||||
\eqn{\hat{B}}{Bhat}, whose span estimates \eqn{span(B)}.}
|
||||
}
|
||||
\value{
|
||||
an S3 object of class \code{cve} with components:
|
||||
|
@ -56,28 +84,10 @@ an S3 object of class \code{cve} with components:
|
|||
}
|
||||
}
|
||||
\description{
|
||||
Conditional Variance Estimation (CVE) is a novel sufficient dimension
|
||||
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
|
||||
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
|
||||
similarly to its main competitor, the mean average variance estimation
|
||||
(MAVE), is not based on inverse regression, and does not require the
|
||||
restrictive linearity and constant variance conditions of moment based SDR
|
||||
methods. CVE is data-driven and applies to additive error regressions with
|
||||
continuous predictors and link function. The effectiveness and accuracy of
|
||||
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
|
||||
studies. CVE is shown to outperform MAVE in some model set-ups, while it
|
||||
remains largely on par under most others.
|
||||
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
|
||||
\eqn{p}-dimensional covariate vector. We assume that the dependence of
|
||||
\eqn{Y} and \eqn{X} is modelled by
|
||||
\deqn{Y = g(B'X) + \epsilon}
|
||||
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
|
||||
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
|
||||
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
is an unknown, continuous non-constant function,
|
||||
and \eqn{B = (b_1, ..., b_k)} is
|
||||
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
|
||||
Without loss of generality \eqn{B} is assumed to be orthonormal.
|
||||
This is the main function in the \code{CVE} package. It creates objects of
|
||||
class \code{"cve"} to estimate the mean subspace. Helper functions that
|
||||
require a \code{"cve"} object can then be applied to the output from this
|
||||
function.
|
||||
}
|
||||
\examples{
|
||||
# set dimensions for simulation model
|
||||
|
@ -131,7 +141,7 @@ norm(PB - PB.w, type = 'F')
|
|||
|
||||
}
|
||||
\references{
|
||||
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
}
|
||||
\seealso{
|
||||
|
|
|
@ -10,9 +10,9 @@ cve.call(X, Y, method = "simple", nObs = sqrt(nrow(X)), h = NULL,
|
|||
max.iter = 50L, attempts = 10L, logger = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{X}{Design matrix with dimension \eqn{n\times p}{n x p}.}
|
||||
\item{X}{Design predictor matrix.}
|
||||
|
||||
\item{Y}{numeric array of length \eqn{n} of Responses.}
|
||||
\item{Y}{\eqn{n}-dimensional vector of responses.}
|
||||
|
||||
\item{method}{specifies the CVE method variation as one of
|
||||
\itemize{
|
||||
|
@ -60,7 +60,7 @@ used as starting value in the optimization. (If supplied,
|
|||
out \code{attempts} times with starting values drawn from the invariant
|
||||
measure on the Stiefel manifold (see \code{\link{rStiefel}}).}
|
||||
|
||||
\item{logger}{a logger function (only for advanced user, slows down the
|
||||
\item{logger}{a logger function (only for advanced users, slows down the
|
||||
computation).}
|
||||
}
|
||||
\value{
|
||||
|
@ -90,28 +90,10 @@ an S3 object of class \code{cve} with components:
|
|||
}
|
||||
}
|
||||
\description{
|
||||
Conditional Variance Estimation (CVE) is a novel sufficient dimension
|
||||
reduction (SDR) method for regressions satisfying \eqn{E(Y|X) = E(Y|B'X)},
|
||||
where \eqn{B'X} is a lower dimensional projection of the predictors. CVE,
|
||||
similarly to its main competitor, the mean average variance estimation
|
||||
(MAVE), is not based on inverse regression, and does not require the
|
||||
restrictive linearity and constant variance conditions of moment based SDR
|
||||
methods. CVE is data-driven and applies to additive error regressions with
|
||||
continuous predictors and link function. The effectiveness and accuracy of
|
||||
CVE compared to MAVE and other SDR techniques is demonstrated in simulation
|
||||
studies. CVE is shown to outperform MAVE in some model set-ups, while it
|
||||
remains largely on par under most others.
|
||||
Let \eqn{Y} be real denotes a univariate response and \eqn{X} a real
|
||||
\eqn{p}-dimensional covariate vector. We assume that the dependence of
|
||||
\eqn{Y} and \eqn{X} is modelled by
|
||||
\deqn{Y = g(B'X) + \epsilon}
|
||||
where \eqn{X} is independent of \eqn{\epsilon} with positive definite
|
||||
variance-covariance matrix \eqn{Var(X) = \Sigma_X}. \eqn{\epsilon} is a mean
|
||||
zero random variable with finite \eqn{Var(\epsilon) = E(\epsilon^2)}, \eqn{g}
|
||||
is an unknown, continuous non-constant function,
|
||||
and \eqn{B = (b_1, ..., b_k)} is
|
||||
a real \eqn{p \times k}{p x k} of rank \eqn{k \leq p}{k <= p}.
|
||||
Without loss of generality \eqn{B} is assumed to be orthonormal.
|
||||
This is the main function in the \code{CVE} package. It creates objects of
|
||||
class \code{"cve"} to estimate the mean subspace. Helper functions that
|
||||
require a \code{"cve"} object can then be applied to the output from this
|
||||
function.
|
||||
}
|
||||
\examples{
|
||||
# create B for simulation (k = 1)
|
||||
|
|
|
@ -5,20 +5,23 @@
|
|||
\alias{directions}
|
||||
\title{Computes projected training data \code{X} for given dimension `k`.}
|
||||
\usage{
|
||||
\method{directions}{cve}(dr, k)
|
||||
\method{directions}{cve}(object, k, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{dr}{Instance of \code{'cve'} as returned by \code{\link{cve}}.}
|
||||
\item{object}{an object of class \code{"cve"}, usually, a result of a call to
|
||||
\code{\link{cve}} or \code{\link{cve.call}}.}
|
||||
|
||||
\item{k}{SDR dimension to use for projection.}
|
||||
|
||||
\item{...}{ignored (no additional arguments).}
|
||||
}
|
||||
\value{
|
||||
the \eqn{n\times k}{n x k} dimensional matrix \eqn{X B} where \eqn{B}
|
||||
is the cve-estimate for dimension \eqn{k}.
|
||||
}
|
||||
\description{
|
||||
Projects the dimensional design matrix \eqn{X} on the columnspace of the
|
||||
cve-estimate for given dimension \eqn{k}.
|
||||
Returns \eqn{B'X}, that is, the design matrix \eqn{X} projected onto the
|
||||
column space of the cve-estimate for the given dimension \eqn{k}.
|
||||
}
|
||||
\examples{
|
||||
# create B for simulation (k = 1)
|
||||
|
@ -39,3 +42,6 @@ x.proj <- directions(cve.obj.simple, k = 1)
|
|||
plot(x.proj, y)
|
||||
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{cve}}
|
||||
}
|
||||
|
|
|
@ -7,9 +7,9 @@
|
|||
estimate.bandwidth(X, k, nObs, version = 1L)
|
||||
}
|
||||
\arguments{
|
||||
\item{X}{a \eqn{n\times p}{n x p} matrix with samples in its rows.}
|
||||
\item{X}{the \eqn{n\times p}{n x p} matrix of predictor values.}
|
||||
|
||||
\item{k}{Dimension of lower dimensional projection.}
|
||||
\item{k}{the SDR dimension.}
|
||||
|
||||
\item{nObs}{number of points in a slice, only for version 2.}
|
||||
|
||||
|
@ -26,11 +26,11 @@ defaults to using the following formula (version 1)
|
|||
h = (2 * tr(\Sigma) / p) * (1.2 * n^(-1 / (4 + k)))^2}
|
||||
Alternative version 2 is used for dimension prediction which is given by
|
||||
\deqn{%
|
||||
h = (2 * tr(\Sigma) / p) * \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
|
||||
h = \frac{2 tr(\Sigma)}{p} \chi_k^{-1}(\frac{nObs - 1}{n - 1})}{%
|
||||
h = (2 * tr(\Sigma) / p) * \chi_k^-1((nObs - 1) / (n - 1))}
|
||||
with \eqn{n} the sample size, \eqn{p} its dimension and the
|
||||
covariance-matrix \eqn{\Sigma}, which is \code{(n-1)/n} times the sample
|
||||
covariance estimate.
|
||||
with \eqn{n} the sample size, \eqn{p} the dimension of \eqn{X} and
|
||||
\eqn{\Sigma} equal to \eqn{(n - 1) / n} times the sample covariance matrix of
|
||||
\eqn{X}.
|
||||
}
|
||||
\examples{
|
||||
# set dimensions for simulation model
|
||||
|
|
|
@ -2,12 +2,13 @@
|
|||
% Please edit documentation in R/plot.R
|
||||
\name{plot.cve}
|
||||
\alias{plot.cve}
|
||||
\title{Loss distribution elbow plot.}
|
||||
\title{Elbow plot of the loss function.}
|
||||
\usage{
|
||||
\method{plot}{cve}(x, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{Object of class \code{"cve"} (result of [\code{\link{cve}}]).}
|
||||
\item{x}{an object of class \code{"cve"}, usually, a result of a call to
|
||||
\code{\link{cve}} or \code{\link{cve.call}}.}
|
||||
|
||||
\item{...}{Pass through parameters to [\code{\link{plot}}] and
|
||||
[\code{\link{lines}}]}
|
||||
|
@ -15,8 +16,9 @@
|
|||
\description{
|
||||
Boxplots of the output \code{L} from \code{\link{cve}} over \code{k} from
|
||||
\code{min.dim} to \code{max.dim}. For given \code{k}, \code{L} corresponds
|
||||
to \eqn{L_n(V, X_i)} where \eqn{V \in S(p, p - k)}{V} is the minimizer of
|
||||
\eqn{L_n(V)}, for further details see the paper.
|
||||
to \eqn{L_n(V, X_i)} where \eqn{V}, an element of the Stiefel manifold, is the
|
||||
minimizer of
|
||||
\eqn{L_n(V)}; for further details see Fertl, L. and Bura, E. (2019).
|
||||
}
|
||||
\examples{
|
||||
# create B for simulation
|
||||
|
@ -46,7 +48,7 @@ plot(cve.obj.simple)
|
|||
|
||||
}
|
||||
\references{
|
||||
Fertl Lukas, Bura Efstathia. (2019), Conditional Variance
|
||||
Fertl, L. and Bura, E. (2019), Conditional Variance
|
||||
Estimation for Sufficient Dimension Reduction. Working Paper.
|
||||
}
|
||||
\seealso{
|
||||
|
|
|
@ -4,24 +4,24 @@
|
|||
\alias{predict.cve}
|
||||
\title{Predict method for CVE Fits.}
|
||||
\usage{
|
||||
\method{predict}{cve}(object, newdata, dim, ...)
|
||||
\method{predict}{cve}(object, newdata, k, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{instance of class \code{cve} (result of \code{cve},
|
||||
\code{cve.call}).}
|
||||
\item{object}{an object of class \code{"cve"}, usually, a result of a call to
|
||||
\code{\link{cve}} or \code{\link{cve.call}}.}
|
||||
|
||||
\item{newdata}{Matrix of the new data to be predicted.}
|
||||
\item{newdata}{Matrix of new predictor values, \eqn{C}.}
|
||||
|
||||
\item{dim}{dimension of SDR space to be used for data projecition.}
|
||||
\item{k}{dimension of SDR space to be used for data projection.}
|
||||
|
||||
\item{...}{further arguments passed to \code{\link{mars}}.}
|
||||
}
|
||||
\value{
|
||||
prediced response of data \code{newdata}.
|
||||
predicted response at \code{newdata}.
|
||||
}
|
||||
\description{
|
||||
Predict response using projected data where the forward model \eqn{g(B'X)}
|
||||
is estimated using \code{\link{mars}}.
|
||||
Predict response using projected data \eqn{B'C} by fitting
|
||||
\eqn{g(B'C) + \epsilon} using \code{\link{mars}}.
|
||||
}
|
||||
\examples{
|
||||
# create B for simulation
|
||||
|
|
|
@ -7,8 +7,8 @@
|
|||
predict_dim(object, ..., method = "CV")
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{instance of class \code{cve} (result of \code{\link{cve}},
|
||||
\code{\link{cve.call}}).}
|
||||
\item{object}{an object of class \code{"cve"}, usually, a result of a call to
|
||||
\code{\link{cve}} or \code{\link{cve.call}}.}
|
||||
|
||||
\item{...}{ignored.}
|
||||
|
||||
|
@ -26,16 +26,15 @@ list with
|
|||
\description{
|
||||
This function estimates the dimension of the mean dimension reduction space,
|
||||
i.e. number of columns of \eqn{B} matrix. The default method \code{'CV'}
|
||||
performs cross-validation using \code{mars}. Given
|
||||
performs leave-one-out cross-validation using \code{mars}. Given
|
||||
\code{k = min.dim, ..., max.dim} a cross-validation via \code{mars} is
|
||||
performed on the dataset \eqn{(Y i, B_k' X_i)_{i = 1, ..., n}} where
|
||||
\eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate given
|
||||
\eqn{k}. The estimated SDR dimension is the \eqn{k} where the
|
||||
cross-validation mean squared error is the lowest. The method \code{'elbow'}
|
||||
estimates the dimension via \eqn{k = argmin_k L_n(V_{p − k})} where
|
||||
\eqn{V_{p − k}} is the CVE estimate of the orthogonal columnspace of
|
||||
\eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'} but finds
|
||||
the minimum using the wilcoxon-test.
|
||||
performed on the dataset \eqn{(Y_i, B_k' X_i)_{i = 1, ..., n}} where
|
||||
\eqn{B_k} is the \eqn{p \times k}{p x k} dimensional CVE estimate. The
|
||||
estimated SDR dimension is the \eqn{k} where the
|
||||
cross-validation mean squared error is minimal. The method \code{'elbow'}
|
||||
estimates the dimension via \eqn{k = argmin_k L_n(V_{p - k})} where
|
||||
\eqn{V_{p - k}} is the space orthogonal to the column space of the CVE estimate of \eqn{B_k}. Method \code{'wilcoxon'} is similar to \code{'elbow'}
|
||||
but finds the minimum using the Wilcoxon test.
|
||||
}
|
||||
\examples{
|
||||
# create B for simulation
|
||||
|
|