add: Ising shiny data sen visualization,

wip: GMLM TeX, add: ising small sim, fix: reduction in sims
2022-10-12 20:28:59 +02:00 · 2022-10-12 20:28:59 +02:00 · 79794f01ac
commit 79794f01ac
parent 4c6d6c0d0f
21 changed files with 1502 additions and 3058 deletions
--- a/LaTeX/GMLM.tex
+++ b/LaTeX/GMLM.tex
@ -1,4 +1,4 @@
-\documentclass[a4paper, 10pt]{article}
+\documentclass[a4paper, 12pt]{article}

 \usepackage[utf8]{inputenc}
 \usepackage[T1]{fontenc}
@ -81,24 +81,55 @@

 \maketitle

-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%                                 Abstract                                 %%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \begin{abstract}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    We propose a method for sufficient dimension reduction of Tensor-valued predictor (multi dimensional arrays) for regression or classification. We assume an Quadratic Exponential Family for a Generalized Linear Model in an inverse regression setting where the relation via a link is of a multi-linear nature.
    Using a multi-linear relation allows to perform per-axis reductions which reduces the total number of parameters drastically for higher order Tensor-valued predictors. Under the Exponential Family we derive maximum likelihood estimates for the multi-linear sufficient dimension reduction of the Tensor-valued predictors. Furthermore, we provide an estimation algorithm which utilizes the Tensor structure allowing efficient implementations. The performance of the method is illustrated via simulations and real world examples are provided.
 \end{abstract}

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+                        \section{Introduction}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Notation}
+Vectors are write as boldface lowercase letters (e.g. $\mat a$, $\mat b$), matrices use boldface uppercase or Greek letters (e.g. $\mat A$, $\mat B$, $\mat\alpha$, $\mat\Delta$). The identity matrix of dimensions $p\times p$ is denoted by $\mat{I}_p$ and the commutation matrix as $\mat{K}_{p, q}$ or $\mat{K}_p$ is case of $p = q$. Tensors, meaning multi-dimensional arrays of order at least 3, use uppercase calligraphic letters (e.g. $\ten{A}$, $\ten{B}$, $\ten{X}$, $\ten{Y}$, $\ten{F}$). Boldface indices (e.g. $\mat{i}, \mat{j}, \mat{k}$) denote multi-indices $\mat{i} = (i_1, ..., i_r)\in[\mat{d}]$ where the bracket notation is a shorthand for $[r] = \{1, ..., r\}$ which in conjunction with a multi-index as argument means $[\mat{d}] = [d_1]\times ... \times[d_K]$.
+
+Let $\ten{A} = (a_{i_1,...,i_r})\in\mathbb{R}^{d_1\times ...\times d_r}$ be an order\footnote{Also called rank, therefore the variable name $r$, but this term is \emph{not} used as it leads to confusion with the rank as in ``the rank of a matrix''.} $r$ tensor where $r\in\mathbb{N}$ is the number of modes or axis of $\ten{A}$. For matrices $\mat{B}_k\in\mathbb{R}^{p_k\times d_k}$ with $k\in[r] = \{1, 2, ..., r\}$ the \emph{multi-linear multiplication} is defined element wise as
+\begin{displaymath}
+    (\ten{A}\times\{\mat{B}_1, ..., \mat{B}_r\})_{j_1, ..., j_r} = \sum_{i_1, ..., i_r = 1}^{d_1, ..., d_r} a_{i_1, ..., i_r}(B_{1})_{j_1, i_1} \cdots (B_{r})_{j_r, i_r}
+\end{displaymath}
+which results in an order $r$ tensor of dimensions $p_1\times ...\times p_k)$. With this the \emph{$k$-mode product} between the tensor $\ten{A}$ with the matrix $\mat{B}_k$ is given by
+\begin{displaymath}
+    \mat{A}\times_k\mat{B}_k = \ten{A}\times\{\mat{I}_{d_1}, ..., \mat{I}_{d_{k-1}}, \mat{B}_{k}, \mat{I}_{d_{k+1}}, ..., \mat{I}_{d_r}\}.
+\end{displaymath}
+Furthermore, the notation $\ten{A}\times_{k\in S}$ is a short hand for writing the iterative application if the mode product for all indices in $S\subset[r]$. For example $\ten{A}\times_{k\in\{2, 5\}}\mat{B}_k = \ten{A}\times_2\mat{B}_2\times_5\mat{B}_5$. By only allowing $S$ to be a set this notation is unambiguous because the mode products commutes for different modes $j\neq k\Rightarrow\ten{A}\times_j\mat{B}_j\times_k\mat{B}_k = \ten{A}\times_k\mat{B}_k\times_j\mat{B}_j$.
+
+The \emph{inner product} between two tensors of the same order and dimensions is
+\begin{displaymath}
+    \langle\ten{A}, \ten{B}\rangle = \sum_{i_1, ..., i_r} a_{i_1, ..., i_r}b_{i_1, ..., i_r}
+\end{displaymath}
+with which the \emph{Frobenius Norm} $\|\ten{A}\|_F = \sqrt{\langle\ten{A}, \ten{A}\rangle}$. Of interest is also the \emph{maximum norm} $\|\ten{A}\|_{\infty} = \max_{i_1, ..., i_K} a_{i_1, ..., i_K}$. Furthermore, the Frobenius and maximum norm are also used for matrices while for a vector $\mat{a}$ the \emph{2 norm} is $\|\mat{a}\|_2 = \sqrt{\langle\mat{a}, \mat{a}\rangle}$.
+
+Matrices and tensor can be \emph{vectorized} by the \emph{vectorization} operator $\vec$. For tensors of order at least $2$ the \emph{flattening} (or \emph{unfolding} or \emph{matricization}) is a reshaping of the tensor into a matrix along an particular mode. For a tensor $\ten{A}$ of order $r$ and dimensions $d_1, ..., d_r$ the $k$-mode unfolding $\ten{A}_{(k)}$ is a $d_k\times \prod_{l=1, l\neq k}d_l$ matrix. For the tensor $\ten{A} = (a_{i_1,...,i_r})\in\mathbb{R}^{d_1, ..., d_r}$ the elements of the $k$ unfolded tensor $\ten{A}_{(k)}$ are
+\begin{displaymath}
+    (\ten{A}_{(k)})_{i_k, j} = a_{i_1, ..., i_r}\quad\text{ with }\quad j = 1 + \sum_{\substack{l = 1\\l \neq k}}^r (i_l - 1) \prod_{\substack{m = 1\\m\neq k}}^{l - 1}d_m.
+\end{displaymath}
+
+The rank of a tensor $\ten{A}$ of dimensions $d_1\times ...\times d_r$ is given by a vector $\rank{\ten{A}} = (a_1, ..., a_r)\in[d_1]\times...\times[d_r]$ where $a_k = \rank(\ten{A}_{(k)})$ is the usual matrix rank of the $k$ unfolded tensor.
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                \section{Quadratic Exponential Family GLM}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 \begin{description}
-    \item[Distribution] 
+    \item[Distribution]
        \begin{displaymath}
            f_{\mat{\theta}_y}(\ten{X}\mid Y = y) = h(\ten{X})\exp(\t{\mat{\eta}(\mat{\theta}_y)}\mat{t}(\ten{X}) - b(\mat{\theta}_y))
        \end{displaymath}
-    \item[(inverse) link] 
+    \item[(inverse) link]
        \begin{displaymath}
            \invlink(\mat{\eta}(\mat{\theta}_y)) = \E_{\mat{\theta}_y}[\mat{t}(\ten{X})\mid Y = y]
        \end{displaymath}
--- a/LaTeX/main.matrix.tex
+++ b/LaTeX/main.matrix.tex
@ -0,0 +1,815 @@
+\documentclass[a4paper, 10pt]{article}
+
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{fullpage}
+\usepackage{amsmath, amssymb, amstext, amsthm}
+\usepackage{bm}                         % \boldsymbol and italic corrections, ...
+\usepackage[pdftex]{hyperref}
+\usepackage{makeidx}                    % Index (Symbols, Names, ...)
+\usepackage{xcolor, graphicx}           % colors and including images
+\usepackage{tikz}
+\usepackage[
+    % backend=bibtex,
+    style=authoryear-comp
+]{biblatex}
+
+% Document meta into
+\title{Derivation of Gradient Descent Algorithm for K-PIR}
+\author{Daniel Kapla}
+\date{November 24, 2021}
+% Set PDF title, author and creator.
+\AtBeginDocument{
+    \hypersetup{
+        pdftitle = {Derivation of Gradient Descent Algorithm for K-PIR},
+        pdfauthor = {Daniel Kapla},
+        pdfcreator = {\pdftexbanner}
+    }
+}
+
+\makeindex
+
+% Bibliography resource(s)
+\addbibresource{main.bib}
+
+% Setup environments
+% Theorem, Lemma
+\theoremstyle{plain}
+\newtheorem{theorem}{Theorem}
+\newtheorem{lemma}{Lemma}
+\newtheorem{example}{Example}
+% Definition
+\theoremstyle{definition}
+\newtheorem{defn}{Definition}
+% Remark
+\theoremstyle{remark}
+\newtheorem{remark}{Remark}
+
+% Define math macros
+\newcommand{\mat}[1]{\boldsymbol{#1}}
+\newcommand{\ten}[1]{\mathcal{#1}}
+\renewcommand{\vec}{\operatorname{vec}}
+\newcommand{\dist}{\operatorname{dist}}
+\DeclareMathOperator{\kron}{\otimes}    % Kronecker Product
+\DeclareMathOperator{\hada}{\odot}      % Hadamard Product
+\newcommand{\ttm}[1][n]{\times_{#1}}    % n-mode product (Tensor Times Matrix)
+\DeclareMathOperator{\df}{\operatorname{df}}
+\DeclareMathOperator{\tr}{\operatorname{tr}}
+\DeclareMathOperator{\var}{Var}
+\DeclareMathOperator{\cov}{Cov}
+\DeclareMathOperator{\E}{\operatorname{\mathbb{E}}}
+% \DeclareMathOperator{\independent}{{\bot\!\!\!\bot}}
+\DeclareMathOperator*{\argmin}{{arg\,min}}
+\DeclareMathOperator*{\argmax}{{arg\,max}}
+\newcommand{\D}{\textnormal{D}}
+\renewcommand{\d}{\textnormal{d}}
+\renewcommand{\t}[1]{{#1^{\prime}}}
+\newcommand{\pinv}[1]{{#1^{\dagger}}} % `Moore-Penrose pseudoinverse`
+\newcommand{\todo}[1]{{\color{red}TODO: #1}}
+
+% \DeclareFontFamily{U}{mathx}{\hyphenchar\font45}
+% \DeclareFontShape{U}{mathx}{m}{n}{
+%       <5> <6> <7> <8> <9> <10>
+%       <10.95> <12> <14.4> <17.28> <20.74> <24.88>
+%       mathx10
+%       }{}
+% \DeclareSymbolFont{mathx}{U}{mathx}{m}{n}
+% \DeclareMathSymbol{\bigtimes}{1}{mathx}{"91}
+
+\begin{document}
+
+\maketitle
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                               Introduction                               %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Notation}
+We start with a brief summary of the used notation.
+
+\todo{write this}
+
+Let $\ten{A}$ be a multi-dimensional array of order (rank) $r$ with dimensions $p_1\times ... \times p_r$ and the matrices $\mat{B}_i$ of dimensions $q_i\times p_i$ for $i = 1, ..., r$, then
+\begin{displaymath}
+    \ten{A} \ttm[1] \mat{B}_1 \ttm[2] \ldots \ttm[r] \mat{B}_r
+        = \ten{A}\times\{ \mat{B}_1, ..., \mat{B}_r \}
+        = \ten{A}\times_{i\in[r]} \mat{B}_i
+        = (\ten{A}\times_{i\in[r]\backslash j} \mat{B}_i)\ttm[j]\mat{B}_j
+\end{displaymath}
+As an alternative example consider
+\begin{displaymath}
+    \ten{A}\times_2\mat{B}_2\times_3\mat{B}_3 = \ten{A}\times\{ \mat{I}, \mat{B}_2, \mat{B}_3 \} = \ten{A}\times_{i\in\{2, 3\}}\mat{B}_i
+\end{displaymath}
+Another example
+\begin{displaymath}
+    \mat{B}\mat{A}\t{\mat{C}} = \mat{A}\times_1\mat{B}\times_2\mat{C}
+        = \mat{A}\times\{\mat{B}, \mat{C}\}
+\end{displaymath}
+
+\begin{displaymath}
+    (\ten{A}\ttm[i]\mat{B})_{(i)} = \mat{B}\ten{A}_{(i)}
+\end{displaymath}
+
+\todo{continue}
+
+\section{Tensor Normal Distribution}
+Let $\ten{X}$ be a multi-dimensional array random variable of order (rank) $r$ with dimensions $p_1\times ... \times p_r$ written as
+\begin{displaymath}
+    \ten{X}\sim\mathcal{TN}(\mu, \mat{\Delta}_1, ..., \mat{\Delta}_r).
+\end{displaymath}
+Its density is given by
+\begin{displaymath}
+    f(\ten{X}) = \Big( \prod_{i = 1}^r \sqrt{(2\pi)^{p_i}|\mat{\Delta}_i|^{p_{-i}}} \Big)^{-1}
+        \exp\!\left( -\frac{1}{2}\langle \ten{X} - \mu, (\ten{X} - \mu)\times\{\mat{\Delta}_1^{-1}, ..., \mat{\Delta}_r^{-1}\} \rangle \right)
+\end{displaymath}
+where $p_{\lnot i} = \prod_{j \neq i}p_j$. This is equivalent to the vectorized $\vec\ten{X}$ following a Multi-Variate Normal distribution
+\begin{displaymath}
+    \vec{\ten{X}}\sim\mathcal{N}_{p}(\vec{\mu}, \mat{\Delta}_r\otimes...\otimes\mat{\Delta}_1)
+\end{displaymath}
+with $p = \prod_{i = 1}^r p_i$.
+
+\begin{theorem}[Tensor Normal to Multi-Variate Normal equivalence]
+    For a multi-dimensional random variable $\ten{X}$ of order $r$ with dimensions $p_1\times ..., p_r$. Let $\ten{\mu}$ be the mean of the same order and dimensions as $\ten{X}$ and the mode covariance matrices $\mat{\Delta}_i$ of dimensions $p_i\times p_i$ for $i = 1, ..., n$. Then the tensor normal distribution is equivalent to the multi-variate normal distribution by the relation
+    \begin{displaymath}
+        \ten{X}\sim\mathcal{TN}(\mu, \mat{\Delta}_1, ..., \mat{\Delta}_r)
+            \qquad\Leftrightarrow\qquad
+        \vec{\ten{X}}\sim\mathcal{N}_{p}(\vec{\mu}, \mat{\Delta}_r\otimes ...\otimes \mat{\Delta}_1)
+    \end{displaymath}
+    where $p = \prod_{i = 1}^r p_i$.
+\end{theorem}
+\begin{proof}
+    A straight forward way is to rewrite the Tensor Normal density as the density of a Multi-Variate Normal distribution depending on the vectorization of $\ten{X}$. First consider
+    \begin{align*}
+        \langle \ten{X} - \mu, (\ten{X} - \mu)\times\{\mat{\Delta}_1^{-1}, ..., \mat{\Delta}_r^{-1}\} \rangle
+            &= \t{\vec(\ten{X} - \mu)}\vec((\ten{X} - \mu)\times\{\mat{\Delta}_1^{-1}, ..., \mat{\Delta}_r^{-1}\}) \\
+            &= \t{\vec(\ten{X} - \mu)}(\mat{\Delta}_r^{-1}\otimes ...\otimes\mat{\Delta}_1^{-1})\vec(\ten{X} - \mu) \\
+            &= \t{(\vec\ten{X} - \vec\mu)}(\mat{\Delta}_r\otimes ...\otimes\mat{\Delta}_1)^{-1}(\vec\ten{X} - \vec\mu).
+    \end{align*}
+    Next, using a property of the determinant of a Kronecker product $|\mat{\Delta}_1\otimes\mat{\Delta}_2| = |\mat{\Delta}_1|^{p_2}|\mat{\Delta}_2|^{p_1}$ yields
+    \begin{displaymath}
+        |\mat{\Delta}_r\otimes...\otimes\mat{\Delta}_1|
+            = |\mat{\Delta}_r\otimes...\otimes\mat{\Delta}_2|^{p_1}|\mat{\Delta}_1|^{p_{\lnot 1}}
+    \end{displaymath}
+    where $p_{\lnot i} = \prod_{j \neq i}p_j$. By induction over $r$ the relation
+    \begin{displaymath}
+        |\mat{\Delta}_r\otimes...\otimes\mat{\Delta}_1|
+            = \prod_{i = 1}^r |\mat{\Delta}_i|^{p_{\lnot i}}
+    \end{displaymath}
+    holds for arbitrary order $r$. Substituting into the Tensor Normal density leads to
+    \begin{align*}
+        f(\ten{X}) = \Big( (2\pi)^p |\mat{\Delta}_r\otimes...\otimes\mat{\Delta}_1| \Big)^{-1/2}
+        \exp\!\left( -\frac{1}{2}\t{(\vec\ten{X} - \vec\mu)}(\mat{\Delta}_r\otimes ...\otimes\mat{\Delta}_1)^{-1}(\vec\ten{X} - \vec\mu) \right)
+    \end{align*}
+    which is the Multi-Variate Normal density of the $p$ dimensional vector $\vec\ten{X}$.
+\end{proof}
+
+When sampling from the Multi-Array Normal one way is to sample from the Multi-Variate Normal and then reshaping the result, but this is usually very inefficient because it requires to store the multi-variate covariance matrix which is very big. Instead, it is more efficient to sample $\ten{Z}$ as a tensor of the same shape as $\ten{X}$ with standard normal entries and then transform the $\ten{Z}$ to follow the Multi-Array Normal as follows
+\begin{displaymath}
+    \ten{Z}\sim\mathcal{TN}(0, \mat{I}_{p_1}, ..., \mat{I}_{p_r})
+        \quad\Rightarrow\quad
+    \ten{X} = \ten{Z}\times\{\mat{\Delta}_1^{1/2}, ..., \mat{\Delta}_r^{1/2}\} + \mu\sim\mathcal{TN}(\mu, \mat{\Delta}_1, ..., \mat{\Delta}_r).
+\end{displaymath}
+where the sampling from the standard Multi-Array Normal is done by sampling all of the elements of $\ten{Z}$ from a standard Normal.
+
+
+\section{Introduction}
+We assume the model
+\begin{displaymath}
+    \mat{X} = \mat{\mu} + \mat{\beta}\mat{f}_y \t{\mat{\alpha}} + \mat{\epsilon}
+\end{displaymath}
+where the dimensions of all the components are listed in Table~\ref{tab:dimensions}.
+and its vectorized form
+\begin{displaymath}
+    \vec\mat{X} = \vec\mat{\mu} + (\mat{\alpha}\kron\mat{\beta})\vec\mat{f}_y + \vec\mat{\epsilon}
+\end{displaymath}
+
+\begin{table}[!htp]
+    \centering
+    \begin{minipage}{0.8\textwidth}
+    \centering
+    \begin{tabular}{l l}
+        $\mat X, \mat\mu, \mat R, \mat\epsilon$     & $p\times q$ \\
+        $\mat{f}_y$                                 & $k\times r$ \\
+        $\mat\alpha$                                & $q\times r$ \\
+        $\mat\beta$                                 & $p\times k$ \\
+        $\mat\Delta$                                & $p q\times p q$ \\
+        $\mat\Delta_1$                              & $q\times q$ \\
+        $\mat\Delta_2$                              & $p\times p$ \\
+        $\mat{r}$                                   & $p q\times 1$ \\
+        \hline
+        $\ten{X}, \ten{R}$                          & $n\times p\times q$ \\
+        $\ten{F}$                                   & $n\times k\times r$ \\
+    \end{tabular}
+    \caption{\label{tab:dimensions}\small Summary listing of dimensions with the corresponding sample versions $\mat{X}_i, \mat{R}_i, \mat{r}_i, \mat{f}_{y_i}$ for $i = 1, ..., n$ as well as estimates $\widehat{\mat{\alpha}}, \widehat{\mat{\beta}}, \widehat{\mat\Delta}, \widehat{\mat\Delta}_1$ and $\widehat{\mat\Delta}_2$.}
+    \end{minipage}
+\end{table}
+
+The log-likelihood $l$ given $n$ i.i.d. observations assuming that $\mat{X}_i\mid(Y = y_i)$ is normal distributed as
+\begin{displaymath}
+    \vec\mat{X}_i \sim \mathcal{N}_{p q}(\vec\mat\mu + (\mat\alpha\kron\mat\beta)\vec\mat{f}_{y_i}, \Delta)
+\end{displaymath}
+Replacing all unknown by there estimates gives the (estimated) log-likelihood
+\begin{equation}\label{eq:log-likelihood-est}
+    \hat{l}(\mat\alpha, \mat\beta) = -\frac{n q p}{2}\log 2\pi - \frac{n}{2}\log|\widehat{\mat\Delta}| - \frac{1}{2}\sum_{i = 1}^n \t{\mat{r}_i}\widehat{\mat\Delta}^{-1}\mat{r}_i
+\end{equation}
+where the residuals are
+\begin{displaymath}
+    \mat{r}_i = \vec\mat{X}_i - \vec\overline{\mat{X}} - (\mat\alpha\kron\mat\beta)\vec{\mat f}_{y_i}\qquad (p q \times 1)
+\end{displaymath}
+and the MLE estimate assuming $\mat\alpha, \mat\beta$ known for the covariance matrix $\widehat{\mat\Delta}$ as solution to the score equations is
+\begin{equation}\label{eq:Delta}
+    \widehat{\mat\Delta} = \frac{1}{n}\sum_{i = 1}^n \mat{r}_i \t{\mat{r}_i} \qquad(p q \times p q).
+\end{equation}
+Note that the log-likelihood estimate $\hat{l}$ only depends on $\mat\alpha, \mat\beta$. Next, we compute the gradient for $\mat\alpha$ and $\mat\beta$ of $\hat{l}$ used to formulate a Gradient Descent base estimation algorithm for $\mat\alpha, \mat\beta$ as the previous algorithmic. The main reason is to enable an estimation for bigger dimensions of the $\mat\alpha, \mat\beta$ coefficients since the previous algorithm does \emph{not} solve the high run time problem for bigger dimensions.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                                Derivative                                %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Derivative of the Log-Likelihood}
+Start with the general case of $\mat X_i|(Y_i = y_i)$ is multivariate normal distributed with the covariance $\mat\Delta$ being a $p q\times p q$ positive definite symmetric matrix \emph{without} an further assumptions. We have $i = 1, ..., n$ observations following
+\begin{displaymath}
+    \mat{r}_i = \vec(\mat X_i - \mat\mu - \mat\beta\mat{f}_{y_i}\t{\mat\alpha}) \sim \mathcal{N}_{p q}(\mat 0, \mat\Delta).
+\end{displaymath}
+The MLE estimates of $\mat\mu, \mat\Delta$ are
+\begin{displaymath}
+    \widehat{\mat\mu} = \overline{\mat X} = \frac{1}{n}\sum_{i = 1}^n \mat X_i {\color{gray}\qquad(p\times q)},
+    \qquad \widehat{\mat\Delta} = \frac{1}{n}\sum_{i = 1}^n \mat r_i\t{\mat r_i} {\color{gray}\qquad(p q\times p q)}.
+\end{displaymath}
+Substitution of the MLE estimates into the log-likelihood $l(\mat\mu, \mat\Delta, \mat\alpha, \mat\beta)$ gives the estimated log-likelihood $\hat{l}(\mat\alpha, \mat\beta)$ as
+\begin{displaymath}
+    \hat{l}(\mat\alpha, \mat\beta) = -\frac{n q p}{2}\log 2\pi - \frac{n}{2}\log|\widehat{\mat\Delta}| - \frac{1}{2}\sum_{i = 1}^n \t{\mat{r}_i}\widehat{\mat\Delta}^{-1}\mat{r}_i.
+\end{displaymath}
+We are interested in the gradients $\nabla_{\mat\alpha}\hat{l}(\mat\alpha, \mat\beta)$, $\nabla_{\mat\beta}\hat{l}(\mat\alpha, \mat\beta)$ of the estimated log-likelihood. Therefore, we consider the differential of $\hat{l}$.
+\begin{align}
+    \d\hat{l}(\mat\alpha, \mat\beta)
+        &= -\frac{n}{2}\log|\widehat{\mat{\Delta}}| - \frac{1}{2}\sum_{i = 1}^n \big(\t{(\d \mat{r}_i)}\widehat{\mat{\Delta}}^{-1} \mat{r}_i + \t{\mat{r}_i}(\d\widehat{\mat{\Delta}}^{-1}) \mat{r}_i + \t{\mat{r}_i}\widehat{\mat{\Delta}}^{-1} \d \mat{r}_i\big) \nonumber\\
+        &= \underbrace{-\frac{n}{2}\log|\widehat{\mat{\Delta}}| - \frac{1}{2}\sum_{i = 1}^n \t{\mat{r}_i}(\d\widehat{\mat{\Delta}}^{-1}) \mat{r}_i}_{=\,0\text{ due to }\widehat{\mat{\Delta}}\text{ beeing the MLE}} \label{eq:deriv1}
+            - \sum_{i = 1}^n \t{\mat{r}_i}\widehat{\mat{\Delta}}^{-1} \d \mat{r}_i.
+\end{align}
+The next step is to compute $\d \mat{r}_i$ which depends on both $\mat\alpha$ and $\mat\beta$
+\begin{align*}
+    \d\mat{r}_i(\mat\alpha, \mat\beta)
+        &= -\d(\mat\alpha\kron \mat\beta)\vec\mat{f}_{y_i} \\
+        &= -\vec\!\big( \mat{I}_{p q}\,\d(\mat\alpha\kron \mat\beta)\vec\mat{f}_{y_i} \big) \\
+        &= -(\t{\vec(\mat{f}_{y_i})}\kron \mat{I}_{p q})\,\d\vec(\mat\alpha\kron \mat\beta) \\
+\intertext{using the identity \ref{eq:vecKron}, to obtain vectorized differentials, gives}
+    \dots
+        &= -(\t{\vec(\mat{f}_{y_i})}\kron \mat{I}_{p q})(\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) \,\d(\vec \mat\alpha\kron \vec \mat\beta) \\
+        &= -(\t{\vec(\mat{f}_{y_i})}\kron \mat{I}_{p q})(\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) \big((\d\vec \mat\alpha)\kron \vec \mat\beta + \vec \mat\alpha\kron (\d\vec \mat\beta)\big) \\
+        &= -(\t{\vec(\mat{f}_{y_i})}\kron \mat{I}_{p q})(\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) \big(\mat{I}_{r q}(\d\vec \mat\alpha)\kron (\vec \mat\beta)\mat{I}_1 + (\vec \mat\alpha)\mat{I}_1\kron \mat{I}_{k p}(\d\vec \mat\beta)\big) \\
+        &= -(\t{\vec(\mat{f}_{y_i})}\kron \mat{I}_{p q})(\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) \big((\mat{I}_{r q}\kron\vec \mat\beta)\d\vec \mat\alpha + (\vec \mat\alpha\kron \mat{I}_{k p})\d\vec \mat\beta\big)
+\end{align*}
+Now, substitution of $\d\mat{r}_i$ into \eqref{eq:deriv1} gives the gradients (not dimension standardized versions of $\D\hat{l}(\mat\alpha)$, $\D\hat{l}(\mat\beta)$) by identification of the derivatives from the differentials (see: \todo{appendix})
+\begin{align*}
+    \nabla_{\mat\alpha}\hat{l}(\mat\alpha, \mat\beta) &=
+        \sum_{i = 1}^n (\t{\vec(\mat{f}_{y_i})}\kron\t{\mat{r}_i}\widehat{\mat\Delta}^{-1}) (\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) (\mat{I}_{r q}\kron\vec \mat\beta),
+        {\color{gray}\qquad(q\times r)} \\
+    \nabla_{\mat\beta}\hat{l}(\mat\alpha, \mat\beta)  &=
+        \sum_{i = 1}^n (\t{\vec(\mat{f}_{y_i})}\kron\t{\mat{r}_i}\widehat{\mat\Delta}^{-1}) (\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p) (\vec \mat\alpha\kron \mat{I}_{k p}).
+        {\color{gray}\qquad(p\times k)}
+\end{align*}
+These quantities are very verbose as well as completely unusable for an implementation. By detailed analysis of the gradients we see that the main parts are only element permutations with a high sparsity. By defining the following compact matrix
+\begin{equation}\label{eq:permTransResponse}
+    \mat G = \vec^{-1}_{q r}\bigg(\Big( \sum_{i = 1}^n \vec\mat{f}_{y_i}\otimes \widehat{\mat\Delta}^{-1}\mat{r}_i \Big)_{\pi(i)}\bigg)_{i = 1}^{p q k r}{\color{gray}\qquad(q r \times p k)}
+\end{equation}
+with $\pi$ being a permutation of $p q k r$ elements corresponding to permuting the axis of a 4D tensor of dimensions $p\times q\times k\times r$ by $(2, 4, 1, 3)$. As a generalization of transposition this leads to a rearrangement of the elements corresponding to the permuted 4D tensor with dimensions $q\times r\times p\times k$ which is then vectorized and reshaped into a matrix of dimensions $q r \times p k$. With $\mat G$ the gradients simplify to \todo{validate this mathematically}
+\begin{align*}
+    \nabla_{\mat\alpha}\hat{l}(\mat\alpha, \mat\beta) &=
+        \vec_{q}^{-1}(\mat{G}\vec{\mat\beta}),
+        {\color{gray}\qquad(q\times r)} \\
+    \nabla_{\mat\beta}\hat{l}(\mat\alpha, \mat\beta)  &=
+        \vec_{p}^{-1}(\t{\mat{G}}\vec{\mat\alpha}).
+        {\color{gray}\qquad(p\times k)}
+\end{align*}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                      Kronecker Covariance Structure                      %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Kronecker Covariance Structure}
+Now we assume the residuals covariance has the form $\mat\Delta = \mat\Delta_1\otimes\mat\Delta_2$ where $\mat\Delta_1$, $\mat\Delta_2$ are $q\times q$, $p\times p$ covariance matrices, respectively. This is analog to the case that $\mat{R}_i$'s are i.i.d. Matrix Normal distribution
+\begin{displaymath}
+    \mat{R}_i = \mat{X}_i - \mat\mu - \mat\beta\mat{f}_{y_i}\t{\mat\alpha} \sim \mathcal{MN}_{p\times q}(\mat 0, \mat\Delta_2, \mat\Delta_1).
+\end{displaymath}
+The density of the Matrix Normal (with mean zero) is equivalent to the vectorized quantities being multivariate normal distributed with Kronecker structured covariance
+\begin{align*}
+    f(\mat R)
+        &= \frac{1}{\sqrt{(2\pi)^{p q}|\mat\Delta|}}\exp\left(-\frac{1}{2}\t{\vec(\mat{R})} \mat\Delta^{-1}\vec(\mat{R})\right) \\
+        &= \frac{1}{(2\pi)^{p q / 2}|\mat\Delta_1|^{p / 2}|\mat\Delta_2|^{q / 2}}\exp\left(-\frac{1}{2}\tr(\mat\Delta_1^{-1}\t{\mat{R}}\mat\Delta_2^{-1}\mat{R})\right)
+\end{align*}
+which leads for given data to the log-likelihood
+\begin{displaymath}
+    l(\mat{\mu}, \mat\Delta_1, \mat\Delta_2) =
+        -\frac{n p q}{2}\log 2\pi
+        -\frac{n p}{2}\log|\mat{\Delta}_1|
+        -\frac{n q}{2}\log|\mat{\Delta}_2|
+        -\frac{1}{2}\sum_{i = 1}^n \tr(\mat\Delta_1^{-1}\t{\mat{R}_i}\mat\Delta_2^{-1}\mat{R}_i).
+\end{displaymath}
+\subsection{MLE covariance estimates}
+Out first order of business is to derive the MLE estimated of the covariance matrices $\mat\Delta_1$, $\mat\Delta_2$ (the mean estimate $\widehat{\mat\mu}$ is trivial). Therefore, we look at the differentials with respect to changes in the covariance matrices as
+\begin{align*}
+    \d l(\mat\Delta_1, \mat\Delta_2) &=
+    -\frac{n p}{2}\d\log|\mat{\Delta}_1|
+    -\frac{n q}{2}\d\log|\mat{\Delta}_2|
+    -\frac{1}{2}\sum_{i = 1}^n
+        \tr(  (\d\mat\Delta_1^{-1})\t{\mat{R}_i}\mat\Delta_2^{-1}\mat{R}_i
+            + \mat\Delta_1^{-1}\t{\mat{R}_i}(\d\mat\Delta_2^{-1})\mat{R}_i) \\
+        &=
+    -\frac{n p}{2}\tr\mat{\Delta}_1^{-1}\d\mat{\Delta}_1
+    -\frac{n q}{2}\tr\mat{\Delta}_2^{-1}\d\mat{\Delta}_2 \\
+        &\qquad\qquad
+    +\frac{1}{2}\sum_{i = 1}^n
+        \tr(  \mat\Delta_1^{-1}(\d\mat\Delta_1)\mat\Delta_1^{-1}\t{\mat{R}_i}\mat\Delta_2^{-1}\mat{R}_i
+            + \mat\Delta_1^{-1}\t{\mat{R}_i}\mat\Delta_2^{-1}(\d\mat\Delta_2)\mat\Delta_2^{-1}\mat{R}_i) \\
+        &= \frac{1}{2}\tr\!\Big(\Big(
+            -n p \mat{I}_q + \mat\Delta_1^{-1}\sum_{i = 1}^n \t{\mat{R}_i}\mat\Delta_2^{-1}\mat{R}_i
+        \Big)\mat{\Delta}_1^{-1}\d\mat{\Delta}_1\Big) \\
+            &\qquad\qquad
+            + \frac{1}{2}\tr\!\Big(\Big(
+            -n q \mat{I}_p + \mat\Delta_2^{-1}\sum_{i = 1}^n \mat{R}_i\mat\Delta_1^{-1}\t{\mat{R}_i}
+        \Big)\mat{\Delta}_2^{-1}\d\mat{\Delta}_2\Big) \overset{!}{=} 0.
+\end{align*}
+Setting $\d l$ to zero yields the MLE estimates as
+\begin{displaymath}
+    \widehat{\mat{\mu}} = \overline{\mat X}{\color{gray}\quad(p\times q)}, \qquad
+    \widehat{\mat\Delta}_1 = \frac{1}{n p}\sum_{i = 1}^n \t{\mat{R}_i}\widehat{\mat\Delta}_2^{-1}\mat{R}_i{\color{gray}\quad(q\times q)}, \qquad
+    \widehat{\mat\Delta}_2 = \frac{1}{n q}\sum_{i = 1}^n \mat{R}_i\widehat{\mat\Delta}_1^{-1}\t{\mat{R}_i}{\color{gray}\quad(p\times p)}.
+\end{displaymath}
+Next, analog to above, we take the estimated log-likelihood and derive gradients with respect to $\mat{\alpha}$, $\mat{\beta}$.
+The estimated log-likelihood derives by replacing the unknown covariance matrices by there MLE estimates leading to
+\begin{displaymath}
+    \hat{l}(\mat\alpha, \mat\beta) =
+        -\frac{n p q}{2}\log 2\pi
+        -\frac{n p}{2}\log|\widehat{\mat{\Delta}}_1|
+        -\frac{n q}{2}\log|\widehat{\mat{\Delta}}_2|
+        -\frac{1}{2}\sum_{i = 1}^n \tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\mat{R}_i)
+\end{displaymath}
+and its differential
+\begin{displaymath}
+    \d\hat{l}(\mat\alpha, \mat\beta) =
+        -\frac{n p}{2}\d\log|\widehat{\mat{\Delta}}_1|
+        -\frac{n q}{2}\d\log|\widehat{\mat{\Delta}}_2|
+        -\frac{1}{2}\sum_{i = 1}^n \d\tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\mat{R}_i).
+\end{displaymath}
+We first take a closer look at the sum. After a bit of algebra using $\d\mat A^{-1} = -\mat A^{-1}(\d\mat A)\mat A^{-1}$ and the definitions of $\widehat{\mat\Delta}_1$, $\widehat{\mat\Delta}_2$ the sum can be rewritten
+\begin{displaymath}
+    \frac{1}{2}\sum_{i = 1}^n \d\tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\mat{R}_i)
+        = \sum_{i = 1}^n \tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\d\mat{R}_i)
+            - \frac{np}{2}\d\log|\widehat{\mat\Delta}_1|
+            - \frac{nq}{2}\d\log|\widehat{\mat\Delta}_2|.
+\end{displaymath}
+This means that most of the derivative cancels out and we get
+\begin{align*}
+    \d\hat{l}(\mat\alpha, \mat\beta)
+        &= \sum_{i = 1}^n \tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\d\mat{R}_i) \\
+        &= \sum_{i = 1}^n \tr(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}((\d\mat\beta)\mat{f}_{y_i}\t{\mat\alpha} + \mat\beta\mat{f}_{y_i}\t{(\d\mat\alpha}))) \\
+        &= \sum_{i = 1}^n \t{\vec(\widehat{\mat{\Delta}}_2^{-1}\mat{R}_i\widehat{\mat{\Delta}}_1^{-1}\mat\alpha\t{\mat{f}_{y_i}})}\d\vec\mat\beta
+            + \sum_{i = 1}^n \t{\vec(\widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\mat\beta\mat{f}_{y_i})}\d\vec\mat\alpha
+\end{align*}
+which means the gradients are
+\begin{align*}
+    \nabla_{\mat\alpha}\hat{l}(\mat\alpha, \mat\beta)
+        &= \sum_{i = 1}^n \widehat{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widehat{\mat{\Delta}}_2^{-1}\mat\beta\mat{f}_{y_i}
+        = (\ten{R}\ttm[3]\widehat{\mat{\Delta}}_1^{-1}\ttm[2]\widehat{\mat{\Delta}}_2^{-1})_{(3)}\t{(\ten{F}\ttm[2]\mat\beta)_{(3)}}\\
+    \nabla_{\mat\beta}\hat{l}(\mat\alpha, \mat\beta)
+        &= \sum_{i = 1}^n \widehat{\mat{\Delta}}_2^{-1}\mat{R}_i\widehat{\mat{\Delta}}_1^{-1}\mat\alpha\t{\mat{f}_{y_i}}
+        = (\ten{R}\ttm[3]\widehat{\mat{\Delta}}_1^{-1}\ttm[2]\widehat{\mat{\Delta}}_2^{-1})_{(2)}\t{(\ten{F}\ttm[3]\mat\alpha)_{(2)}}
+\end{align*}
+
+\paragraph{Comparison to the general case:} There are two main differences, first the general case has a closed form solution for the gradient due to the explicit nature of the MLE estimate of $\widehat{\mat\Delta}$ compared to the mutually dependent MLE estimates $\widehat{\mat\Delta}_1$, $\widehat{\mat\Delta}_2$. On the other hand the general case has dramatically bigger dimensions of the covariance matrix ($p q \times p q$) compared to the two Kronecker components with dimensions $q \times q$ and $p \times p$. This means that in the general case there is a huge performance penalty in the dimensions of $\widehat{\mat\Delta}$ while in the other case an extra estimation is required to determine $\widehat{\mat\Delta}_1$, $\widehat{\mat\Delta}_2$.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                     Alternative covariance estimates                     %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Alternative covariance estimates}
+An alternative approach is \emph{not} to use the MLE estimates for $\mat\Delta_1$, $\mat\Delta_2$ but (up to scaling) unbiased estimates.
+\begin{displaymath}
+    \widetilde{\mat\Delta}_1 = \frac{1}{n}\sum_{i = 1}^n \t{\mat{R}_i}\mat{R}_i {\color{gray}\quad(q\times q)},\qquad
+    \widetilde{\mat\Delta}_2 = \frac{1}{n}\sum_{i = 1}^n \mat{R}_i\t{\mat{R}_i} {\color{gray}\quad(p\times p)}.
+\end{displaymath}
+The unbiasednes comes directly from the following short computation;
+\begin{displaymath}
+    (\E\widetilde{\mat\Delta}_1)_{j,k} = \frac{1}{n}\sum_{i = 1}^n \sum_{l = 1}^p \E \mat{R}_{i,l,j}\mat{R}_{i,l,k}
+        = \frac{1}{n}\sum_{i = 1}^n \sum_{l = 1}^p (\mat{\Delta}_{2})_{l,l}(\mat{\Delta}_{1})_{j,k}
+        = (\mat\Delta_1\tr(\mat\Delta_2))_{j,k}.
+\end{displaymath}
+which means that $\E\widetilde{\mat\Delta}_1 = \mat\Delta_1\tr(\mat\Delta_2)$ and in analogy $\E\widetilde{\mat\Delta}_2 = \mat\Delta_2\tr(\mat\Delta_1)$. Now, we need to handle the scaling which can be estimated unbiasedly by
+\begin{displaymath}
+    \tilde{s} = \frac{1}{n}\sum_{i = 1}^n \|\mat{R}_i\|_F^2
+\end{displaymath}
+because with $\|\mat{R}_i\|_F^2 = \tr \mat{R}_i\t{\mat{R}_i} = \tr \t{\mat{R}_i}\mat{R}_i$ the scale estimate $\tilde{s} = \tr(\widetilde{\mat\Delta}_1) = \tr(\widetilde{\mat\Delta}_2)$. Then $\E\tilde{s} = \tr(\E\widetilde{\mat\Delta}_1) = \tr{\mat\Delta}_1 \tr{\mat\Delta}_2 = \tr({\mat\Delta}_1\otimes{\mat\Delta}_2)$. Leading to the estimate of the covariance as
+\begin{displaymath}
+    \widetilde{\mat\Delta} = \tilde{s}^{-1}(\widetilde{\mat{\Delta}}_1\otimes\widetilde{\mat{\Delta}}_2)
+\end{displaymath}
+
+\todo{ prove they are consistent, especially $\widetilde{\mat\Delta} = \tilde{s}^{-1}(\widetilde{\mat\Delta}_1\otimes\widetilde{\mat\Delta}_2)$!}
+
+The hoped for a benefit is that these covariance estimates are in a closed form which means there is no need for an additional iterative estimations step. Before we start with the derivation of the gradients define the following two quantities
+\begin{align*}
+    \mat{S}_1 = \frac{1}{n}\sum_{i = 1}^n \t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i = \frac{1}{n}\ten{R}_{(3)}\t{(\ten{R}\ttm[2]\widetilde{\mat{\Delta}}_2^{-1})_{(3)}}\quad{\color{gray}(q\times q)}, \\
+    \mat{S}_2 = \frac{1}{n}\sum_{i = 1}^n \mat{R}_i\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i} = \frac{1}{n}\ten{R}_{(2)}\t{(\ten{R}\ttm[3]\widetilde{\mat{\Delta}}_1^{-1})_{(2)}}\quad{\color{gray}(p\times p)}.
+\end{align*}
+\todo{Check tensor form!}
+
+Now, the matrix normal with the covariance matrix of the vectorized quantities of the form $\mat{\Delta} = s^{-1}(\mat{\Delta}_1\otimes\mat{\Delta}_2)$ has the form
+\begin{align*}
+    f(\mat R)
+        &= \frac{1}{\sqrt{(2\pi)^{p q}|\mat\Delta|}}\exp\left(-\frac{1}{2}\t{\vec(\mat{R})} \mat\Delta^{-1}\vec(\mat{R})\right) \\
+        &= \frac{s^{p q / 2}}{(2\pi)^{p q / 2}|\mat\Delta_1|^{p / 2}|\mat\Delta_2|^{q / 2}}\exp\left(-\frac{s}{2}\tr(\mat\Delta_1^{-1}\t{\mat{R}}\mat\Delta_2^{-1}\mat{R})\right)
+\end{align*}
+
+The approximated log-likelihood is then
+\begin{align*}
+    \tilde{l}(\mat\alpha, \mat\beta)
+        &=
+        -\frac{n p q}{2}\log{2\pi}
+        -\frac{n}{2}\log|\widetilde{\mat{\Delta}}|
+        -\frac{1}{2}\sum_{i = 1}^n \t{\mat{r}_i}\widetilde{\mat{\Delta}}^{-1}\mat{r}_i \\
+        &=
+        -\frac{n p q}{2}\log{2\pi}
+        +\frac{n p q}{2}\log\tilde{s}
+        -\frac{n p}{2}\log|\widetilde{\mat{\Delta}}_1|
+        -\frac{n q}{2}\log|\widetilde{\mat{\Delta}}_2|
+        -\frac{\tilde{s}}{2}\sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i).
+\end{align*}
+The second form is due to the property of the determinant for scaling and the Kronecker product giving that $|\widetilde{\mat\Delta}| = (\tilde{s}^{-1})^{p q}|\widetilde{\mat{\Delta}}_1|^p |\widetilde{\mat{\Delta}}_2|^q$ as well as an analog Kronecker decomposition as in the MLE case.
+
+Note that with the following holds
+\begin{displaymath}
+    \sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i)
+    = n \tr(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1)
+    = n \tr(\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2)
+    = n \tr(\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1})
+    = n \tr(\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1}).
+\end{displaymath}
+
+The derivation of the Gradient of the approximated log-likelihood $\tilde{l}$ is tedious but straight forward. We tackle the summands separately;
+\begin{align*}
+    \d\log\tilde{s} &= \tilde{s}^{-1}\d\tilde{s} = \frac{2}{n\tilde{s}}\sum_{i = 1}^n \tr(\t{\mat{R}_i}\d\mat{R}_i)
+        = -\frac{2}{n\tilde{s}}\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\d\mat{\beta}), \\
+    \d\log|\widetilde{\mat{\Delta}}_1| &=\tr(\widetilde{\mat{\Delta}}_1^{-1}\d\widetilde{\mat{\Delta}}_1) = \frac{2}{n}\sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\d\mat{R}_i)
+        = -\frac{2}{n}\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\widetilde{\mat{\Delta}}_1^{-1}\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\d\mat{\beta}), \\
+    \d\log|\widetilde{\mat{\Delta}}_2| &=\tr(\widetilde{\mat{\Delta}}_2^{-1}\d\widetilde{\mat{\Delta}}_2) = \frac{2}{n}\sum_{i = 1}^n \tr(\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\d\mat{R}_i)
+        = -\frac{2}{n}\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\d\mat{\beta})
+\end{align*}
+as well as
+\begin{displaymath}
+    \d\,\tilde{s}\sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i)
+        = (\d\tilde{s})\sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i)
+            + \tilde{s}\, \d \sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i).
+\end{displaymath}
+We have
+\begin{displaymath}
+    \d\tilde{s} = -\frac{2}{n}\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\d\mat{\beta})
+\end{displaymath}
+and the remaining term
+\begin{align*}
+    \d\sum_{i = 1}^n\tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i)
+         = 2\sum_{i = 1}^n \tr(&\t{\mat{f}_{y_i}}\t{\mat{\beta }}(\mat{R}_i   \widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1}  +    \widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} \mat{R}_i - \widetilde{\mat{\Delta}}_2^{-1}   \mat{R}_i \widetilde{\mat{\Delta}}_1^{-1})\d\mat{\alpha} \\
+         +\,&\mat{f}_{y_i} \t{\mat{\alpha}}(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i} + \t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} - \widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1})\d\mat{\beta }).
+\end{align*}
+The last one is tedious but straight forward. Its computation extensively uses the symmetry of $\widetilde{\mat{\Delta}}_1$, $\widetilde{\mat{\Delta}}_2$, the cyclic property of the trace and the relation $\d\mat{A}^{-1} = -\mat{A}^{-1}(\d\mat{A})\mat{A}^{-1}$.
+
+Putting it all together
+\begin{align*}
+    \d\tilde{l}(\mat{\alpha}, \mat{\beta})
+       &= \frac{n p q}{2}\Big(-\frac{2}{n\tilde{s}}\Big)\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\d\mat{\beta}) \\
+       &\hspace{3em} - \frac{n p}{2}\Big(-\frac{2}{n}\Big)\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\widetilde{\mat{\Delta}}_1^{-1}\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\d\mat{\beta}) \\
+       &\hspace{3em} - \frac{n q}{2}\Big(-\frac{2}{n}\Big)\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\d\mat{\beta}) \\
+       &\hspace{3em} -\frac{1}{2}\Big(-\frac{2}{n}\Big)\Big(\sum_{i = 1}^n \tr(\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i)\Big)\sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{R}_i\d\mat{\alpha} + \mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{R}_i}\d\mat{\beta}) \\
+       &\hspace{3em} -\frac{\tilde{s}}{2}2\sum_{i = 1}^n \tr\!\Big(\t{\mat{f}_{y_i}}\t{\mat{\beta }}(\mat{R}_i \widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1} + \widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} \mat{R}_i - \widetilde{\mat{\Delta}}_2^{-1} \mat{R}_i \widetilde{\mat{\Delta}}_1^{-1})\d\mat{\alpha} \\
+       &\hspace{3em} \hspace{4.7em} + \mat{f}_{y_i} \t{\mat{\alpha}}(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i} + \t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} - \widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1})\d\mat{\beta }\Big) \\
+%
+       &= \sum_{i = 1}^n \tr\bigg(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\Big(
+            -p q \tilde{s}^{-1} \mat{R}_i + p \mat{R}_i\widetilde{\mat{\Delta}}_1^{-1} + q \widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i + \tr(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1)\mat{R}_i \\
+       &\hspace{3em} \hspace{4.7em} - \tilde{s}(\mat{R}_i \widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1} + \widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} \mat{R}_i - \widetilde{\mat{\Delta}}_2^{-1} \mat{R}_i \widetilde{\mat{\Delta}}_1^{-1})
+       \Big)\d\mat{\alpha}\bigg) \\
+       &\hspace{3em}+ \sum_{i = 1}^n \tr\bigg(\mat{f}_{y_i}\t{\mat{\alpha}}\Big(
+            -p q \tilde{s}^{-1} \t{\mat{R}_i} + p \widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i} + q \t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1} + \tr(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1)\t{\mat{R}_i} \\
+       &\hspace{3em}\hspace{3em} \hspace{4.7em} - \tilde{s}(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1}\t{\mat{R}_i} + \t{\mat{R}_i}\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2\widetilde{\mat{\Delta}}_2^{-1} - \widetilde{\mat{\Delta}}_1^{-1} \t{\mat{R}_i} \widetilde{\mat{\Delta}}_2^{-1})
+       \Big)\d\mat{\beta}\bigg).
+\end{align*}
+Observe that the bracketed expressions before $\d\mat{\alpha}$ and $\d\mat{\beta}$ are transposes. Lets denote the expression for $\d\mat{\alpha}$ as $\mat{G}_i$ which has the form
+\begin{displaymath}
+    \mat{G}_i
+        = (\tr(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1) - p q \tilde{s}^{-1})\mat{R}_i
+        + (q\mat{I}_p - \tilde{s}\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2)\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i
+        + \mat{R}_i\widetilde{\mat{\Delta}}_1^{-1}(p\mat{I}_q - \tilde{s}\mat{S}_1\widetilde{\mat{\Delta}}_1^{-1})
+        + \tilde{s}\widetilde{\mat{\Delta}}_2^{-1}\mat{R}_i\widetilde{\mat{\Delta}}_1^{-1}
+\end{displaymath}
+and with $\mathcal{G}$ the order 3 tensor stacking the $\mat{G}_i$'s such that the first mode indexes the observation
+\begin{displaymath}
+    \ten{G}
+        = (\tr(\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1) - p q \tilde{s}^{-1})\ten{R}
+        + \ten{R}\ttm[2](q\mat{I}_p - \tilde{s}\widetilde{\mat{\Delta}}_2^{-1}\mat{S}_2)\widetilde{\mat{\Delta}}_2^{-1}
+        + \ten{R}\ttm[3](p\mat{I}_q - \tilde{s}\widetilde{\mat{\Delta}}_1^{-1}\mat{S}_1)\widetilde{\mat{\Delta}}_1^{-1}
+        + \tilde{s}\ten{R}\ttm[2]\widetilde{\mat{\Delta}}_2^{-1}\ttm[3]\widetilde{\mat{\Delta}}_1^{-1}
+\end{displaymath}
+This leads to the following form of the differential of $\tilde{l}$ given by
+\begin{displaymath}
+    \d\tilde{l}(\mat{\alpha}, \mat{\beta})
+        = \sum_{i = 1}^n \tr(\t{\mat{f}_{y_i}}\t{\mat{\beta}}\mat{G}_i\d\mat{\alpha})
+        + \sum_{i = 1}^n \tr(\mat{f}_{y_i}\t{\mat{\alpha}}\t{\mat{G}_i}\d\mat{\beta})
+\end{displaymath}
+and therefore the gradients
+\begin{align*}
+    \nabla_{\mat{\alpha}}\tilde{l}(\mat{\alpha}, \mat{\beta}) &= \sum_{i = 1}^n \t{\mat{G}_i}\mat{\beta}\mat{f}_{y_i}
+        = \ten{G}_{(3)}\t{(\ten{F}\ttm[2]\mat{\beta})_{(3)}}, \\
+    \nabla_{\mat{\beta}} \tilde{l}(\mat{\alpha}, \mat{\beta}) &= \sum_{i = 1}^n \mat{G}_i\mat{\alpha}\t{\mat{f}_{y_i}}
+        = \ten{G}_{(2)}\t{(\ten{F}\ttm[3]\mat{\alpha})_{(2)}}.
+\end{align*}
+
+\todo{check the tensor version of the gradient!!!}
+
+\newpage
+
+\section{Thoughts on initial value estimation}
+\todo{This section uses an alternative notation as it already tries to generalize to general multi-dimensional arrays. Furthermore, one of the main differences is that the observation are indexed in the \emph{last} mode. The benefit of this is that the mode product and parameter matrix indices match not only in the population model but also in sample versions.}
+Let $\ten{X}, \ten{F}$ be order (rank) $r$ tensors of dimensions $p_1\times ... \times p_r$ and $q_1\times ... \times q_r$, respectively. Also denote the error tensor $\epsilon$ of the same order and dimensions as $\ten{X}$. The considered model for the $i$'th observation is
+\begin{displaymath}
+    \ten{X}_i = \ten{\mu} + \ten{F}_i\times\{ \mat{\alpha}_1, ..., \mat{\alpha}_r \} + \ten{\epsilon}_i
+\end{displaymath}
+where we assume $\ten{\epsilon}_i$ to be i.i.d. mean zero tensor normal distributed $\ten{\epsilon}\sim\mathcal{TN}(0, \mat{\Delta}_1, ..., \mat{\Delta}_r)$ for $\mat{\Delta}_j\in\mathcal{S}^{p_j}_{++}$, $j = 1, ..., r$. Given $i = 1, ..., n$ observations the collected model containing all observations
+\begin{displaymath}
+    \ten{X} = \ten{\mu} + \ten{F}\times\{ \mat{\alpha}_1, ..., \mat{\alpha}_r, \mat{I}_n \} + \ten{\epsilon}
+\end{displaymath}
+which is almost identical as the observations $\ten{X}_i, \ten{F}_i$ are stacked on an addition $r + 1$ mode leading to response, predictor and error tensors $\ten{X}, \ten{F}$ of order (rank) $r + 1$ and dimensions $p_1\times...\times p_r\times n$ for $\ten{X}, \ten{\epsilon}$ and $q_1\times...\times q_r\times n$ for $\ten{F}$.
+
+In the following we assume w.l.o.g that $\ten{\mu} = 0$, as if this is not true we simply replace $\ten{X}_i$ with $\ten{X}_i - \ten{\mu}$ for $i = 1, ..., n$ before collecting all the observations in the response tensor $\ten{X}$.
+
+The goal here is to find reasonable estimates for $\mat{\alpha}_j$, $j = 1, ..., n$ for the mean model
+\begin{displaymath}
+    \E \ten{X}|\ten{F}, \mat{\alpha}_1, ..., \mat{\alpha}_r = \ten{F}\times\{\mat{\alpha}_1, ..., \mat{\alpha}_r, \mat{I}_n\}
+        = \ten{F}\times_{j\in[r]}\mat{\alpha}_j.
+\end{displaymath}
+Under the mean model we have using the general mode product relation $(\ten{A}\times_j\mat{B})_{(j)} = \mat{B}\ten{A}_{(j)}$ we get
+\begin{align*}
+    \ten{X}_{(j)}\t{\ten{X}_{(j)}} \overset{\text{SVD}}{=} \mat{U}_j\mat{D}_j\t{\mat{U}_j}
+        = \mat{\alpha}_j(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}
+          \t{(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}}\t{\mat{\alpha}_j}
+\end{align*}
+for the $j = 1, ..., r$ modes. Using this relation we construct an iterative estimation process by setting the initial estimates of $\hat{\mat{\alpha}}_j^{(0)} = \mat{U}_j[, 1:q_j]$ which are the first $q_j$ columns of $\mat{U}_j$.
+
+For getting least squares estimates for $\mat{\alpha}_j$, $j = 1, ..., r$ we observe that by matricization of the mean model
+\begin{displaymath}
+    \ten{X}_{(j)} = (\ten{F}\times_{k\in[r]}\mat{\alpha}_k)_{(j)} = \mat{\alpha}_j(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}
+\end{displaymath}
+leads to normal equations for each $\mat{\alpha}_j$, $j = 1, ..., r$
+\begin{displaymath}
+    \ten{X}_{(j)}\t{(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}} = \mat{\alpha}_j(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}\t{(\ten{F}\times_{k\in[r]\backslash j}\mat{\alpha}_k)_{(j)}}
+\end{displaymath}
+where the normal equations for $\mat{\alpha}_j$ depend on all the other $\mat{\alpha}_k$. With the initial estimates from above this allows an alternating approach. Index with $t = 1, ...$ the current iteration, then a new estimate $\widehat{\mat{\alpha}}_j^{(t)}$ given the previous estimates $\widehat{\mat{\alpha}}_k^{(t-1)}$, $k = 1, ..., r$ is computed as
+\begin{displaymath}
+    \widehat{\mat{\alpha}}_j^{(t)} =
+        \ten{X}_{(j)}
+        \t{\big(\ten{F}\times_{k\in[r]\backslash j}\widehat{\mat{\alpha}}_k^{(t-1)}\big)_{(j)}}
+    \left(
+        \big(\ten{F}\times_{k\in[r]\backslash j}\widehat{\mat{\alpha}}_k^{(t-1)}\big)_{(j)}
+        \t{\big(\ten{F}\times_{k\in[r]\backslash j}\widehat{\mat{\alpha}}_k^{(t-1)}\big)_{(j)}}
+    \right)^{-1}
+\end{displaymath}
+for $j = 1, ..., r$ until convergence or a maximum number of iterations is exceeded. The final estimates are the least squares estimates by this procedure.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                            Numerical Examples                            %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Numerical Examples}
+% The first example (which by it self is \emph{not} exemplary) is the estimation with parameters $n = 200$, $p = 11$, $q = 5$, $k = 14$ and $r = 9$. The ``true'' matrices $\mat\alpha$, $\mat\beta$ generated by sampling there elements i.i.d. standard normal like the responses $y$. Then, for each observation, $\mat{f}_y$ is computed as $\sin(s_{i, j} y + o_{i j})$ \todo{ properly describe} to fill the elements of $\mat{f}_y$. Then the $\mat{X}$'s are samples as
+% \begin{displaymath}
+%     \mat{X} = \mat{\beta}\mat{f}_y \t{\mat{\alpha}} + \mat{\epsilon}, \qquad \vec{\mat{\epsilon}} \sim \mathbb{N}_{p q}(\mat{0}, \mat{\Delta})
+% \end{displaymath}
+% where $\mat{\Delta}_{i j} = 0.5^{|i - j|}$ for $i, j = 1, ..., p q$.
+
+\begin{table}[!ht]
+    \centering
+    % see: https://en.wikibooks.org/wiki/LaTeX/Tables
+    \begin{tabular}{ll|r@{ }l *{3}{r@{.}l}}
+          method & init
+            & \multicolumn{2}{c}{loss}
+            & \multicolumn{2}{c}{MSE}
+            & \multicolumn{2}{c}{$\dist(\hat{\mat\alpha}, \mat\alpha)$}
+            & \multicolumn{2}{c}{$\dist(\hat{\mat\beta}, \mat\beta)$}
+            \\ \hline
+          base     &  vlp & -2642&(1594) & 1&82  (2.714) & 0&248 (0.447) & 0&271 (0.458) \\
+          new      &  vlp & -2704&(1452) & 1&78  (2.658) & 0&233 (0.438) & 0&260 (0.448) \\
+          new      &   ls & -3479&  (95) & 0&99  (0.025) & 0&037 (0.017) & 0&035 (0.015) \\
+          momentum &  vlp & -2704&(1452) & 1&78  (2.658) & 0&233 (0.438) & 0&260 (0.448) \\
+          momentum &   ls & -3479&  (95) & 0&99  (0.025) & 0&037 (0.017) & 0&035 (0.015) \\
+          approx   &  vlp &  6819&(1995) & 3&99 (12.256) & 0&267 (0.448) & 0&287 (0.457) \\
+          approx   &   ls &  5457& (163) & 0&99  (0.025) & 0&033 (0.017) & 0&030 (0.012) \\
+    \end{tabular}
+    \caption{Mean (standard deviation) for simulated runs of $20$ repititions for the model $\mat{X} = \mat{\beta}\mat{f}_y\t{\mat{\alpha}}$ of dimensinos $(p, q) = (11, 7)$, $(k, r) = (3, 5)$ with a sample size of $n = 200$. The covariance structure is $\mat{\Delta} = \mat{\Delta}_2\otimes \mat{\Delta}_1$ for $\Delta_i = \text{AR}(\sqrt{0.5})$, $i = 1, 2$. The functions applied to the standard normal response $y$ are $\sin, \cos$ with increasing frequency.}
+\end{table}
+
+% \begin{figure}
+%     \centering
+%     \includegraphics{loss_Ex01.png}
+% \end{figure}
+% \begin{figure}
+%     \centering
+%     \includegraphics{estimates_Ex01.png}
+% \end{figure}
+% \begin{figure}
+%     \centering
+%     \includegraphics{Delta_Ex01.png}
+% \end{figure}
+% \begin{figure}
+%     \centering
+%     \includegraphics{hist_Ex01.png}
+% \end{figure}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                              Bib and Index                               %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\printindex
+\nocite{*}
+\printbibliography
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                                 Appendix                                 %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\appendix
+\section{Matrix Differential Rules}
+Let $\mat A$ be a square matrix (and invertible if needed) and $|.|$ stands for the determinant
+\begin{align*}
+    \d\log\mat A &= \frac{1}{|\mat A|}\d\mat{A} \\
+    \d|\mat A| &= |\mat A|\tr \mat{A}^{-1}\d\mat A \\
+    \d\log|\mat A| &= \tr\mat{A}^{-1}\d\mat A \\
+    \d\mat{X}^{-1} &= -\mat{X}^{-1}(\d\mat{X})\mat{X}^{-1}
+\end{align*}
+
+\section{Useful Matrix Identities}
+In this section we summarize a few useful matrix identities, for more details see for example \cite{MatrixAlgebra-AbadirMagnus2005}.
+
+For two matrices $\mat A$ of dimensions $q\times r$ and $\mat B$ of dimensions $p\times k$ holds
+\begin{equation}\label{eq:vecKron}
+    \vec(\mat A\kron\mat B) = (\mat{I}_r\kron\mat{K}_{k,q}\kron\mat{I}_p)(\vec\mat A\kron\vec\mat B).
+\end{equation}
+
+Let $\mat A$ be a $p\times p$ dimensional non-singular matrix. Furthermore, let $\mat a, \mat b$ be $p$ vectors such that $\t{\mat b}A^{-1}\mat a\neq -1$, then
+\begin{displaymath}
+    (\mat A + \mat a\t{\mat b})^{-1} = \mat{A}^{-1} - \frac{1}{1 + \t{\mat b}A^{-1}\mat a}\mat{A}^{-1}\mat{a}\t{\mat{b}}\mat{A}^{-1}
+\end{displaymath}
+as well as
+\begin{displaymath}
+    \det(\mat A + \mat a\t{\mat b}) = \det(\mat A)(1 + \t{\mat b}{\mat A}^{-1}\mat a)
+\end{displaymath}
+which even holds in the case $\t{\mat b}A^{-1}\mat a = -1$. This is known as Sylvester's determinant theorem.
+
+
+\section{Commutation Matrix and Permutation Identities}
+\begin{center}
+    Note: In this section we use 0-indexing for the sake of simplicity!
+\end{center}
+In this section we summarize relations between the commutation matrix and corresponding permutation. We also list some extensions to ``simplify'' or represent some term. This is mostly intended for implementation purposes and understanding of terms occurring in the computations.
+
+Let $\mat A$ be an arbitrary $p\times q$ matrix. The permutation matrix $\mat K_{p, q}$ satisfies
+\begin{displaymath}
+    \mat{K}_{p, q}\vec{\mat{A}} = \vec{\t{\mat{A}}} \quad\Leftrightarrow\quad (\vec{\mat{A}})_{\pi_{p, q}(i)} = (\vec{\t{\mat{A}}})_{i}, \quad\text{for } i = 0, ..., p q - 1
+\end{displaymath}
+where $\pi_{p, q}$ is a permutation of the indices $i = 0, ..., p q - 1$ such that
+\begin{displaymath}
+    \pi_{p, q}(i + j p) = j + i q, \quad\text{for }i = 0, ..., p - 1; j = 0, ..., q - 1.
+\end{displaymath}
+
+\begin{table}[!htp]
+    \centering
+    \begin{minipage}{0.8\textwidth}
+        \centering
+        \begin{tabular}{l c l}
+            $\mat{K}_{p, q}$                 & $\hat{=}$ & $\pi_{p, q}(i + j p) = j + i q$ \\
+            $\mat{I}_r\kron\mat{K}_{p, q}$ & $\hat{=}$ & $\tilde{\pi}_{p, q, r}(i + j p + k p q) = j + i q + k p q$ \\
+            $\mat{K}_{p, q}\kron\mat{I}_r$ & $\hat{=}$ & $\hat{\pi}_{p, q, r}(i + j p + k p q) = r(j + i q) + k$
+        \end{tabular}
+    \caption{\label{tab:commutation-permutation}Commutation matrix terms and corresponding permutations. Indices are all 0-indexed with the ranges; $i = 0, ..., p - 1$, $j = 0, ..., q - 1$ and $k = 0, ..., r - 1$.}
+    \end{minipage}
+\end{table}
+
+
+
+\section{Matrix and Tensor Operations}
+
+The \emph{Kronecker product}\index{Operations!Kronecker@$\kron$ Kronecker product} is denoted as $\kron$ and the \emph{Hadamard product} uses the symbol $\circ$. We also need the \emph{Khatri-Rao product}\index{Operations!KhatriRao@$\hada$ Khatri-Rao product}
+$\hada$ as well as the \emph{Transposed Khatri-Rao product} $\odot_t$ (or \emph{Face-Splitting product}). There is also the \emph{$n$-mode Tensor Matrix Product}\index{Operations!ttm@$\ttm[n]$ $n$-mode tensor product} denoted by $\ttm[n]$ in conjunction with the \emph{$n$-mode Matricization} of a Tensor $\mat{T}$ written as $\mat{T}_{(n)}$, which is a matrix. See below for definitions and examples of these operations.\todo{ Definitions and Examples}
+
+\todo{ resolve confusion between Khatri-Rao, Column-wise Kronecker / Khatri-Rao, Row-wise Kronecker / Khatri-Rao, Face-Splitting Product, .... Yes, its a mess.}
+\paragraph{Kronecker Product $\kron$:}
+\paragraph{Khatri-Rao Product $\hada$:}
+\paragraph{Transposed Khatri-Rao Product $\odot_t$:} This is also known as the Face-Splitting Product and is the row-wise Kronecker product of two matrices. If relates to the Column-wise Kronecker Product through
+\begin{displaymath}
+    \t{(\mat{A}\odot_{t}\mat{B})} = \t{\mat{A}}\hada\t{\mat{B}}
+\end{displaymath}
+
+\paragraph{$n$-mode unfolding:} \emph{Unfolding}, also known as \emph{flattening} or \emph{matricization}, is an reshaping of a tensor into a matrix with rearrangement of the elements such that mode $n$ corresponds to columns of the result matrix and all other modes are vectorized in the rows. Let $\ten{T}$ be a tensor of order $m$ with dimensions $t_1\times ... \times t_n\times ... \times t_m$ and elements indexed by $(i_1, ..., i_n, ..., i_m)$. The $n$-mode flattening, denoted $\ten{T}_{(n)}$, is defined as a $(t_n, \prod_{k\neq n}t_k)$ matrix with element indices $(i_n, j)$ such that $j = \sum_{k = 1, k\neq n}^m i_k\prod_{l = 1, l\neq n}^{k - 1}t_l$.
+\todo{ give an example!}
+
+\paragraph{$n$-mode Tensor Product $\ttm[n]$:}
+The \emph{$n$-mode tensor product} $\ttm[n]$ between a tensor $\mat{T}$ of order $m$ with dimensions $t_1\times t_2\times ... \times t_n\times ... \times t_m$ and a $p\times t_n$ matrix $\mat{M}$ is defined element-wise as
+\begin{displaymath}
+    (\ten{T}\ttm[n] \mat{M})_{i_1, ..., i_{n-1}, j, i_{n+1}, ..., i_m} = \sum_{k = 1}^{t_n} \ten{T}_{i_1, ..., i_{n-1}, k, i_{n+1}, ..., i_m} \mat{M}_{j, k}
+\end{displaymath}
+where $i_1, ..., i_{n-1}, i_{n+1}, ..., i_m$ run from $1$ to $t_1, ..., t_{n-1}, t_{n+1}, ..., t_m$, respectively. Furthermore, the $n$-th fiber index $j$ of the product ranges from $1$ to $p$. This gives a new tensor $\mat{T}\ttm[n]\mat{M}$ of order $m$ with dimensions $t_1\times t_2\times ... \times p\times ... \times t_m$.
+
+\begin{example}[Matrix Multiplication Analogs]
+    Let $\mat{A}$, $\mat{B}$ be two matrices with dimensions $t_1\times t_2$ and $p\times q$, respectively. Then $\mat{A}$ is also a tensor of order $2$, now the $1$-mode and $2$-mode products are element wise given by
+    \begin{align*}
+        (\mat{A}\ttm[1] \mat{B})_{i,j} &= \sum_{l = 1}^{t_1} \mat{A}_{l,j}\mat{B}_{i,l}
+            = (\mat{B}\mat{A})_{i,j}
+                                        & \text{for }t_1 = q, \\
+        (\mat{A}\ttm[2] \mat{B})_{i,j} &= \sum_{l = 1}^{t_2} \mat{A}_{i,l}\mat{B}_{j,l}
+            = (\mat{A}\t{\mat{B}})_{i,j} = \t{(\mat{B}\t{\mat{A}})}_{i,j}
+                                        & \text{for }t_2 = q.
+    \end{align*}
+    In other words, the $1$-mode product equals $\mat{A}\ttm[1] \mat{B} = \mat{B}\mat{A}$ and the $2$-mode is $\mat{A}\ttm[2] \mat{B} = \t{(\mat{B}\t{\mat{A}})}$ in the case of the tensor $\mat{A}$ being a matrix.
+\end{example}
+
+\begin{example}[Order Three Analogs]
+    Let $\mat{A}$ be a tensor of the form $t_1\times t_2\times t_3$ and $\mat{B}$ a matrix of dimensions $p\times q$, then the $n$-mode products have the following look
+    \begin{align*}
+        (\mat{A}\ttm[1]\mat{B})_{i,j,k} &= \sum_{l = 1}^{t_1} \mat{A}_{l,j,k}\mat{B}_{i,l} & \text{for }t_1 = q, \\
+        (\mat{A}\ttm[2]\mat{B})_{i,j,k} &= \sum_{l = 1}^{t_2} \mat{A}_{i,l,k}\mat{B}_{j,l} \equiv (\mat{B}\mat{A}_{i,:,:})_{j,k} & \text{for }t_2 = q, \\
+        (\mat{A}\ttm[3]\mat{B})_{i,j,k} &= \sum_{l = 1}^{t_3} \mat{A}_{i,j,l}\mat{B}_{k,l} \equiv \t{(\mat{B}\t{\mat{A}_{i,:,:}})}_{j,k} & \text{for }t_3 = q.
+    \end{align*}
+\end{example}
+
+Letting $\ten{F}$ be the $3$-tensor of dimensions $n\times k\times r$ such that $\ten{F}_{i,:,:} = \mat{f}_{y_i}$, then
+\begin{displaymath}
+    \mat{\beta}\mat{f}_{y_i}\t{\mat{\alpha}} = (\ten{F}\ttm[2]\mat{\beta}\ttm[3]\mat{\alpha})_{i,:,:}
+\end{displaymath}
+or in other words, the $i$-th slice of the tensor product $\ten{F}\ttm[2]\mat{\beta}\ttm[3]\mat{\alpha}$ contains $\mat{\beta}\mat{f}_{y_i}\t{\mat{\alpha}}$ for $i = 1, ..., n$.
+Another analog way of writing this is
+\begin{displaymath}
+    (\ten{F}\ttm[2]\mat{\beta}\ttm[3]\mat{\alpha})_{(1)} = \mathbb{F}_{y}(\t{\mat{\alpha}}\kron\t{\mat{\beta}})
+\end{displaymath}
+
+\section{Equivalencies}
+In this section we give a short summary of alternative but equivalent operations.
+Using the notation $\widehat{=}$ to indicate that two expressions are identical in the sense that they contain the same element in the same order but may have different dimensions. Meaning, when vectorizing ether side of $\widehat{=}$, they are equal ($\mat{A}\widehat{=}\mat{B}\ :\Leftrightarrow\ \vec{\mat{A}} = \vec{\mat{B}}$).
+
+Therefore, we use $\mat{A}, \mat{B}, \mat{X}, \mat{F}, \mat{R}, ...$ for matrices. 3-Tensors are written as $\ten{A}, \ten{B}, \ten{T}, \ten{X}, \ten{F}, \ten{R}, ...$.
+
+\begin{align*}
+    \ten{T}\ttm[3]\mat{A}\ &{\widehat{=}}\ \mat{T}\t{\mat A} & \ten{T}(n, p, q)\ \widehat{=}\ \mat{T}(n p, q), \mat{A}(p, q) \\
+    \ten{T}\ttm[2]\mat{B}\ &{\widehat{=}}\ \mat{B}\ten{T}_{(2)} & \ten{T}(n, p, q), \ten{T}_{(2)}(p, n q), \mat{B}(q, p)
+\end{align*}
+
+% \section{Matrix Valued Normal Distribution}
+% A random variable $\mat{X}$ of dimensions $p\times q$ is \emph{Matrix-Valued Normal Distribution}, denoted 
+% \begin{displaymath}
+%     \mat{X}\sim\mathcal{MN}_{p\times q}(\mat{\mu}, \mat{\Delta}_2, \mat{\Delta}_1),
+% \end{displaymath}
+% if and only if $\vec\mat{X}\sim\mathcal{N}_{p q}(\vec\mat\mu, \mat\Delta_1\otimes\mat\Delta_2)$. Note the order of the covariance matrices $\mat\Delta_1, \mat\Delta_2$. Its density is given by
+% \begin{displaymath}
+%     f(\mat{X}) = \frac{1}{(2\pi)^{p q / 2}|\mat\Delta_1|^{p / 2}|\mat\Delta_2|^{q / 2}}\exp\left(-\frac{1}{2}\tr(\mat\Delta_1^{-1}\t{(\mat X - \mat \mu)}\mat\Delta_2^{-1}(\mat X - \mat \mu))\right).
+% \end{displaymath}
+
+% \section{Sampling form a Multi-Array Normal Distribution}
+% Let $\ten{X}$ be an order (rank) $r$ Multi-Array random variable of dimensions $p_1\times...\times p_r$ following a Multi-Array (or Tensor) Normal distributed
+% \begin{displaymath}
+%     \ten{X}\sim\mathcal{TN}(\mu, \mat{\Delta}_1, ..., \mat{\Delta}_r).
+% \end{displaymath}
+% Its density is given by
+% \begin{displaymath}
+%     f(\ten{X}) = \Big( \prod_{i = 1}^r \sqrt{(2\pi)^{p_i}|\mat{\Delta}_i|^{q_i}} \Big)^{-1}
+%         \exp\!\left( -\frac{1}{2}\langle \ten{X} - \mu, (\ten{X} - \mu)\times\{\mat{\Delta}_1^{-1}, ..., \mat{\Delta}_r^{-1}\} \rangle \right)
+% \end{displaymath}
+% with $q_i = \prod_{j \neq i}p_j$. This is equivalent to the vectorized $\vec\ten{X}$ following a Multi-Variate Normal distribution
+% \begin{displaymath}
+%     \vec{\ten{X}}\sim\mathcal{N}_{p}(\vec{\mu}, \mat{\Delta}_r\otimes...\otimes\mat{\Delta}_1)
+% \end{displaymath}
+% with $p = \prod_{i = 1}^r p_i$.
+
+% \todo{Check this!!!}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%                           Reference Summaries                            %%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Reference Summaries}
+This section contains short summaries of the main references with each sub-section concerning one paper.
+
+\subsection{}
+
+\subsection{Generalized Tensor Decomposition With Features on Multiple Modes}
+The \cite{TensorDecomp-HuLeeWang2022} paper proposes a multi-linear conditional mean model for a constraint rank tensor decomposition. Let the responses $\ten{Y}\in\mathbb{R}^{d_1\times ... \times\d_K}$ be an order $K$ tensor. Associated with each mode $k\in[K]$ they assume feature matrices $\mat{X}_k\in\mathbb{R}^{d_k\times p_k}$. Now, they assume that conditional on the feature matrices $\mat{X}_k$ the entries of the tensor $\ten{Y}$ are independent realizations. The rank constraint is specified through $\mat{r} = (r_1, ..., r_K)$, then the model is given by
+\begin{displaymath}
+    \E(\ten{Y} | \mat{X}_1, ..., \mat{X}_K) = f(\ten{C}\times\{ \mat{X}_1\mat{M}_1, ..., \mat{X}_K\mat{M}_K \}),\qquad \t{\mat{M}_k}\mat{M}_k = \mat{I}_{r_k}\ \forall k\in[K].
+\end{displaymath}
+The order $K$ tensor $\ten{C}\in\mathbb{R}^{r_1\times...\times r_K}$ is an unknown full-rank core tensor and the matrices $\mat{M}_k\in\mathbb{R}^{p_k\times r_k}$ are unknown factor matrices. The function $f$ is applied element wise and serves as the link function based on the assumed distribution family of the tensor entries. Finally, the operation $\times$ denotes the tensor-by-matrix product using a short hand
+\begin{displaymath}
+    \ten{C}\times\{ \mat{X}_1\mat{M}_1, ..., \mat{X}_K\mat{M}_K \}
+        = \ten{C}\ttm[1]\mat{X}_1\mat{M}_1\ ...\ttm[K]\mat{X}_K\mat{M}_K
+\end{displaymath}
+with $\ttm[k]$ denoting the $k$-mode tensor matrix product.
+
+The algorithm for estimation of $\ten{C}$ and $\mat{M}_1, ..., \mat{M}_K$ assumes the individual conditional entries of $\ten{Y}$ to be independent and to follow a generalized linear model with link function $f$. The proposed algorithm is an iterative algorithm for minimizing the negative log-likelihood
+\begin{displaymath}
+    l(\ten{C}, \mat{M}_1, ..., \mat{M}_K) = \langle \ten{Y}, \Theta \rangle - \sum_{i_1, ..., i_K} b(\Theta_{i_1, ..., i_K}), \qquad \Theta = \ten{C}\times\{ \mat{X}_1\mat{M}_1, ..., \mat{X}_K\mat{M}_K \}
+\end{displaymath}
+where $b = f'$ it the derivative of the canonical link function $f$ in the generalized linear model the conditioned entries of $\ten{Y}$ follow. The algorithm utilizes the higher-order SVD (HOSVD) to enforce the rank-constraint.
+
+The main benefit is that this approach generalizes well to a multitude of different structured data sets.
+
+\todo{ how does this relate to the $\mat{X} = \mat{\mu} + \mat{\beta}\mat{f}_y\t{\mat{\alpha}} + \mat{\epsilon}$ model.}
+
+\end{document}
--- a/eeg_analysis/eeg_analysis_poi_step1.R
+++ b/eeg_analysis/eeg_analysis_poi_step1.R
@ -1,66 +0,0 @@
-# Source Code.                                    # Loaded functions.
-source('../tensor_predictors/poi.R')              # POI
-
-# Load C implentation of 'FastPOI-C' subroutine.
-# Required for using 'use.C = TRUE' in the POI method.
-# Compiled via.
-# $ cd ../tensor_predictors/
-# $ R CMD SHLIB poi.c
-dyn.load('../tensor_predictors/poi.so')
-# dyn.load('../tensor_predictors/poi.dll') # On Windows
-# In this case 'use.C = TRUE' is required cause the R implementation is not
-# sufficient due to memory exhaustion (and runtime).
-
-# Load Dataset.
-# > dataset <- read.table(file = 'egg.extracted.means.txt', header = TRUE,
-# >                       stringsAsFactors = FALSE, check.names = FALSE)
-# Save as Rdata file for faster loading.
-# > saveRDS(dataset, file = 'eeg_data.rds')
-dataset <- readRDS('../data_analysis/eeg_data.rds')
-
-# Positive and negative case index.
-set.seed(42)
-zero <- sample(which(dataset$Case_Control == 0))
-one  <- sample(which(dataset$Case_Control == 1))
-
-# 10-fold test groups.
-zero <- list(zero[ 1: 4], zero[ 5: 8], zero[ 9:12], zero[13:16],
-             zero[17:20], zero[21:25], zero[26:30],
-             zero[31:35], zero[36:40], zero[41:45])
-
-one <- list(one[ 1: 8], one[ 9:16], one[17:24], one[25:32],
-            one[33:40], one[41:48], one[49:56],
-            one[57:63], one[64:70], one[71:77])
-
-# Iterate data folds.
-folds <- vector('list', 10)
-for (i in seq_along(folds)) {
-    cat('\r%d/%d ', i, length(folds))
-
-    # Call garbage collector.
-    gc()
-
-    # Formulate PFC-GEP for EEG data.
-    index <- c(zero[[i]], one[[i]])
-    X  <- scale(dataset[-index, -(1:2)], scale = FALSE, center = TRUE)
-    Fy <- scale(dataset$Case_Control[-index], scale = FALSE, center = TRUE)
-    B <- crossprod(X) / nrow(X)                     # Sigma
-    P_Fy <- Fy %*% solve(crossprod(Fy), t(Fy))
-    A <- crossprod(X, P_Fy %*% X) / nrow(X)         # Sigma_fit
-
-    # Before Starting POI on (very big GEP) call the garbage collector.
-    gc()
-    poi <- POI(A, B, 1L, lambda = lambda, use.C = TRUE)
-    rm(A, B)
-    gc()
-
-    # Set fold index.
-    poi$index = index
-
-    folds[[i]] <- poi
-}
-cat('\n')
-
-# Save complete 10 fold results.
-file <- sprintf('eeg_analysis_poi.rds')
-saveRDS(folds, file = file)
--- a/eeg_analysis/eeg_analysis_poi_step2.R
+++ b/eeg_analysis/eeg_analysis_poi_step2.R
@ -1,140 +0,0 @@
-suppressPackageStartupMessages({
-    library(pROC)
-})
-
-source('../tensor_predictors/approx_kronecker.R')
-source('../tensor_predictors/multi_assign.R')
-# Load EEG dataset
-dataset <- readRDS('eeg_data.rds')
-# Load EEG k-fold simulation results.
-folds <- readRDS('eeg_analysis_poi.rds')
-# Set dimenional parameters.
-p <- 64L    # nr. of predictors (count of sensorce)
-t <- 256L   # nr. of time points (measurements)
-
-labels      <- vector('list', length(folds))
-predictions <- vector('list', length(folds))
-alphas <- matrix(0, length(folds), t)
-betas  <- matrix(0, length(folds), p)
-# For each fold.
-for (i in seq_along(folds)) {
-    fold <- folds[[i]]
-    # Factorize POI result in alpha, beta.
-    c(alpha, beta) %<-% approx.kronecker(fold$Q, c(t, 1), c(p, 1))
-    # Drop small values of alpha, beta.
-    alpha[abs(alpha) < 1e-6] <- 0
-    beta[abs(beta) < 1e-6] <- 0
-    # Reconstruct B from factorization.
-    B <- kronecker(alpha, beta)
-    # Select folds train/test sets.
-    X_train <- as.matrix(dataset[-fold$index, -(1:2)])
-    y_train <- as.factor(dataset[-fold$index, 'Case_Control'])
-    X_test <- as.matrix(dataset[fold$index, -(1:2)])
-    y_test <- as.factor(dataset[fold$index, 'Case_Control'])
-    # Predict via a logit model building on the reduced data.
-    model <- glm(y ~ x, family = binomial(link = "logit"),
-                 data = data.frame(x = X_train %*% B, y = y_train))
-    y_hat <- predict(model, data.frame(x = X_test %*% B), type = "response")
-    # Set target and prediction values for the ROC curve.
-    labels[[i]]      <- y_test
-    predictions[[i]] <- y_hat
-    alphas[i, ] <- as.vector(alpha)
-    betas[i, ]  <- as.vector(beta)
-}
-
-# acc: Accuracy. P(Yhat = Y). Estimated as: (TP+TN)/(P+N).
-acc <- function(y_true, y_pred) mean(round(y_pred) == y_true)
-# err: Error rate. P(Yhat != Y). Estimated as: (FP+FN)/(P+N).
-err <- function(y_true, y_pred) mean(round(y_pred) != y_true)
-# fpr: False positive rate. P(Yhat = + | Y = -). aliases: Fallout.
-fpr <- function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 0])
-# tpr: True positive rate.  P(Yhat = + | Y = +). aliases: Sensitivity, Recall.
-tpr <- function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 1])
-# fnr: False negative rate. P(Yhat = - | Y = +). aliases: Miss.
-fnr <- function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 1])
-# tnr: True negative rate.  P(Yhat = - | Y = -).
-tnr <- function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 0])
-
-# Combined accuracy, error, ...
-cat("acc: ", acc(unlist(labels), unlist(predictions)), "\n",
-    "err: ", err(unlist(labels), unlist(predictions)), "\n",
-    "fpr: ", fpr(unlist(labels), unlist(predictions)), "\n",
-    "tpr: ", tpr(unlist(labels), unlist(predictions)), "\n",
-    "fnr: ", fnr(unlist(labels), unlist(predictions)), "\n",
-    "tnr: ", tnr(unlist(labels), unlist(predictions)), "\n",
-    "auc: ", roc(unlist(labels), unlist(predictions), quiet = TRUE)$auc, "\n",
-    sep = '')
-# Confidence interval for AUC.
-ci(roc(unlist(labels), unlist(predictions), quiet = TRUE))
-
-# Means of per fold accuracy, error, ...
-cat("acc: ", mean(mapply(acc, labels, predictions)), "\n",
-    "err: ", mean(mapply(err, labels, predictions)), "\n",
-    "fpr: ", mean(mapply(fpr, labels, predictions)), "\n",
-    "tpr: ", mean(mapply(tpr, labels, predictions)), "\n",
-    "fnr: ", mean(mapply(fnr, labels, predictions)), "\n",
-    "tnr: ", mean(mapply(tnr, labels, predictions)), "\n",
-    "auc: ", mean(mapply(function(...) roc(...)$auc, labels, predictions,
-        MoreArgs = list(direction = '<', quiet = TRUE))), "\n",
-    sep = '')
-# Means of per fold CI.
-rowMeans(mapply(function(...) ci(roc(...)), labels, predictions,
-    MoreArgs = list(direction = '<', quiet = TRUE)))
-sd(mapply(function(...) roc(...)$auc, labels, predictions,
-        MoreArgs = list(direction = '<', quiet = TRUE)))
-
-################################################################################
-###                                   plot                                   ###
-################################################################################
-multiplot <- function(..., plotlist = NULL, cols) {
-    library(grid)
-    # Make a list from the ... arguments and plotlist
-    plots <- c(list(...), plotlist)
-    numPlots = length(plots)
-    # Make the panel
-    plotCols = cols
-    # Number of rows needed, calculated from cols
-    plotRows = ceiling(numPlots / plotCols)
-    # Set up the page
-    grid.newpage()
-    pushViewport(viewport(layout = grid.layout(plotRows, plotCols)))
-    vplayout <- function(x, y) {
-        viewport(layout.pos.row = x, layout.pos.col = y)
-    }
-    # Make each plot, in the correct location
-    for (i in 1:numPlots) {
-        curRow = ceiling(i / plotCols)
-        curCol = (i - 1) %% plotCols + 1
-        print(plots[[i]], vp = vplayout(curRow, curCol))
-    }
-}
-
-pa <- ggplot(data.frame(time = rep(1:ncol(alphas), 2),
-                        means = c(colMeans(abs(alphas)), .5 * colMeans(!alphas)),
-                        type = factor(rep(c(0, 1), each = ncol(alphas)),
-                                      labels = c('mean', 'dropped'))),
-             aes(x = time, y = means, fill = type)) +
-    geom_col(position = 'dodge') +
-    labs(title = 'Components of alpha', x = 'time', y = 'means') +
-    coord_cartesian(ylim = c(0, 0.5)) +
-    scale_y_continuous(sec.axis = sec_axis(trans = ~ . * 2,
-                                           name = 'dropped',
-                                           labels = scales::percent)) +
-    theme(legend.position = 'top',
-          legend.title = element_blank())
-
-pb <- ggplot(data.frame(time = rep(1:ncol(betas), 2),
-                        means = c(colMeans(abs(betas)), .5 * colMeans(!betas)),
-                        type = factor(rep(c(0, 1), each = ncol(betas)),
-                                      labels = c('mean', 'dropped'))),
-             aes(x = time, y = means, fill = type)) +
-    geom_col(position = 'dodge') +
-    labs(title = 'Components of beta', x = 'sensors', y = 'means') +
-    coord_cartesian(ylim = c(0, 0.5)) +
-    scale_y_continuous(sec.axis = sec_axis(trans = ~ . * 2,
-                                           name = 'dropped',
-                                           labels = scales::percent)) +
-    theme(legend.position = 'top',
-          legend.title = element_blank())
-
-multiplot(pa, pb, cols = 1)
--- a/eeg_analysis/eeg_analysis_reduced.R
+++ b/eeg_analysis/eeg_analysis_reduced.R
@ -1,142 +0,0 @@
-suppressPackageStartupMessages({
-    library(pROC)
-})
-
-source('../tensor_predictors/approx_kronecker.R')
-source('../tensor_predictors/multi_assign.R')
-source('../tensor_predictors/tensor_predictors.R')
-source('../tensor_predictors/lsir.R')
-source('../tensor_predictors/pca2d.R')
-
-# acc: Accuracy. P(Yhat = Y). Estimated as: (TP+TN)/(P+N).
-acc <- function(y_true, y_pred) mean(round(y_pred) == y_true)
-# err: Error rate. P(Yhat != Y). Estimated as: (FP+FN)/(P+N).
-err <- function(y_true, y_pred) mean(round(y_pred) != y_true)
-# fpr: False positive rate. P(Yhat = + | Y = -). aliases: Fallout.
-fpr <- function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 0])
-# tpr: True positive rate.  P(Yhat = + | Y = +). aliases: Sensitivity, Recall.
-tpr <- function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 1])
-# fnr: False negative rate. P(Yhat = - | Y = +). aliases: Miss.
-fnr <- function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 1])
-# tnr: True negative rate.  P(Yhat = - | Y = -).
-tnr <- function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 0])
-
-# Load EEG dataset
-dataset <- readRDS('eeg_data.rds')
-
-#' @param ppc Number of "p"redictor "p"rincipal "c"omponents.
-#' @param tpc Number of "t"ime "p"rincipal "c"omponents.
-egg_analysis_reduced <- function(methods, ppc, tpc) {
-    # Set dimenional parameters.
-    n <- nrow(dataset)  # sample size (nr. of people)
-    p <- 64L            # nr. of predictors (count of sensorce)
-    t <- 256L           # nr. of time points (measurements)
-
-    # Extract dimension names from X.
-    nNames <- dataset$PersonID
-    tNames <- as.character(seq(t))
-    pNames <- unlist(strsplit(colnames(dataset)[2 + t * seq(p)], '_'))[c(T, F)]
-
-    # Split into X-y.
-    X <- as.matrix(dataset[, -(1:2)])
-    y <- dataset$Case_Control
-    # Reshape X as 3D tenros of shape (n, t, p) aka. samples, timesteps, predictors.
-    # (Each of the n rows in X iterate over the time bevore switching sensorce.)
-    X <- array(X, dim = c(n, t, p),
-                dimnames = list(nNames, tNames, pNames))
-    # Reorder axis to (p, t, n) = (predictors, timesteps, samples).
-    X <- aperm(X, c(3, 2, 1))
-
-    # Compute Mean of X.
-    X_mean <- apply(X, c(1, 2), mean)
-    X_center <- X - as.vector(X_mean)
-
-    # Compute "left" and "right" cov-matrices.
-    Sigma_t <- matrix(apply(apply(X_center, 3,  crossprod), 1, mean), t, t)
-    Sigma_p <- matrix(apply(apply(X_center, 3, tcrossprod), 1, mean), p, p)
-    # Get "left", "right" principal components.
-    V_p <- svd(Sigma_p, ppc, 0L)$u
-    V_t <- svd(Sigma_t, tpc, 0L)$u
-
-    # Reduce dimension.
-    X_reduced <- apply(X_center, 3, function(x) crossprod(V_p, x %*% V_t))
-    dim(X_reduced) <- c(ppc, tpc, n)
-
-    # Vectorize to shape of (predictors * timesteps, samples) and transpose to
-    # (samples, predictors * timesteps).
-    X_vec <- t(matrix(X_reduced, ppc * tpc, n))
-
-    loo.cv <- expand.grid(method = names(methods), fold = 1:n)
-    loo.cv$y_true <- y[loo.cv$fold]
-    loo.cv$y_pred <- NA
-
-    # Performe LOO cross-validation for each method.
-    for (i in 1L:n) {
-        # Print progress.
-        cat(sprintf("\rCross-Validation (p-PC: %d, t-PC: %d): %4d/%d",
-                    ppc, tpc, i, n))
-        # Leave Out the i-th element.
-        X_train <- X_vec[-i, ]
-        X_test <- X_vec[i, ]
-        y_train <- y[-i]
-        # Center y.
-        y_train <- scale(y_train, center = TRUE, scale = FALSE)
-
-        # For each method.
-        for (method.name in names(methods)) {
-            method <- methods[[method.name]]
-            # Compute reduction using current method under common API.
-            sdr <- method(X_train, y_train, ppc, tpc)
-            B <- kronecker(sdr$alpha, sdr$beta)
-            # Fit a linear model (which ensures a common sdr direction if possible).
-            model <- glm(y ~ x, family = binomial(link = "logit"),
-                        data = data.frame(y = y[-i], x = X_train %*% B))
-            # Predict out of sample and store in LOO CV data.frame.
-            y_pred <- predict(model, data.frame(x = X_test %*% B), type = "response")
-            loo.cv[loo.cv$method == method.name & loo.cv$fold == i, 'y_pred'] <- y_pred
-        }
-    }
-
-    for (method.name in names(methods)) {
-        labels <- loo.cv[loo.cv$method == method.name, 'y_true']
-        predictions <- loo.cv[loo.cv$method == method.name, 'y_pred']
-        ROC <- roc(unlist(labels), unlist(predictions), quiet = TRUE)
-        # Combined accuracy, error, ...
-        cat("\nMethod: ", method.name, "\n",
-            "acc: ", acc(unlist(labels), unlist(predictions)), "\n",
-            "err: ", err(unlist(labels), unlist(predictions)), "\n",
-            "fpr: ", fpr(unlist(labels), unlist(predictions)), "\n",
-            "tpr: ", tpr(unlist(labels), unlist(predictions)), "\n",
-            "fnr: ", fnr(unlist(labels), unlist(predictions)), "\n",
-            "tnr: ", tnr(unlist(labels), unlist(predictions)), "\n",
-            "auc: ", ROC$auc, "\n",
-            "auc sd: ", sqrt(var(ROC)), "\n",
-            sep = '')
-    }
-
-    loo.cv
-}
-
-methods <- list(
-    KPIR_LS  = function(...) tensor_predictor(..., method = "KPIR_LS"),
-    KPIR_MLE = function(...) tensor_predictor(..., method = "KPIR_MLE"),
-    KPFC1    = function(...) tensor_predictor(..., method = "KPFC1"),
-    KPFC2    = function(...) tensor_predictor(..., method = "KPFC2"),
-    LSIR     = LSIR
-)
-
-#                 ppc, tpc
-#              ------------
-params <- list( c(  4,   3)
-              , c( 15,  15)
-              , c( 30,  20)
-)
-
-for (param in params) {
-    c(ppc, tpc) %<-% param
-    sim <- egg_analysis_reduced(methods, ppc, tpc)
-
-    attr(sim, 'param') <- c(ppc = ppc, tpc = tpc)
-
-    saveRDS(sim, file = sprintf('eeg_analysis_reduced_%d_%d.rds', ppc, tpc))
-}
--- a/examples/example_cross_validation.R
+++ b/examples/example_cross_validation.R
@ -1,53 +0,0 @@
-# # Generate Sample Data.
-# n <- 250
-# # see: simulation_binary.R
-# data <- simulateData.binary(n / 2, n / 2, (p <- 10), (t <- 5), 0.3, 0.3)
-# X <- data$X
-# colnames(X) <- paste('X[', outer(1:p, 1:t, paste, sep = ','), ']', sep = '')
-# Y <- 2 * data$Y
-# write.csv(data.frame(X, Y), file = 'example_data.csv', row.names = FALSE)
-
-suppressPackageStartupMessages({
-    library(pROC)
-})
-
-source('../tensor_predictors/tensor_predictors.R')
-
-# Read sample data from file and split into predictors and responces.
-data <- read.csv('example_data.csv')
-X <- as.matrix(data[, names(data) != 'Y'])
-Y <- as.matrix(data[, 'Y'])
-
-# Set parameters (and check)
-n <- nrow(X)
-p <- 10
-t <- 5
-stopifnot(p * t == ncol(X))
-
-# Setup 10-fold (folds contains indices of the test set).
-folds <- split(sample.int(n), (seq(0, n - 1) * 10) %/% n)
-labels      <- vector('list', 10) # True test values (per fold)
-predictions <- vector('list', 10) # Predictions on test set.
-
-for (i in seq_along(folds)) {
-    fold <- folds[[i]]
-    # Split data into train and test sets.
-    X.train <- X[-fold, ]
-    Y.train <- Y[-fold, , drop = FALSE]
-    X.test <- X[fold, ]
-    Y.test <- Y[fold, , drop = FALSE]
-
-    # Compute reduction (method = c('KPIR_LS' ,'KPIR_MLE', 'KPFC1', 'KPFC2', 'KPFC3'))
-    # or LSIR(X.train, Y.train, p, t) in 'lsir.R'.
-    dr <- tensor_predictor(X.train, Y.train, p, t, method = 'KPIR_LS')
-    B <- kronecker(dr$alpha, dr$beta) # Also available: Gamma_1, Gamma_2, Gamma, B.
-    # Predict via a logit model building on the reduced data.
-    model <- glm(y ~ x, family = binomial(link = "logit"),
-                 data = data.frame(x = X.train %*% B, y = as.integer(Y.train > 0)))
-
-    labels[[i]] <- as.integer(Y.test > 0)
-    predictions[[i]] <- predict(model, data.frame(x = X.test %*% B), type = "response")
-}
-
-(meanAUC <- mean(mapply(function(...) roc(...)$auc, labels, predictions,
-                        MoreArgs = list(direction = '<', quiet = TRUE))))
--- a/examples/example_loo.R
+++ b/examples/example_loo.R
@ -1,56 +0,0 @@
-# # Generate Sample Data.
-# n <- 250
-# # see: simulation_binary.R
-# data <- simulateData.binary(n / 2, n / 2, (p <- 10), (t <- 5), 0.3, 0.3)
-# X <- data$X
-# colnames(X) <- paste('X[', outer(1:p, 1:t, paste, sep = ','), ']', sep = '')
-# Y <- 2 * data$Y
-# write.csv(data.frame(X, Y), file = 'example_data.csv', row.names = FALSE)
-
-suppressPackageStartupMessages({
-    library(pROC)
-})
-
-source('../tensor_predictors/tensor_predictors.R')
-
-# Read sample data from file and split into predictors and responces.
-data <- read.csv('example_data.csv')
-X <- as.matrix(data[, names(data) != 'Y'])
-Y <- as.matrix(data[, 'Y'])
-
-# Set parameters (and check)
-n <- nrow(X)
-p <- 10
-t <- 5
-stopifnot(p * t == ncol(X))
-
-# Setup folds (folds contains indices of the test set).
-nr.folds <- n # leave-one-out when number of folds equals the sample size `n`.
-folds <- split(sample.int(n), (seq(0, n - 1) * nr.folds) %/% n)
-labels      <- vector('list', nr.folds) # True test values (per fold)
-predictions <- vector('list', nr.folds) # Predictions on test set.
-
-for (i in seq_along(folds)) {
-    fold <- folds[[i]]
-    # Split data into train and test sets.
-    X.train <- X[-fold, ]
-    Y.train <- Y[-fold, , drop = FALSE]
-    X.test <- X[fold, ]
-    Y.test <- Y[fold, , drop = FALSE]
-
-    # Compute reduction (method = c('KPIR_LS' ,'KPIR_MLE', 'KPFC1', 'KPFC2', 'KPFC3'))
-    # or LSIR(X.train, Y.train, p, t) in 'lsir.R'.
-    dr <- tensor_predictor(X.train, Y.train, p, t, method = 'KPIR_LS')
-    B <- kronecker(dr$alpha, dr$beta) # Also available: Gamma_1, Gamma_2, Gamma, B.
-    # Predict via a logit model building on the reduced data.
-    model <- glm(y ~ x, family = binomial(link = "logit"),
-                 data = data.frame(x = X.train %*% B, y = as.integer(Y.train > 0)))
-
-    labels[[i]] <- as.integer(Y.test > 0)
-    predictions[[i]] <- predict(model, data.frame(x = X.test %*% B), type = "response")
-}
-
-# Compute classic ROC for predicted samples (mean AUC makes no sense for leave-one-out)
-y.true <- unlist(labels)
-y.pred <- unlist(predictions)
-roc(y.true, y.pred)
--- a/examples/poi_example.R
+++ b/examples/poi_example.R
@ -1,37 +0,0 @@
-# implementation contains fallback if the package is not available but for this
-# case required!
-library(RSpectra)
-
-# Load POI function and compiled C subroutine.
-source('../tensor_predictors/poi.R')
-dyn.load('../tensor_predictors/poi.so') # "Shared Object" of POI-Subrountine
-
-# Load data from sent data file (last Email)
-dataset <- readRDS('../eeg_analysis/eeg_data.rds')
-
-maxit <- 400L # Upper bound for number of optimization iterations.
-
-for (i in 1:nrow(dataset)) {
-    gc() # To be on the save side, call the garbage collector (free memory)
-
-    # Formulate PFC-GEP (Principal Fitted Components - Generalized Eigenvalue
-    # Problem) for EEG data.
-    X  <- scale(dataset[-i, -(1:2)], scale = FALSE, center = TRUE)
-    Fy <- scale(dataset$Case_Control[-i], scale = FALSE, center = TRUE)
-    B <- crossprod(X) / nrow(X)                     # Sigma
-    P_Fy <- Fy %*% solve(crossprod(Fy), t(Fy))
-    A <- crossprod(X, P_Fy %*% X) / nrow(X)         # Sigma_fit
-
-    # Call POI using C subroutine (requires "dyn.load" of subroutine)
-    poi_res <- POI(A, B, 1L, maxit = maxit, use.C = TRUE)
-    # Again, be nice to memory and delete with an explicit fall to gc.
-    rm(A, B)
-    gc()
-
-    # Store results, do analysis, ... (addapt to needs) .
-    poi_res$maxit = maxit
-    poi_res$loo_index = i # Keep track of LOO position.
-
-    # Save i-th LOO result to file for analysis/validation/visualization/...
-    saveRDS(poi_res, file = sprintf('eeg_poi_loo_%d.rds', i))
-}
--- a/sim/ising.R
+++ b/sim/ising.R
@ -4,6 +4,7 @@ library(mvbernoulli)
 set.seed(161803399, "Mersenne-Twister", "Inversion", "Rejection")

 ### simulation configuration
+file.prefix <- "sim-ising"
 reps <- 100                     # number of simulation replications
 max.iter <- 100                 # maximum number of iterations for GMLM
 sample.sizes <- c(100, 200, 300, 500, 750)  # sample sizes `n`
@ -134,7 +135,7 @@ globalCallingHandlers(list(
 start <- format(Sys.time(), "%Y%m%dT%H%M")
 for (n in sample.sizes) {
    ### write new simulation result file
-    file <- paste0(paste("sim-ising", start, n, sep = "-"), ".csv")
+    file <- paste0(paste(file.prefix, start, n, sep = "-"), ".csv")
    # CSV header, used to ensure correct value/column mapping when writing to file
    header <- outer(
        c("dist.subspace", "dist.projection", "error.pred"),  # measures
@ -162,8 +163,8 @@ for (n in sample.sizes) {
        fit.tsir <- NA # TSIR(X, y, q, sample.axis = sample.axis)

        ### Compute Reductions `B.*` where `B.*` spans the reduction subspaces
-        B.true <- Reduce(`%x%`, rev(Map(`%*%`, Omegas, alphas)))
-        B.gmlm <- with(fit.gmlm, Reduce(`%x%`, rev(Map(`%*%`, Omegas, alphas))))
+        B.true <- Reduce(`%x%`, rev(alphas))
+        B.gmlm <- with(fit.gmlm, Reduce(`%x%`, rev(alphas)))
        B.hopca <- Reduce(`%x%`, rev(fit.hopca))
        B.pca <- fit.pca$rotation
        B.tsir <- NA # Reduce(`%x%`, rev(fit.tsir))
--- a/sim/ising_small.R
+++ b/sim/ising_small.R
@ -0,0 +1,207 @@
+library(tensorPredictors)
+library(mvbernoulli)
+
+# seed = first 8 digits Euler's constant gamma = 0.57721 56649 01532 86060
+set.seed(57721566, "Mersenne-Twister", "Inversion", "Rejection")
+
+### simulation configuration
+file.prefix <- "sim-ising-small"
+reps <- 100                     # number of simulation replications
+max.iter <- 1000                # maximum number of iterations for GMLM
+sample.sizes <- c(100, 200, 300, 500, 750)  # sample sizes `n`
+N <- 2000                       # validation set size
+p <- c(2, 3)                    # preditor dimensions
+q <- c(1, 1)                    # response dimensions
+r <- length(p)
+# parameter configuration
+rho <- -0.55
+c1 <- 1
+c2 <- 1
+
+# initial consistency checks
+stopifnot(exprs = {
+    r == 2
+    length(p) == r
+    all(q == 1)
+})
+
+### small helpers
+# 270 deg matrix layout rotation (90 deg clockwise)
+rot270 <- function(A) t(A)[, rev(seq_len(nrow(A))), drop = FALSE]
+# Auto-Regression Covariance Matrix
+AR <- function(rho, dim) rho^abs(outer(seq_len(dim), seq_len(dim), `-`))
+# Inverse of the AR matrix
+AR.inv <- function(rho, dim) {
+    A <- diag(c(1, rep(rho^2 + 1, dim - 2), 1))
+    A[abs(.row(dim(A)) - .col(dim(A))) == 1] <- -rho
+    A / (1 - rho^2)
+}
+# projection matrix `P_A` as a projection onto the span of `A`
+proj <- function(A) tcrossprod(A, A %*% solve(crossprod(A, A)))
+
+### setup Ising parameters (to get reasonable parameters)
+eta1 <- 0
+alphas <- Map(function(pj, qj) {
+    data <- linspace <- seq(-1, 1, len = pj)
+    for (k in (seq_len(qj - 1) + 1)) {
+        data <- c(data, linspace^k)
+    }
+    matrix(data, nrow = pj)
+}, p, q)
+Omegas <- Map(AR, dim = p, MoreArgs = list(rho))
+
+# data sampling routine
+sample.data <- function(n, eta1, alphas, Omegas, sample.axis = r + 1L) {
+    # generate response (sample axis is last axis)
+    y <- runif(n, -1, 1)                    # Y ~ U[-1, 1]
+    Fy <- array(sin(pi * y), dim = c(q, n))
+
+    # natural exponential family parameters
+    eta_y1 <- c1 * (mlm(Fy, alphas) + c(eta1))
+    eta_y2 <- c2 * Reduce(`%x%`, rev(Omegas))
+
+    # conditional Ising model parameters
+    theta_y <- matrix(rep(vech(eta_y2), n), ncol = n)
+    ltri <- which(lower.tri(eta_y2, diag = TRUE))
+    diagonal <- which(diag(TRUE, nrow(eta_y2))[ltri])
+    theta_y[diagonal, ] <- eta_y1
+
+    # Sample X from conditional distribution
+    X <- apply(theta_y, 2, ising_sample, n = 1)
+    # convert (from compressed integer vector) to array data
+    attr(X, "p") <- prod(p)
+    X <- t(as.mvbmatrix(X))
+    dim(X) <- c(p, n)
+    storage.mode(X) <- "double"
+
+    # permute axis to requested get the sample axis
+    if (sample.axis != r + 1L) {
+        perm <- integer(r + 1L)
+        perm[sample.axis] <- r + 1L
+        perm[-sample.axis] <- seq_len(r)
+        X <- aperm(X, perm)
+        Fy <- aperm(Fy, perm)
+    }
+
+    list(X = X, Fy = Fy, y = y, sample.axis = sample.axis)
+}
+
+### Logging Errors and Warnings
+# Register a global warning and error handler for logging warnings/errors with
+# current simulation repetition session informatin allowing to reproduce problems
+exceptionLogger <- function(ex) {
+    # retrieve current simulation repetition information
+    rep.info <- get("rep.info", envir = .GlobalEnv)
+    # setup an error log file with the same name as `file`
+    log <- paste0(rep.info$file, ".log")
+    # Write (append) condition message with reproduction info to the log
+    cat("\n\n------------------------------------------------------------\n",
+        sprintf("file <- \"%s\"\nn <- %d\nrep <- %d\n.Random.seed <- c(%s)\n%s\nTraceback:\n",
+            rep.info$file, rep.info$n, rep.info$rep,
+            paste(rep.info$.Random.seed, collapse = ","),
+            as.character.error(ex)
+    ), sep = "", file = log, append = TRUE)
+    # add Traceback (see: `traceback()` which the following is addapted from)
+    n <- length(x <- .traceback(NULL, max.lines = -1L))
+    if (n == 0L) {
+        cat("No traceback available", "\n", file = log, append = TRUE)
+    } else {
+        for (i in 1L:n) {
+            xi <- x[[i]]
+            label <- paste0(n - i + 1L, ": ")
+            m <- length(xi)
+            srcloc <- if (!is.null(srcref <- attr(xi, "srcref"))) {
+                srcfile <- attr(srcref, "srcfile")
+                paste0(" at ", basename(srcfile$filename), "#", srcref[1L])
+            }
+            if (isTRUE(attr(xi, "truncated"))) {
+                xi <- c(xi, " ...")
+                m <- length(xi)
+            }
+            if (!is.null(srcloc)) {
+                xi[m] <- paste0(xi[m], srcloc)
+            }
+            if (m > 1) {
+                label <- c(label, rep(substr("          ", 1L,
+                  nchar(label, type = "w")), m - 1L))
+            }
+            cat(paste0(label, xi), sep = "\n", file = log, append = TRUE)
+        }
+    }
+}
+globalCallingHandlers(list(
+    message = exceptionLogger, warning = exceptionLogger, error = exceptionLogger
+))
+
+
+### for every sample size
+start <- format(Sys.time(), "%Y%m%dT%H%M")
+for (n in sample.sizes) {
+    ### write new simulation result file
+    file <- paste0(paste(file.prefix, start, n, sep = "-"), ".csv")
+    # CSV header, used to ensure correct value/column mapping when writing to file
+    header <- outer(
+        c("dist.subspace", "dist.projection", "error.pred"),  # measures
+        c("gmlm", "pca", "hopca", "tsir"),                    # methods
+        paste, sep = ".")
+    cat(paste0(header, collapse = ","), "\n", sep = "", file = file)
+
+    ### repeated simulation
+    for (rep in seq_len(reps)) {
+        ### Repetition session state info
+        # Stores specific session variables before starting the current
+        # simulation replication. This allows to log state information which
+        # can be used to replicate a specific simulation repetition in case of
+        # errors/warnings from the logs
+        rep.info <- list(n = n, rep = rep, file = file, .Random.seed = .Random.seed)
+
+        ### sample (training) data
+        c(X, Fy, y, sample.axis) %<-% sample.data(n, eta1, alphas, Omegas)
+
+        ### Fit data using different methods
+        fit.gmlm <- GMLM.default(X, Fy, sample.axis = sample.axis,
+            max.iter = max.iter, family = "ising")
+        fit.hopca <- HOPCA(X, npc = q, sample.axis = sample.axis)
+        fit.pca <- prcomp(mat(X, sample.axis), rank. = prod(q))
+        fit.tsir <- TSIR(X, y, q, sample.axis = sample.axis)
+
+        ### Compute Reductions `B.*` where `B.*` spans the reduction subspaces
+        B.true <- Reduce(`%x%`, rev(alphas))
+        B.gmlm <- with(fit.gmlm, Reduce(`%x%`, rev(alphas)))
+        B.hopca <- Reduce(`%x%`, rev(fit.hopca))
+        B.pca <- fit.pca$rotation
+        B.tsir <- Reduce(`%x%`, rev(fit.tsir))
+
+        # Subspace Distances: Normalized `|| P_A - P_B ||_F` where
+        #   `P_A = A (A' A)^-1/2 A'` and the normalization means that with
+        #   respect to the dimensions of `A, B` the subspace distance is in the
+        #   range `[0, 1]`.
+        dist.subspace.gmlm  <- dist.subspace(B.true, B.gmlm,  normalize = TRUE)
+        dist.subspace.hopca <- dist.subspace(B.true, B.hopca, normalize = TRUE)
+        dist.subspace.pca   <- dist.subspace(B.true, B.pca,   normalize = TRUE)
+        dist.subspace.tsir  <- dist.subspace(B.true, B.tsir,  normalize = TRUE)
+
+        # Projection Distances: Spectral norm (2-norm) `|| P_A - P_B ||_2`.
+        dist.projection.gmlm  <- dist.projection(B.true, B.gmlm)
+        dist.projection.hopca <- dist.projection(B.true, B.hopca)
+        dist.projection.pca   <- dist.projection(B.true, B.pca)
+        dist.projection.tsir  <- dist.projection(B.true, B.tsir)
+
+        ### Prediction Errors: (using new independend sample of size `N`)
+        c(X, Fy, y, sample.axis) %<-% sample.data(N, eta1, alphas, Omegas)
+        # centered model matrix of vectorized `X`s
+        vecX <- scale(mat(X, sample.axis), center = TRUE, scale = FALSE)
+        P.true <- proj(B.true)
+        error.pred.gmlm  <- norm(P.true - proj(B.gmlm),  "2")
+        error.pred.hopca <- norm(P.true - proj(B.hopca), "2")
+        error.pred.pca   <- norm(P.true - proj(B.pca),   "2")
+        error.pred.tsir  <- norm(P.true - proj(B.tsir),  "2")
+
+        # format estimation/prediction errors and write to file and console
+        line <- paste0(Map(get, header), collapse = ",")
+        cat(line, "\n", sep = "", file = file, append = TRUE)
+        # report progress
+        cat(sprintf("sample size: %d/%d - rep: %d/%d\n",
+            which(n == sample.sizes), length(sample.sizes), rep, reps))
+    }
+}
--- a/sim/ising_small_2.R
+++ b/sim/ising_small_2.R
@ -0,0 +1,131 @@
+library(tensorPredictors)
+library(mvbernoulli)
+
+# seed = leaf node count of a full chess search tree of depth 6 from the start pos
+# > position startpos
+# > go perft 6
+set.seed(119060324, "Mersenne-Twister", "Inversion", "Rejection")
+
+### simulation configuration
+reps <- 100                     # number of simulation replications
+max.iter <- 1000                # maximum number of iterations for GMLM
+n <- 100                        # sample sizes `n`
+N <- 2000                       # validation set size
+p <- c(2, 3)                    # preditor dimensions
+q <- c(1, 1)                    # response dimensions
+r <- length(p)
+# parameter configuration
+rho <- -0.55
+c1 <- 1
+c2 <- 1
+
+# initial consistency checks
+stopifnot(exprs = {
+    r == 2
+    length(p) == r
+    all(q == 1)
+})
+
+### small helpers
+# 270 deg matrix layout rotation (90 deg clockwise)
+rot270 <- function(A) t(A)[, rev(seq_len(nrow(A))), drop = FALSE]
+# Auto-Regression Covariance Matrix
+AR <- function(rho, dim) rho^abs(outer(seq_len(dim), seq_len(dim), `-`))
+# Inverse of the AR matrix
+AR.inv <- function(rho, dim) {
+    A <- diag(c(1, rep(rho^2 + 1, dim - 2), 1))
+    A[abs(.row(dim(A)) - .col(dim(A))) == 1] <- -rho
+    A / (1 - rho^2)
+}
+# projection matrix `P_A` as a projection onto the span of `A`
+proj <- function(A) tcrossprod(A, A %*% solve(crossprod(A, A)))
+
+### setup Ising parameters (to get reasonable parameters)
+eta1 <- 0
+alphas <- Map(function(pj, qj) {
+    data <- linspace <- seq(-1, 1, len = pj)
+    for (k in (seq_len(qj - 1) + 1)) {
+        data <- c(data, linspace^k)
+    }
+    matrix(data, nrow = pj)
+}, p, q)
+Omegas <- Map(AR, dim = p, MoreArgs = list(rho))
+
+# data sampling routine
+sample.data <- function(n, eta1, alphas, Omegas, sample.axis = r + 1L) {
+    # generate response (sample axis is last axis)
+    y <- runif(n, -1, 1)                    # Y ~ U[-1, 1]
+    Fy <- array(sin(pi * y), dim = c(q, n))
+
+    # natural exponential family parameters
+    eta_y1 <- c1 * (mlm(Fy, alphas) + c(eta1))
+    eta_y2 <- c2 * Reduce(`%x%`, rev(Omegas))
+
+    # conditional Ising model parameters
+    theta_y <- matrix(rep(vech(eta_y2), n), ncol = n)
+    ltri <- which(lower.tri(eta_y2, diag = TRUE))
+    diagonal <- which(diag(TRUE, nrow(eta_y2))[ltri])
+    theta_y[diagonal, ] <- eta_y1
+
+    # Sample X from conditional distribution
+    X <- apply(theta_y, 2, ising_sample, n = 1)
+    # convert (from compressed integer vector) to array data
+    attr(X, "p") <- prod(p)
+    X <- t(as.mvbmatrix(X))
+    dim(X) <- c(p, n)
+    storage.mode(X) <- "double"
+
+    # permute axis to requested get the sample axis
+    if (sample.axis != r + 1L) {
+        perm <- integer(r + 1L)
+        perm[sample.axis] <- r + 1L
+        perm[-sample.axis] <- seq_len(r)
+        X <- aperm(X, perm)
+        Fy <- aperm(Fy, perm)
+    }
+
+    list(X = X, Fy = Fy, y = y, sample.axis = sample.axis)
+}
+
+
+# logger to log iterative change in the estimation process of GMLM
+# log <- data.frame()
+log.likelihood <- tensorPredictors:::make.gmlm.family("ising")$log.likelihood
+B.true <- Reduce(`%x%`, rev(alphas))
+logger <- function(iter, eta1.est, alphas.est, Omegas.est) {
+    B.est <- Reduce(`%x%`, rev(alphas.est))
+
+    err.alphas <- mapply(dist.subspace, alphas, alphas.est, MoreArgs = list(normalize = TRUE))
+    err.Omegas <- mapply(norm, Map(`-`, Omegas, Omegas.est), MoreArgs = list(type = "F"))
+
+    if (iter > 0) { cat("\033[9A") }
+    cat(sprintf("\n\033[2mIter:     loss -     dist\n\033[0m%4d: %8.3f - %8.3f",
+            iter,
+            log.likelihood(X, Fy, eta1.est, alphas.est, Omegas.est),
+            dist.subspace(B.true, B.est, normalize = TRUE)
+        ),
+        "\033[2mMSE eta1\033[0m",
+        mean((eta1 - eta1.est)^2),
+        "\033[2msubspace distances of alphas\033[0m",
+        do.call(paste, Map(sprintf, err.alphas, MoreArgs = list(fmt = "%8.3f"))),
+        "\033[2mFrob. norm of Omega differences\033[0m",
+        do.call(paste, Map(sprintf, err.Omegas, MoreArgs = list(fmt = "%8.3f"))),
+        sep = "\n      "
+    )
+}
+
+### sample (training) data
+c(X, Fy, y, sample.axis) %<-% sample.data(n, eta1, alphas, Omegas)
+
+# now call the GMLM fitting routine with performance profiling
+tryCatch({
+    system.time(   # profvis::profvis(
+        fit.gmlm <- GMLM.default(
+            X, Fy, sample.axis = sample.axis, max.iter = max.iter,
+            family = "ising", logger = logger
+        )
+    )
+}, error = function(ex) {
+    print(ex)
+    traceback()
+})
--- a/sim/normal.R
+++ b/sim/normal.R
@ -3,6 +3,7 @@ library(tensorPredictors)
 set.seed(314159265, "Mersenne-Twister", "Inversion", "Rejection")

 ### simulation configuration
+file.prefix <- "sim-normal"
 reps <- 100                     # number of simulation replications
 max.iter <- 10000               # maximum number of iterations for GMLM
 sample.sizes <- c(100, 200, 300, 500, 750)  # sample sizes `n`
@ -102,7 +103,7 @@ globalCallingHandlers(list(
 start <- format(Sys.time(), "%Y%m%dT%H%M")
 for (n in sample.sizes) {
    ### write new simulation result file
-    file <- paste0(paste("sim-normal", start, n, sep = "-"), ".csv")
+    file <- paste0(paste(file.prefix, start, n, sep = "-"), ".csv")
    # CSV header, used to ensure correct value/column mapping when writing to file
    header <- outer(
        c("dist.subspace", "dist.projection", "error.pred"),  # measures
@ -129,8 +130,8 @@ for (n in sample.sizes) {
        fit.tsir <- TSIR(X, y, q, sample.axis = sample.axis)

        ### Compute Reductions `B.*` where `B.*` spans the reduction subspaces
-        B.true <- Reduce(`%x%`, rev(Map(`%*%`, Omegas, alphas)))
-        B.gmlm <- with(fit.gmlm, Reduce(`%x%`, rev(Map(`%*%`, Omegas, alphas))))
+        B.true <- Reduce(`%x%`, rev(alphas))
+        B.gmlm <- with(fit.gmlm, Reduce(`%x%`, rev(alphas)))
        B.hopca <- Reduce(`%x%`, rev(fit.hopca))
        B.pca <- fit.pca$rotation
        B.tsir <- Reduce(`%x%`, rev(fit.tsir))
--- a/sim/plots.R
+++ b/sim/plots.R
@ -3,14 +3,15 @@ if (!endsWith(getwd(), "/sim")) {
    setwd("sim")
 }

-date <- "20221007"  # yyyymmdd, to match all "[0-9]{6}"
+file.prefix <- "sim-ising-small"
+date <- "20221012"  # yyyymmdd, to match all "[0-9]{6}"
 time <- "[0-9]{4}"  # HHMM,     to match all "[0-9]{4}"
 sim <- Reduce(rbind, Map(function(path) {
    df <- read.csv(path)
-    df$n <- as.integer(strsplit(path, "[-.]")[[1]][[4]])
+    df$n <- as.integer(tail(head(strsplit(path, "[-.]")[[1]], -1), 1))
    df
 }, list.files(".", pattern = paste0(
-    "^sim-normal-", date, "T", time, "-[0-9]+[.]csv$", collapse = ""
+    "^", file.prefix, "-", date, "T", time, "-[0-9]+[.]csv$", collapse = ""
 ))))

 stats <- aggregate(. ~ n, sim, mean)
@ -21,26 +22,45 @@ colors <- c(gmlm = "#247407", hopca = "#2a62b6", pca = "#a11414", tsir = "#9313b
 line.width <- 1.75
 margins <- c(5.1, 4.1, 4.1, 0.1)

+
+layout(mat = matrix(c(
+    1, 2,
+    3, 3
+), 2, 2, byrow = TRUE),
+    widths = c(1, 1),
+    heights = c(8, 1), respect = FALSE)
+# layout.show(3)
+
 with(stats, {
    par(mar = margins)
    plot(range(n), c(0, 1.05),
-        type = "n", bty = "n", main = "Estimation Error",
+        type = "n", bty = "n", main = "Subspace Distance",
        xlab = "Sample Size", ylab = "Error")
-    lines(n, dist.projection.gmlm,  col = colors["gmlm"],  lwd = line.width)
-    lines(n, dist.projection.hopca, col = colors["hopca"], lwd = line.width)
-    lines(n, dist.projection.pca,   col = colors["pca"],   lwd = line.width)
-    lines(n, dist.projection.tsir,  col = colors["tsir"],  lwd = line.width)
+    lines(n, dist.subspace.gmlm,  col = colors["gmlm"],  lwd = line.width)
+    lines(n, dist.subspace.hopca, col = colors["hopca"], lwd = line.width)
+    lines(n, dist.subspace.pca,   col = colors["pca"],   lwd = line.width)
+    lines(n, dist.subspace.tsir,  col = colors["tsir"],  lwd = line.width)

-    par(mar = rep(0, 4))
-    legend("topright", legend = names(colors), col = colors, lwd = line.width,
-        lty = 1, bty = "n")
-    par(mar = margins)
+    xn <- c(q75$n, rev(q25$n))
+    polygon(x = xn, y = c(q75$dist.subspace.gmlm, rev(q25$dist.subspace.gmlm)),
+        col = adjustcolor(colors["gmlm"], alpha.f = 0.3), border = NA)
+    polygon(x = xn, y = c(q75$dist.subspace.hopca, rev(q25$dist.subspace.hopca)),
+        col = adjustcolor(colors["hopca"], alpha.f = 0.3), border = NA)
+    polygon(x = xn, y = c(q75$dist.subspace.pca, rev(q25$dist.subspace.pca)),
+        col = adjustcolor(colors["pca"], alpha.f = 0.3), border = NA)
+    polygon(x = xn, y = c(q75$dist.subspace.tsir, rev(q25$dist.subspace.tsir)),
+        col = adjustcolor(colors["tsir"], alpha.f = 0.3), border = NA)
+
+    # par(mar = rep(0, 4))
+    # legend("topright", legend = names(colors), col = colors, lwd = line.width,
+    #     lty = 1, bty = "n")
+    # par(mar = margins)
 })

 with(stats, {
    par(mar = margins)
    plot(range(n), c(0, 1.05),
-        type = "n", bty = "n", main = "Root Mean Squared Prediction Error",
+        type = "n", bty = "n", main = "RMSE (Prediction Error)",
        xlab = "Sample Size", ylab = "Error")
    xn <- c(q75$n, rev(q25$n))
    polygon(x = xn, y = c(q75$error.pred.gmlm, rev(q25$error.pred.gmlm)),
@ -49,13 +69,20 @@ with(stats, {
        col = adjustcolor(colors["hopca"], alpha.f = 0.3), border = NA)
    polygon(x = xn, y = c(q75$error.pred.pca, rev(q25$error.pred.pca)),
        col = adjustcolor(colors["pca"], alpha.f = 0.3), border = NA)
+    polygon(x = xn, y = c(q75$error.pred.tsir, rev(q25$error.pred.tsir)),
+        col = adjustcolor(colors["tsir"], alpha.f = 0.3), border = NA)
    lines(n, error.pred.gmlm,  col = colors["gmlm"],  lwd = line.width)
    lines(n, error.pred.hopca, col = colors["hopca"], lwd = line.width)
    lines(n, error.pred.pca,   col = colors["pca"],   lwd = line.width)
    lines(n, error.pred.tsir,  col = colors["tsir"],  lwd = line.width)

-    par(mar = rep(0, 4))
-    legend("topright", legend = names(colors), col = colors, lwd = line.width,
-        lty = 1, bty = "n")
-    par(mar = margins)
+    # par(mar = rep(0, 4))
+    # legend("topright", legend = names(colors), col = colors, lwd = line.width,
+    #     lty = 1, bty = "n")
+    # par(mar = margins)
 })
+
+par(mar = c(0, 1, 1, 0))
+plot(1:2, 1:2, type = "n", bty = "n", axes = FALSE, xlab = "", ylab = "")
+legend("center", legend = names(colors), col = colors, lwd = line.width,
+    lty = 1, bty = "n", horiz = TRUE)
--- a/simulations/eeg_sim.R
+++ b/simulations/eeg_sim.R
@ -1,321 +0,0 @@
-library(tensorPredictors)
-suppressPackageStartupMessages({
-    library(ggplot2)
-})
-
-################################################################################
-###                             Loading EEG Data                             ###
-################################################################################
-
-# Load as 3D predictors `X` and flat response `y`
-c(X, y) %<-% local({
-    # Load from file
-    ds <- readRDS("eeg_data.rds")
-
-    # Dimension values
-    n <- nrow(ds)       # sample size (nr. of people)
-    p <- 64L            # nr. of predictors (count of sensorce)
-    t <- 256L           # nr. of time points (measurements)
-
-    # Extract dimension names
-    nNames <- ds$PersonID
-    tNames <- as.character(seq(t))
-    pNames <- unlist(strsplit(colnames(ds)[2 + t * seq(p)], "_"))[c(TRUE, FALSE)]
-
-    # Split into predictors (with proper dims and names) and response
-    X <- array(as.matrix(ds[, -(1:2)]),
-        dim = c(person = n, time = t, sensor = p),
-        dimnames = list(person = nNames, time = tNames, sensor = pNames)
-    )
-    y <- ds$Case_Control
-
-    list(X, y)
-})
-
-################################################################################
-###                       LOO-CV for Multiple Methods                        ###
-################################################################################
-
-# compatibility wrapper for function implemented with the "old" API
-toNewAPI <- function(func) {
-    function(...) {
-        res <- func(...)
-        list(alphas = list(res$beta, res$alpha))
-    }
-}
-
-# Number of (2D)^2 PCA components per axis
-npcs <- list(c(3, 4), c(15, 15), c(20, 30), dim(X)[-1])
-
-# setup methods for simulation (with unified API)
-methods <- list(
-    hopca             = list(
-        fun = function(X, Fy) list(alphas = HOPCA(X, npc = c(1L, 1L), 1L)),
-        is.applicable = function(npc) all(npc == c(256L, 64L))  # NOT reduced
-    ),
-    hopir.ls.icu      = list(
-        fun = function(X, Fy) HOPIR(X, Fy, sample.axis = 1L, method = "ls", algorithm = "icu"),
-        is.applicable = function(npc) TRUE
-    ),
-    hopir.mle.icu     = list(
-        fun = function(X, Fy) HOPIR(X, Fy, sample.axis = 1L, method = "mle", algorithm = "icu"),
-        is.applicable = function(npc) TRUE
-    ),
-    hopir.ls.nagd      = list(
-        fun = function(X, Fy) HOPIR(X, Fy, sample.axis = 1L, method = "ls", algorithm = "nagd"),
-        is.applicable = function(npc) TRUE
-    ),
-    hopir.mle.nagd     = list(
-        fun = function(X, Fy) HOPIR(X, Fy, sample.axis = 1L, method = "mle", algorithm = "nagd"),
-        is.applicable = function(npc) TRUE
-    ),
-    kpir.base         = list(
-        fun = toNewAPI(kpir.base),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.new.vlp      = list(
-        fun = toNewAPI(function(X, Fy) kpir.new(X, Fy, init.method = "vlp")),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.new.ls       = list(
-        fun = toNewAPI(function(X, Fy) kpir.new(X, Fy, init.method = "ls")),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.ls           = list(
-        fun = kpir.ls,
-        is.applicable = function(npc) TRUE
-    ),
-    LSIR              = list(
-        fun = function(X, Fy) {
-            res <- LSIR(matrix(X, nrow(X)), Fy, dim(X)[2], dim(X)[3])
-            list(alphas = list(res$beta, res$alpha))
-        },
-        is.applicable = function(npc) TRUE
-    ),
-    kpir.momentum.vlp = list(
-        fun = toNewAPI(function(X, Fy) kpir.momentum(X, Fy, init.method = "vlp")),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.momentum.ls  = list(
-        fun = toNewAPI(function(X, Fy) kpir.momentum(X, Fy, init.method = "ls")),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.approx.vlp   = list(
-        fun = toNewAPI(function(X, Fy) kpir.approx(X, Fy, init.method = "vlp")),
-        is.applicable = function(npc) prod(npc) < 100
-    ),
-    kpir.approx.ls    = list(
-        fun = toNewAPI(function(X, Fy) kpir.approx(X, Fy, init.method = "ls")),
-        is.applicable = function(npc) TRUE
-    )
-)
-
-# define AUC for reporting while simulation is running
-auc <- function(y_true, y_pred) pROC::roc(y_true, y_pred, quiet = TRUE)$auc[1]
-
-# file to dump simulation results
-log.file <- format(Sys.time(), "eeg_sim_%Y%m%dT%H%M.rds")
-
-# init complete simulation as empty
-sim <- NULL
-for (npc in npcs) {
-    # check if any PC count is smaller than the axis
-    if (any(npc < dim(X)[-1])) {
-        # Reduce dimensions using (2D)^2 PCA, which is a special case of the Higher
-        # Order Principal Component Analysis
-        pcs <- HOPCA(X, npc = npc, sample.axis = 1)
-        # Reduce dimensions
-        X.pc <- mlm(X, Map(t, pcs), modes = 2:3)
-    } else {
-        # No reduction
-        X.pc <- X
-    }
-
-    for (name in names(methods)) {
-        # check if method can be applied to current reduction dimensions
-        if (!methods[[name]]$is.applicable(npc)) {
-            next
-        }
-
-        # extract method to be applied
-        method <- methods[[name]]$fun
-
-        # report name of current simulation method
-        cat(sprintf("npc: (t = %d, p = %d), method: %s\n", npc[1], npc[2], name))
-
-        # Leave-One-Out Cross-Validation
-        loo.cv <- data.frame(
-            y_true = y, y_pred = NA,                    # CV responses
-            elapsed = NA, sys.self = NA, user.self = NA # execution time
-        )
-        for (i in seq_len(nrow(X.pc))) {
-            # report progress
-            cat(sprintf("\r%3d/%d", i, nrow(X.pc)))
-
-            # Split into training/test data
-            X.train <- X.pc[-i, , ]
-            y.train <- scale(y[-i], scale = FALSE)
-            X.test <- X.pc[i, , , drop = FALSE]
-            y.test <- scale(y[i], center = attr(y.train, "scaled:center"), scale = FALSE)
-
-            # fit reduction (with method one of the methods to be "validated")
-            time <- system.time(sdr <- method(X.train, c(y.train)))
-
-            # reduce training data and fit a GLM
-            if ("Deltas" %in% names(sdr)) {
-                # the small deltas are `delta_j = Delta_j^-1 alpha_j`
-                deltas <- Map(solve, sdr$Deltas, sdr$alphas)
-            } else {
-                deltas <- sdr$alphas
-            }
-            x.train <- mlm(X.train, Map(t, deltas), modes = 2:3)
-            fit <- glm(y ~ x, family = binomial(link = "logit"),
-                data = data.frame(y = y[-i], x = matrix(x.train, nrow(x.train))))
-
-            # predict from reduced test data
-            x.test <- mlm(X.test, Map(t, deltas), modes = 2:3)
-            y.pred <- predict(fit, data.frame(x = matrix(x.test, 1)), type = "response")
-
-            loo.cv[i, "y_pred"] <- y.pred
-            loo.cv[i, "elapsed"] <- time["elapsed"]
-            loo.cv[i, "sys.self"] <- time["sys.self"]
-            loo.cv[i, "user.self"] <- time["user.self"]
-        }
-
-        # accumulate LOO-CV results to previous results
-        loo.cv$method <- factor(name)
-        loo.cv$npc <- factor(sprintf("(%d, %d)", npc[1], npc[2]))
-        sim <- rbind(sim, loo.cv)
-
-        # Report partial sim done and one of the interesting measures
-        cat(sprintf(" (Done) AUC: %f\n", with(loo.cv, auc(y_true, y_pred))))
-
-        # dump simulation (after each fold) to file
-        saveRDS(sim, log.file)
-    }
-}
-
-################################################################################
-###                             Simulation Stats                             ###
-################################################################################
-# sim <- readRDS("eeg_sim_<date-time>.rds")
-# sim <- readRDS("eeg_sim_20220524T2100.rds")
-# sim <- readRDS("eeg_sim_20220525T1700.rds")
-# sim <- readRDS("eeg_sim_20220628T1222.rds")
-
-metrics <- list(
-    # acc: Accuracy. P(Yhat = Y). Estimated as: (TP+TN)/(P+N).
-    "Acc" = function(y_true, y_pred) mean(round(y_pred) == y_true),
-    # err: Error rate. P(Yhat != Y). Estimated as: (FP+FN)/(P+N).
-    "Err" = function(y_true, y_pred) mean(round(y_pred) != y_true),
-    # fpr: False positive rate. P(Yhat = + | Y = -). aliases: Fallout.
-    "FPR" = function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 0]),
-    # tpr: True positive rate.  P(Yhat = + | Y = +). aliases: Sensitivity, Recall.
-    "TPR" = function(y_true, y_pred) mean((round(y_pred) == 1)[y_true == 1]),
-    # fnr: False negative rate. P(Yhat = - | Y = +). aliases: Miss.
-    "FNR" = function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 1]),
-    # tnr: True negative rate.  P(Yhat = - | Y = -).
-    "TNR" = function(y_true, y_pred) mean((round(y_pred) == 0)[y_true == 0]),
-    # auc: Area Under the Curve
-    "AUC" = function(y_true, y_pred) pROC::roc(y_true, y_pred, quiet = TRUE)$auc[1],
-    # auc.sd: Estimated standard error of the AUC estimate
-    "sd(AUC)" = function(y_true, y_pred)
-        sqrt(pROC::var(pROC::roc(y_true, y_pred, quiet = TRUE)))
-)
-
-# Applies metrics on a group
-do.stats <- function(group) {
-    stat <- Map(do.call, metrics, list(as.list(group[c("y_true", "y_pred")])))
-    data.frame(method = group$method[1], npc = group$npc[1], stat, check.names = FALSE)
-}
-# Call stats for each grouping
-stats <- do.call(rbind, Map(do.stats, split(sim, ~ method + npc, sep = " ", drop = TRUE)))
-rownames(stats) <- NULL
-
-print(stats, digits = 2)
-
-# and execution time stats
-times <- aggregate(cbind(elapsed, sys.self, user.self) ~ method + npc, sim, median)
-
-print(times, digits = 2)
-
-## stats: 2022.05.24 + 2022.06.14
-#               method       npc  Acc  Err  FPR  TPR  FNR  TNR  AUC sd(AUC)
-# 1          kpir.base    (3, 4) 0.70 0.30 0.60 0.87 0.13 0.40 0.75   0.047
-# 2       kpir.new.vlp    (3, 4) 0.70 0.30 0.60 0.87 0.13 0.40 0.75   0.047
-# 3        kpir.new.ls    (3, 4) 0.74 0.26 0.51 0.88 0.12 0.49 0.77   0.045
-# 4            kpir.ls    (3, 4) 0.75 0.25 0.49 0.88 0.12 0.51 0.78   0.044
-#          (*) kpir.ls    (3, 4) 0.78 0.22 0.38 0.87 0.13 0.62 0.86   0.034
-#     (*) hopir.ls.icu    (3, 4) 0.80 0.20 0.33 0.87 0.13 0.67 0.85   0.036
-#    (*) hopir.mle.icu    (3, 4) 0.80 0.20 0.33 0.87 0.13 0.67 0.85   0.036
-# 5  kpir.momentum.vlp    (3, 4) 0.70 0.30 0.60 0.87 0.13 0.40 0.75   0.047
-# 6   kpir.momentum.ls    (3, 4) 0.70 0.30 0.58 0.87 0.13 0.42 0.76   0.046
-# 7    kpir.approx.vlp    (3, 4) 0.68 0.32 0.62 0.86 0.14 0.38 0.74   0.048
-# 8     kpir.approx.ls    (3, 4) 0.73 0.27 0.53 0.88 0.12 0.47 0.78   0.044
-#            (**) LSIR    (3, 4) 0.80 0.20 0.36 0.88 0.12 0.64 0.85   0.036
-# 9            kpir.ls  (15, 15) 0.75 0.25 0.47 0.87 0.13 0.53 0.78   0.044
-#          (*) kpir.ls  (15, 15) 0.76 0.24 0.44 0.88 0.12 0.56 0.83   0.039
-# 10    kpir.approx.ls  (15, 15) 0.73 0.27 0.51 0.87 0.13 0.49 0.78   0.044
-# 11           kpir.ls  (20, 30) 0.75 0.25 0.47 0.87 0.13 0.53 0.78   0.044
-#          (*) kpir.ls  (20, 30) 0.77 0.23 0.36 0.84 0.16 0.64 0.79   0.045
-#     (*) hopir.ls.icu  (15, 15) 0.79 0.21 0.38 0.88 0.12 0.62 0.83   0.041
-#    (*) hopir.mle.icu  (15, 15) 0.79 0.21 0.38 0.88 0.12 0.62 0.83   0.041
-#            (**) LSIR  (15, 15) 0.72 0.28 0.44 0.82 0.18 0.56 0.81   0.040
-#     (*) hopir.ls.icu  (20, 30) 0.75 0.25 0.38 0.83 0.17 0.62 0.80   0.045
-#    (*) hopir.mle.icu  (20, 30) 0.75 0.25 0.38 0.83 0.17 0.62 0.80   0.045
-# 12    kpir.approx.ls  (20, 30) 0.63 0.37 1.00 1.00 0.00 0.00 0.51   0.053
-#            (**) LSIR  (20, 30) 0.79 0.21 0.36 0.87 0.13 0.64 0.83   0.038
-# 13           kpir.ls (256, 64) 0.75 0.25 0.44 0.87 0.13 0.56 0.78   0.044
-#          (*) kpir.ls (256, 64) 0.68 0.32 0.51 0.79 0.21 0.49 0.66   0.054
-#     (*) hopir.ls.icu (256, 64) 0.67 0.33 0.53 0.79 0.21 0.47 0.69   0.052
-#    (*) hopir.mle.icu (256, 64) 0.67 0.33 0.53 0.79 0.21 0.47 0.69   0.052
-# 14    kpir.approx.ls (256, 64) 0.75 0.25 0.44 0.87 0.13 0.56 0.78   0.044
-#
-#  (*) Using reduction matrices `Map(solve, sdr$Deltas, sdr$alphas)` instead
-#      of only `sdr$alpha`.
-# (**) LSIR already considured the covariance estinates
-
-#            method       npc  Acc  Err  FPR  TPR  FNR  TNR  AUC sd(AUC)
-# 1    hopir.ls.icu    (3, 4) 0.80 0.20 0.33 0.87 0.13 0.67 0.85   0.036
-# 2   hopir.mle.icu    (3, 4) 0.80 0.20 0.36 0.88 0.12 0.64 0.85   0.036
-# 3   hopir.ls.nagd    (3, 4) 0.80 0.20 0.33 0.87 0.13 0.67 0.85   0.036
-# 4  hopir.mle.nagd    (3, 4) 0.80 0.20 0.33 0.87 0.13 0.67 0.85   0.036
-# 5    hopir.ls.icu  (15, 15) 0.79 0.21 0.38 0.88 0.12 0.62 0.83   0.041
-# 6   hopir.mle.icu  (15, 15) 0.77 0.23 0.40 0.87 0.13 0.60 0.83   0.041
-# 7   hopir.ls.nagd  (15, 15) 0.79 0.21 0.38 0.88 0.12 0.62 0.83   0.041
-# 8  hopir.mle.nagd  (15, 15) 0.76 0.24 0.47 0.90 0.10 0.53 0.81   0.043
-# 9    hopir.ls.icu  (20, 30) 0.75 0.25 0.38 0.83 0.17 0.62 0.80   0.045
-# 10  hopir.mle.icu  (20, 30) 0.75 0.25 0.40 0.83 0.17 0.60 0.83   0.039
-# 11  hopir.ls.nagd  (20, 30) 0.75 0.25 0.38 0.83 0.17 0.62 0.80   0.045
-# 12 hopir.mle.nagd  (20, 30) 0.75 0.25 0.42 0.86 0.14 0.58 0.80   0.044
-# 13   hopir.ls.icu (256, 64) 0.67 0.33 0.53 0.79 0.21 0.47 0.69   0.052
-
-## times: 2022.05.24 + 2022.06.14
-#               method       npc elapsed sys.self user.self
-# 1          kpir.base    (3, 4)   0.079    0.402     0.220
-# 2       kpir.new.vlp    (3, 4)   0.075    0.393     0.217
-# 3        kpir.new.ls    (3, 4)   0.218    0.243     0.305
-# 4            kpir.ls    (3, 4)   0.003    0.006     0.006
-# 5  kpir.momentum.vlp    (3, 4)   0.143    0.595     0.359
-# 6   kpir.momentum.ls    (3, 4)   0.297    0.252     0.385
-#     (*) hopir.ls.icu    (3, 4)   0.004    0.009     0.008
-#    (*) hopir.mle.icu    (3, 4)   0.004    0.008     0.007
-# 7    kpir.approx.vlp    (3, 4)   0.044    0.240     0.152
-# 8     kpir.approx.ls    (3, 4)   0.066    0.144     0.121
-#                 LSIR    (3, 4)   0.003    0.000     0.003
-# 9            kpir.ls  (15, 15)   0.012    0.059     0.034
-#     (*) hopir.ls.icu  (15, 15)   0.018    0.077     0.043
-#    (*) hopir.mle.icu  (15, 15)   0.018    0.084     0.043
-# 10    kpir.approx.ls  (15, 15)   0.813    3.911     2.325
-#                 LSIR  (15, 15)   0.011    0.031     0.024
-#     (*) hopir.ls.icu  (20, 30)   0.037    0.165     0.098
-#    (*) hopir.mle.icu  (20, 30)   0.036    0.163     0.090
-# 11           kpir.ls  (20, 30)   0.028    0.129     0.080
-# 12    kpir.approx.ls  (20, 30)   2.110   10.111     6.290
-#                 LSIR  (20, 30)   0.038    0.119     0.102
-# 13           kpir.ls (256, 64)   1.252    6.215     3.681
-#     (*) hopir.ls.icu (256, 64)   1.120    4.018     2.979
-#    (*) hopir.mle.icu (256, 64)   1.183    4.109     2.974
-# 14    kpir.approx.ls (256, 64)  36.754  141.028   147.490
-#
-#  (*) While in Zoom meeting
--- a/simulations/kpir_sim.R
+++ b/simulations/kpir_sim.R
--- a/simulations/simulation_binary.R
+++ b/simulations/simulation_binary.R
@ -1,166 +0,0 @@
-source('../tensor_predictors/random.R')
-source('../tensor_predictors/multi_assign.R')
-source('../tensor_predictors/tensor_predictors.R')
-source('../tensor_predictors/lsir.R')
-source('../tensor_predictors/pca2d.R')
-
-#' @param n0 number of controls
-#' @param n1 number of cases
-simulateData.binary <- function(n0, n1, p, t, rho.p, rho.t) {
-    # Response vector
-    Y <- c(rep(1, n1), rep(0, n0))
-
-    # Section 7.1.2 of Tensor_Predictors-4.pdf
-    alpha0 <- as.matrix(rep(0, t))
-    alpha1 <- as.matrix(1 / ((t + 1) - 1:t))
-    beta   <- as.matrix(rep(1 / sqrt(p), p))
-    mu0    <- kronecker(alpha0, beta)
-    mu1    <- kronecker(alpha1, beta)
-
-    sigma1 <- rho.t^abs(outer(1:t, 1:t, FUN = `-`))
-    sigma2 <- rho.p^abs(outer(1:p, 1:p, FUN = `-`))
-    sigma  <- kronecker(sigma1, sigma2)
-
-    # Compute Delta
-    # Delta = Sigma + E[vec(X)]E[vec(X)^t] - E{E[vec(X)|Y]E[vec(X)^t|Y]}
-    n      <- n0 + n1
-    muAvg  <- (n0 * mu0 + n1 * mu1) / n
-    mat0   <- mu0 %*% t(mu0)
-    mat1   <- mu1 %*% t(mu1)
-    matAvg <- (n0 * mat0 + n1 * mat1) / n
-    Delta  <- sigma + (muAvg %*% t(muAvg)) - matAvg
-
-    X1     <- rmvnorm(n1, mu1, Delta)
-    X0     <- rmvnorm(n0, mu0, Delta)
-    X      <- rbind(X1, X0)
-
-    # Center data
-    Y <- scale(Y, center = TRUE, scale = FALSE)
-    X <- scale(X, center = TRUE, scale = FALSE)
-
-    alpha <- alpha0 - alpha1
-    Gamma_1 <- alpha / norm(alpha, 'F')
-    Gamma_2 <- beta / norm(beta, 'F')
-    list(Y = Y, X = X,
-         Gamma_1 = Gamma_1, Gamma_2 = Gamma_2,
-         Gamma = kronecker(Gamma_1, Gamma_2),
-         alpha = alpha, beta = beta,
-         Delta = Delta
-    )
-}
-
-
-simulation.binary <- function(methods, reps, n0, n1, p, t, rho.p, rho.t) {
-    nsim <- length(methods) * reps
-    results <- vector('list', nsim)
-    E1      <- vector('list', nsim)
-    E2      <- vector('list', nsim)
-    vec1    <- vector('list', nsim)
-    vec2    <- vector('list', nsim)
-    Phi     <- vector('list', nsim)
-    phi1    <- vector('list', nsim)
-    phi2    <- vector('list', nsim)
-
-    i <- 1
-    for (rep in 1:reps) {
-        set.seed(rep)
-        ds <- simulateData.binary(n0, n1, p, t, rho.p, rho.t)
-        for (method.name in names(methods)) {
-            cat(sprintf('\r%4d/%d in %s', rep, reps, method.name))
-
-            method <- methods[[method.name]]
-            sdr <- method(ds$X, ds$Y, p, t)
-            # Store which silumation is at index i.
-            results[[i]] <- c(method = method.name, rep = rep)
-            # Compute simpulation validation metrics.
-            E1[[i]] <-
-                norm(kronecker(ds$alpha, ds$beta) - kronecker(sdr$alpha, sdr$beta), 'F') /
-                    norm(kronecker(ds$alpha, ds$beta), 'F')
-            E2[[i]]   <- norm(ds$Delta - sdr$Delta, 'F') / norm(ds$Delta, 'F')
-            vec1[[i]] <- as.double(kronecker(sdr$alpha, sdr$beta))
-            vec2[[i]] <- as.double(sdr$Delta)
-            # Subspace distances.
-            if (!('Gamma' %in% names(sdr))) {
-                # Assuming r = k = 1
-                sdr$Gamma_1 <- sdr$alpha / norm(sdr$alpha, 'F')
-                sdr$Gamma_2 <- sdr$beta / norm(sdr$beta, 'F')
-                sdr$Gamma <- kronecker(sdr$Gamma_1, sdr$Gamma_2)
-            }
-            Phi[[i]]  <- norm(tcrossprod(ds$Gamma) - tcrossprod(sdr$Gamma), 'F')
-            phi1[[i]] <- norm(tcrossprod(ds$Gamma_1) - tcrossprod(sdr$Gamma_1), 'F')
-            phi2[[i]] <- norm(tcrossprod(ds$Gamma_2) - tcrossprod(sdr$Gamma_2), 'F')
-            i <- i + 1
-        }
-    }
-    cat('\n')
-
-    # Aggregate per method statistics.
-    statistics <- list()
-    for (method.name in names(methods)) {
-        m <- which(unlist(lapply(results, `[`, 1)) == method.name)
-
-        # Convert list of vec(alpha %x% beta) to a matrix with vec(alpha %x% beta)
-        # in its columns.
-        tmp <- matrix(unlist(vec1[m]), ncol = length(m))
-        V1 <- sum(apply(tmp, 1, var))
-
-        # Convert list of vec(Delta) to a matrix with vec(Delta) in its columns.
-        tmp <- matrix(unlist(vec2[m]), ncol = length(m))
-        V2 <- sum(apply(tmp, 1, var))
-
-        statistics[[method.name]] <- list(
-            mean.E1 = mean(unlist(E1[m])),
-            sd.E1 = sd(unlist(E1[m])),
-            mean.E2 = mean(unlist(E2[m])),
-            sd.E2 = sd(unlist(E2[m])),
-            V1 = V1,
-            V2 = V2,
-            Phi  = mean(unlist(Phi[m])),
-            phi1 = mean(unlist(phi1[m])),
-            phi2 = mean(unlist(phi2[m]))
-        )
-    }
-    # transform the statistics list into a data.frame with row and col names.
-    stat <- t(matrix(unlist(statistics), ncol = length(statistics)))
-    rownames(stat) <- names(statistics)
-    colnames(stat) <- names(statistics[[1]])
-    stat <- as.data.frame(stat)
-    attr(stat, "params") <- c(reps = reps, n0 = n0, n1 = n1, p = p, t = t,
-                              rho.p = rho.p, rho.t = rho.t)
-    return(stat)
-}
-
-methods <- list(
-    KPIR_LS   = function(...) tensor_predictor(..., method = "KPIR_LS"),
-    KPIR_MLE  = function(...) tensor_predictor(..., method = "KPIR_MLE"),
-    KPFC1     = function(...) tensor_predictor(..., method = "KPFC1"),
-    KPFC2     = function(...) tensor_predictor(..., method = "KPFC2"),
-    KPFC3     = function(...) tensor_predictor(..., method = "KPFC3"),
-    LSIR      = function(X, Fy, p, t) LSIR(X, Fy, p, t, k = 1, r = 1),
-    PCA2d     = function(X, y = NULL, p, t, k = 1, r = 1, d1 = 1, d2 = 1) {
-        pca <- PCA2d(X, p, t, k, r)
-        pca$Gamma_1 <- pca$alpha[, 1:d1, drop = FALSE]
-        pca$Gamma_2 <- pca$beta[, 1:d2, drop = FALSE]
-        pca$Gamma <- kronecker(pca$Gamma_1, pca$Gamma_2)
-        pca$Delta <- kronecker(pca$Sigma_t, pca$Sigma_p)
-        return(pca)
-    }
-)
-
-#                   n0,   n1,  p,  t, rho.p, rho.t
-#              -----------------------------------
-params <- list( c( 250,  250, 10,  5,   0.3,   0.3)
-              , c( 500,  500, 10,  5,   0.3,   0.3)
-              , c(1000, 1000, 10,  5,   0.3,   0.3)
-)
-
-for (param in params) {
-    c(n0, n1, p, t, rho.p, rho.t) %<-% param
-    sim <- simulation.binary(methods, 500, n0, n1, p, t, rho.p, rho.t)
-
-    print(attr(sim, "params"))
-    print(round(sim, 2))
-
-    saveRDS(sim, file = sprintf("simulation_3_desc_%d_%d_%d_%d_%f_%f.rds",
-                                n0, n1, p, t, rho.p, rho.t))
-}
--- a/simulations/simulation_cont.R
+++ b/simulations/simulation_cont.R
@ -1,153 +0,0 @@
-source('../tensor_predictors/random.R')
-source('../tensor_predictors/multi_assign.R')
-source('../tensor_predictors/tensor_predictors.R')
-source('../tensor_predictors/lsir.R')
-source('../tensor_predictors/pca2d.R')
-
-simulateData.cont <- function(n, p, t, k, r, d1, d2, delta.identity = FALSE) {
-
-    stopifnot(d1 <= r, d2 <= k)
-
-    y  <- rnorm(n)
-    ns <- r * k / 2
-    Fy <- do.call(cbind, lapply(1:ns, function(s, z) {
-        cbind(cos(s * z), sin(s * z))
-    }, z = 2 * pi * y))
-    Fy <- scale(Fy, scale = FALSE)
-
-    Gamma_1 <- diag(1, t, d1)
-    gamma_1 <- diag(1, d1, r)
-    alpha   <- Gamma_1 %*% gamma_1
-    Gamma_2 <- diag(1, p, d2)
-    gamma_2 <- diag(1, d2, k)
-    beta    <- Gamma_2 %*% gamma_2
-
-    if (delta.identity) {
-        Delta    <- diag(1, p * t, p * t)
-    } else {
-        Delta    <- crossprod(matrix(rnorm((p * t)^2), p * t))
-        DM_Delta <- diag(sqrt(1 / diag(Delta)))
-        Delta    <- DM_Delta %*% Delta %*% DM_Delta
-    }
-
-    X <- tcrossprod(Fy, kronecker(alpha, beta)) + rmvnorm(n, sigma = Delta)
-    X <- scale(X, scale = FALSE)
-
-    return(list(X = X, y = y, Fy = Fy,
-                Gamma = kronecker(Gamma_1, Gamma_2),
-                Gamma_1 = Gamma_1, gamma_1 = gamma_1, alpha = alpha,
-                Gamma_2 = Gamma_2, gamma_2 = gamma_2, beta = beta,
-                Delta = Delta))
-}
-
-simulation.cont <- function(methods, reps, n, p, t, k, r, d1, d2) {
-    nsim <- length(methods) * reps
-    results <- vector('list', nsim)
-    E1      <- vector('list', nsim)
-    E2      <- vector('list', nsim)
-    vec1    <- vector('list', nsim)
-    vec2    <- vector('list', nsim)
-    Phi     <- vector('list', nsim)
-    phi1    <- vector('list', nsim)
-    phi2    <- vector('list', nsim)
-
-    i <- 1
-    for (rep in 1:reps) {
-        set.seed(rep)
-        ds <- simulateData.cont(n, p, t, k, r, d1, d2)
-        for (method.name in names(methods)) {
-            cat(sprintf('\r%4d/%d in %s', rep, reps, method.name))
-
-            method <- methods[[method.name]]
-            sdr <- method(ds$X, ds$Fy, p, t, k, r, d1, d2)
-            # Store which silumation is at index i.
-            results[[i]] <- c(method = method.name, rep = rep)
-            # Compute simpulation validation metrics.
-            E1[[i]] <-
-                norm(kronecker(ds$alpha, ds$beta) - kronecker(sdr$alpha, sdr$beta), 'F') /
-                    norm(kronecker(ds$alpha, ds$beta), 'F')
-            E2[[i]]   <- norm(ds$Delta - sdr$Delta, 'F') / norm(ds$Delta, 'F')
-            vec1[[i]] <- as.double(kronecker(sdr$alpha, sdr$beta))
-            vec2[[i]] <- as.double(sdr$Delta)
-            # Subspace distances.
-            Phi[[i]]  <- norm(tcrossprod(ds$Gamma) - tcrossprod(sdr$Gamma), 'F')
-            phi1[[i]] <- norm(tcrossprod(ds$Gamma_1) - tcrossprod(sdr$Gamma_1), 'F')
-            phi2[[i]] <- norm(tcrossprod(ds$Gamma_2) - tcrossprod(sdr$Gamma_2), 'F')
-            i <- i + 1
-        }
-    }
-    cat('\n')
-
-    # Aggregate per method statistics.
-    statistics <- list()
-    for (method.name in names(methods)) {
-        m <- which(unlist(lapply(results, `[`, 1)) == method.name)
-
-        # Convert list of vec(alpha %x% beta) to a matrix with vec(alpha %x% beta)
-        # in its columns.
-        tmp <- matrix(unlist(vec1[m]), ncol = length(m))
-        V1 <- sum(apply(tmp, 1, var))
-
-        # Convert list of vec(Delta) to a matrix with vec(Delta) in its columns.
-        tmp <- matrix(unlist(vec2[m]), ncol = length(m))
-        V2 <- sum(apply(tmp, 1, var))
-
-        statistics[[method.name]] <- list(
-            mean.E1 = mean(unlist(E1[m])),
-            sd.E1   = sd(unlist(E1[m])),
-            mean.E2 = mean(unlist(E2[m])),
-            sd.E2   = sd(unlist(E2[m])),
-            V1 = V1,
-            V2 = V2,
-            Phi  = mean(unlist(Phi[m])),
-            phi1 = mean(unlist(phi1[m])),
-            phi2 = mean(unlist(phi2[m]))
-        )
-    }
-    # transform the statistics list into a data.frame with row and col names.
-    stat <- t(matrix(unlist(statistics), ncol = length(statistics)))
-    rownames(stat) <- names(statistics)
-    colnames(stat) <- names(statistics[[1]])
-    stat <- as.data.frame(stat)
-    attr(stat, "params") <- c(reps = reps, n = n, p = p, t = t, k = k, r = r,
-                              d1 = d1, d2 = d2)
-    return(stat)
-}
-
-methods <- list(
-    KPIR_LS   = function(...) tensor_predictor(..., method = "KPIR_LS"),
-    KPIR_MLE  = function(...) tensor_predictor(..., method = "KPIR_MLE"),
-    KPFC1     = function(...) tensor_predictor(..., method = "KPFC1"),
-    KPFC2     = function(...) tensor_predictor(..., method = "KPFC2"),
-    KPFC3     = function(...) tensor_predictor(..., method = "KPFC3"),
-    PCA2d     = function(X, y = NULL, p, t, k = 1L, r = 1L, d1 = 1L, d2 = 1L) {
-        pca <- PCA2d(X, p, t, k, r)
-        # Note: alpha, beta are not realy meaningfull for (d1, d2) != (r, k)
-        pca$Gamma_1 <- pca$alpha[, 1:d1, drop = FALSE]
-        pca$Gamma_2 <- pca$beta[, 1:d2, drop = FALSE]
-        pca$Gamma <- kronecker(pca$Gamma_1, pca$Gamma_2)
-        pca$Delta <- kronecker(pca$Sigma_t, pca$Sigma_p)
-        return(pca)
-    }
-)
-
-#                    n,  p,  t,  k,  r, d1, d2
-#              -----------------------------
-params <- list( c( 500, 10,  8,  6,  6,  6,  6)
-              , c( 500, 10,  8,  6,  6,  4,  4)
-              , c( 500, 10,  8,  6,  6,  2,  2)
-              , c(5000, 10,  8,  6,  6,  6,  6)
-              , c(5000, 10,  8,  6,  6,  4,  4)
-              , c(5000, 10,  8,  6,  6,  2,  2)
-)
-
-for (param in params) {
-    c(n, p, t, k, r, d1, d2) %<-% param
-    sim <- simulation.cont(methods, 500, n, p, t, k, r, d1, d2)
-
-    print(attr(sim, "params"))
-    print(round(sim, 2))
-
-    saveRDS(sim, file = sprintf("simulation_cont_%d_%d_%d_%d_%d_%d_%d.rds",
-                                n, p, t, k, r, d1, d2))
-}
--- a/simulations/simulation_poi.R
+++ b/simulations/simulation_poi.R
@ -1,139 +0,0 @@
-################################################################################
-###                          Sparce SIR against SIR                          ###
-################################################################################
-devtools::load_all('tensorPredictors/')
-library(dr)
-
-n <- 100
-p <- 10
-
-X <- rmvnorm(n, sigma = 0.5^abs(outer(1:p, 1:p, `-`)))
-y <- rowSums(X[, 1:3]) + rnorm(n, 0.5)
-B <- as.matrix(1:p <= 3) / sqrt(3)
-
-dr.sir <- dr(y ~ X, method = 'sir', numdir = ncol(B))
-B.sir <- dr.sir$evectors[, seq_len(ncol(B)), drop = FALSE]
-
-dist.projection(B, B.sir)
-
-B.poi <- with(GEP(X, y, 'sir'), {
-    POI(lhs, rhs, ncol(B), method = 'POI-C', use.C = TRUE)$vectors
-})
-
-dist.projection(B, B.poi)
-
-################################################################################
-###                 LDA (sparse Linear Discrimina Analysis)                  ###
-################################################################################
-devtools::load_all('tensorPredictors/')
-
-C <- function(rho, p) {
-    res <- matrix(rho, p, p)
-    diag(res) <- 1
-    res
-}
-R <- function(rho, p) {
-    rho^abs(outer(1:p, 1:p, `-`))
-}
-
-dataset <- function(nr) {
-    K <- 3          # Nr. Groups
-    n.i <- 30       # Sample group size for each of the K groups
-    n <- K * n.i    # Sample size
-    p <- 200        # Nr. of predictors
-
-    # Generate test data
-    V <- cbind(matrix(c(
-        2, 1, 2, 1, 2,
-        1,-1, 1,-1, 1,
-        0, 1,-1, 1, 0
-    ), 3, 5, byrow = TRUE),
-        matrix(0, 3, p - 5)
-    )
-    W <- cbind(matrix(c(
-    -1, 1, 1, 1, 1,
-        1,-1, 1,-1, 1,
-        1, 1,-1, 1, 0
-    ), 3, 5, byrow = TRUE),
-        matrix(0, 3, p - 5)
-    )
-
-    if (nr == 1) {          # Model 1
-        y <- factor(rep(1:K, each = n.i))
-        X <- rmvnorm(n, mu = rep(0, p)) + V[y, ]
-        B <- cbind(V[1, ] - V[2, ], V[2, ] - V[3, ])
-    } else if (nr == 2) {   # Model 2
-        y <- factor(rep(1:K, each = n.i))
-        X <- rmvnorm(n, sigma = C(0.5, p)) + (V %*% C(0.5, p))[y, ]
-        B <- cbind(V[1, ] - V[2, ], V[2, ] - V[3, ])
-    } else if (nr == 3) {   # Model 3
-        y <- factor(rep(1:K, each = n.i))
-        X <- rmvnorm(n, sigma = R(0.5, p)) + (V %*% R(0.5, p))[y, ]
-        B <- cbind(V[1, ] - V[2, ], V[2, ] - V[3, ])
-    } else if (nr == 4) {   # Model 4
-        y <- factor(rep(1:K, each = n.i))
-        X <- rmvnorm(n, sigma = C(0.5, p)) + (W %*% C(0.5, p))[y, ]
-        B <- cbind(W[1, ] - W[2, ], W[2, ] - W[3, ])
-    } else if (nr == 5) {   # Model 5
-        K <- 4
-        n <- K * n.i
-
-        W.tilde <- 2 * rbind(W, colMeans(W))
-        mu.tilde <- W.tilde %*% C(0.5, p)
-
-        y <- factor(rep(1:K, each = n.i))
-        X <- rmvnorm(n, sigma = C(0.5, p)) + mu.tilde[y, ]
-
-        B <- cbind(W[1, ] - W[2, ], W[2, ] - W[3, ])
-    } else {
-        stop("Unknown model nr.")
-    }
-
-    list(X = X, y = y, B = qr.Q(qr(B)))
-}
-
-# Model 1
-fit <- with(dataset(1), {
-    with(GEP(X, y, 'lda'), POI(lhs, rhs, ncol(B), method = 'POI-C'))
-})
-fit <- with(dataset(1), {
-    with(GEP(X, y, 'lda'), POI(lhs, rhs, ncol(B), method = 'FastPOI-C'))
-})
-fit <- with(dataset(1), {
-    with(GEP(X, y, 'lda'), POI(lhs, rhs, ncol(B), method = 'POI-C', use.C = TRUE))
-})
-fit <- with(dataset(1), {
-    with(GEP(X, y, 'lda'), POI(lhs, rhs, ncol(B), method = 'FastPOI-C', use.C = TRUE))
-})
-
-# head(fit$vectors, 10)
-
-count <- 0
-nr.reps <- 100
-sim <- replicate(nr.reps, {
-    res <- double(0)
-    for (model.nr in 1:5) {
-        with(dataset(model.nr), {
-            for (method in c('POI-C', 'FastPOI-C')) {
-                for (use.C in c(FALSE, TRUE)) {
-                    fit <- with(GEP(X, y, 'lda'), {
-                        POI(lhs, rhs, ncol(B), method = 'POI-C', use.C = use.C)
-                    })
-                    dist <- dist.projection(B, fit$vectors)
-                    names(dist) <- paste('M', model.nr, '-', method, '-', use.C)
-                    res <<- c(res, dist)
-                }
-            }
-            fit <- with(GEP(X, y, 'lda'), {
-                solve.gep(lhs, rhs, ncol(B))
-            })
-            dist <- dist.projection(B, fit$vectors)
-            names(dist) <- paste('M', model.nr, '- solve -', use.C)
-            res <<- c(res, dist)
-        })
-    }
-    cat("Counter", (count <<- count + 1), "/", nr.reps, "\n")
-    res
-})
-
-(stats <- as.matrix(rowMeans(sim)))
--- a/simulations/simulation_sparse.R
+++ b/simulations/simulation_sparse.R
@ -1,146 +0,0 @@
-# Source Code.                                    # Loaded functions.
-source('../tensor_predictors/multi_assign.R')     # %<-%
-source('../tensor_predictors/approx_kronecker.R') # approx_kronecker
-source('../tensor_predictors/poi.R')              # POI
-source('../tensor_predictors/subspace.R')         # subspace
-source('../tensor_predictors/random.R')           # rmvnorm
-
-# Load C impleentation of 'FastPOI-C' subroutine.
-# Required for using 'use.C = TRUE' in the POI method.
-dyn.load('../tensor_predictors/poi.so')
-# When 'use.C = FALSE' the POI method uses a base R implementation.
-use.C = TRUE
-
-simulateData.sparse <- function(n, p, t, k, r, scale, degree = 2) {
-    # Define true reduction matrices alpha, beta.
-    alpha <- diag(1, t, r)
-    beta  <- diag(1, p, k)
-
-    # Create true "random" covariance of inverse model.
-    R <- matrix(rnorm((p * t)^2), p * t)        # random square matrix.
-    sigma <- tcrossprod(R / sqrt(rowSums(R^2))) # sym. pos.def. with diag = 1.
-
-    # Sample responces.
-    y <- rnorm(n, 0, 1)
-    # equiv to cbind(y^1, y^2, ..., y^degree)
-    Fy <- t(vapply(y, `^`, double(degree), seq(degree)))
-
-    # Calc X according the inverse regression model.
-    X <- tcrossprod(scale(Fy, scale = FALSE, center = TRUE), kronecker(alpha, beta))
-    X <- X + (scale * rmvnorm(n, sigma = sigma))
-
-    return(list(X = X, y = y, Fy = Fy, alpha = alpha, beta = beta))
-}
-
-# # True Positives Rate
-# tpr <- function(Y, Y_hat) {
-#     sum(as.logical(Y_hat) & as.logical(Y)) / sum(as.logical(Y)) # TP / P
-# }
-# False Positives Rate
-fpr <- function(Y, Y_hat) {
-    sum(as.logical(Y_hat) & !Y) / sum(!Y) # FP / N
-}
-# False Negative Rate
-fnr <- function(Y, Y_hat) {
-    sum(!Y_hat & as.logical(Y)) / sum(as.logical(Y)) # FN / P
-}
-# False Rate (rate of false positives and negatives)
-fr <- function(Y, Y_hat) {
-    sum(as.logical(Y) != as.logical(Y_hat)) / length(Y)
-}
-
-simulation.sparse <- function(scales, reps, n, p, t, k, r,
-                              eps = 100 * .Machine$double.eps) {
-    results <- vector('list', length(scales) * reps)
-
-    i <- 0
-    for (scale in scales) {
-        for (rep in 1:reps) {
-            cat(sprintf('\r%4d/%d for scale = %.2f', rep, reps, scale))
-
-            ds <- simulateData.sparse(n, p, t, k, r, scale)
-            # Formulate PFC-GEP for given dataset.
-            X  <- scale(ds$X,  scale = FALSE, center = TRUE)
-            Fy <- scale(ds$Fy, scale = FALSE, center = TRUE)
-            Sigma <- crossprod(X) / nrow(X)
-            P_Fy <- Fy %*% solve(crossprod(Fy), t(Fy))
-            Sigma_fit <- crossprod(X, P_Fy %*% X) / nrow(X)
-
-            poi <- POI(Sigma_fit, Sigma, k * r, use.C = use.C)
-            # Calc approx. alpha, beta and drop further drop "zero" from konecker
-            # factorization approximation.
-            c(alpha, beta) %<-% approx.kronecker(poi$Q, dim(ds$alpha), dim(ds$beta))
-            alpha[abs(alpha) < eps] <- 0
-            beta[abs(beta) < eps] <- 0
-
-            # Compair estimates against true alpha, beta.
-            result <- list(
-                scale = scale,
-                lambda = poi$lambda,
-                # alpha_tpr = tpr(ds$alpha, alpha),
-                alpha_fpr = fpr(ds$alpha, alpha),
-                alpha_fnr = fnr(ds$alpha, alpha),
-                alpha_fr = fr(ds$alpha, alpha),
-                # beta_tpr = tpr(ds$beta, beta),
-                beta_fpr = fpr(ds$beta, beta),
-                beta_fnr = fnr(ds$beta, beta),
-                beta_fr = fr(ds$beta, beta)
-            )
-            # Component-wise validation (_c_ stands for component)
-            if (ncol(alpha) > 1) {
-                ds_c_alpha <- apply(!!ds$alpha, 1, any)
-                   c_alpha <- apply(!!   alpha, 1, any)
-                # result$alpha_c_tpr <- tpr(ds_c_alpha, c_alpha)
-                result$alpha_c_fpr <- fpr(ds_c_alpha, c_alpha)
-                result$alpha_c_fnr <- fnr(ds_c_alpha, c_alpha)
-                result$alpha_c_fr <- fr(ds_c_alpha, c_alpha)
-            }
-            if (ncol(beta) > 1) {
-                ds_c_beta <- apply(!!ds$beta, 1, any)
-                   c_beta <- apply(!!   beta, 1, any)
-                # result$beta_c_tpr <- tpr(ds_c_beta, c_beta)
-                result$beta_c_fpr <- fpr(ds_c_beta, c_beta)
-                result$beta_c_fnr <- fnr(ds_c_beta, c_beta)
-                result$beta_c_fr <- fr(ds_c_beta, c_beta)
-            }
-            results[[i <- i + 1]] <- result
-        }
-        cat('\n')
-    }
-
-    # Restructure results list of lists as data.frame.
-    results <- as.data.frame(t(sapply(results, function(res, cols) {
-        unlist(res[cols])
-    }, names(results[[1]]))))
-    results$scale <- as.factor(results$scale)
-    attr(results, 'params') <- list(
-        reps = reps, n = n, p = p, t = t, k = k, r = r, eps = eps)
-
-    results
-}
-
-reps <- 500
-#                   n,  p,  t, k, r
-#              --------------------
-params <- list( c(100, 10,  5, 1, 2)
-              , c(100,  7,  5, 1, 2)
-              , c(100,  5,  3, 1, 2)
-              , c(500, 10,  5, 1, 2)
-              , c(500,  7,  5, 1, 2)
-              , c(500,  5,  3, 1, 2)
-)
-scales <- seq(0.5, 6, 0.25)
-
-for (param in params) {
-    c(n, p, t, k, r) %<-% param
-    results <- simulation.sparse(scales, reps, n, p, t, k, r)
-    sim <- aggregate(results[, 'scale' != names(results)],
-                     by = list(scale = results$scale), mean)
-    attr(sim, 'params') <- attr(results, 'params')
-
-    file.name <- sprintf("simulation_sparse_%d_%d_%d_%d_%d.rds", n, p, t, k, r)
-    saveRDS(sim, file = file.name)
-
-    cat(file.name, '\n')
-    print(sim, digits = 2)
-}
--- a/tensorPredictors/R/TSIR.R
+++ b/tensorPredictors/R/TSIR.R
@ -2,12 +2,13 @@
 #'
 #' @export
 TSIR <- function(X, y, d, sample.axis = 1L,
+    nr.slices = 10L,    # default slices, ignored if y is a factor or integer
    max.iter = 50L,
    eps = sqrt(.Machine$double.eps)
 ) {

-    if (!(is.factor(y) || is.integer(y))) { # TODO: Implement continuous case!
-        stop("Only factor and integer response implemented!")
+    if (!(is.factor(y) || is.integer(y))) {
+        y <- cut(y, nr.slices)
    }

    stopifnot(exprs = {
--- a/vis/ising_small/app.R
+++ b/vis/ising_small/app.R
@ -0,0 +1,258 @@
+# usage: R -e "shiny::runApp(port = 8080)"
+# usage: R -e "shiny::runApp(host = '127.0.0.1', port = 8080)"
+
+library(shiny)
+library(mvbernoulli)
+library(tensorPredictors)
+
+# configuration
+# color.palet <- hcl.colors(64, "YlOrRd", rev = TRUE)
+color.palet <- hcl.colors(64, "Blue-Red 2", rev = FALSE)
+
+# GMLM parameters
+n <- 250
+p <- c(2, 3)
+q <- c(1, 1)
+
+eta1 <- 0                                                       # intercept
+
+# 270 deg (90 deg clockwise) rotation of matrix layout
+#
+# Used to get proper ploted matrices cause `image` interprets the `z` matrix as
+# a table of `f(x[i], y[j])` values, so that the `x` axis corresponds to row
+# number and the `y` axis to column number, with column 1 at the bottom,
+# i.e. a 90 degree counter-clockwise rotation of the conventional printed layout
+# of a matrix. By first calling `rot270` on a matrix before passing it to
+# `image` the plotted matrix layout now matches the conventional printed layout.
+rot270 <- function(A) {
+    t(A)[, rev(seq_len(nrow(A))), drop = FALSE]
+}
+plot.mat <- function(mat, add.values = FALSE, zlim = range(mat)) {
+    par(oma = rep(0, 4), mar = rep(0, 4))
+    img <- rot270(mat)
+    image(x = seq_len(nrow(img)), y = seq_len(ncol(img)), z = img,
+          zlim = zlim, col = color.palet, xaxt = "n", yaxt = "n", bty = "n")
+    if (add.values) {
+        text(x = rep(seq_len(nrow(img)), ncol(img)),
+             y = rep(seq_len(ncol(img)), each = nrow(img)),
+             round(img, 2), adj = 0.5, col = "black")
+    }
+}
+
+AR <- function(rho, dim) {
+    rho^abs(outer(seq_len(dim), seq_len(dim), `-`))
+}
+AR.inv <- function(rho, dim) {
+    A <- diag(c(1, rep(rho^2 + 1, dim - 2), 1))
+    A[abs(.row(dim(A)) - .col(dim(A))) == 1] <- -rho
+    A / (1 - rho^2)
+}
+
+# User Interface (page layout)
+ui <- fluidPage(
+    titlePanel("Ising Model Simulation Data Generation"),
+    sidebarLayout(
+        sidebarPanel(
+            h2("Settings"),
+            h4("c1 (influence of eta_y1"),
+            sliderInput("c1", "", min = 0, max = 1, value = 1, step = 0.01),
+            h4("c2 (influence of eta_y2"),
+            sliderInput("c2", "", min = 0, max = 1, value = 1, step = 0.01),
+            sliderInput("y", "y", min = -1, max = 1, value = 0, step = 0.05,
+                animate = animationOptions(
+                    interval = 250,
+                    loop = TRUE,
+                    playButton = NULL,
+                    pauseButton = NULL
+            )),
+            fluidRow(
+                column(6,
+                    radioButtons("alphaType", "Type: alphas",
+                        choices = list(
+                            "linspace" = "linspace", "poly" = "poly", "QR" = "QR"
+                        ),
+                        selected = "linspace"
+                    )
+                ),
+                column(6,
+                    radioButtons("OmegaType", "Type: Omegas",
+                        choices = list(
+                            "Identity" = "identity", "AR(rho)" = "AR", "AR(rho)^-1" = "AR.inv"
+                        ),
+                        selected = "AR"
+                    )
+                )
+            ),
+            sliderInput("rho", "rho", min = -1, max = 1, value = -0.55, step = 0.01),
+            actionButton("reset", "Reset")
+        ),
+        mainPanel(
+            fluidRow(
+                column(4, h3("eta_y1"),  plotOutput("eta_y1")  ),
+                column(4, h3("eta_y2"),  plotOutput("eta_y2")  ),
+                column(4, h3("Theta_y"), plotOutput("Theta_y") )
+            ),
+            fluidRow(
+                column(4, offset = 2,
+                    h3("Expectation E[X | Y = y]"), plotOutput("expectationPlot"),
+                ),
+                column(4,
+                    h3("Covariance Cov(X | Y = y)"), plotOutput("covariancePlot"),
+                    textOutput("covRange"),
+                )
+            ),
+            fluidRow(
+                column(8, offset = 4, h3("iid samples") ),
+                column(4, "Conditional Expectations", plotOutput("cond_expectations") ),
+                column(4, "observations sorted by y_i", plotOutput("sample_sorted_y") ),
+                column(4, "observations sorted by X_i", plotOutput("sample_sorted_X") ),
+            ),
+            fluidRow(
+                column(6, h3("Sample Mean"), plotOutput("sampleMean") ),
+                column(6, h3("Sample Cov"),  plotOutput("sampleCov") )
+            )
+        )
+    )
+)
+
+# Server logic
+server <- function(input, output, session) {
+
+    Fun_y <- function(y) { array(sin(pi * y), dim = q) }
+
+    Fy <- reactive({ Fun_y(input$y) })
+    alphas <- reactive({
+        switch(input$alphaType,
+            "linspace" = Map(function(pj, qj) {
+                data <- linspace <- seq(-1, 1, len = pj)
+                for (k in seq_len(qj - 1)) {
+                    linspace <- rev(linspace)
+                    data <- c(data, linspace)
+                }
+                matrix(data, nrow = pj)
+            }, p, q),
+            "poly"  = Map(function(pj, qj) {
+                data <- linspace <- seq(-1, 1, len = pj)
+                for (k in (seq_len(qj - 1) + 1)) {
+                    data <- c(data, linspace^k)
+                }
+                matrix(data, nrow = pj)
+            }, p, q),
+            "QR"       = Map(function(pj, qj) {
+                qr.Q(qr(matrix(rnorm(pj * qj), pj, qj)))
+            }, p, q)
+        )
+    })
+    Omegas <- reactive({
+        switch(input$OmegaType,
+            "identity" = Map(diag, p),
+            "AR"       = Map(AR, list(input$rho), dim = p),
+            "AR.inv"   = Map(AR.inv, list(input$rho), dim = p)
+        )
+    })
+
+    eta_y1 <- reactive({
+        input$c1 * (mlm(Fy(), alphas()) + c(eta1))
+    })
+    eta_y2 <- reactive({
+        input$c2 * Reduce(`%x%`, rev(Omegas()))
+    })
+
+    # compute Ising model parameters from GMLM parameters given single `Fy`
+    theta_y <- reactive({
+        vech(diag(c(eta_y1())) + (1 - diag(nrow(eta_y2()))) * eta_y2())
+    })
+
+    E_y <- reactive({
+        mvbernoulli::ising_expectation(theta_y())
+    })
+    Cov_y <- reactive({
+        mvbernoulli::ising_cov(theta_y())
+    })
+
+    random_sample <- reactive({
+        c1 <- input$c1
+        c2 <- input$c2
+        eta_y_i2 <- eta_y2()
+
+        y <- sort(runif(n, -1, 1))
+        X <- sapply(y, function(y_i) {
+            Fy_i <- Fun_y(y_i)
+
+            eta_y_i1 <- c1 * (mlm(Fy_i, alphas()) + c(eta1))
+
+            theta_y_i <- vech(diag(c(eta_y_i1)) + (1 - diag(nrow(eta_y_i2))) * eta_y_i2)
+
+            ising_sample(1, theta_y_i)
+        })
+        attr(X, "p") <- prod(p)
+
+        as.mvbmatrix(X)
+    })
+
+    cond_expectations <- reactive({
+        c1 <- input$c1
+        c2 <- input$c2
+        eta_y_i2 <- eta_y2()
+
+        y <- seq(-1, 1, length.out = 50)
+        t(sapply(y, function(y_i) {
+            Fy_i <- Fun_y(y_i)
+
+            eta_y_i1 <- c1 * (mlm(Fy_i, alphas()) + c(eta1))
+
+            theta_y_i <- vech(diag(c(eta_y_i1)) + (1 - diag(nrow(eta_y_i2))) * eta_y_i2)
+
+            ising_expectation(theta_y_i)
+        }))
+    })
+
+    output$eta_y1 <- renderPlot({
+        plot.mat(eta_y1(), add.values = TRUE, zlim = c(-2, 2))
+    }, res = 108)
+    output$eta_y2 <- renderPlot({
+        plot.mat(eta_y2())
+    })
+    output$Theta_y <- renderPlot({
+        plot.mat(vech.pinv(theta_y()))
+    })
+
+    output$expectationPlot <- renderPlot({
+        plot.mat(matrix(E_y(), p[1], p[2]), add.values = TRUE, zlim = c(0, 1))
+    }, res = 108)
+    output$covariancePlot <- renderPlot({
+        plot.mat(Cov_y())
+    })
+    output$covRange <- renderText({
+        paste(round(range(Cov_y()), 3), collapse = " - ")
+    })
+    output$cond_expectations <- renderPlot({
+        plot.mat(cond_expectations(), zlim = 0:1)
+    })
+    output$sample_sorted_y <- renderPlot({
+        plot.mat(random_sample())
+    })
+    output$sample_sorted_X <- renderPlot({
+        X <- random_sample()
+        plot.mat(X[do.call(order, as.data.frame(X)), ])
+    })
+    output$sampleMean <- renderPlot({
+        Xmean <- matrix(colMeans(random_sample()), p[1], p[2])
+        plot.mat(Xmean, add.values = TRUE, zlim = c(0, 1))
+    }, res = 108)
+    output$sampleCov <- renderPlot({
+        plot.mat(cov(random_sample()))
+    })
+
+    observeEvent(input$reset, {
+        updateNumericInput(session, "c1",  value = 1)
+        updateNumericInput(session, "c2",  value = 1)
+        updateNumericInput(session, "y",   value = 0)
+        updateNumericInput(session, "rho", value = -0.55)
+        updateRadioButtons(session, "OmegaType", selected = "AR")
+        updateRadioButtons(session, "alphaType", selected = "poly")
+    })
+}
+
+# launch Shiny Application (start server)
+shinyApp(ui = ui, server = server)