Compare commits: 7a68948d26 ... b1f25b89da (3 commits)

Daniel Kapla  b1f25b89da
Daniel Kapla  5b3912f21b
Daniel Kapla  8fb2fd057d

@@ -0,0 +1,505 @@


\documentclass[aos]{imsart}




%% Packages


\RequirePackage{amsthm,amsmath,amsfonts,amssymb}


\RequirePackage[numbers,sort&compress]{natbib}


%\RequirePackage[authoryear]{natbib}%% uncomment this for authoryear citations


\RequirePackage[colorlinks,citecolor=blue,urlcolor=blue]{hyperref}


\RequirePackage{graphicx}




\startlocaldefs


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% Uncomment next line to change %%


%% the type of equation numbering %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\numberwithin{equation}{section}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% For Axiom, Claim, Corollary, Hypothesis, %%


%% Lemma, Theorem, Proposition %%


%% use \theoremstyle{plain} %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\theoremstyle{plain}


\newtheorem{axiom}{Axiom}


\newtheorem{claim}[axiom]{Claim}


\newtheorem{theorem}{Theorem}[section]


\newtheorem{lemma}[theorem]{Lemma}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% For Assumption, Definition, Example, %%


%% Notation, Property, Remark, Fact %%


%% use \theoremstyle{remark} %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\theoremstyle{remark}


\newtheorem{definition}[theorem]{Definition}


\newtheorem*{example}{Example}


\newtheorem*{fact}{Fact}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Please put your definitions here: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




\endlocaldefs




\begin{document}




\begin{frontmatter}


\title{A sample article title}


%\title{A sample article title with some additional note\thanksref{t1}}


\runtitle{A sample running head title}


%\thankstext{T1}{A sample additional note to the title.}




\begin{aug}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Only one address is permitted per author. %%


%% Only division, organization and email is %%


%% included in the address. %%


%% Additional information can be included in %%


%% the Acknowledgments section if necessary. %%


%% ORCID can be inserted by command: %%


%% \orcid{0000000000000000} %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\author[A]{\fnms{First}~\snm{Author}\ead[label=e1]{first@somewhere.com}},


\author[B]{\fnms{Second}~\snm{Author}\ead[label=e2]{second@somewhere.com}\orcid{0000000000000000}}


\and


\author[B]{\fnms{Third}~\snm{Author}\ead[label=e3]{third@somewhere.com}}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Addresses %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\address[A]{Department,


University or Company Name\printead[presep={,\ }]{e1}}




\address[B]{Department,


University or Company Name\printead[presep={,\ }]{e2,e3}}


\end{aug}




\begin{abstract}


The abstract should summarize the contents of the paper.


It should be clear, descriptive, self-explanatory and not longer


than 200 words. It should also be suitable for publication in


abstracting services. Formulas should be used as sparingly as


possible within the abstract. The abstract should not make


reference to results, bibliography or formulas in the body


of the paper---it should be self-contained.




This is a sample input file. Comparing it with the output it


generates can show you how to produce a simple document of


your own.


\end{abstract}




\begin{keyword}[class=MSC]


\kwd[Primary ]{00X00}


\kwd{00X00}


\kwd[; secondary ]{00X00}


\end{keyword}




\begin{keyword}


\kwd{First keyword}


\kwd{second keyword}


\end{keyword}




\end{frontmatter}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Please use \tableofcontents for articles %%


%% with 50 pages and more %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\tableofcontents




\section{Introduction}




This template helps you to create a properly formatted \LaTeXe\ manuscript.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% `\ ' is used here because TeX ignores %%


%% spaces after text commands. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Prepare your paper in the same style as used in this sample .pdf file.


Try to avoid excessive use of italics and bold face.


Please do not use any \LaTeXe\ or \TeX\ commands that affect the layout


or formatting of your document (i.e., commands like \verb|\textheight|,


\verb|\textwidth|, etc.).




\section{Section headings}


Here are some subsections:


\subsection{A subsection}


Regular text.


\subsubsection{A subsubsection}


Regular text.




\section{Text}


\subsection{Lists}




The following is an example of an \emph{itemized} list,


two levels deep.


\begin{itemize}


\item


This is the first item of an itemized list. Each item


in the list is marked with a ``tick.'' The document


style determines what kind of tick mark is used.


\item


This is the second item of the list. It contains another


list nested inside it.


\begin{itemize}


\item This is the first item of an itemized list that


is nested within the itemized list.


\item This is the second item of the inner list. \LaTeX\


allows you to nest lists deeper than you really should.


\end{itemize}


This is the rest of the second item of the outer list.


\item


This is the third item of the list.


\end{itemize}




The following is an example of an \emph{enumerated} list of one level.




\begin{longlist}


\item This is the first item of an enumerated list.


\item This is the second item of an enumerated list.


\end{longlist}




The following is an example of an \emph{enumerated} list, two levels deep.


\begin{longlist}


\item[1.]


This is the first item of an enumerated list. Each item


in the list is marked with a ``tick.'' The document


style determines what kind of tick mark is used.


\item[2.]


This is the second item of the list. It contains another


list nested inside of it.


\begin{longlist}


\item


This is the first item of an enumerated list that


is nested within.


\item


This is the second item of the inner list. \LaTeX\


allows you to nest lists deeper than you really should.


\end{longlist}


This is the rest of the second item of the outer list.


\item[3.]


This is the third item of the list.


\end{longlist}




\subsection{Punctuation}


Dashes come in three sizes: a hyphen, an intra-word dash like ``$U$-statistics'' or ``the time-homogeneous model'';


a medium dash (also called an ``en-dash'') for number ranges or between two equal entities like ``1--2'' or ``Cauchy--Schwarz inequality'';


and a punctuation dash (also called an ``em-dash'') in place of a comma, semicolon,


colon or parentheses---like this.
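In the source, the three dash sizes are typed as one, two and three
hyphens, respectively; a brief illustration:
\begin{verbatim}
intra-word dash:  $U$-statistics, time-homogeneous
en-dash:          pages 1--2, Cauchy--Schwarz inequality
em-dash:          colon or parentheses---like this
\end{verbatim}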




Generating an ellipsis \ldots\ with the right spacing


around the periods requires a special command.




\subsection{Citation}


Only include in the reference list entries for which there are text citations,


and make sure all citations are included in the reference list.


Simple cite: \cite{r1}.


Multiple bibliography items cite: \cite{r1,r2,r3,r4}.


Citing bibliography with object: \cite{r1}, Theorem 1.




\section{Fonts}


Please use text fonts in text mode, e.g.:


\begin{itemize}


\item[]\textrm{Roman}


\item[]\textit{Italic}


\item[]\textbf{Bold}


\item[]\textsc{Small Caps}


\item[]\textsf{Sans serif}


\item[]\texttt{Typewriter}


\end{itemize}


Please use mathematical fonts in mathematical mode, e.g.:


\begin{itemize}


\item[] $\mathrm{ABCabc123}$


\item[] $\mathit{ABCabc123}$


\item[] $\mathbf{ABCabc123}$


\item[] $\boldsymbol{ABCabc123\alpha\beta\gamma}$


\item[] $\mathcal{ABC}$


\item[] $\mathbb{ABC}$


\item[] $\mathsf{ABCabc123}$


\item[] $\mathtt{ABCabc123}$


\item[] $\mathfrak{ABCabc123}$


\end{itemize}


Note that \verb|\mathcal, \mathbb| belong to capital-letters-only font typefaces.




\section{Notes}


Footnotes\footnote{This is an example of a footnote.}


pose no problem.\footnote{Note that the footnote number comes after the punctuation.}




\section{Quotations}




Text is displayed by indenting it from the left margin. There are short quotations


\begin{quote}


This is a short quotation. It consists of a


single paragraph of text. There is no paragraph


indentation.


\end{quote}


and longer ones.


\begin{quotation}


This is a longer quotation. It consists of two paragraphs


of text. The beginning of each paragraph is indicated


by an extra indentation.




This is the second paragraph of the quotation. It is just


as dull as the first paragraph.


\end{quotation}




\section{Environments}




\subsection{Examples for \emph{\texttt{plain}}-style environments}


\begin{axiom}\label{ax1}


This is the body of Axiom \ref{ax1}.


\end{axiom}




\begin{proof}


This is the body of the proof of the axiom above.


\end{proof}




\begin{claim}\label{cl1}


This is the body of Claim \ref{cl1}. Claim \ref{cl1} is numbered after


Axiom \ref{ax1} because we used \verb|[axiom]| in \verb|\newtheorem|.


\end{claim}




\begin{theorem}\label{th1}


This is the body of Theorem \ref{th1}. Theorem \ref{th1} numbering is


dependent on section because we used \verb|[section]| after \verb|\newtheorem|.


\end{theorem}




\begin{theorem}[Title of the theorem]\label{th2}


This is the body of Theorem \ref{th2}. Theorem \ref{th2} has an additional title.


\end{theorem}




\begin{lemma}\label{le1}


This is the body of Lemma \ref{le1}. Lemma \ref{le1} is numbered after


Theorem \ref{th2} because we used \verb|[theorem]| in \verb|\newtheorem|.


\end{lemma}






\begin{proof}[Proof of Lemma \ref{le1}]


This is the body of the proof of Lemma \ref{le1}.


\end{proof}




\subsection{Examples for \emph{\texttt{remark}}-style environments}


\begin{definition}\label{de1}


This is the body of Definition \ref{de1}. Definition \ref{de1} is numbered after


Lemma \ref{le1} because we used \verb|[theorem]| in \verb|\newtheorem|.


\end{definition}




\begin{example}


This is the body of the example. The example is unnumbered because we used \verb|\newtheorem*|


instead of \verb|\newtheorem|.


\end{example}




\begin{fact}


This is the body of the fact. The fact is unnumbered because we used \verb|\newtheorem*|


instead of \verb|\newtheorem|.


\end{fact}




\section{Tables and figures}


Cross-references to labeled tables: As you can see in Table~\ref{sphericcase}


and also in Table~\ref{parset}.




\begin{table*}


\caption{The spherical case ($I_1=0$, $I_2=0$)}


\label{sphericcase}


\begin{tabular}{@{}lrrrrc@{}}


\hline


Equil. \\


points & \multicolumn{1}{c}{$x$}


& \multicolumn{1}{c}{$y$} & \multicolumn{1}{c}{$z$}


& \multicolumn{1}{c}{$C$} & S \\


\hline


$L_1$ & $-$2.485252241 & 0.000000000 & 0.017100631 & 8.230711648 & U \\


$L_2$ & 0.000000000 & 0.000000000 & 3.068883732 & 0.000000000 & S \\


$L_3$ & 0.009869059 & 0.000000000 & 4.756386544 & $-$0.000057922 & U \\


$L_4$ & 0.210589855 & 0.000000000 & $-$0.007021459 & 9.440510897 & U \\


$L_5$ & 0.455926604 & 0.000000000 & $-$0.212446624 & 7.586126667 & U \\


$L_6$ & 0.667031314 & 0.000000000 & 0.529879957 & 3.497660052 & U \\


$L_7$ & 2.164386674 & 0.000000000 & $-$0.169308438 & 6.866562449 & U \\


$L_8$ & 0.560414471 & 0.421735658 & $-$0.093667445 & 9.241525367 & U \\


$L_9$ & 0.560414471 & $-$0.421735658 & $-$0.093667445 & 9.241525367 & U \\


$L_{10}$ & 1.472523232 & 1.393484549 & $-$0.083801333 & 6.733436505 & U \\


$L_{11}$ & 1.472523232 & $-$1.393484549 & $-$0.083801333 & 6.733436505 & U \\


\hline


\end{tabular}


\end{table*}




\begin{table}


\caption{Sample posterior estimates for each model}


\label{parset}


%


\begin{tabular}{@{}lcrcrrr@{}}


\hline


&& & &\multicolumn{3}{c}{Quantile} \\


\cline{57}


Model &Parameter &


\multicolumn{1}{c}{Mean} &


Std. dev.&


\multicolumn{1}{c}{2.5\%} &


\multicolumn{1}{c}{50\%}&


\multicolumn{1}{c@{}}{97.5\%} \\


\hline


{Model 0} & $\beta_0$ & $-$12.29 & 2.29 & $-$18.04 & $-$11.99 & $-$8.56 \\


& $\beta_1$ & 0.10 & 0.07 & $-$0.05 & 0.10 & 0.26 \\


& $\beta_2$ & 0.01 & 0.09 & $-$0.22 & 0.02 & 0.16 \\[6pt]


{Model 1} & $\beta_0$ & $-$4.58 & 3.04 & $-$11.00 & $-$4.44 & 1.06 \\


& $\beta_1$ & 0.79 & 0.21 & 0.38 & 0.78 & 1.20 \\


& $\beta_2$ & $-$0.28 & 0.10 & $-$0.48 & $-$0.28 & $-$0.07 \\[6pt]


{Model 2} & $\beta_0$ & $-$11.85 & 2.24 & $-$17.34 & $-$11.60 & $-$7.85 \\


& $\beta_1$ & 0.73 & 0.21 & 0.32 & 0.73 & 1.16 \\


& $\beta_2$ & $-$0.60 & 0.14 & $-$0.88 & $-$0.60 & $-$0.34 \\


& $\beta_3$ & 0.22 & 0.17 & $-$0.10 & 0.22 & 0.55 \\


\hline


\end{tabular}


%


\end{table}




\begin{figure}


\includegraphics{figure1}


\caption{Pathway of the penicillin G biosynthesis.}


\label{penG}


\end{figure}




Sample of cross-reference to figure.


Figure~\ref{penG} shows that it is not easy to get something on paper.




\section{Equations and the like}




Two equations:


\begin{equation}


C_{s} = K_{M} \frac{\mu/\mu_{x}}{1-\mu/\mu_{x}} \label{ccs}


\end{equation}


and


\begin{equation}


G = \frac{P_{\mathrm{opt}} - P_{\mathrm{ref}}}{P_{\mathrm{ref}}} 100(\%).


\end{equation}




Equation arrays:


\begin{eqnarray}


\frac{dS}{dt} & = & - \sigma X + s_{F} F,\\


\frac{dX}{dt} & = & \mu X,\\


\frac{dP}{dt} & = & \pi X - k_{h} P,\\


\frac{dV}{dt} & = & F.


\end{eqnarray}


One long equation:


\begin{eqnarray}


\mu_{\text{normal}} & = & \mu_{x} \frac{C_{s}}{K_{x}C_{x}+C_{s}} \nonumber\\


& = & \mu_{\text{normal}} - Y_{x/s}\bigl(1-H(C_{s})\bigr)(m_{s}+\pi /Y_{p/s})\\


& = & \mu_{\text{normal}}/Y_{x/s}+ H(C_{s}) (m_{s}+ \pi /Y_{p/s}).\nonumber


\end{eqnarray}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Example with single Appendix: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{appendix}


\section*{Title}\label{appn} %% if no title is needed, leave empty \section*{}.


Appendices should be provided in the \verb|{appendix}| environment,


before Acknowledgements.




If there is only one appendix,


then please refer to it in text as \ldots\ in the \hyperref[appn]{Appendix}.


\end{appendix}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Example with multiple Appendixes: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{appendix}


\section{Title of the first appendix}\label{appA}


If there is more than one appendix, then please refer to them


as \ldots\ in Appendix \ref{appA}, Appendix \ref{appB}, etc.




\section{Title of the second appendix}\label{appB}


\subsection{First subsection of Appendix \protect\ref{appB}}




Use the standard \LaTeX\ commands for headings in \verb|{appendix}|.


Headings and other objects will be numbered automatically.


\begin{equation}


\mathcal{P}=(j_{k,1},j_{k,2},\dots,j_{k,m(k)}). \label{path}


\end{equation}




Sample of cross-reference to the formula (\ref{path}) in Appendix \ref{appB}.


\end{appendix}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Support information, if any, %%


%% should be provided in the %%


%% Acknowledgements section. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{acks}[Acknowledgments]


The authors would like to thank the anonymous referees, an Associate


Editor and the Editor for their constructive comments that improved the


quality of this paper.


\end{acks}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Funding information, if any, %%


%% should be provided in the %%


%% funding section. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{funding}


The first author was supported by NSF Grant DMS????????.




The second author was supported in part by NIH Grant ???????????.


\end{funding}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Supplementary Material, including data %%


%% sets and code, should be provided in %%


%% {supplement} environment with title %%


%% and short description. It cannot be %%


%% available exclusively as external link. %%


%% All Supplementary Material must be %%


%% available to the reader on Project %%


%% Euclid with the published article. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\begin{supplement}


\stitle{Title of Supplement A}


\sdescription{Short description of Supplement A.}


\end{supplement}


\begin{supplement}


\stitle{Title of Supplement B}


\sdescription{Short description of Supplement B.}


\end{supplement}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% The Bibliography %%


%% %%


%% imsart???.bst will be used to %%


%% create a .BBL file for submission. %%


%% %%


%% Note that the displayed Bibliography will not %%


%% necessarily be rendered by Latex exactly as specified %%


%% in the online Instructions for Authors. %%


%% %%


%% MR numbers will be added by VTeX. %%


%% %%


%% Use \cite{...} to cite references in text. %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




%% if your bibliography is in bibtex format, uncomment commands:


%\bibliographystyle{imsart-number} % Style BST file (imsart-number.bst or imsart-nameyear.bst)


%\bibliography{bibliography} % Bibliography file (usually '*.bib')




%% or include bibliography directly:


\begin{thebibliography}{4}


%%


\bibitem{r1}


\textsc{Billingsley, P.} (1999). \textit{Convergence of


Probability Measures}, 2nd ed.


Wiley, New York.




\bibitem{r2}


\textsc{Bourbaki, N.} (1966). \textit{General Topology} \textbf{1}.


Addison-Wesley, Reading, MA.




\bibitem{r3}


\textsc{Ethier, S. N.} and \textsc{Kurtz, T. G.} (1985).


\textit{Markov Processes: Characterization and Convergence}.


Wiley, New York.




\bibitem{r4}


\textsc{Prokhorov, Yu.} (1956).


Convergence of random processes and limit theorems in probability


theory. \textit{Theory Probab. Appl.}


\textbf{1} 157--214.


\end{thebibliography}




\end{document}


@@ -0,0 +1,197 @@


%% Template for the submission to:


%% The Annals of Statistics [AOS]


%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% In this template, the places where you %%


%% need to fill in your information are %%


%% indicated by '???'. %%


%% %%


%% Please do not use \input{...} to include %%


%% other tex files. Submit your LaTeX %%


%% manuscript as one .tex document. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




\documentclass[aos]{imsart}




%% Packages


\RequirePackage{amsthm,amsmath,amsfonts,amssymb}


\RequirePackage[numbers]{natbib}


%\RequirePackage[authoryear]{natbib}%% uncomment this for authoryear citations


%\RequirePackage[colorlinks,citecolor=blue,urlcolor=blue]{hyperref}%% uncomment this for coloring bibliography citations and linked URLs


%\RequirePackage{graphicx}%% uncomment this for including figures




\startlocaldefs


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% Uncomment next line to change %%


%% the type of equation numbering %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\numberwithin{equation}{section}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% For Axiom, Claim, Corollary, Hypothesis, %%


%% Lemma, Theorem, Proposition %%


%% use \theoremstyle{plain} %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\theoremstyle{plain}


%\newtheorem{???}{???}


%\newtheorem*{???}{???}


%\newtheorem{???}{???}[???]


%\newtheorem{???}[???]{???}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% For Assumption, Definition, Example, %%


%% Notation, Property, Remark, Fact %%


%% use \theoremstyle{remark} %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\theoremstyle{remark}


%\newtheorem{???}{???}


%\newtheorem*{???}{???}


%\newtheorem{???}{???}[???]


%\newtheorem{???}[???]{???}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Please put your definitions here: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




\endlocaldefs




\begin{document}




\begin{frontmatter}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% %%


%% Enter the title of your article here %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\title{???}


%\title{A sample article title with some additional note\thanksref{T1}}


\runtitle{???}


%\thankstext{T1}{A sample of additional note to the title.}




\begin{aug}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Only one address is permitted per author. %%


%% Only division, organization and email is %%


%% included in the address. %%


%% Additional information can be included in %%


%% the Acknowledgments section if necessary. %%


%% ORCID can be inserted by command: %%


%% \orcid{0000000000000000} %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\author[A]{\fnms{???}~\snm{???}\ead[label=e1]{???@???}},


\author[B]{\fnms{???}~\snm{???}\ead[label=e2]{???@???}}


\and


\author[B]{\fnms{???}~\snm{???}\ead[label=e3]{???@???}}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Addresses %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\address[A]{???\printead[presep={,\ }]{e1}}




\address[B]{???\printead[presep={,\ }]{e2,e3}}


\end{aug}




\begin{abstract}


???.


\end{abstract}




\begin{keyword}[class=MSC]


\kwd[Primary ]{???}


\kwd{???}


\kwd[; secondary ]{???}


\end{keyword}




\begin{keyword}


\kwd{???}


\kwd{???}


\end{keyword}




\end{frontmatter}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Please use \tableofcontents for articles %%


%% with 50 pages and more %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\tableofcontents




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%% Main text entry area:






%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Single Appendix: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\begin{appendix}


%\section*{???}%% if no title is needed, leave empty \section*{}.


%\end{appendix}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Multiple Appendixes: %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\begin{appendix}


%\section{???}


%


%\section{???}


%


%\end{appendix}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Support information, if any, %%


%% should be provided in the %%


%% Acknowledgements section. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\begin{acks}[Acknowledgments]


% The authors would like to thank ...


%\end{acks}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Funding information, if any, %%


%% should be provided in the %%


%% funding section. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\begin{funding}


% The first author was supported by ...


%


% The second author was supported in part by ...


%\end{funding}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% Supplementary Material, including data %%


%% sets and code, should be provided in %%


%% {supplement} environment with title %%


%% and short description. It cannot be %%


%% available exclusively as external link. %%


%% All Supplementary Material must be %%


%% available to the reader on Project %%


%% Euclid with the published article. %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%\begin{supplement}


%\stitle{???}


%\sdescription{???.}


%\end{supplement}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% The Bibliography %%


%% %%


%% imsart???.bst will be used to %%


%% create a .BBL file for submission. %%


%% %%


%% Note that the displayed Bibliography will not %%


%% necessarily be rendered by Latex exactly as specified %%


%% in the online Instructions for Authors. %%


%% %%


%% MR numbers will be added by VTeX. %%


%% %%


%% Use \cite{...} to cite references in text. %%


%% %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




%% if your bibliography is in bibtex format, uncomment commands:


%\bibliographystyle{imsart-number} % Style BST file (imsart-number.bst or imsart-nameyear.bst)


%\bibliography{bibliography} % Bibliography file (usually '*.bib')




%% or include bibliography directly:


% \begin{thebibliography}{}


% \bibitem{b1}


% \end{thebibliography}




\end{document}


@@ -0,0 +1,153 @@


%% This is file `imsart.cls'


%%


%% LaTeX 2e class file for the processing of LaTeX2e files


%% of the following IMS/BS journals:


%%


%% The Annals of Probability


%% The Annals of Applied Probability


%% The Annals of Statistics


%% The Annals of Applied Statistics


%% Statistical Science


%% Probability Surveys


%% Statistics Surveys


%% Electronic Journal of Statistics


%% Bernoulli


%% Annales de l'Institut Henri Poincar\'e - Probabilit\'es et Statistiques


%% Brazilian Journal of Probability and Statistics


%% Bayesian Analysis


%%


%% Institute of Mathematical Statistics, U.S.A.


%% Bernoulli Society


%% Institut Henry Poincare


%% Brazilian Statistical Association


%% International Society for Bayesian Analysis


%%


%% Macros written by Vytas Statulevicius, VTeX, Lithuania


%% Maintained by Deimantas Galcius, VTeX, Lithuania


%% for Institute of Mathematical Statistics, U.S.A.


%% Please submit bugs or your comments to latex-support@vtex.lt


%%


%% The original distribution is located at:


%% http://www.e-publications.org/ims/support


%%


%% This class file loads standard "article.cls" with appropriate


%% settings and then style file "imsart.sty" with additional macros


%%


%% You are free to use this style file as you see fit, provided


%% that you do not make changes to the file.


%% If you DO make changes, you are required to rename this file.


%%


%% It may be distributed under the terms of the LaTeX Project Public


%% License, as described in lppl.txt in the base LaTeX distribution.


%% Either version 1.0 or, at your option, any later version.


%%


%% \CharacterTable


%% {Uppercase \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z


%% Lowercase \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z


%% Digits \0\1\2\3\4\5\6\7\8\9


%% Exclamation \! Double quote \" Hash (number) \#


%% Dollar \$ Percent \% Ampersand \&


%% Acute accent \' Left paren \( Right paren \)


%% Asterisk \* Plus \+ Comma \,


%% Minus \- Point \. Solidus \/


%% Colon \: Semicolon \; Less than \<


%% Equals \= Greater than \> Question mark \?


%% Commercial at \@ Left bracket \[ Backslash \\


%% Right bracket \] Circumflex \^ Underscore \_


%% Grave accent \` Left brace \{ Vertical bar \|


%% Right brace \} Tilde \~}


%%


%%


%% Bug fixes and changes: at end of file


%


% TeX programming: Vytas Statulevicius, VTeX, Lithuania.


% TeX programming: Deimantas Galcius, VTeX, Lithuania.


% TeX programming: Edgaras Sakuras, VTeX, Lithuania.


% Requires Latex2e, ver.2000.06


%


\NeedsTeXFormat{LaTeX2e}


\ProvidesClass{imsart}[2022/04/21 driver class for package imsart.sty]


%


% set various options for different journals:


\DeclareOption{ps}{%


\PassOptionsToClass{10pt,oneside}{article}%


}


%


\DeclareOption{ss}{%


\PassOptionsToClass{10pt,oneside}{article}%


}


%


\DeclareOption{ejs}{%


\PassOptionsToClass{10pt,oneside}{article}%


}


%


\DeclareOption{aap}{%


\PassOptionsToClass{11pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


}


%


\DeclareOption{aop}{%


\PassOptionsToClass{11pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


}


%


\DeclareOption{aos}{%


\PassOptionsToClass{11pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


}


%


\DeclareOption{aoas}{%


\PassOptionsToClass{11pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


}


%


\DeclareOption{sts}{%


\PassOptionsToClass{11pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


}


%


\DeclareOption{bj}{%


\PassOptionsToClass{10pt,oneside}{article}%


\g@addto@macro\@classoptionslist{,nosetpagesize}%


\PassOptionsToPackage{sort&compress}{natbib}%


}


%


\DeclareOption{aihp}{%


\PassOptionsToClass{10pt,oneside,leqno}{article}%


\PassOptionsToPackage{leqno}{amsmath}%


\PassOptionsToPackage{numbers,sort&compress}{natbib}%


}


%


\DeclareOption{bjps}{%


\PassOptionsToClass{11pt,twoside}{article}%


}


%


\DeclareOption{ba}{%


\PassOptionsToClass{10pt,oneside}{article}%


}


%


\PassOptionsToPackage{cmex10}{amsmath}


%


\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}


\ProcessOptions*


%


\LoadClass{article}


%


\IfFileExists{imsart.sty}%


{}%


{%


\ClassError{imsart}%


{The complimentary style file "imsart.sty" is required}%


{%


The complimentary style file "imsart.sty" is required\MessageBreak


You need to install file "imsart.sty" in your system\MessageBreak


File could be downloaded from http://www.e-publications.org/ims/support


}%


}


%


% Load additional macros and modifications for "article.cls"


\RequirePackage{imsart}


%


\endinput


%%


%% End of file `imsart.cls'.

LaTeX/main.bib: 2106 lines changed (diff suppressed because it is too large)

LaTeX/paper.tex: 275 lines changed

@@ -13,7 +13,7 @@


\usepackage{environ} % for dynamic TikZ picture scaling


\usepackage{algorithm, algpseudocode} % Pseudo Codes / Algorithms


\usepackage[


style = apa, % citation style


style = authoryear, % citation style


isbn = false, % show isbn?


maxbibnames = 50, % maximal number of names in bibilography


maxcitenames = 2, % maximal number of names in text before et al.



@@ -28,8 +28,8 @@


sortcites = false,


natbib = false,


dashed = true,


url = false,


doi = false,


url = true,


doi = true,


bibencoding = utf8


]{biblatex}





@@ -87,7 +87,7 @@


\newtheorem{remark}{Remark}






\crefalias{section}{appendix}


% \crefalias{section}{appendix}




\crefname{condition}{Condition}{Conditions}


\Crefname{condition}{Condition}{Conditions}



@@ -272,9 +272,9 @@


\maketitle




\begin{abstract}


We consider regression or classification problems where the independent variable is matrix or tensor-valued. We derive a multilinear sufficient reduction for the regression or classification problem modeling the conditional distribution of the predictors given the response as a member of the quadratic exponential family. Using manifold theory, we prove the consistency and asymptotic normality of the sufficient reduction. We develop estimation procedures of


sufficient reductions for both continuous and binary tensor-valued predictors. For continuous predictors, the algorithm is highly computationally efficient and is also applicable to situations where the dimension of


the reduction exceeds the sample size. We demonstrate the superior performance of our approach in simulations and real-world data examples for both continuous and binary tensor-valued predictors. The \textit{Chess data} analysis results agree with a human player's understanding of the game and confirm the relevance of our approach.


We consider supervised learning (regression/classification) problems where the independent variable is tensor valued. We derive a multilinear sufficient reduction for the regression or classification problem modeling the conditional distribution of the predictors given the response as a member of the quadratic exponential family. Using manifold theory, we prove the consistency and asymptotic normality of the sufficient reduction. We develop estimation procedures of


sufficient reductions for both continuous and binary tensor valued predictors. For continuous predictors, the algorithm is highly computationally efficient and is also applicable to situations where the dimension of


the reduction exceeds the sample size. We demonstrate the superior performance of our approach in simulations and real-world data examples for both continuous and binary tensor valued predictors. The \textit{Chess data} analysis results agree with a human player's understanding of the game and confirm the relevance of our approach.


\end{abstract}




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%







Tensors are a mathematical tool to represent data of complex structure in statistics. \textit{Tensors} are a generalization of matrices to higher dimensions: a tensor is a multi-dimensional array of numbers. For example, a second-order tensor can be represented as a matrix, while a third-order tensor can be represented as a cube of matrices.






Complex data are collected at different times and/or under several conditions, often involving a large number of multi-indexed variables represented as tensor-valued data \parencite{KoldaBader2009}. They occur in large-scale longitudinal studies \parencite[e.g.][]{Hoff2015}, in agricultural experiments, chemometrics and spectroscopy \parencite[e.g.][]{LeurgansRoss1992,Burdick1995}, in signal and video processing, where sensors produce multi-indexed data, e.g. over spatial, frequency, and temporal dimensions \parencite[e.g.][]{DeLathauwerCastaing2007,KofidisRegalia2005}, and in telecommunications \parencite[e.g.][]{DeAlmeidaEtAl2007}. Other examples of multiway data include 3D images of the brain, where the modes are the 3 spatial dimensions, and spatiotemporal weather imaging data, a set of image sequences represented as 2 spatial modes and 1 temporal mode.




% \begin{itemize}


% \item Review \cite{ZhouLiZhu2013} and see how you compare with them. They focus on the forward regression model with a scalar response but they claim that "Exploiting the array structure in imaging data, the new method substantially reduces the dimensionality of imaging data, which leads to efficient estimation and prediction."


% \item Read \cite{ZhouEtAl2023} to figure out the distribution they use for the tensorvalued predictors and briefly describe what they do.


% \item Read \cite{RabusseauKadri2016} to figure out what they do. They seem to draw both the response and the predictors from tensornormal with iid N(0,1) entries: "In order to leverage the tensor structure of the output data, we formulate the problem as the minimization of a least squares criterion subject to a multilinear rank constraint on the regression tensor. The rank constraint enforces the model to capture lowrank structure in the outputs and to explain dependencies between inputs and outputs in a lowdimensional multilinear subspace."


% \end{itemize}






%  RabusseauKadri2016 Y  x for tensor Y with vector x (HOLRR; Higher Order LowRank Regression)


%  LiZhang2017 Y  x for tensoe Y with vector x (envelope model)


%  ZhouEtAl2023 Y  x for tensor Y with vector x (sparse and partially observed)






%  ZhouLiZhu2013 y\in R (GLM) for y  Z, X for tensor X


%  HaoEtAl2021 y\in R for y  X for tensor X (sparse, element wise Bsplines)




% Tensor regression models have been proposed to exploit the special structure of tensor covariates, e.g. \cite{HaoEtAl2021,ZhouLiZhu2013}, or tensor responses \cite{RabusseauKadri2016,LiZhang2017,ZhouEtAl2023} \cite{HaoEtAl2021} modeled a scalar response as a flexible nonparametric function of tensor covariates. \cite{ZhouLiZhu2013} assume the scalar response has a distribution in the exponential family given the tensorvalued predictors and model the link function as a multilinear function of the predictors. \cite{LiZhang2017} model the tensorvalued response as tensor normal. Rather than using $L_1$ type penalty functions to induce sparsity, they employ the envelope method (Cook, Li, and Chiaromonte Citation2010) to estimate the unknown regression coefficient. Moreover, the envelope method essentially identifies and uses the material information jointly. They develop an estimation algorithm and study the asymptotic properties of the estimator. the scalar response as These models try to utilize the sparse and lowrank structures in the tensors  either in the regression coefficient tensor or the response tensor  to boost performance on the regression task by reducing the number of free parameters.




Tensor regression models have been proposed to leverage the structure inherent in tensor-valued data. For instance, \textcite{HaoEtAl2021,ZhouLiZhu2013} focus on tensor covariates, while \textcite{RabusseauKadri2016,LiZhang2017,ZhouEtAl2023} focus on tensor responses, and \textcite{Hoff2015,Lock2018} consider tensor-on-tensor regression. \textcite{HaoEtAl2021} model a scalar response as a flexible nonparametric function of tensor covariates. \textcite{ZhouLiZhu2013} assume the scalar response has a distribution in the exponential family given the tensor-valued predictors, with the link modeled as a multilinear function of the predictors. \textcite{RabusseauKadri2016} model the tensor-valued response as a linear model with tensor-valued regression coefficients subject to a multilinear rank constraint. \textcite{LiZhang2017} approach the problem with a similar linear model, but instead of a low-rank constraint the error term is assumed to have a separable Kronecker product structure, while using a generalization of the envelope model \parencite{CookLiChiaromonte2010}. \textcite{ZhouEtAl2023} focus on a partially observed tensor response given vector-valued predictors with mode-wise sparsity constraints on the regression coefficients. \textcite{Hoff2015} extends an existing bilinear regression model to a tensor-on-tensor regression model of conformable modes and dimensions based on a Tucker product. \textcite{Lock2018} uses a tensor contraction to build a penalized least squares model for a tensor with an arbitrary number of modes and dimensions.




Our approach considers the general regression problem of fitting a response of general form (univariate, multivariate, tensor-valued) on a tensor-valued predictor. We operate in the context of sufficient dimension reduction \parencite[e.g.][]{Cook1998,Li2018} based on inverse regression, which leads us to regress the tensor-valued predictor on the response. In our setting, this necessitates transforming the response to tensor-valued functions, regardless of whether it is itself tensor-valued. Because of the setting, our method shares commonalities with the tensor regression models referred to above, yet the modeling and methodology are novel.


Specifically, our tensor-to-tensor regression model is a generalized multilinear model similar to the generalized linear model of \textcite{ZhouLiZhu2013}. % but with tensor valued response by applying (a known) tensor valued function to the response in an inverse regression setting, reversing the role of response and predictors.


To bypass the explosion of the number of parameters to estimate, we assume the inverse regression error covariance has a Kronecker product structure, as do \textcite{LiZhang2017}. Our maximum likelihood-based estimation does not require any penalty terms, in contrast to the least squares and/or sparse approaches \parencite{ZhouLiZhu2013}. In the case of a tensor (multilinear) normal, given the tensor-valued function of the response, our model exhibits similarities to the multilinear modeling of \textcite{Hoff2015}, but we use a generalized multilinear model and estimate the parameters with maximum likelihood instead of least squares. Moreover, a common issue in multilinear tensor regression models is the unidentifiability of the parameters, which we address in a completely different manner. For example, \textcite{LiZhang2017} develop theory that is based on orthogonal projection matrices to uniquely identify a subspace, while our approach is more general as it uses manifold theory.








In this paper, we present a model-based \emph{Sufficient Dimension Reduction} (SDR) method for tensor-valued data with a distribution in the quadratic exponential family, assuming a separable Kronecker product structure of the first and second moments. By generalizing the parameter space to embedded manifolds, we obtain consistency and asymptotic normality results while allowing great modeling flexibility in the linear sufficient dimension reduction.






The quadratic exponential family contains the multilinear normal and the multilinear Ising distributions, for continuous and binary tensor-valued random variables, respectively.


The Ising\footnote{Also known as the \emph{Lenz-Ising} model, as the physical assumptions of the model were developed by both Lenz and Ising \parencite{Niss2005}, and Ising gave a closed-form solution for the 1D lattice \parencite{Ising1925}.} model \parencite{Lenz1920,Ising1925,Niss2005} is a mathematical model originating in statistical physics to study ferromagnetism in a thermodynamic setting. It describes magnetic dipoles (atomic ``spins'' with values $\pm 1$) under an external magnetic field (first moments) while allowing two-way interactions (second moments) between direct neighbors on a lattice, a discrete grid. The Ising problem, as known in statistical physics, is to compute observables such as the magnetizations and correlations under the Boltzmann distribution\footnote{The Boltzmann distribution is a probability distribution over the states of a physical system in thermal equilibrium (constant temperature) that assigns higher probabilities to states with lower energy.}, while the interaction structure and the magnetic fields are given. The ``reverse'' problem, where the couplings and fields are unknown and to be determined from observations of the spins, as in statistical inference, is known as the \emph{inverse Ising problem} \parencite{NguyenEtAl2017}. From this point of view, the Ising model is a member of a discrete quadratic exponential family \parencite{CoxWermuth1994,JohnsonEtAl1997} for multivariate binary outcomes, where the interaction structure (non-zero correlations) is determined by the lattice. Generally, neither the values of the couplings nor the interaction structure are known.
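For concreteness, the Ising model just described can be written as a probability mass function over spin configurations. This is a standard parametrization, stated here only for illustration; the symbols $h_i$, $J_{ij}$, and $Z$ below are not part of this paper's notation:

```latex
\begin{displaymath}
  P(\mat{x}) = \frac{1}{Z} \exp\Big( \sum_{i} h_i x_i + \sum_{i < j} J_{ij}\, x_i x_j \Big),
  \qquad x_i \in \{\pm 1\},
\end{displaymath}
% $h_i$: external magnetic fields (first moments);
% $J_{ij}$: couplings, non-zero only for direct neighbors on the lattice (second moments);
% $Z$: normalizing constant (partition function).
```

The inverse Ising problem then amounts to estimating the fields $h_i$ and couplings $J_{ij}$ from observed spin configurations.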




Multilinear tensor normal models have been used in various applications, including medical imaging \parencite{BasserPajevic2007,DrydenEtAl2009}, spatiotemporal data analysis \parencite{GreenewaldHero2014}, and regression analysis for longitudinal relational data \parencite{Hoff2015}. One of the most important uses of the multilinear normal (MLN) distribution, and hence tensor analysis, is perhaps in magnetic resonance imaging (MRI) \parencite{OhlsonEtAl2013}. A recent survey \parencite{WangEtAl2022} and the references therein contain more information on, and potential applications of, multilinear tensor normal models.


Consequently, in statistics the Ising model is mostly used to model multivariate binary data, with states $\{0, 1\}$ instead of $\pm 1$ and a full interaction structure. It is related to a multitude of other models, among which the most prominent are: \emph{graphical models} and \emph{Markov random fields} to describe conditional dependence \parencite{Lauritzen1996,WainwrightJordan2008,LauritzenRichardson2002}, \emph{Potts models} \parencite{Besag1974,ChakrabortyEtAl2022}, which generalize the Ising model to multiple states, the \emph{multivariate Bernoulli distribution} \parencite{Whittaker1990,JohnsonEtAl1997,DaiDingWahba2013}, which also accommodates higher-order interactions (three-way and higher), and \emph{(restricted) Boltzmann machines} \parencite{Smolensky1986,Hinton2002,FischerIgel2012}, which introduce additional hidden variables for learning binary distributions. Most of these models can be used in both supervised and unsupervised settings. Applications of the Ising model (and variations thereof) include modeling neural firing patterns \parencite{SchneidmanEtAl2006}, gene expression data analysis \parencite{LezonEtAl2006}, and modeling financial markets \parencite{Bury2013}. See also \textcite{NguyenEtAl2017}.




The Ising model for multivariate binary outcomes belongs to the class of discrete exponential families. Its defining feature is that the sufficient statistic involves a quadratic term to capture correlations arising from pairwise interactions.


The tensor Ising model is a higher-order Ising model for tensor-valued binary outcomes.


%From \cite{MukherjeeEtAl2020}


Higher-order Ising models arise naturally in the study of multi-atom interactions in lattice gas models, such as the square-lattice eight-vertex model, the Ashkin-Teller model, and Suzuki's pseudo-3D anisotropic model (cf. [6, 33, 36, 37, 49, 55, 56, 61, 62] and the references therein). More recently, higher-order spin systems have also been proposed for modeling peer-group effects in social networks [22]. \textcite{MukherjeeEtAl2020} proposed a maximum pseudo-likelihood estimation algorithm for a one-parameter tensor Ising model. \efi{Daniel: comment on what these guys do and contrast with your setting} In our approach, the parameter is not constrained to be a scalar.


We derive maximum likelihood estimates for all first- and second-order interactions and propose a gradient-based optimization algorithm.


The $r$-tensor Ising model in statistical physics is a generalization of the Ising model to $r$-order interactions. \textcite{MukherjeeEtAl2022} study the one-parameter discrete exponential family for modeling dependent binary data where the interaction structure is given. In \textcite{LiuEtAl2023}, the tensor structure itself is to be inferred. These models are fundamentally different from our approach, where we rely on properties of the quadratic exponential family, which models up to second-order interactions. Another important difference is that we adopt the multilinear formulation, as it is inherently linked to the observable structure of multiway data, as opposed to describing the model coefficients with an $r$-order tensor structure.






Our main contributions are (a) formulating the dimension reduction problem via the quadratic exponential family, which allows us to derive the sufficient dimension reduction in closed form, (b) defining the parameter space as an embedded manifold, which provides great flexibility in modeling, (c) deriving the maximum likelihood estimator of the sufficient reduction subject to multilinear constraints and overcoming parameter non-identifiability, (d) developing estimation algorithms, which in the case of multilinear normal predictors are fast and efficient, and (e) establishing the consistency and asymptotic normality of the estimators.




Even though our motivation is rooted in the SDR perspective, our proposal concerns inference on any regression model with a tensor-valued response and predictors of any type. Thus, our approach can be used as a standalone model for such data, regardless of whether one is interested in deriving sufficient reductions and/or reducing the dimension of the data. Our results in the framework of the quadratic exponential family for tensor-valued variables, i.e., consistency and asymptotic normality, apply to both multilinear normal \parencite{KolloVonRosen2005,Hoff2011,OhlsonEtAl2013} and multilinear Ising models, as defined in \cref{sec:ising_estimation}.




The structure of this paper is as follows. We introduce our notation in \cref{sec:notation}. \Cref{sec:problemformulation} details the problem we consider, and in \cref{sec:gmlmmodel} we introduce our model. Continuing in \cref{sec:mlestimation}, we provide the basis for a general maximum likelihood estimation procedure and derive specialized methods for the multilinear normal as well as the multilinear Ising distribution. \Cref{sec:manifolds} gives a short introduction to manifolds and provides the basis for applying the consistency and asymptotic normality results from \cref{sec:asymtotics}. Simulations for continuous and binary predictors are the subject of \cref{sec:simulations}. Finally, in \cref{sec:dataanalysis}, we apply our model to EEG data and perform a proof-of-concept data analysis where a chess board is interpreted as a collection of binary $8\times 8$ matrices.








%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\section{Notation}\label{sec:notation}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




Let $\ten{A}\in\mathbb{R}^{q_1\times \ldots\times q_r}$ denote an order\footnote{Also referred to as rank, hence the variable name $r$, but this term is \emph{not} used as it leads to confusion with the concept of the rank of a matrix.} $r$ tensor, where $r\in\mathbb{N}$ is the number of modes or axes (dimensions) of $\ten{A}$ and $\ten{A}_{i_1,\ldots,i_r} \in \mathbb{R}$ is its $(i_1, \ldots, i_r)$th entry. For example, a $p \times q$ matrix $\mat{B}$ has two modes, rows and columns. For matrices $\mat{B}_k\in\mathbb{R}^{p_k\times q_k}$, $k\in[r] = \{1, 2, \ldots, r\}$, the \emph{multilinear multiplication}, or \emph{Tucker operator} \parencite{Kolda2006}, is defined elementwise as


\begin{displaymath}


(\ten{A}\times\{\mat{B}_1, \ldots, \mat{B}_r\})_{j_1, \ldots, j_r} = \sum_{i_1, \ldots, i_r = 1}^{q_1, \ldots, q_r} \ten{A}_{i_1, \ldots, i_r}(\mat{B}_{1})_{j_1, i_1} \cdots (\mat{B}_{r})_{j_r, i_r}


\end{displaymath}



which results in an order $r$ tensor of dimension $p_1\times \ldots\times p_r$. The \emph{$k$-mode product} $\ten{A}\times_k\mat{B}_k$ is the special case


\begin{displaymath}


\ten{A}\times_k\mat{B}_k = \ten{A}\times\{\mat{I}_{q_1}, \ldots, \mat{I}_{q_{k1}}, \mat{B}_{k}, \mat{I}_{q_{k+1}}, \ldots, \mat{I}_{q_r}\}.


\end{displaymath}




The notation $\ten{A}\mlm_{k\in S}\mat{B}_k$ is shorthand for the iterative application of the mode product for all indices in $S\subseteq[r]$. For example, $\ten{A}\mlm_{k\in\{2, 5\}}\mat{B}_k = \ten{A}\times_2\mat{B}_2\times_5\mat{B}_5$. By only allowing $S$ to be a set, this notation is unambiguous because the mode product commutes for different modes; i.e., $\ten{A}\times_j\mat{B}_j\times_k\mat{B}_k = \ten{A}\times_k\mat{B}_k\times_j\mat{B}_j$ for $j\neq k$. For example, let $\mat{A}, \mat{B}_1, \mat{B}_2$ be matrices (of matching dimensions). The bilinear mode products and multilinear multiplication relate to the well-known matrix-matrix multiplications as


\begin{displaymath}


\mat{A}\times_1\mat{B}_1 = \mat{B}_1\mat{A}, \qquad


\mat{A}\times_2\mat{B}_2 = \mat{A}\t{\mat{B}_2}, \qquad


\mat{A}\mlm_{k = 1}^2\mat{B}_k = \mat{A}\mlm_{k \in \{1, 2\}}\mat{B}_k = \mat{B}_1\mat{A}\t{\mat{B}_2}.


\end{displaymath}
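As a small numerical illustration of these identities (added for the reader's convenience; the matrices are arbitrary):

```latex
\begin{displaymath}
  \mat{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
  \mat{B}_1 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad
  \mat{B}_2 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
  \quad\Longrightarrow\quad
  \mat{A}\mlm_{k = 1}^2\mat{B}_k
  = \mat{B}_1\mat{A}\t{\mat{B}_2}
  = \begin{pmatrix} 2 & 2 \\ 8 & 6 \end{pmatrix}.
\end{displaymath}
```

Here $\mat{B}_1\mat{A}$ equals $\left(\begin{smallmatrix} 1 & 2 \\ 4 & 6 \end{smallmatrix}\right)$, and right-multiplying by $\t{\mat{B}_2}$ scales the first column by $2$.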




%Matrices and tensors can be \emph{vectorized} by the \emph{vectorization} operator $\vec$.


The operator $\vec$ maps an array to a vector. Specifically, $\vec(\mat{B})$ stands for the $pq \times 1$ vector of the $p \times q$ matrix $\mat{B}$ resulting from stacking the columns of $\mat{B}$ one after the other. For a tensor $\ten{A}$ of order $r$ and dimensions $q_1, \ldots, q_r$, $\vec(\ten{A})$ is the $q_1 q_2 \cdots q_r \times 1$ vector with the elements of $\ten{A}$ stacked one after the other in the order $r$, then $r - 1$, and so on. For example, if $\ten{A}$ is a 3-dimensional array, $\vec(\ten{A})=\t{(\t{\vec(\ten{A}_{:,:,1})},\t{\vec(\ten{A}_{:,:,2})},\ldots,\t{\vec(\ten{A}_{:,:,q_3})})}$. We use the notation $\ten{A}\equiv \ten{B}$ for objects $\ten{A}, \ten{B}$ of any shape if and only if $\vec(\ten{A}) = \vec(\ten{B})$.
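A small example of the $\vec$ operator (the entries are arbitrary):

```latex
\begin{displaymath}
  \mat{B} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}
  \quad\Longrightarrow\quad
  \vec(\mat{B}) = \t{(1, 2, 3, 4)},
\end{displaymath}
% the two columns (1, 2) and (3, 4) stacked one after the other
```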




The \emph{inner product} between two tensors of the same order and dimensions is


\begin{displaymath}


\langle\ten{A}, \ten{B}\rangle = \sum_{i_1, \ldots, i_r} \ten{A}_{i_1, \ldots, i_r}\ten{B}_{i_1, \ldots, i_r}.


\end{displaymath}




This leads to the definition of the \emph{Frobenius norm} for tensors, $\|\ten{A}\|_F = \sqrt{\langle\ten{A}, \ten{A}\rangle}$, which is the straightforward extension of the Frobenius norm for matrices and vectors. The \emph{outer product} between two tensors $\ten{A}$ of dimensions $q_1, \ldots, q_r$ and $\ten{B}$ of dimensions $p_1, \ldots, p_l$ is a tensor $\ten{A}\circ\ten{B}$ of order $r + l$ and dimensions $q_1, \ldots, q_r, p_1, \ldots, p_l$, such that


\begin{displaymath}


\ten{A}\circ\ten{B} \equiv (\vec\ten{A})\t{(\vec{\ten{B}})}.


\end{displaymath}
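For instance, for two vectors (order $1$ tensors; an illustrative example not used later), the outer product reduces to the familiar rank-one matrix:

```latex
\begin{displaymath}
  \mat{a} = \t{(1, 2)}, \quad \mat{b} = \t{(3, 4, 5)}
  \quad\Longrightarrow\quad
  \mat{a}\circ\mat{b} \equiv (\vec\mat{a})\t{(\vec{\mat{b}})}
  = \begin{pmatrix} 3 & 4 & 5 \\ 6 & 8 & 10 \end{pmatrix},
\end{displaymath}
% an order 1 + 1 = 2 tensor of dimensions 2 x 3
```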




Let $\ten{K} : \mathbb{R}^{q_1, ..., q_{2 r}}\to\mathbb{R}^{q_1 q_{r + 1}, ..., q_r q_{2 r}}$ be defined elementwise with indices $1\leq i_j + 1\leq q_j q_{r + j}$ for $j = 1, ..., r$ as


\begin{align*}


\ten{K}(\ten{A})_{i_1 + 1, ..., i_r + 1} &= \ten{A}_{\lfloor i_1 / q_{r + 1}\rfloor + 1, ..., \lfloor i_r / q_{2 r} \rfloor + 1, (i_1\operatorname{mod}q_{r + 1}) + 1, ..., (i_r\operatorname{mod}q_{2 r}) + 1}


\end{align*}





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\section{Problem Formulation}\label{sec:problemformulation}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




Our goal is to infer the cumulative distribution function (cdf) $F$ of $Y\mid \ten{X}$, where $\ten{X}$ is assumed to admit an $r$-tensor structure of dimension $p_1\times \ldots \times p_r$ with continuous or discrete entries, and the response $Y$ is unconstrained. The predictor $\ten{X}$ is a complex object; to simplify the problem, we assume there exists a tensor-valued function of lower dimension $\ten{R}:\ten{X}\mapsto \ten{R}(\ten{X})$ such that


\begin{displaymath}


F(Y\mid \ten{X}) = F(Y\mid \ten{R}(\ten{X})).


\end{displaymath}



@ 395,19 +379,19 @@ To find such a reduction $\ten{R}$, we leverage the equivalence relation pointed


\end{equation}


According to \eqref{eq:inverseregressionsdr}, a \textit{sufficient statistic} $\ten{R}(\ten{X})$ for $Y$ in the inverse regression $\ten{X}\mid Y$, where $Y$ is considered as a parameter indexing the model, is also a \textit{sufficient reduction} for $\ten{X}$ in the forward regression $Y\mid\ten{X}$. %The equivalent inverse regression in \eqref{eq:inverseregressionsdr} provides exhaustive characterization of $\ten{R}(\ten{X})$.






The factorization theorem is the usual tool to identify sufficient statistics and requires a distributional model. In this paper, we assume the distribution of $\ten{X}\mid Y$ belongs to the \emph{quadratic exponential family} in order to (a) simplify modeling and (b) keep estimation feasible. We assume that $\ten{X}\mid Y$ is a full rank quadratic exponential family with density


\begin{align}


f_{\mat{\eta}_y}(\ten{X}\mid Y = y)


&= h(\ten{X})\exp(\t{\mat{\eta}_y}\mat{t}(\ten{X}) - b(\mat{\eta}_y)) \nonumber \\


&= h(\ten{X})\exp(\langle \mat{t}_1(\ten{X}), \mat{\eta}_{1y} \rangle + \langle \mat{t}_2(\ten{X}), \mat{\eta}_{2y} \rangle - b(\mat{\eta}_{y})) \label{eq:quaddensity}


\end{align}




where $\mat{t}_1(\ten{X})=\vec \ten{X}$ and $\mat{t}_2(\ten{X})$ is linear in $\ten{X}\circ\ten{X}$. The dependence of $\ten{X}$ on $Y$ is fully captured in the natural parameter $\mat{\eta}_y$. The function $h$ is nonnegative realvalued and $b$ is assumed to be at least twice continuously differentiable and strictly convex. An important feature of the \emph{quadratic exponential family} is that the distribution of its members is fully characterized by their first two moments. Distributions within the quadratic exponential family include the \emph{multilinear normal} (\cref{sec:tensornormalestimation}) and \emph{multilinear Ising model} (\cref{sec:ising_estimation}, a generalization of the (inverse) Ising model which is a multivariate Bernoulli with up to second order interactions) and mixtures of these two.




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\section{The Generalized Multi-Linear Model}\label{sec:gmlmmodel}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




In model \eqref{eq:quaddensity}, the dependence of $\ten{X}$ and $Y$ is absorbed in $\mat{\eta}_y$, and $\mat{t}(\ten{X})$ is the minimal sufficient statistic for the \textit{pseudo}-parameter\footnote{$\mat{\eta}_y$ is a function of the response $Y$, so it is not a parameter in the formal statistical sense. We view it as a parameter in order to leverage \eqref{eq:inverseregressionsdr} and derive the sufficient reduction from the inverse regression.} $\mat{\eta}_y = (\mat{\eta}_{1y}, \mat{\eta}_{2y})$ with


\begin{align}\label{eq:tstat}


\mat{t}(\ten{X}) &= (\mat{t}_1(\ten{X}),\mat{t}_2(\ten{X}))=(\vec{\ten{X}}, \mat{T}_2\vech((\vec\ten{X})\t{(\vec\ten{X})})),


\end{align}
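As a concrete numerical illustration (ours, not part of the paper), the statistic in \eqref{eq:tstat} can be sketched in Python/NumPy for the simplest case where $\mat{T}_2$ is the identity, so that $\mat{t}_2(\ten{X}) = \vech((\vec\ten{X})\t{(\vec\ten{X})})$; the helper names below are hypothetical.

```python
import numpy as np

def vech(A):
    # Half-vectorization: stack the columns of the lower triangle of A.
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

def t_statistic(X):
    # Sufficient statistic t(X) = (t1, t2) with T2 = identity:
    # t1 = vec(X), t2 = vech(vec(X) vec(X)^T).
    x = X.flatten(order="F")          # column-major vec
    return x, vech(np.outer(x, x))

X = np.arange(6.0).reshape(2, 3)      # a 2 x 3 predictor (r = 2)
t1, t2 = t_statistic(X)               # t1 has length 6, t2 has length 6*7/2 = 21
```

The dimension of $\mat{t}_2(\ten{X})$ is $p(p+1)/2$ with $p = \prod_j p_j$, matching the size of the half-vectorization.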





\begin{equation}\label{eq:quadraticexpfam}


f_{\mat{\eta}_y}(\ten{X}\mid Y = y) = h(\ten{X})\exp\left(\langle \vec \ten{X}, \mat{\eta}_{1y} \rangle + \langle \vec(\ten{X}\circ\ten{X}), \t{(\mat{T}_2\pinv{\mat{D}_p})}\mat{\eta}_{2y} \rangle - b(\mat{\eta}_y)\right)


\end{equation}


The exponential family in \eqref{eq:quadraticexpfam} is easily generalizable to interactions of any order. This, though, would result in the number of parameters becoming prohibitive to estimate, which is also the reason why we opted for the second-order exponential family in our formulation.




By the equivalence in \eqref{eq:inverseregressionsdr}, in order to find the sufficient reduction $\ten{R}(\ten{X})$ we need to infer $\mat{\eta}_{1y}$ and $\mat{\eta}_{2y}$. This is reminiscent of generalized linear modeling, which we extend to a multilinear formulation next.


Suppose $\ten{F}_y$ is a known mapping of $y$ with zero expectation $\E_Y\ten{F}_Y = 0$. We assume the dependence of $\ten{X}$ and $Y$ is reflected only in the first parameter and let





\mat{\eta}_{1y} &= \vec{\overline{\ten{\eta}}} + \mat{B}\vec\ten{F}_y, \label{eq:eta1manifold} \\


\mat{\eta}_{2} &= \t{(\pinv{(\mat{T}_2\pinv{\mat{D}_p})})}\vec(c\,\mat{\Omega}), \label{eq:eta2manifold}


\end{align}


where $\overline{\ten{\eta}}\in\mathbb{R}^{p_1\times\ldots\times p_r}$, $\mat{\Omega} \in \mathbb{R}^{p \times p}$ is positive definite with $p = \prod_{j = 1}^{r} p_j$, and $c\in\mathbb{R}$ is a known constant determined by the distribution to ease modeling. That is, we assume that only $\mat{\eta}_{1y}$ depends on $Y$, through $\mat{B}$. The second parameter $\mat{\eta}_2$ captures the second-order interaction structure of $\ten{X}$, which we assume does not depend on the response $Y$. To relate individual modes of $\ten{X}$ to the response, allowing flexibility in modeling, we assume $\ten{F}_y$ takes values in $\mathbb{R}^{q_1\times\ldots\times q_r}$; that is, $\ten{F}_y$ is a tensor-valued independent variable. This, in turn, leads to imposing a corresponding tensor structure on the regression parameter $\mat{B}$. Thus, \eqref{eq:eta1manifold} becomes


\begin{align}


\mat{\eta}_{1y} &=


\vec\biggl(\overline{\ten{\eta}} + \ten{F}_y\mlm_{j = 1}^{r}\mat{\beta}_j\biggr), \label{eq:eta1}





\begin{align*}


f_{\theta}(\mat{x}\mid Y = y)


&= h(\mat{x})\exp(\langle\mat{x}, \mat{\eta}_{1y}(\mat{\theta})\rangle + \langle\vec(\mat{x}\circ\mat{x}), \mat{\eta}_2(\mat{\theta})\rangle - b(\mat{\eta}_y(\mat{\theta}))) \\


% &= h(\mat{x})\exp(\t{\mat{\eta}_{1y}(\theta)}\mat{x} + \t{\vec(\mat{x}\circ\mat{x})}\mat{\eta}_2(\mat{\theta})  b(\mat{\eta}_y(\mat{\theta}))) \\


&= h(\mat{x})\exp(\t{(\overline{\mat{\eta}} + \mat{\beta}\mat{f}_y)}\mat{x} + c\,\t{\mat{x}}\mat{\Omega}\,\mat{x} - b(\mat{\eta}_y(\mat{\theta}))).


\end{align*}


Here we used the relation of $\mat{\theta}$ to the natural parameters given by $\mat{\eta}_{1y}(\mat{\theta}) = \overline{\mat{\eta}} + \mat{\beta}\mat{f}_y$ and $\mat{\eta}_2(\mat{\theta}) = c\,\mat{\Omega}$.
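The step between the two displayed forms rests on the identity $\langle\vec(\mat{x}\circ\mat{x}), \vec(c\,\mat{\Omega})\rangle = c\,\t{\mat{x}}\mat{\Omega}\,\mat{x}$, which can be checked numerically; the following Python/NumPy sketch is ours (the choice $c = -1/2$, as for a normal model, is an assumption for illustration).

```python
import numpy as np

# Verify <vec(x o x), vec(c * Omega)> == c * x^T Omega x,
# where x o x is the outer product of x with itself.
rng = np.random.default_rng(1)
p, c = 4, -0.5
x = rng.standard_normal(p)
A = rng.standard_normal((p, p))
Omega = A @ A.T + p * np.eye(p)        # positive definite by construction
lhs = np.outer(x, x).flatten(order="F") @ (c * Omega).flatten(order="F")
rhs = c * (x @ Omega @ x)              # the quadratic-form side
```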


% where the number of unknown parameters is $p + \dim(\StiefelNonCompact{p}{q}) + \dim(\SymPosDefMat{p}) = p\frac{p + 2 q + 3}{2}$.


\end{example}




\begin{example}[Matrix valued $\mat{X}$ ($r = 2$)]





\end{equation}


with $\hat{\mat{\theta}}_n = (\vec\widehat{\overline{\ten{\eta}}}, \vec\widehat{\mat{B}}, \vech\widehat{\mat{\Omega}})$ where $\widehat{\mat{B}} = \bigkron_{k = r}^{1}\widehat{\mat{\beta}}_k$ and $\widehat{\mat{\Omega}} = \bigkron_{k = r}^{1}\widehat{\mat{\Omega}}_k$.
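A minimal NumPy sketch (ours; the helper name \texttt{bigkron} is hypothetical) of how the Kronecker-structured estimators are assembled from the factor matrices, together with a check of the underlying identity $\vec(\mat{\beta}_1\mat{F}\t{\mat{\beta}_2}) = (\mat{\beta}_2\otimes\mat{\beta}_1)\vec\mat{F}$ for $r = 2$:

```python
import numpy as np
from functools import reduce

def bigkron(mats):
    # Kronecker product over the factors in reverse order,
    # beta_r (x) ... (x) beta_1, matching B-hat and Omega-hat above.
    return reduce(np.kron, reversed(mats))

rng = np.random.default_rng(0)
p, q = (2, 3), (4, 5)
betas = [rng.standard_normal((p[k], q[k])) for k in range(2)]
B = bigkron(betas)                     # shape (p1*p2, q1*q2) = (6, 20)

# Consistency with the multilinear form of eta_1y for r = 2:
# vec(beta_1 F beta_2^T) = (beta_2 (x) beta_1) vec(F).
F = rng.standard_normal(q)
lhs = (betas[0] @ F @ betas[1].T).flatten(order="F")
rhs = B @ F.flatten(order="F")
```

Only the small factors $\widehat{\mat{\beta}}_k$, $\widehat{\mat{\Omega}}_k$ need to be stored; the full Kronecker products are formed here purely for the check.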




A straightforward and general method for parameter estimation is \emph{gradient descent}. To apply gradient-based optimization, we compute the gradients of $l_n$ in \cref{thm:grad}.




\begin{theorem}\label{thm:grad}


For $n$ i.i.d.\ observations $(\ten{X}_i, y_i)$, $i = 1, \ldots, n$, the log-likelihood is of the form in \eqref{eq:loglikelihood} with $\mat{\theta}$ being the collection of all GMLM parameters $\overline{\ten{\eta}}$, ${\mat{B}} = \bigkron_{k = r}^{1}{\mat{\beta}}_k$ and ${\mat{\Omega}} = \bigkron_{k = r}^{1}{\mat{\Omega}}_k$. Let $\ten{G}_2(\mat{\eta}_y)$ be a tensor of dimensions $p_1, \ldots, p_r$ such that





If $\mat{T}_2$ is the identity matrix $\mat{I}_{p(p + 1) / 2}$, then $\ten{G}_2(\mat{\eta}_y) = \ten{g}_2(\mat{\eta}_y)$.


\end{theorem}




Although the general case of any GMLM model can be fitted via gradient descent using \cref{thm:grad}, this may be very inefficient. In \cref{thm:grad}, $\mat{T}_2$ can be used to introduce flexible second-moment structures. For example, it allows modeling effects differently for predictor components, as described in \cref{sec:ising_estimation} after \eqref{eq:isingcondprob}. In the remainder, we focus on $\mat{T}_2$'s that are identity matrices. This simplifies the estimation algorithm and speeds up the numerical calculations in the case of multilinear normal predictors. An iterative cyclic updating scheme is derived in \cref{sec:tensornormalestimation}; it converges much faster, is stable, and does not require hyperparameters, as will be discussed later. The Ising model, on the other hand, does not admit such a scheme, and we resort to a gradient-based method, which is the subject of \cref{sec:ising_estimation}.
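As a toy illustration of gradient-based likelihood maximization of the kind referred to here (a sketch under our own assumptions, using a univariate normal mean with unit variance as a stand-in objective, not the GMLM log-likelihood):

```python
import numpy as np

# Gradient ascent on the mean log-likelihood of N(mu, 1):
# grad = mean(x) - mu, so the fixed point is the sample mean (the MLE).
rng = np.random.default_rng(2)
x = rng.standard_normal(100) + 3.0     # simulated data, true mean 3
mu, step = 0.0, 0.1
for _ in range(200):
    grad = np.mean(x - mu)             # d/dmu of the mean log-likelihood
    mu += step * grad                  # ascent step
```

The GMLM case replaces this scalar update with updates over all factor parameters using the gradients of \cref{thm:grad}; the contraction toward the MLE is the same mechanism.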




%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\subsection{Multi-Linear Normal}\label{sec:tensornormalestimation}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The \emph{multilinear normal} is the extension of the matrix normal to tensor-valued random variables and a member of the quadratic exponential family \eqref{eq:quadraticexpfam} under \eqref{eq:eta2}. \textcite{Dawid1981,Arnold1981} introduced the term matrix normal and, in particular, \textcite{Arnold1981} provided several theoretical results, such as its density, moments, and the conditional distributions of its components. The matrix normal distribution is a bilinear normal distribution; a distribution of a two-way array, each component representing a vector of observations \parencite{OhlsonEtAl2013}. \textcite{KolloVonRosen2005,Hoff2011,OhlsonEtAl2013} presented the extension of the bilinear to the multilinear normal distribution, using a parallel extension of bilinear matrices to multilinear tensors \parencite{Comon2009}.




The defining feature of the matrix normal distribution, and its multilinear extension, is the Kronecker product structure of its covariance. This formulation, where the covariates are multivariate normal with multiway covariance structure modeled as a Kronecker product of matrices of much lower dimension, aims to overcome the significant modeling and computational challenges arising from the high computational complexity of manipulating tensor representations \parencite[see, e.g.,][]{HillarLim2013,WangEtAl2022}.




Multilinear normal %Kronecker-separable covariance
models have been used in various applications, including
medical imaging \parencite{BasserPajevic2007,DrydenEtAl2009