An Alternative Representation of $R^2$

#Linear Models #Correlation

Problem Statement

This question was originally posted on CrossValidated. In more compact notation, it asks:

Given a linear model $$ \begin{align*} Y := \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} e & x_1 & x_2 & \cdots & x_p \end{bmatrix}\beta + \varepsilon, \end{align*} $$ where $e \in \mathbb{R}^n$ is the all-ones column carrying the intercept, show that the well-known coefficient of determination has the following representation: $$ \begin{align*} R^2 = r_{YX}^\top r_{XX}^{-1}r_{YX}, \end{align*} $$ where $$ \begin{align*} r_{YX} := \begin{bmatrix} r_{Y, x_1} \\ \vdots \\ r_{Y, x_p} \end{bmatrix}, \quad r_{XX} := \begin{bmatrix} r_{x_i, x_j} \end{bmatrix}_{i, j = 1}^p, \end{align*} $$ with $r_{Y, x_j}$ and $r_{x_i, x_j}$ denoting sample Pearson correlations.

In other words, $R^2$ is a function of pair-wise Pearson correlations, which is surprisingly elegant.
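
For readers who want to see the identity in action before the proof, here is a minimal numerical sanity check (not part of the original question); the simulated data, the sample sizes and the variable names are all arbitrary choices of mine:

```python
# Sanity check: classical R^2 from an OLS fit vs. r_YX' r_XX^{-1} r_YX.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.5 + X @ rng.normal(size=p) + rng.normal(size=n)

# Classical R^2 from an OLS fit with intercept.
A = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - A @ beta_hat
r2_classical = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# The correlation-based representation.
corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)
r_yx = corr[1:, 0]          # correlations between y and each predictor
r_xx = corr[1:, 1:]         # pairwise correlations among predictors
r2_corr = r_yx @ np.linalg.solve(r_xx, r_yx)

print(np.isclose(r2_classical, r2_corr))   # True
```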

The Proof

To begin with, rewrite the linear model as $$ \begin{align*} Y := \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} e & x_1 & x_2 & \cdots & x_p \end{bmatrix}\beta + \varepsilon := \begin{bmatrix} e & \tilde{X}\end{bmatrix}\begin{bmatrix} \beta_0 \\ \gamma \end{bmatrix} + \varepsilon, \tag{1} \end{align*} $$ where $x_j = \begin{bmatrix} x_{1j} & \cdots & x_{nj}\end{bmatrix}^\top$ for $j = 1, \ldots, p$, $\beta_0 \in \mathbb{R}$ is the intercept corresponding to the all-ones column $e$, and $\gamma \in \mathbb{R}^p$ is the coefficient vector corresponding to $\tilde{X} = \begin{bmatrix} x_1 & \cdots & x_p\end{bmatrix} \in \mathbb{R}^{n \times p}$.

The proposed form of $R^2$ inspires me to consider the following representation of a regression model (known as the standardized multiple regression model; see Applied Linear Statistical Models by Kutner et al., Section 7.5) – this is how I managed to bring all the Pearson correlations to the party: $$ \begin{align*} Y^* = X^*\beta^* + \varepsilon^*, \tag{2} \end{align*} $$

where $$ \begin{align*} Y^* = \begin{bmatrix} \frac{y_1 - \bar{y}}{\sqrt{n - 1}s_Y} \\ \vdots \\ \frac{y_n - \bar{y}}{\sqrt{n - 1}s_Y} \end{bmatrix} = (\sqrt{n - 1}s_Y)^{-1}(I - n^{-1}ee^\top)Y \in \mathbb{R}^{n \times 1}, \tag{3.1} \end{align*} $$

$$ \begin{align*} X^* = \begin{bmatrix} \frac{x_1 - \bar{x}_1e}{\sqrt{n - 1}s_1} & \cdots & \frac{x_p - \bar{x}_pe}{\sqrt{n - 1}s_p} \end{bmatrix} = (I - n^{-1}ee^\top)\tilde{X}(\sqrt{n - 1}\Lambda)^{-1} \in \mathbb{R}^{n \times p}, \tag{3.2} \end{align*} $$

$$ \begin{align*} \bar{y} = \frac{1}{n}\sum_{i = 1}^ny_i, \quad s_Y^2 = \frac{1}{n - 1}\sum_{i = 1}^n(y_i - \bar{y})^2, \tag{3.3} \end{align*} $$

$$ \begin{align*} \bar{x}_j = n^{-1}\sum_{i = 1}^n x_{ij}, \quad s_j^2 = \frac{1}{n - 1}\sum_{i = 1}^n(x_{ij} - \bar{x}_j)^2, \quad j = 1, \ldots, p, \tag{3.4} \end{align*} $$

$$ \begin{align*} \Lambda = \operatorname{diag}(s_1, \ldots, s_p) \in \mathbb{R}^{p \times p}. \tag{3.5} \end{align*} $$

In words, $Y^*$ and $X^*$ are simply standardized versions of the original inputs $Y$ and $\tilde{X}$.

With the above notation, on the one hand, it is easy to verify that $$ \begin{align*} r_{YX} := \begin{bmatrix} r_{Y, x_1} \\ \vdots \\ r_{Y, x_p} \end{bmatrix} = X^{*\top}Y^*, \quad r_{XX} := \begin{bmatrix} r_{x_i, x_j} \end{bmatrix} = X^{*\top}X^*, \tag{4} \end{align*} $$ hence the OLS estimate of $\beta^*$ in $(2)$ is $$ \begin{align*} \hat{\beta^*} = (X^{*\top}X^*)^{-1}X^{*\top}Y^* = r_{XX}^{-1}r_{YX}. \end{align*} $$
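
As a quick empirical check of $(3.1)$, $(3.2)$ and $(4)$, the sketch below builds $Y^*$ and $X^*$ exactly as defined above and compares the cross-products against `np.corrcoef`; the simulated data are again just placeholders:

```python
# The correlation transformation turns cross-products of standardized
# variables into Pearson correlations, as claimed in (4).
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

M = np.eye(n) - np.ones((n, n)) / n                   # centering matrix I - ee'/n
s_y = y.std(ddof=1)
Lam = np.diag(X.std(axis=0, ddof=1))                  # Lambda = diag(s_1, ..., s_p)

y_star = M @ y / (np.sqrt(n - 1) * s_y)               # (3.1)
X_star = M @ X @ np.linalg.inv(np.sqrt(n - 1) * Lam)  # (3.2)

corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)
print(np.allclose(X_star.T @ y_star, corr[1:, 0]))    # r_YX, as in (4)
print(np.allclose(X_star.T @ X_star, corr[1:, 1:]))   # r_XX, as in (4)
```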

On the other hand, one can express the OLS estimate of $\beta$ in $(1)$ in terms of the OLS estimate of $\beta^*$ in $(2)$ as follows (this is easy to verify by substituting the transformation definitions $(3.1)$ – $(3.5)$ into $(2)$ and comparing with $(1)$; see also the aforementioned reference, or the More Details section below, for a full derivation): $$ \begin{align*} & \hat{\gamma} = s_Y\Lambda^{-1}\hat{\beta^*} = s_Y\Lambda^{-1}r_{XX}^{-1}r_{YX}, \tag{5.1} \\ & \hat{\beta}_0 = \bar{y} - \begin{bmatrix}\bar{x}_1 & \cdots & \bar{x}_p \end{bmatrix}\hat{\gamma} = n^{-1}e^\top Y - n^{-1}e^\top \tilde{X}\hat{\gamma}. \tag{5.2} \end{align*} $$
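
Here is a small sketch checking $(5.1)$ and $(5.2)$ numerically, by fitting both models $(1)$ and $(2)$ with plain least squares (again on made-up data):

```python
# Fit the original model (1) and the standardized model (2), then map
# beta*_hat back to (beta_0_hat, gamma_hat) via (5.1)-(5.2).
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 2.0 + X @ rng.normal(size=p) + rng.normal(size=n)

# OLS on the original model (1), with intercept.
A = np.column_stack([np.ones(n), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
beta0_hat, gamma_hat = coef[0], coef[1:]

# OLS on the standardized model (2); note it has no intercept.
M = np.eye(n) - np.ones((n, n)) / n
s_y, s = y.std(ddof=1), X.std(axis=0, ddof=1)
y_star = M @ y / (np.sqrt(n - 1) * s_y)
X_star = (M @ X) / (np.sqrt(n - 1) * s)
beta_star_hat = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

print(np.allclose(gamma_hat, s_y / s * beta_star_hat))                 # (5.1)
print(np.allclose(beta0_hat, y.mean() - X.mean(axis=0) @ gamma_hat))   # (5.2)
```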

It then follows from $(5.1)$, $(5.2)$, $(3.1)$ and $(3.2)$ that $$ \begin{align*} Y - \hat{Y} = Y - e\hat{\beta}_0 - \tilde{X}\hat{\gamma} = (I - n^{-1}ee^\top)Y - (I - n^{-1}ee^\top)\tilde{X}\hat{\gamma} = s_Y\sqrt{n - 1}Y^* - s_Y\sqrt{n - 1}X^*r_{XX}^{-1}r_{YX}, \end{align*} $$ whence by $(4)$ we conclude $$ \begin{align*} & (Y - \hat{Y})^\top(Y - \hat{Y}) \\ =& (n - 1)s_Y^2 Y^{*\top}Y^* - 2(n - 1)s_Y^2Y^{*\top}X^{*}r_{XX}^{-1}r_{YX} + (n - 1)s_Y^2r_{YX}^\top r_{XX}^{-1}X^{*\top}X^*r_{XX}^{-1}r_{YX} \\ =& (n - 1)s_Y^2(Y^{*\top}Y^* - r_{YX}^\top r_{XX}^{-1}r_{YX}). \tag{6} \end{align*} $$

It then follows from $(6)$ and the definition of $R^2$ (one minus the ratio of the residual sum of squares to the total sum of squares) that $$ \begin{align*} & R^2 = 1 - \frac{(Y - \hat{Y})^\top(Y - \hat{Y})}{Y^\top (I - n^{-1}ee^\top)Y} = 1 - \frac{(n - 1)s_Y^2(Y^{*\top}Y^* - r_{YX}^\top r_{XX}^{-1}r_{YX})}{(n - 1)s_Y^2 Y^{*\top}Y^*} = r_{YX}^\top r_{XX}^{-1}r_{YX}. \end{align*} $$ In the second equality we rewrote the denominator as $(n - 1)s_Y^2 Y^{*\top}Y^*$ using $(3.1)$ and the idempotency of $I - n^{-1}ee^\top$; in the last equality, we used the identity $Y^{*\top}Y^* = 1$ (which follows from $(3.1)$ and $(3.3)$). This completes the proof.
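
As one last sanity check, identity $(6)$ combined with $Y^{*\top}Y^* = 1$ says the residual sum of squares equals $(n - 1)s_Y^2(1 - r_{YX}^\top r_{XX}^{-1}r_{YX})$; the sketch below confirms this on simulated data:

```python
# Check identity (6): SSE of model (1) equals (n-1) s_Y^2 (1 - r_YX' r_XX^{-1} r_YX).
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 1.0 + X @ rng.normal(size=p) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
sse = resid @ resid

corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)
r_yx, r_xx = corr[1:, 0], corr[1:, 1:]
rhs = (n - 1) * y.var(ddof=1) * (1 - r_yx @ np.linalg.solve(r_xx, r_yx))

print(np.isclose(sse, rhs))   # True
```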

More Details

Below is a more detailed derivation of $(5.1)$ and $(5.2)$.

By model $(1)$, we have $$ \begin{align*} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} n & e^\top\tilde{X} \\ \tilde{X}^\top e & \tilde{X}^\top\tilde{X} \end{bmatrix}^{-1} \begin{bmatrix} e^\top Y \\ \tilde{X}^\top Y \end{bmatrix}. \tag{A.1} \end{align*} $$

Denote the matrix $\tilde{X}^\top(I - n^{-1}ee^\top)\tilde{X}$ by $C$, which by $(3.2)$ is equal to $(n - 1)\Lambda X^{*\top}X^*\Lambda$. It follows by the block matrix inversion formula that $$ \begin{align*} \begin{bmatrix} n & e^\top\tilde{X} \\ \tilde{X}^\top e & \tilde{X}^\top\tilde{X} \end{bmatrix}^{-1} = \begin{bmatrix} n^{-1} + n^{-2}e^\top\tilde{X}C^{-1}\tilde{X}^\top e & -n^{-1}e^\top\tilde{X}C^{-1} \\ -n^{-1}C^{-1}\tilde{X}^\top e & C^{-1} \end{bmatrix}. \tag{A.2} \end{align*} $$

Substituting $(A.2)$ into $(A.1)$ and using the symmetry and idempotency of the matrix $I - n^{-1}ee^\top$ (also plugging in definitions $(3.1)$ and $(3.2)$) then gives $$ \begin{align*} & \hat{\gamma} = -n^{-1}C^{-1}\tilde{X}^\top ee^\top Y + C^{-1}\tilde{X}^\top Y \\ =& C^{-1}\tilde{X}^\top(I - n^{-1}ee^\top)Y \\ =& C^{-1}((I - n^{-1}ee^\top)\tilde{X})^\top (I - n^{-1}ee^\top)Y \\ =& (n - 1)^{-1}\Lambda^{-1}(X^{*\top}X^*)^{-1}\Lambda^{-1}(\sqrt{n - 1}X^*\Lambda)^\top \sqrt{n - 1}s_YY^* \\ =& s_Y\Lambda^{-1}(X^{*\top}X^*)^{-1}X^{*\top}Y^* \\ =& s_Y\Lambda^{-1}\hat{\beta^*}. \\[1em] & \hat{\beta}_0 = n^{-1}e^\top Y + n^{-2}e^\top \tilde{X}C^{-1}\tilde{X}^\top ee^\top Y - n^{-1}e^\top\tilde{X}C^{-1}\tilde{X}^\top Y \\ =& n^{-1}e^\top Y - n^{-1}e^\top\tilde{X}(-n^{-1}C^{-1}\tilde{X}^\top ee^\top Y + C^{-1}\tilde{X}^\top Y) \\ =& n^{-1}e^\top Y - n^{-1}e^\top\tilde{X}\hat{\gamma}, \end{align*} $$ which are exactly $(5.1)$ and $(5.2)$.
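
For completeness, the block inverse $(A.2)$ can also be checked numerically; the sketch below compares it with a direct inverse of the bordered Gram matrix appearing in $(A.1)$ (simulated $\tilde{X}$, variable names mine):

```python
# Verify the block inversion formula (A.2) against a direct matrix inverse.
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
Xt = rng.normal(size=(n, p))          # plays the role of \tilde{X}
e = np.ones(n)

A = np.column_stack([e, Xt])
gram = A.T @ A                        # the bordered matrix inverted in (A.1)

C = Xt.T @ (np.eye(n) - np.outer(e, e) / n) @ Xt
Cinv = np.linalg.inv(C)

# Assemble the right-hand side of (A.2) block by block; the bottom-left block
# is the transpose of the top-right one because C^{-1} is symmetric.
top_left = 1 / n + (e @ Xt) @ Cinv @ (Xt.T @ e) / n**2
top_right = -(e @ Xt) @ Cinv / n
block_inv = np.block([[np.array([[top_left]]), top_right[None, :]],
                      [top_right[:, None],     Cinv]])

print(np.allclose(block_inv, np.linalg.inv(gram)))   # True
```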