4 Wishart Random Matrices

Definition. Wishart matrix

Let $x_1,\dots,x_n\in\mathbb{R}^p$ be independent centered Gaussian vectors with covariance matrix $\Sigma$, and denote by $X=(x_1,\dots,x_n)\in\mathbb{R}^{p\times n}$ the data matrix ($n$ observations of a $p$-dimensional parameter). The Wishart matrix $W=\frac{1}{n}XX^T$ is an estimator for $\Sigma$.
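As a quick numerical illustration (a minimal NumPy sketch, not from the lecture; the sizes $p$, $n$ are arbitrary choices), one can check that in the classical regime $n\gg p$ the estimator is very close to the true covariance $\Sigma=I_p$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 100_000   # p-dimensional parameter, n observations (classical regime n >> p)

X = rng.standard_normal((p, n))  # data matrix: columns x_1, ..., x_n are i.i.d. N(0, I_p)
W = X @ X.T / n                  # Wishart matrix / sample covariance, estimator of I_p

print("max entrywise deviation from I_p:", np.abs(W - np.eye(p)).max())
```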

4.1. Concentration for the largest eigenvalue of Wishart matrices

Remark
  • The largest singular value of the data matrix is a norm and hence satisfies a Lipschitz condition, so it concentrates around its expected value.
  • Next we still have to figure out what this expected value actually is.

We want to understand the spectral properties of $X$, i.e. the eigenvalues of $\frac1n XX^T$, or the singular values of $X$. Let us first restrict to the largest singular value:
$$\|X\|_{op}=\sigma_{\max}(X)=\sup_{\|u\|=\|v\|=1}\langle Xv,u\rangle.$$
It is Lipschitz: for $X,Y\in\mathbb{R}^{p\times n}$,
$$\big|\,\|X\|_{op}-\|Y\|_{op}\,\big|\le\|X-Y\|_{op}\le\|X-Y\|_{HS}.$$

Thus we can apply our Gaussian concentration inequality for Lipschitz functions. Note that $X\mapsto\|X\|_{op}$ corresponds to a Lipschitz function on $\mathbb{R}^{pn}$ with Lipschitz constant $L=1$.

Theorem. 4.1. Standard Gaussian data always concentrates about its center.

Suppose $X=(x_{ij})$ is a $p\times n$ standard Gaussian random matrix, i.e. $x_{ij}\sim N(0,1)$ independently. Then
$$P\big(\big|\,\|X\|_{op}-E[\|X\|_{op}]\,\big|\ge t\big)\le 2e^{-t^2/2}\qquad\text{for all }t\ge0.$$
Question What is the value of the center $E[\|X\|_{op}]$?
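Before answering, here is a quick Monte Carlo look at Theorem 4.1 (a sketch, not from the lecture; sizes and trial count are my choices): the standard deviation of $\|X\|_{op}$ stays of order $1$ while its mean grows with $p$ and $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
for p, n in [(50, 100), (200, 400), (400, 800)]:
    # np.linalg.norm(A, 2) is the largest singular value of A
    norms = [np.linalg.norm(rng.standard_normal((p, n)), 2) for _ in range(100)]
    # the mean grows, but the std stays O(1), as predicted by Theorem 4.1
    print(f"p={p:4d} n={n:4d}  mean={np.mean(norms):7.2f}  std={np.std(norms):.3f}")
```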

Step 1 (a simpler question)
Note that we can write
$$\|X\|_{op}=\sup_{u\in S^{p-1},\,v\in S^{n-1}}\langle Xv,u\rangle.$$
But there are infinitely many terms. Too hard to control! So we go to finitely many conditions by approximating all $u$ and $v$ by elements from $\varepsilon$-nets.
It's a bit easier to do this for balls than for spheres, so let us write
$$\|X\|_{op}=\sup_{u\in B_p,\,v\in B_n}\langle Xv,u\rangle,\qquad B_p=\{u\in\mathbb{R}^p:\|u\|\le1\},$$
and let now $N\subseteq B_p$ be an $\varepsilon$-net for $B_p$,
i.e. such that
$$\forall\,u\in B_p\ \exists\,u'\in N:\ \|u-u'\|\le\varepsilon.$$
Step 2
We want $|N|$ to be as small as possible; it is easy to see that there exists an $\varepsilon$-net with
$$|N|\le\Big(1+\frac2\varepsilon\Big)^p.$$
Construct an $\varepsilon$-net by successively choosing a new point $u_{k+1}$ such that $\|u_{k+1}-u_i\|>\varepsilon$ for all $i\le k$.
Thus all balls $u_i+\frac\varepsilon2 B_p$ are disjoint and contained in $\big(1+\frac\varepsilon2\big)B_p$, so comparing volumes,
$$|N|\cdot\Big(\frac\varepsilon2\Big)^p\le\Big(1+\frac\varepsilon2\Big)^p,\qquad\text{i.e.}\qquad|N|\le\Big(1+\frac2\varepsilon\Big)^p.$$
You will notice that, because of this volume constraint, $N$ can only be finite. This ensures that $N=\{u_1,u_2,\dots\}$ is indeed an $\varepsilon$-net: if there were a point $u\in B_p$ with $\|u-u_i\|>\varepsilon$ for all $i$, then this $u$ could be chosen as a new point in our construction. Hence we have constructed an $\varepsilon$-net of the required cardinality.
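The greedy construction from Step 2 is easy to simulate (a sketch, not from the lecture; working in dimension $p=2$ and replacing "all of $B_p$" by dense random candidates are my simplifications):

```python
import numpy as np

rng = np.random.default_rng(2)
p, eps = 2, 0.25

# dense random candidates standing in for "all of B_p" (an approximation)
cand = rng.uniform(-1, 1, size=(20_000, p))
cand = cand[np.linalg.norm(cand, axis=1) <= 1]

net = np.empty((0, p))
for u in cand:  # greedy: keep u only if it is > eps away from every kept point
    if net.shape[0] == 0 or np.linalg.norm(net - u, axis=1).min() > eps:
        net = np.vstack([net, u])

print("net size:", len(net), " vs volume bound (1 + 2/eps)^p =", (1 + 2 / eps) ** 2)
```

The greedy net indeed stays below the volume bound $(1+2/\varepsilon)^p=81$.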

Step 3
Set $\varepsilon=\frac14$; then $\big(1+\frac2\varepsilon\big)^p=9^p$.

Then we can choose an $\varepsilon$-net $N$ for $B_p$ and an $\varepsilon$-net $M$ for $B_n$ with $|N|\le9^p$ and $|M|\le9^n$.

Let $u^*$ and $v^*$ be the maximizers for $\|X\|_{op}=\max_{u\in B_p,\,v\in B_n}\langle Xv,u\rangle$ (note that by finite dimensions & compact unit balls the supremum is indeed a maximum);
then there exist $u\in N$ and $v\in M$ such that $\|u-u^*\|\le\varepsilon$ and $\|v-v^*\|\le\varepsilon$.
Then
$$\|X\|_{op}=\langle Xv^*,u^*\rangle=\langle Xv,u\rangle+\langle X(v^*-v),u^*\rangle+\langle Xv,u^*-u\rangle\le\langle Xv,u\rangle+2\varepsilon\|X\|_{op}.$$

Thus we have
$$\|X\|_{op}\le\frac1{1-2\varepsilon}\max_{u\in N,\,v\in M}\langle Xv,u\rangle=2\max_{u\in N,\,v\in M}\langle Xv,u\rangle.$$

Hence we need now the concentration inequality only for the finitely many summands on the right-hand side. Note that
$$\langle Xv,u\rangle=\sum_{i,j}x_{ij}v_ju_i$$
is a Gaussian random variable with variance $\|u\|^2\|v\|^2\le1$, thus
$$P(\langle Xv,u\rangle\ge t)\le e^{-t^2/2},$$
or now with $t$ replaced with $t/2$ for $\|X\|_{op}$,
$$P(\|X\|_{op}\ge t)\le P\Big(\max_{u\in N,\,v\in M}\langle Xv,u\rangle\ge\frac t2\Big).$$

This then yields, by the union bound over the $|N|\cdot|M|\le9^{p+n}$ pairs $(u,v)$,
$$P(\|X\|_{op}\ge t)\le9^{p+n}e^{-t^2/8}.$$

Put $t=C\sqrt{p+n}$, then $9^{p+n}e^{-t^2/8}=e^{-(C^2/8-\ln9)(p+n)}\to0$ for $C>\sqrt{8\ln9}\approx4.2$, and thus, with high probability,
$$\|X\|_{op}\le C\sqrt{p+n};$$

it is a rough estimate (the precise asymptotics, which we will not prove here, is $E[\|X\|_{op}]\approx\sqrt p+\sqrt n$).
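A numerical comparison of the two scales (a sketch, not from the lecture; dimensions and trial count are my choices) shows how generous the union-bound constant is:

```python
import numpy as np

rng = np.random.default_rng(3)
C = np.sqrt(8 * np.log(9))  # the constant from the union bound above, ~4.2
for p, n in [(100, 100), (100, 1000), (1000, 1000)]:
    mean = np.mean([np.linalg.norm(rng.standard_normal((p, n)), 2) for _ in range(20)])
    print(f"p={p:4d} n={n:4d}  E||X||~{mean:6.1f}  "
          f"sqrt(p)+sqrt(n)={np.sqrt(p) + np.sqrt(n):6.1f}  "
          f"C*sqrt(p+n)={C * np.sqrt(p + n):6.1f}")
```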

We will not pursue this any further; instead we will now look at the collection of all singular values of $X$, or of all eigenvalues of $\frac1n XX^T$; i.e., we want now to understand the asymptotics of the histograms of the eigenvalues.

4.2. Eigenvalue distribution of Wishart matrices & Marchenko-Pastur law

Set $W=\frac1n XX^T$, where the columns $x_1,\dots,x_n$ of $X$ are independent vectors with $E[x_i]=0$ & $E[x_ix_i^T]=I_p$; then $E[W]=I_p$.
What can we say about the eigenvalues of $W$?

So first let us try to reduce this to a Lipschitz condition.

Let $\mu_1(A)\ge\dots\ge\mu_p(A)$ be the eigenvalues of a symmetric matrix $A\in M_p(\mathbb{R})$.

Then one has, as for the maximal eigenvalue,
$$|\mu_k(A)-\mu_k(B)|\le\|A-B\|_{op}\le\|A-B\|_{HS}\qquad(k=1,\dots,p).$$

Thus, for a Lipschitz function $f:\mathbb{R}\to\mathbb{R}$ with Lipschitz constant $|f|_L$,
$$\Big|\frac1p\sum_{k=1}^pf(\mu_k(A))-\frac1p\sum_{k=1}^pf(\mu_k(B))\Big|\le|f|_L\,\|A-B\|_{HS},$$
i.e. the maps $A\mapsto\mu_k(A)$ for $k=1,\dots,p$ are Lipschitz and thus also the map
$$A\mapsto\frac1p\sum_{k=1}^pf(\mu_k(A))$$
is Lipschitz.

However, since $X$, and not $W=\frac1nXX^T$, is our matrix with independent Gaussian entries, we are interested in the mapping
$$X\mapsto\frac1p\sum_{k=1}^pf\Big(\mu_k\Big(\frac1nXX^T\Big)\Big).$$

For this, the Lipschitz constant is modified as follows:
$$\Big\|\frac1nXX^T-\frac1nYY^T\Big\|_{HS}\le\frac1n\big(\|X\|_{op}+\|Y\|_{op}\big)\,\|X-Y\|_{HS}.$$

Note that the estimate $\|X\|_{op}\le\|X\|_{HS}$ is not helpful, since we know that $E[\|X\|_{HS}^2]=pn$; hence $\|X\|_{op}\lesssim\sqrt{pn}$ is not a useful bound.

But let us have a closer look at this, as it also reveals the difference between the classical and the modern regime.

  • In the classical regime, $p$ fixed and $n\to\infty$, even the crude bound $\|X\|_{op}\le\|X\|_{HS}\approx\sqrt{pn}$ yields a Lipschitz constant of order $\sqrt{p/n}\to0$,
    which would give good concentration.
  • But in the modern regime, $p\sim n\to\infty$, this only yields a Lipschitz constant of order $\sqrt{p/n}\approx1$,
    which does not give good concentration.
  • So let's keep the operator norm of $X$ as in Section 4.1; for this we already know that we have good concentration around $E[\|X\|_{op}]\lesssim\sqrt{p+n}$ and thus with high probability
    $$\|X\|_{op}\le C\sqrt n\qquad\text{(in the modern regime }p\sim n\text{)}.$$
    By Theorem 3.2, this then gives concentration of $\frac1p\sum_kf\big(\mu_k\big(\frac1nXX^T\big)\big)$ around its expected value with fluctuations of order $1/\sqrt n$.
    This means in the modern regime the eigenvalue distribution of $\frac1nXX^T$ concentrates on its average (the scaling factor $\frac1n$ in $W=\frac1nXX^T$ ensures that we have a limit for $n\to\infty$), then we can get:

Theorem. 4.2 Marchenko-Pastur Law [1]

Let $X=(x_{ij})$ be a $p\times n$ matrix with independent $N(0,1)$ entries & $W=\frac1nXX^T$. If $p,n\to\infty$ such that $\frac pn\to\lambda\in(0,\infty)$ ($n$: number of observations, $p$: data dimension),
then the histogram of the eigenvalues of $W$ converges to the Marchenko-Pastur density
$$f_\lambda(x)=\frac1{2\pi\lambda x}\sqrt{(\lambda_+-x)(x-\lambda_-)}\qquad\text{for }\lambda_-\le x\le\lambda_+,$$
where $\lambda_\pm=(1\pm\sqrt\lambda)^2$ & $\lambda=\lim_{n\to\infty}\frac pn$. (For $\lambda>1$ there is an additional atom of mass $1-\frac1\lambda$ at $0$.)
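A standard numerical check of Theorem 4.2 (a sketch, not from the lecture; the sizes and the ratio $\lambda=1/2$ are my choices): one large Wishart matrix already matches the density well.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
p, n = 1000, 2000                      # lambda = p / n = 1/2
lam = p / n
X = rng.standard_normal((p, n))
mu = np.linalg.eigvalsh(X @ X.T / n)   # eigenvalues of W

lm, lp = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2
x = np.linspace(lm, lp, 400)
f = np.sqrt((lp - x) * (x - lm)) / (2 * np.pi * lam * x)  # MP density

plt.hist(mu, bins=60, density=True, alpha=0.5, label="eigenvalues of $W$")
plt.plot(x, f, "r", label="Marchenko-Pastur density")
plt.legend()
plt.show()
```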

Remark.

  • The normalization constant here is not obtained by brute-force normalization as for the Gamma distribution; it comes out naturally from the self-consistent equation.
  • Note that the statement is of the form
    $$\frac1p\,\#\{i:\mu_i\in[a,b]\}=\frac1p\sum_{i=1}^p\mathbb{1}_{[a,b]}(\mu_i)\ \xrightarrow{n\to\infty}\ \int_a^bf_\lambda(x)\,dx,\qquad(1)$$
    where $\mu_1,\dots,\mu_p$ are the eigenvalues of $W=\frac1nXX^T$.
    Proving (1) directly is not so clear, but it can be achieved by proving analogous statements for other classes of functions. Instead of proving (1) for
  • (i) all indicator functions $\mathbb{1}_{[a,b]}$ for all $a<b$ (looking directly at the fraction of eigenvalues falling in the interval $[a,b]$), one can prove it for
  • (ii) all moments $x\mapsto x^k$ for all $k\in\mathbb{N}$ (showing that all moments of the eigenvalues, i.e. the averages $\frac1p\sum_i\mu_i^k$, match; see the numerical sketch after this remark), or for
  • (iii) all resolvents $x\mapsto\frac1{x-z}$ for all $z\in\mathbb{C}^+$. ($\mathbb{C}^+$ denotes the complex upper half plane.) (Showing that the averages of the resolvent, $\frac1p\operatorname{Tr}[(W-z)^{-1}]$, match.)
    By concentration, it suffices to prove in each case the version for the average, i.e. one has to prove
    $$E\Big[\frac1p\sum_{i=1}^ph(\mu_i)\Big]\ \xrightarrow{n\to\infty}\ \int h(x)\,f_\lambda(x)\,dx.$$
    Note for this that $x\mapsto\frac1{x-z}$ and $x\mapsto x^k$ (if we restrict them to a compact interval) are Lipschitz functions.
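The moment version (ii) is easy to check numerically (a sketch, not from the lecture; the MP moments are computed by a plain Riemann sum, and all sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 1000, 2000
lam = p / n
X = rng.standard_normal((p, n))
W = X @ X.T / n

lm, lp = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2
x = np.linspace(lm, lp, 200_000)
f = np.sqrt((lp - x) * (x - lm)) / (2 * np.pi * lam * x)

for k in range(1, 5):
    emp = np.trace(np.linalg.matrix_power(W, k)) / p  # (1/p) Tr(W^k) = (1/p) sum mu_i^k
    mp = np.sum(x ** k * f) * (x[1] - x[0])           # Riemann sum for the MP moment
    print(f"k={k}  empirical={emp:.4f}  MP={mp:.4f}")
```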

Proof. (by self-consistent equation)
Step 1

Definition. 4.4 Stieltjes Transform
  • For Wishart matrices: $S_n(z)=\operatorname{tr}\big[(W-zI_p)^{-1}\big]$ for $z\in\mathbb{C}^+$.
  • For the Marchenko-Pastur distribution: $S(z)=\int\frac{f_\lambda(x)}{x-z}\,dx$ for $z\in\mathbb{C}^+$.

So what we have to prove is the convergence of the Stieltjes transforms:
$$\lim_{n\to\infty}E[S_n(z)]=S(z)\qquad\text{for all }z\in\mathbb{C}^+.$$
We first collect some preparations for this:

Lemma. 4.3

Let $A$ be a symmetric $p\times p$ matrix with eigenvalues $\mu_1,\dots,\mu_p$. Then, for any $z\in\mathbb{C}^+$, we have
$$\operatorname{tr}\big[(A-z)^{-1}\big]=\frac1p\sum_{i=1}^p\frac1{\mu_i-z},$$
where $\operatorname{tr}=\frac1p\operatorname{Tr}$ is the normalized trace on $M_p(\mathbb{C})$.

Lemma. 4.6 A very important trick

Let $x$ be a standard Gaussian vector in $\mathbb{R}^p$ and $A$ a deterministic or random $p\times p$ matrix independent from $x$. Then
$$E\big[\langle Ax,x\rangle\big]=E[\operatorname{Tr}(A)].$$
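A two-line Monte Carlo sanity check of Lemma 4.6 (a sketch with an arbitrary test matrix of my choosing):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 50
A = rng.standard_normal((p, p))             # a fixed (deterministic) test matrix
xs = rng.standard_normal((100_000, p))      # samples of a standard Gaussian vector
quad = np.einsum("ti,ij,tj->t", xs, A, xs)  # <Ax, x> for each sample
print("E[<Ax, x>] ~", quad.mean(), "   Tr(A) =", np.trace(A))
```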

Step 2
By resolvent identities (writing $W-z$ in terms of the columns $x_i$, removing one column via the Schur complement, and taking expectations with Lemma 4.6) we can derive an approximate equation for $E[S_n(z)]$.

In fact, by exchangeability the distinguished column $x_n$ can be replaced by $x_i$, where $i$ is arbitrary. Thus we have, in the limit $n\to\infty$, $\frac pn\to\lambda$, the self-consistent equation
$$S(z)=\frac1{1-\lambda-z-\lambda z\,S(z)}.$$

So left and right side become a quadratic equation:
$$\lambda z\,S(z)^2+(z+\lambda-1)\,S(z)+1=0.$$

Finally we can calculate
$$S(z)=\frac{1-\lambda-z+\sqrt{(z+\lambda-1)^2-4\lambda z}}{2\lambda z},$$
where the branch of the square root is fixed by requiring $\operatorname{Im}S(z)>0$ for $z\in\mathbb{C}^+$.

But this turns out to be a formula for the Stieltjes transform without the density itself, so we still need a way to calculate the density of the MP distribution from it:
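Before inverting, one can already compare the solution of the self-consistent equation with the empirical Stieltjes transform (a sketch, not from the lecture; the test points $z$ and the sizes are my choices, and the correct root is selected by $\operatorname{Im}S(z)>0$):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 1000, 2000
lam = p / n
X = rng.standard_normal((p, n))
W = X @ X.T / n

for z in [0.5 + 1.0j, 1.0 + 0.5j, 2.0 + 0.1j]:
    emp = np.trace(np.linalg.inv(W - z * np.eye(p))) / p  # tr[(W - z)^{-1}]
    r1, r2 = np.roots([lam * z, z + lam - 1, 1])          # self-consistent quadratic
    S = r1 if r1.imag > 0 else r2                         # Stieltjes transforms map C+ to C+
    print(f"z={z}  empirical={emp:.4f}  self-consistent={S:.4f}")
```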

Step 3

Lemma 4.7 (Stieltjes Inversion Formula).

Let $q$ be a continuous probability density on $\mathbb{R}$. Then its Stieltjes transform
$$S(z)=\int_\mathbb{R}\frac{q(x)}{x-z}\,dx\qquad(z\in\mathbb{C}^+)$$
has a continuous extension to $\mathbb{R}$ and
$$q(x)=\frac1\pi\lim_{\varepsilon\searrow0}\operatorname{Im}S(x+i\varepsilon).$$
Apply this to
$$S(z)=\frac{1-\lambda-z+\sqrt{(z+\lambda-1)^2-4\lambda z}}{2\lambda z};$$
since $(z+\lambda-1)^2-4\lambda z=(z-\lambda_-)(z-\lambda_+)$, letting $z=x+i\varepsilon\to x\in(\lambda_-,\lambda_+)$ gives $\frac1\pi\operatorname{Im}S(x)=\frac1{2\pi\lambda x}\sqrt{(\lambda_+-x)(x-\lambda_-)}$, and then we get the form of the Marchenko-Pastur distribution as claimed in Theorem 4.2. ∎
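The inversion step can also be checked numerically (a sketch, not from the lecture; the value of $\varepsilon$ and the test points are my choices): evaluating $\frac1\pi\operatorname{Im}S(x+i\varepsilon)$ for small $\varepsilon$ recovers the MP density.

```python
import numpy as np

lam, eps = 0.5, 1e-4
lm, lp = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2

def S(z):
    r1, r2 = np.roots([lam * z, z + lam - 1, 1])  # self-consistent quadratic
    return r1 if r1.imag > 0 else r2              # root with Im S > 0

for x in [0.5, 1.0, 1.5, 2.0]:
    inv = S(x + 1j * eps).imag / np.pi                        # Stieltjes inversion
    f = np.sqrt((lp - x) * (x - lm)) / (2 * np.pi * lam * x)  # closed-form MP density
    print(f"x={x:.1f}  inversion={inv:.4f}  density={f:.4f}")
```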


  1. V. A. Marchenko and L. A. Pastur, Distribution of eigenvalues for some sets of random matrices, Math. USSR-Sbornik 1 (1967), 457–483.