<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Math]]></title><description><![CDATA[Obsidian digital garden]]></description><link>http://github.com/dylang/node-rss</link><image><url>site-lib/media/favicon.jpg</url><title>Math</title><link></link></image><generator>Webpage HTML Export plugin for Obsidian</generator><lastBuildDate>Sat, 14 Mar 2026 06:39:28 GMT</lastBuildDate><atom:link href="site-lib/rss.xml" rel="self" type="application/rss+xml"/><pubDate>Sat, 14 Mar 2026 06:39:27 GMT</pubDate><ttl>60</ttl><dc:creator></dc:creator><item><title><![CDATA[4 Wishart Random Matrices]]></title><description><![CDATA[Definition. The Wishart matrix is $W = \frac{1}{n} X X^T$, where $X$ denotes the $p \times n$ data matrix ($n$ observations of a $p$-dimensional parameter). $W$ is an estimator for the covariance matrix.
<img src="attachments/4-wishart-random-matrices-1772624976366.webp" target="_self" style="width: 494px; max-width: 100%;">Remark. The largest singular value of the data matrix is a norm satisfying a Lipschitz condition, so it concentrates around its expected value. We then still have to figure out what this expected value actually is. We want to understand the spectral properties of $W$, i.e. the eigenvalues of $W$, or the singular values of $X$. Let us first restrict to the largest singular value $\|X\|$:
It is Lipschitz: for $X, Y \in \mathbb{R}^{p \times n}$, $\big|\,\|X\| - \|Y\|\,\big| \le \|X - Y\| \le \|X - Y\|_2$, where $\|\cdot\|_2$ denotes the Euclidean norm of the entries; so $X \mapsto \|X\|$ is 1-Lipschitz.
Thus we can apply our Gaussian concentration inequality for Lipschitz functions. Note that $X \in \mathbb{R}^{p \times n}$ corresponds to a standard Gaussian vector in $\mathbb{R}^{pn}$.Theorem. 4.1. (Standard data can always be centered.)
Suppose $X$ is a $p \times n$ standard Gaussian random matrix, i.e. the entries $x_{ij}$ are i.i.d. $N(0,1)$. Then $\|X\|$ concentrates around its expected value: for all $t > 0$, $P\big(\big|\,\|X\| - E[\|X\|]\,\big| \ge t\big) \le 2 e^{-t^2/2}$. Proof.<br>
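As a quick numerical illustration (an addition to these notes, not part of the original lecture), we can sample a few standard Gaussian matrices and check that the largest singular value lands close to $\sqrt{n} + \sqrt{p}$, the standard Gordon-type bound on its expectation:

```python
import numpy as np

# Empirically check that the largest singular value of a p x n standard
# Gaussian matrix concentrates near sqrt(n) + sqrt(p).
rng = np.random.default_rng(0)
n, p, trials = 400, 100, 20
norms = [np.linalg.norm(rng.standard_normal((p, n)), ord=2)  # ord=2: top singular value
         for _ in range(trials)]
mean_norm = float(np.mean(norms))
prediction = np.sqrt(n) + np.sqrt(p)   # = 30 here
print(mean_norm, prediction)
```

The sample-to-sample fluctuations are tiny compared to the mean, which is exactly the concentration phenomenon of Theorem 4.1.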
By <a data-tooltip-position="top" aria-label="3 Concentration of Gaussian random vectors for non-linear Lipschitz function &gt; ^fx0z71" data-href="3 Concentration of Gaussian random vectors for non-linear Lipschitz function#^fx0z71" href="随机矩阵与机器学习/3-concentration-of-gaussian-random-vectors-for-non-linear-lipschitz-function.html" class="internal-link" target="_self" rel="noopener nofollow">Theorem. 3.2 Gaussian concentration for Lipschitz functions</a>. Question: what is the central value? Step 1. A simpler question.
Note that we can write $\|X\| = \sup_{\|u\| = \|v\| = 1} \langle u, X v \rangle$.
But there are infinitely many terms. Too hard to control! So we reduce to finitely many conditions by approximating all $u$ and $v$ by elements from $\varepsilon$-nets.
It's a bit easier to do this for balls than for spheres, so let us write $\|X\| = \sup_{u \in B_p,\, v \in B_n} \langle u, X v \rangle$, where $B_k$ denotes the closed unit ball in $\mathbb{R}^k$,
and let now $N_\varepsilon$ be an $\varepsilon$-net for $B_k$,
i.e. such that for every $x \in B_k$ there is some $y \in N_\varepsilon$ with $\|x - y\| \le \varepsilon$.
Step 2
We want $|N_\varepsilon|$ to be as small as possible; it is easy to see that there exists an $\varepsilon$-net with $|N_\varepsilon| \le (1 + 2/\varepsilon)^k$.
Construct an $\varepsilon$-net greedily, by always choosing a new point $y_{m+1} \in B_k$ such that $\|y_{m+1} - y_i\| > \varepsilon$ for all $i \le m$.
Thus all the balls $y_i + \frac{\varepsilon}{2} B_k$ are disjoint and contained in $(1 + \frac{\varepsilon}{2}) B_k$, so comparing volumes gives $|N_\varepsilon| \cdot (\frac{\varepsilon}{2})^k \le (1 + \frac{\varepsilon}{2})^k$, i.e. $|N_\varepsilon| \le (1 + 2/\varepsilon)^k$.
Note that by this volume constraint the construction can only produce finitely many points. This guarantees that the resulting set is an $\varepsilon$-net: if there were a point $x \in B_k$ at distance greater than $\varepsilon$ from all chosen points, then this $x$ could have been chosen as a further point in our construction. Hence we have constructed an $\varepsilon$-net with the required cardinality.Step 3
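The greedy construction above is easy to carry out numerically; the following sketch (an added illustration, with variable names of my own choosing) builds an $\varepsilon$-separated set from random points on the sphere and checks it against the volume bound $(1 + 2/\varepsilon)^k$:

```python
import numpy as np

def greedy_net(points, eps):
    # Greedy construction: keep a point only if it is more than eps
    # away from every point kept so far.
    net = []
    for x in points:
        if all(np.linalg.norm(x - y) > eps for y in net):
            net.append(x)
    return net

rng = np.random.default_rng(1)
k, eps = 3, 0.5
pts = rng.standard_normal((5000, k))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # project onto the unit sphere
net = greedy_net(pts, eps)
volume_bound = (1 + 2 / eps) ** k                    # = 125 for eps = 1/2
print(len(net), volume_bound)
```

By construction every sampled point is within $\varepsilon$ of some net point, and the net size stays well below the volume bound.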
Fix some $0 < \varepsilon < 1/2$. Then we can choose an $\varepsilon$-net $N^{(p)}$ for $B_p$ and an $\varepsilon$-net $N^{(n)}$ for $B_n$ with $|N^{(p)}| \le (1 + 2/\varepsilon)^p$ and $|N^{(n)}| \le (1 + 2/\varepsilon)^n$. Let $u$ and $v$ be a maximizer for $\|X\| = \sup_{u,v} \langle u, X v \rangle$ (note that by finite dimension and compactness of the unit ball the supremum is indeed a maximum);
then there exist $u' \in N^{(p)}$ and $v' \in N^{(n)}$ such that $\|u - u'\| \le \varepsilon$ and $\|v - v'\| \le \varepsilon$.
Then $\langle u, Xv \rangle - \langle u', Xv' \rangle = \langle u - u', Xv \rangle + \langle u', X(v - v') \rangle \le 2\varepsilon \|X\|$. Thus we have $\|X\| \le \frac{1}{1 - 2\varepsilon} \max_{u' \in N^{(p)},\, v' \in N^{(n)}} \langle u', X v' \rangle$.
Hence we now need the concentration inequality only for the finitely many terms on the right-hand side. Note that $\langle u', Xv' \rangle = \sum_{i,j} u'_i x_{ij} v'_j$ is a centered Gaussian with variance $\|u'\|^2 \|v'\|^2 \le 1$; thus $P(\langle u', Xv' \rangle \ge t) \le e^{-t^2/2}$.
A union bound over the two nets (now with $t$ replaced by $(1 - 2\varepsilon)t$ to absorb the prefactor) gives $P(\|X\| \ge t) \le (1 + 2/\varepsilon)^{n+p}\, e^{-(1 - 2\varepsilon)^2 t^2 / 2}$. Putting $t = C\sqrt{n + p}$ for a sufficiently large constant $C$, this then yields $E[\|X\|] \le C(\sqrt{n} + \sqrt{p})$.<br>
<img src="attachments/4-wishart-random-matrices-1773058455477.webp" target="_self" style="width: 445px; max-width: 100%;">
This is a rough estimate. We will not pursue it any further; instead we now look at the collection of all singular values of $X$, or of all eigenvalues of $W$; i.e., we want to understand the asymptotics of the histograms of the eigenvalues. Let $x_1, \dots, x_n$ (the columns of $X$) be independent vectors with $E[x_i] = 0$ and $E[x_i x_i^T] = I_p$; then
What can we say about the eigenvalues of $W = \frac{1}{n} X X^T$? First let us try to reduce this to a Lipschitz condition. Let $\lambda_1(A) \ge \dots \ge \lambda_p(A)$ be the eigenvalues of a symmetric matrix $A$.<br>
<img src="attachments/4-wishart-random-matrices-1773059107308.webp" target="_self" style="width: 456px; max-width: 100%;">
Then one has, as for the maximal eigenvalue, $|\lambda_i(A) - \lambda_i(B)| \le \|A - B\| \le \|A - B\|_2$ for all $i$. Thus
i.e. the maps $A \mapsto \lambda_i(A)$ for $i = 1, \dots, p$ are Lipschitz and thus also the map
$A \mapsto (\lambda_1(A), \dots, \lambda_p(A))$ is Lipschitz. However, since $X$ is our matrix with independent Gaussian entries, we are interested in the mapping $X \mapsto$ (eigenvalues of $\frac{1}{n} X X^T$). For this, the Lipschitz constant is modified: it picks up a factor of order $\frac{1}{n}(\|X\| + \|Y\|)$.<br>Note that the estimate $\|X\| \le \|X\|_2$ is not helpful, since we know that $\|X\|_2 \approx \sqrt{np}$. <a data-tooltip-position="top" aria-label="2 Gaussian random vectors and linear concentration of Chebyshev and Bernstein type &gt; ^lrt756" data-href="2 Gaussian random vectors and linear concentration of Chebyshev and Bernstein type#^lrt756" href="随机矩阵与机器学习/2-gaussian-random-vectors-and-linear-concentration-of-chebyshev-and-bernstein-type.html" class="internal-link" target="_self" rel="noopener nofollow">(Reason)</a> So $\|X\|_2$ is not a useful bound. But let us have a closer look at this, as it also reveals the difference between the classical and the modern regime.
In the classical regime, $p$ fixed and $n \to \infty$, the resulting Lipschitz constant is of order $\sqrt{p/n} \to 0$,
which would give good concentration.
But in the modern regime, $p \approx \lambda n$, this bound is only of order $\sqrt{\lambda} = O(1)$,
which does not give good concentration. So let us keep the operator norm $\|X\|$ as in Section 4.1; for this we already know that we have good concentration around $E[\|X\|] \le C(\sqrt{n} + \sqrt{p})$, and thus with high probability $\|X\| \le C(\sqrt{n} + \sqrt{p})$.<br>
By <a data-tooltip-position="top" aria-label="3 Concentration of Gaussian random vectors for non-linear Lipschitz function &gt; ^fx0z71" data-href="3 Concentration of Gaussian random vectors for non-linear Lipschitz function#^fx0z71" href="随机矩阵与机器学习/3-concentration-of-gaussian-random-vectors-for-non-linear-lipschitz-function.html" class="internal-link" target="_self" rel="noopener nofollow">Theorem 3.2</a>, this then gives concentration of the averaged eigenvalue statistics of $W$ around their expected value.
This means that in the modern regime the eigenvalue distribution of $W = \frac{1}{n} X X^T$ concentrates around its average (the scaling factor $\frac{1}{n}$ in $W$ ensures that we have a limit for $n \to \infty$), so we get:
<br><img src="attachments/4-wishart-random-matrices-1772177853460.webp" target="_self" style="width: 583px; max-width: 100%;"><br>Theorem. 4.2 Marchenko-Pastur Law <a data-footref="mp" href="#fn-1-48672ee7560f97e2" class="footnote-link" target="_self" rel="noopener nofollow">[1]</a>
Let &amp; . If . (观察次数 数据维数)
Then the histogram of the eigenvalues of converges to the Marchenko-Pastur density
where &amp; .
Remark. The normalization constant here is not obtained by brute-force normalization, as for the Gamma distribution; it comes out naturally from the self-consistent equation.
Note that the statement is of the form $\frac{1}{p}\,\#\{i : \lambda_i \in [a, b]\} \to \int_a^b \rho_\lambda(x)\,dx$, (1)
where $\lambda_1, \dots, \lambda_p$ are the eigenvalues of $W$.
Proving (1) directly is not so easy, but it can be achieved by proving analogous statements for other classes of test functions. Instead of proving (1) for
(i) all indicator functions $1_{[a,b]}$ for all $a < b$ (looking directly at the fraction of eigenvalues falling in the interval $[a, b]$), one can prove the analogue for
(ii) all moments $\frac{1}{p}\sum_i \lambda_i^k$ for all $k \in \mathbb{N}$ (showing that all moments, i.e. the averages of the $\lambda_i^k$, match), or for
(iii) all resolvents $\frac{1}{p}\sum_i \frac{1}{\lambda_i - z}$ for all $z \in \mathbb{C}^+$. ($\mathbb{C}^+$ denotes the complex upper half plane; one shows that the averages of the resolvent match.)
By concentration, it suffices to prove in each case the version for the average, i.e. one has to prove $E\big[\frac{1}{p}\sum_i f(\lambda_i)\big] \to \int f(x)\,\rho_\lambda(x)\,dx$.
Note for this that $x \mapsto \frac{1}{x - z}$ and $x \mapsto x^k$ (if we restrict them to a compact interval) are Lipschitz functions.
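Route (ii) can be illustrated numerically (an added aside; the first two moments of the Marchenko-Pastur law are the standard values $m_1 = 1$ and $m_2 = 1 + \lambda$):

```python
import numpy as np

# First two empirical moments of the eigenvalue distribution of
# W = X X^T / n; for the Marchenko-Pastur law m1 = 1 and m2 = 1 + lam.
rng = np.random.default_rng(5)
p, n = 200, 400
lam = p / n
X = rng.standard_normal((p, n))
W = X @ X.T / n
m1 = np.trace(W) / p        # (1/p) sum lambda_i
m2 = np.trace(W @ W) / p    # (1/p) sum lambda_i^2
print(m1, m2)
```

Already at $p = 200$ the empirical moments agree with the limiting values to a few percent, reflecting the concentration discussed above.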
Proof (via a self-consistent equation).
Step 1. Definition. 4.4 Stieltjes Transform. For a probability measure $\mu$ on $\mathbb{R}$ and $z \in \mathbb{C}^+$, set $S_\mu(z) = \int \frac{1}{x - z}\,d\mu(x)$. For Wishart matrices this is $S_p(z) = \frac{1}{p}\sum_i \frac{1}{\lambda_i - z}$; for the Marchenko-Pastur distribution it is $S_{MP}(z) = \int \frac{\rho_\lambda(x)}{x - z}\,dx$. So what we have to prove is the convergence of the Stieltjes transforms: $E[S_p(z)] \to S_{MP}(z)$ for all $z \in \mathbb{C}^+$.
We first make some preparations for this. Lemma. 4.3 Let $A$ be a symmetric $N \times N$ matrix with eigenvalues $\lambda_1, \dots, \lambda_N$. Then, for any $z \in \mathbb{C}^+$, we have $\frac{1}{N}\sum_{i=1}^N \frac{1}{\lambda_i - z} = \operatorname{tr}\big[(A - z)^{-1}\big]$,
where $\operatorname{tr} = \frac{1}{N}\operatorname{Tr}$ is the normalized trace on $N \times N$ matrices.
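Lemma 4.3 is easy to check numerically; the following added snippet compares the normalized trace of the resolvent with the average of $1/(\lambda_i - z)$:

```python
import numpy as np

# Lemma 4.3: the normalized trace of the resolvent (A - z)^{-1} of a
# symmetric matrix A equals the average of 1 / (lambda_i - z).
rng = np.random.default_rng(3)
N = 6
B = rng.standard_normal((N, N))
A = (B + B.T) / 2                                   # symmetrize
z = 0.5 + 1.0j                                      # z in the upper half plane
lhs = np.trace(np.linalg.inv(A - z * np.eye(N))) / N
rhs = np.mean(1.0 / (np.linalg.eigvalsh(A) - z))
print(lhs, rhs)
```

The two sides agree to machine precision, since the trace is invariant under the diagonalization of $A$.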
<br>By <a data-tooltip-position="top" aria-label="Char2 Wigner Ensemble &amp; SemiCircle Law &gt; ^v7zq7x" data-href="Char2 Wigner Ensemble &amp; SemiCircle Law#^v7zq7x" href="随机矩阵第一节课/char2-wigner-ensemble-&amp;-semicircle-law.html" class="internal-link" target="_self" rel="noopener nofollow">The eigenvalues of Resolvent</a>Lemma. 4.6 (a very important trick)
Let $g$ be a standard Gaussian vector in $\mathbb{R}^k$ and $A$ a deterministic or random $k \times k$ matrix independent from $g$. Then $E[\langle g, A g \rangle] = E[\operatorname{Tr}(A)]$. Proof.
Write $A = (a_{ij})_{i,j=1}^k$. We have $E[\langle g, A g \rangle] = \sum_{i,j} E[a_{ij}]\,E[g_i g_j] = \sum_i E[a_{ii}] = E[\operatorname{Tr}(A)]$, since $E[g_i g_j] = \delta_{ij}$. Step 2<br>
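Lemma 4.6 can also be verified by Monte Carlo (an added illustration; here $A$ is a fixed deterministic matrix for simplicity):

```python
import numpy as np

# Lemma 4.6 by Monte Carlo: E[g^T A g] = Tr(A) for a standard Gaussian g
# independent of A.
rng = np.random.default_rng(4)
k = 5
A = rng.standard_normal((k, k))
g = rng.standard_normal((200_000, k))          # 200k independent Gaussian samples
quad = np.einsum('ni,ij,nj->n', g, A, g)       # g^T A g for each sample
print(quad.mean(), np.trace(A))
```

The sample mean of the quadratic forms matches the trace of $A$ up to Monte Carlo error, exactly as the lemma predicts.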
By <a data-tooltip-position="top" aria-label="Char2 Wigner Ensemble &amp; SemiCircle Law &gt; ^yxh4qp" data-href="Char2 Wigner Ensemble &amp; SemiCircle Law#^yxh4qp" href="随机矩阵第一节课/char2-wigner-ensemble-&amp;-semicircle-law.html" class="internal-link" target="_self" rel="noopener nofollow">Resolvent</a> we can get an equation for $E[S_p(z)]$.<br>By the <a data-tooltip-position="top" aria-label="Char1 矩阵基础知识 &gt; ^rzmtuw" data-href="Char1 矩阵基础知识#^rzmtuw" href="随机矩阵第一节课/char1-矩阵基础知识.html" class="internal-link" target="_self" rel="noopener nofollow">Sherman-Morrison Formula</a> we can express the effect of removing one column. In fact, the removed column can be replaced by any other one, since the columns are exchangeable. Thus we obtain a self-consistent equation for $E[S_p(z)]$; comparing the left and right sides and solving the resulting equation, we can finally compute $S_{MP}(z)$. But this comes out as a Stieltjes transform, without an explicit density; so we need a way to extract the density of the MP distribution from it: Step 3. Lemma 4.7 (Stieltjes Inversion Formula). Let $\rho$ be a continuous probability density on $\mathbb{R}$. Then its Stieltjes transform
has a continuous extension to $\mathbb{C}^+ \cup \mathbb{R}$, and $\rho(x) = \lim_{\varepsilon \searrow 0} \frac{1}{\pi} \operatorname{Im} S_\rho(x + i\varepsilon)$. Proof.
For all $x \in \mathbb{R}$ and $\varepsilon > 0$, we have $\operatorname{Im} S_\rho(x + i\varepsilon) = \int \frac{\varepsilon}{(t - x)^2 + \varepsilon^2}\,\rho(t)\,dt$,
and thus, since the Poisson kernel $\frac{1}{\pi}\frac{\varepsilon}{(t - x)^2 + \varepsilon^2}$ is an approximate identity, $\frac{1}{\pi}\operatorname{Im} S_\rho(x + i\varepsilon) \to \rho(x)$ as $\varepsilon \searrow 0$. Applying this to the Stieltjes transform computed in Step 2,
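The inversion formula can be illustrated numerically (an added sketch; the grid size and the value of $\varepsilon$ are my own choices): starting from the MP density, compute its Stieltjes transform by a Riemann sum and recover the density at an interior point:

```python
import numpy as np

# Stieltjes inversion: recover the MP density at x0 via
# rho(x0) = lim_{eps -> 0} Im S(x0 + i*eps) / pi,
# where S(z) = int rho(x) / (x - z) dx.
lam = 0.5
lo = (1 - np.sqrt(lam)) ** 2
hi = (1 + np.sqrt(lam)) ** 2
xs = np.linspace(lo, hi, 200_001)[1:-1]        # drop the endpoints
dx = xs[1] - xs[0]
rho = np.sqrt((hi - xs) * (xs - lo)) / (2 * np.pi * lam * xs)

def stieltjes(z):
    # Riemann-sum approximation of S(z) for z off the real axis
    return np.sum(rho / (xs - z)) * dx

x0, eps = 1.0, 1e-3
recovered = stieltjes(x0 + 1j * eps).imag / np.pi
exact = np.sqrt((hi - x0) * (x0 - lo)) / (2 * np.pi * lam * x0)
print(recovered, exact)
```

With $\varepsilon$ much larger than the grid spacing but still small, the recovered value matches the closed-form MP density to high accuracy.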
then we obtain the form of the Marchenko-Pastur density as claimed in Theorem 4.2.<br><img src="attachments/4-wishart-random-matrices-1773209233653.webp" target="_self" style="width: 379px; max-width: 100%;"> <br>V.A. Marchenko and L.A. Pastur, The distribution of eigenvalues in certain sets of random matrices, Math. USSR-Sbornik 1 (1967), 457-483.<a href="#fnref-1-48672ee7560f97e2" class="footnote-backref footnote-link" target="_self" rel="noopener nofollow">↩︎</a>
]]></description><link>4-wishart-random-matrices.html</link><guid isPermaLink="false">随机矩阵与机器学习/4 Wishart Random Matrices.md</guid><pubDate>Fri, 13 Mar 2026 14:38:17 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item></channel></rss>