
时变函数 | Mathful Review of Linear Models

Most books on linear models and regression out there, from their motivation to their structure, give off a thick whiff of econometrics. I. Do. Not. Like. It.

So I've decided to write my own review of linear models, and I hope to finish it before the New Year.

Prerequisites for this article: (advanced) mathematical statistics, advanced algebra (matrix theory).

◆♡◆

When I took the course, I always felt that linear models were an advanced-algebra version of mathematical statistics; that understanding was off. Rather than saying it is the XX version of XXXX, it is better to say that linear models are to mathematical statistics what advanced algebra is to analytic geometry.

More broadly, if we match the three foundational courses of a statistics department (linear models, probability theory, mathematical statistics) with the three great branches of mathematics, it should be: 「linear models are like algebra, probability is like analysis, mathematical statistics is like geometry」.

If I were to write a book on advanced algebra, I would definitely start with matrices.

By the same token, if I write a review of linear models, I will surely start with random matrices (vectors). As for starting from the Gauss-Markov theorem: seriously, what kind of opening is that...

Alright, let me start lecturing (read: showing off).


1、Random Vectors and Random Matrices

1.1 Introduction

Consider the system of multiple regression equations:

\begin{equation} \left\{              \begin{array}{lr}             {\rm y}_1=\beta_0+\beta_1{\rm x}_{11}+\cdots+\beta_p{\rm x}_{1p}+\epsilon_1 \\             {\rm y}_2=\beta_0+\beta_1{\rm x}_{21}+\cdots+\beta_p{\rm x}_{2p}+\epsilon_2\\ \vdots\\             {\rm y}_n=\beta_0+\beta_1{\rm x}_{n1}+\cdots+\beta_p{\rm x}_{np}+\epsilon_n                \end{array} \right. \end{equation}

A loose (not rigorous) reading of this system: we have n samples in hand; each sample is assumed to carry p explanatory variables related to the response, plus a constant term and a little bit of error.

In advanced algebra, a system of linear equations can be written in matrix form; likewise, the system of regression equations can be written in matrix form:

\textbf{y}=\textbf{x}\pmb{\beta}+\pmb{\epsilon}

where \textbf{y}=\left( \begin{matrix}\rm y_1\\\rm y_2\\\vdots\\{\rm y}_n\end{matrix}\right),  \textbf{x}=\left( \begin{matrix}1&\rm x_{11}&\cdots &{\rm x}_{1p} \\1&\rm x_{21}&\cdots &{\rm x}_{2p}\\\vdots&\vdots &\ddots&\vdots\\1&{\rm x}_{n1}&\cdots &{\rm x}_{np}\end{matrix}\right), \pmb{\beta}=\left( \begin{matrix}\beta_0 \\\beta_1\\\vdots\\\beta_p\end{matrix}\right),\pmb{\epsilon}=\left( \begin{matrix}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_n\end{matrix}\right)

Sometimes we write \textbf x_i for the i-th row vector \begin{matrix}(1&{\rm x}_{i1}&\cdots&{\rm x}_{ip})\end{matrix} of the matrix \textbf{x}. The linear regression equation(s) can thus also be written as:

{\rm y}_i=\textbf x_{i}\pmb{\beta}+\epsilon_i
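As a quick illustration, here is a minimal simulation sketch of this setup (the sizes, coefficients, and error scale are my own made-up choices): it assembles \textbf{x} with a leading column of ones and checks that the matrix form y = xβ + ε agrees with the row form above.

```python
import numpy as np

# Minimal sketch of the model y = x beta + eps (all numbers are illustrative).
rng = np.random.default_rng(0)
n, p = 100, 3                       # n samples, p predictors (plus an intercept)

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column of 1s
beta = np.array([1.0, 2.0, -0.5, 0.3])                      # beta_0, ..., beta_p
eps = rng.normal(scale=0.5, size=n)                         # the error term
y = X @ beta + eps

# Row form: y_i = x_i beta + eps_i, where x_i is the i-th row of X.
assert np.allclose(y, np.array([X[i] @ beta + eps[i] for i in range(n)]))
```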

1.2 Mean, Variance, Covariance, and Correlation

Assume we are familiar with the usual definitions from mathematical statistics of the mean, variance, covariance, and correlation of random variables.

1.2.1 The Mean Vector

Let \textbf y be an n\times1 random vector, i.e. \textbf{y}=\left( \begin{matrix}\rm y_1\\\rm y_2\\\vdots\\{\rm y}_n\end{matrix}\right); then

\mathbb{E}({\textbf y})=\mathbb{E}\left( \begin{matrix}\rm y_1\\\rm y_2\\\vdots\\{\rm y}_n\end{matrix}\right)=\left( \begin{matrix}\mathbb{E}(\rm y_1)\\\mathbb{E}(\rm y_2)\\\vdots\\\mathbb{E}({\rm y}_n)\end{matrix}\right)=\left( \begin{matrix}\mu_1\\\mu_2\\\vdots\\\mu_n\end{matrix}\right)=\pmb{\mu}

By the properties of vector addition and expectation, for n\times1 random vectors \textbf{a},\textbf{b} we have

\mathbb{E}(\textbf{a}+\textbf{b})=\mathbb{E}(\textbf{a})+\mathbb{E}(\textbf{b})

1.2.2 Variance and the Covariance Matrix

Suppose \sigma^2_1,\cdots,\sigma^2_n are the variances of {\rm y}_1,\cdots,{\rm y}_n, and \sigma_{ij} denotes the covariance of {\rm y}_i and {\rm y}_j, i\ne j (so \sigma_{ii}=\sigma_i^2).

Then the covariance matrix is

\pmb{\Sigma}={\rm cov}(\textbf{y}) =\left( \begin{matrix}\sigma_{11}&\sigma_{12}&\cdots &\sigma_{1n} \\\sigma_{21}&\sigma_{22}&\cdots &\sigma_{2n}\\\vdots&\vdots &\ddots&\vdots\\\sigma_{n1}&\sigma_{n2}&\cdots &\sigma_{nn}\end{matrix}\right) \\=\left( \begin{matrix}\rm cov(y_1,y_1)&\rm cov(y_1,y_2)&\cdots &{\rm cov}({\rm y}_1,{\rm y}_n) \\\rm cov(y_2,y_1)&\rm cov(y_2,y_2)&\cdots &{\rm cov}({\rm y}_2,{\rm y}_n)\\\vdots&\vdots &\ddots&\vdots\\\rm cov(y_n,y_1)&\rm cov(y_n,y_2)&\cdots &{\rm cov}({\rm y}_n,{\rm y}_n)\end{matrix}\right) \\=\left( \begin{matrix}\mathbb{E}({\rm y}_1{\rm y}_1)-\mu_1\mu_1&\mathbb{E}({\rm y}_1{\rm y}_2)-\mu_1\mu_2&\cdots &\mathbb{E}({\rm y}_1{\rm y}_n)-\mu_1\mu_n \\\mathbb{E}({\rm y}_2{\rm y}_1)-\mu_2\mu_1&\mathbb{E}({\rm y}_2{\rm y}_2)-\mu_2\mu_2&\cdots &\mathbb{E}({\rm y}_2{\rm y}_n)-\mu_2\mu_n \\\vdots&\vdots &\ddots&\vdots \\\mathbb{E}({\rm y}_n{\rm y}_1)-\mu_n\mu_1&\mathbb{E}({\rm y}_n{\rm y}_2)-\mu_n\mu_2&\cdots &\mathbb{E}({\rm y}_n{\rm y}_n)-\mu_n\mu_n\end{matrix}\right) =\left( \begin{matrix}\mathbb{E}({\rm y}_1{\rm y}_1)&\mathbb{E}({\rm y}_1{\rm y}_2)&\cdots &\mathbb{E}({\rm y}_1{\rm y}_n) \\\mathbb{E}({\rm y}_2{\rm y}_1)&\mathbb{E}({\rm y}_2{\rm y}_2)&\cdots &\mathbb{E}({\rm y}_2{\rm y}_n) \\\vdots&\vdots &\ddots&\vdots \\\mathbb{E}({\rm y}_n{\rm y}_1)&\mathbb{E}({\rm y}_n{\rm y}_2)&\cdots &\mathbb{E}({\rm y}_n{\rm y}_n)\end{matrix}\right)-\left( \begin{matrix}\mu_1\mu_1&\mu_1\mu_2&\cdots &\mu_1\mu_n \\\mu_2\mu_1&\mu_2\mu_2&\cdots &\mu_2\mu_n \\\vdots&\vdots &\ddots&\vdots \\\mu_n\mu_1&\mu_n\mu_2&\cdots &\mu_n\mu_n\end{matrix}\right)=\mathbb{E}(\textbf{yy}^T)-\pmb{\mu\mu}^T  \\=\left( \begin{matrix}\mathbb{E}(({\rm y}_1-\mu_1)^2)& \mathbb{E}(({\rm y}_1-\mu_1)({\rm y}_2-\mu_2))&\cdots &\mathbb{E}(({\rm y}_1-\mu_1)({\rm y}_n-\mu_n)) \\\mathbb{E}(({\rm y}_2-\mu_2) ({\rm y}_1-\mu_1))& \mathbb{E}(({\rm y}_2-\mu_2)^2)&\cdots &\mathbb{E}(({\rm y}_2-\mu_2)({\rm y}_n-\mu_n)) \\\vdots&\vdots &\ddots&\vdots \\\mathbb{E}(({\rm y}_n-\mu_n) ({\rm y}_1-\mu_1))& \mathbb{E}(({\rm y}_n-\mu_n) ({\rm y}_2-\mu_2))&\cdots &\mathbb{E}(({\rm y}_n-\mu_n)^2) \end{matrix}\right)=\mathbb{E}\left( \begin{matrix} ({\rm y}_1-\mu_1)^2& ({\rm y}_1-\mu_1)({\rm y}_2-\mu_2)&\cdots &({\rm y}_1-\mu_1)({\rm y}_n-\mu_n) \\({\rm y}_2-\mu_2) ({\rm y}_1-\mu_1)& ({\rm y}_2-\mu_2)^2&\cdots &({\rm y}_2-\mu_2)({\rm y}_n-\mu_n) \\\vdots&\vdots &\ddots&\vdots \\({\rm y}_n-\mu_n)({\rm y}_1-\mu_1)&({\rm y}_n-\mu_n)({\rm y}_2-\mu_2)&\cdots &({\rm y}_n-\mu_n)^2\end{matrix}\right)=\mathbb{E}\left(\left(\begin{matrix}{\rm y}_1-\mu_1\\{\rm y}_2-\mu_2\\\vdots\\{\rm y}_n-\mu_n\end{matrix}\right)\left(\begin{matrix}{\rm y}_1-\mu_1&{\rm y}_2-\mu_2&\cdots&{\rm y}_n-\mu_n\end{matrix}\right)\right)=\mathbb{E}((\textbf{y}-\pmb{\mu})(\textbf{y}-\pmb{\mu})^T)

That is:

\pmb{\Sigma}={\rm cov}(\textbf{y})=\mathbb{E}((\textbf{y}-\pmb{\mu})(\textbf{y}-\pmb{\mu})^T)=\mathbb{E}(\textbf{yy}^T)-\pmb{\mu\mu}^T

Generalized variance: the generalized variance of a random vector \textbf{y} is the determinant of its covariance matrix.

GVar(\textbf{y})=\det(\pmb{\Sigma})=\left| \pmb{\Sigma} \right|

Standard distance (also known as the Mahalanobis distance):

D_s^2=(\textbf{y}-\pmb{\mu})^T\pmb{\Sigma}^{-1}(\textbf{y}-\pmb{\mu})
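A small numerical sanity check of these definitions (the mean, covariance, and sample size below are my own illustrative numbers): it verifies \pmb{\Sigma}=\mathbb{E}(\textbf{yy}^T)-\pmb{\mu\mu}^T on simulated draws and evaluates the squared standard distance with the inverse covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.3, 0.1],
                                 [0.3, 1.0, -0.2],
                                 [0.1, -0.2, 1.5]],
                            size=100_000)          # rows are draws of y
mu_hat = Y.mean(axis=0)

# Sigma = E(yy^T) - mu mu^T: compare the two sides on the same sample.
lhs = np.cov(Y, rowvar=False, bias=True)           # E((y-mu)(y-mu)^T), empirically
rhs = (Y.T @ Y) / len(Y) - np.outer(mu_hat, mu_hat)
print(np.max(np.abs(lhs - rhs)))                   # ~0, up to floating point

# Squared Mahalanobis distance of one point from mu, using Sigma^{-1}.
y0 = Y[0]
d2 = (y0 - mu_hat) @ np.linalg.inv(lhs) @ (y0 - mu_hat)
print(d2)
```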

1.2.3 The Correlation Matrix

\pmb{P}_\rho=\left(\begin{matrix}1&\rho_{12}&\cdots&\rho_{1n} \\\rho_{21}&1&\cdots&\rho_{2n} \\\vdots&\vdots&\ddots&\vdots \\\rho_{n1}&\rho_{n2}&\cdots&1 \end{matrix}\right)

where \rho_{ij}=\sigma_{ij}/(\sigma_i\sigma_j) is the correlation of {\rm y}_i and {\rm y}_j.

Let \pmb{D}_\sigma=({\rm diag}(\pmb{\Sigma}))^{1/2}={\rm diag}(\sigma_1,\sigma_2,\cdots,\sigma_n); then

\pmb{P}_\rho=\pmb{D}_\sigma^{-1}\pmb{\Sigma}\pmb{D}_\sigma^{-1} \\\pmb{\Sigma}=\pmb{D}_\sigma\pmb{P}_\rho\pmb{D}_\sigma
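A minimal sketch of the conversion in both directions, using a made-up covariance matrix:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, -0.8],
                  [1.2, 1.0,  0.3],
                  [-0.8, 0.3, 2.25]])              # an illustrative covariance

D = np.diag(np.sqrt(np.diag(Sigma)))               # D_sigma = diag(sigma_1,...,sigma_n)
D_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))

P = D_inv @ Sigma @ D_inv                          # P_rho = D^{-1} Sigma D^{-1}
print(P)                                           # unit diagonal, rho_ij elsewhere
print(np.allclose(D @ P @ D, Sigma))               # Sigma = D P D recovers Sigma
```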

1.2.4 Partitioned Random Vectors

The results for partitioned matrices (vectors) also make sense for random matrices (vectors).

A simple example:

Suppose that the random vector \textbf{v} is partitioned into two subsets of variables, which we denote by \textbf{y} and \textbf{x} :

\textbf{v}=\left(\begin{matrix}\textbf{y}\\\textbf{x}\end{matrix}\right)=\left(\begin{matrix}{\rm y}_1\\\vdots\\{\rm y}_n\\{\rm x}_1\\\vdots\\{\rm x}_m\end{matrix}\right)

Thus there are n+m random variables in \textbf{v} .

\pmb{\mu}=\mathbb{E}(\textbf{v})=\mathbb{E}\left(\left(\begin{matrix}\textbf{y}\\\textbf{x}\end{matrix}\right)\right)=\left(\begin{matrix}\mathbb{E}(\textbf{y})\\\mathbb{E}(\textbf{x})\end{matrix}\right)=\left(\begin{matrix}\pmb{\mu}_{\rm y}\\\pmb{\mu}_{\rm x}\end{matrix}\right) \\ \pmb{\Sigma}={\rm cov}(\textbf{v})={\rm cov}\left(\left(\begin{matrix}\textbf{y}\\\textbf{x}\end{matrix}\right)\right)=\left(\begin{matrix}\pmb{\Sigma}_{\rm yy}&\pmb{\Sigma}_{\rm yx}\\\pmb{\Sigma}_{\rm xy}&\pmb{\Sigma}_{\rm xx}\end{matrix}\right)

By the properties of covariance, \pmb{\Sigma}_{\rm xy}=\pmb{\Sigma}_{\rm yx}^T

And by the properties of partitioned matrices:

\pmb{\Sigma}_{\rm yx}={\rm cov}(\textbf{y},\textbf{x})=\mathbb{E}((\textbf{y}-\pmb{\mu}_{\rm y})(\textbf{x}-\pmb{\mu}_{\rm x})^T)

1.2.5 Linear Functions of Random Vectors

We often need to consider new random variables formed as linear combinations of given random variables; for convenience, we introduce the vector notation.

{\rm z}=a_1{\rm y_1}+a_2{\rm y_2}+\cdots+a_n{\rm y_n}=\pmb{a}^T\textbf{y}

When we have a family of (k) linear combinations of the random variables in \textbf{y},

\begin{equation} \left\{ \begin{array}{lr} {\rm z}_1=a_{11}{\rm y_1}+a_{12}{\rm y_2}+\cdots+a_{1n}{\rm y_n}=\pmb{a}_1^T\textbf{y} \\{\rm z}_2=a_{21}{\rm y_1}+a_{22}{\rm y_2}+\cdots+a_{2n}{\rm y_n}=\pmb{a}_2^T\textbf{y} \\\vdots \\{\rm z}_k=a_{k1}{\rm y_1}+a_{k2}{\rm y_2}+\cdots+a_{kn}{\rm y_n}=\pmb{a}_k^T\textbf{y} \end{array} \right. \end{equation}

where \pmb{a}_i^T=(a_{i1},a_{i2},\cdots,a_{in}),\textbf{y}=({\rm y}_1,{\rm y}_2,\cdots,{\rm y}_n)^T

we obtain a k-dimensional random vector \textbf{z}=\textbf{Ay}

where \textbf{z}=\left(\begin{matrix}{\rm z}_1\\{\rm z}_2\\\vdots\\{\rm z}_k\end{matrix}\right),\textbf{A}=\left(\begin{matrix}\pmb{a}_1^T\\\pmb{a}_2^T\\\vdots\\\pmb{a}_k^T\end{matrix}\right)=\left( \begin{matrix}a_{11}&a_{12}&\cdots &a_{1n} \\a_{21}&a_{22}&\cdots &a_{2n}\\\vdots&\vdots &\ddots&\vdots\\a_{k1}&a_{k2}&\cdots &a_{kn}\end{matrix}\right)

The properties of the expectation, variance, and covariance of random variables extend readily to random vectors:

\mathbb{E}(\textbf{Ay}+\pmb{b})=\textbf{A}\mathbb{E}(\textbf{y})+\pmb{b} \\{\rm cov}(\textbf{z})={\rm cov}(\textbf{Ay})=\textbf{A}\pmb{\Sigma}\textbf{A}^T \\{\rm cov}(\textbf{z},\textbf{w})={\rm cov}(\textbf{Ay},\textbf{By})=\textbf{A}\pmb{\Sigma}\textbf{B}^T \\{\rm cov}(\textbf{Ay}+\pmb{b})=\textbf{A}\pmb{\Sigma}\textbf{A}^T \\{\rm cov}(\textbf{Ay},\textbf{Bx})=\textbf{A}\pmb{\Sigma}_{\rm yx}\textbf{B}^T

The last property can be proved from the properties of partitioned matrices.

Let \textbf{v}=\left(\begin{matrix}\textbf{y}\\\textbf{x}\end{matrix}\right),\textbf{C}=\left(\begin{matrix}\textbf{A}&\textbf{0}\\\textbf{0}&\textbf{B}\end{matrix}\right), and compute {\rm cov}(\textbf{Cv}).
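To close this subsection, a sanity-check sketch of the last two identities on simulated data (all sizes and matrices below are my own choices); since the sample covariance is bilinear, both checks hold exactly and print True.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 3, 2, 10_000

# Draw v = (y, x) jointly, then split, mirroring the partitioned vector of 1.2.4.
Sigma_v = np.eye(n + m) + 0.5                 # a positive-definite covariance
V = rng.multivariate_normal(np.zeros(n + m), Sigma_v, size=N)
Y, X = V[:, :n], V[:, n:]

A = rng.normal(size=(2, n))                   # arbitrary coefficient matrices
B = rng.normal(size=(2, m))

full = np.cov(V, rowvar=False)                # sample covariance of v
Sigma_yy, Sigma_yx = full[:n, :n], full[:n, n:]

# cov(Ay) = A Sigma_yy A^T (exact for the sample covariance)
print(np.allclose(np.cov(Y @ A.T, rowvar=False), A @ Sigma_yy @ A.T))

# cov(Ay, Bx) = A Sigma_yx B^T, read off the covariance of the stacked vector
cross = np.cov(np.hstack([Y @ A.T, X @ B.T]), rowvar=False)[:2, 2:]
print(np.allclose(cross, A @ Sigma_yx @ B.T))
```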

2、Distribution Theory

3、Point Estimation

4、Hypothesis Testing

5、Interval Estimation

6、Analysis of Variance

6.1 One-way ANOVA

6.2 Two-way ANOVA

7、Linear Mixed Model

8、Data Analysis


时变函数 | MLE of a Linear Model Where the Error Follows a Logistic Distribution

Consider a linear model 

Y=X\beta+\epsilon

where the error \epsilon follows the logistic distribution with density f(x)=\frac{e^{-x}}{(1+e^{-x})^2}

Writing x_i for the i-th row of X, and using the symmetry of the logistic density, the log-likelihood function is

\ell (\beta) =\sum_{i=1}^n{(x_i\beta-y_i)}-2\sum_{i=1}^{n} {\ln (1+e^{x_i\beta-y_i})}

thus, \hat\beta_{MLE}=\arg\max_{\beta}\ell (\beta) =\arg\max_{\beta} \sum_{i=1}^n{(x_i\beta-y_i)}-2\sum_{i=1}^{n} {\ln (1+e^{x_i\beta-y_i})}

It is easy to see that this is a concave function of \beta (the second derivative is everywhere negative).

Differentiating with respect to β gives the first-order condition:

\sum_{i=1}^n{x_i\frac{1-\exp{(x_i\beta-y_i)}}{1+\exp{(x_i\beta-y_i)}}}=0

This is a system of transcendental equations in the p components of \beta, with no closed-form solution.

But, with an idea similar to the Lasso, this could be used for dimension reduction when p >> n; I am still making up the concrete reduction rule. Once it's done, perhaps I can squeeze a small paper out of it first?

And when p < n, a solution must exist.
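Since there is no closed form, the MLE has to be computed numerically. Below is a sketch on simulated data (the design matrix, sample size, and true \beta are my own assumptions): it minimizes the negated log-likelihood with scipy.optimize.minimize.

```python
import numpy as np
from scipy.optimize import minimize

# Numerical-MLE sketch on simulated data; errors drawn from the standard logistic.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # rows are x_i
beta_true = np.array([0.5, -1.0, 2.0])
y = X @ beta_true + rng.logistic(size=n)

def neg_loglik(beta):
    u = X @ beta - y                                  # u_i = x_i beta - y_i
    # -( sum u_i - 2 sum ln(1 + e^{u_i}) ), with logaddexp for numerical stability
    return -(np.sum(u) - 2.0 * np.sum(np.logaddexp(0.0, u)))

res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
print(res.x)   # lands near beta_true; -ell is convex, so BFGS is reliable here
```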


Markov恋 | Dreams to Fulfill with My Future Love

Sitting back to back with him on the floor of Firestone Memorial Library, quietly reading; even if the world dissolved into atoms, it wouldn't matter.

Walking hand in hand down the long Washington Road.

Studying mathematics together, summoning the departed ghosts of Fine Hall.

At Whitman Theater, waiting for a carefully prepared Chinese play to begin.

Eating ice cream at Thomas Sweet, watching the bustle under the blazing sun; drinking hot cocoa in First Campus Center, leaning by the window and listening to snow fall upon the mortal world.

On weekend afternoons, reading in the same study room, his desk across from mine; he writes his paper, I run my R.

When tired, I would stand up, walk to his side, gaze silently at the half-finished work that carries his devotion, smile softly, then turn to fetch him a cup of hot water, just the right temperature, and set it by his hand.

Light clouds, evening rain, thin rosy mist;
jade vines, verdant tigers, crimson flowers;
a far road, gray parasol trees, an ancient tower;
beneath the shadows of returning geese,
with you, why fear the ends of the earth.

All the time not spent with mathematics, I would spend with him: with him in bright robes on spirited horses, roaming the ends of the earth and ten thousand miles of mountains and rivers; with him melting snow to brew tea, watching flowers bloom like brocade and clouds roll and unfurl.

Publishing papers together. If he fights me for first authorship, his punishment is a week of Neijiang beef noodles as late-night snacks; if he fights for second authorship, the punishment is ye'er ba, Lai tangyuan, Han baozi, dandan noodles, red-oil wontons, sesame-paste cold noodles, salt-and-pepper guokui, Sichuan youcha, Mianyang rice noodles, Yibin ranmian, Fushun douhua, douhua with sweet-potato noodles, Guangyuan rice jelly, and crispy-pork douhua, rotating through them all for breakfast for two weeks!

Raising a big fish together, named A-Xiang. When he asks why, I'll feign annoyance and mutter: "One look and I can tell you only memorized the Xiaoyaoyou and never played Gujian Qitan. Friendship over, hmph!" ╭(╯^╰)╮

Traveling far together, coming home together, stepping off the plane from America together, no longer green and naive or young and reckless, only still carrying a love of mathematics, returning to our hometown to add bricks and tiles to a cause generations have left unfinished.

He takes a folder from his briefcase; it is an article about to appear in print.

And then…
And then…
And then…

And then…
……

……
……
……
……
……
……
……
……
……
……
……


And then I woke from the dream with a start, realizing I hadn't even finished my course paper!!!

━━━━━━━━━━◆♡◆━━━━━━━━━

It was the article we wrote together.

Mm. Co-first authors.

He placed both the article and the ring in my hands.


时变函数 | Starting from Conditional Expectation

Without doubt, conditional expectation is the general form of conditional probability. For an event A, consider its indicator function 𝕀_A; then, naturally, ℙ(A|B) = 𝔼(𝕀_A|B). So it suffices to consider conditional expectation.

Definition

Conditional expectation amounts to restricting the random variable to a subspace (Ω, ℱ_B, ℙ) of the original probability space (Ω, ℱ_0, ℙ) and then taking the expectation. A natural thought follows: since different partitions of the probability space yield different conditional expectations, the conditional expectation should be regarded as depending on the partition; it is a map (Ω, ℱ_B, ℙ) → ℝ, that is, a random variable.

Definition: consider a random variable X on a probability space (Ω, ℱ_0, ℙ) with 𝔼|X| < ∞.

Consider a sub-σ-field ℱ ⊂ ℱ_0. The conditional expectation 𝔼(X|ℱ) of X given ℱ is a random variable Y satisfying

(i) Y is ℱ-measurable.

(ii) For every A ∈ ℱ, \int_A Xd\mathbb{P}=\int_A Yd\mathbb{P}.

Existence

Lemma (Radon-Nikodym theorem):

Let μ and ν be two σ-finite measures on (Ω, ℱ). If ν ≪ μ, then there exists an ℱ-measurable function f such that for every A ∈ ℱ, \int_Afd\mu=\nu(A). The function f is usually written \frac{d\nu}{d\mu} and called the Radon-Nikodym derivative.

(ν ≪ μ means ν is absolutely continuous with respect to μ: whenever A ∈ ℱ has μ(A) = 0, also ν(A) = 0.)

Proof: without loss of generality, assume X ≥ 0. Let μ = ℙ and ν(A) = \int_A Xd\mathbb{P}, ∀A ∈ ℱ.

It is easy to see that ν is a σ-finite measure and ν ≪ μ. By the lemma above, the Radon-Nikodym derivative exists, s.t.

\int_AXd\mathbb{P}=\nu(A)=\int_A\frac{d\nu}{d\mu}d\mathbb{P}.

It is not hard to verify that the Radon-Nikodym derivative \frac{d\nu}{d\mu} is exactly the conditional expectation we want. For general X, treat the positive and negative parts separately.

In addition, there is an intuitive functional-analytic proof of the existence of conditional expectation.

L²(Ω, ℱ_0, ℙ), the space of L² (square-integrable) functions on (Ω, ℱ_0, ℙ), is a Hilbert space, and the L² functions on (Ω, ℱ, ℙ) form a subspace of it.

For an L² random variable X, its conditional expectation given ℱ is precisely its orthogonal projection onto the subspace L²(Ω, ℱ, ℙ).

Write the projection as X′ and set Z = X − X′. Note that Z is orthogonal to L²(Ω, ℱ, ℙ), so \langle 1_A,Z\rangle =\int_\Omega 1_A Zd\mathbb{P}=0 for every A ∈ ℱ. Hence,

\int_A (X-X')d\mathbb{P}=\int_A Zd\mathbb{P}=\int_\Omega 1_A Zd\mathbb{P}=0 .

So X′ is indeed the conditional expectation. For L¹ variables, the result follows by approximation with L² variables.
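To make the projection picture concrete, here is a toy sketch of my own on a finite sample space: with Ω = {0,…,5}, uniform ℙ, and ℱ generated by a two-block partition, the block-average construction of 𝔼(X|ℱ) coincides with the orthogonal projection onto the span of the block indicators.

```python
import numpy as np

# Toy setup (my own): Omega = {0,...,5}, P uniform, F generated by the
# partition B1 = {0,1,2}, B2 = {3,4,5}.
p = np.full(6, 1 / 6)                       # probability weights
X = np.array([1.0, 4.0, -2.0, 0.5, 3.0, 7.0])
blocks = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# E(X|F): replace X on each block by its weighted average over that block.
cond = np.empty_like(X)
for b in blocks:
    cond[b] = np.average(X[b], weights=p[b])

# Projection onto span{1_B} under <u, v> = E(uv): coefficient <X,1_B>/<1_B,1_B>.
proj = np.zeros_like(X)
for b in blocks:
    ind = np.zeros(6)
    ind[b] = 1.0
    proj += (np.sum(p * X * ind) / np.sum(p * ind * ind)) * ind

print(np.allclose(cond, proj))   # True: the projection is the block average
```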

Properties

However the partition of the probability space is chosen, averaging leaves unchanged any random variable that is already determined by the partition. Hence, if

X ∈ ℱ,

then

𝔼(X|ℱ) = X

This also gives the 「local average」 rule of averaging over the 「pieces」 and then averaging again:

 𝔼(𝔼(X|ℱ))=𝔼(X)

Moreover, applying a measurable map to an ℱ-measurable random variable X keeps it ℱ-measurable, so for X ∈ ℱ we have

𝔼(f(X)|ℱ) = f(X)

and, likewise, for a further measurable transformation g,

𝔼(g(X)|ℱ) = g(X)

Consider two neat little problems:

1、X, Y are i.i.d. r.v.s; find 𝔼(X|X+Y).

2、X is a symmetric r.v.; find:

(1) 𝔼(X²|X)

(2) 𝔼(X|X²)

Using the familiar properties above, the second problem is straightforward:

𝔼(X²|X)= 𝔼(X²|σ(X))=X²

𝔼(X|X²)=f(X²) for some measurable function f

By symmetry,

𝔼(-X|X²)=f((-X)²)= f(X²)=𝔼(X|X²)

By the linearity of expectation,

-𝔼(X|X²)=𝔼(-X|X²)

Therefore,

𝔼(X|X²)=𝔼(-X|X²)=0

Intuitively,

once X is given, X² is determined as well, so we may simply drop the expectation sign.

Similarly, once X² is given (= a), X is pinned down to ±√a, and averaging gives 0.
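A Monte Carlo sanity check of this intuition (my own sketch; the bin-averaging below is only an approximation of conditioning): for a standard normal X, averaging X within bins of X² gives values near 0.

```python
import numpy as np

# Check E(X | X^2) = 0 for a symmetric X (standard normal here).
rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
x2 = x ** 2

# Approximate the conditional expectation by averaging x within quantile
# bins of the conditioning variable x^2.
bins = np.quantile(x2, np.linspace(0, 1, 51))
idx = np.digitize(x2, bins[1:-1])            # bin index 0..49 for each point
cond_means = np.array([x[idx == k].mean() for k in range(50)])
print(np.abs(cond_means).max())              # every bin average is close to 0
```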

Thus we can assert: for X from a symmetric location-scale family with location parameter θ and scale parameter σ,

\mathbb{E}\left(X\,\middle|\,\left(\frac{X-\theta}{\sigma}\right)^2\right)=\theta

which again confirms that the scale parameter does not affect the expectation.

For the first problem, consider

the AM-GM inequality \frac{a+b}{2} \geq \sqrt{ab}

From a probabilistic/statistical point of view, the family of mean inequalities says roughly this: when the (linear) combination structure is fixed and the individual random variables carry equal weight, averaging in whatever fashion yields the same result.

Since X and Y are i.i.d., they enter the conditioning variable X+Y with equal weight; the two random variables are exchangeable. So naturally:

𝔼(X|X+Y)= 𝔼(Y|X+Y)

The problem then becomes

\mathbb{E}(X|X+Y)=\frac{1}{2}(\mathbb{E}(X|X+Y)+\mathbb{E}(X|X+Y)) \\=\frac{1}{2}(\mathbb{E}(X|X+Y)+\mathbb{E}(Y|X+Y))

This step also reflects the 「local average」 idea of averaging over pieces and then averaging again.

Then, by the additivity of conditional expectation:

\mathbb{E}(X|X+Y)=\frac{1}{2}(\mathbb{E}(X+Y|X+Y)) \\=\frac{1}{2}(X+Y)
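The identity can be checked by simulation (a sketch of my own, again approximating conditioning by binning): for i.i.d. exponential X and Y, the average of X within a bin of S = X+Y matches the average of S/2 there.

```python
import numpy as np

# Check E(X | X+Y) = (X+Y)/2 for i.i.d. X, Y (exponential here).
rng = np.random.default_rng(5)
X = rng.exponential(size=1_000_000)
Y = rng.exponential(size=1_000_000)
S = X + Y

# Bin on the conditioning variable S and average X within each bin.
edges = np.quantile(S, np.linspace(0, 1, 41))
idx = np.digitize(S, edges[1:-1])
for k in range(0, 40, 8):
    mask = idx == k
    print(X[mask].mean(), (S[mask] / 2).mean())   # the two columns agree closely
```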

Now, generalizing a bit, consider these two computations:

Let X_i (i = 1, …, n), Y, and 2Z be i.i.d. random variables.

\mathbb{E}(X|X+2Y)=\frac{1}{3}(\mathbb{E}(X|X+2Y)+\mathbb{E}(X|X+2Y)+\mathbb{E}(X|X+2Y)) \\=\frac{1}{3}(\mathbb{E}(X|X+2Y)+\mathbb{E}(Y|X+2Y)+\mathbb{E}(Y|X+2Y)) \\=\frac{1}{3}(\mathbb{E}(X|X+2Y)+2\mathbb{E}(Y|X+2Y))\\=\frac{1}{3}(\mathbb{E}(X|X+2Y)+\mathbb{E}(2Y|X+2Y)) \\=\frac{1}{3}\mathbb{E}(X+2Y|X+2Y) \\=\frac{1}{3}(X+2Y)

\mathbb{E}(X|X+2Z)=\frac{1}{2}(\mathbb{E}(X|X+2Z)+\mathbb{E}(X|X+2Z)) \\=\frac{1}{2}(\mathbb{E}(X|X+2Z)+\mathbb{E}(2Z|X+2Z))\\=\frac{1}{2}\mathbb{E}(X+2Z|X+2Z) \\=\frac{1}{2}(X+2Z)

That these two computations differ is in the same spirit as the pitfall of the mean inequality above: in the first chain, the step 𝔼(X|X+2Y) = 𝔼(Y|X+2Y) is exactly the trap, because X and 2Y do not enter the sum X+2Y with equal weight (for i.i.d. Gaussians, 𝔼(X|X+2Y) = (X+2Y)/5, not (X+2Y)/3); in the second chain, X and 2Z are i.i.d., so the pairing is legitimate.~

Generalizing to n random variables:

\mathbb{E}(X_i|\sum_{i=1}^n{X_i}) \\=\frac{1}{n}\times n\mathbb{E}(X_i|\sum_{i=1}^n{X_i}) \\=\frac{1}{n}\sum_{i=1}^n\mathbb{E}(X_i|\sum_{i=1}^n{X_i}) \\=\frac{1}{n}\mathbb{E}(\sum_{i=1}^nX_i|\sum_{i=1}^n{X_i}) \\=\frac{1}{n}\sum_{i=1}^nX_i=\bar X

The last identity says that, given the sample mean, the conditional expectation of any single observation equals the sample mean.

We know that the expectation of a single observation equals the population expectation, i.e. 𝔼(X_i) = 𝔼(X), and that the expectation of the sample mean equals the population expectation, i.e. \mathbb{E}(\bar X)=\mathbb{E}(X)

And by the local-average idea, \mathbb{E}(\mathbb{E}(X_i|\sum_{i=1}^n{X_i}))=\mathbb{E}(X_i)=\mathbb{E}(X)=\mathbb{E}(\bar X)

In fact, this property is also a direct application of the Rao-Blackwell theorem: since \sum_{i=1}^n X_i is a sufficient statistic, for any unbiased estimator T (here, any single X_i), conditioning to form 𝔼(T|\sum_{i=1}^n X_i) yields a no-worse estimator.

Meanwhile, from the regression angle: when we regress y on \sum y, i.e. fit y=a_0\times\sum y + u, we get \hat{y}_i=\bar{y} for all i.


张量几何 | Science writing writes scientifically

It is hard to summarize the skills of science writing. Maybe we can claim that there are three rules for writing scientifically; however, unfortunately, no two snowflakes are alike, for there are so many variables to consider.

WHAT is the subject matter?

Above all, as a (mathematical) statistics student, I must write professionally and seriously all the time. Therefore, though I am also a fiction writer, I write nonfiction stories most of the time, because in my opinion academic or expository writing is just like telling others a story scientifically.

I am accustomed to coming up with an outline first. It includes the main idea, the new model, and the fancy technical details of every section, as well as a description of the research problem's background with suitable quotes from sources. I've kept this habit since my freshman year (when I was in the Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University), and I'm proud that it echoes what I have been taught about structure since I became a graduate student.

In my spare time, I write novels as a fictionist. For fiction, I start by thinking about how to make the characters lively and interesting. As such, I think through their quotes and their tones, and create contradictions and conflicts to highlight the contrast between the roles. Ultimately, I write down the story that I really want to tell through their experiences. It's very scientific, I am sure.

WHO is going to be the target audience?

All my close friends know that I keep a diary and research jottings whose only reader is myself. I always write very fast and never revise. Because of my peculiarity and absolute narcissism, I prefer an orange pen with black ink and thick journals with purple covers; meanwhile, I hardly type, indulging instead in my effortless calligraphy. My diary and research jottings are great sources of inspiration for other, formal writing, and that's why I like to keep them raw. Calligraphy is everything.

If I am going to have readers, then I will not actually「write」. I will type, because I know a (journal) article is going to be a permanent record of my work, and I have to make hundreds and thousands of revisions before I finally feel OK showing it to others, so as not to cause embarrassment several years from now. Undoubtedly, LaTeX, the standard format of mathematics, is more efficient and effective (in some sense).

WHERE am I?

I gradually realized the necessity of keeping the personal separated from the professional. My colleagues are almost all my friends; they visit my web page, but are probably not interested in my hobbies or opinions. They prefer to get my papers, preprints, curriculum vitae, and contact info, and are curious about my research progress. Conversely, my friends, unlike my colleagues, are probably not interested in my research papers.

Hence, I organized this blog into detailed columns. 「Markov恋」is a snapshot of my private life, which my work partners may skip;「时变函数」carries updates on my own research and expository articles;「调和分析」holds discussions of interesting problems and some lecture notes;「张量几何」covers various other topics, usually related to mathematics, statistics, and love.

Moreover, I LOVE being alone in my bedroom or apartment rather than in the library or office when I'm writing, so that I can think out loud lightheartedly. I HATE being watched or interrupted. But as long as I'm sure everybody else is busy with their own duties and responsibilities and won't disturb me, I can also work and write in public places.

WHY do I write?

When I was a seventeen-year-old girl, I was absolutely interested in all the math around me. Furthermore, I want to spend my energies on creating new mathematics rather than fighting over the old mathematics. This is the ultimate reason for all my writing.

By the way, we all want to work out the world-renowned BIG CONJECTURE that has stumped us for a long time, but we have to do things in silence that only a professional could understand.

Anyhow, I must strive to find my own voice: think big, even if you can't for now.

WHEN is the deadline?

Submitting a paper is technical work. For every graduate student, there is an urgent deadline when the thesis is due. However, I always like to describe myself as slow. Unlike other mathematicians or statisticians who solve problems with quicksilver brilliance, I gravitate toward deep problems that I can chew on for years. Thus, if my boss doesn't push me, I will procrastinate and procrastinate and procrastinate. In fact, he never did push me =)

It sounds so fantastic.

Okay, start science writing, and write scientifically :)

Let's start writing!
