The third assignment of Multivariate Statistics. The assignment is written in Rmarkdown, a smart syntax supported by RStudio helping with formula, plot visualization and plugin codes running.
most recommend: click here for html version of assignment, you can see codes as well as plots.
You may also find the PDF Version of this assignment from github. Or if you can cross the fire wall, just see below :
1
density function for the multivariate normal distribution:
f ( x ; μ , Σ ) = 1 ( 2 π ) p / 2 ∣ Σ ∣ 1 / 2 e − 1 2 ( x − μ ) ′ Σ − 1 ( x − μ ) f(x; \mu, \Sigma) =\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{'}\Sigma^{-1}(x-\mu)} f ( x ; μ , Σ ) = ( 2 π ) p /2 ∣Σ ∣ 1/2 1 e − 2 1 ( x − μ ) ′ Σ − 1 ( x − μ )
the likelihood function for the two independent sample:
L ( μ 1 , μ 2 , Σ ) = Π i = 1 n 1 f ( X i ; μ 1 , Σ ) Π j = 1 n 2 f ( X j ; μ 2 , Σ ) = L ( μ 1 , Σ ) L ( μ 2 , Σ ) L(\mu_1,\mu_2,\Sigma) = \Pi_{i=1}^{n_1}f(X_i;\mu_1,\Sigma)\Pi_{j=1}^{n_2}f(X_j;\mu_2,\Sigma) \\
=L(\mu_1,\Sigma)L(\mu_2,\Sigma) L ( μ 1 , μ 2 , Σ ) = Π i = 1 n 1 f ( X i ; μ 1 , Σ ) Π j = 1 n 2 f ( X j ; μ 2 , Σ ) = L ( μ 1 , Σ ) L ( μ 2 , Σ )
So the likelihood function can be defined as:
L ( μ 1 , μ 2 , Σ ) = 1 ( 2 π ) N p / 2 ∣ Σ ∣ N / 2 e x p ( − 1 2 t r [ Σ − 1 ( ∑ i = 1 n 1 Φ 1 ( X i ) + ∑ j = 1 n 2 Φ 2 ( X j ) ) ] ) Φ i ( x ) = ( x − μ i ) ( x − μ i ) ′ L(\mu_1,\mu_2,\Sigma) = \frac{1}{(2\pi)^{Np/2}|\Sigma|^{N/2}}
exp( -\frac{1}{2}tr[ \Sigma^{-1} (\sum_{i=1}^{n_1}\Phi_1(X_i)+\sum_{j=1}^{n_2}\Phi_2(X_j)) ] ) \\
\Phi_i(x) = (x-\mu_i)(x-\mu_i)^{'} L ( μ 1 , μ 2 , Σ ) = ( 2 π ) Np /2 ∣Σ ∣ N /2 1 e x p ( − 2 1 t r [ Σ − 1 ( i = 1 ∑ n 1 Φ 1 ( X i ) + j = 1 ∑ n 2 Φ 2 ( X j ))]) Φ i ( x ) = ( x − μ i ) ( x − μ i ) ′
Using MLE, the maximum is:
Σ ^ = 1 n 1 + n 2 ( ∑ i = 1 n 1 Φ 1 ( X i ) + ∑ j = 1 n 2 Φ 2 ( X j ) ) = 1 n 1 + n 2 [ ( n 1 − 1 ) S 1 + ( n 2 − 1 ) S 2 ] = n 1 + n 2 − 1 n 1 + n 2 S p o o l e d \hat \Sigma = \frac{1}{n_1+n_2} (\sum_{i=1}^{n_1} \Phi_1(X_i)+\sum_{j=1}^{n_2}\Phi_2(X_j)) \\
=\frac{1}{n_1+n_2}[(n_1-1)S_1+(n_2-1)S_2] = \frac{n_1+n_2-1}{n_1+n_2}S_{pooled} Σ ^ = n 1 + n 2 1 ( i = 1 ∑ n 1 Φ 1 ( X i ) + j = 1 ∑ n 2 Φ 2 ( X j )) = n 1 + n 2 1 [( n 1 − 1 ) S 1 + ( n 2 − 1 ) S 2 ] = n 1 + n 2 n 1 + n 2 − 1 S p oo l e d
2
(a)
It can be calculated that:
x ˉ = [ 4.64 , 45.4 , 9.965 ] T S = [ 2.879 10.01 − 1.81 10.01 199.788 − 5.64 − 1.81 − 5.64 3.682 ] \bar x=[4.64,45.4,9.965]^{T}\\
S=\left[
\begin{matrix}
2.879 &10.01 &-1.81 \\
10.01&199.788&-5.64 \\
-1.81&-5.64&3.682
\end{matrix}\right] x ˉ = [ 4.64 , 45.4 , 9.965 ] T S = 2.879 10.01 − 1.81 10.01 199.788 − 5.64 − 1.81 − 5.64 3.682
we can calculate S’s eigen value and its respective eigen vector:
v a l u e = [ 200.46209264 , 1.31804876 , 4.56885859 ] e 1 ′ = [ − 0.05084165 , − 0.82194341 , − 0.56729548 ] e 2 ′ = [ − 0.99828327 , 0.02528569 , 0.0528313 ] e 3 ′ = [ 0.02907988 , − 0.56900761 , 0.82181792 ] value = [ 200.46209264, 1.31804876, 4.56885859]\\
e_1^{'} = [-0.05084165, -0.82194341, -0.56729548] \\
e_2^{'} = [-0.99828327, 0.02528569, 0.0528313 ] \\
e_3^{'} = [ 0.02907988, -0.56900761, 0.82181792] \\ v a l u e = [ 200.46209264 , 1.31804876 , 4.56885859 ] e 1 ′ = [ − 0.05084165 , − 0.82194341 , − 0.56729548 ] e 2 ′ = [ − 0.99828327 , 0.02528569 , 0.0528313 ] e 3 ′ = [ 0.02907988 , − 0.56900761 , 0.82181792 ]
The axes of the region are:
λ i p ( n − 1 ) n ( n − p ) F p , n − p ( α ) p ( n − 1 ) n ( n − p ) F p , n − p ( α ) = 19 × 3 17 × 20 × F 3 , 17 ( 0.1 ) = 0.167 × 2.44 = 0.409 \sqrt \lambda_i \sqrt {\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha)} \\
\frac{p(n-1)}{n(n-p)}F_{p,n-p}(\alpha) = \frac{19 \times 3}{17 \times 20} \times F_{3,17}(0.1) = 0.167 \times 2.44 = 0.409 λ i n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n ( n − p ) p ( n − 1 ) F p , n − p ( α ) = 17 × 20 19 × 3 × F 3 , 17 ( 0.1 ) = 0.167 × 2.44 = 0.409
so the axes’ lengths are:
200.46209264 × 0.409 = 9.055 1.31804876 × 0.409 = 0.734 4.56885859 × 0.409 = 1.367 \sqrt {200.46209264} \times \sqrt {0.409} = 9.055 \\
\sqrt {1.31804876} \times \sqrt {0.409} = 0.734 \\
\sqrt {4.56885859} \times \sqrt {0.409} = 1.367 200.46209264 × 0.409 = 9.055 1.31804876 × 0.409 = 0.734 4.56885859 × 0.409 = 1.367
The directions of each aixs is determined by its corresponding eigen vector shown above.
(b)
# read the data
setwd ( '~/Desktop/三春/3多元统计分析/作业/作业3-1,3-2/' )
dat <- read.csv ( "data.csv" )
x1 <- dat $ x1
x2 <- dat $ x2
x3 <- dat $ x3
qqnorm (x1, main = "Normal Probability Plot" , pch = 19 )
qqline (x1)
qqnorm (x2, main = "Normal Probability Plot" , pch = 19 )
qqline (x2)
qqnorm (x3, main = "Normal Probability Plot" , pch = 19 )
qqline (x3)
plot (x1,x2)
plot (x1,x3)
plot (x2,x3)
It seems that each variable’s normality is fine and they don’t have a significant relationship with each other, so the multivariate normal assumption seems justied.
3
T^2 = \sqrt n (\bar X -\mu_0)^{'}{(\frac{\sum_{j=1}^{n}(X_j-\bar X)(X_j-\bar X)^{'}}{n-1})^{-1}} \sqrt n (\bar X -\mu_0)\\
\text{So it can be calculated that }T^2 = 9.74 \\
\frac{p(n-1)}{(n-p)}F_{p,n-p}(\alpha) = \frac{19 \times 3}{17} \times F_{3,17}(0.05) = \frac{19 \times 3}{17} \times 3.20 = 10.729\
the confidence region is defined as :
( x ˉ i − p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s i i n , x ˉ i + p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s i i n ) F p , n − p ( α ) = F 3 , 17 ( 0.05 ) = 3.20 , x ˉ 1 = 4.64 , x ˉ 2 = 45.4 , x ˉ 3 = 9.965 s 11 = 2.879 , s 22 = 199.788 , s 33 = 3.628 (\bar x_i - \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{ii}}{n}},\bar x_i + \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{ii}}{n}}) \\
F_{p,n-p}(\alpha) = F_{3,17}(0.05) =3.20, \ \bar x_1 = 4.64, \ \bar x_2 = 45.4,\ \bar x_3 =9.965 \\
s_{11} = 2.879, \ s_{22} = 199.788, \ s_{33} = 3.628 ( x ˉ i − n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s ii , x ˉ i + n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s ii ) F p , n − p ( α ) = F 3 , 17 ( 0.05 ) = 3.20 , x ˉ 1 = 4.64 , x ˉ 2 = 45.4 , x ˉ 3 = 9.965 s 11 = 2.879 , s 22 = 199.788 , s 33 = 3.628
so the three regions are:
( x ˉ 1 − p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 11 n , x ˉ 1 + p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 11 n ) = [ 3.397 , 5.882 ] ( x ˉ 2 − p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 22 n , x ˉ 2 + p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 22 n ) = [ 35.047 , 55.752 ] ( x ˉ 3 − p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 22 n , x ˉ 3 + p ( n − 1 ) n ( n − p ) F p , n − p ( α ) s 33 n ) = [ 8.569 , 11.360 ] (\bar x_1 - \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{11}}{n}},\bar x_1 + \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{11}}{n}}) \\
= [3.397, 5.882]\\
(\bar x_2 - \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{22}}{n}},\bar x_2 + \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{22}}{n}}) \\
= [35.047, 55.752]\\
(\bar x_3 - \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{22}}{n}},\bar x_3 + \sqrt {\frac{p(n-1)}{n(n-p)} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{33}}{n}}) \\
= [8.569, 11.360]\\ ( x ˉ 1 − n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 11 , x ˉ 1 + n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 11 ) = [ 3.397 , 5.882 ] ( x ˉ 2 − n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 22 , x ˉ 2 + n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 22 ) = [ 35.047 , 55.752 ] ( x ˉ 3 − n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 22 , x ˉ 3 + n ( n − p ) p ( n − 1 ) F p , n − p ( α ) n s 33 ) = [ 8.569 , 11.360 ]
(b)
The Bonferroni region is defined as :
[ x ˉ p − t n − 1 ( α 2 p ) s p p n , x ˉ p + t n − 1 ( α 2 p ) s p p n ] t 19 ( 0.0083 ) = 2.625106 [\bar x_p -t_{n-1}(\frac{\alpha}{2p})\sqrt{\frac{s_{pp}}{n}}, \ \bar x_p +t_{n-1}(\frac{\alpha}{2p})\sqrt{\frac{s_{pp}}{n}}] \\
t_{19}(0.0083) = 2.625106 [ x ˉ p − t n − 1 ( 2 p α ) n s pp , x ˉ p + t n − 1 ( 2 p α ) n s pp ] t 19 ( 0.0083 ) = 2.625106
so the Bonferroni regions are:
[ 3.644 , 5.635 ] [ 37.103 , 53.696 ] [ 8.846 , 11.083 ] [3.644, 5.635] \\
[37.103, 53.696] \\
[8.846, 11.083] [ 3.644 , 5.635 ] [ 37.103 , 53.696 ] [ 8.846 , 11.083 ]
which are smaller than T 2 T^2 T 2 confidence region because it focus on single confidence interval.