WonderLand

Linear Regression Assignment 7

This is the seventh assignment for Linear Regression. It is written in R Markdown, a syntax supported by RStudio that helps with typesetting formulas, displaying plots, and running embedded code.

Most recommended: click here for the HTML version of the assignment, where you can see the code as well as the plots.

You can also find the PDF version of this assignment on GitHub. Or, if you can get past the firewall, just see below:

1

a

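# read the data
setwd('~/Desktop/三春/5线性回归分析/作业/HW7/')
Data <- read.table("hw7.txt")
names(Data) = c("Hours","Cases","Costs","Holiday")
# full model: Y = Hours, X1 = Cases, X2 = Costs, X3 = Holiday
Fit = lm(Hours~Cases+Costs+Holiday, data=Data)
anova(Fit)
SSTO = sum( anova(Fit)[,2] )     # total sum of squares: all rows of "Sum Sq"
MSE = anova(Fit)[4,3]            # "Mean Sq" of the Residuals row
SSR = sum( anova(Fit)[1:3,2] )   # regression sum of squares: the three predictor rows
MSR = SSR / 3                    # three predictors, so 3 degrees of freedom
SSE = anova(Fit)[4,2]            # "Sum Sq" of the Residuals row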

From the table we have: $SSR(X_1) = 136366$, $SSE(X_1,X_2,X_3) = 985530$

Fit2 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit2)

$SSR(X_3|X_1) = 2033565$

$SSR(X_2|X_1,X_3) = SSE(X_1,X_3) - SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674$
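The subtraction above can also be read directly from R's ANOVA tables. The following is a minimal sketch on simulated data (the data frame `sim` and its coefficients are made up for illustration and are not the `hw7.txt` data). It checks that the extra sum of squares equals both the `Sum of Sq` entry of `anova(reduced, full)` and the sequential sum of squares of the predictor entered last.

```r
# Simulated stand-in for the assignment data (hypothetical values)
set.seed(1)
n <- 52
sim <- data.frame(
  Cases   = rnorm(n, 300000, 50000),
  Costs   = rnorm(n, 7, 1),
  Holiday = rep(c(0, 0, 0, 1), length.out = n)
)
sim$Hours <- 4000 + 0.0008 * sim$Cases - 10 * sim$Costs +
  600 * sim$Holiday + rnorm(n, 0, 150)

# Full model with Costs (X2) entered last, and the reduced model without it
full    <- lm(Hours ~ Cases + Holiday + Costs, data = sim)
reduced <- lm(Hours ~ Cases + Holiday, data = sim)

# SSR(X2 | X1, X3) = SSE(X1, X3) - SSE(X1, X2, X3)
extra_ss <- deviance(reduced) - deviance(full)

# The same quantity from the model-comparison table ...
cmp_ss <- anova(reduced, full)[2, "Sum of Sq"]
# ... and from the sequential ANOVA table of the full model
seq_ss <- anova(full)["Costs", "Sum Sq"]

c(extra_ss = extra_ss, cmp_ss = cmp_ss, seq_ss = seq_ss)
```

`deviance()` returns the residual sum of squares of an `lm` fit, so no indexing into the ANOVA table by row number is needed.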

b

$H_0: \beta_2 = 0, \quad H_a: \beta_2 \neq 0.$

From a we have:

$$SSR(X_2|X_1,X_3) = 6674, \quad SSE(X_1,X_2,X_3) = 985530$$

$$F^* = \frac{6674/1}{985530/48} = 0.32491$$

$$F(0.95,1,48) = 4.04265$$

If $F^* \leqslant 4.04265$, conclude $H_0$; otherwise conclude $H_a$. Since $F^* = 0.32491 \leqslant 4.04265$, we conclude $H_0$.

P-value $= 0.5713$
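The test in this part can be reproduced with a few lines of R. This is a sketch that plugs in the rounded sums of squares quoted above, so the last digits of the statistic may differ slightly from the values obtained from the unrounded ANOVA output. The variable names are illustrative.

```r
# Rounded values quoted in the text
ssr_x2_given_x1x3 <- 6674     # SSR(X2 | X1, X3)
sse_full          <- 985530   # SSE(X1, X2, X3)
df_num <- 1                   # one coefficient tested
df_den <- 48                  # error degrees of freedom used in the text

# Partial F statistic, critical value at alpha = 0.05, and p-value
f_star  <- (ssr_x2_given_x1x3 / df_num) / (sse_full / df_den)
f_crit  <- qf(0.95, df_num, df_den)
p_value <- pf(f_star, df_num, df_den, lower.tail = FALSE)  # upper tail

c(f_star = f_star, f_crit = f_crit, p_value = p_value)
```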

c

Fit2 = lm(Hours~Cases+Costs, data=Data)
anova(Fit2)
Fit3 = lm(Hours~Costs+Cases, data=Data)
anova(Fit3)

So we have $SSR(X_1) + SSR(X_2|X_1) = 136366 + 5726 = 142092 = 11395 + 130697 = SSR(X_2) + SSR(X_1|X_2)$

Yes, it is always true, because both sides equal $SSTO - SSE(X_1,X_2)$:

$SSR(X_2|X_1) + SSR(X_1) = SSE(X_1) - SSE(X_1,X_2) + SSR(X_1) = SSTO - SSE(X_1,X_2)$

$SSR(X_1|X_2) + SSR(X_2) = SSE(X_2) - SSE(X_1,X_2) + SSR(X_2) = SSTO - SSE(X_1,X_2)$
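As a quick arithmetic check of the identity, the reported sums of squares can be plugged in directly. This is only a sketch using the rounded values quoted in this assignment; the variable names are illustrative.

```r
# Rounded values quoted in the text
ssto            <- 3162136
ssr_x1          <- 136366   # SSR(X1)
ssr_x2_given_x1 <- 5726     # SSR(X2 | X1)
ssr_x2          <- 11395    # SSR(X2)
ssr_x1_given_x2 <- 130697   # SSR(X1 | X2)

# Both decompositions give SSR(X1, X2), whatever the order of entry
ssr_order_12 <- ssr_x1 + ssr_x2_given_x1
ssr_order_21 <- ssr_x2 + ssr_x1_given_x2

# and therefore the same SSE(X1, X2) = SSTO - SSR(X1, X2)
sse_x1x2 <- ssto - ssr_order_12

c(ssr_order_12 = ssr_order_12, ssr_order_21 = ssr_order_21, sse_x1x2 = sse_x1x2)
```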

2

From question 1, we have $SSR(X_1) = 136366$, $SSR(X_2) = 11395$, $SSR = 2176606$, $SSTO = 3162136$.

So $R^2_{Y1} = 0.0431$, $R^2_{Y2} = 0.0036$, $R^2 = 0.6883$.

From homework 6 we have $r_{12} = 0.10059216$, so $R^2_{12} = 0.0101$.

Fit4 = lm(Hours~Costs, data=Data)
anova(Fit4)
Fit5 = lm(Hours~Cases, data=Data)
anova(Fit5)

$R^2_{Y1|2} = \frac{SSR(X_1|X_2)}{SSE(X_2)} = 130697/3150741 = 0.04148$

$R^2_{Y2|1} = \frac{SSR(X_2|X_1)}{SSE(X_1)} = 5726/3025770 = 0.001892$

Fit6 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit6)

$R^2_{Y2|13} = \frac{SSR(X_2|X_1,X_3)}{SSE(X_1,X_3)}$

$SSR(X_2|X_1,X_3) = SSE(X_1,X_3) - SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674$

$SSE(X_1,X_3) = 992204$, so $R^2_{Y2|13} = 6674/992204 = 0.006726$
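All three coefficients of partial determination follow the same pattern: an extra sum of squares divided by the SSE of the model without the added predictor. A minimal sketch, assuming the rounded sums of squares quoted in this assignment (the helper `partial_r2` is a hypothetical name introduced here for illustration):

```r
# Rounded values quoted in the text
ssr_x1_given_x2 <- 130697    # SSR(X1 | X2)
sse_x2          <- 3150741   # SSE(X2)
ssr_x2_given_x1 <- 5726      # SSR(X2 | X1)
sse_x1          <- 3025770   # SSE(X1)
sse_x1x3        <- 992204    # SSE(X1, X3)
sse_x1x2x3      <- 985530    # SSE(X1, X2, X3)

# Hypothetical helper: extra sum of squares over the reduced model's SSE
partial_r2 <- function(extra_ss, sse_reduced) extra_ss / sse_reduced

r2_y1_2  <- partial_r2(ssr_x1_given_x2, sse_x2)
r2_y2_1  <- partial_r2(ssr_x2_given_x1, sse_x1)
r2_y2_13 <- partial_r2(sse_x1x3 - sse_x1x2x3, sse_x1x3)

round(c(r2_y1_2 = r2_y1_2, r2_y2_1 = r2_y2_1, r2_y2_13 = r2_y2_13), 6)
```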

3

a

Fit = lm(Hours~Cases, data=Data)
summary(Fit)

So the regression function is $\hat Y = 4080 + 0.0009355X_1$

b

The regression function in 6.10a is $\hat Y = 0.0007871X_1 - 13.17X_2 + 623.6X_3 + 4150$

The estimated coefficient of $X_1$ here ($0.0009355$) is larger than the corresponding coefficient in 6.10a ($0.0007871$).

c

No. From question 1, $SSR(X_1) = 136366$ and $SSR(X_1|X_2) = 130697$. The difference is not substantial.

d

Although the correlation between $X_1$ and $X_2$ is the highest among the predictors, it is still small ($r_{12} = 0.1006$, from homework 6), so $SSR(X_1)$ and $SSR(X_1|X_2)$ do not differ substantially.

4

a

To run a polynomial regression model on one or more predictor variables, it is advisable to first center the variables by subtracting the corresponding mean of each, in order to reduce the intercorrelation among the variables.

x1 <- Data$Cases - mean(Data$Cases)
x3 <- Data$Holiday - mean(Data$Holiday)
x1sq <- x1^2
x3sq <- x3^2
x1x3 <- x1 * x3
Grocery <- cbind( Data, x1, x3, x1sq, x3sq, x1x3 )
Poly <- lm(  Hours ~ x1 + x3 + x1sq + x3sq + x1x3, data=Grocery )
summary(Poly)

So the model is $\hat Y = 4367 + 8.61 \times 10^{-4} X_1 + 623.7 X_3 - 1.154 \times 10^{-9} X_1^2 - 8.87 \times 10^{-5} X_1 X_3$
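To see why centering helps, compare the correlation between a predictor and its square before and after subtracting the mean. The sketch below uses a made-up, evenly spaced predictor `x` (not the assignment data), for which the effect is especially clean. With real data the correlation typically drops sharply but need not vanish.

```r
# Hypothetical predictor values, all positive and far from zero
x  <- 10:20
xc <- x - mean(x)              # centered predictor

cor_raw      <- cor(x, x^2)    # x and x^2 are almost collinear
cor_centered <- cor(xc, xc^2)  # symmetric about 0, so essentially uncorrelated

c(cor_raw = cor_raw, cor_centered = cor_centered)
```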

b

anova(Poly)
Fit7 <-lm(  Hours ~ x1 + x3, data=Grocery )
anova(Fit7)
qf(0.95,3,46)
$$H_0: \beta_3 = \beta_4 = \beta_5 = 0, \qquad H_a: \text{not all } \beta_k \text{ in } H_0 \text{ equal } 0$$

$$\begin{aligned}
F^* &= \frac{SSR(X_1^2,X_3^2,X_1X_3|X_1,X_3)/3}{SSE(X_1^2,X_3^2,X_1X_3,X_1,X_3)/(n-6)} \\
&= \frac{(SSE(X_1,X_3)-SSE(X_1^2,X_3^2,X_1X_3,X_1,X_3))/3}{991173/46} \\
&= \frac{(992204-991173)/3}{991173/46} \\
&= 0.01594945
\end{aligned}$$

$F(0.95,3,46) = 2.806845$, so $F^* < F(0.95,3,46)$: do not reject $H_0$.
# the p-value is the upper-tail probability P(F > F*)
pf(0.01594945,3,46,lower.tail=FALSE)

p-value = 0.9972141
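The whole general linear test can be re-computed from the two reported SSE values. This is a minimal sketch, assuming the rounded values and the degrees of freedom quoted in this part; the variable names are illustrative. Because the p-value of an F test is the upper-tail probability $P(F > F^*)$, `pf` is called with `lower.tail = FALSE`; the default returns the lower tail.

```r
# Rounded values quoted in the text
sse_reduced <- 992204   # SSE(X1, X3)
sse_full    <- 991173   # SSE of the second-order model
df_num <- 3             # number of coefficients tested, as in the text
df_den <- 46            # error degrees of freedom, as in the text

f_star  <- ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
f_crit  <- qf(0.95, df_num, df_den)
p_value <- pf(f_star, df_num, df_den, lower.tail = FALSE)  # upper tail

c(f_star = f_star, f_crit = f_crit, p_value = p_value)
```

A large p-value here agrees with the decision rule: $F^*$ is far below the critical value, so $H_0$ is not rejected.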

