The seventh assignment of Linear Regression. The assignment is written in R Markdown, a convenient syntax supported by RStudio that handles formulas, plot visualization, and embedded runnable code.
Most recommended: click here for the HTML version of the assignment, where you can see the code as well as the plots.
You may also find the PDF version of this assignment on GitHub. Or, if you can get past the firewall, just read on below:
1
a
# read the data
setwd('~/Desktop/三春/5线性回归分析/作业/HW7/')
Data <- read.table("hw7.txt")
names(Data) <- c("Hours", "Cases", "Costs", "Holiday")
# full model: Y = Hours, X1 = Cases, X2 = Costs, X3 = Holiday
Fit <- lm(Hours ~ Cases + Costs + Holiday, data = Data)
anova(Fit)
# pull the sums of squares from the sequential ANOVA table
# (rows 1-3 are Cases, Costs, Holiday; row 4 is Residuals)
SSTO <- sum(anova(Fit)[, 2])
MSE  <- anova(Fit)[4, 3]
SSR  <- sum(anova(Fit)[1:3, 2])
MSR  <- SSR / 3
SSE  <- anova(Fit)[4, 2]
From the table we have $SSR(X_1)=136366$ and $SSE(X_1,X_2,X_3)=985530$.
Fit2 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit2)
$SSR(X_3|X_1)=2033565$
$SSR(X_2|X_1,X_3)=SSE(X_1,X_3)-SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674$
b
$H_0: \beta_2=0$, $H_a: \beta_2 \neq 0$.
From a we have:
$$SSR(X_2|X_1,X_3)=6674,\quad SSE(X_1,X_2,X_3)=985530\\
F^* = \frac{SSR(X_2|X_1,X_3)/1}{SSE(X_1,X_2,X_3)/48} = \frac{6674/1}{985530/48} = 0.32491\\
F(0.95,1,48)=4.04265\\
\text{If } F^* \leq 4.04265 \text{ conclude } H_0 \text{, otherwise conclude } H_a.\\
\text{Here } F^* = 0.32491 \leq 4.04265 \text{, so we conclude } H_0.\\
P\text{-value}=0.5713$$
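As a cross-check, the same extra sum of squares and F test can be obtained by comparing the reduced fit (Fit2, which drops Costs) against the full fit; a minimal sketch using the objects already defined above:
# reduced model (Cases, Holiday) vs full model (Cases, Costs, Holiday):
# the "Sum of Sq" column is SSR(X2 | X1, X3), and the F and Pr(>F) columns match the test above
anova(Fit2, Fit)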
c
Fit2 = lm(Hours~Cases+Costs, data=Data)
anova(Fit2)
Fit3 = lm(Hours~Costs+Cases, data=Data)
anova(Fit3)
So we have
$$SSR(X_1)+SSR(X_2|X_1)=136366+5726=142092=11395+130697=SSR(X_2)+SSR(X_1|X_2)$$
Yes, it is always true, because
$$SSR(X_2|X_1)+SSR(X_1)=SSE(X_1)-SSE(X_1,X_2)+SSR(X_1)=SSTO-SSE(X_1,X_2)\\
SSR(X_1|X_2)+SSR(X_2)=SSE(X_2)-SSE(X_1,X_2)+SSR(X_2)=SSTO-SSE(X_1,X_2)$$
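A quick numerical check of this identity from the two sequential ANOVA tables (a sketch reusing Fit2 and Fit3 from above):
# both orderings give the same SSR(X1, X2) = SSTO - SSE(X1, X2)
ss12 <- anova(Fit2)[, "Sum Sq"]   # rows: Cases, Costs, Residuals
ss21 <- anova(Fit3)[, "Sum Sq"]   # rows: Costs, Cases, Residuals
all.equal(sum(ss12[1:2]), sum(ss21[1:2]))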
2
From question 1 we have $SSR(X_1)=136366$, $SSR(X_2)=11395$, $SSR=2176606$, $SSTO=3162136$.
So $R^2_{Y1}=136366/3162136=0.0431$, $R^2_{Y2}=11395/3162136=0.0036$, and $R^2=2176606/3162136=0.6883$.
From homework 6 we have $r_{12}=0.10059216$, so $R^2_{12}=r_{12}^2=0.0101$.
Fit4 = lm(Hours~Costs, data=Data)
anova(Fit4)
Fit5 = lm(Hours~Cases, data=Data)
anova(Fit5)
$$R^2_{Y1|2}=\frac{SSR(X_1|X_2)}{SSE(X_2)} = \frac{130697}{3150741} = 0.04148\\
R^2_{Y2|1}=\frac{SSR(X_2|X_1)}{SSE(X_1)} = \frac{5726}{3025770} = 0.001892$$
Fit6 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit6)
$$R^2_{Y2|13}=\frac{SSR(X_2|X_1,X_3)}{SSE(X_1,X_3)}\\
SSR(X_2|X_1,X_3)=SSE(X_1,X_3)-SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674\\
SSE(X_1,X_3) = 992204 \text{, so } R^2_{Y2|13} = 6674/992204 = 0.006726$$
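These coefficients of partial determination can also be computed directly from a reduced fit and a full fit. The helper below is a hypothetical convenience function (not part of the original assignment), sketched only to show the reduced-versus-full SSE pattern:
# hypothetical helper: coefficient of partial determination from a reduced and a full fit
partial_R2 <- function(reduced, full) {
  sse_reduced <- tail(anova(reduced)[, "Sum Sq"], 1)  # residual SS of the reduced model
  sse_full    <- tail(anova(full)[, "Sum Sq"], 1)     # residual SS of the full model
  (sse_reduced - sse_full) / sse_reduced
}
partial_R2(Fit4, Fit2)   # R^2_{Y1|2}:  Hours ~ Costs            vs  Hours ~ Cases + Costs
partial_R2(Fit5, Fit2)   # R^2_{Y2|1}:  Hours ~ Cases            vs  Hours ~ Cases + Costs
partial_R2(Fit6, Fit)    # R^2_{Y2|13}: Hours ~ Cases + Holiday  vs  the full model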
3
a
Fit = lm(Hours~Cases, data=Data)
summary(Fit)
So the regression function is $\hat{Y}=4080+0.0009355X_1$.
b
The regression function in 6.10a is $\hat{Y}=4150+0.0007871X_1-13.17X_2+623.6X_3$.
The estimated coefficient of $X_1$ here ($0.0009355$) is larger than the corresponding coefficient in 6.10a ($0.0007871$).
c
No. From question 1, $SSR(X_1)=136366$ and $SSR(X_1|X_2)=130697$; the difference is not substantial.
d
The correlation between $X_1$ and $X_2$ is small ($r_{12}=0.1006$ from homework 6), so $SSR(X_1)$ and $SSR(X_1|X_2)$ do not differ substantially.
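The pairwise correlations among the predictors can be checked directly; a quick sketch:
# correlation matrix of the predictors; the Cases-Costs entry is the r12 = 0.1006 quoted above
cor(Data[, c("Cases", "Costs", "Holiday")])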
4
a
To run a polynomial regression model on one or more predictor variables, it is advisable to first center the variables by subtracting the corresponding mean of each, in order to reduce the intercorrelation among the variables.
# center the predictors, then form the second-order terms
x1 <- Data$Cases - mean(Data$Cases)
x3 <- Data$Holiday - mean(Data$Holiday)
x1sq <- x1^2
x3sq <- x3^2
x1x3 <- x1 * x3
# fit the second-order model on the centered variables
Grocery <- cbind(Data, x1, x3, x1sq, x3sq, x1x3)
Poly <- lm(Hours ~ x1 + x3 + x1sq + x3sq + x1x3, data = Grocery)
summary(Poly)
So the fitted model is $\hat{Y}=4367+8.61\times10^{-4}x_1+623.7x_3-1.154\times10^{-9}x_1^2-8.87\times10^{-5}x_1x_3$, where $x_1$ and $x_3$ are the centered variables. (Because Holiday is a 0/1 indicator, its centered square $x_3^2$ is an exact linear function of $x_3$, so lm() reports NA for that coefficient and it drops out of the fitted equation.)
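As a quick check on the dropped term, alias() reports the exact linear dependence (a sketch):
# x3sq is an exact linear combination of the intercept and x3, so its coefficient is NA
alias(Poly)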
b
anova(Poly)
Fit7 <-lm( Hours ~ x1 + x3, data=Grocery )
anova(Fit7)
qf(0.95,3,46)
$$H_0: \beta_3=\beta_4=\beta_5=0,\quad H_a: \text{not all } \beta_k \text{ in } H_0 \text{ equal } 0\\
F^* = \frac{SSR(X_1^2,X_3^2,X_1X_3|X_1,X_3)/3}{SSE(X_1^2,X_3^2,X_1X_3,X_1,X_3)/(n-6)}\\
=\frac{(SSE(X_1,X_3)-SSE(X_1^2,X_3^2,X_1X_3,X_1,X_3))/3}{991173/46}\\
=\frac{(992204-991173)/3}{991173/46}\\
=0.01594945\\
F(0.95,3,46) = 2.806845 \text{, so } F^* < F(0.95,3,46) \text{; do not reject } H_0.$$
1 - pf(0.01594945, 3, 46)
p-value = 0.9972141
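The same extra-sum-of-squares test can also be run by comparing the reduced and full fits directly. Note that R bases this comparison on the degrees of freedom it actually fits, which may differ from the hand count above if any term is aliased; a minimal sketch:
# reduced first-order model (x1, x3) vs the full second-order model
anova(Fit7, Poly)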