James Chen · school work
Linear Regression Assignment 7
This is the seventh assignment for Linear Regression. It is written in R Markdown, a syntax supported by RStudio that handles formulas, plot rendering, and embedded code execution.
Recommended: click here for the HTML version of the assignment, where you can see the code as well as the plots.
You can also find the PDF version of this assignment on GitHub. Or, if you can get past the firewall, just see below:
1
a
# read the data
setwd('~/Desktop/三春/5线性回归分析/作业/HW7/')
Data<-read.table("hw7.txt")
names(Data) = c("Hours","Cases","Costs","Holiday")
Fit = lm(Hours~Cases+Costs+Holiday, data=Data)
anova(Fit)
SSTO = sum( anova(Fit)[,2] )
MSE = anova(Fit)[4,3]
SSR = sum( anova(Fit)[1:3,2] )
MSR = SSR / 3
SSE = anova(Fit)[4,2]
From the table we have:
Fit2 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit2)
$SSR(X_2|X_1,X_3) = SSE(X_1,X_3) - SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674$
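The subtraction above is a difference of two error sums of squares, one from the model without Costs and one from the full model. The following is a minimal sketch, on simulated stand-in data (the `Sim` data frame and its coefficients are hypothetical, not the `hw7.txt` values), showing that `anova(reduced, full)` reports the same extra sum of squares as subtracting the two SSE values by hand.

```r
# Simulated stand-in data; NOT the homework data set
set.seed(1)
n <- 52
Sim <- data.frame(
  Cases   = rnorm(n, mean = 300000, sd = 50000),
  Costs   = rnorm(n, mean = 7, sd = 1),
  Holiday = rep(c(0, 0, 0, 1), length.out = n)
)
Sim$Hours <- 4000 + 0.001 * Sim$Cases - 10 * Sim$Costs +
  600 * Sim$Holiday + rnorm(n, sd = 150)

Full    <- lm(Hours ~ Cases + Costs + Holiday, data = Sim)
Reduced <- lm(Hours ~ Cases + Holiday, data = Sim)

# Route 1: SSE(reduced) - SSE(full); deviance() of an lm fit is its SSE
ssr_by_hand <- deviance(Reduced) - deviance(Full)

# Route 2: the "Sum of Sq" column of the model-comparison table
ssr_by_anova <- anova(Reduced, Full)[2, "Sum of Sq"]

print(c(by_hand = ssr_by_hand, by_anova = ssr_by_anova))
```

Both routes give the extra sum of squares for Costs given Cases and Holiday, so either can be used to fill in the last row of the decomposition.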
b
From a we have:
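As a sketch of how the partial F test for dropping $X_2$ could be written out with the numbers quoted in part a: this assumes $n = 52$ (inferred from $n - 6 = 46$ in question 4) and a significance level of 0.05 (the level used in question 4), neither of which is stated for this part.

```latex
\begin{aligned}
F^* &= \frac{SSR(X_2|X_1,X_3)/1}{SSE(X_1,X_2,X_3)/(n-4)} \\
    &= \frac{6674/1}{985530/48} \\
    &\approx 0.3251
\end{aligned}
```

The critical value $F(0.95; 1, 48)$ can be obtained with `qf(0.95, 1, 48)` and is roughly 4.04, so under these assumptions $F^*$ falls well below it and $X_2$ could be dropped from a model that already contains $X_1$ and $X_3$.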
c
Fit2 = lm(Hours~Cases+Costs, data=Data)
anova(Fit2)
Fit3 = lm(Hours~Costs+Cases, data=Data)
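anova(Fit3)
So we have $SSR(X_1) + SSR(X_2|X_1) = SSR(X_2) + SSR(X_1|X_2)$.
Yes, this is always true, because both sums equal $SSR(X_1,X_2)$, the regression sum of squares of the model that contains both predictors, regardless of the order in which they enter.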
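A quick numerical cross-check is possible with the figures quoted in question 2. It assumes they are read as $SSE(X_1) = 3025770$, $SSR(X_2|X_1) = 5726$, $SSE(X_2) = 3150741$ and $SSR(X_1|X_2) = 130697$; that labelling is an inference from the fitted models, not something the text states.

```latex
\begin{aligned}
SSE(X_1,X_2) &= SSE(X_1) - SSR(X_2|X_1) = 3025770 - 5726 = 3020044 \\
SSE(X_1,X_2) &= SSE(X_2) - SSR(X_1|X_2) = 3150741 - 130697 = 3020044
\end{aligned}
```

Both orders leave the same error sum of squares, so the regression sum of squares $SSTO - SSE(X_1,X_2)$ is the same whichever predictor enters first.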
2
From question 1, we have So From homework 6 we have , so
Fit4 = lm(Hours~Costs, data=Data)
anova(Fit4)
Fit5 = lm(Hours~Cases, data=Data)
anova(Fit5)
$R^2_{Y1|2} = SSR(X_1|X_2)/SSE(X_2) = 130697/3150741 = 0.04148$
$R^2_{Y2|1} = SSR(X_2|X_1)/SSE(X_1) = 5726/3025770 = 0.001892$
Fit6 = lm(Hours~Cases+Holiday, data=Data)
anova(Fit6)
$SSR(X_2|X_1,X_3) = SSE(X_1,X_3) - SSE(X_1,X_2,X_3) = 992204 - 985530 = 6674$
$SSE(X_1,X_3) = 992204$, so $R^2_{Y2|13} = 6674/992204 = 0.006726$
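The ratios above all divide an extra sum of squares by the error sum of squares of the smaller model, which is the form of a coefficient of partial determination. The sketch below re-checks the arithmetic from the quoted numbers. `partial_r2` is a hypothetical helper introduced here for illustration, and reading the three ratios as $R^2_{Y1|2}$, $R^2_{Y2|1}$ and $R^2_{Y2|13}$ is an assumption based on which models were fitted.

```r
# Hypothetical helper: coefficient of partial determination computed from
# the SSE of the reduced model and the SSE of the model with the extra term
partial_r2 <- function(sse_reduced, sse_full) {
  (sse_reduced - sse_full) / sse_reduced
}

# Costs given Cases and Holiday: SSE values quoted in the text
r2_y2_13 <- partial_r2(sse_reduced = 992204, sse_full = 985530)

# The other two ratios, using the extra sums of squares quoted in the text
r2_y1_2 <- 130697 / 3150741
r2_y2_1 <- 5726 / 3025770

print(round(c(r2_y1_2 = r2_y1_2, r2_y2_1 = r2_y2_1, r2_y2_13 = r2_y2_13), 6))
```

All three values are small, meaning each predictor explains little of the variation left over by the predictor(s) already in the model.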
3
a
Fit = lm(Hours~Cases, data=Data)
summary(Fit)
So the regression function is
b
The regression function in 6.10a is
The coefficient of Cases in this model is larger than the corresponding coefficient in 6.10a.
c
No. From question 1, $SSR(X_1) \neq SSR(X_1|X_2)$, but the difference is not substantial.
d
The correlation of is the highest among all the predictors, so the do not differ substantially.
4
a
To run a polynomial regression model on one or more predictor variables, it is advisable to first center the variables by subtracting the corresponding mean of each, in order to reduce the intercorrelation among the variables.
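To illustrate why centering helps, here is a minimal sketch on a simulated predictor (the variable `x` and its range are hypothetical, not taken from the assignment data). It compares the correlation between a predictor and its square before and after subtracting the mean.

```r
# Simulated positive-valued predictor; NOT the homework data
set.seed(42)
x <- runif(200, min = 10, max = 20)

# Without centering, x and x^2 move almost in lockstep
cor_raw <- cor(x, x^2)

# After centering, the linear and quadratic terms are far less correlated
xc <- x - mean(x)
cor_centered <- cor(xc, xc^2)

print(c(raw = cor_raw, centered = cor_centered))
```

The same idea applies to the centered terms built below from Cases and Holiday.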
x1 <- Data$Cases - mean(Data$Cases)
x3 <- Data$Holiday - mean(Data$Holiday)
x1sq <- x1^2
x3sq <- x3^2
x1x3 <- x1 * x3
Grocery <- cbind( Data, x1, x3, x1sq, x3sq, x1x3 )
Poly <- lm( Hours ~ x1 + x3 + x1sq + x3sq + x1x3, data=Grocery )
summary(Poly)
So the model is
b
anova(Poly)
Fit7 <- lm( Hours ~ x1 + x3, data=Grocery )
anova(Fit7)
qf(0.95,3,46)
$$
H_0: \beta_3 = \beta_4 = \beta_5 = 0, \qquad H_a: \text{not all } \beta_k \ (k = 3, 4, 5) \text{ equal } 0
$$
$$
\begin{aligned}
F^* &= \frac{SSR(X_1^2,X_3^2,X_1X_3|X_1,X_3)/3}{SSE(X_1,X_3,X_1^2,X_3^2,X_1X_3)/(n-6)}\\
&= \frac{[SSE(X_1,X_3)-SSE(X_1,X_3,X_1^2,X_3^2,X_1X_3)]/3}{991173/46}\\
&= \frac{(992204-991173)/3}{991173/46}\\
&= 0.01594945
\end{aligned}
$$
$F(0.95;3,46) = 2.806845$, so $F^* < F(0.95;3,46)$: do not reject $H_0$.
pf(0.01594945, 3, 46, lower.tail = FALSE)
`pf` returns the lower-tail probability by default (0.002785933 here), so the p-value of the test is the upper tail: p-value = 1 - 0.002785933 = 0.9972
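The test statistic can be re-checked from the two error sums of squares alone. The sketch below wraps the arithmetic in a hypothetical helper, `general_linear_test`, introduced here for illustration. It asks `pf` for the upper tail explicitly, since the p-value of an F test is the probability of exceeding $F^*$ and `pf` gives the lower tail by default.

```r
# Hypothetical helper for a general linear (extra sum of squares) F test
general_linear_test <- function(sse_reduced, sse_full, df_extra, df_full,
                                alpha = 0.05) {
  f_star <- ((sse_reduced - sse_full) / df_extra) / (sse_full / df_full)
  list(
    f_star  = f_star,
    f_crit  = qf(1 - alpha, df_extra, df_full),
    # Upper-tail probability: P(F > f_star)
    p_value = pf(f_star, df_extra, df_full, lower.tail = FALSE)
  )
}

# SSE values and degrees of freedom quoted in the text
res <- general_linear_test(sse_reduced = 992204, sse_full = 991173,
                           df_extra = 3, df_full = 46)
print(res)
```

With the fitted models at hand, `anova(Fit7, Poly)` should report the same F statistic and p-value in its comparison table.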