This is the seventh assignment for Linear Regression. It is written in R Markdown, a syntax supported by RStudio that handles formulas, plot output, and embedded code execution.
Recommended: the HTML version of this assignment, which shows the code together with the plots.
A PDF version of this assignment is also available on GitHub. If you can get past the firewall, you can view it below:
1
a
`# read the data`
From the table we have: $SSR(X_1) = 136366$, $SSE(X_1,X_2,X_3) = 985530$
`Fit2 = lm(Hours~Cases+Holiday, data=Data)`
$SSR(X_3|X_1) = 2033565$
$SSR(X_2|X_1,X_3) = SSE(X_1,X_3)-SSE(X_1,X_2,X_3)$ = 992204 - 985530 = 6674
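The link between these extra sums of squares and the rows printed by `anova()` can be checked on a small example. This is only a sketch on simulated data: the data frame `sim` and its columns are made up for illustration and are not the grocery data. It assumes the terms enter the model in the order X1, X3, X2.

```r
# Simulated stand-in data; NOT the grocery data used in the assignment.
set.seed(1)
n <- 52
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.2))
sim$y <- 4 + 0.8 * sim$x1 - 0.1 * sim$x2 + 6 * sim$x3 + rnorm(n)

# anova() on a single lm reports sequential sums of squares,
# so the order of the terms in the formula matters.
fit_full <- lm(y ~ x1 + x3 + x2, data = sim)
tab <- anova(fit_full)
seq_ss <- setNames(tab[["Sum Sq"]], rownames(tab))

# SSR(X2 | X1, X3) computed as a difference of error sums of squares.
sse_reduced <- deviance(lm(y ~ x1 + x3, data = sim))  # SSE(X1, X3)
sse_full <- deviance(fit_full)                        # SSE(X1, X2, X3)
extra_ss <- sse_reduced - sse_full

print(seq_ss)
print(extra_ss)
```

The `x2` row of `seq_ss` should agree with `extra_ss`, because `x2` is the last term entered.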
b
From a we have:
c
`Fit2 = lm(Hours~Cases+Costs, data=Data)`
So we have $SSR(X_2|X_1) + SSR(X_1) = 5726 + 136366 = 142092 = 130697 + 11395 = SSR(X_1|X_2) + SSR(X_2)$
Yes, it is always true because: $SSR(X_2|X_1)+SSR(X_1) = SSE(X_1) - SSE(X_1,X_2) +SSR(X_1) = SSTO -SSE(X_1,X_2)$
$SSR(X_1|X_2)+SSR(X_2) = SSE(X_2) - SSE(X_1,X_2) +SSR(X_2) = SSTO -SSE(X_1,X_2)$
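The identity can also be checked numerically. The following is a minimal sketch on simulated data (the variables `x1`, `x2`, `y` and the helper `seq_ss` are hypothetical, not the assignment's data): fitting the two predictors in both orders should give the same total regression sum of squares, $SSTO - SSE(X_1,X_2)$.

```r
# Simulated, deliberately correlated predictors; NOT the assignment data.
set.seed(2)
n <- 40
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)

# Helper (hypothetical name): sequential sums of squares of a fitted lm.
seq_ss <- function(fit) anova(fit)[["Sum Sq"]]

a <- seq_ss(lm(y ~ x1 + x2))  # SSR(X1), SSR(X2|X1), SSE(X1,X2)
b <- seq_ss(lm(y ~ x2 + x1))  # SSR(X2), SSR(X1|X2), SSE(X1,X2)
ssto <- sum((y - mean(y))^2)

total_a <- a[1] + a[2]        # SSR(X1) + SSR(X2|X1)
total_b <- b[1] + b[2]        # SSR(X2) + SSR(X1|X2)
print(c(total_a, total_b, ssto - a[3]))
```

The individual pieces differ between the two orders, but the three printed totals should coincide.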
2
From question 1, we have $SSR(X_1) = 136366, SSR(X_2) = 11395, SSR = 2176606, SSTO = 3162136$ \
So $R^2_{Y1} = 0.0431, R^2_{Y2} = 0.00360, R^2 = 0.6883$ \
From homework 6 we have $r_{12} = 0.10059216$, so $R^2_{12} = 0.0101$ \
`Fit4 = lm(Hours~Costs, data=Data)`
$R^2_{Y1|2} = \frac{SSR(X_1|X_2)}{SSE(X_2)}$ = 130697/3150741 = 0.04148\
$R^2_{Y2|1} = \frac{SSR(X_2|X_1)}{SSE(X_1)}$ = 5726/3025770 = 0.001892\
`Fit6 = lm(Hours~Cases+Holiday, data=Data)`
$R^2_{Y2|13} = \frac{SSR(X_2|X_1,X_3)}{SSE(X_1,X_3)}$
$SSR(X_2|X_1,X_3) = SSE(X_1,X_3)-SSE(X_1,X_2,X_3)$ = 992204 - 985530 = 6674
$SSE(X_1,X_3)$ = 992204, so $R^2_{Y2|13}$ = 6674/992204 = 0.006726
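The arithmetic for the coefficients of partial determination can be reproduced from the sums of squares quoted in this assignment. This is a small sketch: the helper `partial_r2` is a hypothetical name, and $SSE(X_1,X_2)$ is not quoted directly, so it is derived here as $SSTO - SSR(X_1) - SSR(X_2|X_1)$.

```r
# Coefficient of partial determination from the SSE of the reduced
# and the full model: (SSE_reduced - SSE_full) / SSE_reduced.
partial_r2 <- function(sse_reduced, sse_full) {
  (sse_reduced - sse_full) / sse_reduced
}

# Sums of squares quoted in the assignment.
ssto       <- 3162136
sse_x1     <- 3025770
sse_x2     <- 3150741
sse_x1x3   <- 992204
sse_x1x2x3 <- 985530
# Derived, not quoted: SSE(X1,X2) = SSTO - SSR(X1) - SSR(X2|X1).
sse_x1x2   <- ssto - (136366 + 5726)

r2_y1_2  <- partial_r2(sse_x2, sse_x1x2)       # R^2_{Y1|2}
r2_y2_1  <- partial_r2(sse_x1, sse_x1x2)       # R^2_{Y2|1}
r2_y2_13 <- partial_r2(sse_x1x3, sse_x1x2x3)   # R^2_{Y2|13}
print(signif(c(r2_y1_2, r2_y2_1, r2_y2_13), 4))
```

The printed values should match the hand calculations above up to rounding.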
3
a
`Fit = lm(Hours~Cases, data=Data)`
So the fitted regression function is $\hat Y = 4080 + 0.0009355X_1$ \
b
The regression function in 6.10a is $\hat Y = 4150 + 0.0007871X_1 - 13.17X_2 + 623.6X_3$
The estimated coefficient of $X_1$ here (0.0009355) is larger than the corresponding coefficient in 6.10a (0.0007871).
c
No. From question 1, $SSR(X_1) = 136366$ and $SSR(X_1|X_2) = 130697$. The difference is not substantial.
d
The correlation between $X_1$ and $X_2$ is the highest among the predictors, but it is still small ($r_{12} = 0.1006$ from homework 6), so $SSR(X_1)$ and $SSR(X_1|X_2)$ do not differ substantially.
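How the correlation between two predictors drives the gap between $SSR(X_1)$ and $SSR(X_1|X_2)$ can be illustrated with a short sketch on simulated data. All names here (`x2_orth`, `x2_corr`, `ssr_x1_given`) are hypothetical and the data are not the grocery data: one second predictor is constructed to be exactly uncorrelated with `x1`, the other to be strongly correlated with it.

```r
# Simulated data; NOT the assignment data.
set.seed(5)
n <- 60
x1 <- rnorm(n)
z  <- rnorm(n)
y  <- 2 + 1.5 * x1 + z + rnorm(n)

x2_orth <- resid(lm(z ~ x1))    # in-sample correlation with x1 is zero
x2_corr <- 0.9 * x1 + 0.3 * z   # strongly correlated with x1

# SSR(X1) = SSTO - SSE(X1)
ssr_x1 <- sum((y - mean(y))^2) - deviance(lm(y ~ x1))

# SSR(X1 | X2) = SSE(X2) - SSE(X1, X2) for a given second predictor.
ssr_x1_given <- function(x2) {
  deviance(lm(y ~ x2)) - deviance(lm(y ~ x1 + x2))
}

ssr_given_orth <- ssr_x1_given(x2_orth)
ssr_given_corr <- ssr_x1_given(x2_corr)
print(c(ssr_x1, ssr_given_orth, ssr_given_corr))
```

With the uncorrelated predictor the two sums of squares coincide, while with the correlated one they differ markedly, which is the pattern the answer above relies on.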
4
a
To run a polynomial regression model on one or more predictor variables, it is advisable to first center the variables by subtracting the corresponding mean of each, in order to reduce the intercorrelation among the variables.
`x1 <- Data$Cases - mean(Data$Cases)`
So the model is $\hat Y = 4367+8.61 \times 10^{-4} X_1+623.7 X_3-1.154 \times 10^{-9}X_1^2 -8.87 \times 10^{-5} X_1 X_3$
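The reason for centering given at the start of this part can be seen in a small sketch. The predictor `x` below is simulated (a hypothetical positive variable, not `Cases`), and the point is only to compare the correlation between a variable and its square before and after centering.

```r
# Simulated positive predictor; NOT the Cases variable from the assignment.
set.seed(3)
x  <- runif(200, min = 100, max = 500)
xc <- x - mean(x)               # centered version

cor_raw      <- cor(x, x^2)     # linear and quadratic terms, uncentered
cor_centered <- cor(xc, xc^2)   # same terms after centering
print(c(raw = cor_raw, centered = cor_centered))
```

For a predictor that takes only positive values, the uncentered correlation is typically close to 1, while the centered one is much closer to 0.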
b
`anova(Poly)`
`Fit7 <- lm(Hours ~ x1 + x3, data=Grocery)`
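`pf(0.01594945, 3, 46, lower.tail = FALSE)`
By default `pf` returns the lower-tail probability, which here is 0.002785933. The p-value of the F test is the upper tail, so p-value = 1 - 0.002785933 = 0.9972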
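The same kind of test can be written out end to end as a general linear F test. This is a sketch on simulated data under stated assumptions: the data frame `sim` is made up, the full model contains only the terms `x1`, `x3`, `x1^2` and `x1:x3`, and the reduced model drops the last two. The degrees of freedom therefore come from this example, not from the assignment's models.

```r
# Simulated data; NOT the grocery data. The true model has no
# quadratic or interaction effect.
set.seed(4)
n <- 52
sim <- data.frame(x1 = rnorm(n), x3 = rbinom(n, 1, 0.3))
sim$y <- 5 + sim$x1 + 3 * sim$x3 + rnorm(n)

reduced <- lm(y ~ x1 + x3, data = sim)
full    <- lm(y ~ x1 + x3 + I(x1^2) + x1:x3, data = sim)

# F* = [(SSE_R - SSE_F) / (df_R - df_F)] / [SSE_F / df_F]
df_num <- df.residual(reduced) - df.residual(full)
df_den <- df.residual(full)
f_stat <- ((deviance(reduced) - deviance(full)) / df_num) /
  (deviance(full) / df_den)

# The p-value is the UPPER tail of the F distribution.
p_val <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

# Built-in comparison of the two nested models.
cmp <- anova(reduced, full)
print(c(F = f_stat, p = p_val))
print(cmp)
```

The manual `f_stat` and `p_val` should agree with the `F` and `Pr(>F)` columns that `anova(reduced, full)` reports.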