SAS to R Notes
How I did part of a SAS assignment in R
I was working on a SAS assignment for my Regression Analysis class. The residual plot in SAS was corrupted, luckily I was able to recreate it in R. (And actually it looks better in R too.)
# read in data from file
gpa.data <- read.table("GPA.txt");
# create multiple linear regression model
gpa.fit <- lm(V4 ~ V1 + V2 + V3 + V5 + V6 + V7 + V8, data=gpa.data)
# save residuals to a variable
gpa.resid <- residuals(gpa.fit)
# save Predicted values to a variable
gpa.yhat <- fitted.values(gpa.fit)
# create png file, that plot will be saved to.
#png("resid.png")
# create a plot of residuals vs predicted values
plot(gpa.yhat,gpa.resid, ylab="Residuals", xlab="Predicted Values of Cum GPA", main="Plot of Residuals*Predicted Values")
# create a line#
abline(0,0)
# write plot to file
#dev.off()
ANOVA table of the Multiple Regression Model
Here is R’s ANOVA command. Run it on the regression model gpa.fit.
anova(gpa.fit)
Here is the output of the ANOVA command.
Analysis of Variance Table
Response: V4
Df Sum Sq Mean Sq F value Pr(> F)
V1 1 0.567 0.5669 1.8709 0.172001
V2 1 26.488 26.4877 87.4135 < 2.2e-16 ***
V3 1 9.096 9.0964 30.0194 6.839e-08 ***
V5 1 2.446 2.4459 8.0717 0.004683 **
V6 1 1.032 1.0324 3.4069 0.065524 .
V7 1 0.020 0.0199 0.0655 0.798068
V8 1 0.278 0.2784 0.9187 0.338288
Residuals 492 149.084 0.3030
---
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The stars in R’s output are the signoificance code. In the above summary output, V2, V3, V5 are very significant. V1, V7, and V8 are not very significant; the regression model should be tested without them.
Summary of Linear Regression Model
summary(gpa.fit)
Call:
lm(formula = V4 ~ V1 + V2 + V3 + V5 + V6 + V7 + V8, data = gpa.data)
Residuals:
Min 1Q Median 3Q Max
-3.3345 -0.3387 -0.0059 0.3429 1.9661
Coefficients:
Estimate Std. Error t value Pr(> |t|)
(Intercept) 1.7792451 0.5165056 3.445 0.00062 ***
V1 -0.0383924 0.0527532 -0.728 0.46710
V2 0.2716557 0.0417213 6.511 1.84e-10 ***
V3 0.0010404 0.0003173 3.279 0.00111 **
V5 0.0010332 0.0003425 3.016 0.00269 **
V6 -0.0391177 0.0239319 -1.635 0.10279
V7 -0.0006554 0.0028471 -0.230 0.81805
V8 -0.0054118 0.0056462 -0.958 0.33829
---
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5505 on 492 degrees of freedom
Multiple R-squared: 0.2112, Adjusted R-squared: 0.2
F-statistic: 18.82 on 7 and 492 DF, p-value: < 2.2e-16