Few questions related to R

Description

Please help me with these questions related to R, and provide the answers as an R Markdown PDF file. Thank you!


Study of Teenage Gambling in Britain

The teengamb dataset contains a survey conducted to study teenage gambling in Britain. The dataset has 47 observations and five variables:

sex: 0 = male, 1 = female
status: Socioeconomic status score based on parents’ occupation
income: income in pounds per week
verbal: verbal score in words out of 12 correctly defined
gamble: expenditure on gambling in pounds per year

Use the following statement to access the data: data(teengamb, package = "faraway")

Question 1

Make sure to convert any categorical variable to a factor.

Create a regression with gamble as the outcome variable and sex, status, income, and verbal as predictors.

Hint: See Lesson 3, Slide 48.
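
A minimal sketch of the two steps in Question 1, shown on R's built-in mtcars data as a stand-in so that it runs without extra packages. The column am (0 = automatic, 1 = manual) plays the role of a 0/1-coded categorical variable such as sex; the names cars and fit are illustrative only.

```r
# Stand-in data: mtcars ships with base R; am is coded 0/1, much like teengamb$sex.
cars <- mtcars

# Convert the 0/1 code to a labelled factor so lm() treats it as categorical.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Fit a linear model with one factor and two numeric predictors.
fit <- lm(mpg ~ am + wt + hp, data = cars)
summary(fit)
```

For the assignment, the same pattern would presumably apply to teengamb, with sex as the factor and gamble ~ sex + status + income + verbal as the formula.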

Question 2

Create a standardized residuals vs. fitted plot. Do you think the variance is constant?

Hint: See Lesson 6, Slide 11.

Question 3

Create a Quantile-Quantile plot and a histogram based on the standardized residuals. Does the distribution of residuals look normal?

Hint: Go to Lesson 6, Slide 28.

Question 4

Print all the standardized residuals.

Are there any observations with standardized residuals greater than 3 or smaller than -3? If so, which ones? What are their standardized residuals?

Hint: Use the rstandard function
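
One way to screen standardized residuals, sketched on base R's LifeCycleSavings data, which appears to hold the same savings data used in the lesson. The object names fit, std_res, and extreme are illustrative, and the model is a stand-in for the teengamb regression.

```r
# Stand-in model on a dataset that ships with base R.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Standardized (internally studentized) residuals, one per observation.
std_res <- rstandard(fit)
print(round(std_res, 3))

# Keep only observations whose standardized residual lies outside [-3, 3].
extreme <- std_res[abs(std_res) > 3]
extreme  # an empty named vector means no observation exceeds the threshold
```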

Question 5

Identify points with leverages that are at least two times the average leverage.

Did you find any points? If so, which points did you find?

Hint: Go to Lesson 6, Slide 22.
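
A sketch of the leverage check, again on the LifeCycleSavings stand-in model rather than teengamb. It relies on the fact that the leverages sum to the number of coefficients p, so the average leverage is p/n; the variable names are illustrative.

```r
# Stand-in model; replace with the teengamb regression for the assignment.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

lev <- hatvalues(fit)      # leverage of each observation
p <- length(coef(fit))     # number of coefficients, intercept included
n <- nobs(fit)             # number of observations
avg_lev <- p / n           # average leverage, because sum(lev) equals p

# Observations whose leverage is at least twice the average.
high <- lev[lev >= 2 * avg_lev]
sort(high, decreasing = TRUE)
```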

Question 6

Detect outliers using studentized residuals. Use the Bonferroni correction.

Did you detect any outliers? If so, show the studentized residual of any outlier you found.

Hint: Go to Lesson 6, Slide 40.
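
A hedged sketch of a Bonferroni-corrected outlier test on the LifeCycleSavings stand-in model. It assumes a two-sided test at an overall level of 0.05 (the slide's exact level is not shown here) and compares each externally studentized residual with a t quantile on n - p - 1 degrees of freedom.

```r
# Stand-in model; replace with the teengamb regression for the assignment.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

stud <- rstudent(fit)      # externally studentized residuals
n <- nobs(fit)
p <- length(coef(fit))
alpha <- 0.05              # assumed overall significance level

# Bonferroni: split alpha over n two-sided tests, i.e. alpha / (2 * n) per tail.
crit <- qt(1 - alpha / (2 * n), df = n - p - 1)

outliers <- stud[abs(stud) > crit]
crit
outliers                        # empty if no observation is flagged
stud[which.max(abs(stud))]      # the most extreme studentized residual
```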

Question 7

Show the values of any outlier observation you found in question 6 (i.e., show the entire row of each such observation).

Hint: Assuming that observation 5 is an outlier, you can use teengamb[5,]

Question 8

Use Cook’s distances to search for influential points. Does any point have a Cook’s distance above 0.5?

Hint: See Lesson 6, Slide 48.

Question 9

Create a half normal plot of the Cook’s distances. Which four observations have the highest Cook’s distances?

Hint: See Lesson 6, Slide 48.
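
A sketch covering Questions 8 and 9 on the LifeCycleSavings stand-in model. The half-normal plot is drawn by hand with base graphics so the example needs no extra packages; the quantile formula qnorm((n + i) / (2n + 1)) is my assumption about what faraway's halfnorm function plots, and faraway::halfnorm(cook) would be the shorter route if the package is installed.

```r
# Stand-in model; replace with the teengamb regression for the assignment.
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

cook <- cooks.distance(fit)
cook[cook > 0.5]                          # any point above the 0.5 threshold?

# The four largest Cook's distances, largest first.
top4 <- sort(cook, decreasing = TRUE)[1:4]
top4

# Hand-made half-normal plot: sorted distances against half-normal quantiles.
n <- length(cook)
ord <- order(cook)
hn_q <- qnorm((n + seq_len(n)) / (2 * n + 1))
plot(hn_q, cook[ord], xlab = "Half-normal quantiles", ylab = "Cook's distance")

# Label the four largest points.
idx <- ord[(n - 3):n]
text(hn_q[(n - 3):n], cook[idx], labels = names(cook)[idx], pos = 2, cex = 0.8)
```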



Regression 1: Linear Regression and Modeling
Lesson 6: Diagnostics
(Version: 27 February 2024)
Erez Hatna
School of Global Public Health
New York University
Objectives
Part 1: Checking the Equal Variance Assumption
Part 2: Leverages
Part 3: Raw, Standardized Residuals and Normality
Part 4: Outliers and Studentized Residuals
Part 5: Influential Observations
Part 1: Checking the Equal Variance Assumption
Checking the Assumptions of a Regression Model

Estimation and inference from a model depend upon assumptions. These assumptions should be checked.

Assumptions:
• $\varepsilon \sim N(0, \sigma^2)$ and the errors are independent.
• The structural part of the model, $E(y) = X\beta$, is correct.
• Unusual observations do not unduly affect the model.

Assumption checking may lead to revisions to the model – model building is really an iterative process.
Checking Error Assumptions
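• Independence, constant variance, normality of $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$
• Consider $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ versus $\hat{\varepsilon} = (\hat{\varepsilon}_1, \hat{\varepsilon}_2, \ldots, \hat{\varepsilon}_n)$
What is the difference?
$\hat{\varepsilon} = y - \hat{y}$ where $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
Note that we are not obtaining the errors from $\beta_0 + \beta_1 x_1 + \beta_2 x_2$.
The difference is that the residuals $\hat{\varepsilon}$ are observable, and the errors $\varepsilon$ are not.
• We cannot check $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$, but we can check $(\hat{\varepsilon}_1, \hat{\varepsilon}_2, \ldots, \hat{\varepsilon}_n)$.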
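
The distinction between errors and residuals can be illustrated with a small simulation, where the true errors are known only because the data are generated by hand. This is a sketch under assumed coefficients (1, 2, -3) and an assumed error standard deviation of 2; none of these values come from the slides.

```r
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)

eps <- rnorm(n, sd = 2)            # true errors: unobservable with real data
y <- 1 + 2 * x1 - 3 * x2 + eps     # true coefficients, known only by construction

fit <- lm(y ~ x1 + x2)
eps_hat <- residuals(fit)          # residuals: y minus the fitted values

# Residuals are exactly y - y_hat, computed from the estimated coefficients.
all.equal(unname(eps_hat), unname(y - fitted(fit)))

# Residuals track the errors closely but are not identical to them.
cor(eps, eps_hat)
sum(eps_hat)   # essentially zero, because the model includes an intercept
sum(eps)       # generally not zero
```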
Checking Constant Variance: $\varepsilon \sim N(0, \sigma^2)$
• Constant variance means that the variation around the regression line is independent of the $\{x_i\}$.
• Need to check this assumption: check whether the residuals are related to the $\{x_i\}$.
• A useful approach to testing this assumption is to examine a plot of the residuals against the fitted values: $\hat{\varepsilon}$ versus $\hat{y}$
• Constant variation (homoscedasticity) is desirable; heteroscedasticity (nonconstant variation) is undesirable.
Residuals vs. Fitted Plots
The Savings Data
The savings dataset contains personal saving rates in 50 countries. It contains the following variables:
• sr: saving rate – personal saving divided by disposable income
• pop15: percent population under the age of 15
• pop75: percent population over the age of 75
• dpi: per-capita disposable income in dollars
• ddpi: percent growth rate of dpi
The data is available from the faraway package. Let us look at the beginning of the table:
data(savings, package = "faraway")
head(savings)
             sr pop15 pop75     dpi ddpi
Australia 11.43 29.35  2.87 2329.68 2.87
Austria   12.07 23.32  4.41 1507.99 3.93
Belgium   13.17 23.80  4.43 2108.47 3.82
Bolivia    5.75 41.89  1.67  189.13 0.22
Brazil    12.88 42.19  0.83  728.47 4.56
Canada     8.79 31.72  2.85 2982.88 2.43
Fitting a Regression Model
Let us fit a regression in which the savings rate (sr) is the outcome, and the rest of the variables are predictors:
lmod <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings)
summary(lmod)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.5660865  7.3545161   3.884 0.000334 ***
pop15       -0.4611931  0.1446422  -3.189 0.002603 **
pop75       -1.6914977  1.0835989  -1.561 0.125530
dpi         -0.0003369  0.0009311  -0.362 0.719173
ddpi         0.4096949  0.1961971   2.088 0.042471 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.803 on 45 degrees of freedom
Multiple R-squared: 0.3385, Adjusted R-squared: 0.2797
F-statistic: 5.756 on 4 and 45 DF, p-value: 0.0007904
Creating a Residuals vs. Fitted Plot
Let us create a plot with the fitted (predicted) values on the x-axis and the residuals on the y-axis:
plot(fitted(lmod), residuals(lmod), xlab = "Fitted", ylab = "Residuals", col = "blue")
abline(h = 0, col = "red")
We see no cause for alarm in this plot.
Scale Location Plot: Another View
If we would like to examine the constant variance assumption more closely, it helps to plot the square root of the absolute residual, $\sqrt{|\hat{\varepsilon}|}$, against $\hat{y}$:

plot(fitted(lmod), sqrt(abs(residuals(lmod))),
     xlab = "Fitted",
     ylab = expression(sqrt(abs(hat(epsilon)))),
     col = "blue")
The plot looks satisfactory.
Creating a Standardized Residuals vs. Fitted Plot
In Part 3 of this presentation, we will learn about standardized residuals. Let us create a plot with the fitted (predicted) values on the x-axis and the standardized residuals on the y-axis:
plot(fitted(lmod), rstandard(lmod), xlab = "Fitted", ylab = "Standardized Residuals",
     col = "blue")
abline(h = 0, col = "red")
As before, we see no cause for alarm in this plot.
Numerical Test for Nonconstant Variance
A quick numerical test to check nonconstant variance can be achieved by regressing $\sqrt{|\hat{\varepsilon}|}$ against $\hat{y}$:
lmodResFit <- lm(sqrt(abs(residuals(lmod))) ~ fitted(lmod))
summary(lmodResFit)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.16216    0.34788   6.215 1.17e-07 ***
fitted(lmod) -0.06137    0.03476  -1.766   0.0838 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6341 on 48 degrees of freedom
Multiple R-squared: 0.06099, Adjusted R-squared: 0.04142
F-statistic: 3.117 on 1 and 48 DF, p-value: 0.08382
Simulations of Plots With Constant Variance
• It is often hard to judge residual plots without prior experience, so it is helpful to generate some artificial plots where the true relationship is known.
• The following 9 plots show the case of constant variance:
par(mfrow=c(3,3))
n
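
The slide's own listing is truncated at this point. A sketch of what such a simulation could look like, assuming n = 50 points per panel and a uniformly distributed x (both are my assumptions, not values recovered from the slide):

```r
set.seed(42)
op <- par(mfrow = c(3, 3))    # 3 x 3 grid of panels; keep the old settings
n <- 50                       # assumed sample size per panel

for (i in 1:9) {
  x <- runif(n)
  # rnorm(n) has the same variance everywhere, so no panel should show a pattern.
  plot(x, rnorm(n), xlab = "x", ylab = "Simulated residual")
  abline(h = 0, col = "red")
}

par(op)                       # restore the previous graphics settings
```

Comparing a real residual plot against panels like these gives a sense of how much apparent structure arises by chance alone.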