Description
Solve in R.
Assignment 4 (Simple Linear Regression Analysis)
Problem 1: You are a data analyst working for a school district. The school district is interested in
understanding the relationship between the number of hours students study and their exam scores. You
are given a dataset exam_data.csv containing two columns: Hours_Studied (representing the number of
hours a student studied) and Exam_Score (representing the corresponding exam score).
1. Load the dataset by importing the CSV file (exam_data.csv) into R, then view the first few
rows of the dataset.
2. Explore the data by displaying the summary statistics of the dataset and creating a scatter plot
to visualize the relationship between Hours_Studied and Exam_Score.
3. Perform a simple linear regression to predict Exam_Score based on Hours_Studied.
4. Display the regression summary, then add the line of best fit to the scatter plot.
5. Give an interpretation of the coefficients in the context of the problem, then discuss the
strength and direction of the relationship between the number of hours studied and exam
scores.
6. Use the regression equation to predict the exam score for a student who studied for 8 hours.
7. Calculate the coefficient of determination to assess the goodness of fit of the model. Discuss
what the value indicates about the model’s performance.
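The steps of Problem 1 could be sketched in base R as follows. This is a minimal sketch, not a model answer. It assumes the values listed in the exam_data attachment, reads the last two rows (both at 6 hours) as additional observations, and writes them to a temporary exam_data.csv so that the script runs on its own. The object names (exam_data_raw, csv_path, exam_model, exam_cor, pred_8, exam_r2) are illustrative and not part of the assignment.

```r
# Values transcribed from the exam_data attachment (assumption: the last two
# rows, both at 6 hours, are additional observations).
exam_data_raw <- data.frame(
  Hours_Studied = c(seq(1.5, 10, by = 0.5), 6, 6),
  Exam_Score    = c(60, 65, 70, 75, 80, 85, 90, 92, 94, 97, 98, 99,
                    99.5, 99.8, 99.9, 99.9, 100, 100, 89, 94)
)

# Assumed file location: a temporary directory, so the sketch is self-contained.
csv_path <- file.path(tempdir(), "exam_data.csv")
write.csv(exam_data_raw, csv_path, row.names = FALSE)

# 1. Load the dataset and view the first few rows.
exam_data <- read.csv(csv_path)
head(exam_data)

# 2. Summary statistics and scatter plot.
summary(exam_data)
plot(exam_data$Hours_Studied, exam_data$Exam_Score,
     xlab = "Hours studied", ylab = "Exam score",
     main = "Exam score vs. hours studied")

# 3. Simple linear regression of Exam_Score on Hours_Studied.
exam_model <- lm(Exam_Score ~ Hours_Studied, data = exam_data)

# 4. Regression summary, then the fitted line on the existing scatter plot.
summary(exam_model)
abline(exam_model, col = "blue", lwd = 2)

# 5. Coefficients (intercept and slope) and the correlation, which gives the
#    direction (sign) and strength (magnitude) of the linear relationship.
coef(exam_model)
exam_cor <- cor(exam_data$Hours_Studied, exam_data$Exam_Score)
exam_cor

# 6. Predicted exam score for a student who studied 8 hours.
pred_8 <- predict(exam_model, newdata = data.frame(Hours_Studied = 8))
pred_8

# 7. Coefficient of determination (R-squared).
exam_r2 <- summary(exam_model)$r.squared
exam_r2
```

For the interpretation in steps 5 and 7, the slope is the estimated change in exam score per additional hour studied, the intercept is the predicted score at zero hours, and in simple linear regression R-squared equals the squared correlation. In the listed data the scores level off near 100 at higher study hours, so a straight line may describe the relationship only approximately, which is worth mentioning in the discussion.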
Problem 2: You are working for a retail company and are tasked with building a predictive model to
estimate monthly sales based on the amount spent on advertising in dollars. You are given a dataset
sales_data.csv containing two columns: Advertising_Spend (representing the advertising spend in
dollars) and Monthly_Sales (representing the corresponding monthly sales in dollars).
1. Import the dataset from the CSV file (sales_data.csv) into R.
2. Create a scatter plot to visualize the relationship between Advertising_Spend and
Monthly_Sales.
3. Perform a simple linear regression to predict Monthly_Sales based on Advertising_Spend.
4. Calculate the residuals (the differences between observed and predicted sales).
5. Create a scatter plot of residuals against advertising spend to check for patterns. Add a
horizontal line at y = 0. Discuss what the pattern (or lack of it) in the residual plot indicates
about the regression model.
6. Use the regression equation to predict the monthly sales for new advertising spends of $10,000,
$15,000, and $20,000.
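The steps of Problem 2 could be sketched in base R as follows. This is a minimal sketch under stated assumptions: it uses the values listed in the sales_data attachment and writes them to a temporary sales_data.csv so that the script runs on its own. The object names (sales_data_raw, csv_path, sales_model, sales_resid, new_spend, sales_pred) are illustrative and not part of the assignment.

```r
# Values transcribed from the sales_data attachment (30 rows).
sales_data_raw <- data.frame(
  Advertising_Spend = seq(100, 3000, by = 100),
  Monthly_Sales     = c(1500, 2200, 2800, 3500, 4200, 4800, 5500, 6200, 6800,
                        7500, 8200, 8800, 9500, 10200, 10900, 11500, 12200,
                        12900, 13600, 14200, 14900, 15600, 16200, 16900,
                        17600, 18200, 18900, 19600, 20300, 20900)
)

# Assumed file location: a temporary directory, so the sketch is self-contained.
csv_path <- file.path(tempdir(), "sales_data.csv")
write.csv(sales_data_raw, csv_path, row.names = FALSE)

# 1. Import the dataset.
sales_data <- read.csv(csv_path)

# 2. Scatter plot of sales against advertising spend.
plot(sales_data$Advertising_Spend, sales_data$Monthly_Sales,
     xlab = "Advertising spend ($)", ylab = "Monthly sales ($)",
     main = "Monthly sales vs. advertising spend")

# 3. Simple linear regression of Monthly_Sales on Advertising_Spend.
sales_model <- lm(Monthly_Sales ~ Advertising_Spend, data = sales_data)
summary(sales_model)

# 4. Residuals: observed sales minus predicted (fitted) sales.
sales_resid <- resid(sales_model)

# 5. Residual plot with a horizontal reference line at y = 0.
plot(sales_data$Advertising_Spend, sales_resid,
     xlab = "Advertising spend ($)", ylab = "Residual ($)",
     main = "Residuals vs. advertising spend")
abline(h = 0, lty = 2)

# 6. Predicted monthly sales for the three new advertising spends.
new_spend  <- data.frame(Advertising_Spend = c(10000, 15000, 20000))
sales_pred <- predict(sales_model, newdata = new_spend)
cbind(new_spend, Predicted_Sales = sales_pred)
```

When discussing step 5, residuals scattered randomly around zero with no visible trend or funnel shape support the linear model, while a curve or changing spread suggests the model is missing something. For step 6, the requested spends of $10,000 to $20,000 lie well above the largest spend in the listed data ($3,000), so these predictions are extrapolations and should be reported with that caveat.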
sales_data
Advertising_Spend  Monthly_Sales
100                1500
200                2200
300                2800
400                3500
500                4200
600                4800
700                5500
800                6200
900                6800
1000               7500
1100               8200
1200               8800
1300               9500
1400               10200
1500               10900
1600               11500
1700               12200
1800               12900
1900               13600
2000               14200
2100               14900
2200               15600
2300               16200
2400               16900
2500               17600
2600               18200
2700               18900
2800               19600
2900               20300
3000               20900
exam_data
Hours_Studied  Exam_Score
1.5            60
2              65
2.5            70
3              75
3.5            80
4              85
4.5            90
5              92
5.5            94
6              97
6.5            98
7              99
7.5            99.5
8              99.8
8.5            99.9
9              99.9
9.5            100
10             100
6              89
6              94