Description
Select one dataset from the datasets provided in the links below.
28 Data Analysis Projects to Boost Your Skills [2023 Guide]:
https://www.springboard.com/blog/data-analytics/data-analysis-projects/
More free public datasets for exploratory data analysis (EDA):
https://www.tableau.com/learn/articles/free-public-data-sets
After the dataset is selected (or assigned), analyze it in Microsoft Excel to discover the structure of the data and any trends, patterns, or anomalies, guided by your own hypothesis.
Perform the following six tasks.
You should use visualization to aid your answers.
Your project will include two main parts:
1. The final project report, which must cover all six tasks below and be written using the provided template (10 marks, distributed among the tasks below).
2. A presentation that illustrates the six tasks completed in the project (4 marks).
==========================================================
Task 1: Understand and describe the nature and structure of the selected dataset. (2 marks)
• Describe the dataset. Your description should answer the following questions: Is it reliable? How was it collected? What is its size?
• Identify the features of the dataset.
• Propose a hypothesis or assumption about the relationship between two numerical variables, to be validated in Task 4.
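One common way to state such a hypothesis formally is shown here. It is a sketch, not a required format. X and Y are placeholder names for your two numerical variables, ρ is their population correlation, and r is the sample correlation computed from your n observations in Task 4. It assumes a linear relationship is what you want to test.

```latex
\begin{aligned}
H_0 &: \rho = 0 \quad \text{(no linear relationship between } X \text{ and } Y\text{)} \\
H_1 &: \rho \neq 0 \quad \text{(or } \rho > 0 \text{ if you expect } Y \text{ to rise with } X\text{)} \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{degrees of freedom} = n-2
\end{aligned}
```

A large |t| is evidence against H0. For the two-sided case, Excel can turn t into a p-value with T.DIST.2T(ABS(t), n-2).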
Task 2: Check whether your selected features have any of the following issues. Describe how you conducted the tests and how you addressed the issues. Support your answers with screenshots of the issues before and after the fixes. (1 mark)
• Missing values (0.25 mark for the test, fix, and screenshot)
• Duplicate values (0.25 mark)
• Data outliers (0.25 mark)
• Any noise or irregularities (0.25 mark)
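The brief asks for these checks to be done in Excel. For example, COUNTBLANK counts blank cells and Data > Remove Duplicates deletes repeated rows. As an optional cross-check, the Python sketch below runs the same four tests on a small hypothetical sample. The following are assumptions for illustration, not requirements of the brief:
• the (id, value) row layout;
• the "n/a" text entry;
• the 1.5 × IQR outlier rule (Tukey's fences).

```python
import statistics


def is_missing(value):
    """Treat None and empty or whitespace-only text as a missing value."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def as_number(value):
    """Return value as a float, or None if it is missing or not numeric."""
    if is_missing(value):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def count_missing(values):
    return sum(1 for v in values if is_missing(v))


def duplicate_rows(rows):
    """Return every row that exactly repeats an earlier row (first copies are not returned)."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


def irregular_entries(values):
    """Return entries that are present but not numeric, e.g. 'n/a' typed into a numeric column."""
    return [v for v in values if not is_missing(v) and as_number(v) is None]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    method="inclusive" is intended to match Excel's QUARTILE.INC.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical (id, value) rows for one numeric feature, not from a real dataset.
    rows = [
        ("A", 10.0), ("B", 12.0), ("C", None), ("B", 12.0),
        ("D", 11.0), ("E", 13.0), ("F", 95.0), ("G", "n/a"),
    ]
    column = [value for _, value in rows]
    print("missing values:", count_missing(column))
    print("duplicate rows:", duplicate_rows(rows))
    print("irregular entries:", irregular_entries(column))

    # Fix duplicates first (keep the first copy), then test the clean numbers for outliers.
    deduplicated = list(dict.fromkeys(rows))
    numbers = [n for n in (as_number(v) for _, v in deduplicated) if n is not None]
    print("outliers:", iqr_outliers(numbers))
```

Removing exact duplicates before the outlier test keeps repeated rows from distorting the quartiles. Whether to delete, cap, or keep a flagged outlier is still a judgment to justify in the report.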
Task 3: Provide descriptive statistics for the selected features, using statistical methods to understand the dataset better, and answer the following analysis questions: (2 marks)
• Include measures of central tendency, such as the mean, median, and mode.
• Describe the spread and shape of your data. This may include the variance, standard deviation, skewness, and kurtosis.
(You are encouraged to pose further analysis questions based on any trends you notice in the dataset.)
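The measures listed above map onto Excel's AVERAGE, MEDIAN, MODE.SNGL, VAR.S, STDEV.S, SKEW and KURT functions. The Python sketch below computes the same sample statistics for one feature, assuming the feature is a plain list of numbers. The skewness and kurtosis lines follow the formulas Microsoft documents for SKEW and KURT, so results are expected to agree with Excel. One exception: statistics.mode returns the first value when nothing repeats, whereas MODE.SNGL returns #N/A. The sample values are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive statistics for one numeric feature.

    Uses the sample (n - 1) formulas, which are intended to match Excel's
    AVERAGE, MEDIAN, MODE.SNGL, VAR.S, STDEV.S, SKEW and KURT.
    """
    n = len(values)
    if n < 4:
        raise ValueError("sample kurtosis needs at least 4 values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like STDEV.S
    if sd == 0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")
    z = [(x - mean) / sd for x in values]  # standardized scores

    # Adjusted Fisher-Pearson skewness (the formula Excel's SKEW uses).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Sample excess kurtosis (the formula Excel's KURT uses); about 0 for normal data.
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first most common value
        "variance": statistics.variance(values),  # sample variance, like VAR.S
        "std_dev": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical values for one feature, for illustration only.
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    for name, value in describe(sample).items():
        print(f"{name:>9}: {value:.4f}")
```

A positive skewness indicates a longer right tail. A positive excess kurtosis indicates heavier tails than a normal distribution.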
Task 4: Validate the hypothesis from Task 1 by investigating the relationship between the two quantitative variables you have chosen, using correlation, regression, and R-squared, and state the conclusions you can draw. (2 marks)
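In Excel, the quantities Task 4 asks for come from CORREL, SLOPE, INTERCEPT and RSQ. A scatter chart with a linear trendline and its R-squared value displayed gives the same figures. The sketch below spells out the same least-squares arithmetic in Python on hypothetical paired data, so each number can be traced to a formula. Python 3.10 and later also provide statistics.correlation and statistics.linear_regression, which are not used here.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient r, the statistic Excel's CORREL returns."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long lists with at least 3 pairs")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus its R-squared.

    Intended to agree with Excel's SLOPE, INTERCEPT and RSQ for the same two columns.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared = 1 - SS_res / SS_tot: share of the variation in y explained by the line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical paired observations of two numeric features X and Y.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    r = pearson_r(x, y)
    slope, intercept, r2 = linear_fit(x, y)
    print(f"r = {r:.3f}, y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.3f}")
    # For a simple linear regression with one predictor, R^2 equals r^2.
    print(f"r^2 = {r * r:.3f}")
```

An R-squared close to 1 means the fitted line explains most of the variation in Y. A correlation, however strong, does not by itself show that X causes Y.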
Task 5: Show visual representations of your analysis (hint: choose the chart or graph type that suits your data). (1 mark)
Task 6: Build an interactive dashboard that summarizes the most crucial factors (variables) for the decision-making process, and then demonstrate how effective your chosen factors are in supporting that process. (2 marks)
Project Report
College of Computing and Informatics
Project
Deadline: Tuesday 05/12/2023 @ 23:59
[Total Mark for this Project is 14]
Group Details:
CRN:
Name:
ID:
Name:
ID:
Name:
ID:
Instructions:
• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both correct files.
• A zero mark will be given if you try to bypass SafeAssign (e.g., by misspelling words, removing spaces between words, hiding characters, using different character sets, converting text into images or into languages other than English, or any other kind of manipulation).
• Email submissions will not be accepted.
• You are advised to make your work clear and well presented. This includes filling in your information on the cover page.
• You must use this template; failing to do so will result in a zero mark.
• You MUST show all your work, and text must not be converted into an image unless the question specifies otherwise.
• Late submission will result in a ZERO mark.
• The work should be your own; copying from other students or other sources will result in a ZERO mark.
• Use Times New Roman font for all your answers.
Learning Outcome(s): CLO 1, 2, 5
1. Demonstrate an understanding of the concepts of decision analysis and decision support systems (DSS), including probability, modelling, decisions under uncertainty, and real-world problems.
2. Describe advanced Business Intelligence, Business Analytics, Data Visualization, and Dashboards.
5. Improve hands-on skills using Excel and Orange for building Decision Support Systems.
Project (14 Marks)
Students can form groups of three and send their names to their instructors before 5 October 2023. Otherwise, the instructors will form the groups randomly and assign datasets to them.