Subject – Data Visualization, need a Jupyter notebook only

Description

Two assignments,

The first assignment is very simple. I'm attaching the requirement screenshots below. Assignment 1 has two pictures, and those are the requirements: follow 3.2 (Part II: Data Visualisation). The other picture has the dataset details.

Complete all 10 points using Matplotlib and Seaborn. The dataset also needs to be taken from

2nd assignment – Final assignment

I'm attaching a picture named Final Project.

The requirement is to take a dataset and do a complete data exploration and analysis using Matplotlib, Seaborn, Plotly, NumPy and pandas.

2. A very good PPT is also needed for the same final project, explaining the data visualization.

Assignment 2: Data Analysis and Visualization with Python
1. Objective: Explore Washington D.C. Bike Rental Dataset
The dataset can be downloaded from Kaggle (you only need the train.csv file). It
provides hourly bike rental numbers in Washington D.C. for the years 2011 and
2012. The objective is to explore the effect that different weather and temporal
factors have on the number of bikes rented.
2. Data Description
datetime – hourly date + timestamp
season – 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday – whether the day is considered a holiday
workingday – whether the day is neither a weekend nor holiday
weather – weather category:
    1: Clear, Few clouds, Partly cloudy
    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp – temperature in Celsius
atemp – “feels like” temperature in Celsius
humidity – relative humidity
windspeed – wind speed
casual – number of non-registered user rentals initiated
registered – number of registered user rentals initiated
count – number of total rentals
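A minimal sketch of how the column layout above might be checked once the data is loaded. The three rows are made-up values, not taken from the real train.csv, and the names sample_csv and df are assumptions for illustration only.

```python
import io

import pandas as pd

# Three made-up rows in the column layout described above. With the real file
# this would be pd.read_csv("train.csv", parse_dates=["datetime"]) instead.
sample_csv = io.StringIO(
    "datetime,season,holiday,workingday,weather,temp,atemp,humidity,"
    "windspeed,casual,registered,count\n"
    "2011-01-01 00:00:00,1,0,0,1,10.0,12.5,80,0.0,2,10,12\n"
    "2011-01-01 01:00:00,1,0,0,1,9.5,12.0,82,6.0,4,20,24\n"
    "2011-01-01 02:00:00,1,0,0,2,9.0,11.5,85,7.0,1,15,16\n"
)
df = pd.read_csv(sample_csv, parse_dates=["datetime"])

print(df.dtypes)                      # datetime should be datetime64, not object
print("rows:", len(df))
print("missing values per column:")
print(df.isna().sum())

# If count is the total of casual and registered rentals, as the description
# suggests, the two columns should add up to it in every row.
print("casual + registered == count:",
      (df["casual"] + df["registered"] == df["count"]).all())
```

Passing parse_dates at load time is one option; calling pd.to_datetime on the column afterwards would give the same dtype.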
3. Tasks
3.1 Part I: Data Manipulation and Analysis
1. Import the dataset into a pandas dataframe. Make sure that the date column is in
pandas date time format.
2. Check the data type of each column. How many rows are there in the dataset?
Does the dataset contain any missing values?
3. Using the date column, create new columns for: year, month, day of the week
and hour of the day.
4. Rename the values in the season column to spring, summer, fall and winter.
5. Calculate the total number of casual and registered bikes rented in the years
2011 and 2012.
6. Calculate the mean of the hourly total rentals count by season. Which season
has the highest mean?
7. Are more bikes rented by registered users on working or non-working days?
Does the answer differ for non-registered users? Is the answer the same for
both years?
8. Which months in the year 2011 have the highest and the lowest total number of
bikes rented? Repeat for the year 2012.
9. Which type of weather has the highest and lowest mean of the hourly total
rentals count?
10. Calculate the correlation between the hourly total rentals count and all the
numerical columns in the dataset. Which column has the highest correlation with
the total rentals count?
11. Create a new categorical column called day_period, which can take four
possible values: night, morning, afternoon and evening. These values
correspond to the following binning of the hour column: 0-6: night, 6-12: morning,
12-18: afternoon, 18-24: evening.
12. Generate a pivot table for the mean of the hourly total rentals count, with the
index set to the day period and the column set to the working day column. What
can you observe from the table?
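Some of the Part I steps (tasks 1, 3, 4, 11 and 12) could be sketched roughly as follows. This is only an illustration on synthetic data, not a solution for the real train.csv: the counts are made up, the column names day_of_week and hour are assumptions, and the day_period bins are read as half-open 24-hour intervals [0, 6), [6, 12), [12, 18), [18, 24).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv (made-up counts): 24 hourly rows for
# Saturday 2011-01-01 (non-working) and 24 for Monday 2011-01-03 (working).
hours = np.arange(24)
stamps = [pd.Timestamp("2011-01-01") + pd.Timedelta(hours=int(h)) for h in hours]
stamps += [pd.Timestamp("2011-01-03") + pd.Timedelta(hours=int(h)) for h in hours]
df = pd.DataFrame({
    "datetime": [s.strftime("%Y-%m-%d %H:%M:%S") for s in stamps],
    "season": 1,
    "workingday": [0] * 24 + [1] * 24,
    "count": np.concatenate([hours + 1, 2 * (hours + 1)]),
})

# Task 1: make sure the date column has a pandas datetime dtype.
df["datetime"] = pd.to_datetime(df["datetime"])

# Task 3: derive temporal columns from the datetime column.
df["year"] = df["datetime"].dt.year
df["month"] = df["datetime"].dt.month
df["day_of_week"] = df["datetime"].dt.day_name()
df["hour"] = df["datetime"].dt.hour

# Task 4: map the season codes to their names.
season_names = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
df["season"] = df["season"].map(season_names)

# Task 11: bin the hour into left-closed intervals, so hour 6 is "morning"
# and hour 18 is "evening".
df["day_period"] = pd.cut(
    df["hour"],
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)

# Task 12: mean hourly count by day period (rows) and working day (columns).
pivot = df.pivot_table(index="day_period", columns="workingday",
                       values="count", aggfunc="mean", observed=False)
print(pivot)
```

Using right=False matters here: with the default right-closed bins, hour 0 would fall outside every interval and end up as a missing value.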
3.2 Part II: Data Visualisation
1. Plot the distributions of all the numerical columns in the dataset using
histograms.
2. Plot the distributions of all the numerical columns in the dataset using box plots.
3. Plot the mean of the hourly total rentals count for working and non-working
days.
4. Plot the mean of the hourly total rentals count for the different months for
both years combined.
5. Plot the mean of the hourly total rentals count for the different months for
both years separately in a multi-panel figure.
6. Plot the mean and the 95% confidence interval of the hourly total rentals
count for the four different weather categories. What can you observe?
7. Plot the mean of the hourly total rentals count versus the hour of the day.
Which hours of the day have the highest rentals count?
8. Repeat the plot in 7 for different days of the week. What patterns can you
observe?
9. Repeat the plot in 8 for the four seasons using a multi-panel figure. What
patterns can you observe?
10. Plot the mean and the 95% confidence interval of the hourly total rentals
count versus the period of the day column, which you created in the first part of
the assignment. Which period of the day has the highest rentals count? Does
this peak period differ for working and non-working days?
11. Plot a heatmap for the correlation matrix of the dataset numerical variables.
What observations can you make?
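As one possible illustration of the "mean and 95% confidence interval" plots in Part II, the sketch below computes the interval by hand and draws it with Matplotlib. The data is synthetic (random made-up counts, not real rentals), and the interval is a normal approximation (mean ± 1.96 × standard error), which is an assumption made here; Seaborn's bar and point plots estimate their intervals by bootstrapping by default, so their error bars may differ slightly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv (made-up values, not real rental data).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "weather": rng.integers(1, 5, size=n),   # categories 1..4
    "count": rng.poisson(lam=150, size=n),
})

# Mean, standard error and a normal-approximation 95% interval per category.
stats = df.groupby("weather")["count"].agg(["mean", "sem"])
stats["ci95"] = 1.96 * stats["sem"]

# Bar chart of the means with the interval half-widths as error bars.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(stats.index.astype(str), stats["mean"], yerr=stats["ci95"], capsize=4)
ax.set_xlabel("weather category")
ax.set_ylabel("mean hourly rentals")
ax.set_title("Mean hourly rentals by weather (95% CI)")
fig.tight_layout()
```

The same groupby pattern would apply to the day_period plot in task 10 by grouping on day_period (and optionally workingday) instead of weather.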
Final Project
1. Objective
The objective of this final project is to analyze and visualize a real-world dataset of your
choice by incorporating the skills acquired in this course. You should aim to choose a
dataset that you find particularly interesting, as this will help you come up with
meaningful analytical questions about it and connect those questions into a story. Your
analysis should contain 7 – 10 analytical questions, each explored through a
visualization (i.e. 7 – 10 visualizations in total).
2. Deliverables and Due Date
There are two deliverables for this project:
1. A Jupyter notebook containing your analysis and visualizations. The due date for
this deliverable is Wednesday 24.01 at 23:59. Please deliver the notebook itself,
a PDF version of it and your dataset in a 1-1 message to me on Microsoft Teams.
2. An 8 – 10 minute presentation summarizing your analysis, showcasing your
visualizations and highlighting your major findings. You should aim to structure your
presentation as a story about your data. Presentations will be held in random order on
Friday 26.01 at 14:00 on campus (P Auditorium). Please bring your laptop with
you to present.
3. Grading
The final project deliverables make up 40% of your overall grade for this course.
Each of the two deliverables will hold equal weight in the project grading (i.e. 20% each
of the overall course grade).
The following aspects are assessed:
Timely delivery (10%)
Appropriate dataset selection (10%)
Methods/Ideas/Procedure (40%)
Correctness (20%)
Overall quality of deliverables (20%)
4. Helpful resources
Possible sources to find an idea for your dataset include:
UCI Machine Learning Repository
Kaggle Datasets
Google Dataset Search
Amazon AWS Public Datasets
Data.gov
World Bank Open Data
FiveThirtyEight Datasets
European Data Portal