Data Analytics Question

Description

Please complete the Python code according to the content and requirements of the PDF.


Unformatted Attachment Preview

Lab Assignment 1: Text Processing Techniques
Objective: In this lab assignment, you will use some basic text processing techniques and
Pandas to explore the Yelp Review data and examine the language of positive and negative
restaurant reviews.
Submission: Submit one file LA1_Lastname_FirstName.ipynb, a Python Notebook containing
the code and answers required by this assignment.
Input: restaurant_reviews_az.csv
Use a beginning text cell to include the assignment title, author name, your ASU ID, and the file
creation date (5%).
Create code chunks to meet the following requirements.
• Code Cell 1 (5%) – Library and data import. Show the summary of the input data.
• Code Cell 2 (5%) – Select the 1 star reviews and 5 star reviews from the dataset.
• Code Cell 3 (30%) – Apply necessary text processing techniques on the selected reviews.
• Code Cell 4 (10%) – Find the top 20 frequently used nouns in 1 star reviews and 5 star
reviews, respectively.
• Code Cell 5 (10%) – Find the top 20 frequently used adjectives in 1 star reviews and 5
star reviews, respectively.
• Code Cell 6 (10%) – Find the top 20 frequently used verbs in 1 star reviews and 5 star
reviews, respectively.
• Code Cell 7 (10%) – Find the top 20 frequently used named entities from the selected
reviews.
• Text Cell 8 (10%) – Write one paragraph about your observation on the language of 1
star and 5 star reviews. What is the key to a good restaurant experience?
• Text Cell 9 (5%) – Acknowledge if you have used any GenAI tools in this assignment and
anyone you have worked together with on this assignment.
For each code cell:
• Add comment lines for each code requirement item.
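The selection, cleaning, and counting steps in Code Cells 2 to 6 can be sketched roughly as follows. This is a minimal illustration, not the expected solution: it uses only the Python standard library so it runs as-is, whereas the assignment asks for Pandas (where the selection step would look something like `df[df["stars"].isin([1, 5])]`). The column names `stars` and `text`, the sample rows, the stopword list, and the hand-tagged tokens are all assumptions made for the example. In a real notebook the tags would come from a part-of-speech tagger such as `nltk.pos_tag`, which returns `(word, tag)` pairs.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical rows standing in for restaurant_reviews_az.csv; the column
# names "stars" and "text" are assumptions, not taken from the assignment.
SAMPLE_CSV = """stars,text
5,"Great tacos and friendly staff. Great patio!"
1,"Cold food, rude staff. The food was awful."
3,"It was okay."
5,"Friendly service and great salsa."
"""

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "and", "it", "the", "was"}


def select_by_stars(rows, wanted=(1, 5)):
    """Group review texts by star rating, keeping only the wanted ratings."""
    selected = {star: [] for star in wanted}
    for row in rows:
        star = int(float(row["stars"]))
        if star in selected:
            selected[star].append(row["text"])
    return selected


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(texts, n=20):
    """Return the n most frequent cleaned tokens across the given reviews."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts.most_common(n)


def top_by_pos(tagged_tokens, tag_prefix, n=20):
    """Count words whose tag starts with tag_prefix (e.g. NN, JJ, VB)."""
    counts = Counter(
        word.lower() for word, tag in tagged_tokens if tag.startswith(tag_prefix)
    )
    return counts.most_common(n)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
reviews = select_by_stars(rows)
top_five_star = top_terms(reviews[5], n=3)
top_one_star = top_terms(reviews[1], n=3)

# Hand-tagged tokens (Penn Treebank style), standing in for tagger output.
TAGGED = [
    ("Great", "JJ"), ("tacos", "NNS"), ("friendly", "JJ"),
    ("staff", "NN"), ("Great", "JJ"), ("patio", "NN"),
]
top_adjectives = top_by_pos(TAGGED, "JJ", n=2)

print(top_five_star)
print(top_one_star)
print(top_adjectives)
```

Matching on a tag prefix is a deliberate simplification: in the Penn Treebank tag set, `NN` covers singular, plural, and proper nouns, `JJ` covers adjective forms, and `VB` covers verb forms, so one helper can serve Code Cells 4, 5, and 6.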
Onsite MSBA Applied Project, Spring 2024
W.P. Carey, ASU
Project Plan
Team 463: Darren Brito, Anne Silvia Dasi, Allen Guo, Pratiksha Pawar, Amy Wu
Topic:
Client: City of Mesa, Nicholas Labella
Contents
1. Project Objective
2. Project Plan
2.1 Project Timeline
2.2 Project Execution
2.3 Project Stakeholder Meetings
3. Project Resources
3.1 Stakeholders / Personnel
3.2 Data and Facility Resources
4. Risks and Issues
4.1 Item: Name of the risk or issue
4.2 Item: Name of the risk or issue
Text in Blue Italics is instructional and should be deleted or replaced. Text in normal, black font (e.g.,
title, headings) is intended to be retained.
1. Project Objective
On average, over 500 traffic crashes are reported to Mesa PD each month. Building on prior data
analysis projects, this project will further identify factors that contribute to traffic accidents,
including traffic volume, speed, location, time of day, day of week, and time of year, and will
predict the number of future accidents and their locations.
The objective of this project is to explore the role that location, such as proximity to schools
and retail areas, plays in traffic accidents. The project also aims to discover how construction
permits and traffic signs contribute to traffic accidents and traffic volumes.
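One way to make "proximity to schools" measurable is to compute the great-circle distance from each crash to the nearest school and flag crashes inside a chosen radius. The sketch below uses the haversine formula for this. The coordinates, the 0.6 km radius, and the function names are hypothetical and chosen only for illustration; they do not come from the project data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(point, places):
    """Distance from point to the closest of the given places."""
    return min(haversine_km(point[0], point[1], p[0], p[1]) for p in places)


def near_any(point, places, radius_km):
    """True if at least one place lies within radius_km of point."""
    return nearest_distance_km(point, places) <= radius_km


# Hypothetical coordinates, for illustration only.
crash = (33.4152, -111.8315)
schools = [(33.4200, -111.8315), (33.5000, -111.9000)]

distance = nearest_distance_km(crash, schools)
print(round(distance, 3), near_any(crash, schools, radius_km=0.6))
```

A flag like this could then be joined to the crash records as one candidate feature, alongside the time-of-day and traffic-volume factors named in the objective.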
2. Project Plan
2.1 Project Timeline
Please refer to the Onsite MSBA Applied Project Timeline.
Meeting/Phase | Date/TimeFrame | ASU Attendees | City of Mesa Attendees
Organize/Data Cleaning | Jan 15-Feb 2 | Team 463 |
Data Cleaning/EDA | Feb 5-Feb 23 | Team 463 |
Midterm Presentation | March 1, 2024 (Proposed) | Team 463, Faculty Advisors |
Feedback Implementation | Mar 4-Mar 22 | |
Visualization | Mar 25-Apr 12 | |
Presentation Prep | Apr 15-Apr 25 | |
Final Presentation | Apr 26, 2024 (Proposed) | |
Medium: Various
2.2 Project Execution
Purpose: Identify all Tasks required to complete the Applied Project Deliverables called out in the
Applied Project Proposal and Timeline. These might include tool and platform identification, acquiring
data, cleaning data, identifying methodology, feature extraction, building model, testing model, validating
model, etc.
For larger tasks, define appropriate sub-tasks (e.g., 2.1). The responsibilities of each team member
should be evident from the Task list. Each task should identify a team member as its owner. Entries
such as “team” do not provide adequate visibility into who has responsibility for the task’s success.
It is conceivable that you will update this table as the Project progresses. Continue to maintain the
integrity of task indices as you make updates – if a new task is included, use a new index; if an old task
is eliminated, strike through that row, but do not reuse the index.
Index | Task | Owner (team member) | Start Date | End Date | Dependencies (Task Indices) | Status*
1 | | | | | |
2 | | | | | |
2.1 | | | | | |
2.2 | | | | | |
3 | | | | | |
*(Not Started, In Progress, Blocked, Completed)
2.3 Project Stakeholder Meetings
Purpose: Identify all recurring and one-off Stakeholder meetings.
Stakeholders include any person involved in moving your Applied Project forward – such as Applied
Project Team Members, Faculty Directors, Instructors, Clients, and IT personnel.
Any meeting must have an objective (why are you meeting / what are you trying to accomplish) and next
steps (resolutions accomplished / action items identified). If you don’t have an objective, don’t meet.
It is conceivable that you will update this table as the Project progresses. Continue to maintain the
integrity of meeting indices as you make updates – if a new meeting is included, use a new index; if an
old meeting is eliminated, strike through that row, but do not reuse the index.
Index | Meeting Objective | Meeting Attendees | Start DateTime | End DateTime | Next Steps Identified (possibly a task index in Project Execution Plan)
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
3. Project Resources
3.1 Stakeholders / Personnel
Identify project stakeholders / personnel resources, including client, principal participating stakeholders,
the firm, and external resources.
Contact | Role | Contact Info
[name] | Client PoC | [phone, email, office address]
[name] | Client, Sr. Tech Advisor | [phone, email, office address]
[name] | Team, PoC & team lead | [phone, email, office address]
[name] | Team member | [phone, email, office address]
[name] | Domain Expert, ASU | [phone, email, office address]
3.2 Data and Facility Resources
Purpose: Include facility and data resource identification.
Resource | Description | Location | Access | Need | Constraints
 | Describe the information contained in the data, approximate size, format, etc. | Where is the data located? | What are the access requirements, e.g., authorized personnel, dates/times, duration, method. | When is the need date per the schedule? | What are the constraints on using the data?
 | Define the facility, lab, computing resource, etc. | Where is the resource located? | What are the access requirements, e.g., authorized personnel, dates/times, duration, method. | Identify the periods, frequency, duration, etc. for which the resource is needed. | What are the constraints on using the resource?

4. Risks and Issues
Purpose: This section tracks the risks and issues that could affect the successful completion of the
Applied Project. A Risk is elevated to an Issue when the suspense date has passed without adequate
resolution. This section should be updated as actions are performed and risks are retired. There is one
section for each risk item. Issues identified here will typically tie back to Blocked tasks in the Project
Execution section.
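The elevation rule above (a Risk becomes an Issue once its suspense date passes without resolution) can be expressed as a small status check. This is a minimal sketch: the `RiskItem` class, its field names, and the sample dates are hypothetical, and "adequate resolution" is reduced to a single boolean flag for simplicity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskItem:
    """One tracked item; the field names are illustrative assumptions."""
    name: str
    suspense_date: date
    resolved: bool = False


def current_status(item, today):
    """Return Risk, Issue, or Resolved for the item as of the given day."""
    if item.resolved:
        return "Resolved"
    if today > item.suspense_date:
        # The suspense date has passed without resolution.
        return "Issue"
    return "Risk"


# Hypothetical example item and dates.
item = RiskItem(name="Data accuracy and completeness",
                suspense_date=date(2024, 2, 23))

print(current_status(item, today=date(2024, 2, 20)))
print(current_status(item, today=date(2024, 3, 1)))
```

Treating the suspense date itself as still on time is a design choice in this sketch; a team could equally decide that the item is elevated on that day.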
Risks
• Data Privacy and Confidentiality: Collecting and analyzing data related to traffic accidents may
involve sensitive information. Ensuring the privacy and confidentiality of this data is paramount.
• Data Accuracy and Completeness: The reliability of conclusions drawn depends on the accuracy
and completeness of the data. Inaccurate or incomplete data can lead to incorrect predictions
and analyses.
• Complexity of Data Integration: Integrating data from diverse sources (traffic volumes,
construction permits, location data, etc.) can be complex and challenging.
• Resource Intensity: Such projects can be resource-intensive, requiring significant time, skilled
personnel, and technological resources for data collection, analysis, and model development.
• Dependence on External Factors: Traffic patterns are influenced by numerous external factors
(e.g., weather conditions, economic changes, policy changes), which may not be entirely
predictable or quantifiable.
• Ethical Considerations: Using data for predictive modeling raises ethical questions, especially if
it leads to enforcement or policy changes that could disproportionately affect certain
populations.
Problems
• Difficulties in Identifying Causal Relationships: Establishing a clear cause-and-effect
relationship between various factors (like proximity to schools or traffic signs) and accidents can
be challenging.
• Varying Impact of Construction and Traffic Signs: The impact of construction permits and traffic
signs on traffic volumes and accidents can vary widely based on numerous contextual factors.
• Dynamic Nature of Traffic Patterns: Traffic patterns are dynamic and can change rapidly due to
various factors, making predictions difficult.
• Potential for Data Misinterpretation: Data related to accidents, traffic volumes, and other
factors can be complex and subject to misinterpretation if not analyzed carefully.
• Impact of Unforeseen Events: Unpredictable events (like major accidents or road closures) can
significantly impact traffic patterns and accident rates, complicating predictive modeling.
• Challenges in Generalizability: Findings in Mesa might not be applicable to other regions due to
different geographical, cultural, and infrastructural factors.
4.1 Item: Name of the risk or issue
Status: Risk/Issue/Resolved
Receiver: Identify the activity, deliverable, organization, or person that would be impacted
Impact: Effect of the risk if unresolved: Low/Medium/High
Likelihood: Low/Medium/High
Suspense Date: Date
Responsible Party: Team member name
Resolution: Describe the resolution or mitigation
Activities:
Date: Describe action taken, changes in the risk itself, additional information, etc.
Date: Describe action taken, changes in the risk itself, additional information, etc.
Date: Describe action taken, changes in the risk itself, additional information, etc.
4.2 Item: Name of the risk or issue
