Description
Objective: The purpose of the Model Pipeline Design is to transform the requirements into complete and detailed system design specifications. The general process for building models is to set up the structure of the model (e.g., predictive, classifier, recommender, etc.), compile the model, fit the model, and evaluate the model.
Deliverable: Model Pipeline Design
Prepared for: [Stakeholder Name] (if applicable)
Project Name: Mortality in the elderly and public health insurance
Prepared by: [Name]
Contributors: [Document contributors] (if applicable)
Note: Make sure all instructions/prompts are removed before submission.
Design Planning Summary
Write an overview of this specific development project, a synopsis of the situation that led to the need (if applicable), and a short description of the issues that the development project is going to solve, as well as a general description of the proposed solution and the rationale for the solution.
Overview of Model Pipeline Design
1. Provide the high-level design of the proposed solution or business case with supporting narrative text. The design concepts must address the following:
o How will the data be obtained?
o How will the data be scrubbed or cleaned?
o How will the data be explored and visualized (e.g., to detect patterns and trends)?
o What data model will be used (e.g., how will you set up a predictive model)?
o What methods will be used to interpret the results of analysis?
2. Use the template to list the project deliverables. Include all components, features, and tasks your finished project is expected to perform.
Deliverable Acceptance Log
ID | Deliverable Description | Comments
1
2
3
4
5
Detailed Model Pipeline Design
Provide a detailed overview of how the proposed design fits into the overall solution/business case structure. Keep in mind, the purpose of the detailed model pipeline design is to provide sufficient information for a developer to implement the steps listed in the pipeline. The design overview should include:
1. The data sources
2. The dataset types and formatting
3. The data cleaning procedure
4. Method of initial data exploration and visualization
5. The data model used and its nature (e.g., predictive)
6. The methodology for interpreting the analysis results
7. Any configuration changes that will be required to develop and implement the proposed solution.
8. Describe the approach and resources required to assure system security, if applicable; otherwise, explain why security is not relevant.
9. Use the template to list the hardware and software technologies.
Hardware and Software Technologies
1 –
2 –
3 –
4 –
5 –
Projects Requirements Review
Prior to submitting the Milestone deliverable, review the prior milestone and ensure consistency throughout. The project may have evolved since the first proposal; therefore, some revisions may be required to maintain coherence and stay true to the original proposal. A copy of Milestone 1: Project Proposal is attached for review/reference.
Unformatted Attachment Preview
Capstone Project Proposal
General Information
Project Name: Mortality in the elderly and public health insurance
Author:
Date project proposal form is submitted: December 21, 2022
Project Overview and Project Objectives
State the Problem
Background
The purpose of this study is to investigate the relationship between health insurance coverage and
amenable mortality. The study relies on survey data on the population aged over 50 from the Irish
Longitudinal Study on Ageing (TILDA), as analysed by Nolan et al. (2022). These data make it possible
to determine whether a lack of health insurance is associated with mortality.
Project Objectives
The goal of this study is to provide evidence on some of the key reforms that should be implemented as
part of maintaining universal healthcare.
Challenges
Approval to access data.
Benefits and Opportunities
Describe the benefits or opportunities resulting from project implementation.
The benefit of this project is to show insurance companies the best way to provide health insurance to
the elderly so that their overall mortality is reduced.
It also gives public health insurance companies the opportunity to compete better with private health
insurance companies by using the results to target those in high-cost private plans, giving them
access to the same or better benefits at a lower cost.
Project Scope
1. This study will include detailed evidence of how health insurance impacts mortality risk. The
project will list some of the risks facing the elderly due to lack of health insurance.
2. List the work breakdown required to satisfy the project objectives. Identify teams and other
resources that may be required to successfully complete the project.
Work Breakdown Structure
ID | Task | Status | Dependencies/Effort Hours/Cost | Start and completion dates (as entered) | Resource
1 | Choose/Submit/Get approval on topic | Complete | N/A | 11/10/2022, 11/28/2022, 11/28/2022, 11/28/2022 | Laptop
2 | Submit proposal and requirement analysis | In progress | N/A | 12/21/2022, 12/21/2022, 12/21/2022, TBD | Laptop
3 | Conduct research and literature review for project | In progress | N/A | TBD, TBD, TBD | Laptop
4 | Data Understanding, Collection, and Preprocessing | Incomplete | Unknown | TBD, TBD, TBD | Laptop
5 | Design Plan | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
6 | Methodology Approach and Model Building | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
7 | Model Evaluation, Verification and Calibration | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
8 | Model Implementation | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
9 | Performance Analysis | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
10 | Final Presentation | Incomplete | N/A | TBD, TBD, TBD, TBD | Laptop
Dependencies and resources also listed: Access to data (TBD); ISSDA; Software available.
Project Completion
1. Work Breakdown Structure
1 – Choose/Submit/Get approval on topic: Approved topic for research
2 – Submit proposal and requirement analysis: Approval of proposal and requirement analysis
3 – Conduct research and literature review for project: Complete document containing literature review
4 – Data Understanding, Collection, and Preprocessing: Final clean data, ready for analysis
5 – Design Plan
6 – Methodology Approach and Model Building: A shortlist of models that are to be trained on the data.
7 – Model Evaluation, Verification and Calibration: Models with their performance indicated. Best
performing model selected.
8 – Model Implementation: Model deployed to servers
9 – Performance Analysis: Model performance evaluated and results collected
10 – Final Presentation
2. Assumptions and Constraints
ID | Description | Comments | Type | Status | Date Entered
1 | The research will contribute something to research | The assumption is that this research will add something to the knowledge base | Assumption | | 12/12/2022
2 | All respondents were honest | The assumption is that the data collected from respondents were honest and unbiased | Assumption | | 12/12/2022
3 | All the data we want will be collected | We assume that whatever data we want can be found | Assumption | | 12/12/2022
4 | Time will be enough to get all the data, analyze and train models | Given the project time and the large geographical areas, time will not be enough | Constraint | | 12/12/2022
5 | Biased point of view from respondents | Despite telling the truth, the respondents are biased based on their own views | Constraint | | 12/12/2022
6 | Results based only on the data collected | We cannot get any other results, only those from the data collected | Constraint | | 12/12/2022
Project Controls
1. Risk Management
Columns: Event Risk (What is the risk?) | Risk Probability (high, medium, low) | Risk Impact (What is the impact if the risk occurs?) | Risk Mitigation (What can be done to minimize the risk?) | Contingency Plan (What can be done to minimize the impact of the risk?)
Event Risk: No past research material, or restricted past research
Risk Probability: Low to Medium
Risk Impact: Risk of having a redundant project, or no proper guidance on areas to improve on
Risk Mitigation: “Ask for help from Instructor”
Contingency Plan: “Start a new project on a different topic”
Event Risk: Cost to access data
Risk Probability: Low
Risk Impact: Project not being completed as expected or on time. If data needs to be simulated,
misleading information or biased opinions could occur. Data may have to be collected by myself,
which runs the risk of no participants.
Risk Mitigation: Gauge possible cost of access to data, ask for assistance. Utilize other free data
resources on the same topic.
Contingency Plan: Change data parameters of the project to data that is free and available. Change
project topic.
2. Change Control Log
Columns: ID, Change Description, Priority, Originator, Date Entered, Date Assigned, Evaluator, Status, Date of Decision, Included in Rev. #
1 | New unexpected data | High | Data collection team | 12/12/2022 | Yes
2 | Data fields not found when collected | High | Data collection team | 12/12/2022 | Yes
3 | Change in data collection techniques | High | Data collection team | 12/12/2022 | Yes
4 | Change in software used for data analysis | Low | Data Analysis team | 12/12/2022 | Yes
3. Use the template to describe how the end user is involved in the software development, if
applicable. Include relevant information about meetings, reviews, presentations, etc.
Roles and Responsibilities
ID | Name | Project Role | Responsibility
1 | Cha Nesha | Collect data from online sources
• Search online for data
• Compile data into a single location and format
2 | Cha Nesha | Clean and reformat data
• Change the data into a format that allows for data analysis
• Deal with missing data
• Add or remove data fields
3 | Cha Nesha | Analyze data
• Find patterns in the data
• Make recommendations based on the insights on the data
• Prepare a presentation of the results
4 | Cha Nesha | Train, test and deploy models
• Train models on the data provided
• Test the most suitable model
• Deploy the model for use
5 | Cha Nesha | Report and Presentation writing
• Compile results
• Do a literature review of similar past projects
• Make final presentation and present
Project Schedule
Cost Estimate (if applicable)
1. The project being designed may not require any cost. This will be re-evaluated later in the
project.
Issue Log
1. Issues Log
ID | Issue Description (What is the issue?) | Project Impact (How will this impact scope, schedule & cost?) | Action Plan/Resolution (How do you intend to deal with this issue?) | Owner (Who manages this issue?) | Importance | Date Entered | Date to Review | Date Resolved
1 | Time constraint. Time allotted for data collection may require more time. | Some data may not be collected or accessible by the data collection deadline. | Use other methods or resources to collect the needed data. Request/add more time to the data collection part of the project. | Cha Nesha | High | 12/12/2022 | TBD | TBD
2 | Data may be vast/big | Takes time to get results. Project complexity increased. | Possibly minimize the amount of data that will be trained and tested, but not so much that results are skewed or show no relevance. | Cha Nesha | High | 12/12/2022 | TBD | TBD
Overall Instructor Feedback/Comments
Integrated Instructor Feedback into Project Documentation
☐ Yes ☐ No
Project Approval
☐ Instructor
Requirements Analysis
Mortality in the elderly and public health insurance
Use Cases
– Researcher creates account
– Researcher logs in
– Researcher enters details
– Researcher enters details for data collection
– System combines data, analyzes and displays results on dashboard
System Design
1. Define the research topic/research questions
2. Applying the knowledge learned in the course work to identify objectives, issues and possible
solutions to the research topic
3. Collecting domain knowledge using literature review
4. Collect data from the field, clean and analyze it, get insights, make recommendations, and train
and test the model
Business Problem: Picking a research topic
Domain Knowledge (Proposal Formulation)
• Formulation of research questions
• Formulation of objectives
• Possible solutions and limitations in the field
Information: Review of past research (Literature Review)
Data and Facts
• Data collection
• Data cleaning
• Data engineering
• Data analysis
• Model training
• Model testing
• Model deployment
Technical Requirements
Availability
Availability is a requirement that is best expressed as a metric: the percentage of time during which
the results of the project or research are accessible to the public. To meet this requirement, the
results will be published publicly.
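As a minimal sketch of the availability metric described above, assuming availability is computed as accessible time divided by total time in the measurement window. The function name and the example figures are hypothetical and are not part of the proposal:

```python
def availability_percent(accessible_hours, total_hours):
    """Return the share of the window in which the results were accessible, as a percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= accessible_hours <= total_hours:
        raise ValueError("accessible_hours must lie between 0 and total_hours")
    return 100.0 * accessible_hours / total_hours


# Hypothetical example: results reachable for 714 of the 720 hours in a 30-day month.
print(round(availability_percent(714, 720), 2))  # 99.17
```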
Data integrity
Data quality is a technical requirement: the data and information must be of a quality that can be
used for operational and decision-making procedures. Data needs to be of high quality to produce
accurate results or an accurate assessment of the situation. To achieve this, experienced interviewers
will be used, answers will be double-checked, and follow-ups will be done with the respondents.
Human error
The software must be able to recognize when users have entered incorrect data. When this problem is
found, the software alerts the user and suggests that they correct the inconsistency. Setting up data
validation procedures on the databases where data will be stored will reduce human error.
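One way such a validation procedure might look is sketched below. The field names (`age`, `has_public_insurance`, `deceased`) and the age bounds are assumptions made for illustration; the lower bound of 50 reflects the study population described in the background, and the real rules would follow the actual dataset's codebook.

```python
def validate_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    age = record.get("age")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age must be a whole number")
    elif not 50 <= age <= 120:  # assumed plausible range for this study
        problems.append("age must be between 50 and 120")
    for field in ("has_public_insurance", "deceased"):  # hypothetical field names
        if not isinstance(record.get(field), bool):
            problems.append(f"{field} must be true or false")
    return problems


# A record with two entry errors: the user is told what to correct.
for problem in validate_record({"age": 47, "has_public_insurance": "yes", "deceased": False}):
    print(problem)
```

Returning every problem at once, rather than stopping at the first, lets the person entering the data correct all inconsistencies in one pass.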
Information security
This requirement relates to the encryption and security of user credentials and personal data inside
an online database or while the data is being transferred. Highly sensitive material would need to be
encrypted in order to maintain this level of protection. Only researchers and the supervisor will have
access to the data and research. This is achieved through the use of passwords, access request forms,
etc.
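For the password part of this requirement, a common approach is to store a salted hash of each password rather than the password itself. The sketch below uses PBKDF2 from Python's standard library; note that this is hashing, not encryption, and the iteration count and function names are assumptions rather than anything specified in the proposal.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to the deployment hardware


def hash_password(password, salt=None):
    """Return (salt, derived_key) for storage; the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Re-derive the key from the attempt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, stored_key = hash_password("example-passphrase")
print(verify_password("example-passphrase", salt, stored_key))  # True
print(verify_password("wrong-guess", salt, stored_key))         # False
```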
Data Science Model
1. Business Problem/Objective
2. Data Requirements
3. Data Collection
4. Data Cleaning and Processing
5. Exploratory Data Analysis
6. Model Building
7. Evaluation
8. Communicating Model Results
9. Model Deployment
10. Maintenance and Monitoring
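The cleaning and exploratory analysis stages listed above could be sketched roughly as follows. The records are synthetic toy values invented for illustration (they are not TILDA data), and the field names are hypothetical. The output is a crude proportion per group, which the later modelling stages would need to adjust for other factors.

```python
from collections import defaultdict

# Synthetic toy records for illustration only; not real survey data.
records = [
    {"insured": True, "deceased": False},
    {"insured": True, "deceased": False},
    {"insured": True, "deceased": True},
    {"insured": False, "deceased": True},
    {"insured": False, "deceased": False},
    {"insured": None, "deceased": False},  # missing coverage status
]


def clean(rows):
    """Data cleaning: drop rows with a missing coverage or outcome value."""
    return [r for r in rows if r["insured"] is not None and r["deceased"] is not None]


def mortality_by_coverage(rows):
    """Exploratory analysis: crude proportion deceased within each coverage group."""
    deaths, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r["insured"]] += 1
        deaths[r["insured"]] += int(r["deceased"])
    return {group: deaths[group] / totals[group] for group in totals}


rates = mortality_by_coverage(clean(records))
for insured, rate in rates.items():
    print(f"insured={insured}: {rate:.2f}")
```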
Reports
1. A literature review of past research studies similar to the research topic
2. A report on the data collection methodology
3. A report on the data cleaning procedures
4. A report on the findings from data analysis
5. A report on model training, testing and deployment
6. An aggregate report of the research study from problem formulation to model deployment
Screen Definitions and Layouts
This research study is a data science/statistical study, which means that no software product is being
created. Data will be collected and analyzed, and machine learning models will be trained, tested
and deployed. There will therefore be no need to develop a software product.
Security
This research study is a data science/statistical study, which means that no software product is being
created. Data will be collected and analyzed, and machine learning models will be trained, tested
and deployed. Because no software product is being developed, there is no need for a security matrix.
The only security needed will be that of the product on which the machine learning model will be
deployed.
Other (as dictated by the context and scope of the project)
References
ISSDA | The Irish Longitudinal Study on Ageing (TILDA). (n.d.). https://www.ucd.ie/issda/data/tilda/
Nolan, A., May, P., Matthews, S., Normand, C., Kenny, R. A., & Ward, M. (2022). Public health insurance
and mortality in the older population: Evidence from the Irish Longitudinal Study on
Ageing. Health Policy, 126(3), 190-196.