DATA MINING AND MACHINE LEARNING (EBUS537)
Individual Assignment 2 – Development of a novel data mining application
Set by Dr Eric Leung
Submission Deadline: 12th January 2024, 12:00 noon
Contribution to the Final Mark of Module: 50%
Maximum Word Length: 1800 words (excluding references and appendices)
Requirements:
Data mining and machine learning tools have a wide variety of applications. In this
open-ended question, you are free to choose any dataset and any particular tool
associated with clustering, association rule learning and fuzzy logic (e.g. K-means
clustering, Apriori algorithm, etc.). Imagine that you are a data analyst who is
responsible for mining hidden patterns from datasets. Specifically, you are required
to:
1. Suggest an applicable area where a tool can be applied –
a. Brainstorm and suggest a real-life scenario where the data mining or
machine learning tools taught in the module can be applied to generate
insights from a dataset (see Remark (i))
b. Discuss why this tool is selected for this scenario
2. Identify and discuss a dataset – The dataset can be an open dataset from
various open sources (see Remark (ii)). Alternatively, if no dataset is available
for the application/scenario you brainstormed in Step 1, you may create a
virtual dataset, which contains at least 50 data objects/observations so that a
data mining tool can further be applied.
a. Introduce the dataset: what the dataset contains (types of data, data
variables, etc.), the source of the dataset (if obtained from a database
website) (see Remark (iii))
b. Discuss the potential insights that can be generated by applying the
selected tool to mine the data from the selected dataset
3. Apply the selected tool on the dataset you picked/created in Step 2 –
a. Discuss and interpret the data mining results after applying the tool
b. Discuss the novelty and significance of this application
Remark:
(i)
Your chosen application should be new. In other words, you should not select
an application area exactly the same as any existing one that can be found
in the literature. This exercise allows you to think outside the box and identify
promising application areas for a data mining and machine learning tool.
(ii)
There are numerous open database websites that enable you to identify and
retrieve a free, public dataset. They include, but are not limited to, Google
Dataset Search, Kaggle, Datahub.io, UCI Machine Learning Repository,
Earth Data, and the Global Health Observatory Data Repository. Search online
for others and identify a dataset, or create a virtual dataset!
(iii)
Your submission should include the dataset (in Excel format) you picked.
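As an illustration only, Steps 2 and 3 could be sketched as follows for the virtual-dataset route with K-means clustering. This is a minimal sketch, not a model answer: the variable names (`weekly_visits`, `avg_spend`), the made-up group centres, the choice of three clusters and the helper functions are all assumptions introduced for the example, and it uses only the Python standard library rather than any particular tool taught in the module.

```python
import random


def make_virtual_dataset(n=50, seed=0):
    """Create n synthetic observations (hypothetical: weekly_visits, avg_spend)."""
    rng = random.Random(seed)
    # Made-up group centres, used only to give the virtual data some structure.
    centres = [(2.0, 10.0), (5.0, 40.0), (9.0, 20.0)]
    data = []
    for i in range(n):
        cx, cy = centres[i % len(centres)]
        weekly_visits = rng.gauss(cx, 0.8)
        avg_spend = rng.gauss(cy, 4.0)
        data.append((weekly_visits, avg_spend))
    return data


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=3, iterations=100, seed=0):
    """Basic K-means (Lloyd's algorithm): assign points, then update centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # no assignment changed, so the run has converged
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    dataset = make_virtual_dataset()  # 50 observations, the stated minimum
    labels, centroids = kmeans(dataset, k=3)
    for j, centre in enumerate(centroids):
        size = labels.count(j)
        print(f"cluster {j}: {size} observations, centre = {centre}")
```

Interpreting the resulting cluster sizes and centres against the chosen scenario is the part Step 3 asks you to discuss. In practice the features would normally be rescaled first, because variables on larger scales dominate the Euclidean distance.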