Description
Methodology steps for unsupervised learning with Isolation Forest, One-Class SVM, and autoencoders using Keras. A base document has been written but needs to be expanded, and the writing format and style need to be changed to match the example (the attached PDF document).
Attachment Preview
Methodology Steps for Unsupervised Learning with Isolation Forest, One-Class SVM, and
Autoencoders using Keras.
Student’s Name
Professor’s Name
Course Title
Date
Methodology Steps for Unsupervised Learning with Isolation Forest, One-Class SVM, and
Autoencoders using Keras.
Abstract
Unsupervised learning is a branch of machine learning that plays a significant role in data analysis, pattern discovery, and anomaly detection. Unlike supervised learning, in which algorithms are trained on labeled data, unsupervised learning works with unlabeled data to uncover hidden structures. Essentially, unsupervised learning lets computers explore data on their own and identify inherent similarities or differences. The key concepts in unsupervised learning are clustering and dimensionality reduction: clustering groups similar data points, while dimensionality reduction makes complex datasets easier to visualize and analyze. Unsupervised learning techniques are therefore important in anomaly detection, where the main objective is to identify abnormal instances in a dataset without prior knowledge of what constitutes normal behavior. This paper details a comprehensive methodology for anomaly detection using Isolation Forest, One-Class SVM, and autoencoders. The algorithms are implemented within the Keras/Python ecosystem, with step-by-step instructions for data preprocessing, model construction, training, and evaluation.
Introduction
Anomalies in data are significant because they indicate faulty behavior, so locating anomalies helps to find the source of problems (Livari, 2022). Unsupervised learning techniques such as Isolation Forest, One-Class SVM, and autoencoders are effective at detecting anomalies in unlabeled data. Anomaly detection is a crucial task applied across industries, including fraud detection, cybersecurity, and industrial monitoring. This paper details a comprehensive methodology for anomaly detection using Isolation Forest, One-Class SVM, and autoencoders. The algorithms are implemented within the Keras/Python ecosystem, with step-by-step instructions for data preprocessing, model construction, training, and evaluation.
Data Preprocessing
Data preprocessing entails the preparation of raw data for machine learning. It is the first step in building a model, and it is also time-consuming. Real-world data often contains missing elements that can render it useless; data preprocessing adjusts and refines the data to make it usable. As detailed by Fan et al. (2021), the development of data science and the increasing availability of operational data have created opportunities for data-driven solutions, and data preprocessing lays the foundation for valid data analyses.
Data preprocessing is an indispensable step in data analysis, given the intrinsic complexity of data and common deficiencies in data quality. It is required to enhance the validity and reliability of data analysis (Simplilearn, 2023): it removes outliers and fills the gaps created by missing values. Data preprocessing follows several steps (a minimal code sketch follows the list):
a) Load the dataset and import the libraries: a library is a set of functions used in algorithms, and the loaded data is what the machine learning algorithms consume. This is the first and most important step, as it collects the data and imports it for further analysis (Simplilearn, 2023).
b) Check for missing values: this step entails data cleaning and transformation. The loaded data is assessed to ensure there are no missing values. If missing values are found, one option is to remove the entire row containing them; this risks losing essential data, but can be acceptable when the dataset is large. Alternatively, the missing value can be estimated with a measure of central tendency such as the mean, median, or mode (Simplilearn, 2023).
c) Data arrangement: machine learning algorithms do not comprehend non-numeric data, so the loaded data must be arranged in numerical form to avoid problems in the later stages of analysis. Converting all text values into numerical form, for example with the LabelEncoder() function, addresses this (Simplilearn, 2023).
d) Feature scaling and normalization: scaling converts data into smaller ranges and helps ensure uniformity in the distribution of the data.
e) Train-test split: data is divided into training, evaluation, and validation sets (Simplilearn, 2023), which allows model performance to be evaluated.
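The sketch below is a minimal, illustrative walk-through of these steps in Python, assuming a tabular CSV file with a text label column; the file name "data.csv" and the column name "label" are placeholders, not part of the source methodology.

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, MinMaxScaler
    from sklearn.model_selection import train_test_split

    # a) Load the dataset and import the libraries
    df = pd.read_csv("data.csv")                      # hypothetical file name

    # b) Check for missing values; impute numeric gaps with the median
    print(df.isnull().sum())
    df = df.fillna(df.median(numeric_only=True))

    # c) Data arrangement: convert non-numeric columns to numerical form
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col])

    # d) Feature scaling and normalization to a short, uniform range
    X = MinMaxScaler().fit_transform(df.drop(columns=["label"]))  # placeholder column
    y = df["label"].to_numpy()

    # e) Train-test split for later model evaluation
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)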
Isolation Forest
As detailed by Gao et al. (2019), Isolation Forest is an unsupervised machine learning algorithm that builds binary trees to detect anomalies. It detects anomalous data points by using path length to compute anomaly scores. Isolation Forest can be compared to a decision tree, in which terminal-node statistics determine a class. It processes random sub-samples of the data, splitting on randomly selected features: points that require many splits to isolate are unlikely to be anomalies, while points isolated after only a few splits are likely anomalies. This isolation of points is where the name comes from.
Furthermore, data points are isolated through recursive splitting on random split values, and they end up at terminal nodes as the isolation tree is grown. Rath (2023) notes that finding anomalous data points relies on the path length, the number of edges needed to reach a terminal node from the root node.
With Isolation Forest, the following steps are followed (a code sketch follows the list):
a) Implementation: the isolation forest model is constructed within the same Python workflow (note that scikit-learn, rather than Keras itself, provides the standard IsolationForest implementation).
b) Construction of the model: the structure of the isolation forest model is defined, including the number of trees and the maximum tree depth.
c) Tuning: hyperparameters such as the subsampling size and the number of decision trees are optimized.
d) Training: the isolation forest model is fitted to the training data.
e) Evaluation: the model's performance is assessed with metrics such as recall, precision, and F1 score.
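A minimal sketch of these steps follows, reusing the X_train/X_test/y_test arrays from the preprocessing sketch; the hyperparameter values are illustrative.

    from sklearn.ensemble import IsolationForest
    from sklearn.metrics import precision_score, recall_score, f1_score

    # b)/c) Define the model structure and tune key hyperparameters: the
    #       number of trees (n_estimators) and the subsampling size (max_samples)
    model = IsolationForest(n_estimators=100, max_samples=256,
                            contamination=0.05, random_state=42)

    # d) Fit the model to the training data
    model.fit(X_train)

    # e) predict() returns -1 for anomalies and 1 for normal points; map to
    #    0/1 and score against ground-truth labels where they are available
    pred = (model.predict(X_test) == -1).astype(int)
    print(precision_score(y_test, pred), recall_score(y_test, pred),
          f1_score(y_test, pred))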
One-Class SVM
SVM is a powerful nonparametric classification technique grounded in statistical learning theory. According to Hejazi and Singh (2013), One-Class SVM is a kernel-based technique: training data in input space is mapped into a feature space through a kernel function, and a hyperplane with an optimal margin is then found in feature space to separate the mapped data from the origin. Real-world datasets often have imbalanced class distributions, in applications such as text categorization, identification of fraudulent credit card transactions, and detection of objects in satellite images. As a support vector machine algorithm, One-Class SVM identifies outliers relative to the distribution of normal data. The following steps are followed (a code sketch follows the list):
a) Implementation: the One-Class SVM is implemented within the same Python workflow (scikit-learn provides the standard OneClassSVM implementation; Keras itself does not include this model).
b) Construction of the model: the parameters of the One-Class SVM are identified and specified, such as the kernel type and the regularization parameters.
c) Tuning of hyperparameters: this can be done with techniques such as randomized search, which tries random combinations of hyperparameters to identify optimal values for the model.
d) Training: the One-Class SVM model is fitted to the training dataset.
e) Evaluation: the model's performance is evaluated using metrics such as accuracy, recall, and precision.
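A minimal sketch follows, again using scikit-learn (OneClassSVM) rather than Keras, with a small randomized search over illustrative hyperparameter values; X_train, X_test, and y_test are assumed from the preprocessing sketch.

    from sklearn.svm import OneClassSVM
    from sklearn.model_selection import ParameterSampler
    from sklearn.metrics import f1_score

    # b)/c) Specify the kernel and regularization, tuning nu and gamma with a
    #       small randomized search (candidate values are illustrative)
    param_grid = {"nu": [0.01, 0.05, 0.1], "gamma": ["scale", 0.1, 1.0]}
    best_model, best_f1 = None, -1.0
    for params in ParameterSampler(param_grid, n_iter=5, random_state=42):
        ocsvm = OneClassSVM(kernel="rbf", **params).fit(X_train)  # d) training
        pred = (ocsvm.predict(X_test) == -1).astype(int)  # -1 marks outliers
        score = f1_score(y_test, pred)                    # e) evaluation
        if score > best_f1:
            best_model, best_f1 = ocsvm, score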
Autoencoders
Autoencoders receive an input and transform it into a different representation. According to Chen and Guo (2023), autoencoders can be stacked into hierarchical deep models that organize, compress, and extract high-level features from unlabeled data; they therefore enable unsupervised learning and the extraction of non-linear features. An autoencoder network converts an input vector into a code vector using a set of recognition weights. In unsupervised learning, autoencoders are used to compress data and reduce its dimensionality, and the original data can then be reconstructed from the compressed representation. Autoencoders comprise encoder, code, and decoder layers. The encoder layer compresses the input into a latent-space representation of reduced dimension; the code layer holds this compressed representation, which is fed to the decoder layer; and the decoder layer reconstructs the input from its encoded form. Autoencoders are therefore used for feature learning as well as reconstruction tasks.
The following steps are key (a code sketch follows the list):
a) Implementation with Keras: an autoencoder is designed and trained using Keras.
b) Construction of the architecture: the autoencoder model is defined, including the number of layers, the neurons per layer, and the activation functions.
c) Tuning of hyperparameters: hyperparameters such as the batch size, dropout rate, and learning rate are optimized.
d) Training: the autoencoder model is trained to reconstruct input instances.
e) Calculation of errors: the reconstruction error is calculated for every instance in the loaded dataset.
f) Evaluation: metrics such as mean squared error and cosine similarity are used to measure the performance of the autoencoder model.
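The sketch below is a minimal Keras autoencoder for reconstruction-based anomaly detection; the layer sizes, training settings, and 95th-percentile threshold are illustrative choices, and X_train/X_test are assumed from the preprocessing sketch.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    n_features = X_train.shape[1]

    # b) Architecture: encoder -> code (bottleneck) -> decoder
    autoencoder = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(16, activation="relu"),            # encoder
        layers.Dense(4, activation="relu"),             # code layer
        layers.Dense(16, activation="relu"),            # decoder
        layers.Dense(n_features, activation="sigmoid"), # reconstruction
    ])

    # c)/d) Set the learning rate and batch size, then train the model to
    #       reconstruct its own inputs
    autoencoder.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                        loss="mse")
    autoencoder.fit(X_train, X_train, epochs=50, batch_size=32, verbose=0)

    # e)/f) Reconstruction error per instance; unusually large errors flag
    #       anomalies (mean squared error shown; cosine similarity also works)
    recon = autoencoder.predict(X_test)
    errors = np.mean((X_test - recon) ** 2, axis=1)
    anomalies = errors > np.percentile(errors, 95)  # illustrative threshold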
Comparative Analysis
In this step, the performance of Isolation Forest, One-Class SVM, and autoencoders is compared to determine which technique offers the best detection accuracy, computational efficiency, and scalability. Isolation Forest is designed to work well with large datasets, providing speed and efficiency. It requires relatively few computational resources, making it a cost-efficient solution for anomaly detection, and it handles high-dimensional data with minimal preprocessing. Isolation Forest is thus an attractive option for detecting anomalies in large datasets, although it may not detect anomalies efficiently in certain types of data.
One-Class SVM, on the other hand, works well with unstructured data, and it carries less risk of overfitting because it generalizes well in practice. However, the final model can be hard to interpret, including the weights of the variables and their individual impact. Autoencoders are flexible and adaptable to different types of data: they can capture both linear and non-linear relationships, which makes it possible to learn complex or abstract features. However, autoencoders require more time and resources to tune and train, and they are sensitive to the choice of hyperparameters, which can affect performance and quality and may require trial and error to arrive at optimal values.
Conclusion
To sum up, unsupervised learning plays a significant role in data analysis, pattern discovery, and anomaly detection. Unlike supervised learning, which trains algorithms on labeled data, unsupervised learning works with unlabeled data to uncover hidden structures, letting computers explore data on their own and identify inherent similarities or differences. Its key concepts are clustering, which groups similar data points, and dimensionality reduction, which makes complex datasets easier to visualize and analyze. Unsupervised learning techniques are therefore important in anomaly detection, where the objective is to identify abnormal instances in a dataset without prior knowledge of what constitutes normal behavior. Isolation Forest, One-Class SVM, and autoencoders are three such techniques.
References
Chen, S., & Guo, W. (2023). Auto-encoders in deep learning—A review with new perspectives. Mathematics, 11(8), 1777.
Fan, C., Chen, M., Wang, X., Wang, J., & Huang, B. (2021). A review on data preprocessing techniques toward efficient and reliable knowledge discovery from building operational data. Frontiers in Energy Research, 9, 652801.
Gao, R., Zhang, T., Sun, S., & Liu, Z. (2019). Research and improvement of isolation forest in detection of local anomaly points. Journal of Physics: Conference Series, 1237(5), 052023. IOP Publishing.
Hejazi, M., & Singh, Y. P. (2013). One-class support vector machines approach to anomaly detection. Applied Artificial Intelligence, 27(5), 351–366.
Livari, A. (2022). Anomaly detection techniques for unsupervised machine learning.
Rath, D. P. (2023). Isolation Forest: An overview. Retrieved from https://www.linkedin.com/pulse/isolation-forest-overview-debi-prasad-rath-mnzqc
Simplilearn. (2023). Data preprocessing in machine learning: A beginner's guide. Retrieved from https://www.simplilearn.com/data-preprocessing-in-machine-learning-article
Chapter 3—Methodology
3.1 Introduction
The methods within this chapter describe how privacy-preserving federated learning (FL) enables detection of abnormal behavior within an IoMT ecosystem. Our charge is to define an Intrusion Detection System (IDS) that detects nefarious activity on an Internet of Medical Things (IoMT) edge network in near real-time. To mitigate and contain potential threats, detection at the edge nodes is paramount. This praxis tests the performance of centralized and federated ML models, and the research evaluates which model has superior performance in terms of highest prediction speed, prediction accuracy, F1 score, and AUC-ROC value, and lowest false alarm rate. Deep Neural Networks (DNN) and XGBoost were selected as the primary learners for the study. The federated learning analysis is based on the FedAvg algorithm.
The construct of our proposed FL/IDS system expands upon the research from FIDChain (Ashraf E, 2022). FIDChain integrates federated learning and blockchain within a four-tier architecture, as previously described in Chapter 2. FIDChain was trained and validated on an IoT NetFlow traffic-based dataset (BoT-IoT) to train a NIDS model leveraging an artificial neural network (ANN) in a federated construct. The common FL algorithm FedAvg was applied to aggregate weights and generate a centralized/global model. Artificial Neural Networks (ANN) and eXtreme Gradient Boosting (XGBoost) algorithms were used to develop the model, and the BoT-IoT dataset provided the foundation for evaluating the model's performance. Our research extends the model's resilience in detecting anomalies by using telemetry generated from heterogeneous IoMT devices, with more attack and biometric sources, via the WUSTL-EHMS-2020 dataset, which is more aligned with the healthcare domain.
Our architecture comprises multiple layers to provide resilience, expediency, and accuracy in attack detection in IoMT networks. The layers fall into four primary categories: the IoMT Device-Sensor Layer, the Edge Security Layer (ESL), the Cloud Layer, and the Business Layer. The layers are described below:
IoMT Device Layer: contains the sensory IoMT devices that capture patient data.
Edge Security Layer: provides the local compute, network, and storage that harvests raw data from the IoMT devices. The edge server employs NIDS hosted on edge computing/fog nodes. Its primary function is to normalize data, examine NetFlow traffic, and perform binary classification of network-based attacks. The fog nodes serve as the federated clients that participate in model training and send data upstream to the global/centralized server. The clients also perform local training/validation to continually optimize the federated NIDS. Employing NIDS at this layer mitigates the risk of further network penetration and enables rapid containment of threats at the edge. The local model metadata/gradients and weights are written to the blockchain and ultimately ingested by the central model. Note that for our research all clients are equally weighted. Attack detection turnaround time is lessened because detection resources sit in close proximity to the sensor layer. Additionally, there is a reduced burden on computing resources and processing capacity, as the FL model ingests smaller iterations of data compared to CL methods. Once the model learning process is finished, the local model weights of each client are ported to a blockchain-distributed ledger and stored in linked, chained blocks that connect gateway nodes with a server node in the next (cloud) layer. The proposed IDS also operates at this layer to identify threat actors at the edge nodes. Lastly, a permissioned blockchain is incorporated to secure the ingress/egress traffic to the cloud layer. A known drawback of federated learning is the reliance on trusted nodes, which may introduce poisoning attacks: an adversary may send malicious data to the central server. The immutability and consensus (via smart contracts) characteristics of blockchain mitigate this risk.
Cloud Layer: provides cloud services to store patient data and brokers connections between healthcare and provider networks via blockchain. Cloud services can be on-premises, outsourced to a hyperscaler, or hybrid.
Business Layer: data is presented at this layer for various healthcare systems and consumed by providers for patient analytics, diagnosis, and treatment. This is the main interface point for exchanging information between the provider and patient.
Due to the heterogeneity of the IoMT devices that sense, detect, and transmit data to the IoMT gateway, securing the gateway is vital to protect ingress/egress traffic. IoMT devices have limited storage, compute, and network resources, making it infeasible to deploy native IDS capabilities on them. Thus, fog nodes are deployed within the Edge Security Layer (ESL) to provide sufficient processing capability at the edge to ingest IoMT telemetry and transmit data to the cloud layer. Additionally, fog nodes minimize latency, since frequently accessed data can remain local, negating a round trip to the cloud for every transaction. The IDS is hosted in this layer because heterogeneous devices are transmitting data that must be aggregated and analyzed for suspicious activity. The proposed architecture is reflected in Figure 7.
[Figure: four tiers — Business Layer (provider diagnosis), Cloud Layer with blockchain (Hyperledger Fabric), Edge Security Layer with fog nodes, IoMT Device Layer]
Figure 7: Proposed Architecture
Our solution enhances the FIDChain framework by utilizing the WUSTL-EHMS-2020 dataset, which has greater alignment with healthcare telemetry (Ashraf E, 2022).
3.2 WUSTL-EHMS-2020 Dataset
The WUSTL-EHMS-2020 dataset was selected because it combines both network traffic and biometric data to closely emulate an IoMT ecosystem. The dataset comprises 44 features: 35 network-traffic features and 8 biometric features, as reflected in Table 3. The data was collected using a real-time Enhanced Healthcare Monitoring System (EHMS) (Hady AA, 2020). Incorporating both network flow metrics and patient biometrics creates an enriched dataset; traditional datasets typically contain only one of these two types of features. This provides more robust learning for ML training and validation aligned with IoMT ecosystems. MiTM cyber-attack types were incorporated, yielding a dataset of more than 16 thousand records containing normal and attack healthcare data. Dataset statistics are reflected in Table 2 below (Hady AA, 2020):
Table 2: WUSTL-EHMS-2020 Dataset Statistics (Hady AA, 2020)
The dataset comprises two attack categories: data alteration and spoofing. The attack categories are targeted to disrupt data integrity and confidentiality. In data alteration, the adversary alters communication between the patient sensors and providers; this alteration may lead to misdiagnosis, incorrect calibration of critical devices, and other effects on patient devices that may degrade the patient's health. There are eight biometric features within the dataset. When a sensory device is operating properly, it outputs benign NetFlow traffic; conversely, if adversaries attack the IoMT devices, each IoMT device generates anomalous traffic data.
Dataset features are described in Table 3 (Hady AA, 2020):

Metric            Description                                                  Type
SrcBytes          Source Bytes                                                 Flow Metric
DstBytes          Destination Bytes                                            Flow Metric
SrcLoad           Source Load                                                  Flow Metric
DstLoad           Destination Load                                             Flow Metric
SrcGap            Source missing bytes                                         Flow Metric
DstGap            Destination missing bytes                                    Flow Metric
SIntPkt           Source Inter Packet                                          Flow Metric
DIntPkt           Destination Inter Packet                                     Flow Metric
SIntPktAct        Source Active Inter Packet                                   Flow Metric
DIntPktAct        Destination Active Inter Packet                              Flow Metric
SrcJitter         Source Jitter                                                Flow Metric
DstJitter         Destination Jitter                                           Flow Metric
sMaxPktSz         Source Maximum Transmitted Packet Size                       Flow Metric
dMaxPktSz         Destination Maximum Transmitted Packet Size                  Flow Metric
sMinPktSz         Source Minimum Transmitted Packet Size                       Flow Metric
dMinPktSz         Destination Minimum Transmitted Packet Size                  Flow Metric
Dur               Duration                                                     Flow Metric
Trans             Aggregated Packets Count                                     Flow Metric
TotPkts           Total Packets Count                                          Flow Metric
TotBytes          Total Packets Bytes                                          Flow Metric
Loss              Retransmitted or Dropped Packets                             Flow Metric
pLoss             Percentage of Retransmitted or Dropped Packets               Flow Metric
pSrcLoss          Percentage of Source Retransmitted or Dropped Packets        Flow Metric
pDstLoss          Percentage of Destination Retransmitted or Dropped Packets   Flow Metric
Rate              Number of Packets per Second                                 Flow Metric
SrcAddr           Source Address                                               Flow Metric
DstAddr           Destination Address                                          Flow Metric
Sport             Source Port                                                  Flow Metric
Dport             Destination Port                                             Flow Metric
Load              Load                                                         Flow Metric
SrcMac            Source MAC Address                                           Flow Metric
DstMac            Destination MAC Address                                      Flow Metric
Packet_Num        Packet Number                                                Flow Metric
Attack_Category   Attack Category                                              Flow Metric
Label             Attack Label                                                 Flow Metric
Temp              Temperature                                                  Biometric
SpO2              Peripheral Oxygen Saturation                                 Biometric
Pulse_Rate        Pulse Rate                                                   Biometric
SYS               Systolic Blood Pressure                                      Biometric
DIA               Diastolic Blood Pressure                                     Biometric
Heart_Rate        Heart Rate                                                   Biometric
Resp_Rate         Respiration Rate                                             Biometric
ST                ECG ST Segment                                               Biometric

Table 3: WUSTL-EHMS-2020 Dataset Features (Hady AA, 2020)
3.3 Exploratory Data Analysis (EDA)
EDA is beneficial for discovering insights from a dataset, and it is usually accomplished by visualizing the dataset. EDA is essential for ML model development: it is used to understand the dataset's features, their distributions and scales, and to identify patterns or relationships between variables that may not be easily noticed otherwise (Sahoo K, 2019). EDA is especially helpful for datasets with multiple classes before they are used in multi-class classification, because it can reveal imbalances between the classes, which could lead to ML model biases. In this research, EDA includes reviewing basic statistics for each feature in the dataset and, specifically, examining the relationship between biometric features and attacks to determine whether there are perceivable patterns that increase the probability of attack. A brief sketch of such checks follows.
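For illustration, the following sketch performs the kind of checks described, assuming the dataset has been loaded into a pandas DataFrame named df (a placeholder) with the feature names from Table 3.

    # Basic statistics: distribution and scale of every feature
    print(df.describe())

    # Class balance between normal and attack records
    print(df["Label"].value_counts())

    # Mean of each biometric feature per class, to look for attack patterns
    biometrics = ["Temp", "SpO2", "Pulse_Rate", "SYS", "DIA",
                  "Heart_Rate", "Resp_Rate", "ST"]
    print(df.groupby("Label")[biometrics].mean())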
3.4 Feature Selection
Given that the primary objective of this praxis is to develop an FL-based IDS for IoMT networks that can detect malicious cyberattacks in real-time, the runtime of the IDS is critical. Limiting the features used in the model to those most crucial to the model's performance ensures that the model runs as efficiently as possible while maintaining high performance metrics (Sarhan, 2022). To achieve that goal, the features of the WUSTL-EHMS-2020 dataset need to be ranked by their impact on the model's performance. Once the features are ranked by importance, it is essential to select the optimal number of features that define the minimal set.
To reduce this dataset's dimensionality and lower the ML model's complexity, this praxis uses chi-square values to determine each feature's importance. The chi-square value for a given feature summarizes the marginal contribution of that feature to the performance of the model it is a part of (Sarhan, 2022). This is accomplished by calculating the model's performance with the statistically relevant features excluding the feature being evaluated, and then measuring the model's performance with a subset of features including that feature. A chi-square test is a statistical method that identifies the independence of a feature relative to the class label: it measures how the expected label (X) and the feature (Y) vary from each other. The degrees of freedom are used to determine whether the null hypothesis can be rejected. A chi-square statistic above the critical value (equivalently, a p-value below the significance threshold, typically 0.05) signifies that the hypothesis of independence should be rejected, as the feature and class have a dependency; such a feature should be incorporated in the classification experiments. This research leverages scikit-learn's feature_selection module to perform the chi-square test. The significance of each feature in the WUSTL-EHMS-2020 dataset is then derived from the model's feature-importance scores, where a higher score indicates greater significance to the overall model evaluation. Feature importance is built into tree-based classifiers; our research leveraged the Extra Trees Classifier to extract the top 12 features for this dataset. A sketch of both steps follows.
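The sketch below illustrates both ranking steps under the assumption that X_train/y_train hold the preprocessed (non-negative, MinMax-scaled) features and labels; k=12 follows the top-12 selection described above.

    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.feature_selection import SelectKBest, chi2

    # Chi-square ranking (features must be non-negative, e.g. MinMax-scaled)
    selector = SelectKBest(score_func=chi2, k=12).fit(X_train, y_train)
    chi2_scores = selector.scores_

    # Tree-based importance scores; take the 12 highest-ranked features
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42)
    etc.fit(X_train, y_train)
    top12 = etc.feature_importances_.argsort()[::-1][:12]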
3.5 Data Preprocessing
Data preprocessing is a crucial step in building ML models. It helps clean up and fill gaps in the dataset used to construct a model, and it prepares the dataset to run efficiently by letting the ML model focus on the features that influence its performance the most. Preprocessing the WUSTL-EHMS-2020 dataset involved removing null and duplicated values, balancing the data, and feature scaling. These transformations are applied to the data before it is provided to the algorithms, with the outcome of a clean dataset conducive to optimal model training and results. Given that multiple models will be evaluated on the same data, a properly formatted dataset is paramount. In preparation for training, the dataset was split into 80% training and 20% validation as a basis for measuring the models' performance. MinMaxScaler from scikit-learn was used to normalize high-dimensional features to the range (0, 1), preserving the original distribution. To prepare for federated learning clients, the dataset was sampled into five and ten smaller client datasets to simulate the data and operations of IoMT edge devices. The dataset was partitioned so that each client's local model could train on normal and attack data, inclusive of biometrics. Both the FL and CL models employed the same training/validation split: training used 80% of the data, and twenty percent was withheld for validation. This mitigated duplication/overlap between training/test and NetFlow/biometric data. A partitioning sketch follows.
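The following sketch illustrates this preparation, assuming X and y hold the preprocessed features and labels; the 80/20 split and the five-client case follow the text, while the variable names are placeholders.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.model_selection import train_test_split

    # Normalize features to the (0, 1) range, preserving the distribution
    X = MinMaxScaler().fit_transform(X)

    # 80% training / 20% validation split
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Shuffle and decompose the training data into balanced client shards
    n_clients = 5                                 # the simulation also used 10
    rng = np.random.default_rng(42)
    idx = rng.permutation(len(X_train))
    client_data = [(X_train[s], y_train[s])
                   for s in np.array_split(idx, n_clients)]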
3.6 Tensor Flow
"TensorFlow is a platform that is comprised of deep learning algorithms. Traditionally, operations are executed based on tensor objects. Neural network algorithms are a combination of basic operations such as multiplication and addition of tensors" (Long L, 2022). This research evaluates the performance of CL and FL classification approaches, utilizing the TensorFlow platform for centralized learning. The TensorFlow Federated (TFF) construct was used for federated learning model training.
3.6.1 Centralized Learning (CL)
Deep Neural Networks (DNN), Long Short-Term Memory (LSTM), XGBoost, and LightGBM were used as centralized algorithms against which to compare the performance of the federated learning model. XGBoost is a scalable gradient-boosted algorithm that employs fast parallel and distributed computing with optimal memory utilization. It is an ensemble learner designed for both regression and classification problems; it leverages ensemble boosting to learn from the errors committed in the preceding trees, and it is designed to minimize the overfitting that is prevalent in tree-based models. LightGBM enhances the efficiency and accuracy of Gradient Boosting Decision Trees (GBDT), which are the basis for XGBoost. LightGBM implements the same ensemble algorithms as XGBoost but applies theoretical and technical optimizations to improve performance and accuracy while significantly reducing memory usage. This framework is therefore conducive to IoMT edge devices, given its lightweight resource consumption compared to other ML algorithms. Neural networks refer to a model in which multiple neurons parameterize the mapping function; a Deep Neural Network (DNN) is a neural network with a number of layers of neurons, capable of learning sophisticated models through iterative levels of abstraction. These algorithms are traditionally resource-intensive but were considered in the methodology to evaluate a breadth of models across the broad IoMT ecosystem. A sketch of one such centralized learner follows.
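As one illustration, a minimal Keras DNN binary classifier of the kind evaluated is sketched below. The layer sizes are illustrative assumptions, the input width of 12 matches the top-12 selected features, and X_train/y_train/X_val/y_val are assumed from the preprocessing sketch.

    from tensorflow import keras
    from tensorflow.keras import layers

    dnn = keras.Sequential([
        layers.Input(shape=(12,)),              # top-12 selected features
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # binary attack/normal output
    ])
    dnn.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy", keras.metrics.AUC(name="auc")])
    # Training settings mirror those reported for the FL runs in Section 3.6.2
    # and are illustrative here
    dnn.fit(X_train, y_train, validation_data=(X_val, y_val),
            epochs=100, batch_size=1024, verbose=0)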
3.6.2 Federated Learning (FL)
TFF is an open-source framework for ML/FL and other computations on decentralized data. TFF was used to train and validate the model for federated learning. Federated Averaging (FedAvg) was the primary algorithm used, for its simplicity and performance. FedAvg is performed on a set of local models by computing the parameter-wise arithmetic mean across the models, producing an aggregate model. It is important to understand that performing aggregation once is not enough to produce a good global aggregate model; instead, it is the iterative process of locally training the previous global model and aggregating the produced local models into a new global model that allows global training progress to be made. Unlike centralized learning, the FL learning process consists of steps within an epoch (round). Referencing Figure 8, the high-level steps within an epoch are outlined below:
1. Aggregate global model parameters are deployed via private blockchain network consensus to each device on the (IoMT) gateway.
2. Global model parameters on each IoMT gateway are trained with local data.
3. Local training model weights are placed into a 'block' that is added to the blockchain. After a set duration, the parameters are sent via the blockchain to the central server (which is itself distributed through blockchain Hyperledger Fabric).
4. The central/global server aggregates the local models by applying an aggregation function (FedAvg), producing a new aggregate global model. Blockchain consensus and network approval are then required to send the updated global model down to the clients.
The dataset was shuffled and decomposed into balanced smaller sets to simulate local clients. The simulation was run with 5 and with 10 clients, where each client represents a patient in an ICU room with IoMT devices. 100 epochs with a batch size of 1024 were used. A minimal sketch of the FedAvg aggregation step follows.
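The sketch below illustrates the FedAvg aggregation step on lists of per-layer client weights. train_local and global_model are hypothetical stand-ins for the per-client training of step 2, and client_data is assumed from the partitioning sketch in Section 3.5.

    import numpy as np

    def fedavg(client_weights):
        # Parameter-wise arithmetic mean across clients; all clients are
        # equally weighted here, as in this research. The general FedAvg
        # form weights each client k by n_k / n (its share of the samples).
        return [np.mean([w[i] for w in client_weights], axis=0)
                for i in range(len(client_weights[0]))]

    # One round (epoch): each client trains the current global model locally,
    # then the server aggregates the resulting local weights.
    local_weights = [train_local(global_model, shard)  # hypothetical helper
                     for shard in client_data]
    global_weights = fedavg(local_weights)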
[Figure: one epoch — (1) global model parameters deployed via blockchain to each client, (2) local training at each client, (3) local weights written to the blockchain, (4) aggregation at the global server]
Figure 8: Steps in Epoch Iteration
The federated learning architecture was based on Deep Neural Networks (DNN), since this was conducive to a decentralized learning construct. The architecture was based on the work from (Mosaiyebzadeh F, 2023), ho