Medical intelligence using PPG signals and hybrid learning at the edge to detect fatigue in physical activities
PPG signals
The data were analyzed according to the protocol presented in the research41, and a large number of students were included in the analysis. This study collected data to evaluate how well the suggested method detects learning-induced fatigue in a real-life setting, without interfering with students’ regular activities. The research procedure was approved by the University Ethics Committee. The experiment was conducted in two separate settings: (1) a university classroom with ceiling-mounted fluorescent lights whose brightness could not be adjusted, and (2) periods of intense studying by students, which were expected to lead to mental fatigue. A total of 12 physically fit students participated in the data gathering for this study. Each student completed two separate tests: one in a state of great exhaustion and another under normal circumstances. Each test lasted 10 min.
To assess the students’ fatigue level prior to each test, we used two assessment tools. The first was the 2-min Psychomotor Vigilance Test (PVT), which measures lapses of attention caused by fatigue42. The second was the Karolinska sleepiness scale (KSS) form, a commonly employed method for detecting sleepiness43,44,45,46,47. The corresponding results are shown in Table 1. The mean latency, measured in milliseconds (ms), is presented in the third and fifth columns of Table 2. The mean ± std denotes the average and standard deviation (std) over all participants in both the fatigue and non-fatigue tests.
System setting
To evaluate the model’s ability to handle several states, the recommended classifier was used. We divided the data into two segments: training and testing. We used k-fold cross-validation (CV) to evaluate the algorithm’s performance and create validation data, specifically a fivefold CV separated by subject (student).
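A subject-separated fivefold split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `subject_kfold`, the round-robin assignment of subjects to folds, and the layout of 4 windows per subject are assumptions made for the example.

```python
def subject_kfold(subject_ids, k=5):
    """Yield (train_idx, test_idx) pairs so that no subject appears on both sides."""
    subjects = sorted(set(subject_ids))
    # Assumption: subjects are assigned to folds round-robin; a real pipeline might shuffle first.
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    for fold in range(k):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


# Hypothetical layout: 12 subjects, 4 windows each (one subject id per window).
subject_ids = [s for s in range(12) for _ in range(4)]
folds = list(subject_kfold(subject_ids, k=5))
```

Splitting on subject identifiers rather than on individual windows keeps all windows of one student in the same fold, so the test fold never contains data from a student seen during training.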
Initially, our data consisted of recordings from 12 participants. We then windowed the data, segmenting each recording into smaller windows. This windowing effectively augmented our dataset, resulting in up to 450 windows for some participants. To ensure the validity of our cross-validation process, we applied a fivefold cross-validation strategy to these windows. The procedure was as follows:
(1) Each participant’s signal was divided into multiple small windows. For instance, if a participant’s signal was divided into 40 windows, these windows were used in the CV process.
(2) The classification algorithm was then applied to these windows. If the majority of these windows for a specific participant were classified as indicating fatigue, the entire signal for that participant was labeled as fatigue.
This approach allows us to increase the number of samples significantly, providing a more robust dataset for training and evaluation. It also ensures that the cross-validation is performed in a manner that respects the separation of subjects while leveraging the augmented data for better model performance.
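The two steps above, windowing followed by a majority vote, can be sketched as below. The helper names, the non-overlapping windows, and the tie-breaking rule (a tie counts as non-fatigue) are assumptions for illustration; the source does not specify the window length or overlap.

```python
def split_into_windows(signal, window_size):
    """Cut a 1-D signal into consecutive non-overlapping windows, dropping any remainder."""
    n = len(signal) // window_size
    return [signal[i * window_size:(i + 1) * window_size] for i in range(n)]


def majority_label(window_labels):
    """Return 1 (fatigue) if more than half of the windows are labelled 1, else 0."""
    # Assumption: an exact tie is treated as non-fatigue.
    return 1 if sum(window_labels) * 2 > len(window_labels) else 0


signal = list(range(100))                         # stand-in for one PPG recording
windows = split_into_windows(signal, window_size=25)
recording_label = majority_label([1, 1, 0, 1])    # hypothetical per-window predictions
```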
Previously unseen PPG signals were evaluated with the trained model and its optimized parameters. Decisions were made on segmented signals: each window was labeled, and the most frequent window label was assigned to the whole windowed PPG sample. Our solution was developed on computers running 64-bit operating systems, with Core i7 CPUs, graphics cards, and 8 GB of random access memory (RAM). The initial learning rate was 0.001, and 500 to 2000 epochs were recommended for effective learning. Stochastic Gradient Descent (SGD) significantly improved the performance of the proposed architecture. Before a hybrid model can be deployed in production, it must be trained on a single CPU for up to 6 h; each iteration of training and optimizing the CNN models takes 4 to 6 h. Structures recommended by experts are incorporated into our designs. Clinical system design, testing, convergence, training protocols, error estimation, and fine-tuning are critical components of successful validation and training. We achieved optimal convergence for each CNN architecture by minimizing the training and validation errors. If accuracy tests consistently fail or error rates do not decrease, training is stopped immediately.
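The stopping rule described above can be illustrated with a simple patience-based check. The learning rate, optimizer, and epoch limit come from the text; the function `should_stop` and its `patience` and `min_delta` parameters are hypothetical and only show one common way to detect that the error has stopped decreasing.

```python
# Values stated in the text; the dictionary itself is only an illustrative container.
TRAIN_CONFIG = {"optimizer": "SGD", "learning_rate": 0.001, "max_epochs": 2000}


def should_stop(val_losses, patience=10, min_delta=0.0):
    """True when the last `patience` epochs did not beat the earlier best loss by more than `min_delta`."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent >= best_before - min_delta


# Hypothetical validation losses: no improvement over the last three epochs.
stop = should_stop([0.9, 0.7, 0.6, 0.6, 0.61, 0.6], patience=3)
```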
Model setting
The study aims to determine the most effective model and parameters when only a small quantity of training data is available. Three types of parameters can be modified in the deep learning models: the number of layers, the kernels, and the kernel size. Because fewer continuous PPG signals are available for training, we expect performance to vary with both temporal and spatial factors. To improve precision, we augment the BILSTM model with either ResNetCNN or Xception. BILSTM, ResNetCNN, and Xception networks were the deep learning models used in this study. Data characteristics were extracted using the ResNetCNN and Xception networks, each of which was trained with a separate setup.
A one-layer BILSTM was used to investigate the temporal transmission characteristics of the PPG wave. Table 3 shows the simulated parameters. For one-dimensional data, we evaluate both spread and deep models. Among many models, we prioritize identifying the most effective parameter combinations. Both ResNetCNN and Xception parameter sets are included in the tables. With limited training data, the main goal is to determine the optimal number of layers and to select a good kernel. Various parameters were used in this study to compare the classification efficacy of ResNetCNN.
For ResNetCNN, the study compared different layer configurations (21, 26, 31, 36, and 41) and, at 31 layers, different kernel sizes (13, 14, 15, 16, and 17). Additionally, the study evaluated the performance when the kernel size was 15 and the layers were 31, with kernel sizes ranging from 37 to 41. This comparison evaluated the performance of the networks when operating with less data. The aim was to determine the most effective network parameters for extracting characteristics from the data. Several parameter configurations were likewise used to assess Xception’s classification performance: layers = 31, 34, 37, 40, and 43; kernel = 30, 31, 32, 33, and 34 at layers = 37; and kernel = 32 with layers = 34, 35, 36, 37, and 38.
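The Xception configurations listed above amount to three one-parameter-at-a-time sweeps, which can be enumerated as in the sketch below. The helper `one_at_a_time` and the dictionary keys are hypothetical; the kernel size used during the first layer sweep is not stated in the text, so it is left as `None`.

```python
def one_at_a_time(fixed, name, values):
    """Return one configuration per value of `name`, holding the settings in `fixed` constant."""
    return [{**fixed, name: v} for v in values]


xception_sweeps = (
    # Kernel size for this sweep is not given in the text, hence None.
    one_at_a_time({"kernel": None}, "layers", [31, 34, 37, 40, 43])
    + one_at_a_time({"layers": 37}, "kernel", [30, 31, 32, 33, 34])
    + one_at_a_time({"kernel": 32}, "layers", [34, 35, 36, 37, 38])
)
```

Varying one parameter while fixing the others keeps the number of training runs at 15 rather than the full grid of layer and kernel combinations, which matters when each run takes hours.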
Assessments and comparison
An advanced deep learning model utilizing a multi-interest network is proposed. Four performance metrics are examined in this study: F1-measure, Recall, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Prediction quality can be evaluated using recall and the F1-measure. PPG signals are analyzed using a portion of the available data. The F1-measure can also be used to assess a system’s prospective effectiveness; it is the harmonic mean of the precision and recall scores. Under a threshold value K, precision (\(\text{Pr}_K\)) is the proportion of suggestions that are correct out of the total number of suggestions, and \(\text{Re}_K\) is the corresponding recall. The F1-measure, \(F_1(K)\), is defined as follows:
$$F_1(K)=2\times \frac{\text{Pr}_K\times \text{Re}_K}{\text{Pr}_K+\text{Re}_K}.$$
(16)
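Equation (16) translates directly into code. The sketch below assumes precision and recall at threshold K have already been computed; the function name `f1_at_k` and the convention of returning 0 when both inputs are 0 are choices made for this example.

```python
def f1_at_k(precision_k, recall_k):
    """Harmonic mean of precision and recall at threshold K, as in Eq. (16)."""
    if precision_k + recall_k == 0:
        return 0.0  # assumed convention: avoid division by zero when both scores are 0
    return 2 * precision_k * recall_k / (precision_k + recall_k)


score = f1_at_k(0.5, 0.5)
```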
Two further measures evaluate the accuracy of the prediction system statistically. The absolute error \(|\hat{r}_{u_i}-r_{u_i}|\) is obtained by comparing the predicted rating \(\hat{r}_{u_i}\) with the actual rating \(r_{u_i}\) from the Test collection, where \(|\text{Test}|\) is the number of test instances. The mean absolute error (MAE) averages the absolute errors over all test instances. The root-mean-square error (RMSE) is the square root of the mean squared error (MSE) and reflects the spread of the errors over all test instances. A lower RMSE or MAE indicates better performance.
$$\text{MSE} = \sum_{r_{u_i} \in \text{Test}} \frac{\left( \hat{r}_{u_i} - r_{u_i} \right)^2}{\left| \text{Test} \right|}.$$
(17)
In addition,
$$\text{MAE} = \sum_{r_{u_i} \in \text{Test}} \frac{\left| \hat{r}_{u_i} - r_{u_i} \right|}{\left| \text{Test} \right|}.$$
(18)
Moreover, \(\text{MSE} = (\text{RMSE})^2\); that is, the RMSE is the square root of the MSE. A decision-making model can be developed using similar methods, as shown in Table 4. Along with focus groups and data analysis criteria, Xception, ResNet, and a combination of these two architectures were used.
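Equations (17) and (18), together with the relation between MSE and RMSE, can be computed as in the following sketch. The function names and the example values are illustrative assumptions, not data from the study.

```python
import math


def mse(predicted, actual):
    """Mean squared error over all test instances, as in Eq. (17)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mae(predicted, actual):
    """Mean absolute error over all test instances, as in Eq. (18)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


# Hypothetical predictions and ground truth for four test instances.
predicted = [1.0, 0.0, 1.0, 1.0]
actual = [1.0, 0.0, 0.0, 1.0]
errors = (mse(predicted, actual), mae(predicted, actual), rmse(predicted, actual))
```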
Learning fatigue detection is validated by performing fivefold (subject-dependent) cross-validation on the collected dataset. Out of the 12 participants, training is conducted on 11 individuals and assessment on one separate individual: 440 samples are used for training, and 40 samples are used for testing. The process is iterated 11 times using new individuals. Table 5 reports four performance metrics: area under the curve (AUC), F1-score, accuracy, and precision. Based on the four measurements in the last row of Table 5, the suggested technique accurately detects learning fatigue. The proposed approach may also be able to determine whether an individual is fatigued based on data from other individuals. To assess this, we check that all four metrics in the last row of Table 5 exceed a threshold of 0.825. The threshold value of 0.825 was determined through a combination of empirical analysis and optimization techniques.
In addition, a comparison of several deep models can be seen in Table 5. First of all, the suggested feature set outperforms the two most advanced feature sets47,48. Moreover, the proposed strategy performs better than the other approaches that use single data. As shown in Table 5, the suggested method achieves an accuracy of 0.918, which is much higher than the accuracy of the other individual data methods.
Discussion
PPG data were analyzed using several deep learning models: LSTM, ResNetCNN, Xception, ResNetCNN with BILSTM, and Xception with BILSTM. The study also analyzed how temporal and spatial characteristics affect network structures. Various parameters were used to compare the network performance of ResNetCNN and Xception with BILSTM. After 500 iterations, the precision of features derived from ResNetCNN with BILSTM or Xception with BILSTM exceeded that of ResNetCNN, Xception, or LSTM individually. The results of the models, including Xception, ResNetCNN, and BILSTM, are shown in Table 5. Fatigue was reliably detected in the assessed datasets when either ResNetCNN or Xception was used individually. The LSTM network design, however, did not provide adequate performance. Although ResNetCNN achieves an accuracy of 89%, its recall and precision are considered insufficient.
After 500 training epochs, the model’s accuracy, including on the validation set, stabilized. Figure 5 was obtained with the same simulation parameters and shows the loss (mean squared error) for the training datasets. Because of the substantial interference detected in the study’s unfiltered data, the open dataset yielded disappointing results. Nevertheless, the study validated the overall accuracy of the Xception with BILSTM model.
Figure 6 presents confusion matrices showing that multiple techniques precisely classify student exhaustion during physical activity into fatigue and non-fatigue cases. The Xception with BILSTM model uses a total of 37 Xception layers, with reported kernel sizes of 32, 36, and 36. When data are imbalanced, deep learning systems may have difficulty classifying them.
PPGs were used as an indicator of students’ fatigue levels following physical activity in this study. Regardless of whether the activity was intense or low-intensity, the PPGs obtained from the students’ wrists were reliable indicators of their physical activity levels in sports.
Intermediate health levels may be more challenging to assess. Based on the results of the experiment, advanced artificial intelligence methods allow students’ physical fatigue to be assessed. Mathematical computations, noise reduction, and model assessments were used to compile the students’ physiological characteristics. We used ResNetCNN with BILSTM and Xception with BILSTM, taking into account the one-dimensional nature and temporal correlation of physiological signals, computational capacity, and deep learning’s ability to reduce dimensions. Overall, 91.8% of the predictions were correct.
Data were collected using a standard running test, which may not capture all physiological aspects of students’ physical activity. According to official government documents, this limitation should be considered when predicting their health condition. Although we acknowledge that further physical tests could yield more reliable results, long-distance running can be used to assess cardiopulmonary function and is easier to supervise than activities such as seated forward flexion. For the evaluation model, the current technique has achieved a commercial level of accuracy. How the model performs when signals from other sports are incorporated remains uncertain and is left for future work. Overall, this paper focused primarily on the following topics:
(1) As a result of our investigation of the relationship between students’ physiological data and their levels of physical activity during moments of exhaustion, we believe that some key factors can be useful for predicting when students will feel tired.
(2) We developed a model for predicting fatigue outcomes in a running physical test. The model uses an enhanced ResNetCNN with BILSTM. By using the proposed model in biosensing recordings, additional predictive tasks might be possible.
(3) Results of the experiment demonstrate the feasibility of predicting students’ fatigue levels using biosensing data.
Moreover, by leveraging edge computing, the algorithm can analyze physiological signals such as heart rate, respiratory rate, and body temperature directly on the device where the data is collected from smartwatches, which serve as the primary sensors. This significantly reduces latency and ensures immediate feedback. While the smartwatches function as local processors analyzing fatigue conditions, edge computing acts as an intermediate layer in cloud environments to handle the substantial data received from the smartwatches. This processing not only enhances the system’s responsiveness but also reduces the need for constant connectivity to cloud servers, preserving bandwidth and improving data privacy. In educational settings, where timely detection and intervention are critical, edge computing ensures continuous and efficient monitoring of fatigue, enabling educators and students to make informed decisions promptly. This approach is especially beneficial in environments with limited or unreliable internet access, ensuring the reliability and robustness of the fatigue detection system.
Our paper presents a novel edge computing algorithm designed specifically for the real-time detection of fatigue in students. The algorithm leverages hybrid deep learning models that combine CNN, Xception, and BILSTM architectures to process and analyze physiological signals such as heart rate, respiratory rate, and body temperature.
By deploying the algorithm on edge devices, we ensure low-latency processing and immediate feedback, which is crucial in a real-time educational environment. This approach also reduces dependency on cloud resources, enhancing data privacy and reducing network bandwidth usage.
The hybrid deep learning models were rigorously validated using a comprehensive dataset, demonstrating high accuracy and robustness in detecting fatigue states. Combining CNN, Xception, and BILSTM architectures allows the model to capture both spatial and temporal features from the physiological data, improving overall detection performance.
Our study’s findings have immediate practical implications for improving health monitoring and training regimens in educational settings. The ability to detect fatigue accurately and in real-time can significantly enhance the safety and performance of young athletes. Future work will focus on expanding the participant base and collaborating with other institutions to gather more extensive datasets, further solidifying the generalizability and applicability of our findings.