Epilepsy is a common neurological condition characterized by recurrent, unprovoked seizures. It is a disease of the central nervous system that causes abnormal behavior and sometimes loss of awareness. About seventy million people worldwide are affected by epilepsy. Epileptic seizures may be related to brain damage or to hereditary factors, but the cause is often completely unknown. Electroencephalogram (EEG) signals, which monitor brain activity, are generally analyzed by neurologists and specialists to detect and categorize various types of disease and to identify regions indicative of pre-ictal spikes and seizures. The presence of numerous spikes in the EEG signal is an indication of epileptic seizure activity in the brain.
In clinical environments, seizure diagnosis normally involves continuous monitoring of patients using video and EEG signals recorded over long periods. Human experts must then review the data by visual inspection to arrive at a clinical interpretation. This is time-consuming, and sufficient expertise is not always available. Hence automation of seizure detection is essential. Automated systems require features extracted from the signal. Several techniques exist for extracting features in the time, frequency, or time-frequency domains. Due to the nonlinear and non-stationary nature of EEG signals, features based on the time-frequency domain are used for detecting epileptic signals[1–3]. The empirical mode decomposition technique and Fourier-Bessel expansion are used to compute the mean frequency of intrinsic mode functions (IMFs) to discriminate ictal from interictal EEG signals. Recurrence quantification analysis (RQA), the wavelet transform and the multi-wavelet transform are used to classify EEG signals into three classes: normal, interictal and ictal. Alternatively, a pattern recognition approach that recognizes the recorded EEG signals under cognitive conditions, focusing on improving classification accuracy, has been proposed. Numerous machine learning algorithms, such as k-nearest neighbor (k-NN), naive Bayes (NB), random forest, artificial neural networks (ANN), support vector machine (SVM), decision trees, least square-support vector machine (LS-SVM), general regression neural network (GRNN) and mixture of experts models, have been proposed to classify abnormalities in EEG data.
Nonlinear dynamic techniques are effectively used in biomedical applications based on information from the magnetoencephalogram, electromyogram, electrooculogram, electrocardiogram and EEG[2–8]. This study focuses on modeling the nonlinear dynamics of the brain. The brain is widely regarded as a chaotic dynamic system, and the EEG signals it produces are usually chaotic. In another sense, an EEG signal is chaotic, as its amplitude changes randomly over time. These chaotic signals are characterized by long-term unpredictability, which makes classical signal processing techniques less helpful. Modeling the dynamics is a challenge when using conventional features and models. We apply reconstructed phase space (RPS) techniques, developed for chaotic signal analysis, to the epilepsy dataset from the University of Bonn (UoB) and show improved classification accuracy for 22 different class combinations.
The use of RPS trajectory images as input to a convolution neural network helps to model the dynamics. Further, the deep neural network results in an end-to-end system, eliminating the need to handcraft features for modeling. The proposed end-to-end system performs on par with or better than other state-of-the-art systems reported in the literature. A pre-trained convolutional neural network of the AlexNet architecture is retrained with RPS images extracted from the dataset to classify the data into different classes. Representative RPS images for each of the five classes of the UoB dataset are shown in Fig. 1.
Because visual inspection of EEG signals in clinical settings is tedious, researchers have employed machine learning approaches to automate seizure detection and the classification of EEG signals, with promising results. Gotman et al, among the pioneers who helped open up research in seizure detection, in 1979 used sharp and spike waves in an automatic recognition system that detected interictal epileptic activity in prolonged EEG recordings. Later on, he focused on using functional magnetic resonance imaging to examine automatic seizure detection with high-frequency activities in the wavelet domain.
Using the UoB dataset, researchers have examined several techniques for the automatic detection of epilepsy. SVMs have commonly been used as classifiers to distinguish seizure from non-seizure signals using features based on the discrete wavelet transform (DWT), tunable Q-wavelet transform and recurrence quantification analysis, with accuracies of 96.3%, 98.6%, and 94.4%, respectively. Shoeb used an SVM for patient-specific prediction, which resulted in 96% accuracy. Gandhi et al combined a probabilistic neural network (PNN) with an SVM, which resulted in an accuracy of 95.44% for the class combination ABCD-E. Sharmila et al studied fourteen different combinations of classes using statistical features extracted from DWT coefficients and applied naive Bayes and k-NN classifiers. A GRNN has also been employed for the classification of ictal and non-ictal states. Both studies[7,19] achieved a maximum accuracy of 100% for the A-E (normal vs. seizure) case. In a study by Nicolaou et al, where entropy-based features were employed, an accuracy of 93.55% was attained for the A-E case and 86.1% for ABCD-E.
A probabilistic approach to modeling the distribution of the classes using Gaussian mixture model (GMM) by Chua et al resulted in an accuracy of 93.11% for three classes (normal, interictal and ictal) when using higher order spectra (HOS) features and 93.1% classification accuracy when using power spectral density features.
Deep learning approaches are currently outperforming conventional machine learning algorithms in numerous domains. Employing deep learning methods, Ishan Ullah et al used a pyramidal one-dimensional convolution neural network (P-1D-CNN) and achieved a maximum accuracy of 100% for the A-E class combination. In the P-1D-CNN, novel data augmentation schemes and an effective deep CNN model were used for the classification of the UoB dataset. Rajendra Acharya et al reported an accuracy of 88.67% using a CNN with thirteen deep convolution layers. Table 1 reviews selected studies on EEG classification using the UoB dataset, with the features and classifiers used and their accuracies.
| Authors | Methods used | Accuracy (%) | Classifiers |
|---|---|---|---|
| Sharmila et al | Conventional features from DWT | 95.10–100.00 | NB + k-NN |
| Guo et al | Genetic programming | 93.50 | k-NN |
| Tzallas et al | Time-frequency features | 97.70–100.00 | ANN |
| Guo et al | Line length features on wavelet transform | 97.77–99.60 | |
| Kumar et al | DWT based approximate entropy | 92.50–100.00 | ANN + SVM |
| Sharma et al | 2D and 3D phase space representation of intrinsic mode functions | 98.60 | LS-SVM |
| Manish et al | Analytic time frequency – flexible wavelet transform | 92.50–100.00 | |
| Bhattacharyya et al | Tunable Q-wavelet transform | 98.60 | SVM |
| Acharya et al | Recurrence quantification analysis | 94.40 | |
| Nicolaou et al | Permutation entropy | 79.94–93.55 | |
| Acharya et al | DWT | 96.30 | |
| Subashi et al | DWT | 98.75–100 | |
| Swami et al | Dual tree complex wavelet transform | 93.30–100.00 | GRNN |
| Chua et al | Higher order spectra features | 93.10 | GMM |
| Acharya et al | Nonlinear parameters: approximate entropy, correlation dimension, Hurst exponent, fractal dimension | 95.00 | |
| Ullah et al | Data augmentation schemes | 99.10–100.00 | P-1D-CNN |
| Acharya et al | 13-layer deep CNN model | 88.67 | CNN |

NB: naive Bayes; k-NN: k-nearest neighbor; ANN: artificial neural networks; SVM: support vector machine; LS-SVM: least square-support vector machine; GRNN: general regression neural network; GMM: Gaussian mixture model; DWT: discrete wavelet transform; CNN: convolution neural network; P-1D-CNN: pyramidal one-dimensional convolution neural network.
Table 1. Seizure classification studies on the UoB dataset
The phase space representation (PSR) of a signal provides a visualization of its dynamic behavior over time, which is useful to guide model specification. In some studies, two-dimensional (2D) and three-dimensional (3D) PSRs of the IMFs are used for the classification of EEG signals. On the dataset of Graz University of Technology, the PSR technique has been used for three-class motor imagery classification. On the PhysioNet CHB-MIT database, the central tendency measure (CTM) was used to compute the region of 2D RPS plots in order to differentiate between seizure and seizure-free EEG signals.
The rest of the paper describes in detail the process involved and discusses the basic details about RPS, convolution neural network, its layers and transfer learning. Next, it describes the Bonn Dataset, the design of the proposed system to classify EEG signals based on RPS images, and finally deals with the experimental results and performance of the system.
The Department of Epileptology at the University of Bonn, Germany, provides an open-source epileptic EEG dataset. The Bonn database is a widely used benchmark dataset for validating seizure detection models. The data were acquired using the international 10-20 electrode placement system. This experimental dataset includes five sets of EEG data, A, B, C, D, and E, as shown in Table 2. Sets A and B contain normal EEG signals recorded from five healthy subjects. The remaining sets C, D, and E were recorded from epilepsy patients. The subjects of sets A and B were relaxed and awake, with eyes open (set A) and eyes closed (set B). Sets C and D were recorded before an epileptic attack: set C opposite to the epileptogenic zone and set D within the epileptogenic zone. These two sets represent the interictal state. Set E was recorded in the epileptogenic zone during an epileptic seizure (ictal state).
| Dataset | Subject details | Patient status | Electrode type | Electrode placement | No. of epochs and duration (second) |
|---|---|---|---|---|---|
| Set A | Five healthy subjects (normal) | Awake state with eyes open | Surface | International 10-20 system | 100 and 23.6 |
| Set B | Five healthy subjects (normal) | Awake state with eyes closed | Surface | International 10-20 system | 100 and 23.6 |
| Set C | Five epilepsy patients | Interictal (seizure-free) | Intracranial | Opposite to epileptogenic zone | 100 and 23.6 |
| Set D | Five epilepsy patients | Interictal (seizure-free) | Intracranial | Within epileptogenic zone | 100 and 23.6 |
| Set E | Five epilepsy patients | Ictal (seizure) | Intracranial | Within epileptogenic zone | 100 and 23.6 |
Table 2. Description of the EEG database of UoB
Each set, recorded with a 128-channel amplifier system, comprises 100 files of single-channel EEG segments; each recording lasts 23.6 seconds at a sampling rate of 173.61 Hz. A band-pass filter with a passband of 0.53 Hz–40 Hz (12 dB/oct) was applied to select the required EEG band. Artifacts identified by visual inspection (e.g., owing to pathological activity or eye movement) were removed from the continuous multi-channel EEG recordings. The EEG segments for set E were selected from all recording locations showing ictal activity.
Each recording thus contains 4097 samples, which are split into segments of 510 samples each so that one record yields many instances. These segments form the basis for all further processing.
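The segmentation step can be illustrated with a minimal sketch. It assumes that segments are non-overlapping and that trailing samples which do not fill a whole segment are dropped; the text does not say how leftover samples are handled, so that choice, the function name `split_into_segments` and the placeholder record are all illustrative.

```python
SAMPLING_RATE_HZ = 173.61  # sampling rate given in the dataset description
RECORD_LENGTH = 4097       # samples per recording
SEGMENT_LENGTH = 510       # samples per segment


def split_into_segments(record, segment_length=SEGMENT_LENGTH):
    """Return the non-overlapping, full-length segments of a record.

    Assumption: samples left over after the last full segment are discarded.
    """
    n_segments = len(record) // segment_length
    return [record[i * segment_length:(i + 1) * segment_length]
            for i in range(n_segments)]


# Placeholder record; a real one would hold EEG amplitudes.
record = list(range(RECORD_LENGTH))
segments = split_into_segments(record)

record_duration_s = RECORD_LENGTH / SAMPLING_RATE_HZ
segment_duration_s = SEGMENT_LENGTH / SAMPLING_RATE_HZ
print(len(segments), round(record_duration_s, 1), round(segment_duration_s, 2))
```

Under these assumptions a 4097-sample record gives eight full segments of roughly 2.94 seconds each, and the last 17 samples are unused.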
The proposed system uses RPS images extracted from segments of the EEG signals in the UoB dataset. The RPS images are given as input to a CNN of the AlexNet architecture, which classifies each image as epileptic or non-epileptic. The functional flow diagram is shown in Fig. 2.
The observational dataset includes five sets of EEG data, A, B, C, D, and E, pertaining to normal, seizure-free (interictal) and epileptic seizure (ictal) signals. Each recording is segmented into 510 samples because RPS portraits of longer segments are too dense for the CNN to capture the intrinsic features that the phase space depicts. By trial and error, 510 samples was found to be the most suitable segment size, both to represent RPS portraits and to generate sufficient images to retrain the CNN model. Fig. 3 shows the process flow diagram for the dataset creation.
The embedding dimension m and the time delay τ are the critical parameters of an RPS portrait. The false nearest neighbors (FNN) method is used to find the embedding dimension, while mutual information is used to find the time delay. The embedding dimension is taken as the value at which the FNN percentage effectively drops to zero, as depicted in Fig. 4A, and the time delay as the lag at which the first minimum of the mutual information occurs, as depicted in Fig. 4B. After running multiple experiments, we arrived at an appropriate delay of τ=6, as seen from Fig. 4B. Likewise, Fig. 4A shows that the ideal embedding dimension would be m=3, but since we work with a two-dimensional (2D) CNN we restrict the dimension to m=2. Every non-overlapping segment is transformed into an RPS image with dimension m=2 and time delay τ=6.
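A minimal sketch of the time-delay embedding with m=2 and τ=6 follows. The delay vectors are the standard construction (x[n], x[n+τ]); the rasterization step is only a hypothetical stand-in for the image generation, since the plotting procedure and image size are not specified here. The sine wave replaces a real EEG segment, and the names `delay_embed` and `rasterize` are illustrative.

```python
import math


def delay_embed(signal, dim=2, delay=6):
    """Return delay vectors (x[n], x[n+delay], ..., x[n+(dim-1)*delay])."""
    n_vectors = len(signal) - (dim - 1) * delay
    return [tuple(signal[n + k * delay] for k in range(dim))
            for n in range(n_vectors)]


def rasterize(points, size=64):
    """Mark visited cells of a size x size grid (assumed image step).

    Only the points are marked; a trajectory plot would also join them.
    """
    values = [v for point in points for v in point]
    lo, hi = min(values), max(values)
    scale = (size - 1) / (hi - lo) if hi > lo else 0.0
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        col = int(round((x - lo) * scale))
        row = size - 1 - int(round((y - lo) * scale))  # origin at bottom-left
        image[row][col] = 1
    return image


# Synthetic stand-in for one 510-sample EEG segment.
segment = [math.sin(2 * math.pi * n / 50) for n in range(510)]
points = delay_embed(segment, dim=2, delay=6)
image = rasterize(points)
print(len(points), len(image), len(image[0]))
```

With m=2 and τ=6, a 510-sample segment yields 504 points in the plane, because the last six samples have no delayed partner.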
In this study, a pre-trained AlexNet model is used, as it performs better overall in classifying epileptic seizure sets than two other well-known CNN models, LeNet and GoogLeNet, as shown in Table 3. AlexNet surpassed the records set by earlier non-deep-learning techniques. It contains five convolution (Conv) layers and three fully connected (FC) layers. Each convolution layer consists of 96 to 384 filters, with filter sizes ranging from 3×3 to 11×11 and feature maps of 3 to 256 channels. In each layer, a non-linear rectified linear unit (ReLU) activation function is used. The use of ReLU, instead of the tanh or sigmoid activation functions, is an important feature of AlexNet. The primary reasons for using ReLU in convolution layers are faster convergence, owing to the absence of the vanishing gradient problem, and the sparsity it induces in the features. Max pooling of size 3×3 is applied to the outputs of layers 1, 2 and 5. In the first layer, a stride of 4 is used to reduce computation.
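The effect of the filter sizes, the stride of 4 and the 3×3 pooling on the feature map size can be traced with a short calculation. The sketch below assumes the commonly published AlexNet configuration: a 227×227 input, 5×5 and 3×3 kernels with padding in layers 2 to 5, and a pooling stride of 2. None of these values are stated in the text above, so they should be read as assumptions rather than as the configuration used in this work.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding). Only the 11x11 stride-4 first layer and
# the 3x3 pooling after layers 1, 2 and 5 come from the text; the remaining
# values are assumed from the commonly published AlexNet configuration.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]


def trace_sizes(input_size=227):
    """Return the spatial size after each layer for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = output_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


sizes = trace_sizes()
for name, size in sizes.items():
    print(name, size)
```

Under these assumptions the first layer alone reduces a 227×227 input to 55×55, which shows why the large stride lowers the computational cost of the later layers.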
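| CNN model/Class combinations | LeNet | GoogLeNet | AlexNet |
|---|---|---|---|
| A-E | 99.44 | 96.67 | 100.00 |
| B-E | 99.44 | 96.67 | 99.44 |
| C-E | 50.00 | 98.33 | 98.89 |
| D-E | 96.67 | 97.78 | 97.22 |
| AB-E | 66.67 | 94.44 | 99.26 |
| CD-E | 66.67 | 97.41 | 99.63 |
| ABCD-E | 99.11 | 94.22 | 98.67 |

CNN: convolution neural network.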
Table 3. Comparison of AlexNet with LeNet and GoogLeNet
Local Response Normalization (LRN) is used in the first and second layers before max pooling. When compared to LeNet, AlexNet applies larger weights and the shape varies from layer to layer. To handle overfitting, dropout is used instead of regularization. The training time, however, is doubled by a 0.5 dropout rate. AlexNet model's tensor sizes (images) and number of parameters of convolution layers are shown in Fig. 5.
For training the proposed model, 90% of the epileptic and non-epileptic RPS images of the dataset (section 5.1) are utilized (75% for training and 15% for validation), and the remaining 10% is reserved for testing. To optimize performance, stratified 10-fold cross validation (CV) is performed. The 10-fold CV splits the data at random into 10 disjoint subsets called folds. Stratification keeps the mean response value approximately equal across folds: each fold holds the same proportions of the two class labels, namely epileptic (Set E) and non-epileptic (Sets A, B, C, and D).
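The idea of stratification can be sketched in a few lines. This is an illustrative implementation that deals the shuffled indices of each class round-robin into the folds; it is not the authors' code, and the label counts (400 non-epileptic, 100 epileptic) are hypothetical numbers chosen only to mirror the 4:1 ratio of the ABCD-E case.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so that class proportions are kept."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Hypothetical label vector with a 4:1 class ratio.
labels = ["non-epileptic"] * 400 + ["epileptic"] * 100
folds = stratified_folds(labels)
for fold in folds:
    n_epileptic = sum(labels[i] == "epileptic" for i in fold)
    print(len(fold), n_epileptic)
```

Each fold then serves once as the held-out set while the remaining nine are used for training, and every fold keeps the same 4:1 class ratio as the full set.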
Training the AlexNet model requires the weights (kernels) to be learned from the data. We use backpropagation with cross entropy as the loss function and stochastic gradient descent for optimization to learn these parameters. The model is trained for image classification using the Caffe framework for 200 epochs, with the following hyper-parameters: learning rate 0.01, batch size 128, weight decay 0.0001, gamma 0.1, and momentum 0.9. The depth of the model is important for its high performance; it makes training computationally expensive, but feasible using graphics processing units (GPUs). Several other complicated CNNs can perform very effectively on faster GPUs, even on large datasets. On a K80 GPU machine, one training run on the UoB dataset takes around 45 minutes. As the model complexity increases, the computational complexity during training and testing also increases. Fig. 6 visualizes the convolution filters of the AlexNet model retrained on RPS images of the UoB dataset.
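To make the role of these hyper-parameters concrete, the sketch below shows one common formulation of stochastic gradient descent with momentum and L2 weight decay, together with a step schedule in which the learning rate is multiplied by gamma at fixed intervals. It is an assumption that gamma is used in such a step schedule; the interval `step_size` is a hypothetical value, and the one-dimensional quadratic loss is a toy stand-in for the cross-entropy loss of the network.

```python
def step_learning_rate(iteration, base_lr=0.01, gamma=0.1, step_size=1000):
    """Step schedule: multiply the rate by gamma every step_size iterations.

    step_size is a hypothetical value; it is not given in the text.
    """
    return base_lr * gamma ** (iteration // step_size)


def sgd_momentum_step(weight, velocity, gradient, lr,
                      momentum=0.9, weight_decay=0.0001):
    """One update: v <- momentum*v - lr*(grad + weight_decay*w); w <- w + v."""
    velocity = momentum * velocity - lr * (gradient + weight_decay * weight)
    return weight + velocity, velocity


# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
weight, velocity = 0.0, 0.0
for iteration in range(500):
    lr = step_learning_rate(iteration)
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, gradient, lr)
print(weight)
```

In the toy problem the weight settles very close to 3; the small remaining offset comes from the weight decay term, which pulls the weight slightly toward zero.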
In this work, twenty-two different combinations of classes were considered for classifying the segments as epileptic or non-epileptic. The proposed RPS-based deep learning approach was evaluated using standard metrics, namely classification accuracy, sensitivity, specificity, precision and F-score, for all the binary and ternary class combinations. These metrics are defined in terms of the counts TP, FP, TN and FN.
Here, TP refers to the number of images that are actually epileptic and predicted as epileptic, and FP to the number that are actually non-epileptic but predicted as epileptic. TN refers to the number that are actually non-epileptic and predicted as non-epileptic, while FN refers to the number that are actually epileptic but predicted as non-epileptic.
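The metrics named above can be written in terms of these four counts. The formulas in the sketch below are the standard textbook definitions and are an assumption here, since the original expressions are not reproduced in this text; the function name `binary_metrics` is illustrative. The example counts are those of the D-E confusion matrix in Table 5.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard textbook metrics from confusion-matrix counts (assumed).

    No guard against zero denominators is included in this sketch.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# D-E case from Table 5: 87 epileptic and 88 non-epileptic images correct,
# 3 epileptic images missed, 2 non-epileptic images flagged as epileptic.
metrics = binary_metrics(tp=87, fp=2, tn=88, fn=3)
print(round(100 * metrics["accuracy"], 2))  # 97.22
```

For the D-E counts this gives an accuracy of 97.22%, the value listed for that case in Table 5.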
Table 4 shows the classification accuracy, sensitivity, and specificity for 15 different binary class combinations. The corresponding confusion matrices for a few binary classes are shown in Table 5. Likewise, the performance of the system for ternary class combinations is given in Table 6.
| Class combination | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| A-E | 100.00 | 100.00 | 100.00 |
| B-E | 99.44 | 100.00 | 98.90 |
| C-E | 98.89 | 100.00 | 97.83 |
| D-E | 97.22 | 97.75 | 96.70 |
| AB-E | 99.26 | 100.00 | 98.90 |
| AC-E | 99.63 | 100.00 | 99.45 |
| BC-E | 98.52 | 100.00 | 97.83 |
| BD-E | 97.78 | 98.84 | 97.28 |
| CD-E | 99.63 | 100.00 | 99.45 |
| ABC-E | 99.17 | 98.88 | 99.26 |
| ACD-E | 98.61 | 98.84 | 98.17 |
| BCD-E | 98.06 | 100.00 | 97.47 |
| AB-CD | 98.11 | 99.43 | 97.81 |
| AB-CDE | 97.33 | 97.43 | 97.19 |
| ABCD-E | 98.67 | 100.00 | 98.36 |
Table 4. Performance measures of binary class combination
| Class combination | Positive and negative cases | Predicted as non-epileptic | Predicted as epileptic | Accuracy (%) |
|---|---|---|---|---|
| A-E | Actually non-epileptic | 90 | 0 | 100.00 |
| | Actually epileptic | 0 | 90 | |
| B-E | Actually non-epileptic | 90 | 0 | 99.44 |
| | Actually epileptic | 1 | 89 | |
| C-E | Actually non-epileptic | 90 | 0 | 98.89 |
| | Actually epileptic | 2 | 88 | |
| D-E | Actually non-epileptic | 88 | 2 | 97.22 |
| | Actually epileptic | 3 | 87 | |
| AB-E | Actually non-epileptic | 180 | 0 | 99.26 |
| | Actually epileptic | 2 | 88 | |
| CD-E | Actually non-epileptic | 180 | 0 | 99.63 |
| | Actually epileptic | 1 | 89 | |
| ABCD-E | Actually non-epileptic | 360 | 0 | 98.67 |
| | Actually epileptic | 6 | 84 | |
Table 5. Confusion matrices under various test conditions: binary class combinations
| Class combination | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| AB-CD-E | 96.00 | 96.80 | 98.05 |
| A-C-E | 94.81 | 94.87 | 97.40 |
| A-D-E | 95.93 | 96.07 | 97.98 |
| B-C-E | 97.04 | 96.34 | 98.53 |
| B-D-E | 95.56 | 95.80 | 97.86 |
| A-B-E | 93.33 | 93.82 | 96.80 |
Table 6. Performance measures of ternary class combination (%)
The corresponding confusion matrices for selected ternary class combinations are given in Table 7. The results indicate that the proposed system performs best for the case A-E with 100% accuracy. The system performed better than or at par with other results reported in the literature (Table 1).
| Class combination | Positive and negative cases | Predicted as non-epileptic | Predicted as non-epileptic | Predicted as epileptic | Accuracy (%) |
|---|---|---|---|---|---|
| AB-CD-E | Actually non-epileptic (AB) | 179 | 1 | 0 | 96.00 |
| | Actually non-epileptic (CD) | 9 | 171 | 0 | |
| | Actually epileptic | 4 | 4 | 82 | |
| A-C-E | Actually non-epileptic (A) | 85 | 5 | 0 | 94.81 |
| | Actually non-epileptic (C) | 7 | 83 | 0 | |
| | Actually epileptic | 0 | 2 | 88 | |
| B-D-E | Actually non-epileptic (B) | 89 | 1 | 0 | 95.56 |
| | Actually non-epileptic (D) | 1 | 89 | 0 | |
| | Actually epileptic (E) | 4 | 6 | 80 | |
Table 7. Confusion matrices under various test conditions: ternary class combinations
The performance of our approach was compared with the state-of-the-art methods of earlier investigations for all twenty-two class combinations. Sensitivity scores of the proposed approach indicate that the percentage of correctly identified epileptic patients for all class combinations is always high and for nine of the classes 100% is achieved. For specificity, the proposed approach achieves 100% for case A-E and very high percentages for the rest of the class combinations.
Table 8 reports the sensitivity and specificity for various class combinations in comparison with results in the literature. To the best of the authors' knowledge, the sensitivity of the proposed approach is better than that of existing approaches. The proposed approach of using RPS-based CNN models performs better than existing approaches for all binary classes, with an accuracy of (98.5±1.5)%. For ternary class combinations, the proposed system has an accuracy of (95±2)%, except for the C-D-E case, where the accuracy is 84.44%.
| Class combination | Best reported sensitivity | Sensitivity of proposed work | Best reported specificity | Specificity of proposed work |
|---|---|---|---|---|
| A-E | 100.00[7,19,29,31] | 100.00 | 100.00[7,19,31] | 100.00 |
| B-E | | | 99.49 | 98.90 |
| C-E | 99.50, 99.30[19,29] | 100.00 | 99.74, 99.40 | 97.83 |
| D-E | 95.23, 96.30 | 97.75 | 95.01, 92.60 | 96.70 |
| AB-E | | | 98.02 | 98.90 |
| CD-E | | | 97.50 | 99.45 |
| ACD-E | | | 93.64 | 98.17 |
| BCD-E | | | 90.12 | 97.47 |
| ABCD-E | | | 89.92 | 98.36 |
| AB-CD | 90.50 | 99.43 | 94.50 | 97.81 |
Table 8. Comparison of performance measures: sensitivity and specificity of this work with those of earlier reported studies
Deep learning approach to detect seizure using reconstructed phase space images
- Received Date: 2019-03-21
- Accepted Date: 2019-11-16
- Rev Recd Date: 2019-07-10
- Available Online: 2020-01-24
- Publish Date: 2020-05-01
- epilepsy
- reconstructed phase space
- convolution neural network
- reconstructed phase space image
- AlexNet
Abstract: Epilepsy is a chronic neurological disorder that affects the function of the brain in people of all ages. It manifests in the electroencephalogram (EEG) signal, which records the electrical activity of the brain. Various image processing, signal processing, and machine-learning based techniques are employed to analyze epilepsy, using spatial and temporal features. The nervous system that generates the EEG signal is considered nonlinear, and EEG signals exhibit chaotic behavior. In order to capture these nonlinear dynamics, we use the reconstructed phase space (RPS) representation of the signal. Earlier studies have primarily addressed seizure detection as a binary classification (normal vs. ictal) problem and rarely as a ternary class (normal vs. interictal vs. ictal) problem. We employ transfer learning on a pre-trained deep neural network model and retrain it using RPS images of the EEG signal. The classification accuracy of the model is (98.5±1.5)% for the binary classes and (95±2)% for the ternary classes. The performance of the convolution neural network (CNN) model is better than that of other existing statistical approaches for all performance indicators, such as accuracy, sensitivity, and specificity. The results of the proposed approach show the prospect of employing RPS images with a CNN for predicting epileptic seizures.
Citation: N. Ilakiyaselvan, A. Nayeemulla Khan, A. Shahina. Deep learning approach to detect seizure using reconstructed phase space images[J]. The Journal of Biomedical Research, 2020, 34(3): 240-250. doi: 10.7555/JBR.34.20190043