TUSZ is a subset of the larger EEG signals database from the TUH. TUSZ data set is remarkable by the high number of patients and configurations it contains. Indeed, in the considered version 1.2.1 of this subset released in April 2018, there are 266 patients in the training set (of whom 118 suffering from seizures), 50 in the test set (of whom 38 suffering from seizures), and more than 40 different configurations of electrodes.
TUSZ data set was established by identifying the sessions in the TUH EEG Corpus that were the most likely to contain seizure events. Three annotation tools were used to identify events of clinical interest: Natural Language Processing (NLP), a commercial software, and a three-pass system. Then an annotation team manually annotated the data and showed that the NLP approach had been the most efficient as the first step, revealing again the difficulty to automatically detect seizure events in a large data set of EEG.
This data set contains a huge diversity of configurations, in terms of sampling rate, references or montages (placement and connections between the electrodes). However, two major montages predominate in the data set: "Average Reference" (AR) for which the potentials are measured relatively to the average value of a subset of electrodes and "Linked Ears" (LE) for which the reference is located on the ear, electrically quiet. Previous researches offered more statistics about TUH data sets[16– 17] and more considerations on the montages and their consequences.
Nevertheless, all recordings contain the electrodes of the standard 10/20 placement. We call "channels" the derivations of electrodes obtained with the TCP bipolar montage also called "double-banana". They are shown in Fig. 1, which comes from a previous study.
The recordings are split into six file folders: three theoretically for training purpose and the three remaining for the test. Each folder contains one specific montage, either AR, LE, or a slight variant of AR. This variant is less represented and was not considered for this study.
For each patient, there are one or several sessions themselves containing files related to one or more recordings. There are five files for each recording. The .edf file contains the raw EEG signals and a header containing useful information (frequency, duration, date). The .lbl file contains event-based annotations that catch the beginning and end of each event on every channel of the 22 shown in Fig. 1. An event is either background or a particular type of seizure. Practically those annotations may indicate if an ictal phase begins or ends earlier or later on some channels. The .lbl_bi is very similar to the previous one except that the events are just binary (bckg and seiz). The .tse file contains label-based annotations that catch the beginning and end of each event globally for all the channels. These are the most frequently used in Machine Learning research as they allow to give a single overall target for each time step. The .tse_bi file is very similar to the previous one except that the events are just binary (bckg and seiz). For each session, there is also a .txt file that contains general information about the patient, the recordings, in a non-structured way.
Epileptic seizures are usually classified into two main categories. The first category is the generalized crisis which begins bilaterally and synchronously from the outset in both cerebral hemispheres. The second category is the partial or focal crisis which starts in a localized part of the brain. It can be accompanied by a loss of consciousness; the abnormal electrical activity that started from the focused location in the brain can gradually extend to the entire brain; the crisis may end in convulsions.
However, TUSZ is following a more refined classification. In this data set 10 different types of seizures are defined, though some of them are less represented. Several types of seizures are covered by the generalized class or by the focal one. Among the generalized seizures, we find the "absence" which results in a momentary alteration of consciousness, the "tonic crisis" characterized by a strong variation of muscle tone, the "clonic crisis" characterized by jerks and the generalized "tonic-clonic" crisis characterized by convulsions, and also the atonic and myoclonic seizures. Among the focal seizures, we can specifically distinguish the complex and simple partial seizures if respectively there is a loss of consciousness or not. In Table 1, we have extracted from the data set all the information about the distribution in terms of segments of each type of seizures in several folders. The segments have in the present case a duration of one second. We can note in this table that non-specific seizures are dominant.
Seizure code Seizure type AR
FNSZ Focal non-specific 13 088 23 301 10 186 3 418 GNSZ Generalized non-specific 6 743 14 226 10 230 122 SPSZ Simple partial 1 327 364 167 0 CPSZ Complex partial 8 551 8 114 5 811 784 ABSZ Absence 14 0 479 339 TNSZ Tonic 424 857 0 0 CNSZ Clonic 0 0 0 0 TCSZ Tonic clonic 795 1 335 938 2 343 ATSZ Atonic 0 0 0 0 MYSZ Myoclonic 0 1 178 0 0 AR: average reference; LE: linked ears.
Table 1. Seizure segments distribution by type in the different folders
Since we worked on the TUSZ data set, we tackled the problem of seizure detection through different angles. Prior work was based on the transformation of EEG signals into features and the comparison of several classification models. We noticed during that experiment the good performances of ensemble learning methods as random forest and gradient boosting. Moreover, the use of high-performance version of gradient boosting like XGBoost allowed to considerably reduce the training time, and thus to be more focused on fast iterations and optimization of parameters and hyper-parameters. We also noticed that the results were good when training and test sets were built after mixing non-overlapping ictal and inter-ictal segments on the whole population of recordings. Unfortunately, this method led to a kind of data leakage as a segment of one recording was not really different from its direct neighbor of the same state. Hence, results were less impressive when we tried to classify EEG segments on new patients. This work was using the first version of the data set and limited to 59 patients for the training.
Interested by the abilities and the promises of some deep learning algorithms to directly extract features from complex time-series without preprocessing, we investigated that field without the expected success, as dealing with raw data is really resources-demanding. Another attempt by using EEG features with a deep learning algorithm (Long Short-Term Memory, LSTM) came crashing down on the wall of bad performances related by the TUH literature.
Based on these previous approaches, we made several observations. Firstly, the EEG features approach is still relevant for performance issues. Secondly, XGBoost is efficient and allows it to quickly iterate. Furthermore, there is a bad generalization of new patients in the test set.
Taking up the last point, there are two ways to solve this issue. Either we move backward and consider patient-specific modeling, or we increase the amount of training data for generic modeling. The second choice is more relevant and it would be indeed interesting to see to which extent the increase of the amount of data would help to improve performances. EEG features and XGBoost will be the baseline of the two methods introduced hereafter. The first method aims to evaluate the performance of a classic split in terms of train, validation and test sets on version 1.2.1 of TUSZ. With the second method, we search to increase the training data set by replacing the classic split into a leave-one-out approach. It is a common technique used to increase a training set when there is a lack of data.
The method is presented in Fig. 2. We first applied a preprocessing step to extract features from the raw EEG signals. These features are temporal and frequential. We based ourselves on the literature review and Python's library PyEEG. A total of 22 features were then considered, which are statistical parameters (mean, variance, skewness, kurtosis, interquartile range, minimum and maximum), other temporal features (Hjorth complexity and mobility, Petrosian fractal) and frequency features (spectral density in the five frequency rhythms: alpha, beta, gamma, delta, theta), the corresponding ratios, spectral centroid, and monotony. As for the frequential features, the five frequency ranges were defined according to previous studies. Delta range was the lowest, between 1 and 4 Hz, theta was between 4 and 8 Hz, alpha between 8 and 13 Hz, beta between 13 and 35 Hz and gamma 35 Hz upwards. Gamma rhythm was rarely modified by a seizure.
We split the raw EEG signal into non-overlapping 1-second segments and computed these features for each of the 22 channels in the bipolar montage, resulting in a 22×22=484 features vector for each segment. This preprocessing step was computed offline and each feature matrix n_segments×n_features was stored by recording.
After this preprocessing step, the segments of the TUSZ training set were randomly split in 80% of train and 20% of validation. Then XGBoost with 400 estimators of depth 3 was trained. The ratio parameter was used to take into account class imbalance between non-seizure and seizure segments. The learning rate was set to 0.1. To avoid problems of speed and memory, we chose to work with "gpu_hist" as tree method, which is the XGBoost fast histogram algorithm for GPU. The internal evaluation metric for the validation data was set to "AUC".
Once the model converged, the model was tested on all the recordings of the test set. For each recording, the model tried to predict the class of each segment. A smoothing function was then applied to those predictions to avoid isolated and improbable seizure/non-seizure states. This function was a sliding window on 19 segments applying a simple majority vote. At the end of the test set classification, we had confusion matrices for all the recordings and we could compute confusion matrices by patient or by types of seizures. Similarly, an overall confusion matrix was computed by summing all the recording confusion matrices.
The second method is similar to the first one except that we applied a specific partitioning technique to increase the training data. This method, detailed in Algorithm 1, is close to a cross-validation method called leave-one-out cross validation (LOOCV). Actually, one should rather call it in this case LOOC-Test as explained below, as a single patient is left out for the test at each iteration. Similarly to a cross-validation method, we combined at the end the test results from the multiple rounds to come up with an estimate of the model's predictive performance. This second method is summarized in Fig. 3.
Algorithm 1 Leave-one-out cross test with XGBoost 1: for i in test_patients_indexes do 2: LeaveOneOutPatient= (X, y)[i] 3: other-patients: (X, y)\LeaveOneOutPatient 4: XGBoost.fit (other_patients, validation_split=0.2) 5: for record in LeaveOneOutPatient do 6: pred=XGBoost.predict (X[record]) 7: pred=smooth(pred) 8: Confusion_matrix (y[record], pred) 9: end for 10: end for
For each recording, we had a matrix of size matrix n_segments×n_features and the binary labels for each time segment (seizure or background). In the algorithm, one can see X as a list of length number_of_patients where each component is made of these features matrices for all the recordings of that patient, while y is the corresponding binary target vector. We also needed to store separately the IDs of the patients in the TUSZ train sets and test sets.
Then, the algorithm could be applied. At each iteration, one patient was left out of the training data and was kept for the test. The segments of the other recordings were randomly split in 80% of train and 20% of validation. Then similarly to the first method, a XGBoost model was trained, conserving the same parameters.
Once the model converged, the model was tested on each recording of the patient left away. Thus, for each recording, the model tried to predict the class of each segment. The smoothing function was still applied as post-processing.
The number of iterations of the LOOCT method was equal to the number of patients in the original test set present in the TUSZ. These were the only patients that are "left out" by the algorithm in order to have a kind of comparison to the first method in terms of population.
At the end of all the iterations, we had confusion matrices for all the recordings and we could compute a confusion matrix by patient or by type of seizure. Similarly, an overall confusion matrix was computed by summing all the recording confusion matrices.
The seizure detection is a classification problem, thus the evaluation of the models was based on confusion matrices (Fig. 4). The number of true positives (TP) was in this case the number of EEG segments correctly classified as seizure; the true negatives (TN) were the number of EEG segments correctly classified as background; the false positives (FP) were the number of EEG segments classified as seizure that were actually background (False Positives are also known as false alarms in the medical field); the false negatives (FN) were the number of EEG segments classified as background though it is actually seizure.
The confusion matrices are used to compute some interesting ratios. Specificity, also known as true negative rate, is defined by
Sensitivity (or Recall) is the probability to detect an element of the positive class, and it is defined by
Precision is the fraction of correct instances among all the detected instances, it is defined by
Finally, the last metric we had been interested in is a measure of accuracy, a metric that evaluates the overall quality of the model. The standard accuracy might not be the best metric to look at, as the classes in the present case are highly imbalanced. We used instead another measure of accuracy, the more appropriated F1-Score, which is the harmonic average of the precision and recall
In the literature presented above, sensitivity and specificity were mainly used, as well as the false alarm rate. It is obviously an important metric in the case of seizure detection, and we also communicated results in this sense for comparison. Note that, in this literature, sensitivity and specificity were computed using the OVLP[8,23]. OVLP is more a method to calculate metrics in a permissive way than a metric per se. OVLP is based on an event decomposition of the EEG signal. Instead of looking at each segment (also called epoch), OVLP only considers the events (i.e. continuous sequence of seizure or background) that occur. Fig. 5 shows a potential drift of the method. The signal above shows the reference, divided in segments of equal length while the signal below is the hypothesis, i.e. the sequence predicted by the model. While one would count true/false negatives/positives by matching each pair of corresponding segments, OVLP considers that the true positive rate is 100% as soon as the hypothesis overlaps the reference for at least lambda segments, where lambda is often set to 1 to guarantee that the events do overlap. Thus, in the example, true positive rate is 100% while traditional metrics only counts one segment as a true positive.
In Table 1, we have deliberately retrieved the distribution of types of seizures in terms of seizure segments instead of events, because it seems to us that the impact on the models of the number of segments is more relevant and reliable. Effectively the global ratios of events and of segments are different. For instance, in AR train, there are 77 events of non-specific generalized seizures for 6 743 segments. In the LE train, there are 35 events for 10 230 segments. It means that the non-specific generalized seizures are way longer in the montage LE. This information is hidden if we count by events.
Firstly, to get significant results in reasonable time, we started by testing the two described methods on the AR test set. The results are thus corresponding to 36 patients, 899 recordings containing 562 seizure events, and more than 49 000 seizure segments. The population of each subset is recalled in Table 2. The events and segments are divided between all types of focals and all types of generalized seizures. As we mentioned before, Table 1 presents a more detailed classification, but the two non-specific sets are dominant and the other types are not enough populated, so for a more refined analysis regarding the results and the content of the corresponding training sets, we have decided to consider the two main seizure classes: focal and generalized seizures.
Patients Recordings Focal events Generalized events Focal segments Generalized segments AR train 124 1 220 376 102 22 966 7 976 LE train 129 322 310 100 16 164 11 647 AR test 36 899 312 250 31 779 17 596 LE test 24 49 38 44 4 202 2 804
Table 2. Cardinalities of the selected sets
The results of the experiments are summarized in Table 3. The ones concerning the AR Test as target are in the first section and numbered from 1 to 4. As one can see, due in particular to the LOOCT method, we took a few liberties with the initial destinations of each subsets. So, in the first section for instance, AR Test was used in the training set as well. For this round of experiments, sensitivities by type of seizure are summarized in Table 4.
AR train LE train AR test LE test First method Second method Sensitivity Precision Specificity F1 (%) 1 train test x 35.80 31.65 92.46 33.60 2 train train test x 41.74 35.17 92.50 38.17 3 train train-test x 52.66 27.55 86.50 36.17 4 train train train-test x 53.87 27.57 86.20 36.47 5 train test x 71.65 28.67 73.04 40.95 6 train train test x 66.70 34.06 80.47 45.09 7 train train train test x 78.72 33.54 76.41 47.04 8 train train train-test x 72.59 39.61 83.26 51.25 9 train train train train-test x 71.61 40.01 83.76 51.33
Table 3. List of experiments – selected sets, methods and results
Seizure type Experiment 1 Experiment 2 Experiment 3 Experiment 4 Focal 23.02% 27.68% 35.02% 37.61% Generalized 59.50% 66.24% 81.51% 81.10%
Table 4. Sensitivity by type of seizure
We were expecting a bit more from the second method, that is why we started a second round of experiments with LE Test as test set. This subset was smaller than the AR Test; 82 events against 562 for the AR Test. Note that these subsets have 10 patients in common, and these ones were thus filtered out from the training set for the cases where the AR Test set was used in this role. These experiments are numbered from 5 to 9 in Table 3.
The first observation answers the question of this work: by increasing the amount of data, we improve significantly the detection score. However, by looking at the F1-score, this needs to be discussed as it is mainly true for the first method using the classic split of the data set. As one can see, in the experiment 2, by adding LE Train to the training set, all metrics are improved compared to the experiment 1, where the training set only consisted of the AR Train set; the F1-score increases by 5% to reach 38%. That is less clear with the second method LOOCT, the sensitivities are higher than those with the first method, but we observed a high number of False Positive, which degrades the precision and the specificity. This translates into a stagnant F1-score at 36% for the experiments 3 and 4.
The second round of experiment with LE Test as test set confirmed the interest of adding the LE train set in the training set. The use of the second method shows in this case a good improvement of the F1-score, we are 4% over the best F1-Score obtained with the first method (experiment 7). We also observed that the addition of the AR Test in the training set do not bring the strong improvement that we could expect. Even, in the seventh experiment, we observed a decrease in the precision and the specificity as we had in the first round of experiments. This tends to say that the problem is not the second method but the use of the AR Test set as training material. This observation must be nuanced by the difference in size between the AR Test and LE Test sets. Nevertheless, following a discussion with the TUH Team, we learned that they have visually observed that LE files were much cleaner compared to AR files in terms of signal noise ratio and that it would make sense to get bad performances on AR compared to LE files. Moreover, previous studies discussed the importance of the montage and the reference point as it practically changes the nature of the waveforms even by using the conversion to the bipolar montage. Another explanation for the problem of adding AR Test in the training set could lie in the properties of the inter-ictal seizure signals. Fig. 6 shows a recording in the AR Test set. Our system, based on frames of 1 second, detects a seizure state from the beginning of the record. The TUSZ annotations indicate the beginning of the seizure at 7 seconds. The EEG patterns are the same but these events are more considered as inter-ictal anomalies, as their duration is under 10 seconds. We can imagine the problem of adding these "background" annotated segments in a training set. The intuition is that the accuracy would be penalized by increasing the false negative and false positive rates. On the other hand, it is also probably the case in the other subsets dedicated to the training set, the only difference is that the proportion of patients with epilepsy is higher in the AR Test. These observations also explain why performances are so low on the TUSZ data set. To summarize, this advocates for two things. Firstly, a better strategy at the level of the model for handling these inter-ictal anomalies. Secondly, the training data should be selected cautiously.
Figure 6. Repetitive inter-ictal generalized spike and wave discharges alternated with global EEG signal attenuation, followed by a generalized seizure (7 seconds from the start of the segment).
Comparatively to the performances reported in the TUH literature, these results are of the same order as the ones of their best model[9–10]. The sensitivity is higher, while the specificity is a bit lower. In terms of false alarm rate for the patients without epilepsy in the test set, 5 events in 3 recordings were detected on a total of 82 recordings in the second experiment; the false alarm rate was then estimated around 8 FA/24 hours. If all the recordings without seizure are considered, the false alarm rate rises to 18 FA/24 hours. This denotes a good capacity of the model to distinguish patients with epilepsy from the others. Furthermore, if we compute the scores by records and not by segments, the precision rises to 64.33%, which means that there's a majority of recordings correctly classified on all the recordings detected with a pathology of epilepsy. The other metrics are unchanged.
If we look at the sensitivities by type of seizure for the first round of experiments in Table 4, we can make the same observations in terms of improvement in function of the used training sets. Moreover, by making a proportional average of these scores with the amount of focal and generalized segments in Table 2, we can find back the sensitivities in Table 3. The main information that we can draw from this view is that the generalized seizures are detected more efficiently compared to focal seizures. This was expected; nevertheless, we have to point out that the proportion in the training set is in disfavor of the generalized seizures, this one oscillating between 25% (AR Train) and 40% (LE Train). The difficulty to detect focal seizures is due to the fact that they begin in one specific area of the brain with later a potential generalization, while the electrical discharge of the generalized seizures is directly widespread in both sides of the brain.
This advocates to analyze the seizure detection problem with a type-specific approach instead of a generic modeling; however, as argument for the generic approach, 11 patients on the 36 of the AR Test encountered both focal and generalized seizures in the same recordings or in several. Nevertheless, some strategies should be adopted to deal with the different locations of the onsets of the focal seizures. For instance, a new metric that takes into account these preoccupations, such as the importance of characterizing the beginning of a seizure, would be meaningful.
The idea of type-specific analysis was recalled by the study of the most important features and channels that were more decisive during the training to discriminate seizures from background. Indeed, the coverage of specific focal seizures in the training set may have an incidence on the importance of the channels. Among all the experiments and the training sets, a feature on two channels emerged, namely the power spectral density on the beta frequency band for the channels P4-O2 and P3-O1. These two particular channels were located symmetrically in the posterior region of the head between the parietal and occipital area. After discussion of our results with neurologists specialized in epilepsy from the University Hospital of Saint-Luc in Brussels, we hypothesized two possible explanations.
The first one refers indeed to the type of seizure from the location point-of-view. The locations of these features correspond to the place where the focal seizures of occipital origin start. Generalized seizures can also induce ictal changes in this area due to their widespread nature, they are nevertheless a minority in the training sets, and focal seizures are most frequently observed. Considering the general statistics of epilepsy among all populations, occipital seizures have been reported to constitute 8% of the population with epilepsy and are occasionally subject to a secondary generalization. The dominant type of the focal seizures is generally considered to be the Temporal Lobe Epilepsy (TLE) which represents more than 60% of all focal seizures; this type of seizure rarely spreads to the occipital lobes. Taken together, all these statistics do not directly explain the location and the importance of these channels. Moreover, neurologists reveal that a bias can exist in the statistics of hospitals specialized in epilepsy compared to the whole population with epilepsy, as patients with refractory epilepsy (pharmacoresistant epilepsy) are more likely to be redirected through these institutions. The Temple University Hospital is indeed accredited as a level 4 epilepsy center, which means that it has the professional expertise to provide the highest level medical and surgical treatment for patients with refractory epilepsy. To sort out this question, we need to determine the exact locations of all the included seizures from the data set. This information is not explicitly given in the TUSZ, but it could be retrieved by processing the .lbl files presented before, which can be the subject of future work.
The second potential explanation concerns the general physiological activity in the occipital lobe. When reviewing the EEG, neurologists first consider the occipital region; as a normal background activity is generally observed in these channels, with EEG signals less polluted by artefacts like muscle or eye movements, it helps to interpret the onset of an eventual seizure. If this empiric method reveals indeed visual discrimination of ictal and interictal phases, it is likely that the algorithm has found the same rule. Furthermore, a study from the TUH aims to measure the performance of their model if fewer channels are considered. The removed channels were chosen by using domain knowledge instead of using a proper automated and optimized selection process. If only two channels were to be preserved, one of them would be located in the occipital region and the other bound to the CZ electrode for its central location. The removal of the other channels is justified either by their susceptibility to noise or artefact interference, or by their less strategic location. So, to conclude this point, experts seem to be concerned by the activity in the posterior region of the head and our interpretable models suggest this location for the most important features. The validation of this relation will also be the subject of future work.
Below the power spectral density on the beta frequency band for the channels P4-O2 and P3-O1, we can find other significant combinations. As there were 22 features for 22 channels, instead of enumerating them, we did a global analysis presented in two figures. Fig. 7 shows the averaged channel importance over all the features and Fig. 8 shows the averaged feature importance of all the channels. For the channel importance, we find first the two already mentioned channels between the occipital and parietal area, then some others between the central and the parietal area or between the temporal and occipital area or purely temporal. Two things are interesting to note. Firstly it is not symmetric, as if the channels in the right hemisphere are more important; secondly, the less important channels are related to the frontal area, indicating that EEG is less sensitive to abnormalities in the frontal region. Indeed, the frontal lobe is a big lobe and many regions, especially deep mesial ones are not well captured by the EEG[27–28]. For the feature importance, the power spectral density on the beta frequency band confirms to be the first on average, closely followed and far above the others by the Petrosian Fractal Dimension. Various studies have shown the interest of using fractals for biosignals analysis and seizure detection[29–30]. For the remaining features, we note the good performance of the other power spectral densities in alpha, gamma, and theta frequency bands and of the variance.
In this paper, we analyzed the problem of seizure detection using machine learning techniques on the TUSZ with as main goal to understand whether increasing the amount of data increased performance of seizure detection. By using both a classic machine learning approach and a leave-one-out method on two subsets, we generated experiments based on training sets that have the characteristic of being large, heterogeneous and not commonly used in the current state-of-the-art. Using an efficient implementation of the gradient boosting algorithm called XGBoost, we showed the potential of improvement that can be done by adding more training data, even if this one should be selected cautiously. We presented results of the same order of magnitude than the corresponding references based on deep learning architectures. The performances by type of seizure were also analyzed and showed that generalized seizures are easier to recognize, even if they were under-represented in the training set. Finally, the interpretable property of the XGBoost algorithm showed that a feature linked to two specific EEG channels between the parietal and occipital area was statistically the most discriminant to distinguish a seizure from a non-seizure (considering segments of one second and a generic modeling done with the proportions by type of seizure proposed by the TUSZ).
In further works, we would like to extend this approach to the future releases of the TUSZ with more data. A larger data set means that the LOOCT method would not be necessary anymore. Based on the actual findings, we are planning to study an architecture that would be more able to discriminate ictal phase from inter-ictal, and also a type of seizure from another, either by using a multiclass classification approach by distinguishing focals and generalized seizures, either by leaving aside the generic approach or by focusing on type-dependent models. The use of interpretable models is still relevant and it would be interesting to see how the importance of the features evolves in function of the focal seizures and their onset locations. We are also interested to do some process mining based on the by-channel annotations by looking at the order of (dis) appearance of a seizure on particular channels in order to identify channels that are the most critical. One of the goals would be to converge to a system capable of early detection. Such a fast detection is critical for clinicians because it allows nurses to react as quickly as possible with the appropriate actions.
From a technical point of view, deep learning models could be tested again. Studying other sources of signals such as electrocardiograms (ECG) would also be interesting but once again, we suffer from a lack of data resources.
Epileptic seizure detection using EEG signals and extreme gradient boosting
- Received Date: 2019-01-31
- Accepted Date: 2019-06-03
- Rev Recd Date: 2019-04-04
- Available Online: 2019-08-30
- Publish Date: 2020-05-01
- epileptic seizure /
- electroencephalograms /
- Temple University Hospital EEG Seizure Corpus /
- machine learning /
Abstract: The problem of automated seizure detection is treated using clinical electroencephalograms (EEG) and machine learning algorithms on the Temple University Hospital EEG Seizure Corpus (TUSZ). Performances on this complex data set are still not encountering expectations. The purpose of this work is to determine to what extent the use of larger amount of data can help to improve the performances. Two methods are explored: a standard partitioning on a recent and larger version of the TUSZ, and a leave-one-out approach used to increase the amount of data for the training set. XGBoost, a fast implementation of the gradient boosting classifier, is the ideal algorithm for these tasks. The performances obtained are in the range of what is reported until now in the literature with deep learning models. We give interpretation to our results by identifying the most relevant features and analyzing performances by seizure types. We show that generalized seizures tend to be far better predicted than focal ones. We also notice that some EEG channels and features are more important than others to distinguish seizure from background.
|Citation:||Paul Vanabelle, Pierre De Handschutter, Riëm El Tahry, Mohammed Benjelloun, Mohamed Boukhebouze. Epileptic seizure detection using EEG signals and extreme gradient boosting[J]. The Journal of Biomedical Research, 2020, 34(3): 228-239. doi: 10.7555/JBR.33.20190016|