Biomedical Signal Processing and Control
Evaluation of convolutional neural networks using a large multi-subject P300 dataset
Lukáš Vařeka
NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Pilsen, Czech Republic
Article history: Received 6 August 2019; Received in revised form 4 December 2019; Accepted 31 December 2019

Keywords: Convolutional neural networks; Event-related potentials; P300; BCI; LDA; Machine learning
Abstract

Deep neural networks (DNN) have been studied in various machine learning areas. For example, event-related potential (ERP) signal classification is a highly complex task potentially suitable for DNN as signal-to-noise ratio is low, and underlying spatial and temporal patterns display a large intra- and inter-subject variability. Convolutional neural networks (CNN) have been compared with baseline traditional models, i.e. linear discriminant analysis (LDA) and support vector machines (SVM), for single trial classification using a large multi-subject publicly available P300 dataset of school-age children (138 males and 112 females). For single trial classification, classification accuracy stayed between 62% and 64% for all tested classification models. When applying the trained classification models to averaged trials, accuracy increased to 76–79% without significant differences among classification models. CNN did not prove superior to baseline for the tested dataset. Comparison with related literature, limitations and future directions are discussed.
© 2020 Elsevier Ltd. All rights reserved.
1. Introduction

In recent years, both fundamental and applied research in deep learning has rapidly developed. In image processing and natural language processing, it has led to significantly better classification rates than previous state-of-the-art algorithms [7]. Therefore, there has been a growing interest in applying deep neural networks (DNNs) to various fields of applied research. Such an effort can also be seen in electroencephalographic (EEG) data processing and classification. A well-known application of EEG classification is a brain-computer interface (BCI) [18], which allows immobile persons to operate devices only by decoding their intent from the EEG signal without any need for muscle involvement. A significant challenge in BCI systems is to recognize the intention of the user correctly since the brain components of interest often have a significantly lower amplitude than random EEG signal [18].
DNNs often do not require costly feature engineering, and thus could lead to more universal and reliable EEG classification. However, a recent review of the field reached the conclusion that so far, these benefits have not been convincingly presented in the literature [14]. Many studies did not compare the studied DNN to state-of-the-art BCI methods or performed biased comparisons, with either suboptimal parameters for the state-of-the-art competitors or with unjustified choices of parameters for the DNN [14].

E-mail address: lvareka@ntis.zcu.cz
A similar conclusion has been reached in another review of DNN and EEG [22]. Many related papers suffer from poor reproducibility: a majority of papers would be hard or impossible to reproduce given the unavailability of their data and code [22]. Moreover, one of the drawbacks of DNNs is having to collect a large training dataset. Typical BCI datasets have very small numbers of training examples, since BCI users cannot be asked to perform thousands of mental operations before actually using the BCI. To overcome this problem, it has been proposed to obtain BCI applications with very large training databases, e.g. for multi-subject classification. Multi-subject classification has one more advantage: it solves the problem of long DNN training times. Instead, a universal BCI system can be trained only once and then just applied to a new dataset from a new user without any additional training [14].
Guess the number (GTN) is a simple P300 event-related potential (ERP) BCI experiment. Its aim is to ask the measured participant to pick a number between 1 and 9. Then, he or she is exposed to corresponding visual stimuli. The P300 waveform is expected following the selected (target) number. During the measurement, experimenters try to guess the selected number based on manual evaluation of average ERPs associated with each number. Finally, both the numbers thought and the guesses of the experimenters are recorded as metadata. 250 school-age children participated in the experiments that were carried out in elementary and secondary schools in the Czech Republic. Only three EEG channels (Fz, Cz, Pz) were recorded to decrease preparation time. Nevertheless, to the author's best knowledge, this is the largest P300 BCI dataset available so far [19].

https://doi.org/10.1016/j.bspc.2019.101837
The main aim of this paper is to evaluate one of the deep learning models, convolutional neural networks (CNN), for classification of P300 BCI data. Unlike most related studies, multi-subject classification was performed with the future goal of developing a universal BCI. Two state-of-the-art BCI classifiers were used as baselines to minimize the risk of biased comparison. To avoid overtraining, cross-validation and final testing using a previously unused part of the dataset were performed. Another aim of this manuscript is to evaluate some CNN parameters in this application.
1.1. State-of-the-art

Although various BCI algorithms have been evaluated and published in recent decades, there is still no feature extraction or machine learning algorithm clearly established as state-of-the-art. However, several studies have focused on reviews and comparisons with partly consistent results. In [12], a comparison of several classifiers (Pearson's correlation method, Fisher's linear discriminant analysis (LDA), stepwise linear discriminant analysis (SWLDA), linear support-vector machine (SVM), and Gaussian kernel support vector machine (nSVM)) was performed on 8 healthy subjects. It was shown that SWLDA and LDA achieved the best overall performance. As originally proposed by Blankertz et al. [3] and also confirmed in a recent review [14], shrinkage LDA is another useful tool for BCI, particularly with small training datasets. In [16], the authors demonstrated that LDA and Bayesian linear discriminant analysis (BLDA) were able to beat other classification algorithms.

Efforts to develop a universal multi-subject machine learning approach for P300 BCIs have been relatively rare in the literature. In [21], the authors developed a generic shrinkage LDA classifier using the training data of 18 subjects. The performance was evaluated with the data of 7 subjects. It was concluded that the generic classifier achieved results comparable to personalized classifiers regarding effectiveness and efficiency.
2. Methods

2.1. Data acquisition

The data described in detail and accessible in [19] were used in subsequent experiments. The measurements were taken between 8 am and 3 pm. Unfortunately, the environment was usually quite noisy since many children and also many electrical devices were present in the room at the same time. However, in any case there were no people standing or moving behind the monitor or in the close proximity of the measured participant.
The participants were stimulated with numbers between 1 and 9 flashing on the monitor in random order. The numbers were white on a black background. The inter-stimulus interval was set to 1500 ms. The following hardware devices were used: the BrainVision standard V-Amp amplifier, a standard small or medium 10/20 EEG cap, a monitor for presenting the numbers, and two notebooks necessary to run the stimulation and recording software applications. The reference electrode was placed at the root of the nose and the ground electrode was placed on the ear. To speed up the guessing task, only three electrodes, Fz, Cz and Pz, were active. The stimulation protocol was developed and run using the Presentation software tool produced by Neurobehavioral Systems, Inc. The BrainVision Recorder was used for recording raw EEG data.
The participants were school-age children and teenagers (aged between 7 and 17; average age 12.9), 138 males and 112 females. All participants and their parents were informed about the programme of the day and the experiments carried out. All participants took part in the experiment voluntarily. The gender, age, and laterality of the participants were collected. No personal or sensitive data were recorded.

Fig. 1. Comparison of target and non-target epoch grand averages. As expected, there is a large P300 component following the target stimuli. Note that the P300 average latency is somewhat delayed compared to what is commonly reported in the literature [15].
2.2. Preprocessing and feature extraction

The data were preprocessed as follows:
1. From each participant of the experiments, short parts of the signal (i.e. ERP trials, epochs) associated with two displayed numbers were extracted. One of them was the target (thought) number. The other one was a randomly selected number out of the remaining stimuli between 1 and 9. Consequently, a similar number of training examples for both classification classes (target, non-target) was extracted. The extracted epochs were stored into a file (available in [20]).

2. For epoch extraction, intervals between 200 ms prestimulus and 1000 ms poststimulus were used. The prestimulus interval between −200 and 0 ms was used for baseline correction, i.e. computing the average of this period and subtracting it from the data. Thus, given the sampling frequency of 1 kHz, an 11,532 × 3 × 1200 (number of epochs × number of EEG channels × number of samples) data matrix was produced.

3. To skip severely damaged epochs, especially those caused by eye blinks or bad channels, an amplitude threshold was set to 100 μV according to common guidelines (such as in [15]). Any epoch x[c, t], with c being the channel index and t time, was rejected if:

max_{c,t} |x[c, t]| > 100 μV    (1)

With this procedure, 30.3% of epochs were rejected. In Fig. 1, grand averages of accepted epochs (across all participants) are depicted.
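The baseline correction and amplitude-based rejection steps above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the author's exact code (which is available in [26]); the function names and constants are choices of this sketch.

```python
# Sketch of steps 2-3: baseline correction over the prestimulus interval
# and rejection of epochs exceeding the 100 uV amplitude threshold.
import numpy as np

PRESTIM_SAMPLES = 200   # 200 ms prestimulus at 1 kHz sampling
THRESHOLD_UV = 100      # rejection threshold in microvolts

def baseline_correct(epochs):
    """Subtract the mean of the prestimulus interval from each channel.

    epochs: array of shape (n_epochs, n_channels, n_samples), where the
    first PRESTIM_SAMPLES samples precede the stimulus onset.
    """
    baseline = epochs[:, :, :PRESTIM_SAMPLES].mean(axis=2, keepdims=True)
    return epochs - baseline

def reject_epochs(epochs):
    """Drop epochs whose absolute amplitude exceeds the threshold (Eq. (1))."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= THRESHOLD_UV
    return epochs[keep], keep
```

Applied to the full dataset, `reject_epochs` would implement the rule in Eq. (1) that removed 30.3% of the epochs.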
Feature extraction. Many deep learning methods such as CNN are designed to avoid significant feature engineering [2,28]. On the other hand, linear classifiers usually perform better when the dimensionality of the original data matrix is reduced, and only the most significant features are extracted [3]. In the parameter optimization phase, the state-of-the-art classifiers were used either with the original data dimension, or after the feature selection proposed in [3], to compare the performance. The feature extraction method was based on averaging time intervals of interest and merging these averages across all relevant EEG channels to get reduced spatio-temporal feature vectors (windowed means feature extraction, WM). In line with recommendations for P300 BCIs, the a priori time window was initially set between 300 and 500 ms after stimuli [25]. This time window was further divided into 20 equal-sized time intervals in which amplitude averages were computed. Therefore, with three EEG channels, the dimensionality of feature vectors was reduced to 60. Finally, these feature vectors were scaled to zero mean and unit variance.

Fig. 2. Flowchart of preprocessing, feature extraction and data splitting applied.

Fig. 3. Architecture of the convolutional neural network. There was one convolutional layer, one dense layer, and finally a softmax layer for binary classification (target/non-target). Batch normalization and dropout followed both the convolutional and dense layers.
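A minimal sketch of the windowed-means (WM) extraction described above, under the stated settings (20 equal sub-windows per channel, three channels, so 60 features); the exact implementation is in the published code [26], and the final scaling to zero mean and unit variance is omitted here.

```python
# Windowed means: average equal-length sub-windows of the poststimulus
# interval and concatenate the averages over all channels.
import numpy as np

def windowed_means(epochs, start, stop, n_windows=20):
    """epochs: (n_epochs, n_channels, n_samples) array.
    [start, stop) selects the time window in samples; its length must be
    divisible by n_windows. Returns (n_epochs, n_channels * n_windows).
    """
    window = epochs[:, :, start:stop]
    n_epochs, n_channels, n_samples = window.shape
    # split the interval into equal sub-windows and average each one
    split = window.reshape(n_epochs, n_channels, n_windows,
                           n_samples // n_windows)
    return split.mean(axis=3).reshape(n_epochs, n_channels * n_windows)
```

For the 300–1000 ms window at 1 kHz with a 200 ms prestimulus offset, `start=500` and `stop=1200` give 700 samples per channel, i.e. 35 samples per sub-window and 3 × 20 = 60 features.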
2.3. Classification

Fig. 2 depicts the procedures used to extract features and split the data for classification.

Data splitting. Before classification, the data were randomly split into training (75%) and testing (25%) sets. Using the training set, 30 iterations of Monte-Carlo cross-validation (again 75:25 from the subset) were performed to optimize parameters. Results using the holdout testing set were computed in each cross-validation iteration and averaged at the end of the processing. No parameter decision was based on the holdout set.
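The splitting scheme can be sketched with plain numpy: a single 75/25 train/holdout split, then 30 Monte-Carlo cross-validation iterations (again 75/25) drawn from the training portion only. Seeds and names are illustrative, not the author's exact code.

```python
# Monte-Carlo cross-validation splits inside a fixed training portion.
import numpy as np

def monte_carlo_splits(n, test_frac=0.25, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]  # holdout never reused
    folds = []
    for _ in range(n_iter):
        perm = rng.permutation(train_idx)
        n_val = int(len(train_idx) * test_frac)
        folds.append((perm[n_val:], perm[:n_val]))    # (train, validation)
    return train_idx, test_idx, folds
```

Because every validation fold is drawn from `train_idx` only, no parameter decision can leak information from the holdout set, matching the procedure described above.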
LDA. State-of-the-art [3] LDA with eigenvalue decomposition used as the solver, and automatic shrinkage using the Ledoit-Wolf lemma [13], was applied.
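This configuration corresponds to a standard scikit-learn construction (eigenvalue-decomposition solver with automatic Ledoit-Wolf shrinkage); the snippet below is that standard call, not necessarily the author's exact code.

```python
# Shrinkage LDA: eigen solver with automatic Ledoit-Wolf shrinkage.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
```

Training and evaluation then follow the usual scikit-learn pattern, e.g. `lda.fit(X_train, y_train)` on the windowed-means features and `lda.score(X_test, y_test)` on the holdout set.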
SVM. The implementation was based on libsvm [4]. Both recommendations in the literature [8] and validation subsets were used to find the optimal parameters. Finally, the penalty parameter C was set to 1, the kernel cache was 500 MB, and the degree of the polynomial kernel function was set to 3. A one-vs-rest decision function shape with the RBF kernel type and shrinking heuristics were used.
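An equivalent configuration in scikit-learn (whose `SVC` wraps libsvm) would look as follows; this is a reconstruction from the stated parameters, not the author's exact call. Note the `degree` parameter only affects polynomial kernels and is left at its default of 3, as stated above.

```python
# RBF-kernel SVM with the parameters described in the text.
from sklearn.svm import SVC

svm = SVC(C=1.0, kernel="rbf", degree=3, cache_size=500,
          shrinking=True, decision_function_shape="ovr")
```

As with LDA, the classifier is fitted on the windowed-means features of the training set and evaluated on the holdout set.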
CNN. Convolutional neural networks were implemented in Keras [5]. They were configured to maximize classification performance on the validation subsets. The structure is depicted in Fig. 3. Initially, after empirical parameter tuning based on cross-validation, the parameters were selected as follows:

– The first convolutional layer had six 3 × 3 filters. The filter size was set to cover all three EEG channels. Both the second filter dimension and the number of filters were tuned experimentally.
– In both cases, dropout was set to 0.5.
– The output of the convolutional layer was further downsampled by a factor of 8 using the average pooling layer.
– The ELU activation function [6] was used for both the convolutional and dense layers, as recommended in related literature [23]. Compared to the sigmoid function, ELU mitigates the vanishing gradient problem by using the identity for positive values. Moreover, in contrast to rectified linear units (ReLU), ELUs have negative values, which allow them to push mean unit activations closer to zero while ensuring a noise-robust deactivation state [6]. The parameter α > 0 was set to 1:

f(x) = x if x > 0; f(x) = α(e^x − 1) if x ≤ 0

– Batch size was set to 16.

Fig. 4. Decrease of classification loss based on the baseline CNN architecture is shown. Although training loss kept declining throughout all 30 epochs, validation loss reached the minimum after only five epochs. Because the patience parameter was set to five, in this case, the training was stopped after 10 epochs. As seen from the growing difference between training and validation loss, further training would lead to substantial overtraining.
– Cross-entropy was used as the loss function.
– The Adam [11] optimizer was used for training because it is computationally efficient, has little memory requirements and is frequently used in the field [22].
– The number of training epochs was set to 30.
– Early stopping with a patience parameter of 5 was used.
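The parameters above can be assembled into a Keras model roughly matching Fig. 3. This is a sketch under stated assumptions: the convolutional, pooling, dropout and activation settings follow the list above, while details not given in the text (e.g. the dense layer width of 100) are assumptions; the published code [26] is authoritative.

```python
# Sketch of the described CNN: one convolutional layer (six 3x3 filters,
# first dimension spanning all three EEG channels), batch normalization
# and dropout 0.5 after both conv and dense layers, ELU activations,
# average pooling by a factor of 8, and a softmax output.
from tensorflow.keras import layers, models

def build_cnn(n_channels=3, n_samples=1200):
    model = models.Sequential([
        layers.Input(shape=(n_channels, n_samples, 1)),
        layers.Conv2D(6, (3, 3)),                   # six 3x3 filters
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.Dropout(0.5),
        layers.AveragePooling2D(pool_size=(1, 8)),  # downsample time by 8
        layers.Flatten(),
        layers.Dense(100),                          # width is an assumption
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),      # target / non-target
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

Training with the listed settings would then use `model.fit(..., epochs=30, batch_size=16, callbacks=[EarlyStopping(patience=5)])`.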
3. Results

As mentioned above, cross-validation for hyperparameter estimation was followed by testing on a holdout set. Accuracy, precision, recall and AUC (area under the ROC curve) were computed [10]. In the validation phase, the aim was to reach the configuration yielding the highest accuracy while ensuring it is not at the expense of precision and recall. In Fig. 4, an example of searching for an optimal configuration of CNN weights and biases based on the training and validation sets is shown.
3.1. Effect of parameter modifications on validation performance

Feature extraction for LDA and SVM. Parameter optimization of the classifiers themselves has been discussed above. Additionally, different feature extraction settings were compared regarding the average classification results achieved during cross-validation. Results of the comparisons are depicted in Table 1. Accuracy had an increasing trend when the time window was prolonged to 800 and 1000 ms. It can be speculated that the standard a priori time window is not enough for capturing target to non-target differences when classifying children's data that display a large variety in their P300 components. As expected, classification performance with WM features was slightly higher than for preprocessed epochs without feature extraction. Based on the results, both LDA and SVM configured as described above with the time window between 300 and 1000 ms were used in the testing phase.
CNN. The neural network architecture described above was used as the starting point. However, some parameter modifications were explored regarding their effect on the validation classification results. The results are shown in Table 2. Performance mostly displayed only small and insignificant changes with these parameter modifications. Consistently with [23], batch normalization led to slightly better accuracy. Moreover, the absence of batch normalization made the results less predictable and more fluctuating, as can be seen in the standard deviation of recall. Another clear decrease in performance was observed without dropout regularization. Finally, average pooling was better than max pooling for the validation data. Consequently, the initial configuration described in Section 2.3 was used for testing.
3.2. Testing results

Based on the results in Section 3.1, both the feature extraction method for LDA and SVM, and the CNN configuration achieving the best average accuracy during cross-validation were selected for the testing phase. Fig. 5 shows the achieved results. All tested models achieved comparable classification results. LDA had the highest classification recall (around 67%). Single trial classification accuracy stayed within the range between 62% and 64%.

Averaging of epochs associated with the same markers is a standard ERP technique for increasing signal-to-noise ratio [15]. When averaging, repeated ERPs including the P300 are amplified while continuous random EEG noise is suppressed. Because even in P300 BCIs, repeated stimulation is usually used to achieve good performance [17], it is worth exploring how once-trained classifiers can generalize to averaged epochs. Therefore, consecutive groups of one to six neighboring epochs from the testing set were used instead of single trials. Fig. 6 depicts the results achieved. With averaging, classification accuracy increased from the original 61–64% up to 76–79%. There were no significant differences among classifiers, although CNN displayed slightly higher standard deviations.
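The averaging step can be sketched as follows: non-overlapping groups of k consecutive same-class test epochs are averaged, and the averaged trials are then fed to the already trained classifier. This is a hypothetical helper illustrating the evaluation, not the author's exact code.

```python
# Average non-overlapping groups of k consecutive epochs to raise SNR.
import numpy as np

def average_groups(epochs, k):
    """epochs: (n_epochs, ...) array of same-class trials.
    Trailing epochs that do not fill a complete group are discarded.
    Returns (n_epochs // k, ...) averaged trials.
    """
    n = (len(epochs) // k) * k
    grouped = epochs[:n].reshape(-1, k, *epochs.shape[1:])
    return grouped.mean(axis=1)
```

Varying k from 1 to 6 reproduces the sweep shown in Fig. 6: the P300, time-locked to the stimulus, survives the averaging while uncorrelated EEG noise is attenuated.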
4. Discussion

Single trial classification accuracy was between 62% and 64% for all tested classification models without significant differences. Similar results have been commonly reported in the literature. For example, in [9], 65% single trial accuracy was achieved (using one to three EEG channels and personalized training data). In [24], 40–66% classification accuracy was reported, highly dependent on the tested subject. Comparably, this manuscript achieved similar performance for a large multi-subject dataset of school-age children.

On the single trial level, CNN achieved comparable performance to both LDA and SVM. Similar performance was also achieved when applying averaged testing epochs. However, CNN seemed slightly less stable and more dependent on the training/validation split, as can be seen in the standard deviations.

Consistently with related deep learning literature [23], a combination of ELUs, dropout and batch normalization was beneficial for classification performance. Unlike many image classification applications, average pooling was better than max pooling, perhaps because it is not associated with data loss. Even less prominent features may contribute to classifier discriminative abilities. To further verify how the CNN was able to classify between targets and non-targets, the network was exposed to all target, or all non-target, patterns. Average hidden layer outputs (the 4th average pooling layer used as an example) across these conditions were calculated and are shown in Fig. 7. There is a clear difference between some CNN outputs, although most remain stable across both conditions.
In our previous work [27], we applied stacked autoencoders (SAE) to the same GTN dataset. In contrast with the current work, manual feature extraction using discrete wavelet transform was performed. Instead of single trial classification, the success rate of detecting the number thought based on multiple single trial classification results was computed. The maximum success rate on the testing dataset was 79.4% for SAE, 75.6% for LDA and 73.7% for SVM. It seems that while SAE combined with traditional feature engineering and involving multiple trials per marker can outperform linear classifiers, the same benefits cannot be repeated when applying CNN to single trial classification of raw EEG data.

Table 1
Average cross-validation classification results based on the feature extraction method with the LDA classifier configured as described in Section 2.3. Averages from 30 repetitions and related sample standard deviations (in brackets) are reported. WM – windowed means (time intervals relative to stimuli onsets in square brackets).

Feature extraction    AUC            Accuracy       Precision      Recall
WM [300–500 ms]       59.56% (1.04)  59.54% (1.04)  59.48% (1.83)  61.69% (2.08)
WM [300–800 ms]       60.94% (1.04)  60.93% (1.05)  60.75% (1.9)   63.38% (1.85)
WM [300–1000 ms]      61.77% (0.9)   61.76% (0.91)  61.45% (1.9)   64.64% (1.48)
None                  61.09% (1.13)  61.08% (1.13)  61.68% (1.67)  59.90% (1.35)

The bold values denote the configuration that yielded the highest accuracy.

Table 2
Average cross-validation classification results based on the CNN parameter settings. Averages from 30 repetitions and related sample standard deviations (in brackets) are reported. The CNN configuration described in Section 2.3 was used as the baseline model.

Changed parameter         AUC            Accuracy       Precision      Recall
None                      66.12% (0.68)  62.18% (0.94)  62.76% (1.95)  61.34% (2.63)
ReLUs instead of ELUs     66.36% (0.62)  61.85% (1.15)  62.7% (2.19)   60.1% (3.04)
Filter size (3, 30)       65.84% (0.49)  61.95% (1.18)  62.7% (2.1)    60.5% (3.91)
12 conv. filters          66.31% (0.51)  61.83% (1.1)   62.3% (2.21)   61.6% (3.08)
No batch normalization    65.99% (0.77)  60.55% (1.52)  61.02% (3.16)  61.5% (7.21)
Dropout 0.2               67.67% (0.65)  60.8% (1.49)   61.33% (2.31)  60.33% (4.0)
No dropout                68.63% (1.11)  59.49% (1.2)   59.61% (1.93)  60.7% (4.44)
Dense (150)               66.07% (0.8)   61.81% (0.95)  62.33% (1.83)  61.18% (2.49)
Two dense l. (120-60)     65.72% (0.77)  62.11% (0.9)   63.14% (2.03)  59.5% (2.55)
Max- instead of AvgPool   64.23% (1.15)  58.94% (1.94)  60.22% (4.18)  59.24% (13.76)

The bold values denote the configuration that yielded the highest accuracy.

Fig. 5. Testing results for single trial classification (error bars show standard deviations).

Fig. 6. Testing results when averaging neighboring epochs (error bars show standard deviations).
Computational efficiency is another important factor to consider when applying the methods in online BCI systems. Experimental comparison was performed with an Intel Core i7-7700K (four cores, 4.2 GHz), 64 GB RAM and an NVIDIA GeForce GTX 1050 Ti GPU. CNN took 46 s to train on the CPU and 26 s to train on the GPU. Both LDA and SVM were much faster to train, with 300 and 1600 ms, respectively. However, training times were not critical in the presented experiment since any universal classifier needs to be trained just once and not with every new BCI user. Testing times were calculated relative to one processed feature vector and were low enough for all classifiers (CNN took 0.3 ms to classify one pattern on the CPU and 0.1 ms on the GPU, LDA took 0.1 and SVM 0.2 ms). It can be concluded that all tested algorithms can be used in online BCIs. Neural networks are slower to train and this could be a problem for personalized BCIs, retrained with each new user.

Fig. 7. Average outputs of the 4th (pooling) layer are depicted after the CNN was exposed to all target/non-target patterns. The x-axis corresponds to indices of convolutional filters (six in total). The y-axis is the output of convolution originally corresponding to time information, after average pooling further downsampled by a factor of 6. There is a clear difference in outputs, mainly in the bottom part of the maps. However, many outputs seem independent of classification labels, poorly contributing to CNN discrimination abilities.
There are several limitations of the reported experiments. As a noise suppression procedure, severely damaged epochs (with amplitude exceeding ±100 μV when compared to baseline) were rejected before further processing. While epoch rejection is beneficial for classification accuracy, on the other hand, it would also lead to lower bit-rates when used in on-line P300 BCI systems because of data loss. Artifact correction methods based on Independent Component Analysis were not feasible because of the low number of EEG channels (three). Moreover, the low number of EEG channels could have a detrimental effect on classification performance because of limited spatial information provided on the input. Another possible limitation was that there might be an architecture of CNN that would lead to better classification performance and had not been discovered by the author. However, several manipulations of CNN parameters were tested using cross-validation, including adding a new dense layer, with only very modest changes in validation classification accuracy.
A recent review of EEG and DNN studies [22] reported the median gain in accuracy of DNNs over traditional baselines to be 5.4%. It also revealed significant challenges in the field. A low number of training examples is a common complaint, especially for event-related data that contain the relevant information in the time domain. In this case, only a small fraction of the continuous EEG measurement near the onset of trials can be used, and strategies such as overlapping time windows to obtain more examples in the frequency domain are not feasible. In the current study, 11,532 epochs were used, which is below the mean number of examples (251,532) and the median number of examples (14,000) in the reviewed papers [22]. Strategies such as data augmentation can be considered to increase the number of training examples to be sufficient for DNNs. Moreover, half of the studies [22] used between 8 and 62 EEG channels. Adding more channels to Fz, Cz and Pz could increase spatial resolution and accuracy but would also increase preparation time and the participant's discomfort. In future work, more on the effect of the number of EEG channels on the P300 classification accuracy can be investigated. Furthermore, soft or hard thresholding based on discrete wavelet transform can be considered for noise cancellation [1]. Another line of research would be to propose different deep learning models for the same classification task, with extensive parameter grid search, or genetic algorithms. Based on a recent review of the field [29], frequently cited and promising networks include Recurrent Neural Networks, especially Long short-term memory (LSTM). Moreover, a CNN layer to capture spatial patterns can be followed by an LSTM layer for temporal feature extraction [29].
5. Conclusion

The aims of the presented experiments were to compare CNN with baseline classifiers (LDA, SVM) using a large multi-subject P300 dataset. CNN was applied to raw ERP epochs (with the dimensionality of 3 × 1200). Baseline classifiers were applied to windowed means features (with the dimensionality of 60). Empirical parameter optimization was performed using cross-validation and classifiers were tested on a holdout set. Various CNN parameters are discussed. Single trial classification accuracy was between 62% and 64% for all tested models, with CNN able to match but not outperform its competitors. When the trained models were applied to averaged trials in the testing phase, accuracy increased up to 76–79%. The achieved accuracy is comparable with the state-of-the-art despite using a multi-subject dataset from 250 children. Potential explanations of the results are discussed. Based on the results, LDA and SVM with state-of-the-art feature extraction still seem to be a good choice for P300 classification, especially with relatively small training datasets. CNN might need more spatial information in the data (by means of more channels) to better understand the patterns. Alternatively, the dataset was not large enough for CNN to prove its benefits and, e.g., data augmentation techniques could help to overcome this obstacle. Both the preprocessed data [20] and Python codes [26] are available to ensure reproducibility of the experiments.
Authors' contribution

LV designed and performed the machine learning workflow. LV wrote the manuscript.

Acknowledgement

This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports under the program NPU I.

Conflict of interest

None declared.
References
[1] M. Ahmadi, R. Quian Quiroga, Automatic denoising of single-trial evoked potentials, NeuroImage 66 (2013) 672–680.
[2] Y. Bengio, A.C. Courville, P. Vincent, Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives, 2012, arXiv:1206.5538.
[3] B. Blankertz, S. Lemm, M. Treder, S. Haufe, K. Müller, Single-trial analysis and classification of ERP components – a tutorial, NeuroImage 56 (2) (2011) 814–825, http://dx.doi.org/10.1016/j.neuroimage.2010.06.048.
[4] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.
[5] F. Chollet, et al., Keras, 2015, https://keras.io.
[6] D.A. Clevert, T. Unterthiner, S. Hochreiter, Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2016, arXiv:1511.07289.
[7] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Signal Process. 7 (3–4) (2014) 197–387, http://dx.doi.org/10.1561/2000000039.
[8] R.E. Fan, K.W. Chang, C.J. Hsieh, X.R. Wang, C.J. Lin, LIBLINEAR: a library for large linear classification, J. Mach. Learn. Res. 9 (2008) 1871–1874, http://dl.acm.org/citation.cfm?id=1390681.1442794.
[9] N. Haghighatpanah, R. Amirfattahi, V. Abootalebi, B. Nazari, A single channel-single trial P300 detection algorithm, 2013 21st Iranian Conference on Electrical Engineering (ICEE) (2013) 1–5, http://dx.doi.org/10.1109/IranianCEE.2013.6599576.
[10] M. Hossin, M. Sulaiman, A review on evaluation metrics for data classification evaluations, Int. J. Data Mining Knowl. Manag. Process 5 (2) (2015) 1.
[11] D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2014, arXiv:1412.6980. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015.
[12] D.J. Krusienski, E.W. Sellers, F. Cabestaing, S. Bayoudh, D.J. McFarland, T.M. Vaughan, J.R. Wolpaw, A comparison of classification techniques for the P300 speller, J. Neural Eng. 3 (4) (2006) 299–305, http://dx.doi.org/10.1088/1741-2560/3/4/007.
[13] O. Ledoit, M. Wolf, Honey, I shrunk the sample covariance matrix, J. Portf. Manag. 30 (4) (2004) 110–119, http://dx.doi.org/10.3905/jpm.2004.110.
[14] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, F. Yger, A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update, J. Neural Eng. 15 (3) (2018) 031005, http://dx.doi.org/10.1088/1741-2552/aab2f2.
[15] S.J. Luck, An Introduction to the Event-Related Potential Technique, MIT Press, Cambridge, MA, 2005.
[16] N.V. Manyakov, N. Chumerin, A. Combaz, M.M. Van Hulle, Comparison of classification methods for P300 brain-computer interface on disabled subjects, Intell. Neurosci. (2011) 2:1–2:12, http://dx.doi.org/10.1155/2011/519868.
[17] D.J. McFarland, W.A. Sarnacki, G. Townsend, T. Vaughan, J.R. Wolpaw, The P300-based brain-computer interface (BCI): effects of stimulus rate, Clin. Neurophysiol. 122 (4) (2011) 731–737.
[18] D.J. McFarland, J.R. Wolpaw, Brain-computer interfaces for communication and control, Commun. ACM 54 (5) (2011) 60–66, http://dx.doi.org/10.1145/1941487.1941506.
[19] R. Mouček, L. Vařeka, T. Prokop, J. Štěbeták, P. Brůha, Event-related potential data from a guess the number brain-computer interface experiment on school children, Sci. Data 4 (2017).
[20] R. Mouček, L. Vařeka, T. Prokop, J. Štěbeták, P. Brůha, Replication Data for: Evaluation of Convolutional Neural Networks Using A Large Multi-Subject P300 Dataset, 2019, http://dx.doi.org/10.7910/DVN/G9RRLN.
[21] A. Pinegger, G. Müller-Putz, No training, same performance!? – a generic P300 classifier approach, Proceedings of the 7th International BCI Conference Graz 2017 (2017), http://dx.doi.org/10.3217/978-3-85125-533-1-77.
[22] Y. Roy, H.J. Banville, I. Albuquerque, A. Gramfort, T.H. Falk, J. Faubert, Deep Learning-Based Electroencephalography Analysis: A Systematic Review, 2019, arXiv:1901.05498.
[23] R.T. Schirrmeister, J.T. Springenberg, L.D.J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep Learning With Convolutional Neural Networks for Brain Mapping and Decoding of Movement-Related Information from the Human EEG, 2017, arXiv:1703.05051.
[24] N. Sharma, Single-Trial P300 Classification Using PCA With LDA, QDA and Neural Networks, 2017, arXiv:1712.01977.
[25] D.S. Tan, A. Nijholt, Brain-Computer Interfaces: Applying Our Minds to Human-Computer Interaction, 1st ed., Springer Publishing Company, Incorporated, 2010.
[26] L. Vařeka, CNN for GTN, 2019, https://bitbucket.org/lvareka/cnnforgtn/src/master/.
[27] L. Vařeka, T. Prokop, P. Mautner, R. Mouček, J. Štěbeták, Application of stacked autoencoders to P300 experimental data, Proceedings of the 16th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2017 (2017).
[28] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, 2013, arXiv:1311.2901.
[29] X. Zhang, L. Yao, X. Wang, J. Monaghan, D. McAlpine, A Survey on Deep Learning Based Brain Computer Interface: Recent Advances and New Frontiers, 2019, arXiv:1905.04149.