Biomedical Signal Processing and Control
Evaluation of convolutional neural networks using a large multi-subject P300 dataset
Lukáš Vařeka
NTIS – New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitni 8, 306 14 Pilsen, Czech Republic
Article history: Received 6 August 2019; Received in revised form 4 December 2019; Accepted 31 December 2019

Keywords: Convolutional neural networks; Event-related potentials; P300; BCI; LDA; Machine learning
Abstract

Deep neural networks (DNN) have been studied in various machine learning areas. For example, event-related potential (ERP) signal classification is a highly complex task potentially suitable for DNN as signal-to-noise ratio is low, and underlying spatial and temporal patterns display a large intra- and inter-subject variability. Convolutional neural networks (CNN) have been compared with baseline traditional models, i.e. linear discriminant analysis (LDA) and support vector machines (SVM), for single trial classification using a large multi-subject publicly available P300 dataset of school-age children (138 males and 112 females). For single trial classification, classification accuracy stayed between 62% and 64% for all tested classification models. When applying the trained classification models to averaged trials, accuracy increased to 76–79% without significant differences among classification models. CNN did not prove superior to baseline for the tested dataset. Comparison with related literature, limitations and future directions are discussed.
© 2020 Elsevier Ltd. All rights reserved.
1. Introduction

In recent years, both fundamental and applied research in deep learning has rapidly developed. In image processing and natural language processing, it has led to significantly better classification rates than previous state-of-the-art algorithms [7]. Therefore, there has been a growing interest in applying deep neural networks (DNNs) to various fields of applied research. Such an effort can also be seen in electroencephalographic (EEG) data processing and classification. A well-known application of EEG classification is a brain-computer interface (BCI) [18], which allows immobile persons to operate devices only by decoding their intent from the EEG signal without any need for muscle involvement. A significant challenge in BCI systems is to recognize the intention of the user correctly since the brain components of interest often have a significantly lower amplitude than random EEG signal [18].
DNNs often do not require costly feature engineering, and thus could lead to more universal and reliable EEG classification. However, a recent review of the field reached the conclusion that so far, these benefits have not been convincingly presented in the literature [14]. Many studies did not compare the studied DNN to state-of-the-art BCI methods or performed biased comparisons, with either suboptimal parameters for the state-of-the-art competitors or with unjustified choices of parameters for the DNN [14].

E-mail address: lvareka@ntis.zcu.cz
A similar conclusion has been reached in another review of DNN and EEG [22]. Many related papers suffer from poor reproducibility: a majority of papers would be hard or impossible to reproduce given the unavailability of their data and code [22]. Moreover, one of the drawbacks of DNNs is having to collect a large training dataset. Typical BCI datasets have very small numbers of training examples, since BCI users cannot be asked to perform thousands of mental operations before actually using the BCI. To overcome this problem, it has been proposed to obtain BCI applications with very large training databases, e.g. for multi-subject classification. Multi-subject classification has one more advantage: it solves the problem of long DNN training times. Instead, a universal BCI system can be trained only once and then just applied to a new dataset from a new user without any additional training [14].
Guess the number (GTN) is a simple P300 event-related potential (ERP) BCI experiment. Its aim is to ask the measured participant to pick a number between 1 and 9. Then, he or she is exposed to corresponding visual stimuli. The P300 waveform is expected following the selected (target) number. During the measurement, experimenters try to guess the selected number based on manual evaluation of average ERPs associated with each number. Finally, both the numbers thought and the guesses of the experimenters are recorded as metadata. 250 school-age children participated in the experiments that were carried out in elementary and secondary schools in the Czech Republic. Only three EEG channels (Fz, Cz, Pz) were recorded to decrease preparation time. Nevertheless, to the author's best knowledge, this is the largest P300 BCI dataset available so far [19].

https://doi.org/10.1016/j.bspc.2019.101837
The main aim of this paper is to evaluate one of the deep learning models, convolutional neural networks (CNN), for classification of P300 BCI data. Unlike most related studies, multi-subject classification was performed with the future goal of developing a universal BCI. Two state-of-the-art BCI classifiers were used as baselines to minimize the risk of biased comparison. To avoid overtraining, cross-validation and final testing using a previously unused part of the dataset were performed. Another aim of this manuscript is to evaluate some CNN parameters in this application.
1.1. State-of-the-art

Although various BCI algorithms have been evaluated and published in recent decades, there is still no feature extraction or machine learning algorithm clearly established as state-of-the-art. However, several studies have focused on reviews and comparisons with partly consistent results. In [12], a comparison of several classifiers (Pearson's correlation method, Fisher's linear discriminant analysis (LDA), stepwise linear discriminant analysis (SWLDA), linear support-vector machine (SVM), and Gaussian kernel support vector machine (nSVM)) was performed on 8 healthy subjects. It was shown that SWLDA and LDA achieved the best overall performance. As originally proposed by Blankertz et al. [3] and also confirmed in a recent review [14], shrinkage LDA is another useful tool for BCI, particularly with small training datasets. In [16], the authors demonstrated that LDA and Bayesian linear discriminant analysis (BLDA) were able to beat other classification algorithms.

Efforts to develop a universal multi-subject machine learning approach for P300 BCIs have been relatively rare in the literature. In [21], the authors developed a generic shrinkage LDA classifier using the training data of 18 subjects. The performance was evaluated with the data of 7 subjects. It was concluded that the generic classifier achieved results comparable to personalized classifiers regarding effectiveness and efficiency.
2. Methods

2.1. Data acquisition

The data described in detail and accessible in [19] were used in subsequent experiments. The measurements were taken between 8 am and 3 pm. Unfortunately, the environment was usually quite noisy since many children and also many electrical devices were present in the room at the same time. However, in any case there were no people standing or moving behind the monitor or in the close proximity of the measured participant.
The participants were stimulated with numbers between 1 and 9 flashing on the monitor in random order. The numbers were white on a black background. The inter-stimulus interval was set to 1500 ms. The following hardware devices were used: the BrainVision standard V-Amp amplifier, a standard small or medium 10/20 EEG cap, a monitor for presenting the numbers, and two notebooks necessary to run the stimulation and recording software applications. The reference electrode was placed at the root of the nose and the ground electrode was placed on the ear. To speed up the guessing task, only three electrodes, Fz, Cz and Pz, were active. The stimulation protocol was developed and run using the Presentation software tool produced by Neurobehavioral Systems, Inc. The BrainVision Recorder was used for recording raw EEG data.
The participants were school-age children and teenagers (aged between 7 and 17; average age 12.9), 138 males and 112 females. All participants and their parents were informed about the programme of the day and the experiments carried out. All participants took part in the experiment voluntarily. The gender, age, and laterality of the participants were collected. No personal or sensitive data were recorded.

Fig. 1. Comparison of target and non-target epoch grand averages. As expected, there is a large P300 component following the target stimuli. Note that the P300 average latency is somewhat delayed compared to what is commonly reported in the literature [15].
2.2. Preprocessing and feature extraction

The data were preprocessed as follows:
1. From each participant of the experiments, short parts of the signal (i.e. ERP trials, epochs) associated with two displayed numbers were extracted. One of them was the target (thought) number. The other one was a randomly selected number out of the remaining stimuli between 1 and 9. Consequently, a similar number of training examples for both classification classes (target, non-target) was extracted. The extracted epochs were stored into a file (available in [20]).

2. For epoch extraction, intervals between 200 ms prestimulus and 1000 ms poststimulus were used. The prestimulus interval between −200 and 0 ms was used for baseline correction, i.e. computing the average of this period and subtracting it from the data. Thus, given the sampling frequency of 1 kHz, an 11,532 × 3 × 1200 (number of epochs × number of EEG channels × number of samples) data matrix was produced.

3. To skip severely damaged epochs, especially those caused by eye blinks or bad channels, an amplitude threshold was set to 100 μV according to common guidelines (such as in [15]). Any epoch x[c, t], with c being the channel index and t time, was rejected if:

max_{c,t} |x[c, t]| > 100 μV    (1)

With this procedure, 30.3% of epochs were rejected. In Fig. 1, grand averages of accepted epochs (across all participants) are depicted.
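The baseline correction and amplitude-based rejection steps above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the author's exact code (which is available in [26]); the function names and constants are choices of this sketch.

```python
# Sketch of steps 2-3: baseline correction over the prestimulus interval
# and rejection of epochs exceeding the 100 uV amplitude threshold.
import numpy as np

PRESTIM_SAMPLES = 200   # 200 ms prestimulus at 1 kHz sampling
THRESHOLD_UV = 100      # rejection threshold in microvolts

def baseline_correct(epochs):
    """Subtract the mean of the prestimulus interval from each channel.

    epochs: array of shape (n_epochs, n_channels, n_samples), where the
    first PRESTIM_SAMPLES samples precede the stimulus onset.
    """
    baseline = epochs[:, :, :PRESTIM_SAMPLES].mean(axis=2, keepdims=True)
    return epochs - baseline

def reject_epochs(epochs):
    """Drop epochs whose absolute amplitude exceeds the threshold (Eq. (1))."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= THRESHOLD_UV
    return epochs[keep], keep
```

Applied to the full dataset, `reject_epochs` would implement the rule in Eq. (1) that removed 30.3% of the epochs.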
Feature extraction. Many deep learning methods such as CNN are designed to avoid significant feature engineering [2,28]. On the other hand, linear classifiers usually perform better when the dimensionality of the original data matrix is reduced, and only the most significant features are extracted [3]. In the parameter optimization phase, the state-of-the-art classifiers were used either with the original data dimension, or after the feature selection proposed in [3], to compare the performance. The feature extraction method was based on averaging time intervals of interest and merging these averages across all relevant EEG channels to get reduced spatio-temporal feature vectors (windowed means feature extraction, WM). In line with recommendations for P300 BCIs, the a priori time window was initially set between 300 and 500 ms after stimuli [25]. This time window was further divided into 20 equal-sized time intervals in which amplitude averages were computed. Therefore, with three EEG channels, the dimensionality of feature vectors was reduced to 60. Finally, these feature vectors were scaled to zero mean and unit variance.

Fig. 2. Flowchart of preprocessing, feature extraction and data splitting applied.

Fig. 3. Architecture of the convolutional neural network. There was one convolutional layer, one dense layer, and finally a softmax layer for binary classification (target/non-target). Batch normalization and dropout followed both the convolutional and dense layers.
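A minimal sketch of the windowed-means (WM) extraction described above, under the stated settings (20 equal sub-windows per channel, three channels, so 60 features); the exact implementation is in the published code [26], and the final scaling to zero mean and unit variance is omitted here.

```python
# Windowed means: average equal-length sub-windows of the poststimulus
# interval and concatenate the averages over all channels.
import numpy as np

def windowed_means(epochs, start, stop, n_windows=20):
    """epochs: (n_epochs, n_channels, n_samples) array.
    [start, stop) selects the time window in samples; its length must be
    divisible by n_windows. Returns (n_epochs, n_channels * n_windows).
    """
    window = epochs[:, :, start:stop]
    n_epochs, n_channels, n_samples = window.shape
    # split the interval into equal sub-windows and average each one
    split = window.reshape(n_epochs, n_channels, n_windows,
                           n_samples // n_windows)
    return split.mean(axis=3).reshape(n_epochs, n_channels * n_windows)
```

For the 300–1000 ms window at 1 kHz with a 200 ms prestimulus offset, `start=500` and `stop=1200` give 700 samples per channel, i.e. 35 samples per sub-window and 3 × 20 = 60 features.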
2.3. Classification

Fig. 2 depicts the procedures used to extract features and split the data for classification.

Data splitting. Before classification, the data were randomly split into training (75%) and testing (25%) sets. Using the training set, 30 iterations of Monte-Carlo cross-validation (again 75:25 from the subset) were performed to optimize parameters. Results using the holdout testing set were computed in each cross-validation iteration and averaged at the end of the processing. No parameter decision was based on the holdout set.
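The splitting scheme can be sketched with plain numpy: a single 75/25 train/holdout split, then 30 Monte-Carlo cross-validation iterations (again 75/25) drawn from the training portion only. Seeds and names are illustrative, not the author's exact code.

```python
# Monte-Carlo cross-validation splits inside a fixed training portion.
import numpy as np

def monte_carlo_splits(n, test_frac=0.25, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]  # holdout never reused
    folds = []
    for _ in range(n_iter):
        perm = rng.permutation(train_idx)
        n_val = int(len(train_idx) * test_frac)
        folds.append((perm[n_val:], perm[:n_val]))    # (train, validation)
    return train_idx, test_idx, folds
```

Because every validation fold is drawn from `train_idx` only, no parameter decision can leak information from the holdout set, matching the procedure described above.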
LDA. State-of-the-art [3] LDA with eigenvalue decomposition used as the solver, and automatic shrinkage using the Ledoit-Wolf lemma [13], was applied.
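This configuration corresponds to a standard scikit-learn construction (eigenvalue-decomposition solver with automatic Ledoit-Wolf shrinkage); the snippet below is that standard call, not necessarily the author's exact code.

```python
# Shrinkage LDA: eigen solver with automatic Ledoit-Wolf shrinkage.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
```

Training and evaluation then follow the usual scikit-learn pattern, e.g. `lda.fit(X_train, y_train)` on the windowed-means features and `lda.score(X_test, y_test)` on the holdout set.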
SVM. The implementation was based on libsvm [4]. Both recommendations in the literature [8] and validation subsets were used to find the optimal parameters. Finally, the penalty parameter C was set to 1, the kernel cache was 500 MB, and the degree of the polynomial kernel function was set to 3. A one-vs-rest decision function shape with the RBF kernel type and shrinking heuristics were used.
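An equivalent configuration in scikit-learn (whose `SVC` wraps libsvm) would look as follows; this is a reconstruction from the stated parameters, not the author's exact call. Note the `degree` parameter only affects polynomial kernels and is left at its default of 3, as stated above.

```python
# RBF-kernel SVM with the parameters described in the text.
from sklearn.svm import SVC

svm = SVC(C=1.0, kernel="rbf", degree=3, cache_size=500,
          shrinking=True, decision_function_shape="ovr")
```

As with LDA, the classifier is fitted on the windowed-means features of the training set and evaluated on the holdout set.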
CNN. Convolutional neural networks were implemented in Keras [5]. They were configured to maximize classification performance on the validation subsets. The structure is depicted in Fig. 3. Initially, after empirical parameter tuning based on cross-validation, the parameters were selected as follows:

– The first convolutional layer had six 3 × 3 filters. The filter size was set to cover all three EEG channels. Both the second filter dimension and the number of filters were tuned experimentally.
– In both cases, dropout was set to 0.5.
– The output of the convolutional layer was further downsampled by a factor of 8 using the average pooling layer.
– The ELU activation function [6] was used for both the convolutional and dense layers, as recommended in related literature [23]. Compared to the sigmoid function, ELU mitigates the vanishing gradient problem by using the identity for positive values. Moreover, in contrast to rectified linear units (ReLU), ELUs have negative values, which allow them to push mean unit activations closer to zero while ensuring a noise-robust deactivation state [6]. The parameter α > 0 was set to 1:

f(x) = x if x > 0; f(x) = α(e^x − 1) if x ≤ 0

– Batch size was set to 16.

Fig. 4. Decrease of classification loss based on the baseline CNN architecture is shown. Although training loss kept declining throughout all 30 epochs, validation loss reached the minimum after only five epochs. Because the patience parameter was set to five, in this case, the training was stopped after 10 epochs. As seen from the growing difference between training and validation loss, further training would lead to substantial overtraining.
– Cross-entropy was used as the loss function.
– The Adam [11] optimizer was used for training because it is computationally efficient, has little memory requirements and is frequently used in the field [22].
– The number of training epochs was set to 30.
– Early stopping with a patience parameter of 5 was used.
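The parameters above can be assembled into a Keras model roughly matching Fig. 3. This is a sketch under stated assumptions: the convolutional, pooling, dropout and activation settings follow the list above, while details not given in the text (e.g. the dense layer width of 100) are assumptions; the published code [26] is authoritative.

```python
# Sketch of the described CNN: one convolutional layer (six 3x3 filters,
# first dimension spanning all three EEG channels), batch normalization
# and dropout 0.5 after both conv and dense layers, ELU activations,
# average pooling by a factor of 8, and a softmax output.
from tensorflow.keras import layers, models

def build_cnn(n_channels=3, n_samples=1200):
    model = models.Sequential([
        layers.Input(shape=(n_channels, n_samples, 1)),
        layers.Conv2D(6, (3, 3)),                   # six 3x3 filters
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.Dropout(0.5),
        layers.AveragePooling2D(pool_size=(1, 8)),  # downsample time by 8
        layers.Flatten(),
        layers.Dense(100),                          # width is an assumption
        layers.BatchNormalization(),
        layers.Activation("elu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),      # target / non-target
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

Training with the listed settings would then use `model.fit(..., epochs=30, batch_size=16, callbacks=[EarlyStopping(patience=5)])`.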
3. Results

As mentioned above, cross-validation for hyperparameter estimation was followed by testing on a holdout set. Accuracy, precision, recall and AUC (area under the ROC curve) were computed [10]. In the validation phase, the aim was to reach the configuration yielding the highest accuracy while ensuring it is not at the expense of precision and recall. In Fig. 4, an example of searching for an optimal configuration of CNN weights and biases based on the training and validation sets is shown.
3.1. Effect of parameter modifications on validation performance

Feature extraction for LDA and SVM. Parameter optimization of the classifiers themselves has been discussed above. Additionally, different feature extraction settings were compared regarding the average classification results achieved during cross-validation. Results of the comparisons are depicted in Table 1. Accuracy had an increasing trend when the time window was prolonged to 800 and 1000 ms. It can be speculated that the standard a priori time window is not enough for capturing target to non-target differences when classifying children's data that display a large variety in their P300 components. As expected, classification performance with WM features was slightly higher than for preprocessed epochs without feature extraction. Based on the results, both LDA and SVM configured as described above with the time window between 300 and 1000 ms were used in the testing phase.
CNN. The neural network architecture described above was used as the starting point. However, some parameter modifications were explored regarding their effect on the validation classification results. The results are shown in Table 2. Performance mostly displayed only small and insignificant changes with these parameter modifications. Consistently with [23], batch normalization led to slightly better accuracy. Moreover, the absence of batch normalization made the results less predictable and more fluctuating, as can be seen in the standard deviation of recall. Another clear decrease in performance was observed without dropout regularization. Finally, average pooling was better than max pooling for the validation data. Consequently, the initial configuration described in Section 2.3 was used for testing.
3.2. Testing results

Based on the results in Section 3.1, both the feature extraction method for LDA and SVM, and the CNN configuration achieving the best average accuracy during cross-validation were selected for the testing phase. Fig. 5 shows the achieved results. All tested models achieved comparable classification results. LDA had the highest classification recall (around 67%). Single trial classification accuracy stayed within the range between 62% and 64%.

Averaging of epochs associated with the same markers is a standard ERP technique for increasing signal-to-noise ratio [15]. When averaging, repeated ERPs including the P300 are amplified while continuous random EEG noise is suppressed. Because even in P300 BCIs, repeated stimulation is usually used to achieve good performance [17], it is worth exploring how once-trained classifiers can generalize to averaged epochs. Therefore, consecutive groups of one to six neighboring epochs from the testing set were used instead of single trials. Fig. 6 depicts the results achieved. With averaging, classification accuracy increased from the original 61–64% up to 76–79%. There were no significant differences among classifiers, although CNN displayed slightly higher standard deviations.
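The averaging step can be sketched as follows: non-overlapping groups of k consecutive same-class test epochs are averaged, and the averaged trials are then fed to the already trained classifier. This is a hypothetical helper illustrating the evaluation, not the author's exact code.

```python
# Average non-overlapping groups of k consecutive epochs to raise SNR.
import numpy as np

def average_groups(epochs, k):
    """epochs: (n_epochs, ...) array of same-class trials.
    Trailing epochs that do not fill a complete group are discarded.
    Returns (n_epochs // k, ...) averaged trials.
    """
    n = (len(epochs) // k) * k
    grouped = epochs[:n].reshape(-1, k, *epochs.shape[1:])
    return grouped.mean(axis=1)
```

Varying k from 1 to 6 reproduces the sweep shown in Fig. 6: the P300, time-locked to the stimulus, survives the averaging while uncorrelated EEG noise is attenuated.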
4. Discussion

Single trial classification accuracy was between 62% and 64% for all tested classification models without significant differences. Similar results have been commonly reported in the literature. For example, in [9], 65% single trial accuracy was achieved (using one to three EEG channels and personalized training data). In [24], 40–66% classification accuracy was reported, highly dependent on the tested subject. Comparably, this manuscript achieved similar performance for a large multi-subject dataset of school-age children.

On the single trial level, CNN achieved comparable performance to both LDA and SVM. Similar performance was also achieved when applying averaged testing epochs. However, CNN seemed slightly less stable and more dependent on the training/validation split, as can be seen in the standard deviations.

Consistently with related deep learning literature [23], a combination of ELUs, dropout and batch normalization was beneficial for classification performance. Unlike many image classification applications, average pooling was better than max pooling, perhaps because it is not associated with data loss. Even less prominent features may contribute to classifier discriminative abilities. To further verify how the CNN was able to classify between targets and non-targets, the network was exposed to all target, or all non-target, patterns. Average hidden layer outputs (the 4th average pooling layer used as an example) across these conditions were calculated and are shown in Fig. 7. There is a clear difference between some CNN outputs, although most remain stable across both conditions.
In our previous work [27], we applied stacked autoencoders (SAE) to the same GTN dataset. In contrast with the current work, manual feature extraction using discrete wavelet transform was performed. Instead of single trial classification, the success rate of detecting the number thought based on multiple single trial classification results was computed. The maximum success rate on the testing dataset was 79.4% for SAE, 75.6% for LDA and 73.7% for SVM. It seems that while SAE combined with traditional feature engineering and involving multiple trials per marker can outperform linear classifiers, the same benefits cannot be repeated when applying CNN to single trial classification of raw EEG data.

Table 1
Average cross-validation classification results based on the feature extraction method with the LDA classifier configured as described in Section 2.3. Averages from 30 repetitions and related sample standard deviations (in brackets) are reported. WM – windowed means (time intervals relative to stimuli onsets in square brackets).

Feature extraction    AUC            Accuracy       Precision      Recall
WM [300–500 ms]       59.56% (1.04)  59.54% (1.04)  59.48% (1.83)  61.69% (2.08)
WM [300–800 ms]       60.94% (1.04)  60.93% (1.05)  60.75% (1.9)   63.38% (1.85)
WM [300–1000 ms]      61.77% (0.9)   61.76% (0.91)  61.45% (1.9)   64.64% (1.48)
None                  61.09% (1.13)  61.08% (1.13)  61.68% (1.67)  59.90% (1.35)

The bold values denote the configuration that yielded the highest accuracy.

Table 2
Average cross-validation classification results based on the CNN parameter settings. Averages from 30 repetitions and related sample standard deviations (in brackets) are reported. The CNN configuration described in Section 2.3 was used as the baseline model.

Changed parameter         AUC            Accuracy       Precision      Recall
None                      66.12% (0.68)  62.18% (0.94)  62.76% (1.95)  61.34% (2.63)
ReLUs instead of ELUs     66.36% (0.62)  61.85% (1.15)  62.7% (2.19)   60.1% (3.04)
Filter size (3, 30)       65.84% (0.49)  61.95% (1.18)  62.7% (2.1)    60.5% (3.91)
12 conv. filters          66.31% (0.51)  61.83% (1.1)   62.3% (2.21)   61.6% (3.08)
No batch normalization    65.99% (0.77)  60.55% (1.52)  61.02% (3.16)  61.5% (7.21)
Dropout 0.2               67.67% (0.65)  60.8% (1.49)   61.33% (2.31)  60.33% (4.0)
No dropout                68.63% (1.11)  59.49% (1.2)   59.61% (1.93)  60.7% (4.44)
Dense (150)               66.07% (0.8)   61.81% (0.95)  62.33% (1.83)  61.18% (2.49)
Two dense l. (120-60)     65.72% (0.77)  62.11% (0.9)   63.14% (2.03)  59.5% (2.55)
Max- instead of AvgPool   64.23% (1.15)  58.94% (1.94)  60.22% (4.18)  59.24% (13.76)

The bold values denote the configuration that yielded the highest accuracy.

Fig. 5. Testing results for single trial classification (error bars show standard deviations).

Fig. 6. Testing results when averaging neighboring epochs (error bars show standard deviations).
Computational efficiency is another important factor to consider when applying the methods in online BCI systems. Experimental comparison was performed with an Intel Core i7-7700K (four cores, 4.2 GHz), 64 GB RAM and an NVIDIA GeForce GTX 1050 Ti GPU. CNN took 46 s to train on the CPU and 26 s to train on the GPU. Both LDA and SVM were much faster to train, with 300 and 1600 ms, respectively. However, training times were not critical in the presented experiment since any universal classifier needs to be trained just once and not with every new BCI user. Testing times were calculated relative to one processed feature vector and were low enough for all classifiers (CNN took 0.3 ms to classify one pattern on the CPU and 0.1 ms on the GPU, LDA took 0.1 and SVM 0.2 ms). It can be concluded that all tested algorithms can be used in online BCIs. Neural networks are slower to train and this could be a problem for personalized BCIs, retrained with each new user.

Fig. 7. Average outputs of the 4th (pooling) layer are depicted after the CNN was exposed to all target/non-target patterns. The x-axis corresponds to indices of convolutional filters (six in total). The y-axis is the output of convolution originally corresponding to time information, after average pooling further downsampled by a factor of 6. There is a clear difference in outputs, mainly in the bottom part of the maps. However, many outputs seem independent of classification labels, poorly contributing to CNN discrimination abilities.
There are several limitations of the reported experiments. As a noise suppression procedure, severely damaged epochs (with amplitude exceeding ±100 μV when compared to baseline) were rejected before further processing. While epoch rejection is beneficial for classification accuracy, on the other hand, it would also lead to lower bit-rates when used in on-line P300 BCI systems because of data loss. Artifact correction methods based on Independent Component Analysis were not feasible because of the low number of EEG channels (three). Moreover, the low number of EEG channels could have a detrimental effect on classification performance because of limited spatial information provided on the input. Another possible limitation was that there might be an architecture of CNN that would lead to better classification performance and had not been discovered by the author. However, several manipulations of CNN parameters were tested using cross-validation, including adding a new dense layer, with only very modest changes in validation classification accuracy.
A recent review of EEG and DNN studies [22] reported the median gain in accuracy of DNNs over traditional baselines to be 5.4%. It also revealed significant challenges in the field. A low number of training examples is a common complaint, especially for event-related data that contain the relevant information in the time domain. In this case, only a small fraction of the continuous EEG measurement near the onset of trials can be used, and strategies such as overlapping time windows to obtain more examples in the frequency domain are not feasible. In the current study, 11,532 epochs were used, which is below the mean number of examples (251,532) and the median number of examples (14,000) in the reviewed papers [22]. Strategies such as data augmentation can be considered to increase the number of training examples to be sufficient for DNNs. Moreover, half of the studies [22] used between 8 and 62 EEG channels. Adding more channels to Fz, Cz and Pz could increase spatial resolution and accuracy but would also increase preparation time and the participant's discomfort. In future work, more on the effect of the number of EEG channels on the P300 classification accuracy can be investigated. Furthermore, soft or hard thresholding based on discrete wavelet transform can be considered for noise cancellation [1]. Another line of research would be to propose different deep learning models for the same classification task, with extensive parameter grid search, or genetic algorithms. Based on a recent review of the field [29], frequently cited and promising networks include Recurrent Neural Networks, especially Long short-term memory (LSTM). Moreover, a CNN layer to capture spatial patterns can be followed by an LSTM layer for temporal feature extraction [29].
5. Conclusion

The aims of the presented experiments were to compare CNN with baseline classifiers (LDA, SVM) using a large multi-subject P300 dataset. CNN was applied to raw ERP epochs (with the dimensionality of 3 × 1200). Baseline classifiers were applied to windowed means features (with the dimensionality of 60). Empirical parameter optimization was performed using cross-validation and classifiers were tested on a holdout set. Various CNN parameters are discussed. Single trial classification accuracy was between 62% and 64% for all tested models, with CNN able to match but not outperform its competitors. When the trained models were applied to averaged trials in the testing phase, accuracy increased up to 76–79%. The achieved accuracy is comparable with the state-of-the-art despite using a multi-subject dataset from 250 children. Potential explanations of the results are discussed. Based on the results, LDA and SVM with state-of-the-art feature extraction still seem to be a good choice for P300 classification, especially with relatively small training datasets. CNN might need more spatial information in the data (by means of more channels) to better understand the patterns. Alternatively, the dataset was not large enough for CNN to prove its benefits and, e.g., data augmentation techniques could help to overcome this obstacle. Both the preprocessed data [20] and Python codes [26] are available to ensure reproducibility of the experiments.
Authors' contribution

LV designed and performed the machine learning workflow. LV wrote the manuscript.

Acknowledgement

This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports under the program NPU I.

Conflict of interest

None declared.
References
[1] M. Ahmadi, R. Quian Quiroga, Automatic denoising of single-trial evoked potentials, NeuroImage 66 (2013) 672–680.
[2] Y. Bengio, A.C. Courville, P. Vincent, Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives, 2012, arXiv:1206.5538.
[3] B. Blankertz, S. Lemm, M. Treder, S. Haufe, K. Müller, Single-trial analysis and classification of ERP components – a tutorial, NeuroImage 56 (2) (2011) 814–825, http://dx.doi.org/10.1016/j.neuroimage.2010.06.048.
[4] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.
[5] F. Chollet, et al., Keras, 2015, https://keras.io.
[6] D.A. Clevert, T. Unterthiner, S. Hochreiter, Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2016, arXiv:1511.07289.
[7] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Signal Process. 7 (3–4) (2014) 197–387, http://dx.doi.org/10.1561/2000000039.
[8] R.E. Fan, K.W. Chang, C.J. Hsieh, X.R. Wang, C.J. Lin, LIBLINEAR: a library for large linear classification, J. Mach. Learn. Res. 9 (2008) 1871–1874, http://dl.acm.org/citation.cfm?id=1390681.1442794.
[9] N. Haghighatpanah, R. Amirfattahi, V. Abootalebi, B. Nazari, A single channel-single trial P300 detection algorithm, 2013 21st Iranian Conference on Electrical Engineering (ICEE) (2013) 1–5, http://dx.doi.org/10.1109/IranianCEE.2013.6599576.
[10] M. Hossin, M. Sulaiman, A review on evaluation metrics for data classification evaluations, Int. J. Data Mining Knowl. Manag. Process 5 (2) (2015) 1.
[11] D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2014, arXiv:1412.6980. Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015.
[12] D.J. Krusienski, E.W. Sellers, F. Cabestaing, S. Bayoudh, D.J. McFarland, T.M. Vaughan, J.R. Wolpaw, A comparison of classification techniques for the P300 speller, J. Neural Eng. 3 (4) (2006) 299–305, http://dx.doi.org/10.1088/1741-2560/3/4/007.
[13] O. Ledoit, M. Wolf, Honey, I shrunk the sample covariance matrix, J. Portf. Manag. 30 (4) (2004) 110–119, http://dx.doi.org/10.3905/jpm.2004.110.
[14] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, F. Yger, A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update, J. Neural Eng. 15 (3) (2018) 031005, http://dx.doi.org/10.1088/1741-2552/aab2f2.
[15] S.J. Luck, An Introduction to the Event-Related Potential Technique, MIT Press, Cambridge, MA, 2005.
[16] N.V. Manyakov, N. Chumerin, A. Combaz, M.M. Van Hulle, Comparison of classification methods for P300 brain-computer interface on disabled subjects, Intell. Neurosci. (2011) 2:1–2:12, http://dx.doi.org/10.1155/2011/519868.
[17] D.J. McFarland, W.A. Sarnacki, G. Townsend, T. Vaughan, J.R. Wolpaw, The P300-based brain-computer interface (BCI): effects of stimulus rate, Clin. Neurophysiol. 122 (4) (2011) 731–737.
[18] D.J. McFarland, J.R. Wolpaw, Brain-computer interfaces for communication and control, Commun. ACM 54 (5) (2011) 60–66, http://dx.doi.org/10.1145/1941487.1941506.
[19] R. Mouček, L. Vařeka, T. Prokop, J. Štěbeták, P. Brůha, Event-related potential data from a guess the number brain-computer interface experiment on school children, Sci. Data 4 (2017).
[20] R. Mouček, L. Vařeka, T. Prokop, J. Štěbeták, P. Brůha, Replication Data for: Evaluation of Convolutional Neural Networks Using A Large Multi-Subject P300 Dataset, 2019, http://dx.doi.org/10.7910/DVN/G9RRLN.
[21] A. Pinegger, G. Müller-Putz, No training, same performance!? – a generic P300 classifier approach, Proceedings of the 7th International BCI Conference Graz 2017 (2017), http://dx.doi.org/10.3217/978-3-85125-533-1-77.
[22] Y. Roy, H.J. Banville, I. Albuquerque, A. Gramfort, T.H. Falk, J. Faubert, Deep Learning-Based Electroencephalography Analysis: A Systematic Review, 2019, arXiv:1901.05498.
[23] R.T. Schirrmeister, J.T. Springenberg, L.D.J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep Learning With Convolutional Neural Networks for Brain Mapping and Decoding of Movement-Related Information from the Human EEG, 2017, arXiv:1703.05051.
[24] N. Sharma, Single-Trial P300 Classification Using PCA With LDA, QDA and Neural Networks, 2017, arXiv:1712.01977.
[25] D.S. Tan, A. Nijholt, Brain-Computer Interfaces: Applying Our Minds to Human-Computer Interaction, 1st ed., Springer Publishing Company, Incorporated, 2010.
[26] L. Vařeka, CNN for GTN, 2019, https://bitbucket.org/lvareka/cnnforgtn/src/master/.
[27] L. Vařeka, T. Prokop, P. Mautner, R. Mouček, J. Štěbeták, Application of stacked autoencoders to P300 experimental data, Proceedings of the 16th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2017 (2017).
[28] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, 2013, arXiv:1311.2901.
[29] X. Zhang, L. Yao, X. Wang, J. Monaghan, D. McAlpine, A Survey on Deep Learning Based Brain Computer Interface: Recent Advances and New Frontiers, 2019, arXiv:1905.04149.