




New Challenges in Reinforcement Learning: A Survey of Security and Privacy

arXiv [cs.LG], 31 Dec 2022

Tianqing Zhu1* and Wanlei Zhou2

1 School of Computer Science, University of Technology Sydney, Broadway, Sydney, 2007, NSW, Australia.
2 School of Data Science, City University of Macau, Macau, China.

*Corresponding author(s). E-mail(s): Tianqing.Zhu@.au; Contributing authors: Yunjiao.Lei@.au; Yulei.Sui@uts.edu.au; wlzhou@cityu.edu.mo

Abstract

Reinforcement learning (RL) is one of the most important branches of AI. Due to its capacity for self-adaptation and decision-making in dynamic environments, reinforcement learning has been widely applied in multiple areas, such as healthcare, data markets, autonomous driving, and robotics. However, some of these applications and systems have been shown to be vulnerable to security or privacy attacks, resulting in unreliable or unstable services. A large number of studies have focused on these security and privacy problems in reinforcement learning. However, few surveys have provided a systematic review and comparison of existing problems and state-of-the-art solutions to keep up with the pace of emerging threats. Accordingly, we herein present such a comprehensive review to explain and summarize the challenges associated with security and privacy in reinforcement learning from a new perspective, namely that of the Markov Decision Process (MDP). In this survey, we first introduce the key concepts related to this area. Next, we cover the security and privacy issues linked to the state, the action, the environment, and the reward function of the MDP, respectively. We further highlight the special characteristics of security and privacy methodologies related to reinforcement learning. Finally, we discuss possible future research directions within this area.

Keywords: Reinforcement Learning, Security, Privacy Preservation, Markov Decision Process, Multi-agent System

1 Introduction

Reinforcement learning (RL) is one of the most important branches of AI. Due to its strong capacity for self-adaptation, reinforcement learning has been widely applied in multiple areas, including healthcare [1], financial markets [2], mobile edge computing (MEC) [3, 4] and robotics [5]. Reinforcement learning is considered to be a form of adaptive (or approximate) dynamic programming [6] and has achieved outstanding performance in solving complex sequential decision-making problems. This strong performance has led to its implementation and deployment across a broad range of fields in recent years, such as the Internet of Things (IoT) [7], recommender systems [8], healthcare [9], robotics [10], finance [11], self-driving cars [12], and smart grids [13]. Unlike other machine learning techniques, reinforcement learning has a strong ability to learn by trial and error in dynamic and complex environments. In particular, it can learn from an environment that provides minimal information about the parameters to be learned [14], and it can serve as a method for addressing optimization problems [15, 16].

In the reinforcement learning context, an agent can be viewed as a self-contained, concurrently executing thread of control [17]. It interacts with the environment and receives a state of the environment as input. The state of the environment can be the situation surrounding the agent's location. Take the road conditions in an autonomous driving scenario as an example. In Figure 1, the green vehicle is an agent, and all the objects around it can be regarded as the environment; thus, the environment comprises the road, the traffic signs, other cars, etc. Based on the state of the environment, the agent chooses an action as output. Next, the action changes the state of the environment, and the agent receives a scalar signal from the environment that can be regarded as an indicator of the value of the state transition. This scalar signal is usually represented as a reward. The agent's purpose is to learn an optimal policy over time by trial and error in order to gain a maximal accumulated reward as reinforcement. In addition, the combination of deep learning and reinforcement learning further enhances the ability of reinforcement learning [18].
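As a minimal illustration of this observe-act-reward loop, the following Python sketch runs a single episode. It assumes a hypothetical environment object with reset() and step() methods and a policy function, in the spirit of common RL toolkits; these names are illustrative and not part of the surveyed systems.

```python
def run_episode(env, policy, max_steps=100):
    """Run one episode of the agent-environment interaction described above."""
    state = env.reset()                          # initial state of the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent chooses an action based on the state
        state, reward, done = env.step(action)   # the environment transitions and returns a scalar reward
        total_reward += reward                   # accumulate the reinforcement signal
        if done:
            break
    return total_reward
```

The accumulated reward returned here is the quantity the agent tries to maximize as it improves its policy by trial and error.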
Fig. 1 An autonomous driving scenario. The green car is an agent; the environment comprises the road, the traffic signs, other cars, etc.

1.1 Reinforcement learning security and privacy issues

However, reinforcement learning is vulnerable to security attacks, and it is easy for attackers to leverage a breachable data source [19]. For example, data poisoning attacks [20] and adversarial perturbations [21] are popular existing attacks, and many studies have been proposed over the past few years to address these security concerns. Some researchers have focused on protecting the model from attacks and ensuring that the model still performs well while under attack. The aim is to make sure the model takes safe actions that are exactly known, or to obtain an optimal policy under worst-case situations, such as by using adversarial training [22].

Figure 2 presents an example of security attacks on reinforcement learning in an autonomous driving scenario. An autonomous car is driving on the road and observing its environment through sensors. To stay safe while driving autonomously, it continually adjusts its behavior based on the road conditions. In this case, an attacker may focus on influencing the autonomous driving conditions. For example, at a particular time, the optimal action for the car to take is to go straight; however, an action attack may directly influence the agent to turn right (the attack may also impact the value of the reward). With regard to environment-influencing attacks, the attacker may conceal or falsely insert a car in the right front of the environment, and this disturbance may mislead the autonomous car into taking a wrong action. As for reward attacks, rivals may try to change the value of the reward (e.g., from +1 to -1) and thereby impact the policy of the autonomous car.

Fig. 2 A simple example of a security attack on reinforcement learning in the context of automated driving. An action attack, an environmental attack and a reward attack are shown respectively. An action attack works by influencing the choice of action directly, such as by tempting the agent to take the action "turn right" rather than the optimal action "go straight". Environmental attacks attempt to change the agent's perception of the environment so as to mislead it into taking an incorrect action. Finally, the reward attack works by changing the value of a reward given for a specific action in a state.

Moreover, reinforcement learning has also been subject to privacy attacks due to weaknesses that can be leveraged by attackers. Established samples used in reinforcement learning contain the learning agent's private information, which is vulnerable to a wide variety of attacks. For example, in disease treatment applications with reinforcement learning [1], real-time health data is required, and to achieve an accurate dosage of medicine, the information is often collected and transmitted in plaintext. This may cause disclosure of users' private information. In addition, the reinforcement learning system may collect data from public resources, and most collected datasets contain private or sensitive information that has a high probability of being disclosed [23]. Moreover, reinforcement learning may also require data sharing [24] and needs to transmit information during the sharing process; thus, attacks on network links can also be successful in a reinforcement learning context. Furthermore, cloud computing, which is often used for reinforcement learning computation and storage, has inherent vulnerabilities to certain attacks [25]. Rather than changing or affecting the model, attackers may choose to focus on obtaining or inferring private data; for example, Pan et al. [26] inferred information about the surrounding environment based on the transition matrix. The main approaches to defending privacy and security in the reinforcement learning context include encryption technology [27] and information-hiding techniques, such as differential privacy [28].
In addition, some artificial intelligence algorithms, such as federated learning (FL), can preserve privacy through the learning mechanism and structure themselves. Yu et al. [30] adopted federated learning into a deep reinforcement learning model in a distributed manner, with the goal of protecting data privacy for edge devices.

1.2 Outline and Survey Overview

As an increasing number of security and privacy issues in reinforcement learning emerge, it is meaningful to analyze and compare existing studies to help spark ideas about how security and privacy might be improved in this specific field in the future. Over recent years, several surveys on the security and privacy of reinforcement learning have been completed:

(1) Chen et al. [31] reviewed the research related to reinforcement learning from the perspective of artificial intelligence security, concentrating on adversarial attacks and defence. The authors analysed the characteristics of adversarial attacks and defences respectively.

(2) Luong et al. [32] presented a literature review on applications of deep reinforcement learning in communications and networking, such as the Internet of Things (IoT). The authors discussed deep reinforcement learning approaches proposed for issues in communications and networking, which include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation.

(3) Another survey paper [14] conducted a literature review on securing IoT devices using reinforcement learning. This paper presented different types of cyber-attacks against different IoT systems and discussed security solutions based on reinforcement learning against these attacks.

(4) Wu et al. [33] surveyed the security and privacy risks of the key components of a blockchain from the perspective of machine learning, helping readers to better understand these methods in the context of the IIoT. Chen et al. [34] also explored deep reinforcement learning in the context of IoT.

Our work differs from the above works. The works mentioned above all focus on the IoT or communication networks; that is, they address the applications of reinforcement learning. Very few existing surveys have comprehensively presented the security and privacy issues in reinforcement learning itself rather than in its applications. Some of them concentrate on the attack and/or defense methods, but they only analyse the overall influence. Accordingly, in this paper, we highlight the objects that the attacks aim at and provide a comprehensive review of the key methods used to attack and defend these objects. The main contributions of our survey can be summarized as follows:

● The survey organizes the relevant existing studies from a novel angle that is based on the components of the Markov decision process (MDP). We classify current research on attacks and defences based on their target objects in the MDP. This provides a new perspective that enables focusing on the targets of the methods across the entire learning process.

● The survey provides a clear account of the impact caused through the targeted objects. These objects are components of the MDP that are related to each other and may exist at the same time and/or in the same space. Adopting this approach enables us to follow the MDP to comprehend the relevant objects and the relationships between them.

● The survey compares the main methods of attacking or defending the components of the MDP, and thereby sheds some light on the advantages and disadvantages of these methods.

The remainder of this paper is structured as follows. We first present preliminary concepts in reinforcement learning systems in Section 2. We then outline the security and privacy challenges in reinforcement learning in Section 3. Next, we present further details on security in reinforcement learning in Section 4, followed by an overview of privacy in reinforcement learning in Section 5. We further discuss security and privacy in reinforcement learning applications in Section 6. Finally, Sections 7 and 8 present our discussion of avenues for future work and our conclusion, respectively.
2 Preliminary

2.1 Notation

Table 1 lists the notations used in this article. RL is reinforcement learning, and DRL is deep reinforcement learning. MDP stands for the Markov Decision Process, which is widely used in reinforcement learning. An MDP can be denoted by a tuple (S, A, T, r, γ), which is made up of the agent action space A, the environment state space S, the reward function r, the transition matrix T, and a discount factor γ ∈ [0, 1). The transition matrix is a probability mapping from state-action pairs to states, T : (S × A) × S → [0, 1]. The agent's purpose is to find an optimal policy that maps environment states to agent actions so as to maximize the long-term reward. v^π(s) and Q^π(s, a) are the state and action-state values, which can be regarded as a means of evaluating the policy.

Table 1 The main notations used throughout the paper.

Notation     Meaning
RL           Reinforcement learning
DRL          Deep reinforcement learning
MDP          Markov decision process
A            The action space of the agent
S            The state space of the environment
T            The transition matrix
r            The reward function
γ            A discount factor within the range [0, 1)
π            Policy
v^π(s)       State value
Q^π(s, a)    Action-state value

2.2 Reinforcement learning

The reinforcement learning model contains the environment states S, the agent actions A, and scalar reinforcement signals that can be regarded as rewards r. All these elements and the environment can be conceptualized as a whole system. At step t, when an agent interacts with the environment, it receives a state of the environment s_t as input. Based on the state s_t, the agent chooses an action a_t using the policy π as output. Next, the action changes the state of the environment to s_{t+1}. At the same time, the agent obtains a reward r_t from the environment. This reward is a scalar signal that can be regarded as an indicator of the value of the state transition. In this process, the agent learns a piece of knowledge, which may be recorded as (s_t, a_t, r_t, s_{t+1}) in a Q-table. The Q-table stores the estimated maximum value of each action in each state, so that the agent can choose the best action at each state. In the next step, the updated s_{t+1} and r_{t+1} will be sent to the agent again. The agent's purpose is to learn an optimal policy π so as to gain the highest possible accumulated reward r. To arrive at the optimal policy π, the agent can train by applying a trial-and-error approach over long-term episodes.

A Markov Decision Process (MDP) with delayed rewards is used to handle reinforcement learning problems, such that the MDP is a key formalism in reinforcement learning.

Fig. 3 The interaction between agent and environment with the MDP. The agent interacts with the environment to gain knowledge, which may be recorded as a table or a neural network model (in DRL), and then takes an action that will react to the environment state.

If the environment model is given, two simple iterative algorithms can be chosen to arrive at an optimal model in the MDP context: namely, value iteration [35] and policy iteration [36]. When the information of the model is not known in advance, the agent needs to learn from the environment to obtain this information based on an appropriate algorithm, which is usually a kind of statistical algorithm. The Adaptive Heuristic Critic, which is a policy iteration mechanism, and TD(λ) were used in the early stages of reinforcement learning to learn an optimal policy with samples from the real world [37]. Subsequently, the Q-learning algorithm increased in popularity [38, 39] and is now also a very important algorithm in reinforcement learning. The Q-learning algorithm is an iterative approach that selects the action with the maximum Q value, which is an evaluation value, in order to ensure that the chosen policy is optimal.
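To make the Q-table update described above concrete, the following minimal Python sketch performs one-step tabular Q-learning on a recorded transition (s_t, a_t, r_t, s_{t+1}). The GridWorld-style states and actions, and the values of the learning rate, discount factor, and exploration rate, are illustrative assumptions rather than details taken from a specific surveyed work.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning rate, discount factor, exploration rate
Q = defaultdict(float)                   # Q-table: maps (state, action) pairs to estimated values

def choose_action(state, actions):
    """Epsilon-greedy selection: mostly exploit the Q-table, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One-step Q-learning update for the transition (s_t, a_t, r_t, s_{t+1})."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: one transition in a GridWorld-like task with four actions.
actions = ["up", "down", "left", "right"]
q_update(state=(2, 3), action="down", reward=1.0, next_state=(3, 3), actions=actions)
```

Repeating this update over many episodes drives the Q-table, and hence the greedy policy derived from it, toward an optimal policy.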
Moreover, due to its ability to deal with high-dimensional data and to approximate functions, deep learning has been combined with reinforcement learning to create the field of "deep reinforcement learning" (DRL) [40]. This combination has led to significant achievements in several fields, such as learning from visual perception [18] and robotics [41].

An example of reinforcement learning is presented in Figure 4. The figure depicts a robot searching for an object in the GridWorld environment. The red circle represents the target object, the grey boxes denote the obstacles, and the white boxes denote the road. The robot's purpose is to find a route to the red circle. At each step, the robot has four choices of action: walking up, down, left or right. In the beginning, the agent receives information from the environment, which may be obtained through sensors such as radar or a camera. The agent then chooses an action and receives a corresponding reward. In the position shown in the figure, choosing the action of up, left or right may result in a lower reward, as there are obstacles in these three directions. However, taking the action of moving down will result in a higher reward, as it brings the agent closer to its goal.

Fig. 4 A simple example of reinforcement learning, in which a robot tries to find an object in the GridWorld environment. The blue robot can be seen as the agent in reinforcement learning. The red circle is the target object. The grey boxes denote the obstacles, while the white boxes denote the road. The robot's purpose is to find a route to the red circle.

2.3 Markov Decision Process (MDP)

The Markov decision process (MDP) is a framework used to model decisions in an environment [42]. From the perspective of reinforcement learning, the MDP is an approach with delayed rewards. In an MDP, the state transitions are not related to any earlier environment states or agent actions; that is to say, the next state is independent of the previous states and is based on the current environment state. An MDP can be denoted as the tuple (S, A, T, r, γ), which is made up of the agent action space A, the environment state space S, the reward function r, the transition matrix T, and a discount factor γ ∈ [0, 1). The transition matrix can be defined as a probability mapping from state-action pairs to states, T : (S × A) × S → [0, 1]. The agent's purpose is to find an optimal policy π that maps environment states to agent actions in a way that maximizes its long-term reward. The discount factor γ is applied to the accumulated reward to discount future rewards. In many cases, the goal of a reinforcement learning algorithm with an MDP is to maximize the expected discounted cumulative reward.

At time step t, we denote the environment state, agent action, and reward by s_t, a_t and r_t respectively. Moreover, we use V^π(s) and Q^π(s, a) to evaluate the state and action-state values. The state value function can be expressed as follows:

V^{\pi}(s) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\Big|\; s_t = s, \pi\Big]    (1)

The action-state value function is as follows:

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\Big|\; s_t = s, a_t = a, \pi\Big]    (2)

where γ is the discount factor and r_{t+k+1} is the reward at step t+k+1. In a wide variety of works, Q-learning has been the most popular iteration method applied to discounted infinite-horizon MDPs.

2.4 Deep reinforcement learning

In some cases, reinforcement learning finds it difficult to deal with high-dimensional data, such as visual information. Deep learning enables reinforcement learning to address these problems. Deep learning is a type of machine learning that can use low-dimensional features to represent high-dimensional data through the application of a multi-layer Artificial Neural Network (ANN). Consequently, it can work with high-dimensional data in fields such as image and natural language processing. Moreover, deep reinforcement learning (DRL) combines reinforcement learning with deep neural networks, thereby enabling reinforcement learning to learn from high-dimensional situations. Hence, DRL can learn directly from raw, high-dimensional data, and can accordingly acquire the ability to understand the visual world. Moreover, DRL has a powerful function approximation capacity: it employs deep neural networks to train approximate functions in reinforcement learning, for example to produce approximations of the action-state value Q^π(s, a) and the policy π.

The process of DRL is nearly the same as that of reinforcement learning. The agent's purpose is also to obtain an optimal policy that maps environment states to agent actions in a way that maximizes long-term reward. The main difference between the DRL and reinforcement learning processes lies in the Q table. As shown in Figure 3, in reinforcement learning, this table may be a form that records the map from state to action; by contrast, in deep reinforcement learning, a neural network is typically used to represent the Q table.

3 Security and privacy challenges in reinforcement learning

In this section, we briefly discuss some representative attacks that cause security and privacy issues in reinforcement learning. In more detail, we explore different types of security attacks (specifically, adversarial and poisoning attacks) and privacy attacks (specifically, those based on genetic algorithms (GA) and inverse reinforcement learning (IRL)). Moreover, some representative defence methods are also discussed (specifically, differential privacy, cryptography, and adversarial learning). We further present a taxonomy based on the components of the MDP in this section, along with the relationships and impacts among these components in reinforcement learning.

3.1 Attack methodology

3.1.1 Security attacks

In this part, we discuss security attacks designed to influence or even destroy the model in the reinforcement learning context. Specifically, we briefly introduce some recently proposed attack methods developed for this purpose.

One of the popular meanings of the term "security attack" is an adversarial attack with adversarial examples [43, 44]. The common form of adversarial examples involves adding imperceptible perturbations to data with a predefined goal; these perturbations can deceive the system into making mistakes that cause malfunctions, or prevent it from making optimal decisions. Because reinforcement learning gathers examples dynamically throughout the training process, attackers can directly add imperceptible perturbations to states, environment information, and rewards, all of which may influence the agent during reinforcement learning training. For example, consider the addition of a tiny perturbation to a state in order to produce s + δ [40, 45] (δ is the added perturbation). Even this small change may affect the subsequent reinforcement learning process. Attackers determine where and when to add perturbations, and what perturbations to add, in order to maximize the effectiveness of their attack.

Many algorithms that add adversarial perturbations have been proposed. Examples include the fast gradient sign method (FGSM), which can calculate adversarial examples; the strategically-timed attack, which focuses on selecting the time steps of adversarial attacks; and the enchanting attack (EA), which can mislead the agent regarding the expected state through a series of crafted adversarial examples.
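To give a concrete sense of how such perturbations are computed, the sketch below applies an FGSM-style perturbation to the input state of a small deep Q-network. The network architecture, the state and action dimensions, and the attack objective (lowering the Q-value of the currently greedy action) are illustrative assumptions; the attacks cited above use related but more elaborate formulations.

```python
import torch
import torch.nn as nn

# A tiny Q-network standing in for a trained DRL agent (hypothetical sizes: 4-dim state, 2 actions).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

def fgsm_state_perturbation(state: torch.Tensor, epsilon: float = 0.05) -> torch.Tensor:
    """Return s + delta, an FGSM-style perturbed copy of the observed state."""
    state = state.clone().detach().requires_grad_(True)
    q_values = q_net(state)
    greedy_action = q_values.argmax(dim=-1, keepdim=True)
    # Attack objective: the negative value of the greedy action; raising this loss
    # lowers the agent's confidence in its currently preferred action.
    loss = -q_values.gather(-1, greedy_action).sum()
    loss.backward()
    with torch.no_grad():
        return state + epsilon * state.grad.sign()   # delta = epsilon * sign of the gradient w.r.t. the state

state = torch.rand(1, 4)                     # the state s observed from the environment
adv_state = fgsm_state_perturbation(state)   # the perturbed state s + delta fed to the agent
print(q_net(state).argmax(-1).item(), q_net(adv_state).argmax(-1).item())
```

Because the perturbation is bounded by epsilon in each state dimension, it can remain imperceptible while still being large enough, at carefully chosen time steps, to change the agent's greedy action.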
tlearning;forexam-ple,toproducetheapproximatefunctionofaction-statevalueQπ(s,a)andpolicyTheprocessofDRLisnearlythesameasthatofreinforcementlearning.Theagent’spurposeisalsotoobtainanoptimalpolicythatcanmapenvi-ronmentstatestoagentactionsinawaythatmaximizeslong-termreward.Themaindi?erencebetweentheDRLandreinforcementlearningprocessesliesintheQtable.AsshowninFigure3,inreinforcementlearning,thistablemaybeaformthatrecordsthemapfromstatetoaction;bycontrast,indeepreinforcementlearning,aneuralnetworkistypicallyusedtorepresenttheQtable.3SecurityandprivacychallengesinreinforcementlearningInthissection,wewillbrie?ydiscusssomerepresentativeattacksthatcausesecurityandprivacyissuesinreinforcementlearning.Inmoredetail,weSpringerNature2021LATEXtemplate10NewChallengesinReinforcementLearning:ASurveyofSecurityandPrivacyexploredi?erenttypesofsecurityattacks(speci?cally,adversarialandpoi-soningattacks)andprivacyattacks(speci?cally,geneticalgorithm(GA)andinversereinforcementlearning(IRL)).Moreover,somerepresentativedefencemethodswillalsobediscussed(speci?cally,di?erentialprivacy,cryptography,andadversariallearning).Wefurtherpresentthetaxonomybasedonthecom-ponentsofMDPinthissection,alongwiththerelationshipsandimpactsamongthesecomponentsinreinforcementlearning.3.1Attackmethodology3.1.1SecurityattacksInthispart,wediscusssecurityattacksdesignedtoin?uenceorevendestroythereinforcementlearningmodelinthereinforcementlearningcontext.Specif-ically,webrie?yintroducesomerecentlyproposedattackmethodsdevelopedforthispurpose.Oneofthepopularmeaningsoftheterm”securityattack”isanadversar-ialattackwithadversarialexamples[43,44].Thecommonformofadversarialexamplesinvolvesaddingimperceptibleperturbationstodatawithapre-de?nedgoal;theseperturbationscandeceivethesystemintomakingmistakesthatcausemalfunctions,orpreventitfrommakingoptimaldecisions.Becausereinforcementlearninggathersexamplesdynamicallythroughoutthetrain-ingprocess,attackerscandirectlyaddimperceptibleperturbationstostates,environmentinformation,andrewards,allofwhichmayin?uencetheagentduringreinforcementlearningtraining.Forexample,considertheadditionoftinyperturbationstostatesinordertoproduces+6[40,45](6istheaddedperturbation).Eventhissmallchangemaya?ectthefollowingreinforcementlearningprocess.Attackersdeterminewhereandwhentoaddperturbations,andwhatperturbationstoadd,inordertomaximizethee?ectivenessoftheirattack.Manyalgorithmsthataddadversarialperturbationshavebeenproposed.Examplesincludethefastgradientsignmethod(FGSM),whichcancalculateadversarialexamples,thestrategically-timedattack,whichfocusesonselectingthetimestepofadversarialattacks,andenchantingattack(EA),whichcanmisleadtheagentregardingtheexpectedstatethroughaseriesofcraftedadversarialexamples.Moreover,defensestoadversarialexampleshavealsobeenstudied.Themostrepresentativemethodisadversarialtraining[46],whichtrainsagentsunderadversarialexamplesandtherebyimprovesmodelrobustness.Otherdefensivemethodsfocusonmodifyingtheobjectivefunction,suchasbyaddingtermstothefunctionoradoptingadynamicactivationfunction.Anothercommontypeofsecurityattackisthepoisoningattack,whichfocusesonmanipulatingtheperformanceofamodelbyinsertingmaliciouslycrafted”poisondata”intothetrainingexamples.Apoisoningattackisoftenselectedwhenanattackerhasnoabilitytomodifythetrainingdataitself;instead,theattackeraddsexamplestothetrainingset,andthoseexamplesSpringerNature2021LATEXtemplateNewChallengesinReinforcementLearning:ASurveyofSecurityandPrivacy11canalsoworkattesttime.Attacksbasedonapoisonedtraining