
Blockchain Framework for Artificial Intelligence Computation

Jie You 1,2,*

1 Dasudian Technologies Ltd., Shenzhen, 518057, China

2 Institute of Computer Engineering, Heidelberg University, Heidelberg, 69117, Germany

* barco@

Abstract

Blockchain is essentially a distributed database recording all transactions or digital events among participating parties. Each transaction in the records is approved and verified by consensus of the participants in the system, which requires solving a hard mathematical puzzle, known as proof-of-work. To make the approved records immutable, the mathematical puzzle is not trivial to solve and therefore consumes substantial computing resources. However, it is energy-wasteful to have many computational nodes installed in the blockchain competing to approve the records by just solving a meaningless puzzle. Here, we pose proof-of-work as a reinforcement-learning problem by modeling the growth of the blockchain as a Markov decision process, in which a learning agent makes an optimal decision over the environment's state while a new block is added and verified. Specifically, we design the block verification and consensus mechanism as a deep reinforcement-learning iteration process. As a result, our method exploits the determinism of state transitions and the randomness of action selection in a Markov decision process, together with the computational complexity of a deep neural network, to make the blocks hard to recompute and to preserve the order of transactions, while the blockchain nodes are used to train the same deep neural network with different data samples (state-action pairs) in parallel, allowing the model to experience multiple episodes across computing nodes at the same time.

Our method is used to design the next generation of public blockchain networks, which has the potential not only to spare computational resources for industrial applications but also to encourage data sharing and AI model design for common problems.

Introduction

Since the appearance of Bitcoin [1], blockchain technologies have brought about disruptions to traditional business processes [2-4], have been used for industrial advances [5-11], and have even triggered innovations in biotech and medical applications [12-16].

Blockchain seeks to minimize the role of trust in achieving consensus [2]. Different consensus mechanisms exist [17], of which the most well-known is proof-of-work, which requires completing a complicated computational process, such as finding hashes with specific patterns. This consensus algorithm disincentivizes misbehavior by making it costly for any agent to alter the state, so there is no need for trust in any particular central entity. Although there are other mechanisms for achieving consensus, proof-of-work is simultaneously self-sufficient and rent-free [18].

Proof-of-work systems have several major benefits. First, they are an excellent way to deter spammers. In addition, proof-of-work systems can be used to provide security to an entire network. If enough nodes (computers or dedicated mining machines) compete to find a specific solution, then the computational power needed to overpower and manipulate the network becomes unattainable for any single bad actor or even a single group of bad actors.

However, there is a primary disadvantage to proof-of-work systems: they consume a large amount of computing power and waste energy, as additional electricity is used for computers to perform extra computational work. This can add up to an extremely large amount of excess electricity consumption and environmental detriment [19-21].

Machine-learning technology has been powering many aspects of modern society, from web searches to content filtering on social networks to recommendations on e-commerce websites, and it is increasingly present in consumer products such as cameras and smartphones. Machine-learning systems are used to identify objects in images [22], transcribe speech into text [23], match news items, posts or products with users' interests, and select relevant search results. Particularly with the boom in digital data on the Internet, deep learning, as a representation-learning method, has shown great power in driving myriad intelligent applications and will have many more successes in the near future [24]. Because it requires very little engineering by hand, deep learning can easily take advantage of increases in the amount of available computation and data [24].

As one branch of machine learning, reinforcement learning is the task of learning what actions to take, given a certain situation or environment, to maximize a reward signal. In contrast to deep learning, which is a supervised process, reinforcement learning uses the reward signal to determine whether the action the agent takes is good or bad. Reinforcement learning has inspired research in both artificial and biological intelligence [25,26] and has been widely used in dynamic task scheduling [27] and planning and cognitive control [28], and more topics are under active research [29].

To use machine learning in practical scenarios, plenty of computational power is generally required to support so-called artificial intelligence (AI) model training and execution at different scales, according to the complexity of the models and the amount of data to be processed. For instance, GPT-3 [30] and Switch Transformers [31] have shown that AI model performance scales as a power law of model size, dataset size and amount of computation. The cost of AI is increasing exponentially as larger model sizes and more data are needed to achieve the desired targets. In general, when AI models and the training datasets are large enough, the models need to be trained for more than a few epochs to learn fully from the data and generalize well; therefore, both the hardware cost and the time cost are high for well-performing AI applications.

On the one hand, blockchain systems waste a large amount of computational power solving meaningless puzzles for proof-of-work; on the other hand, many useful AI applications require substantial computing capacity to achieve high performance. To balance these two aspects, in this paper we present a blockchain model that combines the computation for proof-of-work and for artificial-intelligence model learning into one process, achieving the consensus mechanism of a blockchain and artificial-intelligence computation simultaneously and in an efficient way.

The blockchain model

In this paper, we model the blockchain system as an agent of reinforcement learning. As depicted in Fig. 1, every block represents a state of a Markov state machine, whereas the creation and linking of blocks is a Markov decision process (MDP) [29], with the following setup:

- The environment is defined as the oracle of this blockchain system, which provides the data to the blockchain via its state transitions ($S_t \to S_{t+1}$).

- In the present state ($S_t$), the agent chooses an action ($A_t$) according to the current policy ($\pi_t$) and receives a reward ($R_{t+1}$) from the environment, while the state of the environment transitions from $S_t$ to $S_{t+1}$. Afterwards, the nodes of the blockchain train the policy model and update it from $\pi_t$ to $\pi_{t+1}$, which is stored in the memory of the computing nodes as the function for choosing the next action when fed the next state. The computation that occurs in this process is defined as the proof-of-work for the computing nodes, which compete to perform it in the blockchain system.

- Computing nodes of the system create a new block recording the current state of the environment ($S_{t+1}$), the last chosen action ($A_t$), the reward ($R_{t+1}$) received from the environment, the data ($D_{t+1}$) to be written onto the blockchain for a transaction, and the hash value of the last block ($H_{t+1} = \mathrm{Hash}(S_t, A_{t-1}, R_t, D_t, H_t)$), as shown in Fig. 2 (a concrete sketch follows this list). When a node finishes the computation of proof-of-work and creates a new block, a mining process is said to be completed.

- When a mining process completes, the newly created block is linked to the previous block by the hash value of the previous block (Fig. 2).
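For concreteness, the block structure and its hash linkage can be sketched as follows. This is a minimal illustration in Python; the field names, the JSON serialization and the use of SHA-256 are our own assumptions, since the framework does not fix a concrete encoding:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Block:
    state: list      # environment state S_{t+1} recorded in this block
    action: int      # last chosen action A_t
    reward: float    # reward R_{t+1} received from the environment
    data: str        # transaction data D_{t+1} written onto the chain
    prev_hash: str   # hash of the previous block, i.e. Hash(S_t, A_{t-1}, R_t, D_t, H_t)

    def hash(self) -> str:
        # Hash this block's full contents; the next block will store this
        # value as its prev_hash, which is exactly the linkage H_{t+1}.
        payload = json.dumps([self.state, self.action, self.reward,
                              self.data, self.prev_hash])
        return hashlib.sha256(payload.encode()).hexdigest()

# A mining step ends by linking the new block to its predecessor's hash.
genesis = Block(state=[0.0], action=0, reward=0.0, data="genesis", prev_hash="0" * 64)
block_1 = Block(state=[0.7], action=2, reward=1.5, data="tx-1", prev_hash=genesis.hash())
```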

Figure 1. The blockchain model based on reinforcement learning.

Figure 2. The mechanism for blocks to store data and be linked.

In any block of the chain, the stored hash value of the previous block prevents the data from being falsified, because if any data are changed, the block's hash value must differ and in turn change the data stored in the next block, which invalidates the linkage of the blocks within the chain. In addition, if the state of the environment ($S_t$) or the action ($A_{t-1}$) stored in one block is modified, the next state ($S_{t+1}$), next action ($A_t$) and reward ($R_{t+1}$) will probably differ from those actually stored in the next block when transformed by the policy, which further decreases the possibility of, and increases the difficulty of, tampering with the data.
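Continuing the sketch above, this tamper check is mechanical: recompute each block's hash and compare it with the value stored in its successor (again an assumed minimal form):

```python
def verify_chain(chain: list) -> bool:
    """Return True iff every block stores the correct hash of its predecessor."""
    for prev, curr in zip(chain, chain[1:]):
        if curr.prev_hash != prev.hash():
            return False  # some field of `prev` was altered after linking
    return True

chain = [genesis, block_1]
assert verify_chain(chain)
genesis.reward = 99.0           # falsify a stored reward ...
assert not verify_chain(chain)  # ... and every later link breaks
```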

Proof-of-work

The proof-of-work algorithm is implemented as follows:

1. At the present state ($S_t$), choose an action ($A_t$) based on the current policy ($\pi_t$);

2. Exert $A_t$ onto the environment, that is, interact with the oracle, receiving a reward ($R_{t+1}$) while the state of the environment changes to $S_{t+1}$;

3. Based on the state transition ($S_t \to S_{t+1}$), the action selected ($A_t$) and the reward received ($R_{t+1}$), the nodes of the blockchain train the predefined action-value function of the reinforcement-learning model and update the policy to $\pi_{t+1}$.

In this paper, the proof-of-work comprises the computing processes of selecting the action, generating the reward regulated by the current policy ($\pi_t$), and training the action-value function model and updating the policy. In many practical MDP problems, the state space is very large or even unbounded, which requires large and complicated deep neural networks to achieve a well-performing approximator of the action-value function, so the computation of proof-of-work is highly resource-demanding. Therefore, any attempt to tamper with the data or hack the whole blockchain is almost unachievable due to the daunting cost in computing resources and time.
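The three steps above can be condensed into one mining iteration, sketched below. The callables `policy`, `env.step` and `train_action_value` stand in for the current policy $\pi_t$, the oracle interaction and the model update; they are assumed interfaces of ours, not definitions from the framework:

```python
def proof_of_work_step(env, policy, train_action_value, state):
    # Step 1: at state S_t, choose an action A_t from the current policy pi_t.
    action = policy(state)

    # Step 2: exert A_t on the environment (interact with the oracle),
    # receiving R_{t+1} while the environment moves to S_{t+1}.
    next_state, reward = env.step(action)

    # Step 3: train the action-value model on (S_t, A_t, R_{t+1}, S_{t+1}),
    # updating the policy from pi_t to pi_{t+1}; this training is the
    # resource-demanding computation that constitutes the proof-of-work.
    next_policy = train_action_value(state, action, reward, next_state)

    return next_state, reward, next_policy
```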

Consensus based on the rewards of reinforcement learning

When a node working for the blockchain finishes the proof-of-work, that is, a mining process, it needs to synchronize the newly generated block to the other nodes in the network to guarantee the consistency of data within the whole network. However, because of network delays, errors and attacks, nodes may keep different versions of the blockchain, resulting in inconsistency. Therefore, we design a consensus mechanism for nodes to achieve data consistency across the whole network, as follows (a sketch of the selection rule appears after the list):

1. First, prioritize the longest chain: if nodes keep chains of different lengths, the longest chain is chosen as the proven chain;

2. If, after step 1, more than one chain remains, there are two optional ways to determine the final chain:

   a. Compare the reward value ($R$) at the last block of each chain and choose the chain with the maximum reward as the final consented chain.

   b. Compare the sum of rewards ($\sum R_t$) across all blocks of each chain and choose the one with the maximum summation as the final consented chain.
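A minimal sketch of this selection rule follows, assuming each chain is a list of blocks carrying a `reward` field as in the block structure above; whether the tiebreak uses the last-block reward (option a) or the summed reward (option b) is a configuration choice:

```python
def select_chain(chains, tiebreak="last"):
    """Pick the consented chain: longest first, then maximum reward."""
    # Step 1: keep only the longest chains.
    max_len = max(len(c) for c in chains)
    longest = [c for c in chains if len(c) == max_len]
    if len(longest) == 1:
        return longest[0]

    if tiebreak == "last":
        # Option (a): reward R stored in the final block of each chain.
        return max(longest, key=lambda c: c[-1].reward)
    # Option (b): sum of rewards across all blocks of the chain.
    return max(longest, key=lambda c: sum(b.reward for b in c))
```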

Although different nodes share the same policy algorithm, each experiences its own unique model-training and policy-updating process and keeps its own action-value function model and policy instances in memory, which are not synchronized with one another; hence, for the same state ($S_t$), different nodes will not necessarily select the same action or receive the same reward. This brings about two valuable properties:

- Even if more than 51% of the nodes within the network are hacked in an attempt to falsify the data and regenerate a new chain, when the hacked nodes complete the proof-of-work, the maximum reward ($R_{max}$) is not necessarily received by them but possibly by the unhacked nodes, in which case the falsified blocks will not be consented. Thus, the consensus mechanism designed in this paper additionally enhances the safety of the blockchain system by reducing the possibility of it being compromised.

- Because every node keeps its own instances of the action-value function model and policy and competes to achieve the maximum reward ($R_{max}$) by implementing the proof-of-work, the reinforcement-learning algorithm learns along more than one path (the number of paths equals the number of working nodes within the network) on the same environment state at one time point. This equivalently replaces time with space for AI model training, achieving multiple epochs of training in one round. In this way, while the blockchain grows, the reinforcement-learning algorithm backing its proof-of-work and consensus mechanism learns diversified possibilities more fully and converges faster, thereby making precise predictions ($S_t \to A_t$) as quickly as possible, which is conducive to achieving the overall goal of the reinforcement-learning model in a shorter term. This is especially beneficial for online-learning applications of AI.

In summary, the blockchain system presented in this paper is a distributed training system for reinforcement-learning algorithms, which accelerates the learning process of AI models while realizing blockchain properties.

Proof-of-work with deep Q-learning

Specifically, we use deep Q-learning [29,32,33] as the policy-updating algorithm for the agent to learn. The iteration of the action-value function in Q-learning is formulated as:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right] \qquad (1)$$

where $Q$ is the action-value function to be learned for the optimal decision; $a$ and $R_{t+1}$ are the selected action and the received reward at state $S_{t+1}$, respectively; and $\alpha$ ($0 < \alpha < 1$) and $\gamma$ ($0 < \gamma < 1$) are the step-size parameter and the discount-rate parameter, respectively.

A deep neural network is used to represent the $Q$ function, and every node of the blockchain acts as the agent that learns the $Q$ function and iterates it according to equation (1), with the policy determining which state-action pairs are visited and updated.
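Before introducing the network, note that equation (1) itself is a one-line update. A tabular NumPy sketch, with an assumed small discrete state and action space, makes the iteration explicit:

```python
import numpy as np

n_states, n_actions = 10, 4
alpha, gamma = 0.1, 0.9            # step-size and discount-rate parameters
Q = np.zeros((n_states, n_actions))

def q_update(s_t, a_t, r_next, s_next):
    # Equation (1): Q(S_t, A_t) <- Q(S_t, A_t)
    #     + alpha * [R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t)]
    td_target = r_next + gamma * Q[s_next].max()
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])

q_update(s_t=3, a_t=1, r_next=1.0, s_next=4)
```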

Figure 3. The blockchain model based on deep Q-learning.

As shown in Fig. 3, at any time step $t$, the nodes of the blockchain calculate the optimal action $A_t$ according to the current $Q$ function and state $S_t$ and then update the $Q$ function according to formula (1) for the next state. Specifically, in this research, we represent the $Q$ function as a deep neural network. As illustrated in Fig. 4, the section in red represents the target network, which has the same neural-network architecture as the $Q$-function approximator (the section in green) but with frozen parameters. Every $C$ iterations ($C$ is a hyperparameter), the parameters of the prediction network are copied to the target network. A loss function is defined as the mean squared error of the target Q-value and the predicted Q-value:

$$Loss = \left( R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta') - Q(S_t, A_t; \theta) \right)^2 \qquad (2)$$

where $\theta'$ and $\theta$ represent the parameters of the target network and the prediction network, respectively. This is then basically a regression problem, in which the prediction network updates its gradients using backpropagation to converge.
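Equation (2) and the periodic parameter copy can be sketched in PyTorch as follows. The two-layer architecture and the state/action dimensions are arbitrary stand-ins, since the paper does not fix a network:

```python
import torch
import torch.nn as nn

def make_q_net(n_state_dims=4, n_actions=2):
    return nn.Sequential(nn.Linear(n_state_dims, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

q_net = make_q_net()               # prediction network, parameters theta
target_net = make_q_net()          # target network, parameters theta'
target_net.load_state_dict(q_net.state_dict())
for p in target_net.parameters():  # frozen parameters: no gradients flow
    p.requires_grad_(False)

def dqn_loss(s, a, r, s_next, gamma=0.99):
    # Equation (2): (R_{t+1} + gamma * max_a Q(S_{t+1}, a; theta')
    #                - Q(S_t, A_t; theta))^2, averaged over a batch.
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_target = r + gamma * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_pred, q_target)

# Every C iterations, theta is copied into theta'.
target_net.load_state_dict(q_net.state_dict())
```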

Figure 4. Schematic diagram of the $Q$-function iteration and its neural-network representations.

The steps involved in the deep Q-learning procedure for every node of the blockchain are as follows (a consolidated sketch follows the list):

1. At time step $t$, every node feeds the state $S_t$ into the prediction Q-network, which returns the Q-values of all possible actions in that state.

2. Select an action using an epsilon-greedy policy: with probability epsilon ($0 < \epsilon < 1$), select a random action, and with probability $1 - \epsilon$, select the action that has the maximum Q-value, namely $\arg\max_a Q(S_t, a; \theta)$.

3. Perform this action $A_t$ in state $S_t$ and move to a new state $S_{t+1}$, receiving reward $R_{t+1}$. Write this transition information into a new block and store it in a replay buffer of the node as $(S_t, A_t, R_{t+1}, S_{t+1})$.

4. Next, sample random batches of transitions from the replay buffer and calculate the loss defined by equation (2).

5. Perform gradient descent with respect to the prediction-network parameters to minimize this loss. The node then finishes one proof-of-work computation and proves a newly generated block.

6. After every $C$ iterations, copy the prediction Q-network weights to the target-network weights.

7. Repeat the above steps.
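The sketch below ties steps 1-7 together for a single node, reusing `q_net`, `target_net` and `dqn_loss` from the previous sketch; the buffer size, batch size, `epsilon` and `C` are assumed hyperparameters:

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
epsilon, C, batch_size, n_actions = 0.1, 100, 32, 2

def node_step(env, state, iteration):
    # Steps 1-2: epsilon-greedy action from the prediction Q-network.
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        with torch.no_grad():
            action = int(q_net(state.unsqueeze(0)).argmax())

    # Step 3: act on the environment, write the transition into a new block
    # (omitted here) and store it in the replay buffer as (S_t, A_t, R_{t+1}, S_{t+1}).
    next_state, reward = env.step(action)
    replay_buffer.append((state, action, reward, next_state))

    # Steps 4-5: sample a batch, compute the loss of equation (2), and take a
    # gradient step; completing this is one proof-of-work computation.
    if len(replay_buffer) >= batch_size:
        batch = random.sample(replay_buffer, batch_size)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        s2 = torch.stack([b[3] for b in batch])
        optimizer.zero_grad()
        dqn_loss(s, a, r, s2).backward()
        optimizer.step()

    # Step 6: every C iterations, copy theta to the target network's theta'.
    if iteration % C == 0:
        target_net.load_state_dict(q_net.state_dict())

    return next_state
```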

The awarding mechanism for mining

In this framework, the computations for the reinforcement-learning algorithm, and particularly for the training of the deep neural networks, are assigned to the nodes (mining machines) of the blockchain, which compete for the proof-of-work. After the nodes complete the proof-of-work, the node that is fastest to finish the computation and receives the maximum reward finally wins the right to prove the blocks, which is the consensus mechanism of this blockchain. Thus, in our design, we stipulate the maximum reward $R_{max}$ as the award to the node that finally wins the competition of proof-of-work and consensus, to encourage more computers with better capacity to join the blockchain network and contribute to artificial-intelligence computations. This award value $R_{max}$ is called the token of this blockchain.

Conclusion

In this paper, we present a blockchain framework that organically stitches together the computations for reinforcement learning and proof-of-work, as well as a consensus mechanism, achieving a versatile distributed computing system. On the one hand, taking advantage of the complexity and high computing cost of the reinforcement-learning process and deep-neural-network training increases the difficulty of hacking the blockchain network or falsifying the data. In particular, because the nodes keep self-owned instances of the policy and neural networks, they carry uncertainties in state transitions ($S_t \to S_{t+1}$) and action selection that may differ from node to node. These uncertainties additionally consolidate the stability of the chain linkages, which are difficult for hackers to mutate. The maximum-reward-wins consensus mechanism adds an additional barrier deterring hackers from tampering with the chain. On the other hand, utilizing the nodes within the blockchain network to fulfil the training and running of AI algorithms naturally contributes computing power to practical intelligent applications. Meanwhile, by distributing the AI model training to multiple nodes that simultaneously crunch the same data generated by the environment (the oracle of this blockchain system), the nodes keep their own instances of the AI model, so they experience different learning paths with different parameter values and hidden states of the AI model at every time step. This equivalently implements multiple epochs of training within only one round of the learning process, which improves training efficiency and accelerates the convergence of models.

Discussion

The blockchain framework presented in this paper paves an avenue for AI applications that require intensive computing power, a quicker generalization rate and a credible network for feeding data to AI models. Therefore, it provides a potential solution for facilitating the development of industrial intelligence, which has been developing slowly due to a lack of data, because enterprises in industrial verticals are not willing to share their assets. In addition, industry often lacks either professional AI talent or sufficient computing capacity for AI applications, so this blockchain framework could provide an open platform encouraging AI professionals to contribute their expertise, as well as computing resources, to support the advancement of industry. Furthermore, this framework is particularly pragmatic for non-episodic reinforcement-learning problems with models continuously adapting to the environment, such as financial markets, IoT networks and factory operations.

Ultimately, it can be expected that, by combining blockchain and artificial intelligence into one computational framework, the two most important resources, data and computing power, can be utilized in a mutually supportive way over a credible platform that encourages more innovation in artificial-intelligence applications. Finally, we believe that this blockchain framework for AI computation could be a potential backbone of the industrial Internet.

References

1. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. /bitcoin.pdf (2008).

2. Casino, F., Dasaklis, T. K. & Patsakis, C. A systematic literature review of blockchain-based applications: Current status, classification and open issues. Telematics and Informatics 36, 55-81 (2019).

3. Viriyasitavat, W. & Hoonsopon, D. Blockchain characteristics and consensus in modern business processes. Journal of Industrial Information Integration 13, 32-39 (2019).

4. Pal, A., Tiwari, C. K. & Haldar, N. Blockchain for business management: Applications, challenges and potentials. The Journal of High Technology Management Research 32, Issue 2 (2021).

5. Javaid, M. et al. Blockchain technology applications for Industry 4.0: A literature-based review. Blockchain: Research and Applications (2021).

6. Elghaish, F. et al. Blockchain and the 'Internet of Things' for the construction industry: research trends and opportunities. Automation in Construction 132 (2021).

7. Esmaeilian, B. et al. Blockchain for the future of sustainable supply chain management in Industry 4.0. Resources, Conservation and Recycling 163 (2020).

8. Liu, X. L. et al. Industrial blockchain based framework for product lifecycle management in Industry 4.0. Robotics and Computer-Integrated Manufacturing 63 (2020).

9. Leng, J. et al. Blockchain-empowered sustainable manufacturing and product lifecycle management in Industry 4.0: A survey. Renewable and Sustainable Energy Reviews 132 (2020).

10. Gupta, R. et al. Blockchain-based security attack resilience schemes for autonomous vehicles in Industry 4.0: A systematic review. Computers & Electrical Engineering 86 (2020).

11. Mehta, D. et al. Blockchain-based royalty contract transactions scheme for Industry 4.0 supply-chain management. Information Processing & Management 58, Issue 4 (2021).

12. Wong, D. R., Bhattacharya, S. & Butte, A. J. Prototype of running clinical trials in an untrustworthy environment using blockchain. Nat Commun 10, 917 (2019).

13. Mamo, N. et al. Dwarna: a blockchain solution for dynamic consent in biobanking. Eur J Hum Genet 28, 609-626 (2020).

14. DeFrancesco, L. & Klevecz, A. Your DNA broker. Nat Biotechnol 37, 842-847 (2019).

15. Guo, X. et al. Smartphone-based DNA
