Embedded Deep Learning in Space: Artificial Intelligence with Qormino®

Abstract

Artificial Intelligence (AI) algorithms are known to be highly demanding in terms of computing resources. Thanks to the increase in the computational power of the latest processing devices, AI is also becoming popular in the Space industry for various applications such as on-board data processing for observation satellites, automated guidance of spacecraft, on-board decision-making for collision prevention, communication satellites, and fusion of data sources for better predictability.

Until recently, the Space industry faced the challenge of getting access to state-of-the-art processing components that would comply with Space requirements, i.e. high reliability, robustness, and radiation tolerance.

Led by the Grenoble University Space Centre (CSUG), the QlevErSat project leverages the high computing capabilities of Qormino QLS1046-Space radiation-tolerant processing modules to run AI algorithms on-board, together with the high resolution of the images taken by the Emerald sensor.

This white paper first presents the general performance and functionality of the QLS1046-Space processor. Then, the main results of the benchmarking activities are given, to demonstrate the feasibility of using QLS1046-Space to run embedded AI in Space.

Introduction

Artificial Intelligence (AI) algorithms are known to be highly demanding in terms of computing resources. Thanks to the increase in the computational power of the latest processing devices, AI is becoming popular for ground applications. AI now competes with traditional data processing in a number of applications, such as face recognition, autonomous driving, or robotics.

The Space industry can also benefit from AI in various applications:

- On-board data processing for early warning of situations,

- Observation and meteorological satellites, where on-board processing allows only relevant, pre-processed data to be sent to the ground, reducing downlink bandwidth requirements,

- Automated guidance of spacecraft, where AI can improve performance in critical maneuvers such as docking or landing,

- On-board decision-making, which allows better collision prevention thanks to early reaction, and offers possibilities for self-health monitoring and ultimately autonomous self-reconfiguration,

- Communication satellites, which can benefit from smart data routing and optimized antenna pointing based on actual traffic and weather conditions to increase data rate and minimize power consumption,

- Fusion of data sources from various kinds of sensors, making it possible to see what is not visible to the “human eye”, including on-board analysis of large datasets in deep Space and Science missions.

Until recently, despite this wide range of new possibilities, the Space industry faced the challenge of getting access to state-of-the-art processing components that would comply with Space requirements, i.e. high reliability, robustness, and radiation tolerance.

Led by the Grenoble University Space Centre (CSUG), the QlevErSat project is developing a nanosatellite that uses artificial intelligence algorithms to observe the Earth and address societal challenges such as the observation of illegal deforestation, the monitoring of CO2 emissions, or the evaluation of damage after a natural disaster.

Figure 1: QlevErSat Nanosatellite

This smart satellite will embed an Emerald 16MP image sensor and a Qormino® QLS1046-Space processing module, both new radiation-tolerant and Space-qualified components from Teledyne e2v. The project leverages the high computing capabilities of QLS1046-Space to be able to run the AI algorithms on-board, together with the high resolution of the images taken by the Emerald sensor.

Figure 2: EMERALD Sensor

Figure 3: Qormino® QLS1046-4GB

In the frame of this project, part of the feasibility study aimed at verifying the computing capability of the Qormino QLS1046-Space for AI algorithms. This white paper first presents the general performance and functionality of QLS1046-Space. Then, the main results obtained in those benchmarking activities are given, demonstrating the feasibility of using QLS1046-Space to run AI in Space.

General performance and functionality of Qormino® QLS1046-Space

Qormino is a line of processing modules from Teledyne e2v dedicated to Space and High-reliability applications. Those modules combine GHz-class multicore processors with high-speed DDR4 memories in compact 44 x 26 mm dimensions. They come in a 0.8 mm BGA package, and are designed to meet SWaP (Size, Weight and Power) constraints. With a built-in DDR4 bus layout and a “building-block” approach, design is made easier while guaranteeing high performance.

QLS1046-Space is the Qormino version dedicated to Space. It embeds a Quad-Core Arm® Cortex®-A72 microprocessor running at up to 1.8 GHz, with ECC-protected L1 and L2 cache memories for reliable behaviour. It features a rich set of peripherals, including integrated packet processing acceleration, high-speed serial links supporting 10 Gb Ethernet, PCIe® Gen3, SATA 3.0 and USB, as well as a number of general-purpose interfaces such as SPI, I2C, and UART. The current version integrates 4 GB of DDR4 with transfer speeds up to 2.4 GT/s, and a version with 8 GB is also targeted.

Figure 4: Architecture of QLS1046-4GB-Space

Apart from the pure performance aspect, the reason for selecting this device is that it is Space-compliant. Both the processor and the memory are radiation tolerant:

- SEL free up to more than 60 MeV·cm²/mg

- Known SEU/SEFI cross-sections up to more than 60 MeV·cm²/mg

- TID: 100 krad (Si)

In addition, QLS1046-Space and its components are qualified, manufactured, and screened following NASA or ECSS standards.

Benchmark & Results

Benchmarking activities were performed to verify in practice the computing capability of QLS1046-Space to run AI algorithms for Space applications. The focus is mainly on AI for image processing, since the QlevErSat project targets Earth observation use cases. In this study, only neural networks with deep learning have been tested. Classical machine learning usually requires fewer computing resources, so even better results would be expected with classical machine learning.

In this study, the performance of QLS1046-Space was evaluated on three different axes:

- Pure computing performance was evaluated in terms of GFLOPS (Giga Floating Point Operations Per Second), since this is the typical way of evaluating the computing performance of a device in AI applications.

- An inference benchmark was performed to verify the capability of the device to execute neural networks. Several classical neural network architectures were tested.

- Training performance was briefly assessed, to evaluate the possibility of applying learning or fine-tuning on QLS1046-Space.

Benchmark setup

The performance assessment was carried out with a QLS1046-Space development kit, which has a number of available interfaces. The operating system used throughout the benchmark was Linux (Ubuntu 18.04). The QLS1046-Space device inside the development kit had 4 GB of integrated DDR4 memory. The version with 8 GB of DDR4 memory would have been more efficient for executing AI, but it was not available at the time of testing. In addition, the processor was running at 1.6 GHz instead of its 1.8 GHz maximum frequency. This means that the results presented in this white paper are somewhat limited by the amount of DDR4 memory available and the running frequency of the processor.
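To give a feel for the frequency limitation, the short sketch below rescales an execution time measured at 1.6 GHz to the 1.8 GHz maximum. It assumes a purely compute-bound workload, so that execution time is inversely proportional to clock frequency. That assumption is not a result of the study, and memory-bound networks would gain less. The function name and the 100 ms input are hypothetical.

```python
def scale_time_to_frequency(time_ms: float, measured_ghz: float, target_ghz: float) -> float:
    """Best-case execution time at target_ghz.

    Assumption: the workload is purely compute-bound, so execution time
    scales inversely with the clock frequency.
    """
    if time_ms < 0 or measured_ghz <= 0 or target_ghz <= 0:
        raise ValueError("time must be non-negative and frequencies positive")
    return time_ms * measured_ghz / target_ghz


# Hypothetical 100 ms measurement taken at 1.6 GHz, rescaled to 1.8 GHz.
best_case_ms = scale_time_to_frequency(100.0, measured_ghz=1.6, target_ghz=1.8)
print(f"Best-case time at 1.8 GHz: {best_case_ms:.1f} ms")
```

Under this assumption the gain is at most 12.5 % in throughput, which is why the text describes the results as only somewhat limited by the running frequency.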

Figure 5: QLS1046-Space Development Kit

In some of the following benchmark results, a regular computer was used as a basis for comparison to rate QLS1046-Space performance. This computer had an Intel® Core™ i7-9750H processor running at 2.6 GHz and 32 GB of DDR4. It was running Linux. It is considered a good computer for AI, which is why it is a convenient reference in the following.

Benchmark results

The performance of QLS1046-Space was evaluated on the following three axes:

Figure 6: Benchmarks

Pure computing performance

For the pure computing performance evaluation, the benchmark [1] was used, which consists of a small and simple test program. In the results, the performance of QLS1046-Space is compared to that of the computer with the Intel® Core™ i7-9750H processor. It should be noted that the execution of the software does not take advantage of the hardware accelerators of the processors. This explains in particular why the GFLOPS numbers obtained here are lower than what can be found in the literature for those processors. Figure 7 presents the raw results in GFLOPS to compare both targets. Figure 8 compares power efficiency, since this is a key topic in Space applications.

Figure 7: Summary of the computing performance comparison.

Figure 8: Power efficiency in quad-core operation.

Calculated from the thermal power characteristics of both devices: 45 W @ 100 °C for the i7 (Table 5-2 of [2]), 14.6 W @ 105 °C for QLS1046-Space (Table 8 of [3]).

It is observed that the gap between the two devices depends on the number of cores used, and that the difference in performance shrinks as the number of cores increases. Those results highlight that, in the quad-core configuration, QLS1046-Space offers about half of the computing capability of the i7, which is known to be a good processor for AI on the ground. Hence, QLS1046-Space offers a fair amount of computing performance for AI in Space. In addition, QLS1046-Space exhibits higher power efficiency, making it well suited for Space systems.
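The power-efficiency comparison can be approximated from the figures quoted in the text. The sketch below is illustrative only: it takes the two thermal power values cited for Figure 8 and treats "about half" of the i7 quad-core performance as a ratio of 0.5, which is an approximation rather than a measured value. The function name is hypothetical.

```python
I7_THERMAL_POWER_W = 45.0    # Table 5-2 of [2], at 100 °C
QLS_THERMAL_POWER_W = 14.6   # Table 8 of [3], at 105 °C


def relative_power_efficiency(perf_ratio: float, power_a_w: float, power_b_w: float) -> float:
    """Performance per watt of device A relative to device B.

    perf_ratio is perf_A / perf_B; the result is
    (perf_A / power_A) / (perf_B / power_B).
    """
    if power_a_w <= 0 or power_b_w <= 0:
        raise ValueError("power values must be positive")
    return perf_ratio * power_b_w / power_a_w


# Assumption: "about half" of the i7 quad-core performance is taken as 0.5.
ratio = relative_power_efficiency(0.5, QLS_THERMAL_POWER_W, I7_THERMAL_POWER_W)
print(f"QLS1046-Space performance per watt relative to the i7: {ratio:.2f}x")
```

With these inputs the ratio comes out at roughly 1.5, consistent with the statement that QLS1046-Space is the more power-efficient of the two devices. Thermal power is used here as a proxy for consumed power, as in the original calculation.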

Deep learning inference benchmark

In this benchmark, tests are performed to evaluate the performance of QLS1046-Space in inference, meaning when the device uses a neural network to process an image. Only classical neural networks are tested in the study, first with Arm Compute Library [5], then AI-Benchmark [4] on TensorFlow [7], and ConvNet [6] on PyTorch [8]. It should be noted that the two most popular libraries used for AI are TensorFlow and PyTorch, proposed by Google and Facebook respectively, with both libraries supported by Arm [9]. The tested networks are pre-trained to identify objects in pictures and are widely used in existing object classifiers such as R-CNN, Fast R-CNN, Faster R-CNN [10] or CenterNet [11]. However, the TensorFlow and PyTorch libraries are evolving very quickly, which is the reason for first evaluating performance with Arm Compute Library, which is considered more stable.
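An inference timing measurement of this kind can be sketched in a few lines. The harness below is a generic illustration, not the code of any of the cited benchmarks: it runs a callable a few times to warm up, then reports the mean execution time and its standard deviation in milliseconds. The `dummy_forward_pass` function is a hypothetical stand-in for a network's forward pass, and the use of the standard deviation as a spread measure is an assumption.

```python
import statistics
import time
from typing import Callable, Tuple


def time_inference(run_once: Callable[[], object], warmup: int = 2, runs: int = 10) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the execution time in milliseconds."""
    if runs < 2:
        raise ValueError("at least two timed runs are needed for a standard deviation")
    # Warm-up runs are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def dummy_forward_pass() -> int:
    # Hypothetical stand-in for model(image); replace with a real inference call.
    return sum(i * i for i in range(50_000))


mean_ms, std_ms = time_inference(dummy_forward_pass)
print(f"execution time: {mean_ms:.2f} ms (spread: {std_ms:.2f} ms)")
```

Separating warm-up runs from timed runs matters on an embedded target, where the first inference often includes one-off costs such as loading weights into memory.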

Arm Compute Library

In this benchmark, Arm Compute Library [5] is used to run different classical neural networks. The results obtained on QLS1046-Space are shown in Table 1:

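| Network | Execution time, single core [ms] | Execution time, quad core [ms] | Number of operations for an inference [MFLOP] | Computing performance, single core [GFLOPS] | Computing performance, quad core [GFLOPS] |
|---|---|---|---|---|---|
| Alexnet | 153 | 74 | 727 | 5 | 10 |
| Googlenet | 286 | 109 | 1500 | 5 | 14 |
| Inceptionv3 | 848 | 314 | 6000 | 7 | 19 |
| Inceptionv4 | 1870 | 655 | 13000 | 7 | 20 |
| Mobilenet | 118 | 44 | 570 | 5 | 13 |
| Resnet50 | 501 | 206 | 4000 | 8 | 19 |
| Squeezenet | 145 | 64 | 360 | 2 | 6 |
| Vgg16 | 1090 | 418 | 16000 | 15 | 38 |
| Yolov3 | 6540 | 2500 | 66000 | 10 | 26 |

Table 1: Performance of QLS1046-Space with Arm Compute Library.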

Those results confirm that it is possible to perform on-board image classification using QLS1046-Space with this kind of common classifier and with reasonable execution times. Those results are especially interesting considering that Arm Compute Library is one of the major frameworks for AI.
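The computing performance columns of Table 1 follow directly from the other two: dividing the number of operations per inference (in MFLOP) by the execution time (in ms) gives GFLOPS. The sketch below recomputes them from the table values and also derives the quad-core speedup, which the table does not list. The variable and function names are illustrative.

```python
# (single-core time [ms], quad-core time [ms], operations per inference [MFLOP]),
# copied from Table 1.
TABLE_1 = {
    "Alexnet": (153, 74, 727),
    "Googlenet": (286, 109, 1500),
    "Inceptionv3": (848, 314, 6000),
    "Inceptionv4": (1870, 655, 13000),
    "Mobilenet": (118, 44, 570),
    "Resnet50": (501, 206, 4000),
    "Squeezenet": (145, 64, 360),
    "Vgg16": (1090, 418, 16000),
    "Yolov3": (6540, 2500, 66000),
}


def gflops(mflop: float, time_ms: float) -> float:
    """MFLOP / ms = 1e6 FLOP / 1e-3 s = 1e9 FLOP/s, i.e. GFLOPS."""
    if time_ms <= 0:
        raise ValueError("execution time must be positive")
    return mflop / time_ms


for name, (single_ms, quad_ms, mflop) in TABLE_1.items():
    speedup = single_ms / quad_ms
    print(
        f"{name:<12} single: {gflops(mflop, single_ms):5.1f} GFLOPS  "
        f"quad: {gflops(mflop, quad_ms):5.1f} GFLOPS  "
        f"quad-core speedup: {speedup:.2f}x"
    )
```

Rounded to the nearest integer, the recomputed values match the GFLOPS columns of Table 1.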

AI-Benchmark

AI-Benchmark [4] instantiates backbones in the TensorFlow format, which are very common neural networks originally created for image classification. The results of the benchmark for different neural networks are given in Table 2:

| Backbone | Picture size | Execution time [ms] | Variability [ms] | Description |
|---|---|---|---|---|
| VGG16 [9] | 224x224 | 1320 | 7 | Network trained on ImageNet [12] to classify 1000 objects. |
| VGG19 [9] | 512x512 | 13562 | 144 | Network trained on ImageNet [12] to classify 1000 objects. |
| ResNet-V2-50 | 346x346 | 868 | 5 | Classifier based on residual neural network [13] |
| ResNet-V2-152 | 256x256 | 1538 | 18 | Classifier based on residual neural network |

Table 2: AI-Benchmark results on QLS1046-Space.

The results show that QLS1046-Space can perform on-board image classification with classical neural networks in about 1 s. This requires optimized memory management using the FP16 data type and a picture size compatible with the 4 GB of available memory. It is noticed that VGG19 [9] takes around 10 times longer to execute than the other tests, which may be due to the cache memory configuration and the DDR4 size limitation.
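Because the backbones in Table 2 were run on different picture sizes, it can help to normalize the execution times by the number of input pixels before comparing them. The sketch below does this arithmetic on the Table 2 values. It is only indicative, since the backbones are different networks, and it does not attribute the remaining per-pixel gap to any particular cause. The names are illustrative.

```python
# (picture side [px], execution time [ms]), copied from Table 2.
TABLE_2 = {
    "VGG16": (224, 1320),
    "VGG19": (512, 13562),
    "ResNet-V2-50": (346, 868),
    "ResNet-V2-152": (256, 1538),
}


def time_per_megapixel_ms(side_px: int, time_ms: float) -> float:
    """Execution time divided by the number of input pixels, in ms per megapixel."""
    if side_px <= 0:
        raise ValueError("picture side must be positive")
    return time_ms / (side_px * side_px / 1e6)


for name, (side_px, time_ms) in TABLE_2.items():
    print(f"{name:<14} {side_px}x{side_px}: {time_per_megapixel_ms(side_px, time_ms):9.0f} ms/Mpx")

# Ratio of input pixels and of execution time between the VGG19 and VGG16 tests.
pixel_ratio = (512 * 512) / (224 * 224)
time_ratio = TABLE_2["VGG19"][1] / TABLE_2["VGG16"][1]
print(f"VGG19 vs VGG16: {pixel_ratio:.2f}x more pixels, {time_ratio:.2f}x longer execution")
```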

Based on the results, QLS1046-Space obtains a score of 103. Neural networks are known to require large amounts of memory, hence the performance obtained here is limited by the 4 GB DDR4 size of the tested version. A much higher ranking is expected with an 8 GB version.
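The memory benefit of the FP16 type can be illustrated with a back-of-the-envelope estimate of the storage needed for a network's weights. The sketch below is an assumption-laden example: the parameter count of about 138 million for VGG16 is a commonly cited approximate figure, not a value from this white paper, and the estimate covers weights only, excluding activations and framework overhead.

```python
BYTES_PER_VALUE = {"FP32": 4, "FP16": 2}


def weights_size_mib(num_parameters: int, dtype: str) -> float:
    """Memory needed to store the weights alone, in MiB."""
    if num_parameters < 0:
        raise ValueError("parameter count must be non-negative")
    return num_parameters * BYTES_PER_VALUE[dtype] / 2**20


# Assumption: approximate, commonly cited parameter count for VGG16.
VGG16_PARAMETERS = 138_000_000

for dtype in ("FP32", "FP16"):
    print(f"VGG16 weights in {dtype}: {weights_size_mib(VGG16_PARAMETERS, dtype):.0f} MiB")
```

Halving the size of each stored value halves the weight footprint, which leaves more of the 4 GB of DDR4 for activations and for the operating system.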

ConvNet

In this benchmark, ConvNet [6] on PyTorch is tested on QLS1046-Space. PyTorch tends to be used more and more often over TensorFlow. PyTorch was originally more complex to use but more flexible. From PyTorch version 1.8, an important reduction in complexity is expected to benefit QLS1046-Space. It should also be noted that PyTorch can now handle tools such as SLURM [14] through pytorch-lightning [15]. ConvNet benchmark results on PyTorch are given in Table 3:

| Network | Execution time on QLS1046-Space @ 1.6 GHz [ms] | Execution time on Intel® Core™ i7-9750H @ 2.6 GHz [ms] |
|---|---|---|
| Alexnet | 187 | 1.72 |
| VGG11 | 764 | 4.28 |
| ResNet50 | 578 | 7.29 |
| Squeezenet1_0 | 328 | 2.28 |
| Densenet121 | 1283 | 17.93 |
| Mobilenet_v2 | 2337 | 6.38 |
| Shufflenet | 1278 | 8.49 |
| Unet | 1263 | 4.98 |

Table 3: ConvNet results on QLS1046-Space.

The benchmark shows that the i7 performs much faster than QLS1046-Space, which is again limited by the size of the available memory. Despite the gap in performance, the performance level offered by QLS1046-Space is still considered acceptable for implementing on-board AI processing.
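The size of the gap can be quantified per network by dividing the two execution-time columns of Table 3. The sketch below performs that arithmetic on the table values; the variable names are illustrative and no interpretation beyond the ratio itself is implied.

```python
# (QLS1046-Space time [ms], i7-9750H time [ms]), copied from Table 3.
TABLE_3 = {
    "Alexnet": (187, 1.72),
    "VGG11": (764, 4.28),
    "ResNet50": (578, 7.29),
    "Squeezenet1_0": (328, 2.28),
    "Densenet121": (1283, 17.93),
    "Mobilenet_v2": (2337, 6.38),
    "Shufflenet": (1278, 8.49),
    "Unet": (1263, 4.98),
}

# How many times longer one inference takes on QLS1046-Space than on the i7.
ratios = {name: qls_ms / i7_ms for name, (qls_ms, i7_ms) in TABLE_3.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<14} {ratio:6.1f}x")
```

Sorting by ratio shows which networks are penalized least and most on the embedded target, which can guide the choice of architecture for on-board use.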

Deep learning training performance

Training performance using QLS1046-Space was quickly tested on ConvNet with TensorFlow. It was not extensively tested since most up-to-date backpropagation [16] benchmarks require at least 8 GB of RAM. Table 4 shows the comparison of the training time for one sample on ResNet50 between QLS1046-Space and the Intel® i7.

| Network | Training time for one sample on QLS1046-Space [ms] | Training time for one sample on Intel® Core™ i7-9750H [ms] |
|---|---|---|
| ResNet50 | 3782 | 20 |

Table 4: Comparison of training performance.

This result clearly shows the penalty of the lack of RAM on the current version of QLS1046-Space for training traditional image classifiers. It should be noted that a complete training usually requires hundreds of samples. This result should be put in perspective, since image classifiers are known to be highly demanding in computing resources. Since a complete training would be time-consuming on QLS1046-Space, an alternative that can be considered is to perform fine-tuning [17] on-board.
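To see why a complete training is time-consuming, the per-sample times of Table 4 can be extrapolated to a full run. The sketch below is a rough estimate under stated assumptions: the dataset size of 500 samples and the 10 epochs are hypothetical, and the per-sample time is assumed to stay constant, ignoring batching and data-loading effects.

```python
QLS_MS_PER_SAMPLE = 3782   # Table 4, ResNet50 on QLS1046-Space
I7_MS_PER_SAMPLE = 20      # Table 4, ResNet50 on the i7-9750H


def training_time_hours(ms_per_sample: float, num_samples: int, epochs: int) -> float:
    """Total training time in hours, assuming a constant time per sample."""
    if num_samples < 0 or epochs < 0:
        raise ValueError("sample and epoch counts must be non-negative")
    return ms_per_sample * num_samples * epochs / 3_600_000


# Hypothetical workload: 500 training samples, 10 epochs.
NUM_SAMPLES, EPOCHS = 500, 10
qls_hours = training_time_hours(QLS_MS_PER_SAMPLE, NUM_SAMPLES, EPOCHS)
i7_hours = training_time_hours(I7_MS_PER_SAMPLE, NUM_SAMPLES, EPOCHS)
slowdown = QLS_MS_PER_SAMPLE / I7_MS_PER_SAMPLE

print(f"QLS1046-Space: {qls_hours:.2f} h, i7: {i7_hours * 60:.1f} min, ratio: {slowdown:.0f}x")
```

Fine-tuning reduces this cost because only part of the network is updated and fewer samples are typically needed, which is what makes it the more realistic on-board option.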

Training small convolutional neural networks for simple detection use cases seems feasible with QLS1046-Space, as does deep learning for processing time series or 1-D signals. In terms of training capabilities on images, QLS1046-Space would be more efficient with classical machine learning, but those models are more complex to build.

Discussion

QLS1046-Space offers a decent amount of computing capability, allowing deep learning AI for image processing to run in Space. The device is not as powerful as the tailor-made solutions that are available for AI inference in ground applications, but it is the most powerful Space-qualified CPU available on the market. In terms of pure computing capabilities, it offers performance of the same order of magnitude as an Intel® Core™ i7-9750H. From the AI performance point of view, the main drawback of the current version is the 4 GB memory, which requires optimized memory management to run AI for image processing. On future versions with 8 GB of DDR4 memory or more, AI performance would be significantly increased, and the burden of optimized memory management would be reduced.

The performance obtained in the previous benchmarks was evaluated with classical deep neural networks, without taking advantage of the specific QLS1046-Space architecture. Other AI topologies are better optimized for embedded targets and would make AI running on QLS1046-Space more efficient. Apart from AI computing performance, the study shows that QLS1046-Space exhibits good power efficiency, making it well suited for Space systems where electrical power is limited and power dissipation is an issue. From the electronic architecture point of view, it might be relevant to add an FPGA as a companion chip for QLS1046-Space, in which case the FPGA could efficiently take care of the pre-processing, and QLS1046-Space would then perform the heavy work.

In this study, the primary focus was on deep learning AI for image processing, which is considered one of the most demanding applications in terms of computing resources. For instance, processing of 1-D time series is much less demanding than image processing. Hence, the outcome of the study is that QLS1046-Space would also
