Computer Science Professional English Paper
Professional English Course Paper

Student ID: **********
Name: **********
Paper title: The Relationship and Distinction Between Big Data and Data Mining
Instructor: ************
Major: Computer Technology
School: School of Computer Science and Engineering
Graduate School, Guilin University of Electronic Technology
Date: year **, month *, day *

The Relationship and Distinction Between Big Data and Data Mining

Student ID: * Name: * Adviser: *
Guilin University of Electronic Technology
**, *

Abstract: In this paper, data mining is discussed in the context of big data. First, we elaborate on how big data has come to play a primary role in attracting the attention of the academic community, industry and governments. Second, the adverse side of big data is discussed, such as the large amount of garbage, heavy contamination and the difficulty of utilization. Finally, we dissect the value in big data, expound the techniques for discovering knowledge from big data, and investigate the transformation from knowledge into data intelligence.

Keywords: big data; data mining; data intelligence

1 Introduction

As data volumes continue to increase exponentially, the data tsunami can easily overwhelm traditional analytics tools and platforms designed to ingest, analyze and report.

Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today were produced within the past two years [1]. The challenge we face is not only how to store and manage diverse data but also how to analyze it effectively to gain insight and make smarter decisions.

Currently, a number of works have been presented. These studies introduce big data, mining and analysis from different aspects, such as the status quo, ideas or implementations.

For example, one work introduces the “Lambda Architecture”, which provides a general-purpose approach to implementing arbitrary functions on massive data sets in real time; in another, a scalable deep analytics platform has been implemented. Because of the complexity, there is no single tool or one-size-fits-all solution for deeply mining and analyzing big data. Moreover, extracting valuable knowledge from massive data sets requires further studies and experiments, as well as scalable and smart services, programming tools and applications.

The remainder of this paper is structured as follows. Section 2 elaborates on the fact that big data plays a primary role in every field. The adverse side of big data is then discussed in Section 3. After analyzing the value of big data in Section 4, we introduce the related knowledge and development of data mining in Section 5. In Section 6, the effectiveness of data mining is introduced. Finally, the conclusion follows.

2 About big data

Big data is a complex data set that has the following main characteristics: Volume, Variety, Velocity and Veracity [2][3].

These characteristics make it difficult to manage and manipulate with existing tools. Of all these data, big data specifically accounts for the vast majority. Big data is the data foundation and the source of wisdom through which people understand the real world via the information world. Big data is closely related to applications [4][5], and big data mining is its principal application.

2.1 From understanding the real world to creating the information world

Human civilization is a process that moves from understanding the real world to creating the information world. It has gone through the following stages: preliminary sensing of the world, aiding memory with information, recording and inheriting through information, exchanging and communicating through information, and understanding the world anew through information. Initially, humans used stones and shells to count according to the one-to-one principle, and tied knots to aid memory. Later, humans used simple graphics and drawn notes, and passed on more accurate memories prompted by their own emotions. When the graphics became relatively fixed common symbols and were associated with the words of the language, text was produced. Text abstracts and generalizes the world, promotes cultural understanding, and lays the necessary foundation for the development of science. To break through the restriction that written symbols depended on manual copying or engraving, humans after the Industrial Revolution used machines for mechanized volume production, which improved the efficiency of cultural transmission.

The computer centers on high-speed computing and separates software from hardware, contributing to the dissemination of information “electronically” and “automatically”. The Internet centers on the network and interconnects computers, breaking local information restrictions. Mobile communication centers on users, making the machine follow the user's movements and freeing humans from the machine. The Internet of Things centers on applications and automatically identifies objects to enable information sharing between humans and things. Cloud computing centers on services by consolidating expertise and optimizing the allocation of resources. Big data centers on data and mines knowledge from the entire data set, overcoming the randomness of sampling [6][7], and is deployed in big data centers and on mobile terminals. These information technologies serve the understanding and transformation of the real world.

2.2 Big data is attracting much attention

As humans explore the real world through scientific research, they unravel the mysteries of the information world through big data and data mining, which are attracting much attention from academia. In May 2011, McKinsey published “Big data: the next frontier for innovation, competition, and productivity”, which analyzed the application potential of big data in different industries from the economic and commercial dimensions and spelled out development policy for government and industry decision makers dealing with big data.

In January 2012, the “Wall Street Journal” argued that big data, smart production and wireless networks will lead to new economic prosperity [8]. In March 2012, the United States government released the “Big Data Research and Development Initiative”, which raised the development and application of big data from a business practice to a national strategic deployment, in order to improve the ability to extract knowledge from large and complex data and to help solve some of the nation's most pressing challenges. In April 2012, in a paper titled “Finding correlations in big data”, “Nature Biotechnology” invited eight biologists to evaluate the article “Detecting Novel Associations in Large Data Sets”, published in “Science” in December 2011. In July 2012, Gartner released its first data survey report, “Hype Cycle for Big Data, 2012”, which examined big data in depth [9].

In China [10], big data attracts as much attention as it does around the world. Baidu has used Hadoop for off-line processing since 2007. Currently, Baidu has over 10,000 Hadoop servers, more than Yahoo and Facebook, and it plans to reach 20,000 in 2013. Among these servers, 80% of the Hadoop clusters process a total of 6TB of data every day for log analysis. Tencent, Taobao and Alipay are also using Hadoop to build data warehouses and handle big data. In April 2010, Taobao launched a data mining platform, “data cube”, based on a hundred-billion-record-level database named OceanBase, which supports 4 to 5 million update operations, covering over 2 billion records and more than 2.5TB of data, in one day. In May 2010, China Mobile established a massive distributed system and a structured mass data management system on the cloud. Huawei analyzes data from mobile terminals and stores massive data in the cloud to obtain valuable information. Alibaba analyzes business transaction data with big data technology to perform credit approval.

3 Big data disaster

Big data is closely related to daily human life and has permeated all walks of life. Its quantity, size and complexity are all increasing sharply.

A large amount of data has been stored in databases and warehouses in the form of text, graphics, images and multimedia [11]. Research from the International Data Corporation (IDC) has shown that, as of 2003, humans had created a total of 5EB of data, while in 2011 the amount of data copied and produced exceeded 1.8ZB. It is expected that by 2020 global data usage will reach 35.2ZB, which would need 37.6 billion hard drives of 1TB capacity to store. On the one hand, these data broaden the scope of big data available for humans to gain wisdom. On the other hand, the value of a single unit of data is rapidly declining. Humans are submerged in an ocean of data yet thirsty for knowledge.

3.1 Garbage

Big data is voluminous and grows quickly, but it has very low value density, which means there is a lot of junk data [12]. An electron-positron collider experiment can take 40 million pictures per second, but only a few thousand are useful. The Romanian Internet security company BitDefender pointed out that spam and phishing in social network games has increased by more than 50%. Compared with other online communication environments, social network users are more likely to unknowingly accept and load garbage information. Big data and applications are closely related, and professional labeling of the data is the basis of rational analysis and sound judgment.

Both scientific experimental data and observation data need to be labeled by experts in the field. According to IDC statistics, in 2012 only 23% of all information was useful, only 3% of the potentially useful information had been labeled, and the proportion of data that had been analyzed was much smaller. With the development of modern measuring techniques and digital recording methods, traditional, manual, experience-based elimination and analysis methods have become powerless in the face of such a huge volume of information.

3.2 Contamination

Data collected from the real world is contaminated. As early as 1992, the Massachusetts Institute of Technology found that data contamination problems are not isolated: in the 50 units and agencies sampled for the survey, data accuracy was mostly below 95%. Regardless of how spatial data is accessed, there are some inevitable problems or errors [13][14], such as incomplete content, precision errors, data redundancy, contradictory formats, differing types, structural uncertainty, differing scales, differing standards, outdated data, erroneous exceptions, dynamic change and local sparsity. Moreover, each issue has a number of causes. For example, noise can be periodic noise, stripe noise, isolated noise or random noise. Further, these data are often affected by gross errors, systematic errors and random errors, individually or in combination.
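To make the idea of finding gross errors concrete, the sketch below flags measurements that sit far from the bulk of a sample. It is only an illustration and not a method taken from this paper: it assumes the modified z-score based on the median absolute deviation (MAD), with the commonly used constant 0.6745 and cut-off 3.5, and the function name `flag_gross_errors` and the sample readings are hypothetical.

```python
from statistics import median


def flag_gross_errors(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Assumption: gross errors show up as isolated values far from the median.
    Systematic and random errors are NOT detected by this check.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)  # median absolute deviation, robust to outliers
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical sensor readings; the sixth value is an obvious blunder.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0, 10.1]
print(flag_gross_errors(readings))  # → [5]
```

A median-based statistic is used here because a mean and standard deviation would themselves be distorted by the very errors being searched for.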

The expected data accuracy is bound to suffer if these three kinds of error cannot be correctly found and eliminated in the adjustment.

3.3 Difficult to use

Data is not only contaminated but also difficult to use. The production, transmission, replication and accumulation of data have gone far beyond people's capacity for analysis, understanding and implementation. Because of its sheer volume, “big data” is difficult to collect, store, search, share, analyze and materialize. Commercial image processing software (ERDAS IMAGINE, PCI, ENVI, etc.) has difficulty with tasks such as mixed pixels, automatic image matching, automatic target extraction and other automatic processing, because of the lack of new theories and methods. A newspaper published the same article by the same author on two different pages, “legal community” and “youth topics”. Another newspaper published three articles on the same day, in its “home appliances”, “lifestyle” and “science and technology” sections, all comparing VCD, CVD and DVD, and reached three different conclusions, yet the editor did not even notice. Over time, all walks of life are submerged in contaminated data garbage, which could lead big data into “garbage in, garbage out”, and “big data” becomes useless “big garbage”.

Now useful data is buried, and the value implied in big data goes unrealized. In such a predicament, the following problems are the bottlenecks that big data research must break through: how to understand the spatial data, how to extract information from the data, how to turn data into usable knowledge, and finally how to realize the value of the data.

4 The value of data

Big data is collected from numerous interconnected sources. Real usefulness is its maximum value. The generally accepted rule of big data is “decision on data”. The first prerequisite is to keep data always useful and activated. The ultimate value of big data is to gain human intelligence.

4.1 Overall cognition of the original appearance

Big data provides an unprecedented opportunity to observe the real world in full view rather than through partial samples. Without big data, probability statistics can only be produced from random samples of the real world, because spatial data is constrained by collection, storage, computing and transmission. Like the proverbial blind men who each grasp one part of an elephant and take it for the whole, such sampling offers only a limited view. Incomplete data sampling and the dispersion of sample data make it difficult to understand overall trends or to notice abnormal changes.

4.2 Basic resources

McKinsey believes that data is a basic resource comparable to physical assets and human capital, one that can create significant value for the world economy, improve the productivity and competitiveness of enterprises and the public sector, and create a large economic surplus for consumers. In 2011, the World Economic Forum called big data a new form of wealth. In 2012, at the Davos Forum, “Big Data, Big Impact” treated data as an economic asset like currency or gold. In 2012, Gartner stated that “Big data is big money”.

The U.S. government considers big data a “new oil” related to the country's economic restructuring and industrial upgrading [3].

5 Data mining

Data mining refers to the basic technologies used to realize the value of big data, relocate data assets, and use them effectively. Spatial data mining can be used to extract information from data, mine knowledge from information, and extract data intelligence from knowledge, improving the ability of self-learning and self-feedback adaptation and finally realizing human-machine intelligence.

5.1 Basic big data technology

The basic techniques of big data include data collection, storage, processing, expression, and quality evaluation. Big data can be generated by mobile devices, tracking systems, radio frequency identification devices (RFIDs), sensor networks, social networking, Internet search, automatic recording systems, video archives and e-commerce, as well as in the process of analyzing those data.

Big data storage technology is the basis for data mining. It is designed to meet the growing need for data storage, and aims to provide scalable, highly reliable, high-performance data storage, access and management solutions, such as distributed data storage, multi-level caching, load balancing and fault-tolerance mechanisms. Conventional methods are not adequate for these tasks. A large data platform needs to be established through software, providing places to store data and interfaces to access it.
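The storage ideas named above (distributed storage, load balancing and fault tolerance) can be illustrated with a toy, in-memory sketch. It is not the design of any system mentioned in this paper: the class `ShardedStore`, its node names and the replication factor are hypothetical, and a real platform would add persistence, networking, re-replication and consistency handling.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over nodes and replicates them."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("replicas cannot exceed the number of nodes")
        self.order = sorted(node_names)            # fixed node ordering
        self.nodes = {name: {} for name in self.order}
        self.replicas = replicas
        self.down = set()                          # names of failed nodes

    def _owners(self, key):
        # Hashing the key spreads load roughly evenly across the nodes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Write a copy to every owner that is currently reachable.
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Fall back to the next replica when an owner is down.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


store = ShardedStore(["node-a", "node-b", "node-c"], replicas=2)
store.put("log-2012-07-01", "6TB")
store.down.add(store._owners("log-2012-07-01")[0])  # simulate a node failure
print(store.get("log-2012-07-01"))                  # still served by the replica
```

Keeping two copies on neighbouring nodes is the simplest choice that survives a single failure; the trade-off is doubled storage cost for every key.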

Big data processing implements the transitions from data to information, from information to knowledge, and from knowledge to wisdom.

Big data expression technology is designed to represent data in a clear and effective way that reveals meaningful information to the user or provides the user with a new perspective. It includes digital elevation models, digital terrain models, flat maps, three-dimensional maps, and digital city maps. Big data quality assessment technology aims to avoid the risks of big data collection and high-density measurement. It includes logic-based, exception-value-based, and accounting-based assessment methods.

5.2 Knowledge discovery

Knowledge discovery is the technology that uses data mining methods to extract previously unknown, potentially useful, and ultimately comprehensible rules. It is also a process of gradual sublimation from data to information and then to knowledge. Data mining systems aim to summarize data gradually into knowledge. Through the integration of data, knowledge can be extracted in depth. With such new knowledge, data can be processed in real time so that it can be understood and applied to make intelligent judgments and well-informed decisions. Knowledge can be self-learning, self-enhancing, universal, and easily recognized. It can serve as a basis for decision support. If businesses take full advantage of knowledge, people will be able to learn, work and live more precisely and dynamically and reach a state of wisdom. It will help to improve resource utilization and productivity. Moreover, it will also help to respond to the economic crisis, the energy crisis, the deterioration of the environment and many other global issues.

5.3 Extracting data intelligence

Data intelligence is the ability to obtain more innovative, systematic and comprehensive knowledge for solving a particular problem through in-depth analysis of the collected data. It is the ability to understand and solve problems quickly, flexibly and correctly. Spatial data intelligence has three features: more thorough perception, more extensive interoperability, and deeper intelligence.

The three features aim to obtain bigger and more comprehensive data, to share and co-operate on data via the Internet, to perform data analysis and data mining with a variety of advanced techniques, and to constitute a hierarchy of spatial data intelligences (Fig. 1).

Figure 1. The hierarchy of spatial data intelligences

Big data intelligence does not mean simply overlaying different data mining techniques; it means a reasonably structured, industry-oriented organization that runs well and forms a powerful wisdom system. The more reasonable the industry structure becomes, the smaller the internal friction, the greater the effectiveness, and the higher the level of the wisdom system. Every time a person interacts with the data, he or she becomes more efficient and more productive, which means a better way to analyze, summarize and calculate takes shape. Through the consolidation and analysis of trans-regional, trans-sector data, with knowledge applied to specific industries, specific scenes and specific solutions, big data intelligence can better support decision-making and action.

More in-depth data intelligence creates new value from data. On the one hand, making full use of spatial data knowledge in all walks of life can produce secondary knowledge. To form a mechanism for mining knowledge from knowledge, primary knowledge needs to be brought together into an intelligent form of expression. Ultimately, the target knowledge can be achieved. On the other hand, based on a general industrial or socio-ecological system, it can redefine the interactive mode of government, companies and individuals, thereby improving the clarity, efficiency, flexibility and response speed of interaction. It changes from traditional single-dimensional relationships, such as production and consumption, managing and being managed, or planning and execution, to a new multi-dimensional collaborative relationship. In this new relationship, both individuals and organizations can freely contribute and obtain information and expertise accurately and in a timely manner. This new relationship exerts a mutually positive influence, producing the macro-effects of smart operation.

6 Effectiveness

When we possess the necessary knowledge and the ability to control it, data becomes a valuable asset that leads to market dominance and huge economic returns. Big data technology providers supply technology for users to process structured, semi-structured and unstructured data. Big data applications are increasingly ubiquitous on the Internet, rich in interfaces, and fragmented. The application industry is vertically integrated; therefore, businesses that are closer to end users tend to have a larger influence in the industry chain. Morgan Stanley's report insists that “Big Data is soon to become Any Data” [15]. In order to win the future, the rational choice is “giving customers the technologies they need to store and analyze ‘any’ data set - any type of data, any size of data, for any type of user, and in any timeframe.”

7 Conclusion

The development of big data extends the scope of human activities. It demands proper attention from academia, industry and government. The world has been cooperating and integrating on a global scale. Humans are forced to change their mode from the local to the global in their everyday life and work. It redefines the relationship among individuals, businesses, organizations, gov
