Professional English Course Paper

Student ID: **********
Name: **********
Paper title: The Relationship and Distinction Between Big Data and Data Mining
Instructor: ************
Major: Computer Technology
School: School of Computer Science and Engineering
Graduate School, Guilin University of Electronic Technology
Date: year **, month *, day *

The Relationship and Distinction Between Big Data and Data Mining

Student ID: * Name: * Adviser: *
Guilin University of Electronic Technology
**, *

Abstract: In this paper, data mining is discussed in the context of big data. Firstly, we elaborate the fact that big data plays a primary role in attracting the academic community, business and governments. Secondly, the adverse side of big data is discussed, such as the large amount of garbage, heavy pollution and the difficulty of utilization. Finally, we dissect the value in big data, expound the techniques for discovering knowledge from big data, and investigate the transformation from knowledge into data intelligence.

Keywords: big data; data mining; data intelligence

Introduction

As data volumes continue to increase exponentially, the data tsunami can easily overwhelm traditional analytics tools or platforms designed to ingest, analyze and report.
Every day, 2.5 quintillion bytes of data are created, and 90 percent of the data in the world today were produced within the past two years [1]. The challenge we are facing is not only how to store and manage diverse data but also how to analyze the data effectively to gain insight and make smarter decisions.
Currently, a number of works have been presented.
These studies introduce big data, mining and analysis from different aspects, such as the status quo, ideas or implementations.
For example, one work introduces the “Lambda Architecture”, which provides a general-purpose approach to implementing arbitrary functions on massive data sets in real time; in another, a scalable deep analytics platform has been implemented. Because of the complexity, there is no single tool or one-size-fits-all solution for deeply mining and analyzing big data. Moreover, extracting valuable knowledge from massive data sets requires further studies and experiments, as well as scalable and smart services, programming tools and applications. The remainder of this paper is structured as follows.
Section 2 elaborates the fact that big data plays a primary role in every field. Then the adverse side of big data is discussed in Section 3. After analyzing the value of big data in Section 4, we introduce the related knowledge and development of data mining in Section 5. In Section 6, the effectiveness of data mining is introduced. Finally, the conclusion follows.

About big data

Big data is a complex data set that has the following main characteristics: Volume, Variety, Velocity and Veracity [2][3].
These characteristics make it difficult to manage and manipulate with existing tools. Among these data, big data specifically accounts for the vast majority.
Big data is the basis of data and the source of wisdom for people to understand the real world through the information world.
Big data is closely related to applications [4][5], and big data mining is its principal application.

2.1 From understanding the real world to creating the information world

Human civilization is a process from understanding the real world to creating the information world, which has gone through the following stages: preliminary sensing of the world, aiding memory by information, recording and inheriting by information, exchange and communication by information, and understanding the world once again by information. Initially, humans used stones and shells to count according to the principle of one-to-one correspondence, and they tied knots to aid memory. Later, humans used simple graphics and drawn notes, and passed on more accurate memories prompted by their own emotions. When the graphics became relatively fixed common symbols associated with the words of a language, texts were produced. Texts abstract and generalize the world, promote cultural understanding, and lay the necessary foundation for the development of science. To break through the restriction that written symbols depended on manual copying or engraving, humans after the industrial revolution used machines for mechanized volume production, which improved the efficiency of cultural transmission. The computer centers on high-speed computing and separates software from hardware, contributing to the dissemination of information “electronically” and “automatically”. The Internet centers on the network and interconnects computers, breaking local information restrictions. Mobile communication centers on users, making the machine follow the user's movements and unbinding humans from the machine. The Internet of Things centers on applications and automatically identifies objects to enable information sharing between humans and things. Cloud computing centers on services by consolidating expertise and optimizing the allocation of resources.
Big data centers on data and mines knowledge from the entire data set, breaking the randomness of sampling [6][7], and is demonstrated on big data centers and mobile terminals.
These information technologies serve the understanding and transformation of the real world.

2.2 Big data is attracting much attention

As humans explore the real world through scientific research, they unravel the mysteries of the information world through big data and data mining, which are attracting much attention from academia. In May 2011, McKinsey published “Big data: the next frontier for innovation, competition, and productivity”, which analyzed the application potential of big data in different industries from the economic and commercial dimensions and spelled out the development policy for government and industry decision makers dealing with big data.
In January 2012, the “Wall Street Journal” argued that big data, smart production and wireless networks will lead to new economic prosperity [8].
In March 2012, the United States government released the “Big Data Research and Development Initiative”, which raised the development and application of big data from a business practice to a national strategic deployment, in order to improve the ability to extract knowledge from large and complex data and to help solve some of the nation's most pressing challenges.
In April 2012, in a paper titled “Finding correlations in big data”, “Nature Biotechnology” invited eight biologists to evaluate an article titled “Detecting Novel Associations in Large Data Sets”, which had been published in “Science” in December 2011.
In July 2012, Gartner released the first data survey report, “Hype Cycle for Big Data, 2012”, which examined big data in depth [9].

In China [10], big data attracts as much attention as it does around the world. Baidu has used Hadoop for off-line processing since 2007. Currently, Baidu has over 10,000 Hadoop servers, which is more than Yahoo and Facebook, and it plans to reach 20,000 in 2013. Of these servers, 80% of the Hadoop clusters process a total of 6 TB of data every day for log analysis. Tencent, Taobao and Alipay are also using Hadoop to establish data warehouses and handle big data. In April 2010, Taobao launched a data mining platform, “data cube”, based on a one-hundred-billion-level database named OceanBase, which supports 4 to 5 million update operations, covering over 2 billion records and more than 2.5 TB of data, in one day. In May 2010, China Mobile established a massive distributed system and a structured mass data management system on the cloud. Huawei analyzes data based on mobile terminals and stores massive data in the cloud to obtain valuable information. Alibaba analyzes business transaction data through big data technology for credit approval.

Big data disaster

Big data is closely related to human daily life and has permeated all walks of life. Its number, size and complexity are all increasing sharply.
A large amount of data has been stored in databases and warehouses in the form of text, graphics, images and multimedia [11].
Research from International Data Corporation has shown that, as of 2003, humans had created a total of 5 EB of data, while in 2011 the amount of data copied and produced exceeded 1.8 ZB. It is expected that by 2020 global data usage will reach 35.2 ZB, which would need 37.6 billion hard drives of 1 TB capacity to store. On the one hand, these data broaden the scope of big data available for humans to gain wisdom. On the other hand, the value of a single unit of data is rapidly declining. Humans are submerged by the data ocean but thirsty for knowledge.

3.1 Garbage

Big data is voluminous and grows quickly, but it has very low value density, which means there is a lot of junk data [12]. The study on the electron-positron collider has been able to shoot 40 million pictures per second, but only a few thousand are useful. The Romanian Internet security company BitDefender pointed out that spam and phishing information in social network games has increased by more than 50%. Compared to other online communication environments, social network users are more likely to accept and load garbage information unknowingly. Big data and applications are closely related, and professional labeling of the data is the basis for rational analysis and sound judgment.
Both scientific experimental data and observation data need to be labeled by experts in the field.
According to IDC statistics, in 2012 only 23% of all information was useful, only 3% of the potentially useful information had been labeled, and the proportion of data that had been analyzed was much lower. With the development of modern measuring techniques and digital recording methods, traditional, manual, experience-based elimination and analysis methods have become powerless in the face of huge volumes of information.

3.2 Contamination

Data collected from the real world is contaminated. Moreover, as early as 1992, the Massachusetts Institute of Technology found that data contamination problems are not isolated. Of the 50 units and agencies sampled for the survey, most had data accuracy of less than 95%.
Regardless of how spatial data is accessed, there are some inevitable problems or errors [13][14], such as incomplete content, precision errors, data redundancy, contradictory formats, different types, structural uncertainties, different scales, different standards, outdated data, erroneous exceptions, dynamic changes and local sparsity. Moreover, each issue has a number of causes. For example, the noise can be periodic noise, stripe noise, isolated noise or random noise. Further, these data are often affected by gross errors, systematic errors and random errors, individually or in combination.
The expected data accuracy is bound to suffer if these three kinds of error cannot be correctly found and eliminated in the adjustment.

3.3 Difficult to use

Data is not only contaminated, but also difficult to use. The production, transmission, replication and accumulation of data have gone far beyond people's capacity for analyzing, understanding and implementing. Due to the large amount of “big data”, it is difficult to collect, store, search, share, analyze and materialize.
Commercial image processing software (ERDAS IMAGINE, PCI, ENVI, etc.) struggles to complete tasks such as handling mixed pixels, automatic image matching, automatic target extraction, and other automatic processing, because of the lack of new theories and methods. A newspaper published the same article by the same author on two different pages, “legal community” and “youth topics”. Another newspaper published three articles in the editions “home appliances”, “lifestyle” and “science and technology”, all comparing VCD, CVD and DVD on the same day, and reached three different conclusions, but the editor did not even realize it. Over time, all walks of life are submerged by contaminated data garbage, which could lead big data into “garbage in, garbage out”, so that “big data” becomes useless “big garbage”. Now, useful data is buried, and implied value is blanked out in big data.
In such a predicament, the following problems are the bottlenecks that big data research must break through: how to understand the spatial data, how to extract information from the data, how to turn data into usable knowledge, and finally how to realize the value of data.

The value of data

Big data is collected from numerous and interconnected sources. Real usefulness is its maximum value. The generally accepted rule of big data is “decision on data”. The first prerequisite is to keep data always useful and activated. The ultimate value of big data is to gain human intelligence.

4.1 Overall cognition of the original appearance

Big data provides an unprecedented opportunity to observe the real world in full view rather than through partial samples. Without big data, probability statistics can only be produced based on random sampling from the real world, because spatial data is constrained by collection, storage, computing and transmission. Like the proverbial blind men grasping an elephant, who can only take a part for the whole, there is only a limited view. Incomplete data sampling and sample data dispersion make it difficult to understand the overall trends or to notice abnormal changes.

4.2 Basic resources

McKinsey believes that data is a basic resource that can be compared with physical assets and human capital, creating significant value for the world economy, improving the productivity and competitiveness of enterprises and the public sector, and creating a large economic surplus for consumers. In 2011, the World Economic Forum called big data a new form of wealth. In 2012, the Davos Forum report “Big Data, Big Impact” treated data as an economic asset like currency or gold. In 2012, Gartner stated that “Big data is big money”.
The U.S. government considers big data the “new oil”, related to the country's economic restructuring and industrial upgrading [3].

Data mining

Data mining refers to the basic technologies used to realize the value of big data, relocate data assets, and use them effectively. Spatial data mining can be used to extract information from data, mine knowledge from information, extract data intelligence from knowledge, improve the ability of self-learning and self-feedback adaptation, and finally realize human-machine intelligence.

5.1 Basic big data technology

The basic techniques of big data include data collection, storage, processing, expression, and quality evaluation. Big data can be generated in mobile devices, tracking systems, radio frequency identification devices (RFIDs), sensor networks, social networking, Internet search, automatic recording systems, video archives and e-commerce, as well as in the process of analyzing those data. Big data storage technology is the basis for data mining. It is designed to meet the growing need for data storage and aims to provide scalable, highly reliable, high-performance data storage, access, and management solutions, such as distributed data storage, multi-level caching, load balancing, and fault-tolerant mechanisms. Conventional methods are not adequate for these tasks. A large data platform needs to be established through software, providing places to store data and interfaces to access it.
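The storage ideas named above (distributed storage plus a fault-tolerant mechanism) can be illustrated with a small sketch. It is not the design of any particular system mentioned in this paper: the class `ShardedStore`, its parameters `num_nodes` and `replicas`, and the key `sensor-17` are all hypothetical, and the "nodes" are plain in-memory dictionaries. The sketch assumes hash-based placement with the next node acting as a replica.

```python
import hashlib


class ShardedStore:
    """Toy key-value store spread over several 'nodes' (plain dicts)."""

    def __init__(self, num_nodes=4, replicas=2):
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        self.nodes = [dict() for _ in range(num_nodes)]
        self.alive = [True] * num_nodes
        self.replicas = replicas

    def _owners(self, key):
        # Hash the key to pick a primary node; the following nodes hold replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        primary = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every live owner; fail only if no owner is reachable.
        written = 0
        for n in self._owners(key):
            if self.alive[n]:
                self.nodes[n][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica for key")

    def get(self, key):
        # Read from the first live owner that holds the key.
        for n in self._owners(key):
            if self.alive[n] and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)

    def fail(self, node):
        # Simulate a node crash (assumption: failed nodes never come back).
        self.alive[node] = False


store = ShardedStore(num_nodes=4, replicas=2)
store.put("sensor-17", {"temp": 21.5})
owners = store._owners("sensor-17")
store.fail(owners[0])                 # the primary copy becomes unreachable
recovered = store.get("sensor-17")    # served from the surviving replica
print(recovered)
```

With two replicas, losing one owner of a key does not lose the value; losing both does. Real platforms add rebalancing, caching and consistency handling, which this sketch deliberately leaves out.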
Big data processing implements the transitions from data to information, from information to knowledge, and from knowledge to wisdom.
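One way to picture this chain of transitions is the minimal sketch below. It is only an illustration under stated assumptions: the station names, the readings, the threshold of 25.0 and the function names `to_information`, `to_knowledge` and `to_decision` are all hypothetical, and the final step stands in very loosely for what the text calls wisdom.

```python
from collections import defaultdict
from statistics import mean

# Data: raw observations (hypothetical sensor readings, made up for illustration).
raw_data = [
    ("station_a", 21.0), ("station_a", 22.0), ("station_a", 23.0),
    ("station_b", 30.0), ("station_b", 31.0), ("station_b", 35.0),
]


def to_information(records):
    """Data -> information: group the readings and summarize each group."""
    groups = defaultdict(list)
    for station, value in records:
        groups[station].append(value)
    return {s: {"count": len(v), "mean": mean(v)} for s, v in groups.items()}


def to_knowledge(information, threshold=25.0):
    """Information -> knowledge: derive a simple label per station.

    The threshold is an arbitrary assumption, not a value from the paper.
    """
    return {s: ("hot" if info["mean"] > threshold else "normal")
            for s, info in information.items()}


def to_decision(knowledge):
    """Knowledge -> decision: list the stations that need attention."""
    return sorted(s for s, label in knowledge.items() if label == "hot")


information = to_information(raw_data)
knowledge = to_knowledge(information)
decision = to_decision(knowledge)
print(information)
print(knowledge)
print(decision)
```

Each stage consumes only the output of the previous one, which mirrors the step-by-step refinement the sentence describes.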
Big data expression technology is designed to represent the data in a clear and effective way that reveals meaningful information to the user or provides the user with a new perspective. Big data expression technology includes digital elevation models, digital terrain models, flat maps, three-dimensional maps, and digital city maps. Big data quality assessment technology aims to avoid the risks of big data collection and high-density measurement. The technology includes the logical assessment method, the exception-value-based assessment method, and the accounting-based assessment method.

5.2 Discovering knowledge

Knowledge discovery is the technology that uses data mining methods to extract previously unknown, potentially useful, and ultimately comprehensible rules. It is also a process of gradual sublimation from data to information and then to knowledge, step by step. Data mining systems aim to summarize data gradually into knowledge. Through the integration of data, they can deeply extract knowledge. By using such new knowledge, data can be processed in real time in order to understand and apply the data and to make intelligent judgments and well-informed decisions. Knowledge can be self-learning, self-enhancing, universal, and easily recognized. It could serve as a basis for decision support. If businesses take full advantage of knowledge, it will be more precise and dynamic for humans to learn, work, live, and achieve a state of wisdom. It will help to improve resource utilization and productivity levels. Moreover, it will also help to respond to the economic crisis, the energy crisis, the deterioration of the environment and many other global issues.

5.3 Extracting data intelligence

Data intelligence is the ability to obtain more innovative, systematic and comprehensive knowledge to solve a particular problem through an in-depth analysis of the collected data. It is an ability to understand and solve problems quickly, flexibly and correctly. Spatial data intelligence has three features: more thorough perception, more extensive interoperability, and deeper intelligence.
The three features are aimed at obtaining bigger and more comprehensive data, sharing and cooperating on data via the Internet, doing data analysis and data mining with a variety of advanced techniques, and constituting a hierarchy of spatial data intelligences (Fig. 1).

Figure 1. The hierarchy of spatial data intelligences

Big data intelligence does not refer to a simple overlay of different data mining techniques, but to a reasonable structure of industry-oriented organization, good operation, and a powerful wisdom system. The more reasonable the industry structure becomes, the smaller the internal friction, the greater the effectiveness, and the higher the wisdom of the system. Every time a person interacts with the data, he or she becomes more efficient and more productive, which means that a better way to analyze, summarize, and calculate is formed. Through the consolidation and analysis of trans-regional, trans-sector data, with knowledge applied in specific industries, specific scenes and specific solutions, big data intelligence can better support decision-making and action.
More in-depth data intelligence creates new value from data. On the one hand, when full use is made of spatial data knowledge in all walks of life, it can produce secondary knowledge. In order to form a mechanism for mining knowledge from knowledge, primary knowledge needs to be brought together into an intelligent form of expression. Ultimately, the destination knowledge can be achieved. On the other hand, based on a general industrial or socio-ecological system, it can redefine the interactive mode of government, companies and individuals, improving the clarity, efficiency, flexibility and response speed of their interaction. It changes the traditional single dimension, such as production and consumption, managing and being managed, or planning and execution, into a new multi-dimensional collaborative relationship. In this new relationship, both individuals and organizations can freely contribute and obtain information and expertise accurately and in a timely manner. The participants in this new relationship exert a positive influence on each other, producing the macro-effect of smart operation.

Effectiveness

When we possess the necessary knowledge and the ability to control it, data becomes a valuable asset that leads to market domination and huge economic returns.
Big data technology providers supply technology for users to process structured, semi-structured and unstructured data. Big data applications are increasingly Internet-ubiquitous, rich in interfaces, and fragmented. The application industry is vertically integrated; therefore, a business that is closer to end users tends to have a larger influence in the industry chain. Morgan Stanley's report insists that “Big Data is soon to become Any Data” [15]. In order to win the future, the rational choice is “giving customers the technologies they need to store and analyze ‘any’ data set - any type of data, any size of data, for any type of user, and in any time frame.”

Conclusion

The development of big data extends the scope of human activities. It demands proper attention from academia, industry and government. The world has been cooperating and integrating on a global scale. Humans are forced to change their mode from the local to the global in their everyday life and work. It redefines the relationship among individuals, businesses, organizations, gov