




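Data Collection and Cleaning
2019-02-15

Contents
1. What is big data
2. Main characteristics of big data
3. Big data processing flow
4. The concept of big data collection
5. Big data collection applications

1. What is big data

Taobao recommendations:
- Recommends items based on shopping behavior and preferences
- Recommends items based on your recent browsing and purchasing behavior
- Keeps inferring your characteristics from the devices you use
- Recommends items based on seasonal changes

Policy timeline:
- 2014-03: Big data is written into the Government Work Report for the first time
- 2015-08: The State Council issues the Action Outline for Promoting the Development of Big Data
- 2016-03: The 13th Five-Year Plan outline proposes implementing a national big data strategy
- 2017-10: The 19th National Congress calls for advancing the big data strategy and integrating it deeply with the real economy
- 2018: The Government Work Report proposes carrying out a big data development initiative and using the internet and big data to improve regulatory effectiveness

Industry status and outlook: in 2019 the Ministry of Human Resources and Social Security planned to release 15 new occupations, including:
1. Big data engineering technician
2. Cloud computing engineering technician
3. Artificial intelligence engineering technician
4. Internet of Things engineering technician

Definition: big data refers to data sets that cannot be acquired, managed, and processed within a given time using traditional, commonly used software techniques and tools.

2. Main characteristics of big data

- Volume: the data is large in scale and keeps growing.
- Velocity: the speed at which data is created and moved.
- Variety: the data comes from many sources, in many types and formats.
- Veracity: the pursuit of high-quality data.
- Value (low value density): as the data volume grows, the meaningful information in it does not grow in proportion.

3. Big data processing flow

1. Data collection: use a range of databases (relational, NoSQL) to store data from different sources.
2. Data preprocessing: import the collected data from those databases into a large distributed store (currently mainly HDFS or Hive), doing some simple cleaning and preprocessing along the way.
3. Statistical analysis: classify and aggregate the data now held in the distributed store, which covers the analysis needs of common scenarios.
4. Data mining: run algorithm-based analysis and computation on the data in order to make predictions and meet more advanced analysis needs.
5. Data presentation: analyze the results of the steps above or turn them into reports.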
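The five stages above can be sketched as a chain of small Python functions. This is only an illustration on a few in-memory records: the record fields, the function names, and the "top item" stand-in for data mining are all assumptions made for the example, not part of the original slides, and a real pipeline would read from databases and write to a store such as HDFS or Hive.

```python
def collect():
    """Stage 1: gather raw records (hypothetical sample data)."""
    return [
        {"user": "u1", "item": "book", "price": "12.5"},
        {"user": "u2", "item": "Book ", "price": "8"},
        {"user": "u1", "item": "pen", "price": None},  # missing price
        {"user": "u3", "item": "pen", "price": "1.5"},
    ]


def preprocess(records):
    """Stage 2: simple cleaning. Drop incomplete rows, normalize fields."""
    cleaned = []
    for r in records:
        if r["price"] is None:
            continue
        cleaned.append({
            "user": r["user"],
            "item": r["item"].strip().lower(),
            "price": float(r["price"]),
        })
    return cleaned


def analyze(records):
    """Stage 3: classify and aggregate (total revenue per item)."""
    totals = {}
    for r in records:
        totals[r["item"]] = totals.get(r["item"], 0.0) + r["price"]
    return totals


def mine(totals):
    """Stage 4: naive stand-in for mining. Pick the highest-revenue item."""
    return max(totals, key=totals.get)


def present(totals, top_item):
    """Stage 5: turn the results into a plain-text report."""
    lines = [f"{item}: {total:.2f}" for item, total in sorted(totals.items())]
    lines.append(f"top item: {top_item}")
    return "\n".join(lines)


if __name__ == "__main__":
    totals = analyze(preprocess(collect()))
    print(present(totals, mine(totals)))
```

Keeping each stage as a separate function mirrors the flow in the slides: any stage can be swapped for a real implementation without touching the others.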
4. The concept of big data collection

1. What is data collection
Data collection means data acquisition. Data sources fall mainly into two groups: online data and content data.

2. Traditional data collection vs. big data collection
- Traditional data collection: a single source and a relatively small data volume; a single structure; relational and parallel databases.
- Big data collection: broad sources and a huge data volume; rich data types; distributed databases.

3. Big data collection techniques
Big data collection technology performs ETL on the data: by extracting, transforming, and loading it, the potential value of the data can ultimately be mined. ETL stands for Extract-Transform-Load.
- Extract: obtain data from the various data sources.
- Transform: convert the source data into target data in the required format.
- Load: load the target data into the data warehouse.
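The three ETL steps can be illustrated with a minimal sketch that uses only the Python standard library. It assumes a small CSV string as the data source and an in-memory SQLite database as a stand-in for the data warehouse; the `orders` table, its columns, and the sample rows are hypothetical and chosen only for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data: one row has a missing amount,
# and the currency codes are inconsistently cased.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,USD
"""


def extract(text):
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, convert types, normalize case."""
    result = []
    for row in rows:
        if not row["amount"]:
            continue
        result.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return result


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run all three steps and return (row count, total amount)."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection))
```

In a real deployment the load target would be a warehouse rather than SQLite, but the split into extract, transform, and load stays the same.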
Big data collection systems
1. Log collection systems (Apache Flume, Scribe)
2. Web data collection systems (Scrapy framework, Apache Nutch)
3. Database collection systems (relational, NoSQL, and other databases)

5. Big data collection applications

Skills to prepare:
- Python basics
- Basic Linux operations
- Database basics (working with SQL statements)

Environment to prepare:
- Python
- JDK (Java environment)
- A database (MySQL)
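As a small exercise that combines the Python prerequisite with the log collection item above, the sketch below parses web-server-style log lines and cleans them by dropping malformed and duplicate entries. It is not how Apache Flume or Scribe work internally; the log format, the sample lines, and the function names are assumptions made for illustration.

```python
import re

# Assumed log layout (similar to a common access-log format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample input: one duplicate and one corrupted line.
SAMPLE_LINES = [
    '10.0.0.1 - - [15/Feb/2019:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [15/Feb/2019:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    'corrupted line without the expected fields',
    '10.0.0.2 - - [15/Feb/2019:10:00:05 +0800] "POST /login HTTP/1.1" 302 -',
]


def parse_line(line):
    """Return a dict of typed fields, or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; store it as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def collect_and_clean(lines):
    """Parse each line, skipping malformed lines and exact repeats."""
    seen = set()
    records = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        key = (record["ip"], record["time"], record["method"], record["path"])
        if key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in collect_and_clean(SAMPLE_LINES):
        print(rec)
```

The cleaned records could then be written to a database with SQL statements, which ties in the database prerequisite listed above.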