PyFlink技術(shù)入門及實踐

上傳人：I*** IP屬地：上海上傳時間：2022-08-25 格式：PPTX 頁數(shù)：28 大?。?59.47KB 積分：25 舉報 版權(quán)申訴

已閱讀5頁，還剩23頁未讀，繼續(xù)免費閱讀

版權(quán)說明：本文檔由用戶提供并上傳，收益歸屬內(nèi)容提供方，若內(nèi)容存在侵權(quán)，請進行舉報或認領(lǐng)

文檔簡介

1、Apache Flink 中文學習網(wǎng)站： PyFlink(Python Table API)技術(shù)入門及實踐Apache Flink Community China01Why Python Table API?02What is Python Table API?03How to write/submit a Python Table API job?04Demo 演示及講解CONTENT目錄 Apache Flink 中文學習網(wǎng)站： 01Why Python Table APIFlinks API 總覽Process Function (events, state, time)DataStre

2、am API (streams, windows)SQL / Table API (dynamic tables)Stream- & Batch DataProcessingHigh-level Analytics APIStateful Event- Driven ApplicationsLayered abstractions to navigate simple to complex use cases-Conciseness+Expressiveness-Why Table API?Table API/SQL VS DataStreamTable API VS SQL聲明式高性能流批統(tǒng)

3、一標準穩(wěn)定易理解Table API 使得多聲明式數(shù)據(jù)處理寫起來比較容易Table API 使得擴展標準SQL更為容易（當且僅當需要的時候）Table API易用性功能性SQLtab.groupBy(“word”).select(“word, count(1) as count”)Why Python?簡單易用可移植性豐富的庫Apache Flink 中文學習網(wǎng)站： 02What is Python Table APIPython Table API架構(gòu)Python Table API發(fā)展歷程2018v1.8.x2019.8v1.9.x2020.2v1.10.x2020.5v1.11.x2020

4、.9v1.12.x?DataSet/StreamTable API 不支持 Python兩套各自獨立的PythonAPIJPython 無法支持 Python3.xStateless UDFsPython Scalar UDFUDF依賴的管理Vectorized UDFsPandas UDFML APISQL DDL支持Python UDFTable APIPython Table APIJava UDX 支持Python Table API架構(gòu)Java Table APIFlink(DataStream/DataSet)Blink(Query Processor)Local Single J

5、VMPython Table APICluster Standalone, YARNJava VMPython VMSocketCloudGCE，EC2Python Table API架構(gòu)Python UDF 架構(gòu)12345Apache Flink 中文學習網(wǎng)站： 03How to write/submit a Python Table API jobApache Flink 中文學習網(wǎng)站： 3.1 WordCount示例Table API算子介紹Python Table API with UDFJob提交Word Countt_env.scan(mySource).group_by(word

6、) .select(word, count(1) ).insert_into(mySink)wordcountFlink2Hadoop1Spark1wordFlinkHadoopSparkFlinkWord CountQuery aTableGet aTableEmit aTableWord Count1. 初始化環(huán)境2. 使用TableDescriptor 定義一個table source3.使用TableDescriptor定義一個table sink4. Query a TableHow to query a Table - Table API算子Tableselectasfilterw

7、heregroupBydistinctGroupedTableGroupWindowedTableOverWindowedTable- selectTable- groupByWindowGroupedTable- select- selectjoin(inner, left, right, full)joinLateral(inner, left)minusminusAllunionunionAllwindow(OverWindow)window(GroupWindow)dropColumnsmap/flatmap- 常用算子 - Projection & FilterOperatorsEx

8、amplesSelectresult = orders.select(a, c as d) result = orders.select(*)Aliasresult = orders.alias(x, y, z, t)Where/Filterresult = orders.where(b = red) ORresult = orders.filter(a % 2 = 0)常用算子 - AggregationsOperatorsExamplesGroupBy Aggregationresult = orders.group_by(a).select(a, b.count as d)GroupBy

9、 WindowAggregationresult = orders.window(Tumble.over(5.minutes).on(rowtime).alias(w).group_by(a, w).select(a, w.start, w.end, w.rowtime, b.sum as d)Over WindowAggregationresult = orders.over_window(Over.partition_by(a).order_by(rowtime).preceding(UNBOUNDED_RANGE).following(CURRENT_RANGE).alias(w).se

10、lect(a, b.avg over w, b.max over w, b.min over w)常用算子 - JoinOperatorsExamplesInner Joinleft = table_env.scan(Source1).select(a, b, c)right = table_env.scan(Source2).select(d, e, f)result = left.join(right, a = d).select(a, b, e)Outer Joinleft = table_env.scan(Source1).select(a, b, c) right = table_e

11、nv.scan(Source2).select(d, e, f)left_outer_result = left.left_outer_join(right, a = d).select(a, b, e) right_outer_result = left.right_outer_join(right, a = d).select(a, b, e) full_outer_result = left.full_outer_join(right, a = d).select(a, b, e)Time-windowed Joinresult = left.join(right, a = d & lt

12、ime = rtime - 5.minutes & ltime requirements.txtpip download -d cached_dir -r requirements.txt -no-binary :all:# python codetable_env.set_python_requirements(requirements.txt, cached_dir)Python Table API Job提交Python Table API Job提交Apache Flink 中文學習網(wǎng)站： 04Demo 演示和講解Demo以2013年初紐約市的出租車乘坐信息為原始數(shù)據(jù)，演示如何對數(shù)據(jù)進行讀寫、過濾、統(tǒng)計、關(guān)聯(lián)(join)，以及如何使用Python UDF來實現(xiàn)處理邏輯1.讀寫外部存儲。從kafka讀取乘車原始數(shù)據(jù)并寫到kafka。2.數(shù)據(jù)過濾。篩選特定的

人人文庫> 全部分類> 專業(yè)文獻 > IT計算機

溫馨提示

1. 本站所有資源如無特殊說明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
2. 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
3. 本站RAR壓縮包中若帶圖紙，網(wǎng)頁內(nèi)容里面會有圖紙預覽，若沒有圖紙預覽就沒有圖紙。
4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
5. 人人文庫網(wǎng)僅提供信息存儲空間，僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理，對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯，并不能對任何下載內(nèi)容負責。
6. 下載文件中如有侵權(quán)或不適當內(nèi)容，請與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

PyFlink技術(shù)入門及實踐

文檔簡介

溫馨提示

最新文檔

評論

PyFlink技術(shù)入門及實踐

文檔簡介

溫馨提示

最新文檔

評論

相關(guān)文檔