尚大數(shù)據(jù)技術(shù)之kafka

上傳人：洞*** IP屬地：北京上傳時(shí)間：2022-09-29 格式：DOCX 頁數(shù)：49 大?。?MB 積分：12 舉報(bào) 版權(quán)申訴

已閱讀5頁，還剩44頁未讀，繼續(xù)免費(fèi)閱讀

版權(quán)說明：本文檔由用戶提供并上傳，收益歸屬內(nèi)容提供方，若內(nèi)容存在侵權(quán)，請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

1、尚硅谷大數(shù)據(jù)技術(shù)之Kafka【更多Java、HTML5、Android、python、大數(shù)據(jù) 資料下載，可訪問尚硅谷（中國(guó)）官網(wǎng) HYPERLINK / 下載區(qū)】尚硅谷大數(shù)據(jù)技術(shù)之Kafka(作者：大海哥)官網(wǎng)： HYPERLINK 版本：V1.0一 Kafka概述1.1 Kafka是什么在流式計(jì)算中，Kafka一般用來緩存數(shù)據(jù)，Storm通過消費(fèi)Kafka的數(shù)據(jù)進(jìn)行計(jì)算。1）Apache Kafka是一個(gè)開源消息系統(tǒng)，由Scala寫成。是由Apache軟件基金會(huì)開發(fā)的一個(gè)開源消息系統(tǒng)項(xiàng)目。2）Kafka最初是由LinkedIn公司開發(fā)，并于2011年初開源。2012年10月從Apache

2、Incubator畢業(yè)。該項(xiàng)目的目標(biāo)是為處理實(shí)時(shí)數(shù)據(jù)提供一個(gè)統(tǒng)一、高通量、低等待的平臺(tái)。3）Kafka是一個(gè)分布式消息隊(duì)列。Kafka對(duì)消息保存時(shí)根據(jù)Topic進(jìn)行歸類，發(fā)送消息者稱為Producer，消息接受者稱為Consumer，此外kafka集群有多個(gè)kafka實(shí)例組成，每個(gè)實(shí)例(server)成為broker。4）無論是kafka集群，還是producer和consumer都依賴于zookeeper集群保存一些meta信息，來保證系統(tǒng)可用性。1.2 消息隊(duì)列內(nèi)部實(shí)現(xiàn)原理（1）點(diǎn)對(duì)點(diǎn)模式（一對(duì)一，消費(fèi)者主動(dòng)拉取數(shù)據(jù)，消息收到后消息清除）點(diǎn)對(duì)點(diǎn)模型通常是一個(gè)基于拉取或者輪詢的消息傳送模型，

3、這種模型從隊(duì)列中請(qǐng)求信息，而不是將消息推送到客戶端。這個(gè)模型的特點(diǎn)是發(fā)送到隊(duì)列的消息被一個(gè)且只有一個(gè)接收者接收處理，即使有多個(gè)消息監(jiān)聽者也是如此。（2）發(fā)布/訂閱模式（一對(duì)多，數(shù)據(jù)生產(chǎn)后，推送給所有訂閱者）發(fā)布訂閱模型則是一個(gè)基于推送的消息傳送模型。發(fā)布訂閱模型可以有多種不同的訂閱者，臨時(shí)訂閱者只在主動(dòng)監(jiān)聽主題時(shí)才接收消息，而持久訂閱者則監(jiān)聽主題的所有消息，即使當(dāng)前訂閱者不可用，處于離線狀態(tài)。1.3 為什么需要消息隊(duì)列1）解耦：允許你獨(dú)立的擴(kuò)展或修改兩邊的處理過程，只要確保它們遵守同樣的接口約束。2）冗余：消息隊(duì)列把數(shù)據(jù)進(jìn)行持久化直到它們已經(jīng)被完全處理，通過這一方式規(guī)避了數(shù)據(jù)丟失風(fēng)險(xiǎn)。許多消息

4、隊(duì)列所采用的插入-獲取-刪除范式中，在把一個(gè)消息從隊(duì)列中刪除之前，需要你的處理系統(tǒng)明確的指出該消息已經(jīng)被處理完畢，從而確保你的數(shù)據(jù)被安全的保存直到你使用完畢。3）擴(kuò)展性：因?yàn)橄㈥?duì)列解耦了你的處理過程，所以增大消息入隊(duì)和處理的頻率是很容易的，只要另外增加處理過程即可。4）靈活性 & 峰值處理能力：在訪問量劇增的情況下，應(yīng)用仍然需要繼續(xù)發(fā)揮作用，但是這樣的突發(fā)流量并不常見。如果為以能處理這類峰值訪問為標(biāo)準(zhǔn)來投入資源隨時(shí)待命無疑是巨大的浪費(fèi)。使用消息隊(duì)列能夠使關(guān)鍵組件頂住突發(fā)的訪問壓力，而不會(huì)因?yàn)橥话l(fā)的超負(fù)荷的請(qǐng)求而完全崩潰。5）可恢復(fù)性：系統(tǒng)的一部分組件失效時(shí)，不會(huì)影響到整個(gè)系統(tǒng)。消息隊(duì)列降低了

5、進(jìn)程間的耦合度，所以即使一個(gè)處理消息的進(jìn)程掛掉，加入隊(duì)列中的消息仍然可以在系統(tǒng)恢復(fù)后被處理。6）順序保證：在大多使用場(chǎng)景下，數(shù)據(jù)處理的順序都很重要。大部分消息隊(duì)列本來就是排序的，并且能保證數(shù)據(jù)會(huì)按照特定的順序來處理。（Kafka保證一個(gè)Partition內(nèi)的消息的有序性）7）緩沖：有助于控制和優(yōu)化數(shù)據(jù)流經(jīng)過系統(tǒng)的速度，解決生產(chǎn)消息和消費(fèi)消息的處理速度不一致的情況。8）異步通信：很多時(shí)候，用戶不想也不需要立即處理消息。消息隊(duì)列提供了異步處理機(jī)制，允許用戶把一個(gè)消息放入隊(duì)列，但并不立即處理它。想向隊(duì)列中放入多少消息就放多少，然后在需要的時(shí)候再去處理它們。1.4 Kafka架構(gòu)1）Producer

6、：消息生產(chǎn)者，就是向kafka broker發(fā)消息的客戶端。2）Consumer ：消息消費(fèi)者，向kafka broker取消息的客戶端3）Topic ：可以理解為一個(gè)隊(duì)列。4） Consumer Group （CG）：這是kafka用來實(shí)現(xiàn)一個(gè)topic消息的廣播（發(fā)給所有的consumer）和單播（發(fā)給任意一個(gè)consumer）的手段。一個(gè)topic可以有多個(gè)CG。topic的消息會(huì)復(fù)制（不是真的復(fù)制，是概念上的）到所有的CG，但每個(gè)partion只會(huì)把消息發(fā)給該CG中的一個(gè)consumer。如果需要實(shí)現(xiàn)廣播，只要每個(gè)consumer有一個(gè)獨(dú)立的CG就可以了。要實(shí)現(xiàn)單播只要所有的consu

7、mer在同一個(gè)CG。用CG還可以將consumer進(jìn)行自由的分組而不需要多次發(fā)送消息到不同的topic。5）Broker ：一臺(tái)kafka服務(wù)器就是一個(gè)broker。一個(gè)集群由多個(gè)broker組成。一個(gè)broker可以容納多個(gè)topic。6）Partition：為了實(shí)現(xiàn)擴(kuò)展性，一個(gè)非常大的topic可以分布到多個(gè)broker（即服務(wù)器）上，一個(gè)topic可以分為多個(gè)partition，每個(gè)partition是一個(gè)有序的隊(duì)列。partition中的每條消息都會(huì)被分配一個(gè)有序的id（offset）。kafka只保證按一個(gè)partition中的順序?qū)⑾l(fā)給consumer，不保證一個(gè)topic的整

8、體（多個(gè)partition間）的順序。7）Offset：kafka的存儲(chǔ)文件都是按照offset.kafka來命名，用offset做名字的好處是方便查找。例如你想找位于2049的位置，只要找到2048.kafka的文件即可。當(dāng)然the first offset就是00000000000.kafka二 Kafka集群部署2.1 環(huán)境準(zhǔn)備2.1.1 集群規(guī)劃hadoop102hadoop103hadoop104zkzkzkkafkakafkakafka2.1.2 jar包下載 HYPERLINK /downloads.html /downloads.html2.1.3 虛擬機(jī)準(zhǔn)備1）準(zhǔn)備3臺(tái)虛擬機(jī)

9、2）配置ip地址 3）配置主機(jī)名稱4）3臺(tái)主機(jī)分別關(guān)閉防火墻roothadoop102 atguigu# chkconfig iptables offroothadoop103 atguigu# chkconfig iptables offroothadoop104 atguigu# chkconfig iptables off2.1.4 安裝jdk2.1.5 安裝Zookeeper0）集群規(guī)劃在hadoop102、hadoop103和hadoop104三個(gè)節(jié)點(diǎn)上部署Zookeeper。1）解壓安裝（1）解壓zookeeper安裝包到/opt/module/目錄下atguiguhadoop10

10、2 software$ tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/（2）在/opt/module/zookeeper-3.4.10/這個(gè)目錄下創(chuàng)建zkDatamkdir -p zkData（3）重命名/opt/module/zookeeper-3.4.10/conf這個(gè)目錄下的zoo_sample.cfg為zoo.cfgmv zoo_sample.cfg zoo.cfg2）配置zoo.cfg文件（1）具體配置dataDir=/opt/module/zookeeper-3.4.10/zkData增加如下配置#cluster#server.2

11、=hadoop102:2888:3888server.3=hadoop103:2888:3888server.4=hadoop104:2888:3888（2）配置參數(shù)解讀Server.A=B:C:D。A是一個(gè)數(shù)字，表示這個(gè)是第幾號(hào)服務(wù)器；B是這個(gè)服務(wù)器的ip地址；C是這個(gè)服務(wù)器與集群中的Leader服務(wù)器交換信息的端口；D是萬一集群中的Leader服務(wù)器掛了，需要一個(gè)端口來重新進(jìn)行選舉，選出一個(gè)新的Leader，而這個(gè)端口就是用來執(zhí)行選舉時(shí)服務(wù)器相互通信的端口。集群模式下配置一個(gè)文件myid，這個(gè)文件在dataDir目錄下，這個(gè)文件里面有一個(gè)數(shù)據(jù)就是A的值，Zookeeper啟動(dòng)時(shí)讀取此文件，

12、拿到里面的數(shù)據(jù)與zoo.cfg里面的配置信息比較從而判斷到底是哪個(gè)server。3）集群操作（1）在/opt/module/zookeeper-3.4.10/zkData目錄下創(chuàng)建一個(gè)myid的文件touch myid添加myid文件，注意一定要在linux里面創(chuàng)建，在notepad+里面很可能亂碼（2）編輯myid文件vi myid在文件中添加與server對(duì)應(yīng)的編號(hào)：如2（3）拷貝配置好的zookeeper到其他機(jī)器上scp -r zookeeper-3.4.10/ HYPERLINK mailto:root:/opt/app/ root:/opt/app/scp -r zookeeper

13、-3.4.10/ HYPERLINK mailto:root:/opt/app/ root:/opt/app/并分別修改myid文件中內(nèi)容為3、4（4）分別啟動(dòng)zookeeperroothadoop102 zookeeper-3.4.10# bin/zkServer.sh startroothadoop103 zookeeper-3.4.10# bin/zkServer.sh startroothadoop104 zookeeper-3.4.10# bin/zkServer.sh start（5）查看狀態(tài)roothadoop102 zookeeper-3.4.10# bin/zkServer.

14、sh statusJMX enabled by defaultUsing config: /opt/module/zookeeper-3.4.10/bin/./conf/zoo.cfgMode: followerroothadoop103 zookeeper-3.4.10# bin/zkServer.sh statusJMX enabled by defaultUsing config: /opt/module/zookeeper-3.4.10/bin/./conf/zoo.cfgMode: leaderroothadoop104 zookeeper-3.4.5# bin/zkServer.s

15、h statusJMX enabled by defaultUsing config: /opt/module/zookeeper-3.4.10/bin/./conf/zoo.cfgMode: follower2.2 Kafka集群部署 1）解壓安裝包atguiguhadoop102 software$ tar -zxvf kafka_2.11-.tgz -C /opt/module/2）修改解壓后的文件名稱atguiguhadoop102 module$ mv kafka_2.11-/ kafka3）在/opt/module/kafka目錄下創(chuàng)建logs文件夾atguiguhadoop102

16、 kafka$ mkdir logs4）修改配置文件atguiguhadoop102 kafka$ cd config/atguiguhadoop102 config$ vi perties輸入以下內(nèi)容：#broker的全局唯一編號(hào)，不能重復(fù)broker.id=0#刪除topic功能使能delete.topic.enable=true#處理網(wǎng)絡(luò)請(qǐng)求的線程數(shù)量work.threads=3#用來處理磁盤IO的現(xiàn)成數(shù)量num.io.threads=8#發(fā)送套接字的緩沖區(qū)大小socket.send.buffer.bytes=102400#接收套接字的緩沖區(qū)大小socket.receive.buffer

17、.bytes=102400#請(qǐng)求套接字的緩沖區(qū)大小socket.request.max.bytes=104857600#kafka運(yùn)行日志存放的路徑log.dirs=/opt/module/kafka/logs#topic在當(dāng)前broker上的分區(qū)個(gè)數(shù)num.partitions=1#用來恢復(fù)和清理data下數(shù)據(jù)的線程數(shù)量num.recovery.threads.per.data.dir=1#segment文件保留的最長(zhǎng)時(shí)間，超時(shí)將被刪除log.retention.hours=168#配置連接Zookeeper集群地址zookeeper.connect=hadoop102:2181,hadoo

18、p103:2181,hadoop104:21815）配置環(huán)境變量roothadoop102 module# vi /etc/profile#KAFKA_HOMEexport KAFKA_HOME=/opt/module/kafkaexport PATH=$PATH:$KAFKA_HOME/binroothadoop102 module# source /etc/profile6）分發(fā)安裝包roothadoop102 etc# xsync profileatguiguhadoop102 module$ xsync kafka/7）分別在hadoop103和hadoop104上修改配置文件/opt

19、/module/kafka/config/perties中的broker.id=1、broker.id=2注：broker.id不得重復(fù)8）啟動(dòng)集群依次在hadoop102、hadoop103、hadoop104節(jié)點(diǎn)上啟動(dòng)kafkaatguiguhadoop102 kafka$ bin/kafka-server-start.sh config/perties &atguiguhadoop103 kafka$ bin/kafka-server-start.sh config/perties &atguiguhadoop104 kafka$ bin/kafka-server-start.sh co

20、nfig/perties &9）關(guān)閉集群atguiguhadoop102 kafka$ bin/kafka-server-stop.sh stopatguiguhadoop103 kafka$ bin/kafka-server-stop.sh stopatguiguhadoop104 kafka$ bin/kafka-server-stop.sh stop2.3 Kafka命令行操作1）查看當(dāng)前服務(wù)器中的所有topicatguiguhadoop102 kafka$ bin/kafka-topics.sh -zookeeper hadoop102:2181 -list2）創(chuàng)建topicatgui

21、guhadoop102 kafka$ bin/kafka-topics.sh -zookeeper hadoop102:2181 -create -replication-factor 3 -partitions 1 -topic first選項(xiàng)說明：-topic 定義topic名-replication-factor 定義副本數(shù)-partitions 定義分區(qū)數(shù)3）刪除topicatguiguhadoop102 kafka$ bin/kafka-topics.sh -zookeeper hadoop102:2181 -delete -topic first需要perties中設(shè)置delete

22、.topic.enable=true否則只是標(biāo)記刪除或者直接重啟。4）發(fā)送消息atguiguhadoop102 kafka$ bin/kafka-console-producer.sh -broker-list hadoop102:9092 -topic firsthello worldatguigu atguigu5）消費(fèi)消息atguiguhadoop103 kafka$ bin/kafka-console-consumer.sh -zookeeper hadoop102:2181 -from-beginning -topic first-from-beginning：會(huì)把first主題中以

23、往所有的數(shù)據(jù)都讀取出來。根據(jù)業(yè)務(wù)場(chǎng)景選擇是否增加該配置。6）查看某個(gè)Topic的詳情atguiguhadoop102 kafka$ bin/kafka-topics.sh -zookeeper hadoop102:2181 -describe -topic first 2.4 Kafka配置信息2.4.1 Broker配置信息屬性默認(rèn)值描述broker.id必填參數(shù)，broker的唯一標(biāo)識(shí)log.dirs/tmp/kafka-logsKafka數(shù)據(jù)存放的目錄?？梢灾付ǘ鄠€(gè)目錄，中間用逗號(hào)分隔，當(dāng)新partition被創(chuàng)建的時(shí)會(huì)被存放到當(dāng)前存放partition最少的目錄。port9092Bro

24、kerServer接受客戶端連接的端口號(hào)zookeeper.connectnullZookeeper的連接串，格式為：hostname1:port1,hostname2:port2,hostname3:port3?？梢蕴钜粋€(gè)或多個(gè)，為了提高可靠性，建議都填上。注意，此配置允許我們指定一個(gè)zookeeper路徑來存放此kafka集群的所有數(shù)據(jù)，為了與其他應(yīng)用集群區(qū)分開，建議在此配置中指定本集群存放目錄，格式為：hostname1:port1,hostname2:port2,hostname3:port3/chroot/path 。需要注意的是，消費(fèi)者的參數(shù)要和此參數(shù)一致。message.max.

25、bytes1000000服務(wù)器可以接收到的最大的消息大小。注意此參數(shù)要和consumer的maximum.message.size大小一致，否則會(huì)因?yàn)樯a(chǎn)者生產(chǎn)的消息太大導(dǎo)致消費(fèi)者無法消費(fèi)。num.io.threads8服務(wù)器用來執(zhí)行讀寫請(qǐng)求的IO線程數(shù)，此參數(shù)的數(shù)量至少要等于服務(wù)器上磁盤的數(shù)量。queued.max.requests500I/O線程可以處理請(qǐng)求的隊(duì)列大小，若實(shí)際請(qǐng)求數(shù)超過此大小，網(wǎng)絡(luò)線程將停止接收新的請(qǐng)求。socket.send.buffer.bytes100 * 1024The SO_SNDBUFF buffer the server prefers for socket

26、connections.socket.receive.buffer.bytes100 * 1024The SO_RCVBUFF buffer the server prefers for socket connections.socket.request.max.bytes100 * 1024 * 1024服務(wù)器允許請(qǐng)求的最大值，用來防止內(nèi)存溢出，其值應(yīng)該小于 Java heap size.num.partitions1默認(rèn)partition數(shù)量，如果topic在創(chuàng)建時(shí)沒有指定partition數(shù)量，默認(rèn)使用此值，建議改為5log.segment.bytes1024 * 1024 * 102

27、4Segment文件的大小，超過此值將會(huì)自動(dòng)新建一個(gè)segment，此值可以被topic級(jí)別的參數(shù)覆蓋。log.roll.ms,hours24 * 7 hours新建segment文件的時(shí)間，此值可以被topic級(jí)別的參數(shù)覆蓋。log.retention.ms,minutes,hours7 daysKafka segment log的保存周期，保存周期超過此時(shí)間日志就會(huì)被刪除。此參數(shù)可以被topic級(jí)別參數(shù)覆蓋。數(shù)據(jù)量大時(shí)，建議減小此值。log.retention.bytes-1每個(gè)partition的最大容量，若數(shù)據(jù)量超過此值，partition數(shù)據(jù)將會(huì)被刪除。注意這個(gè)參數(shù)控制的是每個(gè)par

28、tition而不是topic。此參數(shù)可以被log級(jí)別參數(shù)覆蓋。erval.ms5 minutes刪除策略的檢查周期auto.create.topics.enabletrue自動(dòng)創(chuàng)建topic參數(shù)，建議此值設(shè)置為false，嚴(yán)格控制topic管理，防止生產(chǎn)者錯(cuò)寫topic。default.replication.factor1默認(rèn)副本數(shù)量，建議改為2。replica.lag.time.max.ms10000在此窗口時(shí)間內(nèi)沒有收到follower的fetch請(qǐng)求，leader會(huì)將其從ISR(in-sync replicas)中移除。replica.lag.max.messages4000如果rep

29、lica節(jié)點(diǎn)落后leader節(jié)點(diǎn)此值大小的消息數(shù)量，leader節(jié)點(diǎn)就會(huì)將其從ISR中移除。replica.socket.timeout.ms30 * 1000replica向leader發(fā)送請(qǐng)求的超時(shí)時(shí)間。replica.socket.receive.buffer.bytes64 * 1024The socket receive buffer for network requests to the leader for replicating data.replica.fetch.max.bytes1024 * 1024The number of byes of messages to at

30、tempt to fetch for each partition in the fetch requests the replicas send to the leader.replica.fetch.wait.max.ms500The maximum amount of time to wait time for data to arrive on the leader in the fetch requests sent by the replicas to the leader.num.replica.fetchers1Number of threads used to replica

31、te messages from leaders. Increasing this value can increase the degree of I/O parallelism in the follower erval.requests1000The purge interval (in number of requests) of the fetch request purgatory.zookeeper.session.timeout.ms6000ZooKeeper session 超時(shí)時(shí)間。如果在此時(shí)間內(nèi)server沒有向zookeeper發(fā)送心跳，zookeeper就會(huì)認(rèn)為此節(jié)點(diǎn)

32、已掛掉。此值太低導(dǎo)致節(jié)點(diǎn)容易被標(biāo)記死亡；若太高，.會(huì)導(dǎo)致太遲發(fā)現(xiàn)節(jié)點(diǎn)死亡。zookeeper.connection.timeout.ms6000客戶端連接zookeeper的超時(shí)時(shí)間。zookeeper.sync.time.ms2000H ZK follower落后 ZK leader的時(shí)間。controlled.shutdown.enabletrue允許broker shutdown。如果啟用，broker在關(guān)閉自己之前會(huì)把它上面的所有l(wèi)eaders轉(zhuǎn)移到其它brokers上，建議啟用，增加集群穩(wěn)定性。auto.leader.rebalance.enabletrueIf this is e

33、nabled the controller will automatically try to balance leadership for partitions among the brokers by periodically returning leadership to the “preferred” replica for each partition if it is available.leader.imbalance.per.broker.percentage10The percentage of leader imbalance allowed per broker. The

34、 controller will rebalance leadership if this ratio goes above the configured value per erval.seconds300The frequency with which to check for leader imbalance.offset.metadata.max.bytes4096The maximum amount of metadata to allow clients to save with their offsets.connections.max.idle.ms600000Idle con

35、nections timeout: the server socket processor threads close the connections that idle more than this.num.recovery.threads.per.data.dir1The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.unclean.leader.election.enabletrueIndicates whether to enabl

36、e replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.delete.topic.enablefalse啟用deletetopic參數(shù)，建議設(shè)置為true。offsets.topic.num.partitions50The number of partitions for the offset commit topic. Since changing this after deployment is currently

37、 unsupported, we recommend using a higher setting for production (e.g., 100-200).offsets.topic.retention.minutes1440Offsets that are older than this age will be marked for deletion. The actual purge will occur when the log cleaner compacts the offsets erval.ms600000The frequency at which the offset

38、manager checks for stale offsets.offsets.topic.replication.factor3The replication factor for the offset commit topic. A higher setting (e.g., three or four) is recommended in order to ensure higher availability. If the offsets topic is created when fewer brokers than the replication factor then the

39、offsets topic will be created with fewer replicas.offsets.topic.segment.bytes104857600Segment size for the offsets topic. Since it uses a compacted topic, this should be kept relatively low in order to facilitate faster log compaction and loads.offsets.load.buffer.size5242880An offset load occurs wh

40、en a broker becomes the offset manager for a set of consumer groups (i.e., when it becomes a leader for an offsets topic partition). This setting corresponds to the batch size (in bytes) to use when reading from the offsets segments when loading offsets into the offset managers mit.required.acks-1Th

41、e number of acknowledgements that are required before the offset commit can be accepted. This is similar to the producers acknowledgement setting. In general, the default should not be mit.timeout.ms5000The offset commit will be delayed until this timeout or the required number of replicas have rece

42、ived the offset commit. This is similar to the producer request timeout.2.4.2 Producer配置信息屬性默認(rèn)值描述metadata.broker.list啟動(dòng)時(shí)producer查詢brokers的列表，可以是集群中所有brokers的一個(gè)子集。注意，這個(gè)參數(shù)只是用來獲取topic的元信息用，producer會(huì)從元信息中挑選合適的broker并與之建立socket連接。格式是：host1:port1,host2:port2。request.required.acks0參見3.2節(jié)介紹request.timeout.m

43、s10000Broker等待ack的超時(shí)時(shí)間，若等待時(shí)間超過此值，會(huì)返回客戶端錯(cuò)誤信息。producer.typesync同步異步模式。async表示異步，sync表示同步。如果設(shè)置成異步模式，可以允許生產(chǎn)者以batch的形式push數(shù)據(jù)，這樣會(huì)極大的提高broker性能，推薦設(shè)置為異步。serializer.classkafka.serializer.DefaultEncoder序列號(hào)類，.默認(rèn)序列化成 byte 。key.serializer.classKey的序列化類，默認(rèn)同上。ducer.DefaultPartitionerPartition類，默認(rèn)對(duì)key進(jìn)行pression.cod

44、ecnone指定producer消息的壓縮格式，可選參數(shù)為： “none”, “gzip” and “snappy”。關(guān)于壓縮參見4.1節(jié)compressed.topicsnull啟用壓縮的topic名稱。若上面參數(shù)選擇了一個(gè)壓縮格式，那么壓縮僅對(duì)本參數(shù)指定的topic有效，若本參數(shù)為空，則對(duì)所有topic有效。message.send.max.retries3Producer發(fā)送失敗時(shí)重試次數(shù)。若網(wǎng)絡(luò)出現(xiàn)問題，可能會(huì)導(dǎo)致不斷重試。retry.backoff.ms100Before each retry, the producer refreshes the metadata of relev

45、ant topics to see if a new leader has been elected. Since leader election takes a bit of time, this property specifies the amount of time that the producer waits before refreshing the erval.ms600 * 1000The producer generally refreshes the topic metadata from brokers when there is a failure (partitio

46、n missing, leader not available). It will also poll regularly (default: every 10min so 600000ms). If you set this to a negative value, metadata will only get refreshed on failure. If you set this to zero, the metadata will get refreshed after each message sent (not recommended). Important note: the

47、refresh happen only AFTER the message is sent, so if the producer never sends a message the metadata is never refreshedqueue.buffering.max.ms5000啟用異步模式時(shí)，producer緩存消息的時(shí)間。比如我們?cè)O(shè)置成1000時(shí)，它會(huì)緩存1秒的數(shù)據(jù)再一次發(fā)送出去，這樣可以極大的增加broker吞吐量，但也會(huì)造成時(shí)效性的降低。queue.buffering.max.messages10000采用異步模式時(shí)producer buffer 隊(duì)列里最大緩存的消息數(shù)量，如

48、果超過這個(gè)數(shù)值，producer就會(huì)阻塞或者丟掉消息。queue.enqueue.timeout.ms-1當(dāng)達(dá)到上面參數(shù)值時(shí)producer阻塞等待的時(shí)間。如果值設(shè)置為0，buffer隊(duì)列滿時(shí)producer不會(huì)阻塞，消息直接被丟掉。若值設(shè)置為-1，producer會(huì)被阻塞，不會(huì)丟消息。batch.num.messages200采用異步模式時(shí)，一個(gè)batch緩存的消息數(shù)量。達(dá)到這個(gè)數(shù)量值時(shí)producer才會(huì)發(fā)送消息。send.buffer.bytes100 * 1024Socket write buffer sizeclient.id“”The client id is a user-spe

49、cified string sent in each request to help trace calls. It should logically identify the application making the request.2.4.3 Consumer配置信息屬性默認(rèn)值描述group.idConsumer的組ID，相同goup.id的consumer屬于同一個(gè)組。zookeeper.connectConsumer的zookeeper連接串，要和broker的配置一致。consumer.idnull如果不設(shè)置會(huì)自動(dòng)生成。socket.timeout.ms30 * 1000網(wǎng)絡(luò)請(qǐng)求

50、的socket超時(shí)時(shí)間。實(shí)際超時(shí)時(shí)間由max.fetch.wait + socket.timeout.ms 確定。socket.receive.buffer.bytes64 * 1024The socket receive buffer for network requests.fetch.message.max.bytes1024 * 1024查詢topic-partition時(shí)允許的最大消息大小。consumer會(huì)為每個(gè)partition緩存此大小的消息到內(nèi)存，因此，這個(gè)參數(shù)可以控制consumer的內(nèi)存使用量。這個(gè)值應(yīng)該至少比server允許的最大消息大小大，以免producer發(fā)送的消

51、息大于consumer允許的消息。num.consumer.fetchers1The number fetcher threads used to fetch mit.enabletrue如果此值設(shè)置為true，consumer會(huì)周期性的把當(dāng)前消費(fèi)的offset值保存到zookeeper。當(dāng)consumer失敗重啟之后將會(huì)使用此值作為新開始消費(fèi)的值。erval.ms60 * 1000Consumer提交offset值到zookeeper的周期。queued.max.message.chunks2用來被consumer消費(fèi)的message chunks 數(shù)量，每個(gè)chunk可以緩存fetch.

52、message.max.bytes大小的數(shù)據(jù)量。erval.ms60 * 1000Consumer提交offset值到zookeeper的周期。queued.max.message.chunks2用來被consumer消費(fèi)的message chunks 數(shù)量，每個(gè)chunk可以緩存fetch.message.max.bytes大小的數(shù)據(jù)量。fetch.min.bytes1The minimum amount of data the server should return for a fetch request. If insufficient data is available the r

53、equest will wait for that much data to accumulate before answering the request.fetch.wait.max.ms100The maximum amount of time the server will block before answering the fetch request if there isnt sufficient data to immediately satisfy fetch.min.bytes.rebalance.backoff.ms2000Backoff time between ret

54、ries during rebalance.refresh.leader.backoff.ms200Backoff time to wait before trying to determine the leader of a partition that has just lost its leader.auto.offset.resetlargestWhat to do when there is no initial offset in ZooKeeper or if an offset is out of range ;smallest : automatically reset th

55、e offset to the smallest offset; largest : automatically reset the offset to the largest offset;anything else: throw exception to the consumerconsumer.timeout.ms-1若在指定時(shí)間內(nèi)沒有消息消費(fèi)，consumer將會(huì)拋出異常。ernal.topicstrueWhether messages from internal topics (such as offsets) should be exposed to the consumer.zo

56、okeeper.session.timeout.ms6000ZooKeeper session timeout. If the consumer fails to heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur.zookeeper.connection.timeout.ms6000The max time that the client waits while establishing a connection to zookeeper.zookeep

57、er.sync.time.ms2000How far a ZK follower can be behind a ZK leader三 Kafka工作流程分析3.1 Kafka生產(chǎn)過程分析3.1.1 寫入方式producer采用推（push）模式將消息發(fā)布到broker，每條消息都被追加（append）到分區(qū)（patition）中，屬于順序?qū)懘疟P（順序?qū)懘疟P效率比隨機(jī)寫內(nèi)存要高，保障kafka吞吐率）。3.1.2 分區(qū)（Partition）消息發(fā)送時(shí)都被發(fā)送到一個(gè)topic，其本質(zhì)就是一個(gè)目錄，而topic是由一些Partition Logs(分區(qū)日志)組成，其組織結(jié)構(gòu)如下圖所示：我們可以看到

58、，每個(gè)Partition中的消息都是有序的，生產(chǎn)的消息被不斷追加到Partition log上，其中的每一個(gè)消息都被賦予了一個(gè)唯一的offset值。1）分區(qū)的原因（1）方便在集群中擴(kuò)展，每個(gè)Partition可以通過調(diào)整以適應(yīng)它所在的機(jī)器，而一個(gè)topic又可以有多個(gè)Partition組成，因此整個(gè)集群就可以適應(yīng)任意大小的數(shù)據(jù)了；（2）可以提高并發(fā)，因?yàn)榭梢砸訮artition為單位讀寫了。2）分區(qū)的原則（1）指定了patition，則直接使用；（2）未指定patition但指定key，通過對(duì)key的value進(jìn)行hash出一個(gè)patition（3）patition和key都未指定，使用輪詢選

59、出一個(gè)patition。DefaultPartitioner類public int partition(String topic, Object key, byte keyBytes, Object value, byte valueBytes, Cluster cluster) List partitions = cluster.partitionsForTopic(topic); int numPartitions = partitions.size(); if (keyBytes = null) int nextValue = nextValue(topic); List availab

60、lePartitions = cluster.availablePartitionsForTopic(topic); if (availablePartitions.size() 0) int part = Utils.toPositive(nextValue) % availablePartitions.size(); return availablePartitions.get(part).partition(); else / no partitions are available, give a non-available partition return Utils.toPositi

人人文庫(kù)> 全部分類> 教育資料 > 課件下載

溫馨提示

1. 本站所有資源如無特殊說明，都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
2. 本站的文檔不包含任何第三方提供的附件圖紙等，如果需要附件，請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
3. 本站RAR壓縮包中若帶圖紙，網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽，若沒有圖紙預(yù)覽就沒有圖紙。
4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間，僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理，對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯，并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容，請(qǐng)與我們聯(lián)系，我們立即糾正。
7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

尚大數(shù)據(jù)技術(shù)之kafka

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

尚大數(shù)據(jù)技術(shù)之kafka

文檔簡(jiǎn)介

溫馨提示

最新文檔

評(píng)論

相關(guān)文檔