Understanding the IO Protocol Stack
Core Systems Database Group
余鋒 (Yu Feng), Taobao 褚霸 (Chu Ba)
2012-03-18

Outline
  IO subsystem architecture diagram
  Layer-by-layer breakdown of the IO subsystem
  IO request event tracepoints
  blktrace/btt explained

IO subsystem architecture diagram
  $ stap -l 'ioscheduler.*'
    ioscheduler.elv_add_request
    ioscheduler.elv_completed_request
    ioscheduler.elv_next_request
  blktrace
  $ stap -l 'ioblock.*'
    ioblock.end
    ioblock.request
  DM layer

Block layer diagram
  buffered io
  mmap
  direct io

Think about it
  How many layers does the IO subsystem have? What are the inputs and outputs of each layer?

Block layer
  probe ioblock.request   Fires whenever making a generic block I/O request.
  probe ioblock.end       Fires whenever a block I/O transfer is complete.

DM layer
  LVM2 (Linux Volume Manager 2 version)
  EVMS (Enterprise Volume Management System)
  dmraid (Device Mapper Raid Tool)

Request queue / elevator
  probe ioscheduler.elv_add_request.kp      kprobe based probe to indicate that a request was added to the request queue
  probe ioscheduler.elv_next_request        Fires when a request is retrieved from the request queue
  probe ioscheduler.elv_completed_request   Fires when a request is completed

Scheduler parameter tuning
  Reference: Documentation/block/deadline-iosched.txt
  # cat /sys/block/sda/queue/scheduler
  noop anticipatory deadline cfq

Think about it
  What is the core purpose of the elevator algorithm?

Driver
  Interrupt balancing: /proc/irq/IRQ/smp_affinity
  Softirq balancing: /sys/block/DEV/queue/rq_affinity

Key event points of a block request
  block:block_rq_abort
  block:block_rq_requeue
  block:block_rq_complete
  block:block_rq_insert
  block:block_rq_issue
  block:block_bio_bounce
  block:block_bio_complete
  block:block_bio_backmerge
  block:block_bio_frontmerge
  block:block_bio_queue
  block:block_getrq
  block:block_sleeprq
  block:block_plug
  block:block_unplug_timer
  block:block_unplug_io
  block:block_split
  block:block_remap
  block:block_rq_remap

  $ perf list | grep 'block:'
  or
  $ trace-cmd list | grep 'block:'
  $ stap -l 'kernel.trace("block_*")'

Tracepoint explanation
  C - complete      A previously issued request has been completed.
  D - issued        A request that previously resided on the block layer queue or in the i/o scheduler has been sent to the driver.
  I - inserted      A request is being sent to the i/o scheduler for addition to the internal queue and later service by the driver.
  Q - queued        This notes intent to queue i/o at the given location.
  B - bounced       The data pages attached to this bio are not reachable by the hardware and must be bounced to a lower memory location.

Tracepoint explanation (continued)
  M - back merge    A previously inserted request exists that ends on the boundary of where this i/o begins, so the i/o scheduler can merge them together.
  F - front merge   Same as the back merge, except this i/o ends where a previously inserted request starts.
  G - get request   To send any type of request to a block device, a struct request container must be allocated first.
  S - sleep         No request structures were available, so the issuer has to wait for one to be freed.

Tracepoint explanation (continued)
  P - plug          When i/o is queued to a previously empty block device queue, Linux will plug the queue in anticipation of future I/Os being added before this data is needed.
  U - unplug        Some request data is already queued in the device, so start sending requests to the driver.
  T - unplug due to timer   If nobody requests the i/o that was queued after plugging the queue, Linux will automatically unplug it after a defined period has passed.
  X - split         On raid or device mapper setups, an incoming i/o may straddle a device or internal zone and needs to be chopped up into smaller pieces for service.
  A - remap         For stacked devices, incoming i/o is remapped to the device below it in the i/o stack.

Think about it
  How can the life cycle of an IO request be visualized?

Observing IO behavior
  Doesn't this feel like too little information?

blktrace architecture diagram

Events blktrace can filter on
  barrier:   barrier attribute
  complete:  completed by driver
  fs:        requests
  issue:     issued to driver
  pc:        packet command events
  queue:     queue operations
  read:      read traces
  requeue:   requeue operations
  sync:      synchronous attribute
  write:     write traces
  notify:    trace messages
  drv_data:  additional driver specific trace

btrace: first impressions

blkiomon

btt
  # blktrace /dev/sdb
  # blkparse -i sdb -d sdb.bin
  # blkrawverify sdb
  # btt -i sdb.bin -A

btt: Life of an I/O
  Q2I   time it takes to process an I/O prior to it being inserted or merged onto a request queue. Includes split and remap time.
  I2D   time the I/O is "idle" on the request queue
  D2C   time the I/O is "active" in the driver and on the device
  Q2I + I2D + D2C = Q2C
  Q2C   total processing time of the I/O

Reading btt output
  = All Devices =
  ALL          MIN           AVG           MAX     N
  Q2Q  0.000007098   0.085323752   1.189534849    14
  Q2G  0.000000685   0.000001737   0.000004757    12
  G2I  0.000000272   0.000001724   0.000004240    12
  Q2M  0.000000475   0.000001036   0.000001362     3
  I2D  0.000002502   0.000244633   0.002238651    12
  M2D  0.000004870   0.000065011   0.000178722     3
  D2C  0.000055488   0.000145720   0.000219068    15
  Q2C  0.000062048   0.000357405   0.002303758    15

Reading btt output (continued)
  = Device Overhead =
  DEV       |     Q2G      G2I      Q2M       I2D       D2C
  ( 8, 16)  | 0.3889%  0.3859%  0.0580%  54.7575%  40.7717%
  Overall   | 0.3889%  0.3859%  0.0580%  54.7575%  40.7717%

Reading btt output (continued)
  = Device Merge Information =
  DEV       |  #Q  #D  Ratio | BLKmin  BLKavg  BLKmax  Total
  ( 8, 16)  |  15  12    1.2 |      8      10      24    120

Reading btt output (continued)
  = Device Q2Q Seek Information =
  DEV       | NSEEKS         MEAN  MEDIAN | MODE
  ( 8, 16)  |     15  620978236.7       0 | 0(5)
  Overall   | NSEEKS         MEAN  MEDIAN | MODE
  Average   |     15  620978236.7       0 | 0(5)

  = Device D2D Seek Information =
  DEV       | NSEEKS         MEAN  MEDIAN | MODE
  ( 8, 16)  |     12  776222795.9       0 | 0(2)
  Overall   | NSEEKS         MEAN  MEDIAN | MODE
  Average   |     12  776222795.9       0 | 0(2)

Reading btt output (continued)
  = Plug Information =
  DEV       | # Plugs  # Timer Us | % Time Q Plugged
  ( 8, 16)  |       5(         1) | 0.226614061%

  DEV       | IOs/Unp  IOs/Unp(to)
  ( 8, 16)  |     0.8          1.0
  Overall   | IOs/Unp  IOs/Unp(to)
  Average   |     0.8          1.0

Reading btt output (continued)
  = Active Requests At Q Information =
  DEV       | Avg Reqs Q
  ( 8, 16)  |        0.9

Think about it
  Besides user applications, who else uses the block layer?

Page writeback mechanism
  $ stap -l 'kernel.function("congestion_wait")'
  $ perf list | grep 'writeback:'
  writeback:writeback_nothread
  writeback:writeback_nowork
  writeback:writeback_bdi_register
  writeback:writeback_bdi_unregister
  writeback:writeback_task_sta
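
The identity Q2I + I2D + D2C = Q2C from the "btt: Life of an I/O" slide can be checked for a single request once its Q, I, D and C timestamps are known. Below is a minimal Python sketch under stated assumptions: the timestamps are made up for illustration (they do not come from the trace shown in this deck), and `phase_times` is a hypothetical helper name, not part of btt.

```python
def phase_times(events):
    """Split one request's lifetime into btt-style phases.

    `events` maps a blkparse action code ("Q", "I", "D", "C") to the
    timestamp, in seconds, at which that event was seen for the request.
    """
    q2i = events["I"] - events["Q"]  # before insertion onto the request queue
    i2d = events["D"] - events["I"]  # idle on the request queue
    d2c = events["C"] - events["D"]  # active in the driver and on the device
    q2c = events["C"] - events["Q"]  # total processing time
    return {"Q2I": q2i, "I2D": i2d, "D2C": d2c, "Q2C": q2c}


# Hypothetical timestamps (seconds) for one request, for illustration only.
sample = {"Q": 0.000000, "I": 0.000003, "D": 0.000250, "C": 0.000400}
phases = phase_times(sample)
for name, seconds in phases.items():
    print(f"{name}: {seconds * 1e6:.1f} us")
```

Plotting the three phases as stacked bars per request is one simple way to approach the earlier question of how to visualize an IO request's life cycle.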

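
The "Device Overhead" percentages appear to be each phase's share of the total Q2C time, that is, (AVG x N) of the phase divided by (AVG x N) of Q2C. This reading is inferred from the numbers in the btt output, not stated in the slides. The Python sketch below recomputes the shares from the "All Devices" figures so the inference can be checked; `overhead_percent` is a hypothetical helper name.

```python
# (average seconds, sample count) copied from the "All Devices" table.
ALL_DEVICES = {
    "Q2G": (0.000001737, 12),
    "G2I": (0.000001724, 12),
    "Q2M": (0.000001036, 3),
    "I2D": (0.000244633, 12),
    "D2C": (0.000145720, 15),
    "Q2C": (0.000357405, 15),
}


def overhead_percent(table):
    """Return each phase's share of the total Q2C time, in percent."""
    avg_q2c, n_q2c = table["Q2C"]
    total = avg_q2c * n_q2c  # total time spent across all requests
    return {
        phase: 100.0 * avg * n / total
        for phase, (avg, n) in table.items()
        if phase != "Q2C"
    }


shares = overhead_percent(ALL_DEVICES)
for phase, pct in shares.items():
    print(f"{phase}: {pct:.4f}%")
```

With these inputs the shares land within rounding of the 0.3889%, 0.3859%, 0.0580%, 54.7575% and 40.7717% reported under Device Overhead. I2D and D2C together account for more than 95% of Q2C, so in this trace the time goes to waiting on the request queue and to the device, not to request setup.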