數(shù)據(jù)庫教學課件:第9章 恢復系統(tǒng)_第1頁
數(shù)據(jù)庫教學課件:第9章 恢復系統(tǒng)_第2頁
數(shù)據(jù)庫教學課件:第9章 恢復系統(tǒng)_第3頁
數(shù)據(jù)庫教學課件:第9章 恢復系統(tǒng)_第4頁
數(shù)據(jù)庫教學課件:第9章 恢復系統(tǒng)_第5頁
已閱讀5頁,還剩176頁未讀 繼續(xù)免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

第9章恢復系統(tǒng)本章內容參考:數(shù)據(jù)庫概念(第四版)byA.SilberschatzChapter17RecoverySystem補充內容本章內容特色:數(shù)據(jù)庫恢復的“道理”較容易理解但編程實現(xiàn)較難,算法較多本章要解決的關鍵問題:如何預防和應付各種”故障”,保證ACID性質主要內容故障分類存儲器結構恢復與原子性基于日志的恢復ShadowPaging并發(fā)事務的恢復緩沖管理非易失性存儲器丟失信息的故障AdvancedRecoveryTechniquesARIESRecoveryAlgorithm故障分類事務故障:邏輯錯誤:因為某些內部錯誤條件導致事務不能完成系統(tǒng)錯誤:因為某種錯誤條件(如死鎖)導致數(shù)據(jù)庫系統(tǒng)終止一個活躍事務系統(tǒng)崩潰:停電故障或者其他軟硬件故障導致系統(tǒng)崩潰故障-停止假設:假設非易失性存儲器的內容不會因系統(tǒng)崩潰而破壞數(shù)據(jù)庫系統(tǒng)可通過許多完整性檢查來防止磁盤數(shù)據(jù)被破壞介質故障:磁頭損壞或類似的磁盤故障可能破壞全部或部分磁盤存儲器假設損壞是可以檢測到的:磁盤驅動器使用校驗和來檢測故障恢復算法恢復算法是指即使發(fā)生故障也能確保數(shù)據(jù)庫一致性和事務原子性及持久性的技術恢復算法兩個部分在正常事務處理過程中采取動作來確保有足夠的信息用于從故障恢復(即事前預防措施,保證有盡量多的信息在故障中保存下來)在故障發(fā)生后采取行動作將數(shù)據(jù)庫內容恢復到一個確保原子性,一致性和持久性的狀態(tài)(即事后的恢復措施,保證原子性,一致性,耐久性)存儲器結構易失性(Volatile)存儲器不能在系統(tǒng)崩潰后保存下來例如:主存,高速緩存非易失性(Nonvolatile)存儲器可以在系統(tǒng)崩潰后保存下來例如:磁盤,磁帶,閃存,非易失性RAM(電池供電)穩(wěn)定(Stable)存儲器虛構的能夠經(jīng)受任何故障的存儲器可用多個非易失性介質存儲相同的副本來近似穩(wěn)定存儲器的實現(xiàn)在不同磁盤上維持多個副本副本可以在遠程站點上,以防止象火災或洪水之類的災難在數(shù)據(jù)傳輸過程中的故障仍然可導致不一致的副本,塊傳輸?shù)慕Y果可能是成功完成部分失敗:目標塊有錯誤信息完全失敗:目標塊根本沒有更新在數(shù)據(jù)傳輸過程中防止存儲介質出故障的解決方法輸出操作如下執(zhí)行(假設每個塊有兩個副本):將信息寫到第一個物理塊當?shù)谝粋€寫操作成功完成,再將相同信息寫到第二個物理塊僅當?shù)诙€寫操作成功完成之后輸出才算完成穩(wěn)定存儲器的實現(xiàn)在數(shù)據(jù)傳輸過程中防止存儲介質出故障(續(xù)):塊的副本間可能因輸出操作過程中的故障而有不同,為了從故障恢復:首先找到不一致的塊高代價的解決方法:比較每個磁盤塊的兩個副本更好的解決方法將正在進行的磁盤寫操作記錄在非易失性存儲器上(非易失性RAM或者磁盤的特定區(qū)域)恢復時利用此信息找到可能不一致的塊,只需比較這些塊的副本用于硬件RAID系統(tǒng)如果不一致塊的一個副本上檢測到錯誤(錯誤的校驗和),則用另一副本覆蓋它即可如果兩個副本都沒有錯誤但卻不一致,則用第二個塊覆蓋第一個塊

數(shù)據(jù)存取物理塊是位于磁盤上的塊緩沖塊是臨時位于主存中的塊磁盤和主存之間的塊移動通過下列兩個操作來引發(fā):input(B):將物理塊

B傳入主存output(B):將緩沖塊

B傳到磁盤,并且替換相應的物理塊每個事務Ti

都有自己的私有工作區(qū),用來保存它存取和更新的所有數(shù)據(jù)項的局部副本

Ti

對應于數(shù)據(jù)項X的局部副本記為xi為簡單起見,假設每個數(shù)據(jù)項都能存入并且確實存儲在單個塊中數(shù)據(jù)存取事務使用下列操作來在系統(tǒng)緩沖塊和它的私有工作區(qū)之間傳送數(shù)據(jù)項:read(X)將數(shù)據(jù)項X

的值賦給局部變量xi.write(X)將局部變量xi

的值賦給緩沖塊中的數(shù)據(jù)項

X這兩條命令在賦值前都可能要求發(fā)出input(BX)指令,如果X所在的塊

BX

當前不在主存中的話數(shù)據(jù)存取事務第一次訪問X時執(zhí)行read(X)以后的訪問是對局部副本進行的最后一次訪問之后,執(zhí)行write(X)output(BX)不必緊隨write(X)立即執(zhí)行系統(tǒng)可在它認為適當?shù)臅r候執(zhí)行output操作數(shù)據(jù)存取例xYABx1y1緩沖區(qū)緩沖塊A

緩沖塊Binput(A)output(B)read(X)write(Y)磁盤T1的工作區(qū)T2的工作區(qū)內存x2恢復與原子性問題一個較長事務進行若干查插刪改操作設計時假定整個事務作為原子是保持一致性但執(zhí)行可能半途而廢(意外原因,停電等)如何恢復?恢復與原子性為了在故障的情況下仍確保原子性,應首先輸出描述修改操作的信息到穩(wěn)定存儲中(先記賬),描述將作的操作,然后動作(恢復有據(jù))下面介紹兩種方法:基于日志的恢復影子分頁shadow-paging先假設事務是串行執(zhí)行的基于日志的恢復日志(log)保存在穩(wěn)定存儲器上日志是日志記錄的序列,用來記錄對數(shù)據(jù)庫的更新活動當事務Ti

啟動,寫入一條登記自己的日志記錄

<Ti

,start>Ti

執(zhí)行write(X)之前,寫入日志記錄

<Ti,X,V1,V2>,各分量依次是:事務,變量,前值,后值當Ti

完成了最后一條語句,寫入日志記錄<Ti

,commit>基于日志的恢復目前假設日志記錄直接寫入穩(wěn)定存儲器(即不經(jīng)過緩沖)兩種使用日志的方法延遲更新數(shù)據(jù)庫(先記賬,后補工,保守,穩(wěn)妥)立即更新數(shù)據(jù)庫(邊記,邊做,積極,冒險)延遲更新數(shù)據(jù)庫延遲更新數(shù)據(jù)庫方案在日志中記錄所有更新,但推遲write直至部分提交之后假設事務串行執(zhí)行事務啟動時向日志寫入<Ti

,start>記錄write(X)操作導致寫入日志記錄<Ti,X,V>,其中V是X的新值注:這種方案不需要保留舊值這時并未執(zhí)行對X的寫操作,而是延遲了當Ti

部分提交時,向日志寫入<Ti,

commit>最后,讀取日志記錄并用來實際執(zhí)行前面延遲的寫操作(把被延遲的寫動作(欠工賬)補作完成)生活中,一個“拖”字或“研究研究”就是這種延遲策略延遲更新數(shù)據(jù)庫故障后恢復過程中,,事務需要重做(redo),當且僅當日志中存在<Ti

start>和<Ticommit>(這樣恢復有據(jù))重做事務Ti

(redoTi)將事務更新過的所有數(shù)據(jù)項的值置為新值(即有底子,有工賬,在底子上,重來一次)但正在恢復時出錯有怎么辦?例如事務T0

和T1

(T0

在T1之前執(zhí)行): T0:read(A) T1

:read(C) A:=A-50

C:=C-100 Write(A) write(C) read(B) B:=B+50 write(B)延遲更新數(shù)據(jù)庫下面給出了日志在三個時刻的情形如果故障發(fā)生時穩(wěn)定存儲器上的日志處于情形:

(a)不需重做(因為工賬并未實施) (b)必須執(zhí)行redo(T0),因為存在<T0commit> (c)必須先執(zhí)行redo(T0)再執(zhí)行redo(T1),因為存在<T0commit>

和<Ticommit>立即更新數(shù)據(jù)庫立即更新數(shù)據(jù)庫方案允許一個未提交事務在發(fā)出寫操作時更新數(shù)據(jù)庫由于可能需要取消操作,日志記錄必須包含舊值和新值對更新的日志記錄必須在數(shù)據(jù)庫數(shù)據(jù)修改之前寫入穩(wěn)定存儲器假設日志記錄直接輸出到穩(wěn)定存儲器可以擴展成推遲日志記錄的輸出,只要滿足:在執(zhí)行數(shù)據(jù)塊B的output(B)操作之前,所有對應于B

的日志記錄都已寫入穩(wěn)定存儲器被更新塊的輸出可能發(fā)生在事務提交之前或之后的任何時刻塊輸出的次序可以與修改塊的次序不同立即更新數(shù)據(jù)庫恢復過程需要兩種操作:undo(Ti)復原所有被Ti更新了的數(shù)據(jù)項的舊值,從Ti的最后一條日志記錄向前進行(后退)redo(Ti)將所有被Ti更新了的數(shù)據(jù)項的值置為新值,從Ti的第一條日志記錄向后進行(朝前)兩種操作都必須是冪等的亦即,即使該操作被執(zhí)行多次,效果與只執(zhí)行一次是一樣的由于在恢復過程中可能重復執(zhí)行這兩個操作,所以需要此性質立即更新數(shù)據(jù)庫故障后恢復如果日志中包含<Tistart>但不包含<Ti

commit>,則事務Ti

需要取消(undo)(即日志中有頭無尾時要復舊(急性子將有的麻煩))如果日志中包含<Tistart>和<Ti

commit>,則事務Ti

需要重做(redo)(即日志中有頭有尾時,需要重作)取消(undo)操作先做,重做(redo)操作后做立即更新數(shù)據(jù)庫的例子檢查點前述恢復過程的問題搜索整個日志很費時可能不必要地重做事務(已經(jīng)將更新輸出到數(shù)據(jù)庫的事務)通過周期性地執(zhí)行檢查點來精簡恢復過程將當前駐留在主存中的所有日志記錄輸出到穩(wěn)定存儲器將所有更新過的緩沖塊輸出到磁盤將一條<checkpoint>日志記錄寫入穩(wěn)定存儲器檢查點恢復時僅需考慮在檢查點之前啟動的最近的事務Ti,以及在Ti之后啟動的事務從日志末尾反向掃描,找到最近的<checkpoint>記錄繼續(xù)反向掃描直到找到一個<Ti,start>記錄僅需考慮日志中在上面這個啟動記錄之后的部分,之前的部分在恢復時可以忽略,如果愿意可以隨時刪除對所有沒有<Ticommit>的事務(Ti

或更晚),執(zhí)行undo(Ti)(僅當立即更新數(shù)據(jù)庫時才如此)正向掃描日志,對所有有<Ticommit>的事務(Ti

或更晚),執(zhí)行redo(Ti)檢查點的例子T1

可以忽略(由于檢查點,其更新已經(jīng)輸出到磁盤)T2

T3

重做T4

取消TcTfT1T2T3T4檢查點系統(tǒng)故障影子頁面(ShadowPaging)Idea保持真身和影子,影子在非易逝內存事務前留影,事務中影子不被修改事務成功,則影子與時俱進事務失敗,則恢復到影子技巧影子不是數(shù)據(jù)頁面的副本,只要頁面指針表的副本SamplePageTableExampleofShadowPagingShadowandcurrentpagetablesafterwritetopage4ShadowPagingTocommitatransaction:事務提交時1.Flush變動的值塊到磁盤2.Outputcurrentpagetabletodisk當前表寫盤3.Makethecurrentpagetablethenewshadowpagetable,asfollows:keepapointertotheshadowpagetableatafixed(known)locationondisktomakethecurrentpagetablethenewshadowpagetable,simplyupdatethepointertopointtocurrentpagetableondiskShadowPagingOncepointertoshadowpagetablehasbeenwritten,transactioniscommittedNorecoveryisneededafteracrash—newtransactionscanstartrightaway,usingtheshadowpagetable出錯后復舊pCurrPag=pShadowPagesnotpointedtofromcurrent/shadowpagetableshouldbefreed(garbagecollected)釋放不用的指針ShowPaging(11/4)Advantages優(yōu)點nooverheadofwritinglogrecords(快)recoveryistrivial(易恢復)ShowPagingDisadvantages:缺點Copyingtheentirepagetableisveryexpensive復制原指針表費時CanbereducedbyusingapagetablestructuredlikeaB+-treeNoneedtocopyentiretree,onlyneedtocopypathsinthetreethatleadtoupdatedleafnodes可用B樹解決,只復制路徑Commitoverheadishighevenwithaboveextension提交花銷大Needtoflusheveryupdatedpage,andpagetableDatagetsfragmentedrelatedpagesgetseparatedondisk磁盤上數(shù)據(jù)不連續(xù)Aftereverytransactioncompletion,thedatabasepagescontainingoldversionsofmodifieddataneedtobegarbagecollected需要回收內存空間Hardtoextendalgorithmtoallowtransactionstorunconcurrently并發(fā)程序難寫并發(fā)事務的恢復對基于日志的恢復方案擴展,以允許多個事務并發(fā)執(zhí)行簡化模型的3個假定:所有事務共享單個磁盤緩沖區(qū)和單個日志緩沖塊可以具有被一個或多個事務更新的數(shù)據(jù)項假設使用嚴格兩階段鎖的并發(fā)控制,即未提交事務所做的更新對其他事務是不可見的否則如果T1更新A,然后T2又更新A并提交,最后T1必須夭折時,該怎樣執(zhí)行undo?日志登記與前面描述的相同

不同事務的日志記錄在日志中可能交錯分布檢查點技術及恢復時的操作必須改變并發(fā),多檢查點導致復雜并發(fā)事務的恢復檢查點的執(zhí)行基本同前,除了檢查點日志記錄現(xiàn)在形如

<checkpoint,L>其中

L

是檢查點時活躍事務列表假設當執(zhí)行檢查點時沒有正在進行的更新操作(以后對此可放寬)當系統(tǒng)從崩潰恢復時,首先做下列動作:初始化undo-list

和redo-list

為空從末尾開始反向掃描日志,直至發(fā)現(xiàn)第一條<checkpointL>記錄時停止.對在反向掃描過程中發(fā)現(xiàn)的每條記錄:如果是<Ti

commit>,將Ti

加入redo-list如果是<Ti

start>,并且Ti不在redo-list中,則將Ti加入undo-list對L中的每個Ti,如果Ti不在redo-list中,則將Ti

加入undo-list并發(fā)事務的恢復至此undo-list

包含必須取消操作的未完成事務,而redo-list

包含必須重做的已完成事務現(xiàn)在恢復可以如下繼續(xù):從最近的記錄開始反向掃描日志,對undo-list中的每個Ti當遇到<Ti

start>記錄時停止掃描過程中,對屬于undo-list中事務的每條日志記錄執(zhí)行undo定位于最近的<checkpointL>記錄從<checkpointL>記錄開始正向掃描日志,直至日志末尾掃描過程中,對屬于redo-list中事務的每條日志記錄執(zhí)行redo恢復例對下列日志按照恢復算法的步驟走一遍:日志記錄緩沖模型約定日志記錄緩沖:在主存中緩沖日志記錄,而不是直接輸出到穩(wěn)定存儲器當緩沖區(qū)中的日志記錄塊滿時,或者當執(zhí)行日志強制(logforce)操作時,將日志記錄輸出到穩(wěn)定存儲器日志強制用來提交事務,方法是將它的日志記錄(包括提交記錄)強行輸出到穩(wěn)定存儲器由于一次輸出操作可以輸出多條日志記錄,從而減少I/O代價日志記錄緩沖日志緩存的規(guī)則日志記錄按其創(chuàng)建次序輸出到穩(wěn)定存儲器僅當<Ti

,commit>日志記錄已經(jīng)輸出到穩(wěn)定存儲器,事務Ti

才進入提交狀態(tài)在主存中的數(shù)據(jù)塊輸出到數(shù)據(jù)庫之前,所有屬于該塊中數(shù)據(jù)的日志記錄必須已經(jīng)輸出到穩(wěn)定存儲器這條規(guī)則稱為先寫日志(write-aheadlogging)或WAL規(guī)則嚴格地說,WAL只需要輸出undo信息數(shù)據(jù)庫緩沖數(shù)據(jù)庫維護一個數(shù)據(jù)塊在內存中的緩沖區(qū)當需要新塊時,如果緩沖區(qū)已滿,則需要將一個現(xiàn)有塊移出緩沖區(qū)如果所選的移出塊被更新過(“臟塊”),則必須將它輸出到磁盤由于先寫日志規(guī)則,如果要將一個帶有未提交更新的塊輸出到磁盤,必須先將記有那些更新的取消(undo)信息的日志記錄輸出到穩(wěn)定存儲器上的日志中數(shù)據(jù)庫緩沖當一個塊輸出到磁盤時,其上不能有正在進行的更新,可以用下列方法確保這一點寫數(shù)據(jù)項之前,事務在包含數(shù)據(jù)項的塊上獲得排它鎖一旦完成寫,鎖即可釋放這種持有時間很短的鎖稱為latch

(插銷)在一個塊輸出到磁盤之前,系統(tǒng)在該塊上獲得排他插銷確保該塊上沒有更新正在進行數(shù)據(jù)庫緩沖數(shù)據(jù)庫緩沖區(qū)的實現(xiàn)可以是在實際主存中為數(shù)據(jù)庫保留的一個區(qū)域在操作系統(tǒng)虛擬內存中在保留的主存中實現(xiàn)緩沖區(qū)有下列缺點:預先劃分內存給數(shù)據(jù)庫緩沖區(qū),限制了靈活性需求可能改變,盡管操作系統(tǒng)最清楚在什么時刻應該如何分配內存,它卻不能改變內存分配數(shù)據(jù)庫緩沖數(shù)據(jù)庫緩沖區(qū)可以在操作系統(tǒng)的虛擬內存中實現(xiàn),盡管有下列缺點:當操作系統(tǒng)需要移出一個被更新過的頁以便為另一頁騰出空間時,該頁被寫到磁盤上的交換區(qū)(swapspace)當數(shù)據(jù)庫決定將緩沖頁寫到磁盤時,緩沖頁可能在交換區(qū)中,于是可能必須從磁盤上的交換區(qū)讀入該頁再輸出到磁盤上的數(shù)據(jù)庫中,導致一次多余的I/O!稱為dualpaging

問題當換出數(shù)據(jù)庫緩沖頁時,理想的情況是操作系統(tǒng)將控制交給數(shù)據(jù)庫,由數(shù)據(jù)庫輸出該頁到數(shù)據(jù)庫中而不是交換區(qū)中(首先確保輸出日志記錄)因此Dualpaging被避免,但通常的操作系統(tǒng)不支持此功能缺點:間接管理,有extraI/O!非易失性存儲器丟失信息的故障到目前為止我們都假定非易失性存儲器不會丟失信息使用類似檢查點的技術來處理非易失性存儲器的丟失信息周期性地轉儲(dump)數(shù)據(jù)庫的全部內容到穩(wěn)定存儲器轉儲過程中不能有活躍事務:必須有一個類似于檢查點的過程輸出當前在主存中的所有日志記錄到穩(wěn)定存儲器輸出所有緩沖塊到磁盤復制數(shù)據(jù)庫內容到穩(wěn)定存儲器輸出<dump>記錄到穩(wěn)定存儲器的日志中從磁盤故障恢復時從最近的轉儲恢復數(shù)據(jù)庫查看日志并重做所有在轉儲后提交的事務可以擴展以允許轉儲過程中有活躍事務:稱為模糊轉儲(fuzzydump)或聯(lián)機轉儲(onlinedump)AdvancedRecoveryAlgorithmAdvancedRecoveryTechniquesSupporthigh-concurrencylockingtechniques,suchasthoseusedforB+-treeconcurrencycontrol支持并行B-樹OperationslikeB+-treeinsertionsanddeletionsreleaselocksearlyTheycannotbeundonebyrestoringoldvalues(physicalundo),sinceoncealockisreleased,othertransactionsmayhaveupdatedtheB+-tree.簡單的物理恢復不能復舊,因為其他事務可能干擾Instead,insertions(resp.deletions)areundonebyexecutingadeletion(resp.insertion)operation(knownaslogicalundo)因此,采用邏輯復舊AdvancedRecoveryTechniquesForsuchoperations,undologrecordsshouldcontaintheundooperationtobeexecutedcalledlogicalundologging(邏輯撤消日志)incontrasttophysicalundologging(物理撤消日志)AdvancedRecoveryTechniquesRedoinformationisloggedphysically(thatis,newvalueforeachwrite)evenforsuchoperationsLogicalredoisverycomplicated,sincedatabasestateondiskmaynotbe“operationconsistent”由于磁盤上的數(shù)據(jù)庫狀態(tài)可能不是”操作一致的”,故邏輯重做比物理重做復雜的多AdvancedRecoveryTechniquesOperationloggingisdoneasfollows:Whenoperationstarts,log<Ti,Oj,operation-begin>,HereOjisauniqueidentifieroftheoperationinstanceWhileoperationisexecuting,normallogrecordswithphysicalredoandphysicalundoinformationareloggedWhenoperationcompletes,<Ti,Oj,operation-end,U>islogged,whereUcontainsinformationneededtoperformalogicalundo

U中包含完成邏輯撤消所需要的信息AdvancedRecoveryTechniquesIfcrash/rollbackoccursaftertheoperationcompletes:theoperation-endlogrecordisfound,andinthiscaselogicalundoisperformedusingundoinformation

Uphysicalundoinformationfortheoperationisignored(忽略)Redoofoperation(aftercrash)stillusesphysicalredoinformationAdvancedRecoveryTechniquesRollbackoftransactionTi

isdoneasfollows:Scanthelogbackwards

Ifalogrecord<Ti,X,V1,V2>isfound,performtheundoandlogaspecialredo-onlylogrecord<Ti,X,V1>.Ifa<Ti,Oj,operation-end,U>recordisfoundRollbacktheoperationlogicallyusingtheundoinformationUUpdatesperformedduringrollbackareloggedjustlikeduringnormaloperationexecutionAttheendoftheoperationrollback,insteadoflogginganoperation-endrecord,generatearecord<Ti,Oj,operation-abort>SkipallprecedinglogrecordsforTiuntiltherecord<Ti,Oj,operation-begin>isfoundAdvancedRecoveryTechniquesIfaredo-onlyrecordisfoundignoreitIfa<Ti,Oj,operation-abort>recordisfound:skipallprecedinglogrecordsforTi

untiltherecord

<Ti,Oj,operation-begin>isfoundStopthescanwhentherecord<Ti,start>isfoundAdda<Ti,abort>recordtothelogAdvancedRecoveryTechniquesSomepointstonote:Cases3and4abovecanoccuronlyifthedatabasecrasheswhileatransactionisbeingrolledback當事務回滾期間發(fā)生系統(tǒng)崩潰導致Cases3and4出現(xiàn)Skippingoflogrecordsasincase4isimportanttopreventmultiplerollbackofthesameoperation避免同一個操作被多次撤消AdvancedRecoveryTechniquesThefollowingactionsaretakenwhenrecoveringfromsystemcrashScanlogforwardfromlast<checkpointL>recordRepeathistorybyphysicallyredoingallupdatesofalltransactionsCreateanundo-listduringthescanasfollowsundo-listissettoLinitiallyWhenever<Ti,start>isfoundTi

isaddedtoundo-listWhenever<Ti,commit>or<Ti,abort>isfound,Tiisdeletedfromundo-listThisbringsdatabasetostateasofcrash,withcommittedaswellasuncommittedtransactionshavingbeenredoneNowundo-listcontainstransactionsthatareincomplete,thatis,haveneithercommittednorbeenfullyrolledbackAdvancedRecoveryTechniquesScanlogbackwards,performingundoonlogrecordsoftransactionsfoundinundo-listTransactionsarerolledbackasdescribedearlierWhen<Ti,start>isfoundforatransactionTiinundo-list,writea<Ti,abort>logrecordStopscanwhen<Ti,start>recordshavebeenfoundforallTiinundo-listThisundoestheeffectsofincompletetransactions(thosewithneithercommitnorabortlogrecords)RecoveryisnowcompleteAdvancedRecoveryTechniquesCheckpointingisdoneasfollows:OutputalllogrecordsinmemorytostablestorageOutputallmodifiedbufferblockstostablestorageOutputa<checkpoint,L>recordtologonstablestorageTransactionsarenotallowedtoperformanyactionswhilecheckpointingisinprogressFuzzycheckpointingallowstransactionstoprogresswhilethemosttimeconsumingpartsofcheckpointingareinprogressPerformedasdescribedonnextslideAdvancedRecoveryTechniquesFuzzycheckpointingisdoneasfollows:TemporarilystopallupdatesbytransactionsWritea<checkpoint,L>logrecordandforcelogtostablestorageNotelistMofmodifiedbufferblocksNowpermittransactionstoproceedwiththeiractionsOutputallmodifiedbufferblocksinlistMtodiskblocksshouldnotbeupdatedwhilebeingoutputFollowWAL:alllogrecordspertainingtoablockmustbeoutputbeforetheblockisoutputStoreapointertothecheckpointrecordinafixedpositionlast_checkpointondiskAdvancedRecoveryTechniquesWhenrecoveringusingafuzzycheckpoint,startscanfromthecheckpointrecordpointedtobylast_checkpointLogrecordsbeforelast_checkpointhavetheirupdatesreflectedindatabaseondisk,andneednotberedoneIncompletecheckpoints,wheresystemhadcrashedwhileperformingcheckpoint,arehandledsafelyARIESRecoveryAlgorithmIntroductionNameAlgorithmforRecoveryandIsolationExploitingSemantics(ARIES)參考文獻:ARIES:ATransactionRecoveryMethodSupportingFine-GranularityLockingandPartialRollbacksUsingWrite-AheadLoggingC.MOHANIBMAlmadenResearchCenterDONHADERLEIBMSantaTeresaLaboratoryBRUCELINDSAY,HAMIDPIRAHESHandPETERSCHWARZIBMAlmadenResearchCenterIntroductionIntroductionARIESisastateoftheart(最新成果)recoverymethodIncorporatesnumerousoptimizationstoreduceoverheadsduringnormalprocessingandtospeeduprecovery核心思想:融合多種優(yōu)化技術,減少開銷,即盡量不作多余的事The“advancedrecoveryalgorithm”westudiedearlierismodeledafterARIES,butgreatlysimplifiedbyremovingoptimizationsIntroductionSupports

levellogicallogging

Updatesinplace就地更新(asopposedtomulti-versioning)

Record-levellocking行級鎖

Partialandtotalrollbacks部分及完全回滾

Mediarecovery

介質故障恢復

Nestedtransactions嵌套事務IntroductionAdvantages:優(yōu)點(1)Allowsfine-grainconcurrency(2)Usesonlyonetypeoflogtorecordlevelactions(3)Comparedtophysicallogging,producessmalllogfiles(4)Allowsflexiblebuffermanagement(no-force,steal)(5)Fastcheckpointingwithminimaldelayoftransactions(6)Allowsrelativelyfastrestart(7)Nolockingordeadlocksduringrestartorrollbacks(8)Avoidsundoingthesamelogrecordmultipletimes(9)Incurs招致onlyasmalloverheadperpage(pageLSN)(10)Allowsflexiblestructureswithgoodstoragereclamation(回收)IntroductionDisadvantage:缺點Verytrickyimplementation實現(xiàn)非常有技巧比較難理解ARIES:AssumptionsConcurrencycontrolisineffectStrict2PL,inparticularUpdatesarehappening“inplace”(就地更新)i.e.dataisoverwrittenonthediskHandlingtheBufferPoolBufferpoolisfinite,so…Issue:Howdoweguaranteedurabilityofcommitteddata?Solution:Policyonwhathappenswhenatransactioncompletes,whattransactionscandotogetmorepagesHandlingtheBufferPoolForceeverywritetodisk?優(yōu)點:Providesdurability(保證持久性)缺點:ButPoorresponsetime(響應時間差)Steal(偷竊)buffer-poolframesfromuncommitedXacts?Ifnot,poorthroughput(如果不允許,則吞吐率差)Ifso,howcanweensureatomicity?(如果允許,如何保證原子性)MoreonStealandForceNOFORCE(whyenforcingDurabilityishard)Whatifsystemcrashesbeforeamodifiedpageiswrittentodisk?Writeaslittleaspossible,inaconvenientplace,atcommittimetosupportREDOingmodifications為支持REDO,在提交時,盡可能少寫(在方便的地方)STEAL(whyenforcingAtomicityishard)TostealframeF:CurrentpageinF(sayP)iswrittentodisk;someXactholdslockonPWhatiftheXactwiththelockonPaborts?MustremembertheoldvalueofPatstealtimetosupportUNDOingthewritetopageP為支持UNDO,必須記錄偷竊時的舊值ARIESLoggingARIESLog:AnorderedlistofREDO/UNDOactionsRecordREDOandUNDOinformation,foreveryupdate,inalogLogrecordcontains:<TID,pageID,offset,length,olddata,newdata>Sequentialwritestolog(putitonaseparatedisk)按順序寫日志Minimalinfo(diff)writtentolog,somultipleupdatesfitinasinglelogpage將最少量的信息寫入日志,這樣多個更新可以放入一個日志頁中ARIESWrite-AheadLogging(WAL)TheWrite-AheadLoggingProtocol:Mustforcethelogrecordforanupdatebeforethecorrespondingdatapagegetstodisk在數(shù)據(jù)頁寫入磁盤之前,必須強制將日志記錄寫盤guaranteesAtomicity保證原子性MustwritealllogrecordsforaXactbeforecommit在事務提交之前必須將其所有日志記錄寫盤guaranteesDurability保證持久性ARIESOptimizationsUnliketheadvancedrecoveryalgorithm(ARIES特色):

Useslogsequencenumber(LSN)toidentifylogrecords用日志順序號(LSN)標識日志記錄StoresLSNsinpagestoidentifywhatupdateshavealreadybeenappliedtoadatabasepage在數(shù)據(jù)頁中存儲LSNs表示已經(jīng)用于該數(shù)據(jù)頁的更新SupportPhysic-logicalredo支持物理邏輯重作Dirtypagetabletoavoidunnecessaryredosduringrecovery使用”臟”頁表,避免恢復過程中不必要的重作Fuzzycheckpointingthatonlyrecordsinformationaboutdirtypages,anddoesnotrequiredirtypagestobewrittenoutatcheckpointtimeARIESOptimizationsPhysical-logicalredo(物理-邏輯重做)Affectedpageisphysicallyidentified,actionwithinpagecanbelogical受影響的頁被物理標識,而頁中的操作可以是邏輯的Usedtoreduceloggingoverheads用于減少日志費用e.g.whenarecordisdeletedandallotherrecordshavetobemovedtofillholePhysic-logicalredocanlogjusttherecorddeletionPhysicalredowouldrequireloggingofoldandnewvaluesformuchofthepageARIESOptimizationsPhysical-logicalredo(物理邏輯重做)Requirespagetobeoutputtodiskatomically需要頁面可以自動輸出到磁盤EasytoachievewithhardwareRAID可以通過硬件RAID實現(xiàn)Incompletepageoutputcanbedetectedbychecksumtechniques不完整的頁面輸出可以通過校驗和檢測到Butextraactionsarerequiredforrecovery但恢復時需要額外操作Treatedasamediafailure作為介質故障處理ARIESDataStructuresLogsequencenumber(LSN)identifieseachlogrecordMustbesequentiallyincreasing(順序遞增)Typicallyanoffset(偏移量)frombeginningoflogfiletoallowfastaccessEasilyextendedtohandlemultiplelogfilesARIESDataStructuresEachpagecontainsaPageLSNwhichistheLSNofthelastlogrecordwhoseeffectsarereflectedonthepageToupdateapage:X-latchthepage,andwritethelogrecordUpdatethepageRecordtheLSNofthelogrecordinPageLSNUnlockpagePageflushtodisk:S-latchespageThuspagestateondiskisoperationconsistentRequiredtosupportphysical-logicalredoPageLSNisusedduringrecoverytopreventrepeatedredo恢復期間使用PageLSN可以避免多次redoThusensuringidempotence(等冪性)ARIESDataStructuresEachlogrecordcontainsLSNofpreviouslogrecordofthesametransaction(prevLSN)

LSNinlogrecordmaybeimplicit(隱含的)Specialredo-only(僅重做)logrecordcalledcompensationlogrecord(補償日志記錄)(CLR)usedtologactionstakenduringrecoverythatneverneedtobeundoneAlsoservetheroleofoperation-abortlogrecordsusedinadvancedrecoveryalgorithm還起到高級恢復算法中operation-abortlogrecords的作用HaveafieldUndoNextLSNtonotenext(earlier)recordtobeundoneRecordsinbetweenwouldhavealreadybeenundoneRequiredtoavoidrepeatedundoofalreadyundoneactionsLSNTransIdPrevLSNRedoInfoUndoInfoLSNTransIDUndoNextLSNRedoInfoARIESDataStructuresARIESDataStructuresARIESDataStructuresDirtyPageTable臟頁表ListofpagesinthebufferthathavebeenupdatedContains,foreachsuchpagePageLSNofthepageRecLSNisanLSNsuchthatlogrecordsbeforethisLSNhavealreadybeenappliedtothepageversionondiskSettocurrentendoflogwhenapageisinsertedintodirtypagetable(justbeforebeingupdated)Recordedincheckpoints,helpstominimizeredoworkARIESDataStructuresCheckpointlogrecordContains:DirtyPageTableandlistofactivetransactionsForeachactivetransaction,LastLSN,theLSNofthelastlogrecordwrittenbythetransactionFixedpositionondisknotesLSNoflastcompletedcheckpointlogrecord(masterrecord)ARIESDataStructuresARIESCheckpointing(11/9)Periodically,theDBMScreatesacheckpoint,inordertominimizethetimetakentorecoverintheeventofasystemcrashWritetolog:begin_checkpointrecord:Indicateswhenchkptbeganend_checkpointrecord:ContainscurrentXacttableanddirtypagetable,Thisisa`fuzzycheckpoint’:OtherXactscontinuetorunsothesetablesaccurateonlyasofthetimeofthebegin_checkpointrecordNoattempttoforcedirtypagestodiskeffectivenessofcheckpointlimitedbyoldestunwrittenchangetoadirtypageSoit’sagoodideatoperiodicallyflushdirtypagestodisk!StoreLSNofchkptrecordinasafeplace(masterrecord)TheBigPicture:

What’sStoredWhereLogRecordStructureLogrecordscontainfollowingfieldsLSNType(CLR,update,special)TransIDPrevLSN(LSNofprevrecordofthistxn)PageID(forupdate/CLRs)UndoNxtLSN(forCLRs)indicateswhichlogrecordisbeingcompensatedonlaterundos,logrecordsuptoUndoNxtLSNcanbeskippedData(redo/undodata);canbephysicalorlogicalTransactionTableStoresforeachtransaction:TransID,StateLastLSN(LSNoflastrecordwrittenbytxn)UndoNxtLSN(nextrecordtobeprocessedinrollback)Duringrecovery:initializedduringanalysispassfrommostrecentcheckpointmodifiedduringanalysisaslogrecordsareencountered,andduringundoDirtyPagesTableDuringnormalprocessing:WhenpageisfixedwithintentiontoupdateLetL=currentend-of-logLSN(theLSNofnextlogrecordtobegenerated)ifpageisnotdirty,storeLasRecLSNofthepageindirtypagestableWhenpageisflushedtodisk,deletefromdirtypagetableDirtypagetablewrittenoutduringcheckpointThusRecLSNisLSNofearliestlogrecordwhoseeffectisnotreflectedinpageondiskDirtyPagesTableDuringrecoveryloaddirtypagetablefromcheckpointupdatedduringanalysispassasupdatelogrecordsareencounteredNormalExecutionofanXactSeriesofreads&writes,followedbycommitorabortWewillassumethatwriteisatomicondiskInpractice,additionaldetailstodealwithnon-atomicwritesStrict2PLSTEAL,NO-FORCEbuffermanagement,withWrite-AheadLoggingTransactionAbortFornow,consideranexplicitabortofaxacte.g.,validationerror,deadlock;nocrashinvolvedPlaybacktheloginreverseorder,UNDOingupdates:getlastLSNofxactfromxacttablecanfollowchainoflogrecordsbackwardviatheprevLSNfieldcanwedosoincrash-recoveryundoing?beforestartingundo,writeanabortlogrecordforrecoveringfromcrashduringundoTransactionAbortToperformUNDO,musthavealockondatanoproblem(strictlocking)Beforerestoringoldvalueofapage,writeaCLR:youcontinueloggingwhileyouundoCLRhasoneextrafield:undoNextLSNpointstothenextLSNtoundo(i.e.theprevLSNoftherecordwe’recurrentlyundoing)CLRsneverundone(butmightberedonewhenrepeatinghistoryafteranothercrash)AtendofUNDO,writean“End”logrecord10198T1abortT1:lastLSN=101120CLRundo101undonextLSN=98(T1:lastLSN=120)TransactionAbortWhenanundoisperformedforanupdatelogrecordGenerateaCLRcontainingtheundoactionperformed(actionsperformedduringundoareloggedphysicalyorphysi-logically)CLRforrecordnnotedasn’infigurebelowSetUndoNextLSNoftheCLRtothePrevLSNvalueoftheupdatelogrecordArrowsindicateUndoNextLSNvalueTransactionAbortARIESsupportspartial(部分)rollbacke.g.UsedtohandledeadlocksbyrollingbackjustenoughtoreleasereqdlocksFigureindicatesforwardactionsafterpartialrollbacksrecords3and4initially,later5and6,thenfullrollback12344'3'565'2'1'6'TransactionCommitWritecommitrecordtologAlllogrecordsuptoXact’slastLSNareflushedGuaranteesthatflushedLSN≥lastLSNNotethatlogflushesaresequential,synchronouswritestodiskManylogrecordsperlogpageCommit()returnsWriteendrecordtologARIESRecoveryAlgorithmARIESrecoveryinvolvesthreepassesAnalysispassRedopassUndopassCrashRecovery:BigPictureCrashRecoveryvs.TransactionAbort?Whatarethedifferences?Abortstateinmemory,thenundoonexactRecoveryreconstructstate,thenundoalluncommittedxactreconstruction:analysis+redoundo:mustconsiderglobalorderingofundosARIESRecovery:AnalysisPhaseDeterminesWhichtransactionstoundoWhichpagesweredirty(diskversionnotuptodate)attimeofcrashRedoLSN:LSNfromwhichredoshouldstartARIESRecovery:AnalysisPhaseStartsfromlastcompletecheckpointlogrecordReadsinDirtyPageTablefromlogrecordSetsRedoLSN=minofRecLSNsofallpagesinDirtyPageTableIncasenopagesaredirty,RedoLSN=checkpointrecord’sLSNSetsUndo-list=listoftransactionsincheckpointlogrecordReadsLSNoflastlogrecordforeachtransactioninundo-listfromcheckpointlogrecordARIESRecovery:AnalysisPhaseScansforwardfromcheckpointIfanylogrecordfoundfortransactionnotinundo-list,addstransactiontoundo-listWheneveranupdatelogrecordisfoundIfpageisnotinDirtyPageTable,itisaddedwithRecLSNsettoLSNoftheupdatelogrecordIftransactionendlogrecordfound,deletetransactionfromundo-listKeepstrackoflastlogrecordforeachtransactioninundo-listMaybeneededforlaterundoARIESRecovery:AnalysisPhaseAtendofana

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經(jīng)權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
  • 6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論