




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
方法作者:XuWang,Hailong摘CAPPACELC宣稱在分布式副本系統(tǒng)中存在一些性能評量的折RSMRSM-d,我們建立了概率模型去量化一致性和延遲性。一致性和延遲性都受d的影響,而d表示協(xié)關(guān)鍵詞:一致性,寫沖突,延遲性, 引[4]進一步解釋了CAP,并提出了一致性與可用性,一致性與延遲之間的兩種權(quán)衡。系統(tǒng)總是可寫的并保持最終一致性,如Dynamo[5],Cassandra[6]。然而其他系統(tǒng)如Chubby[7],Zookeeper[8]必須有段很短的時間來從錯誤中進行恢復來保證各副本管弱一致性已經(jīng)在一些商業(yè)系統(tǒng)中運用并且被許多最求高可用性和低延遲的開發(fā)者bing搜索業(yè)務(wù),21.8%4.3%;由于延時從400ms增加到900ms,谷歌在搜索領(lǐng)域的業(yè)務(wù)量下降了25%;100ms的延遲增加使亞馬遜的銷售額下降了一個百分點[13]。因此在系統(tǒng)設(shè)計中延遲是一項很重要的考慮RSMs[14]Quorum[15]系統(tǒng)是分布式系統(tǒng)和數(shù)據(jù)庫領(lǐng)域中比較流行的兩種容WR兩種副本集合。W集合用于寫操作,R用于讀操作。在隨機時間通過RSMsquorum系統(tǒng)提供RSMsquorumRSMs的寫RSMs一致性和延遲之間,基于不一致性的定量結(jié)果做出一個合理理性的權(quán)衡。束[19,20]和限制副本的寫分離[21,22]RSMs的寫不在這篇論文中,提出了一個定量的概率分析方法來計算RSMs的不一致性和延一點,調(diào)查和分析了三種代表性的RSM實現(xiàn),分別為非一致全序廣播[23,24],分調(diào)查和分析[27-29]。然后,我們提出了一種通用系統(tǒng)模型RSM-d,d的取值范圍為[1-n](n為副本數(shù)。D表示提交一個寫操作之前協(xié)調(diào)者必須收到的ACKs數(shù)。模型表明(1)在d∈[1,[n/2]+1]時一致性隨著d的增加而增強(2)當d大于[n/2]+1d的增加而單調(diào)遞增。第四,將寫不一致性的量化結(jié)RSM-d的延遲聯(lián)系起來,RSM-d的基礎(chǔ)上,我們?yōu)閷憶_突構(gòu)建了一個概率模型,寫沖突是造成不一致性都受到一個共同因素的影響,這個共同因素就是d。 背RSMs的背景,RSMs致力于容錯在分布式系統(tǒng)中得到ChandraToueg[30]已經(jīng)證明全序廣播和分布一致性問題是可解決的,在這次研究中,我們也遵循這些假設(shè)。這就意味著我們討論的RSMs有2f+1個副本,能f個崩潰錯誤,并且最終能夠偵測到崩潰的節(jié)點。然而,實際的錯誤探測器在一定時間內(nèi)必須輸出是否崩潰的結(jié)果,而不是長時間等待一個最終結(jié)論。[31]將在[23-29]調(diào)查中,RSMs通過非一致全序廣播,分布式一致和一致全序廣播實現(xiàn)。如圖2.1所示,這三種具有代表性的RSM實現(xiàn)時這樣實現(xiàn)的:一個獨一的序列號并且將其提交給所有的副本。當寫操作消息抵達副本,副本會記錄下這個寫操作,并且根據(jù)序列號進行提交,然后回復協(xié)調(diào)者。只要協(xié)調(diào)者受到第一個回復,它就會向客戶端發(fā)送響應(yīng)信息。任意時刻如果協(xié)調(diào)者被認RSMs的分布式一致,一旦協(xié)調(diào)者收到一個寫操作命令,首先會1[n/2]+1n RSM-d系統(tǒng)模型,這個模型RSM模型的抽象表示。因RSMs,通過調(diào)查研究它們來獲取盡管已經(jīng)在第二部分介紹過了全序廣播算法,也僅是全序廣播綜合調(diào)查結(jié)果5外兩種是基于全序邏輯時鐘[16]的歷史交流和目的地協(xié)議實現(xiàn)。RSM和分布式一致的解決方案都是目的地協(xié)議。Paxos用于對單個寫操作排序的協(xié)商,multi-paxos針3.1RSM-dPaxos算法進行全序廣播。原因是:1)固定的定序器算法在所有的全序廣播里的延遲是最少的。2)Paxos在3)在動態(tài)定序器中的標示符,基于特權(quán)的算法,在通信Paxos算法相結(jié)合建立一個統(tǒng)一的系統(tǒng)RSM-d。這是切實可行的,因為全序廣播和分布式一致是基本相等的。如圖2dd屬于[1;n]d副本也包括了協(xié)調(diào)器,也就是一果d=1,RSM-d降低為不均一全序廣播,協(xié)調(diào)器可以直接進入提交階段。如果d=[n=2]+1,RSM-dPaxosd=n,RSM-d變成為一致的全序廣播。我們不只是涉及到了這三個代表性的RSMs,而且將深入探討其他的在過去被忽視的RSM-dRSMRSMs的假設(shè)。RSMs假設(shè)同時崩潰的效檢測器將開始選擇新的協(xié)調(diào)器,并且只在歷史記錄的基礎(chǔ)上更改錯誤。在RSM-d有多個協(xié)調(diào)器,如Multi-RingPaxos,他與Paxos相協(xié)調(diào)。 寫沖突概RSM-d,如ACK數(shù)??≥[??/2]+1,所有的寫計算被提交到他們普通序和障修復后,f??[??/2]有故障出現(xiàn),副本仍將保持一致性。然后如果??[??/21,ACK消息的圖4.1顯示了當至少d節(jié)點接收到了“PROPOSE產(chǎn)生故障時,寫日志丟失的方案。假設(shè)????是一個寫計算所有日志丟失的可能。如d(b)之下,????=????。4.1因為歷史記錄只寶庫了日志歷史,所以????也是在歷史記錄中的一個寫計算丟????= 如今我們不考慮條件(c,但(a)和(b)仍保留。這意味著我們允許提交的)????????=∑??????1????????[??????=????????????= ∑ ??????1???????? 現(xiàn)在我們進一步的不考慮(b),并只考慮(a。因此我們應(yīng)該考慮到不同步的度作用是隨機數(shù)據(jù)下的????????。為了便捷,我們用以下公式表示:????=??????∑??????1?????d<x<n。明顯的可以看出方程式(2)可以被寫成??????????。當??????時,寫沖突的條件幾率也是如此。因此我們可以通過對每一個可能的??=??條件幾率之??????=∑??????????
=∑??????????????∑??????1????? 如上述的,在不同條件下協(xié)調(diào)器選擇和故障恢復,等式(1(2)和(3)都代??????= =∑??????????4.3圖4.3顯示了寫重復的場景。當一個不準確的故障檢測器對協(xié)調(diào)者存在錯誤懷疑,至少存在一個調(diào)解節(jié)點。然而,如果dn2Pno表示沒有相同節(jié)點的回復確認消息的兩個節(jié)點集合的概率。除去兩個協(xié)調(diào)者本身,其他兩個隨機(d1)節(jié)點集可能不重疊,且都是不同的節(jié)點。因此,Pno是兩個隨機(d1)節(jié)點集數(shù),沒有重疊和排除兩個協(xié)調(diào)者除以所有兩個隨機節(jié)點集(??? ??????
=??? ??? ???
?????????圖4.4所示,主要工作如下:在時距Te4.4分布函數(shù)分別為??和?????2???1是它們的兩個樣本,分別代表兩個連續(xù)心跳信息的延遲信息。“副本在定時器超時前沒有接收到新的心跳信息”意味著???2+????1+????2+????1+??,副本仍然認為它崩潰了。兩個節(jié)點之間的這種錯誤懷疑的概率由Pfs表示:????8=???????2+??>???1+=???????2????1>???=????8
??
??Var(T)T的方差。如果我們不知道T的分布,上述不等式可用????8的上限Pgfs中,對協(xié)調(diào)者錯誤懷疑的概率可以通過加和n-f-1活躍副本的概率,乘以至少有一個活躍副本錯誤懷疑協(xié)調(diào)者的n-f-1活躍副本來計算:
???
=∑(
) 1? 1?1?Pgfs的流傳式故障檢測器(由另一個符號Pgs-gfs表示)有一點不同: ??? ???1???8?????8=∑
) 1? Pwd表示寫重復概率。寫重復發(fā)生如果(1)協(xié)調(diào)者正常工作但(2)其他活認信息不重疊,即,Pwd等于這三個事件的概率的乘積:????=1????? 但難以準確計算Pno,因為它取決于特定協(xié)調(diào)者選定的持續(xù)時間,故障恢復算法,[]??????=????+????=∑??????+????+1????? 雖然??????的最終的結(jié)果給出了,我們應(yīng)該在不同的情況下采用它。例如, 時正如圖5.1所示,我們已經(jīng)在即將使用的表5.2??????=????+??????+??????=??=????=????=0。若不考慮這種序。例如,????1≤????2≤?≤???????1。????k小的????????????????“COMMIT”消息從協(xié)調(diào)者到副本i所需時??????????????=?????1+?????? 當d2????1=??????????????????????+?????? ??????????????????????+min(????)+????????+??5.1接下來我們關(guān)注信息的推遲而忽略登陸時間??????(或者可以認為是時間非常短。因????和??????都是隨機變量T的樣本(T的定義在第IV部分,遵循概率密度函遵循概率密度函 ??????=????+??
???? ???????(f(t)IV部分)和累計分布函數(shù)G(t)。??????的pdf和cdf分別由??? 和????t1≤k≤n?1表示。直觀上來說,對于足夠小的正實數(shù)??,??????∈[t,t+ε],只要存在一個??????∈[tt+ε],使得??????取值(k-1)t,對于其他??????取值??1????????(??????∈[,+???= ) ?????[t, ???
???1???????? +ε ???2=???1??(????∈[t,t+ε]) ) ???? ???? +ε ??? ???=???1 )?????? ???1?????? +ε??? ???=???1
) ???1?????? +ε??? ???=???1 ) ???11? + ??? =??????????(????????[t,t+ε])=???1 ??? ???11? ???通過方程(7),我們可以猜想??????1????????≥1:????????+1????? = +min??????)???(?????1+min??????= ??????1以上兩個minLCi被抵消,因為提交階段沒有變化,并且????0=0f(t)t0時,f(t)=0t0時,g(t)=0,G(t)=0,G+∞=1。如果d=1,那么??????2????? =?????? = ???1 1? ???2= ???1? 0= 1? ???1???
1? ???1= 1? ?
????????00若假設(shè) 存在,那么?? =∫+∞ ??是一個常量。在這種情況下很容0以至于:??????2????? 如果??≥2
???????????1=0+∞1? ????????+1????? = )???????=∫ ???1 (???2) )?1(1? ???
??? ???
??? ?1 ( ) (1? ??= ) ? ? ???
???
???
=???
) 0
(1?
???
????類似的,如果??
???? =??
??+1?
=(???1)∫???
?11? ??? ??≥讓???????=(???1
? (1?
?? ????????+1????? =????? ??≥ 顯然???????>0。所以方程(8)f(x)n的數(shù) 一致性和延遲之間的長期運行的RSMs,系統(tǒng)的一致性象征著有序提交的寫操作比率,這能通過1-PwcE(Lw(d))進行估算。在這部分,B是系統(tǒng)總效益(比如,收入或用戶體驗,BcBl是一致性和延遲對系統(tǒng)效益的影響值。舉例來說,B能看做網(wǎng)上商店的總收入,強一致性的事務(wù)性購買會使收入增加,而高延遲將導致用戶減少和收入減少。ξ表示其他因素對總效益的????=????+??+??=??(1?????????)+??+??????????? +=??+??+??+?????????????+????????=?????????????+????????=??????????? ???????????+???????? ???????=??????????? ???其中??=??+??+??+??并且????=??????????+????????????????? 盡管??,??,??和??????1d的取值,對于一V(d)是上述等式的可變部分。V(d)取最小值時B(d)取最大值。????=??????????? ?????≤??????????? ?????′=????′=的d的值。B(1)=???????????1+???????? ??([2]+1)=???????????[2]+????=???????????通過上述方程組,我們能得到??,??,??d取值是一定的,而 實的失敗率,RNSanti-entropygossip-style故障檢測會使系統(tǒng)更復雜和更不確定。在本節(jié)中,我們關(guān)注的是驗證我們的結(jié)果,從而減少RSM-d實驗的假設(shè)變量數(shù)量的最低值,擴大應(yīng)用范圍。因此不考慮anti-entropy和gossip-style的故障檢測。為????和????。因此我們估計??????=????????/10000000。與此同時,我們計算延遲的平均值來估計????d的期望。s0.05(20ms,0.1(10ms)0.2(5ms被分別設(shè)置到(1000ms,2000ms)和(500ms,1000ms。在以上輸入?yún)?shù)下RMSE=0.0009%,stddev.=0.0052%.dn21,我們觀察到????和????為了證明方程(8均布fx)循=0:01;0:2;0:05;:1;:2,該分布已傳入相應(yīng)信息。我們執(zhí)行一個空寫(沒有操作)保證日志記錄和寫入提交的時間可以忽略。我們對每個??∈,9和??∈,??進行測試,得到延遲時間的平均值并預(yù)測方程(8)給出的延遲期望值。在所有情況下將?????? ?????1 的觀測均0.013ms和t.d0.019ms,符合們對延遲的預(yù)期。dλ=0:01,Pc=2%,Te=1000ms,To=2000ms。7.1所示,xn(2<n<9)和相應(yīng)的應(yīng)答d(1<d<[n=2]),y軸表示的是寫入沖突的概率。為了強調(diào)不同的差異,我們用log??????來代替??????。我們每步下降至少一個數(shù)量級。注意到當d>[n/2]時,寫入沖突的概率??????為0,所以并d方程8揭示了寫延遲與d的關(guān)系。在實驗中,我們想要更直觀的展示出寫延遲df(x)被設(shè)置為λ=0.01。同時,還執(zhí)行一個空寫入擾最終的結(jié)果,因為????1包括的所有值將被刪除?!蔥和相應(yīng)數(shù)量的ACKs數(shù)??∈[1,??/2],y軸表示代替????1的延遲????????????d的增大而增大。在這里我們關(guān)注??∈[2,??/2],實際上,在??∈[1,??]之間的任意值??????dd值的降低將導致n值的增大。原因是協(xié)調(diào)器會存在更多的ACKs去選擇最快的d。7.2d一致v.s.延正如第六節(jié)所定義的一致性等于1???????,現(xiàn)在我們結(jié)合以上兩個實驗的結(jié)果,7.3所示,x軸表示代替????1的延遲??????,y軸表示1??????d3n3,5,7和其相應(yīng)∈[??/2]n和??1??/2],它表明,更強的一致性必然導致糟糕的7.3V.S.RSMs系統(tǒng)作為其銷售系統(tǒng)。雖然難的。但是,如果我們不想知道系統(tǒng)的精確值的好處????,我們只需要通過統(tǒng)計方法知道一致性效益比??和延遲效益比??,最后確定??′的最大收益值????′。價格是V。一方面,如果每個正常的(一致)購買可以帶來20%的價格,和每一個異常(不一致)購買帶來雙重損失的價格(如賠償訂單沖突)。產(chǎn)生效益的為20%????1????????2????1?1???????,因此收益率的一致性為??=2.2????。另一方面,用戶相1001%。也就是說,收益的比例延遲??為0.01%????。因此,我們有??=2.2????和??=0.01%????。7.4第六節(jié)表明,整體利益????的最大值為其變量????=????2.2??????d0.01%????????????? 有最小值。在??∈[1,??]+1]和??∈[2,9]2經(jīng)計算出????11????部分描述了變量整體利益????。例如,n=3,d=2,????值為0.125????45d=2時????有最小值,????Paxos實現(xiàn)。因 相關(guān)較具有代表性的實現(xiàn)是非一致全序廣播,PaxosRSMs已經(jīng)RSMs一致性已經(jīng)被大量研究,但依舊在最近吸引著人們的目光。根據(jù)CAP[3]和RSMs中,我們的工作通過考慮崩潰錯誤時寫沖突發(fā)生的概率來量化不一致 討Quorum系Quorum系統(tǒng)定義了兩類副本集合,一類是W,另一類是讀R。它們能被配ONE(W=1),QUORUM(W=[n/2]+1ALL(W=n)或者是任意值。這和序,而RSMs為全序排列。所以一些在RSMs系統(tǒng)中引起寫沖突的情況不一定適用于Quorum系統(tǒng)。在現(xiàn)今的實際quorum系統(tǒng)中,一般把第二種寫看做一個新的寫操RSMs在對同步提交寫操作和副本歷史記錄狀態(tài)時懷疑協(xié)調(diào)者后會進行協(xié)調(diào)者10現(xiàn)的變體,我們首先提出了一種叫做RSM-d的通用模型來統(tǒng)一描述大多數(shù)運用參考.SummaryoftheAmazonEC2andAmazonRDSServiceDisruptionintheUSEastRegion.April2011J.Dean.Designs,lessons,andadvicefrombuildinglargedistributedsystems.KeynotefromLADIS2009.E.A.Brewer.TowardsRobustDistributedSystems.InPODC,pages7C7,D.J.Abadi.Consistencytradeoffsinmoderndistributeddatabasesystemdesign:CAPisonlypartofthestory.IEEEComputer,45(2):37C42,2012.G.DeCandia,D.Hastorun,M.Jampani,G.Kakulapati,A.Lakshman,A.Pilchin,S.Sivasubramanian,P.Vosshall,andW.Vogels.Dynamo:Amazonshighlyavailablekey-valuestore.InSOSP2007.TheApacheCassandraProject./[7]M.Burrows,”TheChubbylockserviceforloosely-coupleddistributedsystems,”inOSDI’06:Proceedingsofthe7thsymposiumonOperatingsystemsdesignandimplementation,2006,pp.P.Hunt,M.Konar,F.P.Junqueira,andB.Reed,ZooKeeper:Wait-freecoordinationforInternet-scalesystems,inUSENIXATC10:Proceedingsofthe2010USENIXAnnualTechnicalConference.USENIXAssociation,2010.W.Vogels.Eventuallyconsistent.Commun.ACM,52:40C44,JanuaryK.Birman.ReliableDistributedSystemsTechnologies,WebServicesandApplications.Springer,2005.E.SchurmanandJ.Brutlag.Performancerelatedchangesandtheiruserimpact.PresentedatVelocityWebPerformanceandOperationsConference,June2009.VelocityandtheBottomLine. /site/glinden/Home/StanfordDataMining.2006-11-29.ppt.29F.Schneider.Implementingfault-tolerantservicesusingthestatemachineapproach:atutorial.ACMComputingSurveys,22(4),1990.Merideth,MichaelG.,andMichaelK.Reiter.Selectedresultsfromthelatestdecadeofquorumsystemsresearch.Replication,SpringerBerlinHeidelberg,2010,185-206.Lamport,L.Time,clocks,andtheorderingofeventsinadistributedsystem.ACMCommunications,21(7),pp.558-565,1978.D.Malkhi,M.Reiter,A.Wool,andR.Wright.Probabilisticquorum.systems.InformationandCommunication,(170):184C206,2001.PeterBailis,ShivaramVenkataraman,JosephM.Hellerstein,MichaelFranklin,IonStoica.ProbabilisticallyBoundedStalenessforPracticalPartialQuorums.InVLDB2012S.S.Chawathe,H.Garcia-Molina,andJ.Widom.FlexibleConstraintManagementforAutonomousDistributedDatabases.IEEEDataEng.Bull.,17(2):23C27,1994.A.GuptaandS.Tiwari.Distributedconstraintmanagementforcollaborativeengineeringdatabases.InCIKM,pages655C664,1993.H.YuandA.Vahdat.DesignandEvaluationofaContinuousConsistencyModelforReplicatedServices.InOSDI,pages305C318,2000.S.Shah,K.Ramamritham,andP.J.Shenoy.ResilientandCoherencePreservingDisseminationofDynamicDataUsingCooperatingPeers.IEEETrans.Knowl.DataEng.,16(7):799C812,2004XavierDefago,AndreSchiper,andPeterUrban.Totalorderbroadcastandmulticastalgorithms:Taxonomyandsurvey.ACMComput.Surv.36,4(December2004)XavierDefago.Comparativeperformanceanalysisoforderingstrategiesinatomicbroadcastalgorithms.IEICETrans.onInformationandSystems.December2003L.Lamport.PaxosMadeSimple.ACMSIGACTNews,32(4):18C25,DecemberP.Marandi,M.Primi,N.Schiper,andF.Pedone.RingPaxos:Ahigh-throughputatomicbroadcastprotocol.InternationalConferenceonDependableSystemsandNetworks(DSN),2010,pp.527-536.L.Lamport.GeneralizedConsensusandPaxos.TechnicalReportMSRTR-2005-33,MicrosoftResearch,2005,B.KemmeandG.Alonso.Databasereplication:ataleofresearchcommunities.PVLDB,ParisaJaliliMarandiMarcoPrimiFernandoPedone.Multi-RingPaxos.InDSNChandra,T.,Toueg,S.:UnreliableFailureDetectorsforReliableDistributedSystems.JournaloftheACM43(2),225C267(1996).W.Chen.OntheQualityofServiceofFailureDetectors.IEEETransactionsonComputer.May2002.DiogoBecker,FlavioJunqueira,andMarcoSerafini.LeaderElectionforReplicatedServicesUsingApplicationScores.InMiddleware2011.B.Lampson.TheABCDsofPaxos.InProceedingsofthe20thACMSymposiumonPrinciplesofDistributedComputing(PODC01),page13.ACMPress,2001.IditKeidarandSergioRajsbaum.Onthecostoffault-tolerantconsensuswhentherearenofaults-atutorial.TechnicalReportMIT-LCS-TR-821,LaboratoryforComputerScience,MassachusettsInstituteTechnology,Cambridge,MA,02139,May2001.alsopublishedinSIGACTNews32(2)(June2001).A.Demers,D.Greene,C.Hauser,W.Irish,J.Larson,S.Shenker,H.Sturgis,D.Swinehart,andD.Terry.Epidemicalgorithmsforreplicateddatabasemaintenance.InPODC1987.MarinBertier,OlivierMarin,PierreSens,ImplementationandPerformanceEvaluationofanAdaptableFailureDetector,Proceedingsofthe2002InternationalConferenceonDependableSystemsandNetworks,p.354-363,June23-26,2002.M.RaynalandF.Tronel.GroupMembershipFailureDetection:ASimpleProtocolandItsProbabilisticAnalysis.DistributedSystemsEng.J.,vol.6,no.3,pp.95-102,R.vanRenesse,Y.Minsky,andM.Hayden.Agossip-stylefailuredetectionservice.InProceedingsofInternationalConferenceandDistributedSystemsPlatformsandOpenDistributedProcessing(IFIP),1998.J.Gray,P.Helland,P.ONeil,andD.Shasha.Thedangersofreplicationandasolution.InSIGMOD1996.B.F.Cooper,R.Ramakrishnan,U.Srivastava,A.Silberstein,P.Bohannon,H.-A.Jacobsen,N.Puz,D.Weaver,andR.Yerneni.PNUTS:Yahoo!sHostedDataServingPlatform.PVLDB,1:1277C1288,August2008.W.Lloyd,M.J.Freedmand,M.Kaminsky,andD.G.Andersen.Dontsettleforeventual:Scalablecausalconsistencyforwide-areastoragewithCOPS.InSOSP2011.ConsistencyorLatency?AQuantitativeAnalysisofReplicationSystemsBasedonReplicatedStateXuWang,HailongSun,TingDeng,JinpengSchoolofComputerScienceandEngineeringBeihangUniversityBeijing,ChinaEmail:{wangxu,sunhl,dengting,huaijp}—ExistingtheorieslikeCAPandPACELChaveclaimedthattherearetradeoffsbetweensomepairsofper-formancemeasuresindistributedreplicationsystems,suchasconsistencyandlatency.However,currentsystemstakeaveryvagueviewonhowtobalancethosetradeoffs,e.g.eventualconsistency.Inthiswork,weareconcernedwithprovidingaquantitativeanalysisonconsistencyandlatencyforwidely-usedreplicatedstatemachines(RSMs).BasedonourpresentedgenericRSMmodelcalledRSM-d,probabilisticmodelsarebuilttoquantifyconsistencyandlatency.Weshowthatbothareaffectedbyd,whichisthenumberofACKsreceivedbythecoordinatorbeforecommittingawriterequest.Andwefurtherdefineapayoffmodelthroughcombiningtheconsistencyandlatencymodels.Finally,withMonteCarlobasedsimulation,wevalidateourpresentedmodelsandshowtheeffectivenessofoursolutionsintermsofhowtoobtainanoptimaltradeoffbetweenconsistencyandlatency.—consistency;writeconflict;latency;replicatedPracticaldistributedsystemsinCloudandBigDatasolu-tionsoftenfacemanychallengessuchashighscalabilityandavailability,whicharebuiltonmassivecommoditycomput-ers,disks,networkdevicesandundercomplexmanagementtasks[1,2].Toservemoreusersandprovidehighenoughavailability,servicesanddataaretypicallyreplicatedacrossmultiplevirtualmachines(VMs),physicalmachinesandevengeographically-distributedclusters.TheCAPtheorem[3]showstheimpossibilityofobtainingallthreeofconsistency,availabilityandpartitiontolerancesimultaneously.AndPACELC[4]furtherinterpretsCAP,whichclaimstwotradeoffsincludingconsistency&availabilityandconsistency&latency.Ontheonehand,peoplemakeatradeoffbetweenconsistencyandavailability.Inordertoprovideextremelyhighavailability,manysystemsarealwayswritablewitheventualconsistency,suchasDynamo[5]andCassandra[6];whileothers,likeChubby[7]andZookeeper[8],mustexperienceashortperiodofunavailabilitytorecoverfromfailuresforguaranteeingstrongconsistencyofdifferentreplicas.Ontheotherhand,peoplefocusonthetradeoffbetweenconsistencyandlatency.Forexample,eventuallyconsistentreads,nomatterhowfasttheyare,cannotboundthestalenessasthenewestversionswillbeeventually[9];andstrongconsistencyleadstohigherlatencyfor
duetothewriteuniformity[10].Althoughweakconsistencyhasbeenusedinsomecommercialsystemsandisacceptablebymanypractitionersforhigheravailabilityandlowerlatency,itisnecessarytoboundtheinconsistencyandtoknowhowinconsistencytheweakconsistencyis.Sincelatencyimpactstheend-userexperienceofanIn-ternetapplication,ithas eanimportantsystemmetricformodernserviceproviders[11].Somestatisticalresultspre-sentedin[12]showthatend-usersareverysensitivetosystemlatency.Forinstance,atMicrosoftBing,a2-secondslowdownreducesqueries/userby1.8%andrevenue/userby4.3%;alatencyincreasefrom400msto900mswithGooglesearchresultsina25%dropoffinpagesearches;andhasa1%dropinsaleswitha100mslatencyincrease[13].Thereforelatencyisanimportantfactorinsystemdesign.Inthiswork,wefocusonthetradeoffbetweenconsistencyandlatencyindistributedreplicationprotocols.Twopopularfault-toleranceapproachesindistributedsys-temsanddatabasesarereplicatedstatemachines(RSMs)[14]andquorumsystems[15].RSMsdescribedesirablereplica-tionsemanticstomakeoperationscommittedintotalorder.QuorumsystemsdefinesetsofreplicasWandR,whereWisusedforwriteandRforread.Andtheycommitwritesbyvectorclocks[16]withsemanticreconciliation[5]incausalorder.Forquorum-likesystems,severalworks[17,18]boundreadstalenessfromtheaspectsofdataversions,staletimeandstalenessprobabilitytodescribetheinconsistency,andthendiscussthetradeoffbetweenconsistencyandlatency.However,littlehasbeendoneonquantifyinginconsistencyandlatencyforRSMs,whiletheyaredesignedtoproviderelativelystrongerconsistencythanquorumsystems.ForreadoperationsofRSMs,wecanborrowanalysistechniquesfromthequantitativereadstalenessinquorumsystems.ButforwriteoperationsofRSMs,weneedanewanalyticalandquantitativemethodtomeasurethedegreeofwriteinconsistency.ThenwecanrationallymakeatradeoffbetweenconsistencyandlatencyforRSMsbasedonthequantitativeresultofwriteAlthoughtherearesomeexistingalternativemodelstodescribewriteinconsistencyintheabsenceofcrashfailures,e.g.,writeconsistencyconstraint[19,20]andlimitedwritedivergencyofreplicas[21,22],wewanttoknowthedegreeofwriteinconsistencyforRSMswhenencountering978-1-4799-0181-4/13/$31.00?2013failures,ratherthanhowtosatisfyconstraintsorboundthemaximumdeviationofadataitem.ThereforewequantifywriteinconsistencyforRSMsbytheprobabilityofwriteconflictwhichrepresentstheprobabilityofawritecommittedinanabnormalorder.Inthispaper,wepresentaquantitativeprobabilisticanaly-sisapproachtomeasuringtheconsistencyandlatencyofRSM-s,andfurtherprovideasolutionformakingtradeoffbetweenconsistencyandlatencytoachievethemaximalsystembenefit.First,wehavesurveyedandanalyzedthreerepresentativeRSMimplementationsincludingnon-uniformtotalorderbroadcast[23,24],distributedconsensus(Paxos)[25,26]anduniformtotalorderbroadcast[23,24],aswellastheirvariants[27-29].ThenweproposeagenericsystemmodelRSM-d,whered∈[1,(nisthenumberofreplicas)representsthenumberofthatmustbereceivedbeforecommittingawrite.RSM-dcandescribethethreerepresentativeoneswhered=1,[n/2]+1,nrespectively.Second,wemeasurethewriteinconsistencyofRSM-dbasedonaprobabilisticmodel.Thewriteincon-sistencyisquantifiedbytheprobabilityofwriteconflictwhichrepresentstheprobabilityofanywritecommittedinanabnormalorder.Ourprobabilisticmodelshowsthat(1)consistencyincreaseswhendrisesifd∈[1,[n/2]+1]andstrongconsistencyisalwaysguaranteedifd∈[[n/2]+1,n].Third,weevaluatethelatencyofRSM-dthroughitsexpectationwhichshowsthatthelatencystrictlymono-tonicallyincreaseswithrespecttod.Fourth,combiningthequantitativeresultsofwriteinconsistencyandlatencyofRSM-d,weprovideasolutionfortradeoffbetweenconsistencyandlatencytoachievethemaximalsystembenefit.Finally,throughMonteCarlobasedeventdrivensimulations,wevalidateoursolutionfortradeoffbetweenconsistencyandlatencyfromtheaspectofoverallsystembenefit.WemakethecontributionsasWepresentagenericsystemmodelRSM-dforRSMstogiveanunifieddescriptionofmajorreplicationprotocolsusingRSMs;OnthebasisofRSM-d,webuildaprobabilisticmodelforwriteconflictthatisoneofthekeyfactorstocauseinconsistency,andaprobabilisticmodelforcharacterizinglatencyaswell.Wefindoutthatbothconsistencyandlatencyareaffectedbythecommonfactord.Wefurtherdefineapayoffmodeltomaketradeoffbetweenconsistencyandlatencyforachievingthemaximumsystembenefitthroughcombiningthewriteconflictmodelandlatencymodelfromtheperspectiveofaserviceprovider;WithMonteCarlobasedeventdrivensimulations,wevalidateourquantitativeresultsandshowtheeffectivenessofourpresentedsolutionsintermsofhowtoobtainanoptimaltradeoffbetweenconsistencyandlatency.Theremainderofthispaperisstructuredasfollows.SectionIIintroducesthebackground.SectionIIIpresentsoursystemmodel.SectionIVdescribeshowtoquantifytheprobabilityofwriteconflict.Thecalculationoflatency
presentedinSectionV.SectionVIshowsthetradeoffofconsistencyandlatency.SectionVIIprovidesourexperimentalresults.RelatedworkanddiscussionarepresentedinSectionInthissection,wepresentthebackgroundofRSMsarewidelyusedindistributedsystemswiththetargettotoleratefailures.Failuremodesfallintonon-Byzantine(fail-stop,crashandmessageloss)andByzantine(signedmessagesornot).Withthecomplexitythatatleast3f+1nodesfortoleratingfByzantinefailuresandthelowoccurrenceprobabilityofByzantinefailures,commonRSMsonlyconsidernon-Byzantineones.Andmessagelossfailurescanbeeasilyeliminatedbynetworkprotocols,sopractitionersusuallysaythatRSMscantoleratecrashfailures.ChandraandToueg[30]haveprovedthatthetotalorderbroadcastandthedistributedconsensusproblemsaresolvableandequivalenttoeachotherunderWfailuredetectorsandatmost[n/2]simultaneouscrashfailures.Inthiswork,wealsofollowtheseassumptions.ItmeansthatwediscussRSMswhichtolerateuptofcrashfailureswith2f+1replicasandcaneventuallydetectthecrashofnodes.However,practicalfailuredetectorsmustoutputaresult(crashornot)inaboundedtimeotherthanalong-waitingforeventualconclusions.[31]classifiesthequalityofserviceofpracticalfailuredetectorsintotwotypes:speed,i.e.,howfastafailuredetectordetectsacrash;andaccuracy,i.e.,howwellitavoidsfalsesuspicion.ThereforepracticalfailuredetectorsofRSMshaveaprobabilitytofalselysuspectonanormalAssurveyedin[23-29],RSMsaretypicallyimplementedbynon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcast.AsshowninFigure1,thethreerepresentativeRSMimplementationsworkasfollows:Innon-uniformtotalorderbroadcast,onceacoordinatorreceivesawrite,itassignsthewriteauniquesequencenumberandthendirectlycommitsittoallreplicas.Whenthewritearrivesatareplica,thereplicawillrecordthewrite,andcommititaccordingtoitssequencenumberandthenrespondtothecoordinator.Oncethecoordinatorhasreceivedthefirstresponse,itwillreplytotheclient.Notethatanytimeacoordinatorissuspectedascrashedbyfailuredetectors,thenewcoordinatorwillbeelected[32]andrecoversfailuresbasedonthehistoricalcommitrecords.FordistributedconsensusbasedRSMs,onceacoordi-natorreceivesawrite,atfirstitpropagatesthewritewiththegivensequencenumbertoallreplicas.Afterloggingthewrite,everyreplicawillsendanacknowledgement(ACK)messagetothecoordinator.Untilatleast[n/2]+1ACKs(includingthecoordinatoritself)arereceivedbythecoordinator,itwillcommitthewrite.Whenthecoordinatorissuspectedascrashedbyfailuredetectors,thenewcoordinatorwillbeelectedandrecoverfailuresbasedonthehistoricallogrecords(notthehistoricalcommitrecords).Uniformtotalorderbroadcastisverysimilartodistribut-edconsensus,butthecoordinatormustwaitforallnACKstobereturned.Fig.1.ThreeRepresentativeRSMAssumingwithoutlossofgeneralitythecoordinatorsendsitselfanACKmessage,wecandiscoverthatanobviousdif-ferenceamongthethreerepresentativeRSMimplementationsisthenumberofACKs(denotedbyd)thatthecoordinatorwaitsforbeforecommittingawrite.Andthevaluesofdfornon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcastare1,[n/2]+1andn,respectively.Thuswecaneasilyconcludethatthelatenciesofthemfromlowtohigharenon-uniformtotalorderbroadcast,distributedconsensusanduniformtotalorderbroadcast.Intermsofconsistency,distributedconsensus(Paxos)hasshoweditssafety(strongconsistency)in[33].Thatis,RSMsarealwaysconsistentif[n/2]+1≤d≤n.However,fornon-uniformtotalorderbroadcast,ifacommittingwriteislostinhistoricalcommitrecordsduetocrashfailures(i.e.,allcommittednodesincludingthecoordinatorcrash)orfailuredetectiononthecoordinatorhappens,thesequencenumberheldbythewriteorusedbythefalselysuspectedcoordinatorwillbereusedtokeepsystemliveness.Thesewillleadtowriteconflictandthenresultinwriteinconsistency.RSM-d:AGENERICRSMInthissection,wewillpresentoursystemmodelRSM-d,whichisthe ionofRSMimplementations.SinceourdesiredsystemmodelshouldcoverthethreerepresentativeRSMs,wehavesurveyedthemandtrytoderivethesystemAlthoughwepresent(non-uniformanduniform)totalorderbroadcastalgorithmsinsectionII,theyareonlyoneoffiveclasses.[23]providesacomprehensivesurveyfortotalorderbroadcast,consideringallfiveclassesoforderingmechanismsandbothnon-uniformanduniformalgorithms.Thefirstthreeorderingmechanisms:fixedsequencer(showninsectionII),movingsequencerandprivilege-basedarebuiltbasedonafixedsequenceroramovingtoken;Theothertwo:communicationhistoryanddestinationagreementareimplementedbasedonatotalorderlogicalclock[16].AndthesolutionofdistributedconsensusforRSMisthesamewiththedestinationagreement,wherePaxosisusedtomakeagreementontheorderforawrite,andmulti-paxos[25]forallwrites.Essentiallyweadda’sequencer’whichcancomparetheordersofanytwowritestocausalorder,andthengaintotalorder.Basedontheanalysisabove,weareconcernedwith
Fig.2.RSM-dtotalorderbroadcastwiththefixedsequencerandthePaxosalgorithm.Thereasonsare:(1)thefixedsequenceralgorithmhasthelowestlatencyamongalltotalorderbroadcastones(onpoint-to-pointnetworks)[24];(2)Paxosisprovedtobeoptimalingeneralconsensusalgorithms[34];(3)boththetokeninthemovingsequencerandprivilege-basedalgorithms,andthelogicalclockinthecommunicationhistoryanddesti-nationagreementalgorithms,canbeseenas”anotherfixedThenwecombinetotalorderbroadcastwiththefixedsequencerandthePaxosalgorithmtoanunifiedsystemmodelRSM-d,whichisfeasiblesincetotalorderbroadcastanddistributedconsensusareessentiallyequivalenttoeachother[30].AsdepictedinFigure2,RSM-dincludestheagreementphaseandthecommitphase.Intheagreementphase,anywritemustbeb
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 兒童音樂劇走進小學音樂教學中的實踐探索
- 倉儲返利合同范本
- 人才公寓裝修合同范例
- 公司公章制作合同范例范例
- 農(nóng)村賣方合同范例
- 倉庫建材配送合同范例
- 分紅簡約合同范例
- 代理派遣服務(wù)合同范例
- 與餐飲合作合同范例
- 中心設(shè)計合同范例
- GB/T 17639-2023土工合成材料長絲紡粘針刺非織造土工布
- 2023年廣東省深圳市龍華區(qū)中考道德與法治二模試卷及答案解析
- 舟山國儲基地擴建項目開山回填與隧道工程爆破項目設(shè)計方案
- 中國書畫藝術(shù)品投資(山東聯(lián)盟)知到章節(jié)答案智慧樹2023年山東財經(jīng)大學
- 信用修復申請文書(當事人適用)
- 高中學生社會實踐活動100例
- 2023年新改版教科版六年級下冊科學全冊教案(新課標)
- 天津漁港防波堤施工組織設(shè)計
- 03SG520-2 實腹式鋼吊車梁(中輕級工作制 A1~A5 Q345鋼 跨度6m,7.5m,9m)
- Access數(shù)據(jù)庫程序設(shè)計上機操作練習題2
- 《最優(yōu)化方法》復習題(含答案)
評論
0/150
提交評論