版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、COMP231 Tutorial 9Join AlgorithmsReview: Block Nested-Loop JoinRead in the outer relation r page by pageFor each page of r, scan the entire inner relation s.Best cost: br + bs (if s is small enough to fit in memory)br (bs): no. of pages of r (s)Buffer needed at least: 3 pages (1 for r, 1 for s, 1 fo
2、r output)If memory has M pages, read M 2 pages of r at a time and use the remaining two pages for s and the output.Cost = br / (M 2) bs + br0005000200040002000200010005000500020002000300020005rsReview: Indexed Nested-Loop JoinIndex lookup can replace file scan if an index is available on the join at
3、tribute of inner relationFor each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.0005000200040002000200010005r000500020002000300020005sindex on sCost of the join: br + nr cnr: no. of tuples in rc is the cost to traverse index and
4、fetch all matching s tuples for one tr.c can be estimated as cost of a single selection on s using the join condition.If indices are available on the join attribute of both r and s,use the relation with fewer tuples as the outer relation.Review: External Sort-MergeAlgorithm:Create sorted runsN-way m
5、erge:Read the first page of each run into buffer pageREPEATSelect the first record (in sorted order) among all buffer pagesWrite the record to the output buffer. If the output buffer is full write it to disk.Delete the record from its input buffer page.IF the buffer page becomes empty THENread the n
6、ext page (if any) of the run into the buffer. UNTIL all input buffer pages are empty1210111220791792800sorted run 1sorted run 2sorted run 80pass0pass1run 1run 2run 910*9 pagessorted run 110*8 pagessorted run 9pass2sorted file800 pagesEg: 800 pages of a file10 pages of main memoryReview: Sort-Merge J
7、oinSort both relations on the join attribute (if not already sorted)Merge the sorted relations to join themJoin step is similar to the merge stage of the MergeSort algorithm Main difference is the handling of duplicate values in join attribute every pair with the same value on join attribute must be
8、 matchedCan be used only for equi-joins and natural joinsEach block needs to be read only once Cost = br + bs + the cost of sorting if relations are unsorted0001000200020002000400050005000200020002000300050005Review: Hash Join00010002000200020004000500050002000200020003000500050001000200020002000400
9、050005000200020002000300050005 Applicable for equi-joins and natural joins A hash function h is used to partition tuples of both relations h maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partitions of
10、r tuples; each tuple tr r is put in partition ri,where i = h(tr JoinAttrs). s0, s1. . ., sn denotes partitions of s tuples; each tuple ts s is put in partition si, where i = h(ts JoinAttrs).Join ExerciseAssume the query: Find the names of sailors who have reservations# Records of Sailors nr = 10,000
11、; # Records of Reservations ns = 40, 000. # of memory pages M = 100Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)SELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.SidJoin ExerciseAssume the query: Find the names of sailors who have reservationsEach attribute is 20 bytes. Each page
12、 is 1000 bytes. There are 10,000 sailors, 40,000 reservations and a buffer of M=100 pages.Question 0: How many pages does Sailors contain? How many pages does Reservations contain?Each sailor record is 80 bytes and each reserves record 60 bytes. There are 1000/80=12 sailors per page and 1000/60=16 r
13、eservations per page. Sailors contains 10000/12=834 pages and reserves 40000/16=2500 pages. Since there are 10K sailors and 40K reservations, a sailor has on the average 4 reservations (but it is possible that some sailors have more, while others have none). Sailors(Sid, Name, Rating, Age)Reserves(S
14、id, Bid, Date)SELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.Sid# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Block Nested Loop JoinQuestion 1: Using Block Nested Lo
15、op, if use Sailors as outer relation, what is the cost?Mechanism:Buffer M-2 pages for Outer Relation, one for Inner relation, the other for output. For each time, read 98 pages of S. Scan R byto find matchTotal 834 pages, thus Need 834/98 = 9 blocks/timesThe total cost is: 834+9*2500= 23,334 bout +
16、bout / (M 2) * binner# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Block Nested Loop JoinQuestion 2: Using Block Nested Loop, if use Reservations as outer relation, what is the c
17、ost?Mechanism:Buffer M-2 pages for Outer Relation, one for Inner relatoin, the other for output. bout + bout / (M 2) * binnerFor each time, read 98 pages of R. Scan S byto find matchTotal 2500 pages, thus Need 2500/98 = 26 timesThe total cost is: 2500+26*834= 24,184 # Records of Sailors ns = 10,000;
18、 # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Index Nested LoopQuestion 3: Assume use Hash Index on Reserve.SID. Assume for each sailor the cost for searching on the hash index take 1.2 page accesses (because of
19、 overflow buckets). For each Sailor, 1.2 + 4 = 5.2 pages need to spend to retrieve matching Reserve recordTotal Cost: cost of reading Sailors + # records in sailors*5.2= 834+ 10,000*5.2=52,83456 Frank57 Sophie58 James59 Tom145858 $213 $158 $814 $258 $332 $758 $5H(58)234Find entry in the index cost 1
20、.2 pagesAverage 4 entries need to retrieveSaliorsIndex using hashingon ReserveH(SID)=SID % 11Reserve# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Index Nested LoopQuestion 3: Ass
21、ume use Hash Index on Reserve.SID. Assume for each sailor the cost for searching on the hash index take 1.2 page accesses (because of overflow buckets). How to improve? Remember the query: Thus do not need to retrieve records from ReserveNow for each Sailor, only 1.2 pages needed to get the names wo
22、uld be matchedTotal Cost: cost of reading Sailors + # records in sailors*1.2= 834+ 10,000*1.2=12,83459 Tom58 James57 Sophie56 Frank145832 $758 $558 $314 $258 $813 $158 $2H(58)234SaliorsIndex using hashingon ReserveH(SID)=SID % 11ReserveSELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.SidS
23、ort Merge Join Question 4: Using Sort Merge Join method, sort Sailors and Reserve first and then merge. Estimate the total cost. Cost of Sorting Sailors on SidSorting Sailors: 834+417 (pass 0) + 417 + 417 (pass 1) = 2085Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)128341250sorted run 1sort
24、ed run 2sorted run 9run 1run 2run 9PASS 0PASS 151100401417At each sorted run we read on 100 sailor pages but we only write 50 because we discard attributes Rating, Age(they are not needed for the join and are not required in the result)Sort Merge Join (cont)Sort Reserves on Sid Cost: 2500+850 (pass
25、0) + 850=4200. As each sorted page of Reserves is generated we can directly find the joining tuples of sailors (in order to avoid writing the result of pass 1 in a temporary file and reading it again for the merge phase). Thus, for joining we only need to scan the sorted sailors file.Total cost: 208
26、5+4200+417= 6702 Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)More example about Merge JoinR1 has 1000 pages, R2 has 500 pagesi R1i.CR2j.C j110 51220 202320 203430 304540 305 506 527MemoryR1R2.R1R2Given R1 and R2 are already sorted on CTotal cost: Read R1 cost + read R2 cost= 1000 + 500 =
27、1,500 pagesHash Join on Sailor and ReservationUse the small relation (Sailors = 834) as the build input. How to choose the number of buckets n?n should be such that each partition for Sailors fits in memory, e.g., n= 10 buckets (so that average bucket size is 83.4 pages M). 100 main memory buffersDi
28、skDiskOriginal SOUTPUT1INPUT0hashfunctionh 9Partitions019. . .100 main memory buffersDiskDiskOriginal ROUTPUT1INPUT0hashfunctionh 9Partitions019. . .Cost of Partition S:834 + 834READWriteCost of Partition S:2500 + 2500READWriteHash Join(cont)Finally matching process:Use the small relation (Sailors = 834) as the build input, R as probe input. Match bucket of Sailors with corresp
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 2024-2030年中國(guó)納米銀行業(yè)需求動(dòng)態(tài)及盈利前景預(yù)測(cè)報(bào)告
- 2024-2030年中國(guó)紅色指示燈行業(yè)市場(chǎng)現(xiàn)狀分析及競(jìng)爭(zhēng)格局與投資發(fā)展研究報(bào)告
- 校園科技服務(wù)合同模板
- 施工雙方合同模板
- 簽訂勞務(wù)合同模板模板
- 舞蹈教室有償租賃合同模板
- 掛燈籠外包合同模板
- 物料制作采購(gòu)合同模板
- 食堂員工協(xié)議合同模板
- 房租減免補(bǔ)充協(xié)議合同模板
- 華為認(rèn)證智能協(xié)作中級(jí)HCIP-CollaborationH11-861考試題及答案
- 電站安全操作規(guī)程
- 插畫(huà)作品授權(quán)使用合同
- 2024至2030年中國(guó)磁懸浮軸承行業(yè)市場(chǎng)發(fā)展規(guī)模及投資機(jī)會(huì)分析報(bào)告
- 老年人多重用藥評(píng)估與管理中國(guó)專家共識(shí)(2024版)解讀課件
- 2020-2024年高考地理復(fù)習(xí)試題分類匯編:地球上的水(北京專用)(解析版)
- iso20002食品安全管理體系標(biāo)準(zhǔn)
- -企業(yè)償債能力分析-以小米有限公司為例
- 2024-2030年中國(guó)合成革行業(yè)發(fā)展分析及發(fā)展趨勢(shì)預(yù)測(cè)與投資風(fēng)險(xiǎn)研究報(bào)告
- 贛美版-美術(shù)-初二-八年級(jí)-上冊(cè)-全冊(cè)課件-江西美術(shù)出版社
- 公務(wù)車定點(diǎn)輛維修 投標(biāo)方案(技術(shù)方案)
評(píng)論
0/150
提交評(píng)論