數(shù)據(jù)庫(kù)管理系統(tǒng)概述英文版課件:tutorial9 Join Algorithms_第1頁(yè)
數(shù)據(jù)庫(kù)管理系統(tǒng)概述英文版課件:tutorial9 Join Algorithms_第2頁(yè)
數(shù)據(jù)庫(kù)管理系統(tǒng)概述英文版課件:tutorial9 Join Algorithms_第3頁(yè)
數(shù)據(jù)庫(kù)管理系統(tǒng)概述英文版課件:tutorial9 Join Algorithms_第4頁(yè)
數(shù)據(jù)庫(kù)管理系統(tǒng)概述英文版課件:tutorial9 Join Algorithms_第5頁(yè)
已閱讀5頁(yè),還剩13頁(yè)未讀 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

1、COMP231 Tutorial 9Join AlgorithmsReview: Block Nested-Loop JoinRead in the outer relation r page by pageFor each page of r, scan the entire inner relation s.Best cost: br + bs (if s is small enough to fit in memory)br (bs): no. of pages of r (s)Buffer needed at least: 3 pages (1 for r, 1 for s, 1 fo

2、r output)If memory has M pages, read M 2 pages of r at a time and use the remaining two pages for s and the output.Cost = br / (M 2) bs + br0005000200040002000200010005000500020002000300020005rsReview: Indexed Nested-Loop JoinIndex lookup can replace file scan if an index is available on the join at

3、tribute of inner relationFor each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.0005000200040002000200010005r000500020002000300020005sindex on sCost of the join: br + nr cnr: no. of tuples in rc is the cost to traverse index and

4、fetch all matching s tuples for one tr.c can be estimated as cost of a single selection on s using the join condition.If indices are available on the join attribute of both r and s,use the relation with fewer tuples as the outer relation.Review: External Sort-MergeAlgorithm:Create sorted runsN-way m

5、erge:Read the first page of each run into buffer pageREPEATSelect the first record (in sorted order) among all buffer pagesWrite the record to the output buffer. If the output buffer is full write it to disk.Delete the record from its input buffer page.IF the buffer page becomes empty THENread the n

6、ext page (if any) of the run into the buffer. UNTIL all input buffer pages are empty1210111220791792800sorted run 1sorted run 2sorted run 80pass0pass1run 1run 2run 910*9 pagessorted run 110*8 pagessorted run 9pass2sorted file800 pagesEg: 800 pages of a file10 pages of main memoryReview: Sort-Merge J

7、oinSort both relations on the join attribute (if not already sorted)Merge the sorted relations to join themJoin step is similar to the merge stage of the MergeSort algorithm Main difference is the handling of duplicate values in join attribute every pair with the same value on join attribute must be

8、 matchedCan be used only for equi-joins and natural joinsEach block needs to be read only once Cost = br + bs + the cost of sorting if relations are unsorted0001000200020002000400050005000200020002000300050005Review: Hash Join00010002000200020004000500050002000200020003000500050001000200020002000400

9、050005000200020002000300050005 Applicable for equi-joins and natural joins A hash function h is used to partition tuples of both relations h maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partitions of

10、r tuples; each tuple tr r is put in partition ri,where i = h(tr JoinAttrs). s0, s1. . ., sn denotes partitions of s tuples; each tuple ts s is put in partition si, where i = h(ts JoinAttrs).Join ExerciseAssume the query: Find the names of sailors who have reservations# Records of Sailors nr = 10,000

11、; # Records of Reservations ns = 40, 000. # of memory pages M = 100Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)SELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.SidJoin ExerciseAssume the query: Find the names of sailors who have reservationsEach attribute is 20 bytes. Each page

12、 is 1000 bytes. There are 10,000 sailors, 40,000 reservations and a buffer of M=100 pages.Question 0: How many pages does Sailors contain? How many pages does Reservations contain?Each sailor record is 80 bytes and each reserves record 60 bytes. There are 1000/80=12 sailors per page and 1000/60=16 r

13、eservations per page. Sailors contains 10000/12=834 pages and reserves 40000/16=2500 pages. Since there are 10K sailors and 40K reservations, a sailor has on the average 4 reservations (but it is possible that some sailors have more, while others have none). Sailors(Sid, Name, Rating, Age)Reserves(S

14、id, Bid, Date)SELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.Sid# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Block Nested Loop JoinQuestion 1: Using Block Nested Lo

15、op, if use Sailors as outer relation, what is the cost?Mechanism:Buffer M-2 pages for Outer Relation, one for Inner relation, the other for output. For each time, read 98 pages of S. Scan R byto find matchTotal 834 pages, thus Need 834/98 = 9 blocks/timesThe total cost is: 834+9*2500= 23,334 bout +

16、bout / (M 2) * binner# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Block Nested Loop JoinQuestion 2: Using Block Nested Loop, if use Reservations as outer relation, what is the c

17、ost?Mechanism:Buffer M-2 pages for Outer Relation, one for Inner relatoin, the other for output. bout + bout / (M 2) * binnerFor each time, read 98 pages of R. Scan S byto find matchTotal 2500 pages, thus Need 2500/98 = 26 timesThe total cost is: 2500+26*834= 24,184 # Records of Sailors ns = 10,000;

18、 # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Index Nested LoopQuestion 3: Assume use Hash Index on Reserve.SID. Assume for each sailor the cost for searching on the hash index take 1.2 page accesses (because of

19、 overflow buckets). For each Sailor, 1.2 + 4 = 5.2 pages need to spend to retrieve matching Reserve recordTotal Cost: cost of reading Sailors + # records in sailors*5.2= 834+ 10,000*5.2=52,83456 Frank57 Sophie58 James59 Tom145858 $213 $158 $814 $258 $332 $758 $5H(58)234Find entry in the index cost 1

20、.2 pagesAverage 4 entries need to retrieveSaliorsIndex using hashingon ReserveH(SID)=SID % 11Reserve# Records of Sailors ns = 10,000; # Records of Reservations nr = 40, 000. # of pages of Sailors bs, = 834; # of pages of Reservations br = 2500# of memory pages M = 100Index Nested LoopQuestion 3: Ass

21、ume use Hash Index on Reserve.SID. Assume for each sailor the cost for searching on the hash index take 1.2 page accesses (because of overflow buckets). How to improve? Remember the query: Thus do not need to retrieve records from ReserveNow for each Sailor, only 1.2 pages needed to get the names wo

22、uld be matchedTotal Cost: cost of reading Sailors + # records in sailors*1.2= 834+ 10,000*1.2=12,83459 Tom58 James57 Sophie56 Frank145832 $758 $558 $314 $258 $813 $158 $2H(58)234SaliorsIndex using hashingon ReserveH(SID)=SID % 11ReserveSELECT NameFROM Sailors, ReservesWHERE Sailors.Sid=Reserves.SidS

23、ort Merge Join Question 4: Using Sort Merge Join method, sort Sailors and Reserve first and then merge. Estimate the total cost. Cost of Sorting Sailors on SidSorting Sailors: 834+417 (pass 0) + 417 + 417 (pass 1) = 2085Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)128341250sorted run 1sort

24、ed run 2sorted run 9run 1run 2run 9PASS 0PASS 151100401417At each sorted run we read on 100 sailor pages but we only write 50 because we discard attributes Rating, Age(they are not needed for the join and are not required in the result)Sort Merge Join (cont)Sort Reserves on Sid Cost: 2500+850 (pass

25、0) + 850=4200. As each sorted page of Reserves is generated we can directly find the joining tuples of sailors (in order to avoid writing the result of pass 1 in a temporary file and reading it again for the merge phase). Thus, for joining we only need to scan the sorted sailors file.Total cost: 208

26、5+4200+417= 6702 Sailors(Sid, Name, Rating, Age)Reserves(Sid, Bid, Date)More example about Merge JoinR1 has 1000 pages, R2 has 500 pagesi R1i.CR2j.C j110 51220 202320 203430 304540 305 506 527MemoryR1R2.R1R2Given R1 and R2 are already sorted on CTotal cost: Read R1 cost + read R2 cost= 1000 + 500 =

27、1,500 pagesHash Join on Sailor and ReservationUse the small relation (Sailors = 834) as the build input. How to choose the number of buckets n?n should be such that each partition for Sailors fits in memory, e.g., n= 10 buckets (so that average bucket size is 83.4 pages M). 100 main memory buffersDi

28、skDiskOriginal SOUTPUT1INPUT0hashfunctionh 9Partitions019. . .100 main memory buffersDiskDiskOriginal ROUTPUT1INPUT0hashfunctionh 9Partitions019. . .Cost of Partition S:834 + 834READWriteCost of Partition S:2500 + 2500READWriteHash Join(cont)Finally matching process:Use the small relation (Sailors = 834) as the build input, R as probe input. Match bucket of Sailors with corresp

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論