數(shù)據(jù)庫管理系統(tǒng):ch13 Query Processing_第1頁
數(shù)據(jù)庫管理系統(tǒng):ch13 Query Processing_第2頁
數(shù)據(jù)庫管理系統(tǒng):ch13 Query Processing_第3頁
數(shù)據(jù)庫管理系統(tǒng):ch13 Query Processing_第4頁
數(shù)據(jù)庫管理系統(tǒng):ch13 Query Processing_第5頁
已閱讀5頁,還剩29頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)

文檔簡介

1、Database System Concepts, 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarshan13.2Database System Concepts - 5th Edition, Aug 27, 2005.nOverview nMeasures of Query CostnSelection Operation nSorting nJoin Operation nOther OperationsnEvaluation

2、 of ExpressionsSilberschatz, Korth and Sudarshan13.3Database System Concepts - 5th Edition, Aug 27, 2005.1. Parsing and translation2. Optimization3. EvaluationSilberschatz, Korth and Sudarshan13.4Database System Concepts - 5th Edition, Aug 27, 2005.nParsing and translationltranslate the query into i

3、ts internal form. This is then translated into relational algebra.lParser checks syntax, verifies relationsnEvaluationlThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.Silberschatz, Korth and Sudarshan13.5Database System Concepts - 5t

4、h Edition, Aug 27, 2005.nA relational algebra expression may have many equivalent expressionslE.g., balance2500(balance(account) is equivalent to balance(balance2500(account)nEach relational algebra operation can be evaluated using one of several different algorithmslCorrespondingly, a relational-al

5、gebra expression can be evaluated in many ways. nAnnotated expression specifying detailed evaluation strategy is called an evaluation-plan.lE.g., can use an index on balance to find accounts with balance v; do not use indexnA7 (secondary index, comparison). 4For A V(r) use index to find first index

6、entry v and scan index sequentially from there, to find pointers to records.4For AV (r) just scan leaf pages of index finding pointers to records, till first entry v4In either case, retrieve records that are pointed to requires an I/O for each record Linear file scan may be cheaperSilberschatz, Kort

7、h and Sudarshan13.13Database System Concepts - 5th Edition, Aug 27, 2005.nSeveral different algorithms to implement joinslNested-loop joinlBlock nested-loop joinlIndexed nested-loop joinlMerge-joinlHash-joinnChoice based on cost estimatenExamples use the following informationlNumber of records of cu

8、stomer: 10,000 depositor: 5000lNumber of blocks of customer: 400 depositor: 100Silberschatz, Korth and Sudarshan13.14Database System Concepts - 5th Edition, Aug 27, 2005.nTo compute the theta join r sfor each tuple tr in r do beginfor each tuple ts in s do begintest pair (tr,ts) to see if they satis

9、fy the join condition if they do, add tr ts to the result.endendnr is called the outer relation and s the inner relation of the join.nRequires no indices and can be used with any kind of join condition.nExpensive since it examines every pair of tuples in the two relations. Silberschatz, Korth and Su

10、darshan13.15Database System Concepts - 5th Edition, Aug 27, 2005.nIn the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr bs + br block transfers, plus nr + br seeksnIf both relations fit entirely in memory, in the best case.l Reduces cost to br

11、 + bs block transfers and 2 seeksnAssuming worst case memory availability cost estimate islwith depositor as outer relation:45000 400 + 100 = 2,000,100 block transfers,45000 + 100 = 5100 seeks lwith customer as the outer relation 410000 100 + 400 = 1,000,400 block transfers and 10,400 seeksnIn the b

12、est case, the cost estimate will be 500 block transfers.nBlock nested-loops algorithm (next slide) is preferable.Silberschatz, Korth and Sudarshan13.16Database System Concepts - 5th Edition, Aug 27, 2005.nVariant of nested-loop join in which every block of inner relation is paired with every block o

13、f outer relation.for each block Br of r do beginfor each block Bs of s do beginfor each tuple tr in Br do beginfor each tuple ts in Bs do beginCheck if (tr,ts) satisfy the join condition if they do, add tr ts to the result.endendendendSilberschatz, Korth and Sudarshan13.17Database System Concepts -

14、5th Edition, Aug 27, 2005.nWorst case estimate: br bs + br block transfers + 2 * br seekslEach block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relationnBest case: br + bs block transfers + 2 seeks.nImprovements to nested lo

15、op and block nested loop algorithms:lIn block nested-loop, use M 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output4 Cost = br / (M-2) bs + br block transfers + 2 br / (M-2) seekslIf equi-join attribute fo

16、rms a key or inner relation, stop inner loop on first matchlScan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)lUse index on inner relation if available (next slide)Silberschatz, Korth and Sudarshan13.18Database System Concepts - 5th

17、 Edition, Aug 27, 2005.nIndex lookups can replace file scans ifljoin is an equi-join or natural join andlan index is available on the inner relations join attribute4Can construct an index just to compute a join.nFor each tuple tr in the outer relation r, use the index to look up tuples in s that sat

18、isfy the join condition with tuple tr.nWorst case: buffer has space for only one page of r, one page of index(for s), and, for each tuple in r, we perform an index lookup on s.nCost of the join: br (tT + tS) + nr clWhere c is the cost of traversing index and fetching all matching s tuples for one tu

19、ple or rlc can be estimated as cost of a single selection on s using the join condition.nIf indices are available on join attributes of both r and s,use the relation with fewer tuples as the outer relation.Silberschatz, Korth and Sudarshan13.19Database System Concepts - 5th Edition, Aug 27, 2005.nCo

20、mpute depositor customer, with depositor as the outer relation.nLet customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.nSince customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual

21、 datandepositor has 5000 tuplesnCost of block nested loops joinl400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks4 assuming worst case memory 4may be significantly less with more memoryn Cost of indexed nested loops joinl100 + 5000 * 5 = 25,100 block transfers and seeks.lCPU cost likely t

22、o be less than that for block nested loops join Silberschatz, Korth and Sudarshan13.20Database System Concepts - 5th Edition, Aug 27, 2005.1.Sort both relations on their join attribute (if not already sorted on the join attributes).2.Merge the sorted relations to join them1.Join step is similar to t

23、he merge stage of the sort-merge algorithm. 2.Main difference is handling of duplicate values in join attribute every pair with same value on join attribute must be matched3.Detailed algorithm in bookSilberschatz, Korth and Sudarshan13.21Database System Concepts - 5th Edition, Aug 27, 2005.nCan be u

24、sed only for equi-joins and natural joinsnEach block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memorynThus the cost of merge join is: br + bs block transfers + br / bb + bs / bb seeks+ the cost of sorting if relations are unsorted.bb is the num

25、ber of buffer for the relations.Silberschatz, Korth and Sudarshan13.22Database System Concepts - 5th Edition, Aug 27, 2005.nApplicable for equi-joins and natural joins.nA hash function h is used to partition tuples of both relations nh maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the

26、 common attributes of r and s used in the natural join. lr0, r1, . . ., rn denote partitions of r tuples4Each tuple tr r is put in partition ri where i = h(tr JoinAttrs).lr0, r1. . ., rn denotes partitions of s tuples4Each tuple ts s is put in partition si, where i = h(ts JoinAttrs).nNote: In book,

27、ri is denoted as Hri, si is denoted as Hsi and n is denoted as nh. Silberschatz, Korth and Sudarshan13.23Database System Concepts - 5th Edition, Aug 27, 2005.Silberschatz, Korth and Sudarshan13.24Database System Concepts - 5th Edition, Aug 27, 2005.nr tuples in ri need only to be compared with s tup

28、les in si Need not be compared with s tuples in any other partition, since:lan r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.lIf that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.Silberschatz, Korth and S

29、udarshan13.25Database System Concepts - 5th Edition, Aug 27, 2005.1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.2. Partition r similarly.3. For each i:(a) Load si into memory and build an in

30、-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.(b) Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attribu

31、tes.The hash-join of r and s is computed as follows.Relation s is called the build input and r is called the probe input.Silberschatz, Korth and Sudarshan13.26Database System Concepts - 5th Edition, Aug 27, 2005.nThe value n and the hash function h is chosen such that each si should fit in memory.lT

32、ypically n is chosen as bs/M * f where f is a “fudge factor”, typically around 1.2lThe probe relation partitions si need not fit in memorynRecursive partitioning required if number of partitions n is greater than number of pages M of memory.nIf M2bs, ,no need recursive partitioning Silberschatz, Kor

33、th and Sudarshan13.27Database System Concepts - 5th Edition, Aug 27, 2005.nIf recursive partitioning is not required: cost of hash join is 3(br + bs) +4 nh block transfers + 2( br / bb + bs / bb) seeks often 4 nh can be omitted. nIf the entire build input can be kept in main memory no partitioning(n

34、h=0) is requiredlCost estimate goes down to br + bs.plus 2 seeksSilberschatz, Korth and Sudarshan13.28Database System Concepts - 5th Edition, Aug 27, 2005.nAssume that memory size is 20 blocksnbdepositor= 100 and bcustomer = 400.nassuming 3 buffers for these two relations. Therefore total cost, igno

35、ring cost of writing partially filled blocks:l3(100 + 400) = 1500 block transfers +2( 100/3 + 400/3) = 336 seekscustomer depositorSilberschatz, Korth and Sudarshan13.29Database System Concepts - 5th Edition, Aug 27, 2005.nMaterialized evaluation: evaluate one operation at a time, starting at the low

36、est-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.nE.g., in figure below, compute and storethen compute the store its join with customer, and finally compute the projections on customer-name. )(2500accountbalanceSilberschatz, Korth and Sudars

37、han13.30Database System Concepts - 5th Edition, Aug 27, 2005.nMaterialized evaluation is always applicablenCost of writing results to disk and reading them back can be quite highlOur cost formulas for operations ignore cost of writing results to disk, so4Overall cost = Sum of costs of individual ope

38、rations + cost of writing intermediate results to disknDouble buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filledlreduces execution timeSilberschatz, Korth and Sudarshan13.31Database System Concepts - 5th Edition, Aug 27, 2005.nPi

39、pelined evaluation : evaluate several operations simultaneously, passing the results of one operation on to the next.nE.g., in previous expression tree, dont store result of linstead, pass tuples directly to the join. Similarly, dont store result of join, pass tuples directly to projection. nMuch cheaper than materialization: no need to store a temporary relation to disk.nPipelining may

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論