版權(quán)說(shuō)明:本文檔由用戶(hù)提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、呂學(xué)一 (中央研究院 資訊科學(xué)所).tw/hil/nOur linear-time algorithm for computing all Z-values in last weeks slides might go wrong. So, please (1) Find out where of our algorithm might go wrong. (Hint: 右(i) could be i 1. Should be more careful about some boundary condition.) (2) Explain how
2、to fix the bug.nTwo possible methods: (Dont do both.) Hand it to our TAs (e.g., bring it to our classroom or try them at their offices), or Email your file to one of our TAs. (Dont send it to our broadcast address. Otherwise, you get 0 point for this homework.)nDue: (100%) 11:59pm, Oct. 14, 2003. (5
3、0%) 1:20pm, Oct. 21, 2003.nYou are welcome to discuss the problem with anybody. However, you have to write your solution by yourself. Copying solutions of others is prohibited. Anybody doing that will get zero point for ALL of his/her homeworks of this course.nList the names of those who have discus
4、sed with you before you write down your solutions.nSuffix tree an extremely powerful data structure for string algorithms.nIntermission 小巨s magic show: “Prediction Impossible”. Input: P and S. Output: All occurrences of P in S. Time: O(|P| + |S|) Technique: Z values of PS.nZ(i + |P|) |P| iff P = Sii
5、 + |P| 1. Pi+|P|i+|P|+d-1SnSolving the Exact String Matching problem in O(|S|) time under the assumption that P is known and already pre-processed in O(|P|) time? E.g., P is one of your private collection of DNA sequence.nAnswer: trivial, because |S| |P|).nSolving the Exact String Matching problem i
6、n O(|P|) time under the assumption that S is known and already pre-processed in O(|S|) time? E.g., S is a dictionary whose content does not change frequently.nAnswer: may not always be possible, e.g., S and P are both all-1 strings.The Substring Problem Input: P and S. Output: an occurrence of P in
7、S.nSolving the Substring problem in O(|S|) time under the assumption that P is known and already pre-processed in O(|P|) time?nAnswer: trivialnSolving the Substring problem in O(|P|) time under the assumption that S is known and already pre-processed in O(|S|) time?nAnswer: possible, but its not cle
8、ar how to do it using Gusfields Z-value algorithm.nPreprocessing P Gusfield Boyer-Moore Knuth-Morris-PrattnPreprocessing S Suffix treeS = b b a b b a a bS18= b b a b b a a bS28= b a b b a a bS38= a b b a a bS48= b b a a bS58= b a a bS68= a a bS78= a bS88= b 1st suffix2nd suffix3rd suffix4th suffix5t
9、h suffix6th suffix7th suffix8th suffixS = b b a b b a a bS18= b b a b b a a bS28= b a b b a a bS38= a b b a a bS48= b b a a bS58= b a a bS68= a a bS78= a bS88= b 1st suffix2nd suffix3rd suffix4th suffix5th suffix6th suffix7th suffix8th suffixb b a b b a a b b a b b a a b a b b a a b b b a a b b a a
10、b a a b a b b nThe following statements are equivalent. P occurrs in S. P is a prefix of a suffix of S. P corresponds to a path of T starting from the root of T.b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b P occurs in S!b b a b b a a b b a b b a a b a b b a a b b b a a b b
11、 a a b a a b a b b P doesnt occur in S!b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b P doesnt occur in S!844441111155522222676333731 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Output: 3nQuestion: If we are given a suffix trie of S,
12、what is the time complexity for finding an occurrence of P in S?nAnswer: O(|P|) time.Q: Time complexity for constructing the suffix trie T of S?Q(|S|)Q(|S| log |S|)Q(|S|2)Q(|S|3)844441111155522222676333731 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nHow to e
13、stablish a lower bound?nAnswer: Showing an example which takes W(|S|2) time for any algorithm.nSuffix trie is good in solving Substring Problem, but may require W(|S|2) time and space.nQuestion: is there a compact representation of suffix trie that needs only O(|S|) time and space?A compact represen
14、tation of suffix trienT has at most |S| leaves. Why?nT has at most |S| 1 branching nodes. Why?nKeeping leaves and branching nodes only.ncompact representation of edge labels5,85,85,85,84,81,12,23,35,85,85,85,84,81,12,23,31,12,34,87,84,87,84,87,83,33,33,31,13,32,37,84,87,84,87,84,83,31,13,32,37,84,87
15、,84,87,84,8nThe space complexity of suffix tree O(|S|) O(|S| log |S|) O(|S|2) O(|S|3)nWhy? Number of nodes = O(|S|). Number of edges = O(|S|). Space required by each edge = O(1).Constructing Suffix Tree in Linear TimenWeiner, IEEE FOCS 1973 Linear time but expensive in space. D. E. Knuth: “the algor
16、ithm of 1973”.nMcCreight, J. ACM 1976 Linear time and quadratic space.nUkkonen, Algorithmica 1995 Linear time and linear space. Much better readability.nLet T(k) denote the suffix tree of S1k.nFrameworkcompute T(1);for k = 2 to |S| do compute T(k) from T(k-1);S = b b a b b a a bS18= b b a b b a a bS
17、28= b a b b a a bS38= a b b a a bS48= b b a a bS58= b a a bS68= a a bS78= a bS88= b b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b S = b b a b b a a b1,12,31,23,33,31,12,43,43,42,53,53,52,63,63,63,37,74,73,37,74,72,37,74,77,84,87,84,87,84,8nObservation: The tree structure do
18、es not change very often, i.e., only O(|S|) times.nAt the beginning of the k-th iteration, there are exactly k “growing points”(生長(zhǎng)點(diǎn)), all with different heights.nThe k-th iteration iteratively grows k edges with label Sk at those k growing points.b b a b b a a b b a b b a a b a b b a a b b b a a b b
19、 a a b a a b a b b nThree cases while growing trie Case 1: growing an edge at a leaf. Case 2: growing a new branch of leaf. Case 3: does not change the tree structure.1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事nThose k s
20、teps in the k-th iteration have the following pattern: some (at least one) Case-1 steps, followed by some (could be zero) Case-2 steps, followed by some (could be zero) Case-3 steps.1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3:
21、 若無(wú)其事n在同一個(gè)iteration之中,長(zhǎng)此以往之前(下)的step一定也是長(zhǎng)此以往。nAt the end of each iteration, if a growing point is a leaf, then all previous (lower) growing points are also leaves. Why?b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nThe i-th growing point = the end of the i-th suffix Sik of
22、the current prefix S1k.nArgument: the i-th (i 1) growing point is a leaf. Neither Sika nor Si.kb is a substring of S1k. Neither Si1ka nor Si1.kb is a substring of S1k. The (i 1)-st growing point is a leaf.b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nAt the beginning of th
23、e current iteration, its corresponding growing point is a leaf.b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b nAt the end of each iteration, if a growing point is a leaf, then all previous (lower) growing points are also leaves. nTherefore, in the next iteration, if a step i
24、s a “長(zhǎng)此以往”, then all previous steps are also “長(zhǎng)此以往”.n在同一個(gè)iteration之中,若無(wú)其事之後(上)的step一定也是若無(wú)其事。nAt the end of each iteration, if a growing point is an internal node, then all latter (higher) growing points are also internal. Why?b b a b b a b a b b a a b b a b b a b a a nThe i-th growing point = the en
25、d of the i-th suffix Sik of the current prefix S1k.nArgument: the i-th (i k) growing point is internal. Sika or Si.kb is a substring of S1k. Si+1ka or Si+1.kb is a substring of S1k. The (i+1)-st growing point is internal.b b a b b a b a b b a a b b a b b a b a a nAt the end of the current iteration,
26、 its corresponding growing point is still an internal node.b b a b b a b a b b a a b b a b b a b a a nAt the end of each iteration, if a growing point is an internal node, then all latter (higher) growing points are also internal.nTherefore, in “this” iteration, if a step is “若無(wú)其事”, then all its fol
27、lowing steps are also “若無(wú)其事”.nThose k steps in the k-th iteration have the following pattern: some (at least one) Case-1 steps, followed by some (could be zero) Case-2 steps, followed by some (could be zero) Case-3 steps.This is about the status within the same iteration葉子的宿命 一日為葉子, 終身為葉子b b a b b a
28、 a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事nOnly O(|S|) Case-2 steps throughout the execution.nWhy?n 葉子生長(zhǎng)點(diǎn)之所對(duì)應(yīng)的edge labeln 斷頭指標(biāo)n Case-1 StepnAlways occurs at a leaf (growing point).nAt the beginning of iteration k, the path above a leaf has label
29、 ?, k1.nEach Case-1 step in iteration k simply changes the label ?, k1 to ?, k.?, k1?, knUse ?, - to label the path above each leaf. Thus, no need to do anything for each case-1 step.?, -S = b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,11 2111nWe can simply ignore all Case-1 steps.nRecal
30、l that the number of Case-2 steps is at most |S|.nQ: Is this good enough?Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case-2 Steps: 節(jié)外生枝nAlways occurs at a growing point that is not a leaf, which is not necessarily an inte
31、rnal node.nWhen the growing point is an internal node Label = k, -, where k is the current iteration index.k, -nWhen the growing point is not an internal node k, -ti, ji+t, ji, i+t-1Case-3 Steps: 若無(wú)其事nAlways occurs at a growing point that is not a leaf, which is not necessarily an internal node. nWh
32、en the new position of the growing point is an internal node ti, j0nWhen the new position of the growing point is not an internal node ti, jt + 1nGive a sequence S such that the number of Case-3 steps throughout the process of growing its suffix trie (or suffix tree) is W(|S|2).Completely ignore the
33、m以若無(wú)其事的態(tài)度處理若無(wú)其事的狀況ti, j0ti, jt + 11.For correctly maintaining the position of each growing point. (Why?)2.For correctly running Case-2 steps. (By what?)k, -ti, ji+t, ji, i+t-1Saving the book keeping efforts in all Case-3 steps Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b
34、b a a b a b b a a b b b a a b b a a b a a b a b b nJust keep one current growing point throughout the execution.nDeriving the new position of the current growing point from its previous position (with the helpusing suffix links (斷頭指標(biāo))Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a
35、 b a b b a a b b b a a b b a a b a a b a b b The challenges: How do we derive the position of the current growing point? Vertically (case 2) Horizontally (case 3)nQ: Which one is easier?nMoving from iteration k 1 to iteration k.nThe growing point does not move!nThis is the easier case.Case 2: 節(jié)外生枝Ca
36、se 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事nMoving from Step i to Step i+1 in the same iteration.nThe growing point moves d
37、ramatically.nThis is the tougher case.Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b 1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長(zhǎng)此以往Case 2: 節(jié)外生枝Case 3: 若無(wú)其事前人種樹(shù)後人涼的哲學(xué)n每次千辛萬(wàn)苦找到vertical
38、movement的目的時(shí), 把這個(gè)movement的起點(diǎn)與終點(diǎn)用一個(gè)link記錄下來(lái).n下回遇到這個(gè)起點(diǎn)時(shí), 就可以直接走到終點(diǎn)去,不用再重新找一次.n這些link就叫做suffix link (斷頭指標(biāo)).n終點(diǎn)所對(duì)應(yīng)的字串,是起點(diǎn)所對(duì)應(yīng)之字串的斷頭字串(second suffix)Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n每個(gè)斷頭指標(biāo)的起點(diǎn)一定是個(gè)internal node, 不會(huì)葉子 不會(huì)是某個(gè)suffix
39、tree edge的中間nWhy?Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b n每個(gè)internal node一定是某個(gè)斷頭指標(biāo)的起點(diǎn)nWhy?Case 2: 節(jié)外生枝Case 3: 若無(wú)其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b S = b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,112111nGoing up to a closest internal node (whose suffix link must be available). Suppose this upward traversal passes throu
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 2025屆湖南省岳陽(yáng)市岳陽(yáng)縣第一中學(xué)高一物理第一學(xué)期期中預(yù)測(cè)試題含解析
- 2025屆陜西省西安市高新一中物理高一第一學(xué)期期末預(yù)測(cè)試題含解析
- 2025屆海南省等八校物理高一第一學(xué)期期末檢測(cè)模擬試題含解析
- 福建省“超級(jí)全能生”2025屆高二物理第一學(xué)期期中經(jīng)典試題含解析
- 2025屆河南省焦作市普通高中物理高二上期中調(diào)研試題含解析
- 浙江省湖州市長(zhǎng)興縣、德清縣、安吉縣三縣2025屆物理高二第一學(xué)期期末教學(xué)質(zhì)量檢測(cè)模擬試題含解析
- 2025屆新疆維吾爾自治區(qū)沙灣一中高三上物理期中統(tǒng)考試題含解析
- 吉林省遼源五中2025屆物理高一上期末統(tǒng)考試題含解析
- 2025屆廣西柳州市鐵一中學(xué)物理高三第一學(xué)期期中統(tǒng)考模擬試題含解析
- 2025屆河南省盧氏實(shí)驗(yàn)高中高二物理第一學(xué)期期末學(xué)業(yè)水平測(cè)試試題含解析
- 2024年1月浙江省高考英語(yǔ)試題卷附答案
- 術(shù)中獲得性壓力損傷預(yù)防
- 計(jì)算機(jī)科學(xué)與技術(shù)職業(yè)生涯發(fā)展展示
- 護(hù)理腫瘤溶解綜合癥
- 騰訊營(yíng)銷(xiāo)師認(rèn)證考試題庫(kù)(附答案)
- 我的生涯發(fā)展
- 銀行存款業(yè)務(wù)課件
- 2024年揚(yáng)州市職業(yè)大學(xué)高職單招(英語(yǔ)/數(shù)學(xué)/語(yǔ)文)筆試歷年參考題庫(kù)含答案解析
- 2024年全國(guó)初中數(shù)學(xué)競(jìng)賽試題及答案
- 前交叉韌帶重建術(shù)后康復(fù)訓(xùn)練
- 安防監(jiān)控系統(tǒng)技術(shù)標(biāo)投標(biāo)書(shū)范本-圖文
評(píng)論
0/150
提交評(píng)論