Outline
- Introduction
- Background
- Distributed Database Design
- Database Integration
  - Schema Matching
  - Schema Mapping
- Semantic Data Control
- Distributed Query Processing
- Multimedia Query Processing
- Distributed Transaction Management
- Data Replication
- Parallel Database Systems
- Distributed Object DBMS
- Peer-to-Peer Data Management
- Web Data Management
- Current Issues

Problem Definition
- Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
  - GCS is also called mediated schema
- Bottom-up design process

Integration Alternatives
- Physical integration
  - Source databases are integrated and the integrated database is materialized
  - Data warehouses
- Logical integration
  - Global conceptual schema is virtual and not materialized
  - Enterprise Information Integration (EII)

Data Warehouse Approach

Bottom-up Design
- GCS (also called mediated schema) is defined first
  - Map LCSs to this schema
  - As in data warehouses
- GCS is defined as an integration of parts of LCSs
  - Generate GCS and map LCSs to this GCS

GCS/LCS Relationship
- Local-as-view
  - The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
- Global-as-view
  - The GCS is defined as a set of views over the LCSs

Database Integration Process

Recall Access Architecture

Database Integration Issues
- Schema translation
  - Component database schemas are translated to a common intermediate canonical representation
- Schema generation
  - Intermediate schemas are used to create a global conceptual schema

Schema Translation
- What is the canonical data model?
  - Relational
  - Entity-relationship
    - DIKE
  - Object-oriented
    - ARTEMIS
  - Graph-oriented
    - DIPE, TranScm, COMA, Cupid
    - Preferable with emergence of XML
    - No common graph formalism
- Mapping algorithms
  - These are well-known

Schema Generation
- Schema matching
  - Finding the correspondences between multiple schemas
- Schema integration
  - Creation of the GCS (or mediated schema) using the correspondences
- Schema mapping
  - How to map data from local databases to the GCS
- Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS

Running Example
- Relational
  - EMP(ENO, ENAME, TITLE)
  - PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
  - ASG(ENO, PNO, RESP, DUR)
  - PAY(TITLE, SAL)
- E-R Model

Schema Matching
- Schema heterogeneity
  - Structural heterogeneity
    - Type conflicts
    - Dependency conflicts
    - Key conflicts
    - Behavioral conflicts
  - Semantic heterogeneity
    - More important and harder to deal with
    - Synonyms, homonyms, hypernyms
    - Different ontology
    - Imprecise wording

Schema Matching (cont'd)
- Other complications
  - Insufficient schema and instance information
  - Unavailability of schema documentation
  - Subjectivity of matching
- Issues that affect schema matching
  - Schema versus instance matching
  - Element versus structure level matching
  - Matching cardinality

Schema Matching Approaches

Linguistic Schema Matching
- Use element names and other textual information (textual descriptions, annotations)
- May use external sources (e.g., thesauri)
- ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩
  - Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds, with a similarity value of s
- Schema level
  - Deal with names of schema elements
  - Handle cases such as synonyms, homonyms, hypernyms, data type similarities
- Instance level
  - Focus on information retrieval techniques (e.g., word frequencies, key terms)
  - “Deduce” similarities from these

Linguistic Matchers
- Use a set of linguistic (terminological) rules
- Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
- Predicate p and similarity value s
  - Hand-crafted: specified
  - Discovered: may be computed, or specified by an expert after discovery
- Examples
  - ⟨uppercase names ≈ lower case names, true, 1.0⟩
  - ⟨uppercase names ≈ capitalized names, true, 1.0⟩
  - ⟨capitalized names ≈ lower case names, true, 1.0⟩
  - ⟨DB1.ASG ≈ DB2.WORKS_IN, true, 0.8⟩

Automatic Discovery of Name Similarities
- Affixes
  - Common prefixes and suffixes between two element name strings
- N-grams
  - Comparing how many substrings of length n are common between the two name strings
- Edit distance
  - Number of character modifications (insertions, deletions, substitutions) that need to be performed to convert one string into the other
- Soundex code
  - Phonetic similarity between names based on their Soundex codes
- Also look at data types
  - Data type similarity may suggest a stronger relationship than the similarity computed using these methods, or may help differentiate between multiple strings with the same value

N-gram Example
- 3-grams of the string “Responsibility” are the following:
  - Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
- 3-grams of the string “Resp” are:
  - Res, esp
- 3-gram similarity: 2/12 = 0.17

Edit Distance Example
- Again consider “Responsibility” and “Resp”
- To convert “Responsibility” to “Resp”
  - Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
- To convert “Resp” to “Responsibility”
  - Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
- The number of edit operations required is 10
- Similarity is 1 - (10/14) = 0.29

Constraint-based Matchers
- Data always have constraints; use them
  - Data type information
  - Value ranges
- Examples
  - RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (both low)
    - If they come from the same domain, this may increase their similarity value
  - ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
    - ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING

Constraint-based Structural Matching
- If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
- Structural similarity:
  - Same properties (attributes)
  - “Neighborhood” similarity
    - Using graph representation
    - The set of nodes that can be reached within a particular path length from a node are the neighbors of that node
    - If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept

Learning-based Schema Matching
- Use machine learning techniques to determine schema matches
- Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
- Similarity is defined according to features of data instances
- Classification is “learned” from a training set

Combined Schema Matching Approaches
- Use multiple matchers
  - Each matcher focuses on one area (name, etc.)
- Meta-matcher integrates these into one prediction
- Integration may be simple (take the average of similarity values) or more complex (see Fagin's work)

Schema Integration
- Use the correspondences to create a GCS
- Mainly a manual process, although rules can help

Binary Integration Methods

N-ary Integration Methods

Schema Mapping
- Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target
- Data warehouses: actual translation
- Data integration systems: discover mappings that ca
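
The name-similarity measures from the N-gram Example and Edit Distance Example slides, and the simple averaging meta-matcher from Combined Schema Matching Approaches, can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation. The function names are hypothetical. The n-gram score is assumed to be the number of shared n-grams divided by the size of the larger n-gram set, which reproduces the 2/12 figure. The edit distance is assumed to be the standard Levenshtein distance, normalized by the length of the longer string.

```python
def ngrams(s, n=3):
    """Return the set of distinct substrings of length n in s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by the larger n-gram set (assumed normalization)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    largest = max(len(grams_a), len(grams_b))
    if largest == 0:
        return 0.0
    return len(grams_a & grams_b) / largest


def edit_distance(a, b):
    """Levenshtein distance, computed with one rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


def combined_similarity(a, b, matchers):
    """Simplest meta-matcher: average the scores of the individual matchers."""
    return sum(matcher(a, b) for matcher in matchers) / len(matchers)


if __name__ == "__main__":
    a, b = "Responsibility", "Resp"
    print(round(ngram_similarity(a, b), 2))  # 0.17
    print(edit_distance(a, b))               # 10
    print(round(edit_similarity(a, b), 2))   # 0.29
    matchers = [ngram_similarity, edit_similarity]
    print(round(combined_similarity(a, b, matchers), 2))  # 0.23
```

Both scores are low for RESP and RESPONSIBILITY, which is why the slides suggest bringing in constraint information such as data types and value ranges to raise the similarity of elements that come from the same domain.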