Big Data: Foreign-Literature Translation and Literature Review

Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but that still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, (α,k)-anonymity, etc. In particular, all known mechanisms try to minimize information loss, and such an attempt provides a loophole for attacks. The aim of this paper is to present a survey of most of the common attack techniques for anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, studies have been made to ensure that the sensitive information of individuals cannot be identified easily. Anonymity models and k-anonymization techniques have been the focus of intense research in the last few years. In order to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications, several extending models have been proposed, which are discussed as follows.

1. k-Anonymity

k-anonymity is one of the most classic models. It is a technique that prevents joining attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A dataset is k-anonymous (k ≥ 1) if each record in the dataset is indistinguishable from at least (k − 1) other records within the same dataset. The larger the value of k, the better the privacy is protected. k-anonymity can ensure that individuals cannot be uniquely identified by linking attacks.

2. Extending Models

Since k-anonymity does not provide sufficient protection against attribute disclosure, the notion of l-diversity attempts to solve this problem by requiring that each equivalence class has at least l well-represented values for each sensitive attribute. l-diversity has some advantages over k-anonymity, because a k-anonymous dataset permits strong attacks due to lack of diversity in the sensitive attributes. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. Because there are semantic relationships among the attribute values, and different values have very different levels of sensitivity, a further requirement can be imposed, as in the (α,k)-anonymity model: after anonymization, in any equivalence class, the frequency (as a fraction) of a sensitive value is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives the wrong impression that data mining is a technique for privacy intrusion. This lack of trust has become an obstacle to the benefit of the technology. For example, the potentially beneficial data mining research project Terrorism Information Awareness (TIA) was terminated by the US Congress due to its controversial procedures for collecting, sharing, and analyzing the trails left by individuals. Motivated by the privacy concerns about data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) may not necessarily be tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving the truthfulness of the data at the record level, but PPDM solutions often do not preserve this property.

PPDP differs from PPDM in several major ways, as follows:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, it is expected that standard data mining techniques are applied to the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.

2) Neither randomization nor encryption preserves the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients. In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.

3) PPDP primarily “anonymizes” the data by hiding the identity of record owners, whereas PPDM seeks to directly hide the sensitive data. Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature.

A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC), and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, like secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task, but is concerned with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.

In the field of statistical
disclosure control (SDC), research focuses on privacy-preserving publishing methods for statistical tables. SDC focuses on three types of disclosure, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent. Attribute disclosure is the primary concern of most statistical agencies in deciding whether to publish tabular data. Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information of the published data.

Some other works in SDC focus on the study of the non-interactive query model, in which the data recipients can submit one query to the system. This type of non-interactive query model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to accurately construct a query for a data mining task in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether or not the currently received query has violated the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can only answer a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid privacy leaks. In the case of the non-interactive query model, the adversary can issue only one query and, therefore, the non-interactive query model cannot achieve the same degree of privacy defined by the interactive model. One may consider that privacy-preserving data publishing is a special case
of the non-interactive query model.

This paper presents a survey of most of the common attack techniques for anonymization-based PPDM and PPDP and explains their effects on data privacy. k-anonymity is used to protect respondents' identity and decreases linking attacks. In the case of a homogeneity attack, a simple k-anonymity model fails, and we need a concept that prevents this attack; the solution is l-diversity. All tuples are arranged in a well-represented form, and the adversary is diverted to l places or to l sensitive attributes. l-diversity is limited in the case of a background knowledge attack, because no one can predict the knowledge level of an adversary. It is observed that, when using generalization and suppression, these techniques are also applied to attributes that do not need this extent of privacy, which reduces the precision of the published table. e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss, but this method also fails in the case of multiple sensitive tuples. Generalization with suppression is also a cause of data loss, because suppression emphasizes not releasing values that are not suited to the k factor. Future work on this front can include defining a new privacy measure along with l-diversity for multiple sensitive attributes, and we will focus on generalizing attributes without suppression, using other techniques that achieve k-anonymity, because suppression reduces the precision of the published table.
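The k-anonymity, l-diversity, and α-bound conditions discussed in this survey can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy table, and the choice of quasi-identifiers (`age`, `zip`) are all hypothetical, and "well-represented" is interpreted here in its simplest form, as the number of distinct sensitive values in a class. The toy table is 2-anonymous, yet its first equivalence class contains a single disease value, which is the homogeneity situation in which plain k-anonymity fails.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_ids, k):
    """True if every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_ids)
    return all(len(group) >= k for group in classes.values())


def is_l_diverse(records, quasi_ids, sensitive, l):
    """Distinct l-diversity (an assumed reading of 'well-represented'):
    every class contains at least l different sensitive values."""
    classes = equivalence_classes(records, quasi_ids)
    return all(
        len({rec[sensitive] for rec in group}) >= l
        for group in classes.values()
    )


def max_sensitive_fraction(records, quasi_ids, sensitive):
    """Largest in-class frequency of any sensitive value.
    The alpha condition holds when this value is <= alpha."""
    worst = 0.0
    for group in equivalence_classes(records, quasi_ids).values():
        counts = Counter(rec[sensitive] for rec in group)
        worst = max(worst, max(counts.values()) / len(group))
    return worst


# Hypothetical, already generalized table (illustrative data only).
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
QI = ["age", "zip"]

print(is_k_anonymous(table, QI, 2))                  # True
print(is_l_diverse(table, QI, "disease", 2))         # False: first class is homogeneous
print(max_sensitive_fraction(table, QI, "disease"))  # 1.0
```

Under these assumptions, an adversary who knows that a target falls in the 20-29 / 130** class learns the disease with certainty even though the table is 2-anonymous. Requiring 2-diversity, or an α below 1, would reject this release.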