Application of Large-Scale Real-Time Graph Computing in Risk Management Systems

Large Scale Online Graph on Aerospike at PayPal Risk
(Application of large-scale real-time graph computing in PayPal's risk management system)

Agenda
  • PayPal Risk & Online Graph Applications
  • Online Graph Platform
  • Online Graph Linking Cases
  • Aerospike at PayPal

Industry Trends Redefining the Way PayPal Builds Trust Between Buyers and Sellers

PayPal Risk: Building Trust in a New World
  • CHIEF RISK OFFICER = CHIEF TRUST OFFICER
  • 500M to 1B identities stolen globally; $32M in U.S. retail fraud losses [1]
  • Financial Risk, Security, Compliance, Trust & Protection
Sources: [1] Nielsen, Dept of Commerce, JP Morgan 2015

Data Analytical Solution of Risk Fraud Detection
  • Data: Historical Behavior Data, Streaming Behavior Data, Graph Linking Behavior Data
  • Models*: Neural Networks, Ensemble/Embedding Models, Linear/Logistic Regression, Tree Ensemble Models
  • Strategies, Rules, Agents
  • Strategies are tree-based rules built on machine learning model scores.
  • Rules cover fraud trends that the models cannot yet reflect in time.
  • Graph adds a new dimension for models, strategies, rules, and agents.
* Different kinds of models are adopted in different fraud cases.

Real Time Graph Risk Opportunities - Repeated Offenders
Southeast Asia Sample Case
  • 7 bad seller accounts sell counterfeit shoes on websites.
  • Buyers file INR/SNAD (Item Not Received / Significantly Not as Described) claims, and the sellers are identified as bad sellers.
  • The 7 bad sellers share the same MY bank account.
  • An unknown MY account that adds this bank account is suspicious.
[Diagram: Bank Account xx7846 at the center. Seven redacted seller accounts are each linked to it by an "add" edge, with bad dates 2017/07/09, 2017/12/18, 2017/08/09, 2018/01/16, 2018/01/11, 2018/02/07, and 2018/01/31. One further redacted account is linked by "add on 2018/11/21".]
Linking is an efficient way to identify repeated offenders: it finds accounts that share the same private assets (e.g., a bank account) with bad accounts that have offended on the platform before.

Real Time Graph Risk Opportunities - Fast Hit & Run
[Timeline figure. Labels: Create account, DOF=0; Bad Cookie Linked; Graph Real-time Features; Receive $680; Bad IT Account; Balance Sent $680; Online Solution Available. Timestamps: 2019/01/04 16:54, 2019/01/04 20:50, 2019/01/04 23:37, 2019/01/06.]
  • A new account was linked to a bad account by cookie on 2019/01/04.
  • Fraudulent transactions later came from this new account.
  • With online graph linking features, a new rule detected these fraudulent transactions on 2019/01/06, when the money was about to exit.

Overview of PayPal Risk Real Time Graph
[Diagram: a graph whose vertex types are labeled Account, IP, Cookie, XXX, and YYY.]
  • Vertex (Account): Account ID, creation_time, last_modified_time, last_name, account_status
  • Edge (Account - IP): src_id_dst_id, creation_time, last_modified_time, link_count

Dynamic Evolving Graph & Its Vertex Centric Storage Model
Graph
  • Vertex1: id, property1, property2
  • Edge1: id, property1, property2
  • Vertex2: id, property1, property2
  • Edge2: id, property1, property2
Storage layout (one record per vertex)
  • Account Vertex. Key: Account ID. Column1: Properties. Column2: Edge Account - IP. Column3: Edge Account - Cookie. Column4: Edge Account - XXX.
  • IP Vertex. Key: IP. Column1: Properties. Column2: Edge IP - Account.
  • Cookie Vertex. Key: Cookie ID. Column1: Properties. Column2: Edge Cookie - Account.
  • XXX Vertex. Key: XXX ID. Column1: Properties. Column2: Edge XXX - Account.
Example feature: # of Permanent Cookies Linked to One IP

Real Time Subgraph Query for Complicated Linking Computation
[Diagram: subgraph over vertex types XXX, YYY, Cookie, and IP.]
Subgraph Query
    g.V().hasLabel('Account').has('CustID', 123).in().store('sg').out().store('sg').cap('sg').subgraph()

Real Time Clustering on Subgraph
    g.V().hasLabel('Account').has('CustID', 123).out('txn').store('sg').out('txn').store('sg').cap('sg').subgraph().clustering(algorithm1)

Real Time User-Defined Vertex to Scale Linking
[Diagram labels: UDV1, UDV2, UDV3, UDV4]
  • 1. A user-defined vertex (UDV) can be defined dynamically.
  • 2. A user-defined vertex can be defined as a combination such as property 1 + property 2 + prefix of property 3.
Examples:
  • 1) UDV1 = f1(x, y)
  • 2) UDV2 = f2(x, y)
  • 3)
* A user-defined vertex should be defined carefully to avoid hot spots in the graph; the extreme cases are one-all and one-one.
How?

Performance, Scalability, Availability
  • 20ms average latency
  • 110ms P99 latency
  • 50 billion vertices/edges
  • 99.9% availability

Aerospike at PayPal
[Chart: Storage Growth Trend, Data Growth (TB), 2011 to 2019. Data labels: 0.5, 3, 12, 18, 30, 90, 210, 450, 650. Series annotations: In-Memory Caching product; Hybrid Memory Aerospike.]

Aerospike Anatomy
  • High density storage with Hybrid Memory Architecture
  • Linear horizontal scale: CPU, Memory, Disk
  • Highly available, shared-nothing architecture, XDR replication
  • Persistent Memory/Shared Memory support for fast DB restarts/OS reboots
  • Flexible data model
  • Async non-blocking IO support for Java client (Netty)

Why Aerospike as Real-time Graph Storage
Aerospike Anatomy
Hybrid Storage
  • Designed for SSDs (even wear and tear on device)
  • Proprietary file system
  • Predictable capacity (Key=64Bytes, Value=value, RIPEMD-160 Hash)
  • Key, Value
Server
  • NoSQL KV database
  • Written in C
  • AP (Eventual) and CP (Strong) modes
  • In-Memory or Hybrid-Memory modes
  • Uses Linux shared memory or Persistent Memory for quick restarts
  • Low disk write amplification (up to 2)
  • SSD optimized - block storage
  • UDF for server-side computes
Client
  • C, C++, Java, Go, C#, NodeJS, PHP, Python, Ruby, Perl, Rust
Console
  • AMC

Storage Architecture
[Diagram: read paths for three architectures: In-Memory Cache/DB, Memory-First, and Hybrid-Memory. Labels: Client, Database Memory, Database Disk, Read Path, Read Path With Cache Hit, Read Path With Cache Miss, Async/Sync Persistence. Latency and throughput annotations: Latency Varying, Throughput Varying; Latency Low or High, Throughput Consistent; Latency Ultra Low, Throughput Consistent.]

High Storage Density
                      In-Memory Cache (50TB)    Aerospike (Hybrid Memory NoSQL) (50TB)
Racks                 18                        3
# of servers          1024                      120
Performance           Inconsistent              Consistent
Average throughput    200K TPS                  2M TPS
Latency               Low (1ms avg)             Ultra low (200us avg)
Total cost            $12.5m                    $1.8m
Improvement factors shown for Performance, Cost, and Space/Power: 5x, 8x, 10x
Availability: before 99.5 ATB, after 99.99+ ATB

Linear Scale CPU, Memory
[Diagram: a server with CPU (40 cores), Memory (Available = 384GB), and a disk column headed NVMe (Available = 1.92TB x 1 SATA RI-SSD), shown idle and at two loads.]
  • Load 250M: Max Write = 200K TPS; labels 5%, 15GB, 268GB; 100% Utilization
  • Load 1B: Max Write = 200K TPS; labels 5%, 60GB, 1TB; 100% Utilization

Linear Scale Disk
[Diagram: the same server (40 cores, Available = 384GB) with Available = 1.92TB x 1 NVMe RI-SSD.]
  • Load 1B: Max Write = 400K TPS; labels 10%, 60GB; 100% Utilization; Available =
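
The vertex-centric storage model described in the deck keeps one record per vertex, with one column for properties and one column per edge type. A minimal Python sketch of that layout follows, with a plain dict standing in for the key-value store. The names VertexStore, put_vertex, add_edge, neighbors, and two_hop are hypothetical and do not come from the talk; link_count follows the edge properties listed on the overview slide. The two-hop walk only approximates the neighborhood that the subgraph query collects.

```python
from collections import defaultdict


class VertexStore:
    """Toy vertex-centric store: one record per (label, id) vertex key.

    Each record has a 'properties' column and one column per edge type,
    mirroring the Key / Column1 / Column2 layout on the slide.
    A dict stands in for the real key-value database (assumption).
    """

    def __init__(self):
        self.records = defaultdict(
            lambda: {"properties": {}, "edges": defaultdict(dict)}
        )

    def put_vertex(self, label, vid, **properties):
        self.records[(label, vid)]["properties"].update(properties)

    def add_edge(self, src, dst):
        """Link two vertices in both directions and bump link_count."""
        for a, b in ((src, dst), (dst, src)):
            column = self.records[a]["edges"][f"{a[0]}-{b[0]}"]
            column[b[1]] = column.get(b[1], 0) + 1  # link_count per neighbor

    def neighbors(self, vertex):
        """Yield every vertex adjacent to `vertex`, read from its own record."""
        for edge_type, column in self.records[vertex]["edges"].items():
            dst_label = edge_type.split("-", 1)[1]
            for dst_id in column:
                yield (dst_label, dst_id)

    def two_hop(self, vertex):
        """Vertices reachable in one or two hops, excluding the start vertex."""
        first = set(self.neighbors(vertex))
        second = {n for v in first for n in self.neighbors(v)}
        return (first | second) - {vertex}


# Hypothetical data: two accounts share a cookie, one of them is bad.
store = VertexStore()
store.put_vertex("Account", "A1", account_status="bad")
store.add_edge(("Account", "A1"), ("Cookie", "C9"))
store.add_edge(("Account", "A2"), ("Cookie", "C9"))
store.add_edge(("Account", "A2"), ("IP", "10.0.0.1"))
linked = store.two_hop(("Account", "A2"))
```

Because every edge is stored inside the records of both endpoints, a one-hop expansion is a single key lookup, which is the property that makes this layout attractive on a key-value store.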
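
The repeated-offender case rests on one linking rule: an account is suspicious if it shares a private asset with an account already known to be bad. A small sketch of that rule is shown below. The function name find_linked_accounts and the sample identifiers are hypothetical; only the shape of the data (seven bad sellers and one unknown account adding the same bank account) follows the sample case.

```python
from collections import defaultdict


def find_linked_accounts(asset_links, bad_accounts):
    """Return {account: shared assets} for accounts not yet marked bad
    that share at least one asset with a known bad account.

    asset_links: iterable of (account_id, asset_id) pairs, for example
    one pair per "account adds bank account" event.
    bad_accounts: set of account ids already identified as bad.
    """
    accounts_by_asset = defaultdict(set)
    for account, asset in asset_links:
        accounts_by_asset[asset].add(account)

    flagged = defaultdict(set)
    for asset, accounts in accounts_by_asset.items():
        if accounts & bad_accounts:  # asset was used by a bad account
            for account in accounts - bad_accounts:
                flagged[account].add(asset)
    return dict(flagged)


# Hypothetical data shaped like the sample case.
bad = {f"seller{i}" for i in range(1, 8)}
links = [(seller, "bank:xx7846") for seller in bad]
links += [("unknown1", "bank:xx7846"), ("clean1", "bank:other")]
result = find_linked_accounts(links, bad)
```

In an online setting the same check would run when the "add" event arrives, so the unknown account is flagged at the moment it attaches the shared bank account rather than in a later batch job.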
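
The user-defined vertex slide describes a UDV as a combination such as property 1 + property 2 + prefix of property 3, and warns against the one-all and one-one extremes. The sketch below shows one way to build such a key and to check its degree distribution. The functions make_udv and degree_report, the separator, the prefix length, and the sample properties are all assumptions made for illustration; the talk does not say how f1 or f2 are defined.

```python
from collections import Counter


def make_udv(prop1, prop2, prop3, prefix_len=3):
    """Build a UDV key as property 1 + property 2 + prefix of property 3."""
    return f"{prop1}|{prop2}|{prop3[:prefix_len]}"


def degree_report(udv_keys, hot_threshold):
    """Split UDV keys into hot spots and singletons by linked-account count.

    A key linked to more than hot_threshold accounts drifts toward the
    one-all extreme; a key linked to exactly one account (one-one)
    contributes no linking at all.
    """
    degree = Counter(udv_keys)
    hot = {key for key, count in degree.items() if count > hot_threshold}
    singletons = {key for key, count in degree.items() if count == 1}
    return hot, singletons


# Hypothetical accounts described by (city, device, postal code).
accounts = [
    ("KL", "Pixel", "50450"),
    ("KL", "Pixel", "50470"),
    ("KL", "Pixel", "50400"),
    ("SG", "iPhone", "01890"),
]
keys = [make_udv(*account) for account in accounts]
hot, singletons = degree_report(keys, hot_threshold=2)
```

Lengthening the prefix makes keys more specific and reduces hot spots, at the cost of more singletons; shortening it does the opposite.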
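
The figures in the storage-density and linear-scale slides can be cross-checked with simple arithmetic. The sketch below assumes that the 15GB and 60GB labels are primary-index memory at 64 bytes per record (the "Key=64Bytes" figure on the anatomy slide); the slides do not state this, so treat the comparison as a plausibility check only. The ratios are computed directly from the table values.

```python
INDEX_BYTES_PER_RECORD = 64  # "Key=64Bytes" on the Aerospike Anatomy slide


def index_memory_gib(records):
    """Memory needed for the index alone, in GiB, at 64 bytes per record."""
    return records * INDEX_BYTES_PER_RECORD / 2**30


mem_250m = index_memory_gib(250_000_000)    # compare with the 15GB label
mem_1b = index_memory_gib(1_000_000_000)    # compare with the 60GB label

# Ratios from the High Storage Density comparison.
server_ratio = 1024 / 120                   # servers, in-memory vs Aerospike
cost_ratio = 12.5 / 1.8                     # total cost, $m
throughput_ratio = 2_000_000 / 200_000      # average TPS, Aerospike vs in-memory

print(f"index memory: {mem_250m:.1f} GiB at 250M, {mem_1b:.1f} GiB at 1B")
print(f"servers {server_ratio:.1f}x, cost {cost_ratio:.1f}x, "
      f"throughput {throughput_ratio:.0f}x")
```

Under that assumption the index grows linearly with record count, which is consistent with the fourfold increase between the two load levels on the slide.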
