
Dynamically Managing Large-Scale Spark Clusters on AWS
Technology innovation, transforming the future

  • Founded in 2013 with HQ in Mountain View, California, USA
  • Strong financial backing of over $50M from Sequoia Capital, Genesis Capital, and GSR Ventures

Technology
  • 7 patents pending
  • Elite tech team of PhDs from top universities specializing in machine learning, big data, and security

Clients
  • Partnered with global clients with an online or mobile presence in gaming, social, commerce, and finance
  • 4B+ protected accounts globally
  • 800B+ processed events to date
  • 200M+ bad accounts detected to date

Rules Engines | Supervised Machine Learning | Unsupervised Machine Learning | Reputation Lists

How it works (reputation lists)
  • Search reputation database
  • Match against lists
  • Make decision
  [Diagram: Sender, Receiver, Reputation Database; "Is sender IP listed?" YES / NO; Make decision]
  Examples: email, IP address, devices, credit card numbers, phone numbers

How it works (rules engines)
  • Check against rule lists
  • Criteria with weights
  • Combination rules with logic

  IF (user email = free email service) AND (comment character count > 150 per sec)
    flag user account as spammer
    mute commenting

  RULE                                            WEIGHT
  IP address is anonymous proxy                   +800
  Account age > 180 days                          -500
  Email is private corporate domain name          -350
  Mismatch between billing country and IP country +450
  Phone number found on 3 accounts                +250

What is it? (supervised machine learning)
An algorithm that learns to perform a task from known examples (training data). An important requirement of using supervised learning is having the data to train the model.

What is it? (unsupervised machine learning)
An algorithm that learns to identify linkages and patterns in the data without prior knowledge of what to look for. Unsupervised machine learning does not require labeled training data.

Comparison Between Approaches
[Chart axes: Time, Effectiveness]
  • Reputation List and Device Fingerprint: limited coverage and precision; attackers can use emulators to bypass device fingerprinting
  • Rules Engine: rules must be maintained and adapted constantly; poor against adaptive attacks
  • Supervised Machine Learning: needs a large amount of labeled data; has difficulty detecting unknown attacks
  • Unsupervised Machine Learning (DataVisor): auto-label generation; detection of unknown attacks; auto-rules generation

DataVisor Unsupervised Machine Learning
UML Engine Process Flow
STEP 1: Dynamic feature extraction
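Returning to the rules engine described earlier, the weighted rule table can be read as an additive risk score. The following is only a minimal sketch of that idea: the weights come from the slide, while the field names, the direction of each comparison, and the `score` helper are assumptions made for illustration, not the deck's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str
    weight: int                      # positive = riskier, negative = more trusted
    matches: Callable[[Dict], bool]  # predicate over one account record


# Weights are taken from the rule table; predicates and field names are hypothetical.
RULES: List[Rule] = [
    Rule("IP address is anonymous proxy", 800, lambda a: a["ip_is_anonymous_proxy"]),
    Rule("Account age > 180 days", -500, lambda a: a["account_age_days"] > 180),
    Rule("Email is private corporate domain", -350, lambda a: a["email_is_corporate"]),
    Rule("Billing country differs from IP country", 450,
         lambda a: a["billing_country"] != a["ip_country"]),
    Rule("Phone number found on 3 accounts", 250,
         lambda a: a["accounts_sharing_phone"] >= 3),  # ">= 3" is an assumption
]


def score(account: Dict) -> int:
    """Sum the weights of every rule whose predicate matches the account."""
    return sum(rule.weight for rule in RULES if rule.matches(account))


# Two made-up accounts for illustration only.
risky = {
    "ip_is_anonymous_proxy": True, "account_age_days": 10, "email_is_corporate": False,
    "billing_country": "US", "ip_country": "RU", "accounts_sharing_phone": 4,
}
trusted = {
    "ip_is_anonymous_proxy": False, "account_age_days": 400, "email_is_corporate": True,
    "billing_country": "US", "ip_country": "US", "accounts_sharing_phone": 1,
}
```

Under these assumptions `score(risky)` adds up the three positive weights and `score(trusted)` the two negative ones. Every new attack pattern needs another hand-written `Rule`, which is the maintenance burden the comparison slide attributes to rules engines.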

STEP 2: Unsupervised attack ring detection
STEP 3: Result categorization & ranking

What the steps do, in order:
  • Feature extraction: generating a large set of features to describe each input account
  • Attack ring detection: performing correlation analysis across all accounts and identifying attack rings
  • Categorization and ranking: assigning a confidence score and categorizing each attack ring

Raw Input Data
  • Profile info
  • Behaviors & activities
  • Origins & digital fingerprints
  • Contents & metadata
  • Relationships among accounts
  • Security event logs + labels
  • Custom scoring + metadata
  Verticals: social, gaming, e-commerce, finance

[Diagram: Dynamic event extraction; DATA INFRASTRUCTURE (Hadoop, Spark); Correlation engine of billions of users]

Attributes
  • Temporal
  • Event sequence
  • Velocity + frequency
  • Spatial / geo
  • Domain
  • Graph attributes

Our feature engineering was designed to operate across a very high dimensional feature space and be comprehensive in extracting fraud features.

[Diagram: Data Processing Layer; Feature Extraction; Cross-event / Time-series Feature Engineering; DataVisor's Unsupervised Machine Learning]

  • User profile data and behavior data, keyed by user ID (user0001, user0002, user0003, ...)
  • Derived features (frequency, velocity, correlation, ...)
  • A long vector describes the comprehensive profile and behavior of each user: Profile, Behavior, Freq., Corr., Velocity, Device, Seq., Geo
  • Vectors are updated dynamically

Global Intelligence Network
All application-level events from multiple online service verticals (financial, e-commerce, social, mobile):
  • 410 million+ IP addresses
  • 3.6 million+ email domains
  • 160,000+ device types
  • 300,000+ OS versions
  • 5.3 million+ user agent strings
  • 700,000+ phone prefixes
From 4 billion+ global users, 800 billion+ events and growing.

Example:
  • Xiaomi Mi 5 is a phone that was released in 2016
  • 50% of its footprint is in Russia and China
  • When it appears in other regions, its fraud rate can be up to 51%

Intel from the Global Intelligence Network (GIN): IP, email domain, phone prefix, device info

[Diagram: per-user feature vectors (Profile, Behavior, Freq., Corr., Velocity, Device, Seq., Geo) for user0001 ... user9553 at the user level, grouped into Cluster001]

Dimension reduction based on
  • Statistical analysis
  • Domain knowledge
  • Feature correlation

Dynamic clustering based on combinations of
  • Feature dimensions (f)
  • Feature weights (w)
  • Linkage probability function (F)
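To make the clustering ingredients above concrete (feature dimensions f, feature weights w, and a linkage function F), here is a deliberately simplified sketch. It is not DataVisor's algorithm: the weights, the threshold, the weighted-agreement `linkage` function, and the union-find grouping are all stand-ins chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical integer weights w for three feature dimensions f.
WEIGHTS = {"ip": 5, "device": 3, "email_domain": 2}


def linkage(u: Dict, v: Dict, weights: Dict[str, int] = WEIGHTS) -> float:
    """Stand-in for F: weighted share of feature dimensions on which two users agree."""
    shared = sum(w for f, w in weights.items()
                 if u.get(f) is not None and u.get(f) == v.get(f))
    return shared / sum(weights.values())


def cluster(users: Dict[str, Dict], threshold: float = 0.5) -> List[List[str]]:
    """Group users whose pairwise linkage reaches the threshold (union-find)."""
    parent = {uid: uid for uid in users}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(users, 2):
        if linkage(users[a], users[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for uid in users:
        groups.setdefault(find(uid), []).append(uid)
    # Largest clusters first, members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Made-up users for illustration only.
users = {
    "user0001": {"ip": "1.2.3.4", "device": "Redmi 3S", "email_domain": "a.example"},
    "user0002": {"ip": "1.2.3.4", "device": "Redmi 3S", "email_domain": "b.example"},
    "user0003": {"ip": "5.6.7.8", "device": "Redmi 3S", "email_domain": "b.example"},
    "user0004": {"ip": "9.9.9.9", "device": "iPhone", "email_domain": "c.example"},
}
```

In this toy data, user0001 and user0003 are linked only indirectly through user0002, which is the kind of ring structure that pairwise rules tend to miss. The all-pairs loop is quadratic, so at the scale the deck describes it would have to be replaced by a distributed approach.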

[Diagram: cluster level, e.g. Cluster002, Cluster551]

Based on key clustering features, the DataVisor engine will output a reason code and the corresponding categories.

Attack Categorization / Reason Code
  • Automated account opening fraud
  • Mass account takeover password testing
  • Manual transaction fraud

  • Classification of fraud campaigns
  • Analysis of fraud techniques
  • Monitoring of fraud trends

Stop Fake Account Creation
  Prevent mass registration of fake account armies
Prevent Transaction Fraud
  Reduce e-commerce and financial fraud 30%-50% more than traditional solutions
Identify Account Takeovers
  Detect compromised users before damage to your customers or brand
Block Fake Reviews & Likes
  Maintain trust in your platform by reducing fake comments & voting
Filter Spam
  Prevent spammers from posting illicit or annoying content
Discover Fake App Installs
  Save millions of dollars per year by flagging fake mobile app installs

Early detection view shows how DataVisor catches crime rings before damage is done.

Fake App Installs and Game Play Activity
  • 15K+ installs all coming from the same device type: Redmi 3S running Android 5.1.1
  • Fake retention activity within 7 days to 2 weeks following install
  • Multiple app starts every few seconds or minutes
  • Accounts go completely inactive after the faked app_start retention events
  [Chart: all installs from Xiaomi Redmi 3S, followed by fake app starts to mimic retention; one major "wave" of fraudulent installs; fraud score 95]

Derive granular user behavior information
  • New user ratio
  • Fraudulent user ratio
  • First/last seen time
  • Proxy / data center IP
  • Geolocation

Deep Learning
[Diagram: Global Intelligence Network; Financial, Social, E-Comm, Mobile]

Pro:
  • Unified engine (end-to-end solution)
  • Simple API
  • Speed
Con:
  • Deep learning integration under development

Pro:
  • Production ready (if done right)
  • Extensive ML API for various tasks
Con:
  • Limited data pre-processing support
  • Not an end-to-end solution

[Diagram: UDF; DataFrame; derived feature; origin feature]

Pre-processing
  • Load data into a DataFrame
  • Each user defined function (UDF) is built from a feature function
  • Uniform API

Serving
  • Every data point is pre-processed and then fed to the DL model for inference
  • The same feature function is used to process data at serving time
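The point about reusing one feature function for both batch pre-processing and serving can be sketched without Spark. In the sketch below every name (`FEATURES`, `preprocess_one`, `preprocess_batch`, the event fields, the free-email list) is hypothetical. In a Spark job, each feature function could plausibly be wrapped with `pyspark.sql.functions.udf` and applied through `DataFrame.withColumn`, but that wiring is an assumption and is not shown here.

```python
from typing import Callable, Dict, List

FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com"}  # illustrative list only


def comment_chars_per_sec(event: Dict) -> float:
    """Velocity-style derived feature: comment characters typed per second."""
    return event["comment_chars"] / max(event["window_seconds"], 1)


def uses_free_email(event: Dict) -> float:
    """1.0 if the email domain is in the (hypothetical) free-provider list."""
    domain = event["email"].rsplit("@", 1)[-1].lower()
    return 1.0 if domain in FREE_EMAIL_DOMAINS else 0.0


# Single registry of feature functions, shared by both code paths.
FEATURES: Dict[str, Callable[[Dict], float]] = {
    "comment_chars_per_sec": comment_chars_per_sec,
    "uses_free_email": uses_free_email,
}


def preprocess_one(event: Dict) -> Dict:
    """Serving path: add derived features to one incoming event."""
    return {**event, **{name: fn(event) for name, fn in FEATURES.items()}}


def preprocess_batch(events: List[Dict]) -> List[Dict]:
    """Batch path: reuse the exact same per-event logic for training data."""
    return [preprocess_one(e) for e in events]


# Made-up event for illustration only.
sample_event = {"email": "someone@gmail.com", "comment_chars": 600, "window_seconds": 3}
```

Because both paths call the same functions, a feature computed during training cannot silently drift from the one computed at inference, which appears to be the motivation for the uniform API on the slide.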

[Diagram: feature functions; serving data; modeling; inference]

Pipeline
  • Process 200+ GB/day/client
  • 5000+ average peak QPS across clients
  • Batch process runs multiple times per day
  • Dynamically launch and destroy Spark clusters utilizing Spot Fleet
  • Results are precomputed and written to each data store
  [Diagram: Original Data, then Pipeline Module 1, Pipeline Module 2, ..., Pipeline Module N, Pipeline Module N+1]

Cost
  • Moved to a 3-year convertible instance model
  • Real-time cost tracking (Cloudability)
  • Spot Fleet

SparkGen: Internal Spark Cluster Management Software
  [Diagram: Prod Job Scheduler; Spark Resource Manager; Prod Jobs; Dev Jobs; Developers; clusters drawn as groups of M and S nodes]
  • Track pipeline dependencies and run all jobs on Spot instances
  • Tip: Spot instances are 7 times cheaper than on-demand and 3 times cheaper than reserved instances.

Single Static Cluster
  • One-time launch
  • Low utilization
  • Idle time

Multiple Static Clusters
  • One-time launch
  • Moderate utilization
  • Idle time
  • Limited concurrency
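"Track pipeline dependency" implies that a scheduler has to decide which pipeline modules may run once their upstream modules have finished. The sketch below shows one common way to compute such an order, Kahn's topological sort. The deck does not describe how SparkGen does this, so the function name, the `deps` format, and the module names are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which jobs can run; deps maps job -> jobs it waits for."""
    jobs = set(deps) | {d for upstream in deps.values() for d in upstream}
    remaining = {job: set(deps.get(job, ())) for job in jobs}

    # Jobs with no unmet dependencies can start immediately.
    ready = deque(sorted(job for job, upstream in remaining.items() if not upstream))
    order: List[str] = []

    while ready:
        job = ready.popleft()
        order.append(job)
        unblocked = []
        for other, upstream in remaining.items():
            if job in upstream:
                upstream.discard(job)
                if not upstream:
                    unblocked.append(other)
        ready.extend(sorted(unblocked))

    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical pipeline: modules 2 and 3 both read the output of module 1.
pipeline = {
    "module_2": {"module_1"},
    "module_3": {"module_1"},
    "module_4": {"module_2", "module_3"},
}
```

In a dynamic setup, jobs that become ready at the same time (here `module_2` and `module_3`) could each get their own short-lived cluster, avoiding the idle time and limited concurrency listed for static clusters.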
