Data and Model Security, Course Slides, Week 4: Adversarial Example Detection
Adversarial Example Detection
Yu-Gang Jiang, Xingjun Ma, Zuxuan Wu

Recap: week 3
1. Adversarial Examples
2. Adversarial Attacks
3. Adversarial Vulnerability Understanding

In-class Adversarial Attack Competition
https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8
- The adversarial attack competition accounts for 30%.
- You must register for the competition with your university email address (otherwise no score is recorded).
- Competition schedule: Phase 1 runs from October 1 to October 28. Phase 2 is the evaluation phase; students do not take part in it.
- Students without a GPU can use Google Colab: /
- Scores are assigned by rank: first place receives 30 points, last place receives 15 points.

Adversarial Example Detection (AED)
- A binary classification problem: clean (y=0) or adv (y=1)?
- An anomaly detection problem: benign (y=0) or abnormal (y=1)?
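
The binary-classification view can be sketched with a one-feature logistic regression that outputs y=1 for adversarial inputs. This is only an illustration under stated assumptions: the detection scores are made-up numbers, and `train_detector`, `predict`, the learning rate and the epoch count are hypothetical choices, not part of the lecture. In practice the feature would be a statistic computed from the input or the model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_detector(features, labels, lr=0.5, epochs=2000):
    """Fit p(y=1 | s) = sigmoid(w * s + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(features, labels):
            err = sigmoid(w * s + b) - y  # gradient of the log loss w.r.t. the logit
            grad_w += err * s
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, s, threshold=0.5):
    """Return 1 (adversarial) or 0 (clean) for a scalar feature s."""
    return 1 if sigmoid(w * s + b) >= threshold else 0

# Hypothetical per-sample detection features (assumed, for illustration only).
clean_scores = [0.10, 0.20, 0.15, 0.30, 0.25]  # y = 0
adv_scores = [0.80, 0.90, 0.70, 0.95, 0.85]    # y = 1
features = clean_scores + adv_scores
labels = [0] * len(clean_scores) + [1] * len(adv_scores)

w, b = train_detector(features, labels)
print(predict(w, b, 0.12), predict(w, b, 0.90))
```

Any other binary classifier could take the place of the logistic regression; the point is only that the detector is trained on labeled clean and adversarial samples.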

Principles for AED
- All binary classification methods can be applied for AED
- All anomaly detection methods can be applied for AED
- Use as much information as you can
    - Input statistics, manual features, training data, attention map, transformation, mixup, denoising, …
    - Activations, deep features, probabilities, logits, gradients, loss landscape, uncertainty, …
- Leverage unique characteristics of adversarial examples
    - Twins: extremely close to the clean sample
    - Strangers: far away in prediction
- Build detectors based on existing understandings
    - High dimensional pockets
    - Local linearity
    - Tilting boundary
- It's still feature engineering!

Challenges in AED
- The diversity of adversarial examples used for training the detectors determines the detection performance
- Detectors are also machine learning models: they are also vulnerable to adversarial attacks
- The detectors need to detect both existing and unknown attacks
- The detectors need to be robust to adaptive attacks

Existing Methods
- Secondary Classification Methods
- Principal Component Analysis (PCA)
- Distribution Detection Methods
- Prediction Inconsistency
- Reconstruction Inconsistency
- Trapping Based Detection

Secondary Classification Methods
- Adversarial Retraining: take adversarial examples as a new class
    Grosse et al. On the (Statistical) Detection of Adversarial Examples, arXiv:1702.06280
- Adversarial Classification: clean samples as class 0, adversarial as class 1
    Gong et al. Adversarial and clean data are not twins, arXiv:1704.04960
- Cascade Classifiers: training a detector for each intermediate layer
    Metzen, Jan Hendrik, et al. "On detecting adversarial perturbations." arXiv preprint arXiv:1702.04267 (2017).

Principal Component Analysis (PCA)
- The last few components differentiate adversarial examples
- Figure legend: blue is a clean sample, yellow is an adv example
- An artifact caused by the black background
Hendrycks, Dan, and Kevin Gimpel. "Early methods for detecting adversarial images." arXiv:1608.00530 (2016); Carlini and Wagner. "Adversarial examples are not easily detected: Bypassing ten detection methods." AISec 2017.

Dimensionality Reduction
- Train on PCA reduced data
Bhagoji, Arjun Nitin, Daniel Cullina, and Prateek Mittal. "Dimensionality reduction as a defense against evasion attacks on machine learning classifiers." arXiv:1704.02654 2.1 (2017).

Distribution Detection
Maximum Mean Discrepancy (MMD)
Grosse et al. On the (Statistical) Detection of Adversarial Examples, arXiv:1702.06280
Two datasets:
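
A minimal sketch of comparing two datasets with MMD, assuming a Gaussian kernel and the biased estimate of the squared statistic, followed by a permutation test. The kernel choice, the bandwidth `sigma`, the helper names and the synthetic 2-D samples are all assumptions made for illustration; the slide does not specify them.

```python
import math
import random

def gaussian_kernel(a, b, sigma=1.0):
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

def permutation_p_value(xs, ys, n_perm=200, sigma=1.0, seed=0):
    """Fraction of random re-splits whose MMD is at least the observed one."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, sigma)
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], sigma) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Synthetic stand-ins: two clean batches and one batch from a shifted distribution.
data_rng = random.Random(1)
clean_a = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
clean_b = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
shifted = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(30)]

print("clean vs clean  :", mmd2(clean_a, clean_b))
print("clean vs shifted:", mmd2(clean_a, shifted))
print("p-value         :", permutation_p_value(clean_a, shifted))
```

A small p-value suggests the two batches come from different distributions. Note that this tests whole batches, not individual inputs.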

Distribution Detection
Kernel Density Estimation (KDE)
Feinman, Reuben, et al. "Detecting adversarial samples from artifacts." arXiv preprint arXiv:1703.00410 (2017).
- Adversarial examples are in low density space
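
The low-density idea can be sketched as a Gaussian kernel density score computed against reference features of one class. The 2-D feature vectors, the `bandwidth` value and the function name are illustrative assumptions; a real detector would use deep features of training samples from the predicted class.

```python
import math

def kernel_density(x, class_features, bandwidth=1.0):
    """Average Gaussian kernel value between x and the reference features."""
    total = 0.0
    for f in class_features:
        sq = sum((a - b) ** 2 for a, b in zip(x, f))
        total += math.exp(-sq / bandwidth ** 2)
    return total / len(class_features)

# Hypothetical features of clean training samples from a single class.
class_features = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [-0.1, 0.2], [0.1, -0.2]]

clean_query = [0.1, -0.1]  # lies inside the cluster
adv_query = [2.5, 2.5]     # lies far from the cluster

kd_clean = kernel_density(clean_query, class_features)
kd_adv = kernel_density(adv_query, class_features)
print(kd_clean, kd_adv)
```

A low score marks the query as suspicious; the decision threshold would have to be tuned on held-out data.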

Bypassing 10 Detection Methods
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. Carlini and Wagner, AISec 2017.

Local Intrinsic Dimensionality (LID)
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al. ICLR 2018
- Definition (Local Intrinsic Dimensionality)
- Adversarial examples are in high-dimensional subspaces
- Adversarial Subspaces and Expansion Dimension

Estimation of LID
Estimating local intrinsic dimensionality. Amsaleg et al. KDD 2015
- Hill (MLE) estimator (Hill 1975, Amsaleg et al. 2015)
- Based on Extreme Value Theory:
    - Nearest neighbor distances are extreme events.
    - Lower tail distribution follows a Generalized Pareto Distribution (GPD).
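
The Hill (MLE) estimator named above is commonly written as LID(x) = -((1/k) * sum_i log(r_i / r_k))^-1, where r_1 <= ... <= r_k are the distances from x to its k nearest neighbors. The sketch below implements that form on synthetic data. The choice k=20, the helper names and the two toy point clouds (a 1-D segment and a 3-D cube, both embedded in 3-D) are assumptions for illustration only.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def lid_mle(query, reference, k=20):
    """Hill/MLE estimate of LID at `query` from its k nearest reference points."""
    dists = sorted(euclidean(query, r) for r in reference)
    dists = [d for d in dists if d > 0.0][:k]  # skip exact copies of the query
    r_k = dists[-1]                            # distance to the k-th neighbor
    mean_log = sum(math.log(d / r_k) for d in dists) / len(dists)
    return -1.0 / mean_log                     # undefined if all distances are equal

rng = random.Random(0)
# Points on a 1-D segment embedded in 3-D, and points filling a 3-D cube.
line = [[rng.random(), 0.0, 0.0] for _ in range(500)]
cube = [[rng.random(), rng.random(), rng.random()] for _ in range(500)]
line_queries = [[rng.random(), 0.0, 0.0] for _ in range(20)]
cube_queries = [[rng.random(), rng.random(), rng.random()] for _ in range(20)]

avg_line = sum(lid_mle(q, line) for q in line_queries) / len(line_queries)
avg_cube = sum(lid_mle(q, cube) for q in cube_queries) / len(cube_queries)
print("segment:", avg_line, "cube:", avg_cube)
```

The estimate tracks how quickly the neighborhood fills up as the radius grows, so the cube yields a clearly larger value than the segment even though both live in the same ambient space.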

Interpretation of LID for Adversarial Subspaces:
- LID directly measures the expansion rate of local distance distributions.
- The expansion of the adversarial subspace is higher than that of the normal data subspace.
- LID assesses the space-filling capability of the subspace, based on the distance distribution of the example to its neighbors.

Figure notes:
- The LID of adversarial examples (red) is higher
- LID at deeper layers is more discriminative

Experiments & Results (Ma et al. ICLR 2018):

Dataset    Feature  FGM    BIM-a  BIM-b  JSMA   Opt
MNIST      KD       78.12  98.14  98.61  68.77  95.15
MNIST      BU       32.37  91.55  25.46  88.74  71.30
MNIST      LID      96.89  99.60  99.83  92.24  99.24
CIFAR-10   KD       64.92  68.38  98.70  85.77  91.35
CIFAR-10   BU       70.53  81.60  97.32  87.36  91.39
CIFAR-10   LID      82.38  82.51  99.78  95.87  98.94
SVHN       KD       70.39  77.18  99.57  86.46  87.41
SVHN       BU       86.78  84.07  86.93  91.33  87.13
SVHN       LID      97.61  87.55  99.72  95.07  97.60

Experiments & Results, detectors trained on one attack and tested on the others:

Train attack  Feature  FGM    BIM-a  BIM-b  JSMA   Opt
FGSM          KD       64.92  69.15  89.71  85.72  91.22
FGSM          BU       70.53  81.60  72.65  86.79  91.27
FGSM          LID      82.38  82.30  91.61  89.93  93.32

Detectors trained on a simple attack (FGSM) can also detect more complex attacks.

An Improved Detector of LID
/pdf/2212.06776.pdf

Mahalanobis Distance (MD)
Mahalanobis, Prasanta Chandra. "On the generalized distance in statistics." National Institute of Science of India, 1936.
The MD between two data points $x$ and $y$, given a covariance matrix $\Sigma$:
$d_M(x, y) = \sqrt{(x - y)^\top \Sigma^{-1} (x - y)}$

Lee et al. "A simple unified framework for detecting out-of-distribution samples and adversarial attacks." NeurIPS 2018.
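
A Mahalanobis-based confidence score can be sketched as the negative squared Mahalanobis distance to the closest class mean, using one covariance matrix shared by all classes. The sketch is restricted to 2-D features so that the matrix inverse has a closed form; the feature values and helper names are hypothetical, and the published method may differ in details not shown on the slide.

```python
def class_means(features, labels):
    """Mean 2-D feature vector per class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += f[0]
        s[1] += f[1]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s[0] / counts[y], s[1] / counts[y]] for y, s in sums.items()}

def tied_covariance(features, labels, means):
    """2x2 covariance shared by all classes, stored as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for f, y in zip(features, labels):
        dx, dy = f[0] - means[y][0], f[1] - means[y][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    n = len(features)
    return sxx / n, sxy / n, syy / n

def mahalanobis_score(x, means, cov):
    """Negative squared Mahalanobis distance to the closest class mean."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate features)
    best = None
    for mu in means.values():
        dx, dy = x[0] - mu[0], x[1] - mu[1]
        # (dx, dy) * inverse(cov) * (dx, dy)^T via the closed-form 2x2 inverse
        sq = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        best = -sq if best is None else max(best, -sq)
    return best

# Hypothetical 2-D features for two classes.
base = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [-0.1, 0.2], [0.1, -0.2]]
features = base + [[a + 3.0, b + 3.0] for a, b in base]
labels = [0] * len(base) + [1] * len(base)

means = class_means(features, labels)
cov = tied_covariance(features, labels, means)
near = mahalanobis_score([0.05, 0.0], means, cov)
far = mahalanobis_score([1.5, 1.5], means, cov)
print(near, far)
```

Inputs with a very negative score sit far from every class in feature space and would be flagged.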

Mahalanobis Distance (MD), Experiments & Results: Lee et al., NeurIPS 2018.

Bayesian Uncertainty (BU)
Feinman, Reuben, et al. "Detecting adversarial samples from artifacts." arXiv preprint arXiv:1703.00410 (2017).

Feature Squeezing
Xu et al. "Feature squeezing: Detecting adversarial examples in deep neural networks." arXiv:1704.01155 (2017).
- Bit depth reduction
- Squeezing clean and adv examples
- Reducing input dimensionality improves robustness
- The prediction inconsistency before and after squeezing can detect adversarial examples

Random Transformation
Tian et al. "Detecting adversarial examples through image transformation." AAAI 2018.
- The prediction of adversarial examples will change after random transformations

Log-Odds
Roth et al. "The odds are odd: A statistical test for detecting adversarial examples." ICML 2019.
- Add random noise to the input
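
The prediction-inconsistency detectors above share one test: compare the model's output on the input with its output on a transformed copy. A minimal sketch follows, using bit-depth reduction as the transformation in the spirit of feature squeezing. The linear `toy_model`, its weights, the perturbation size and the 0.1 threshold are all hypothetical; a noise or image transformation could be passed as `transform` instead.

```python
import math

def squeeze_bit_depth(x, bits=1):
    """Quantize values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return [math.floor(v * levels + 0.5) / levels for v in x]

def toy_model(x):
    """Hypothetical two-class model: returns [p(class 0), p(class 1)]."""
    weights, bias = [2.0, -2.0, 2.0, -2.0], -0.5
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    p1 = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p1, p1]

def inconsistency(x, model, transform):
    """L1 distance between predictions before and after the transformation."""
    before, after = model(x), model(transform(x))
    return sum(abs(a - b) for a, b in zip(before, after))

def is_adversarial(x, model, transform, threshold=0.1):
    return inconsistency(x, model, transform) > threshold

clean = [1.0, 1.0, 0.0, 0.0]
# Small signed perturbation (step 0.2, clipped to [0, 1]) that flips the toy prediction.
adv = [1.0, 0.8, 0.2, 0.0]

print(toy_model(clean), toy_model(adv))
print(is_adversarial(clean, toy_model, squeeze_bit_depth),
      is_adversarial(adv, toy_model, squeeze_bit_depth))
```

Squeezing leaves the clean input unchanged here, so its prediction is stable, while it removes the perturbation from the adversarial input and the prediction moves.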

Log-Odds
Hu et al. "A new defense against adversarial images: Turning a weakness into a strength." NeurIPS 2019.
- Principle 1: the gradients of adversarial examples are more uniform
- Principle 2: adversarial examples are hard to attack a second time
- Test criterion 1: random noise does not change the prediction
- Test criterion 2: attacking again requires a larger perturbation
