Data and Model Security, Week 5 Courseware: Adversarial Defense

Adversarial Defense
姜育剛, 馬興軍, 吳祖煊

Recap of Week 4: 1. Adversarial Example Detection
- Secondary Classification Methods
- Principal Component Analysis (PCA)
- Distribution Detection Methods
- Prediction Inconsistency
- Reconstruction Inconsistency
- Trapping-Based Detection

Adversarial Attack Competition
Link: https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8

Adversarial Defense vs. Detection
The weird relationship between defense and detection:
- Detection IS defense.
- But… when we say defense, we (most of the time) mean that the model itself is secured, and detection cannot do that…
- In survey papers: detection is defense.
- In technical papers: defense is defense, not detection.

Differences:
- Defense is to secure the model or the system.
- Detection is to identify potential threats, and it should be followed by a defense strategy, e.g., query rejection (but this follow-up is mostly ignored).
- By "defense", people mostly mean robust training methods.

Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Later Adversarial Training Methods
- Remaining Challenges and Recent Progress

A Recap of the Timeline
- 2013: Biggio et al. and Szegedy et al. discover adversarial examples.
- 2014: Goodfellow et al. propose the fast single-step FGSM attack and adversarial training.
- 2015: simple detection methods (PCA) and adversarial training methods.
- 2016: the min-max optimization framework for adversarial training is proposed.
- 2017: many adversarial example detection methods and new attacks (BIM, C&W); 10 detection methods are broken.
- 2018: physical-world attacks, upgraded detection methods, the PGD attack and PGD adversarial training; 9 defense methods are broken.
- 2019: TRADES and many other adversarial training methods; the first Science paper on the topic.
- 2020: the AutoAttack attack and Fast adversarial training.
- 2021: adversarial training with larger models and more data; extension to other domains.
- 2022: still unsolved; attacks keep multiplying and defense keeps getting harder.

Principles of Defense
Block the attack (cut it off at the input or output end):
- Mask the input gradients
- Regularize the input gradients
- Distill the logits
- Denoise the input
Robustify the model (strengthen the middle):
- Smooth the decision boundary
- Reduce the Lipschitzness of the model
- Smooth the loss landscape

Adversarial Attack (recap)
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.

Model training: min_θ E_(x,y)~D [ L(f_θ(x), y) ]
Adversarial attack (a test-time attack): find x' = x + δ that is misclassified while the perturbation stays very small, e.g., max_{||δ||∞ ≤ ε} L(f_θ(x + δ), y).

Performance Metrics
Other metrics: the maximum perturbation needed for a 100% attack success rate.
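To make the attack objective above concrete, here is a minimal single-step FGSM sketch in PyTorch (added for illustration, not from the original slides; `model`, `x`, `y`, and `epsilon` are assumed to be a classifier, an input batch in [0, 1], its labels, and the L∞ budget):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8/255):
    """Single-step FGSM: take one signed-gradient step that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Step in the direction of the loss gradient's sign, then clip to the valid pixel range.
    x_adv = x_adv + epsilon * grad.sign()
    return x_adv.clamp(0, 1).detach()
```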

Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Advanced Adversarial Training Methods
- Remaining Challenges and Recent Progress

Defensive Distillation
Papernot et al. Distillation as a defense to adversarial perturbations against deep neural networks. S&P, 2016.
- Goal: make large logit changes become "small" (reduce the model's sensitivity to input perturbations).
- Scale up the logits by a few orders of magnitude (distillation with temperature T).
- Retrain the last layer with the scaled, softened logits (a sketch follows below).
(Several figure slides from the paper are omitted here.)
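A minimal sketch of the distillation-with-temperature idea described above (an illustration under assumed names such as `teacher` and `T`; it is not the paper's reference implementation):

```python
import torch
import torch.nn.functional as F

def soft_labels(teacher, x, T=20.0):
    """Teacher's softened predictions: softmax over logits divided by temperature T."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_loss(student_logits, soft_targets, T=20.0):
    """Train the (re-trained) network on the teacher's soft labels at the same temperature T."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_p).sum(dim=1).mean()

# At test time the defended model is used at T = 1, which saturates its softmax and
# makes the input gradients vanish: the effect Carlini and Wagner later showed to be
# gradient obfuscation rather than real robustness.
```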

Defensive Distillation Is Not Robust
Carlini, Nicholas, and David Wagner. "Defensive distillation is not robust to adversarial examples." arXiv preprint arXiv:1607.04311 (2016).
It can be evaded by attacking the distilled network with the temperature T (an adaptive attack).

Lessons Learned
- Distillation is not a good solution for adversarial robustness.
- Vanishing input gradients can still be recovered by a reverse operation.
- A defense should be evaluated against the adaptive attack to prove real robustness.

Input Gradients Regularization
Ross et al. "Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients." AAAI, 2018.
Drucker, Harris, and Yann LeCun. "Improving generalization performance using double backpropagation." TNN, 1992.
Training objective (reconstructed): L = L_CE(f_θ(x), y) [classification loss] + λ · ||∇_x L_CE(f_θ(x), y)||² [input gradient regularization].
This is closely related to the double backpropagation proposed by Drucker and LeCun (1992), which penalizes the same input-gradient term to improve generalization.
Issues: 1) limited adversarial robustness, 2) it hurts learning.
(Figure: input-gradient visualizations for distilled, adversarially trained, and gradient-regularized models.)
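A minimal sketch of the regularized objective above, implemented with double backpropagation in PyTorch (the weight `lam` is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

def input_grad_reg_loss(model, x, y, lam=0.1):
    """Cross-entropy plus a penalty on the squared L2 norm of the input gradient
    (double backpropagation: Drucker & LeCun 1992; Ross et al. 2018)."""
    x = x.clone().detach().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # create_graph=True keeps the graph so the gradient penalty itself can be backpropagated.
    grad_x = torch.autograd.grad(ce, x, create_graph=True)[0]
    penalty = grad_x.pow(2).sum(dim=tuple(range(1, grad_x.dim()))).mean()
    return ce + lam * penalty
```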

Feature Squeezing
Xu et al. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS, 2018.
- Compress the input space (a sketch follows below).
- It also hurts performance on large-scale image datasets.
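A minimal sketch of feature squeezing with a bit-depth squeezer, plus the detection score that compares predictions on the original and the squeezed input (the squeezer choice and any detection threshold are assumptions; the paper also uses spatial smoothing):

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Squeeze pixel values in [0, 1] down to 2^bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def feature_squeezing_score(model, x, bits=4):
    """L1 distance between predictions on the original and the squeezed input; a large
    distance suggests the input may be adversarial (the threshold is tuned on clean data)."""
    p_orig = F.softmax(model(x), dim=1)
    p_squeezed = F.softmax(model(reduce_bit_depth(x, bits)), dim=1)
    return (p_orig - p_squeezed).abs().sum(dim=1)
```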

Thermometer Encoding
Buckman et al. "Thermometer encoding: One hot way to resist adversarial examples." ICLR, 2018.
- Discretize the input to break small noise.
- Proposed encoding: thermometer encoding.
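A rough sketch of the thermometer-encoding idea (the exact quantization used in the paper may differ; the number of levels is an assumption):

```python
import torch

def thermometer_encode(x, levels=16):
    """Thermometer-encode pixels in [0, 1]: level channel k is 1 if the pixel value exceeds
    threshold k/levels. The encoding is piecewise constant, so gradients w.r.t. the raw
    pixels vanish, which is what breaks naive gradient-based attacks.
    Output shape: (B, C*levels, H, W) for an input of shape (B, C, H, W)."""
    thresholds = torch.arange(levels, device=x.device, dtype=x.dtype) / levels
    # Compare every pixel to every threshold along a new "level" dimension.
    enc = (x.unsqueeze(2) > thresholds.view(1, 1, -1, 1, 1)).to(x.dtype)
    b, c, l, h, w = enc.shape
    return enc.reshape(b, c * l, h, w)
```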

Input Transformations
Guo et al. "Countering Adversarial Images using Input Transformations." ICLR, 2018.
- Image cropping and rescaling
- Bit-depth reduction
- JPEG compression
- Total variance minimization
- Image quilting

Obfuscated Gradients = Fake Robustness
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.

Backward Pass Differentiable Approximation (BPDA):
- can break defenses based on non-differentiable operations (e.g., discretization, compression);
- find a differentiable (e.g., linear or identity) approximation of the non-differentiable operation and use it on the backward pass.
Expectation Over Transformation (EOT):
- can break randomization-based defenses;
- T: a set of randomized transformations; attack the expected prediction, using ∇ E_{t~T}[ L(f(t(x)), y) ] = E_{t~T}[ ∇ L(f(t(x)), y) ].
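A hedged sketch of both techniques: BPDA implemented as an identity backward pass around an assumed non-differentiable `preprocess` step, and EOT as gradient averaging over an assumed sampler of random `transforms`:

```python
import torch
import torch.nn.functional as F

class BPDAWrapper(torch.autograd.Function):
    """Backward Pass Differentiable Approximation: run the true (non-differentiable)
    preprocessing on the forward pass, but treat it as the identity on the backward pass."""
    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity approximation of d(preprocess)/dx
# usage: z = BPDAWrapper.apply(x, some_nondifferentiable_preprocess)

def eot_gradient(model, x, y, transforms, n_samples=10):
    """Expectation Over Transformation: average the loss gradient over random
    transformations t ~ T, so the attack targets the expected prediction."""
    x = x.clone().detach().requires_grad_(True)
    loss = 0.0
    for _ in range(n_samples):
        t = transforms()  # sample a random transformation t ~ T (assumed callable)
        loss = loss + F.cross_entropy(model(t(x)), y)
    loss = loss / n_samples
    return torch.autograd.grad(loss, x)[0]
```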

BPDA + EOT breaks 7 defenses published at ICLR 2018.
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.
We got a survivor!

How to Properly Evaluate a Defense?
Carlini, Nicholas, et al. "On evaluating adversarial robustness." arXiv preprint arXiv:1902.06705 (2019).
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
- Do not blindly apply multiple (similar) attacks.
- Try at least one gradient-free attack and one hard-label attack.
- Perform a transferability attack using a similar substitute model.
- For randomized defenses, properly ensemble over randomness.
- For non-differentiable components, apply differentiable techniques (BPDA).
- Verify that the attacks have converged under the selected hyperparameters.
- Carefully investigate attack hyperparameters and report those selected.
- Compare against prior work and explain important differences.
- Test broader threat models when proposing general defenses.

Robust Activation Functions
Xiao et al. "Enhancing Adversarial Defense by k-Winners-Take-All." ICLR, 2020.
- Block the internal activation: break the continuity.
- k-Winners-Take-All (k-WTA) activation (sketch below).
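A minimal sketch of a k-WTA activation (the paper applies it per layer; the sparsity ratio and per-sample flattening here are simplifying assumptions):

```python
import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    """k-Winners-Take-All: keep only the k largest activations per sample and zero the rest,
    which makes the activation discontinuous and hard to attack with plain gradients."""
    def __init__(self, sparsity=0.1):
        super().__init__()
        self.sparsity = sparsity  # fraction of activations kept

    def forward(self, x):
        flat = x.flatten(start_dim=1)
        k = max(1, int(self.sparsity * flat.shape[1]))
        # The k-th largest value per sample acts as a data-dependent threshold.
        threshold = flat.topk(k, dim=1).values[:, -1].unsqueeze(1)
        mask = (flat >= threshold).to(x.dtype)
        return (flat * mask).reshape(x.shape)
```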

Robust Loss Function
Pang et al. Rethinking softmax cross-entropy loss for adversarial robustness. ICLR, 2020.
- Max-Mahalanobis center (MMC) loss.
(Figure legend: MMC = Max-Mahalanobis center; SCE = softmax cross entropy.)
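A minimal sketch of the MMC loss, assuming the Max-Mahalanobis class centers have already been constructed as in the paper and are passed in as `centers`:

```python
import torch

def mmc_loss(features, labels, centers):
    """Max-Mahalanobis center (MMC) loss: pull each sample's feature vector toward a fixed,
    maximally separated class center. `centers` has shape (num_classes, feature_dim) and is
    assumed to be precomputed with the construction described in Pang et al. (ICLR 2020)."""
    target_centers = centers[labels]  # (B, feature_dim), the preset center of each sample's class
    return 0.5 * (features - target_centers).pow(2).sum(dim=1).mean()
```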

Robust Inference
Pang et al. Mixup Inference: Better exploiting mixup to defend adversarial attacks. ICLR, 2020.
- Mixup Inference (MI).

New Adaptive Attacks Break These Defenses
Tramer et al. "On adaptive attacks to adversarial example defenses." NeurIPS, 2020.
- T1: Attack the full defense.
- T2: Target important defense parts.
- T3: Simplify the attack.
- T4: Ensure a consistent loss function.
- T5: Optimize with different methods.
- T6: Use strong adaptive attacks.

How to Evaluate a Defense?
Croce and Hein. "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks." ICML, 2020.
Gao et al. Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack. ICML, 2022.
Zimmermann et al. "Increasing Confidence in Adversarial Robustness Evaluations." arXiv preprint arXiv:2206.13991 (2022).
Strong attacks:
- AutoAttack (one must-test attack)
- Margin Decomposition (MD) attack (better than AutoAttack on ViT)
- Minimum-Margin (MM) attack (a new SOTA attack to test?)
Extra robustness tests:
- Attack unit tests (Zimmermann et al., 2022)

Adversarial Training
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
The idea is simple: just train on adversarial examples!
Adversarial training is a form of data augmentation: original data -> adversarial attack -> adversarial examples -> model training.

Adversarial training produces a smooth decision boundary.
(Figure: normal boundary -> generate adversarial examples -> boundary after training.)

Early Adversarial Training Methods
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
- In 2014, Szegedy et al. already explored adversarial training in the paper that explained adversarial examples: they used the L-BFGS attack to generate adversarial examples for every layer of the network and added them to the training process. Finding: adversarial examples for deeper layers improve robustness more.
- In 2015, Goodfellow et al. proposed training the network on adversarial examples generated by the (single-step) FGSM attack. They did not use adversarial examples at intermediate layers, because these gave no improvement.
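A minimal sketch of Goodfellow-style FGSM adversarial training, which mixes the clean loss with the loss on single-step adversarial examples (the mixing weight `alpha` and `epsilon` are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def fgsm_at_loss(model, x, y, epsilon=8/255, alpha=0.5):
    """FGSM adversarial training objective: alpha * clean loss + (1 - alpha) * loss on
    single-step FGSM adversarial examples generated on the fly."""
    x_req = x.clone().detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x_req), y)
    grad = torch.autograd.grad(clean_loss, x_req, create_graph=False)[0]
    x_adv = (x + epsilon * grad.sign()).clamp(0, 1).detach()
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```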

Min-Max Robust Optimization
Nokland et al. Improving back-propagation by adding an adversarial gradient. arXiv:1510.04189, 2015.
Huang et al. Learning with a strong adversary. ICLR 2016.
Shaham et al. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv:1511.05432, 2015.
The first proposal of min-max optimization for adversarial training:
min_θ E_(x,y)~D [ max_{||δ|| ≤ ε} L(f_θ(x + δ), y) ]
with the inner maximization finding the worst-case perturbation and the outer minimization training the model on it.

Virtual Adversarial Training
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
VAT: a method to improve generalization.
Differences to adversarial training:
- L2-regularized perturbation;
- uses both clean and adversarial examples for training;
- uses a KL divergence to generate the adversarial examples.

Weaknesses of Early AT Methods
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
These methods are fast! They take only about 2x the time of standard training.

PGD Adversarial Training
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.
We got a survivor!

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
A Saddle Point Problem:
min_θ E_(x,y)~D [ max_{||δ||∞ ≤ ε} L(f_θ(x + δ), y) ]
- inner maximization;
- outer minimization;
- a saddle point (constrained bi-level optimization) problem.
In constrained optimization, Projected Gradient Descent (PGD) is the best first-order solver.

Projected Gradient Descent (PGD)
- PGD is an optimizer.
- PGD is also known as an adversarial attack in the field of AML (adversarial machine learning).
- Update rule (reconstructed): x^(t+1) = Proj_{B_ε(x)}( x^t + α · sign(∇_x L(f_θ(x^t), y)) ), where the projection is a clipping step back into the ε-ball (a PGD sketch follows below).
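A minimal L∞ PGD sketch matching the update rule above (step size, iteration count, and the random start are illustrative choices):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, step_size=2/255, steps=10, random_start=True):
    """L-infinity PGD: repeated signed gradient ascent on the loss, with a projection
    (clipping) back onto the epsilon-ball around x after every step."""
    x_adv = x.clone().detach()
    if random_start:
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Projection step: stay inside the epsilon-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()
```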

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
Projected Gradient Descent (PGD) with random initialization: the perturbation is initialized with uniform noise inside the ε-ball before the PGD steps (training-loop sketch below).
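A minimal sketch of one epoch of PGD adversarial training with a uniform random start (hyperparameters are illustrative; `loader` and `optimizer` are assumed):

```python
import torch
import torch.nn.functional as F

def pgd_at_epoch(model, loader, optimizer, device, epsilon=8/255, step_size=2/255, steps=10):
    """One epoch of PGD adversarial training: solve the inner maximization with PGD
    (random start with uniform noise), then take an outer minimization step."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Inner maximization: PGD with a random start inside the epsilon-ball.
        x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
            x_adv = x_adv.detach() + step_size * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
        # Outer minimization: a standard optimizer step on the adversarial batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv.detach()), y)
        loss.backward()
        optimizer.step()
```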

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
Ilyas et al. "Adversarial examples are not bugs, they are features." NeurIPS, 2019.
Characteristics of PGD adversarial training.
(Figures: decision boundaries and robust features under standard training vs. adversarial training, with adversarial examples marked.)

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML 2019.
(Figures: effect of the PGD step size on robustness; effect of the number of PGD steps on robustness; using weak attacks in the early stage of training.)

How to measure the convergence of the inner maximization?
Definition (First-Order Stationary Condition, FOSC): for a perturbed point x' in the ε-ball B_ε(x),
c(x') = max_{v ∈ B_ε(x)} ⟨v − x', ∇_x L(f_θ(x'), y)⟩ = ε · ||∇_x L(f_θ(x'), y)||₁ + ⟨x − x', ∇_x L(f_θ(x'), y)⟩.
A smaller FOSC value means the inner maximization is closer to convergence (c(x') = 0 at a constrained optimum).
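A sketch of computing the FOSC value under the L∞ threat model, following the closed form given above (treat the exact form as recalled from the paper, not authoritative):

```python
import torch
import torch.nn.functional as F

def fosc(model, x_natural, x_adv, y, epsilon=8/255):
    """First-Order Stationary Condition value for an L-infinity ball of radius epsilon
    around x_natural: epsilon * ||g||_1 + <x_natural - x_adv, g>, where g is the input
    gradient at x_adv. Values near 0 indicate a (nearly) converged inner maximization;
    larger values indicate a weaker attack."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    g = torch.autograd.grad(loss, x_adv)[0].flatten(start_dim=1)
    diff = (x_natural - x_adv).detach().flatten(start_dim=1)
    return epsilon * g.abs().sum(dim=1) + (diff * g).sum(dim=1)
```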

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML 2019.
Dynamic adversarial training: weak attacks for early training, strong attacks for later training.
- Weak attacks improve generalization; strong attacks improve the final robustness.
- Convergence analysis: DART improves robustness.
- Robustness on CIFAR-10 with WideResNet (results table omitted).

TRADES
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Use a distribution loss (KL divergence) for both the inner and outer optimization (sketch below):
min_θ E_(x,y) [ CE(f_θ(x), y) + β · max_{||x' − x|| ≤ ε} KL( f_θ(x) || f_θ(x') ) ].
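A compact sketch of the TRADES loss following the objective above (β, step sizes, and the eval/train toggling are illustrative; see the authors' released code for the reference version):

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, epsilon=8/255, step_size=2/255, steps=10, beta=6.0):
    """TRADES: cross-entropy on the clean input plus beta times the KL divergence between
    the clean prediction and the prediction on a KL-maximizing perturbed input."""
    model.eval()
    p_clean = F.softmax(model(x), dim=1).detach()
    # Inner maximization: perturb x to maximize KL(p_clean || p_adv).
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction='batchmean')
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    model.train()
    # Outer minimization: natural loss + beta * robust (KL) loss.
    natural_loss = F.cross_entropy(model(x), y)
    robust_loss = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(model(x), dim=1), reduction='batchmean')
    return natural_loss + beta * robust_loss
```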

TRADES (continued)
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Winning solution of the NeurIPS 2018 Adversarial Vision Challenge.
Characteristics of TRADES:
- Uses the KL divergence to supervise adversarial example generation, with a significant robustness improvement.
- Clean examples also participate in training, which helps model convergence and clean accuracy.
- The KL-based adversarial example generation includes an adaptive process.
- It can train a smoother decision boundary than PGD adversarial training.
- TRADES improves both the inner maximization and the outer minimization.
Experimental results of TRADES (results table omitted).

TRADES vs. VAT vs. ALP
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
TRADES, Virtual Adversarial Training, and Adversarial Logit Pairing use similar optimization frameworks with different loss choices, yet the results differ greatly (objective formulas omitted).

MART: Misclassification Aware adveRsarial Training
Wang et al. "Improving adversarial robustness requires revisiting misclassified examples." ICLR, 2020.
Adversarial examples are only defined on correctly classified examples; what about misclassified examples?

The influence of misclassified and correctly classified examples: misclassified examples have a significant impact on the final robustness!
- Different (inner) maximization techniques have a negligible effect.
- Different (outer) minimization techniques have a significant effect.

Misclassification-aware adversarial risk: the standard adversarial risk is decomposed over correctly classified and misclassified examples, giving a misclassification-aware adversarial risk (formulas omitted).

MART: Misclassification Aware adveRsarial Training
Surrogate loss functions (existing methods and MART) and a semi-supervised extension of MART (formulas omitted; see the sketch below).
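A hedged sketch of the MART surrogate loss as commonly described: a boosted cross-entropy on the adversarial example plus a KL term re-weighted by the clean example's probability of being misclassified (the exact form and the weight `lam` should be checked against the paper):

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, lam=5.0):
    """Sketch of the MART surrogate loss. x_adv is assumed to have been generated
    beforehand (e.g., by PGD on the cross-entropy loss)."""
    p_adv = F.softmax(model(x_adv), dim=1)
    p_clean = F.softmax(model(x), dim=1)
    idx = torch.arange(x.size(0), device=x.device)

    # Boosted cross-entropy on the adversarial prediction: penalize both a low true-class
    # probability and a high probability of the most confident wrong class.
    one_hot = F.one_hot(y, num_classes=p_adv.size(1)).to(p_adv.dtype)
    p_true = p_adv[idx, y]
    p_top_wrong = (p_adv * (1.0 - one_hot)).max(dim=1).values
    bce = -torch.log(p_true + 1e-12) - torch.log(1.0 - p_top_wrong + 1e-12)

    # KL(clean || adv), weighted more heavily for likely-misclassified clean examples.
    kl = (p_clean * (torch.log(p_clean + 1e-12) - torch.log(p_adv + 1e-12))).sum(dim=1)
    weight = 1.0 - p_clean[idx, y]
    return bce.mean() + lam * (kl * weight).mean()
```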

Robustness of MART
- White-box robustness: ResNet-18, CIFAR-10, ε = 8/255.
- White-box robustness: WideResNet-34-10, CIFAR-10, ε = 8/255.
(Results tables omitted.)

Using More Data to Improve Robustness
Alayrac et al. (the remainder of the deck is not included in this extract).
