Data and Model Security, Week 5 Courseware: Adversarial Defense

Adversarial Defense
姜育剛, 馬興軍, 吳祖煊

Recap of Week 4: 1. Adversarial Example Detection
- Secondary Classification Methods
- Principal Component Analysis (PCA)
- Distribution Detection Methods
- Prediction Inconsistency
- Reconstruction Inconsistency
- Trapping-Based Detection

Adversarial Attack Competition
Link: https://codalab.lisn.upsaclay.fr/competitions/15669?secret_key=77cb8986-d5bd-4009-82f0-7dde2e819ff8

Adversarial Defense vs. Detection
The weird relationship between defense and detection:
- Detection IS defense.
- But… when we say defense, we (most of the time) mean that the model itself is secured, and detection cannot do that…
- In survey papers: detection is defense.
- In technical papers: defense is defense, not detection.

Differences:
- Defense is to secure the model or the system.
- Detection is to identify potential threats, and it should be followed by a defense strategy, e.g., query rejection (but this follow-up is mostly ignored).
- By "defense", people mostly mean robust training methods.

Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Later Adversarial Training Methods
- Remaining Challenges and Recent Progress

A Recap of the Timeline
- 2013: Biggio et al. and Szegedy et al. discover adversarial examples.
- 2014: Goodfellow et al. propose the fast single-step FGSM attack and adversarial training.
- 2015: simple detection methods (PCA) and adversarial training methods.
- 2016: the min-max optimization framework for adversarial training is proposed.
- 2017: many adversarial example detection methods and new attacks (BIM, C&W); 10 detection methods are broken.
- 2018: physical-world attacks, upgraded detection methods, the PGD attack and PGD adversarial training; 9 defense methods are broken.
- 2019: TRADES and many other adversarial training methods; the first Science paper on the topic.
- 2020: the AutoAttack attack and Fast adversarial training.
- 2021: adversarial training with larger models and more data; extension to other domains.
- 2022: still unsolved; attacks keep multiplying and defense keeps getting harder.

Principles of Defense
Block the attack (cut it off at the input or output end):
- Mask the input gradients
- Regularize the input gradients
- Distill the logits
- Denoise the input
Robustify the model (strengthen the middle):
- Smooth the decision boundary
- Reduce the Lipschitzness of the model
- Smooth the loss landscape

Adversarial Attack (recap)
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.

Model training: min_θ E_(x,y)~D [ L(f_θ(x), y) ]
Adversarial attack (a test-time attack): find x' = x + δ that is misclassified while the perturbation stays very small, e.g., max_{||δ||∞ ≤ ε} L(f_θ(x + δ), y).

Performance Metrics
Other metrics: the maximum perturbation needed for a 100% attack success rate.
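To make the attack objective above concrete, here is a minimal single-step FGSM sketch in PyTorch (added for illustration, not from the original slides; `model`, `x`, `y`, and `epsilon` are assumed to be a classifier, an input batch in [0, 1], its labels, and the L∞ budget):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8/255):
    """Single-step FGSM: take one signed-gradient step that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Step in the direction of the loss gradient's sign, then clip to the valid pixel range.
    x_adv = x_adv + epsilon * grad.sign()
    return x_adv.clamp(0, 1).detach()
```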

Defense Methods
- Early Defense Methods
- Early Adversarial Training Methods
- Advanced Adversarial Training Methods
- Remaining Challenges and Recent Progress

Defensive Distillation
Papernot et al. Distillation as a defense to adversarial perturbations against deep neural networks. S&P, 2016.
- Goal: make large logit changes become "small" (reduce the model's sensitivity to input perturbations).
- Scale up the logits by a few orders of magnitude (distillation with temperature T).
- Retrain the last layer with the scaled, softened logits (a sketch follows below).
(Several figure slides from the paper are omitted here.)
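A minimal sketch of the distillation-with-temperature idea described above (an illustration under assumed names such as `teacher` and `T`; it is not the paper's reference implementation):

```python
import torch
import torch.nn.functional as F

def soft_labels(teacher, x, T=20.0):
    """Teacher's softened predictions: softmax over logits divided by temperature T."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_loss(student_logits, soft_targets, T=20.0):
    """Train the (re-trained) network on the teacher's soft labels at the same temperature T."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_p).sum(dim=1).mean()

# At test time the defended model is used at T = 1, which saturates its softmax and
# makes the input gradients vanish: the effect Carlini and Wagner later showed to be
# gradient obfuscation rather than real robustness.
```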

Defensive Distillation Is Not Robust
Carlini, Nicholas, and David Wagner. "Defensive distillation is not robust to adversarial examples." arXiv preprint arXiv:1607.04311 (2016).
It can be evaded by attacking the distilled network with the temperature T (an adaptive attack).

Lessons Learned
- Distillation is not a good solution for adversarial robustness.
- Vanishing input gradients can still be recovered by a reverse operation.
- A defense should be evaluated against the adaptive attack to prove real robustness.

Input Gradients Regularization
Ross et al. "Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients." AAAI, 2018.
Drucker, Harris, and Yann LeCun. "Improving generalization performance using double backpropagation." TNN, 1992.
Training objective (reconstructed): L = L_CE(f_θ(x), y) [classification loss] + λ · ||∇_x L_CE(f_θ(x), y)||² [input gradient regularization].
This is closely related to the double backpropagation proposed by Drucker and LeCun (1992), which penalizes the same input-gradient term to improve generalization.
Issues: 1) limited adversarial robustness, 2) it hurts learning.
(Figure: input-gradient visualizations for distilled, adversarially trained, and gradient-regularized models.)
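A minimal sketch of the regularized objective above, implemented with double backpropagation in PyTorch (the weight `lam` is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

def input_grad_reg_loss(model, x, y, lam=0.1):
    """Cross-entropy plus a penalty on the squared L2 norm of the input gradient
    (double backpropagation: Drucker & LeCun 1992; Ross et al. 2018)."""
    x = x.clone().detach().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # create_graph=True keeps the graph so the gradient penalty itself can be backpropagated.
    grad_x = torch.autograd.grad(ce, x, create_graph=True)[0]
    penalty = grad_x.pow(2).sum(dim=tuple(range(1, grad_x.dim()))).mean()
    return ce + lam * penalty
```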

Feature Squeezing
Xu et al. "Feature squeezing: Detecting adversarial examples in deep neural networks." NDSS, 2018.
- Compress the input space (a sketch follows below).
- It also hurts performance on large-scale image datasets.
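A minimal sketch of feature squeezing with a bit-depth squeezer, plus the detection score that compares predictions on the original and the squeezed input (the squeezer choice and any detection threshold are assumptions; the paper also uses spatial smoothing):

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Squeeze pixel values in [0, 1] down to 2^bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def feature_squeezing_score(model, x, bits=4):
    """L1 distance between predictions on the original and the squeezed input; a large
    distance suggests the input may be adversarial (the threshold is tuned on clean data)."""
    p_orig = F.softmax(model(x), dim=1)
    p_squeezed = F.softmax(model(reduce_bit_depth(x, bits)), dim=1)
    return (p_orig - p_squeezed).abs().sum(dim=1)
```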

Thermometer Encoding
Buckman et al. "Thermometer encoding: One hot way to resist adversarial examples." ICLR, 2018.
- Discretize the input to break small noise.
- Proposed encoding: thermometer encoding.
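A rough sketch of the thermometer-encoding idea (the exact quantization used in the paper may differ; the number of levels is an assumption):

```python
import torch

def thermometer_encode(x, levels=16):
    """Thermometer-encode pixels in [0, 1]: level channel k is 1 if the pixel value exceeds
    threshold k/levels. The encoding is piecewise constant, so gradients w.r.t. the raw
    pixels vanish, which is what breaks naive gradient-based attacks.
    Output shape: (B, C*levels, H, W) for an input of shape (B, C, H, W)."""
    thresholds = torch.arange(levels, device=x.device, dtype=x.dtype) / levels
    # Compare every pixel to every threshold along a new "level" dimension.
    enc = (x.unsqueeze(2) > thresholds.view(1, 1, -1, 1, 1)).to(x.dtype)
    b, c, l, h, w = enc.shape
    return enc.reshape(b, c * l, h, w)
```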

Input Transformations
Guo et al. "Countering Adversarial Images using Input Transformations." ICLR, 2018.
- Image cropping and rescaling
- Bit-depth reduction
- JPEG compression
- Total variance minimization
- Image quilting

Obfuscated Gradients = Fake Robustness
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.

Backward Pass Differentiable Approximation (BPDA):
- can break defenses based on non-differentiable operations (e.g., discretization, compression);
- find a differentiable (e.g., linear or identity) approximation of the non-differentiable operation and use it on the backward pass.
Expectation Over Transformation (EOT):
- can break randomization-based defenses;
- T: a set of randomized transformations; attack the expected prediction, using ∇ E_{t~T}[ L(f(t(x)), y) ] = E_{t~T}[ ∇ L(f(t(x)), y) ].
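A hedged sketch of both techniques: BPDA implemented as an identity backward pass around an assumed non-differentiable `preprocess` step, and EOT as gradient averaging over an assumed sampler of random `transforms`:

```python
import torch
import torch.nn.functional as F

class BPDAWrapper(torch.autograd.Function):
    """Backward Pass Differentiable Approximation: run the true (non-differentiable)
    preprocessing on the forward pass, but treat it as the identity on the backward pass."""
    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x.detach())

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity approximation of d(preprocess)/dx
# usage: z = BPDAWrapper.apply(x, some_nondifferentiable_preprocess)

def eot_gradient(model, x, y, transforms, n_samples=10):
    """Expectation Over Transformation: average the loss gradient over random
    transformations t ~ T, so the attack targets the expected prediction."""
    x = x.clone().detach().requires_grad_(True)
    loss = 0.0
    for _ in range(n_samples):
        t = transforms()  # sample a random transformation t ~ T (assumed callable)
        loss = loss + F.cross_entropy(model(t(x)), y)
    loss = loss / n_samples
    return torch.autograd.grad(loss, x)[0]
```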

BPDA + EOT breaks 7 defenses published at ICLR 2018.
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.
We got a survivor!

How to Properly Evaluate a Defense?
Carlini, Nicholas, et al. "On evaluating adversarial robustness." arXiv preprint arXiv:1902.06705 (2019).
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
- Do not blindly apply multiple (similar) attacks.
- Try at least one gradient-free attack and one hard-label attack.
- Perform a transferability attack using a similar substitute model.
- For randomized defenses, properly ensemble over randomness.
- For non-differentiable components, apply differentiable techniques (BPDA).
- Verify that the attacks have converged under the selected hyperparameters.
- Carefully investigate attack hyperparameters and report those selected.
- Compare against prior work and explain important differences.
- Test broader threat models when proposing general defenses.

Robust Activation Functions
Xiao et al. "Enhancing Adversarial Defense by k-Winners-Take-All." ICLR, 2020.
- Block the internal activation: break the continuity.
- k-Winners-Take-All (k-WTA) activation (sketch below).
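A minimal sketch of a k-WTA activation (the paper applies it per layer; the sparsity ratio and per-sample flattening here are simplifying assumptions):

```python
import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    """k-Winners-Take-All: keep only the k largest activations per sample and zero the rest,
    which makes the activation discontinuous and hard to attack with plain gradients."""
    def __init__(self, sparsity=0.1):
        super().__init__()
        self.sparsity = sparsity  # fraction of activations kept

    def forward(self, x):
        flat = x.flatten(start_dim=1)
        k = max(1, int(self.sparsity * flat.shape[1]))
        # The k-th largest value per sample acts as a data-dependent threshold.
        threshold = flat.topk(k, dim=1).values[:, -1].unsqueeze(1)
        mask = (flat >= threshold).to(x.dtype)
        return (flat * mask).reshape(x.shape)
```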

Robust Loss Function
Pang et al. Rethinking softmax cross-entropy loss for adversarial robustness. ICLR, 2020.
- Max-Mahalanobis center (MMC) loss.
(Figure legend: MMC = Max-Mahalanobis center; SCE = softmax cross entropy.)
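A minimal sketch of the MMC loss, assuming the Max-Mahalanobis class centers have already been constructed as in the paper and are passed in as `centers`:

```python
import torch

def mmc_loss(features, labels, centers):
    """Max-Mahalanobis center (MMC) loss: pull each sample's feature vector toward a fixed,
    maximally separated class center. `centers` has shape (num_classes, feature_dim) and is
    assumed to be precomputed with the construction described in Pang et al. (ICLR 2020)."""
    target_centers = centers[labels]  # (B, feature_dim), the preset center of each sample's class
    return 0.5 * (features - target_centers).pow(2).sum(dim=1).mean()
```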

Robust Inference
Pang et al. Mixup Inference: Better exploiting mixup to defend adversarial attacks. ICLR, 2020.
- Mixup Inference (MI).

New Adaptive Attacks Break These Defenses
Tramer et al. "On adaptive attacks to adversarial example defenses." NeurIPS, 2020.
- T1: Attack the full defense.
- T2: Target important defense parts.
- T3: Simplify the attack.
- T4: Ensure a consistent loss function.
- T5: Optimize with different methods.
- T6: Use strong adaptive attacks.

How to Evaluate a Defense?
Croce and Hein. "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks." ICML, 2020.
Gao et al. Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack. ICML, 2022.
Zimmermann et al. "Increasing Confidence in Adversarial Robustness Evaluations." arXiv preprint arXiv:2206.13991 (2022).
Strong attacks:
- AutoAttack (one must-test attack)
- Margin Decomposition (MD) attack (better than AutoAttack on ViT)
- Minimum-Margin (MM) attack (a new SOTA attack to test?)
Extra robustness tests:
- Attack unit tests (Zimmermann et al., 2022)

Adversarial Training
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
The idea is simple: just train on adversarial examples!
Adversarial training is a form of data augmentation: original data -> adversarial attack -> adversarial examples -> model training.

Adversarial training produces a smooth decision boundary.
(Figure: normal boundary -> generate adversarial examples -> boundary after training.)

Early Adversarial Training Methods
Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. ICLR 2014.
Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. ICLR 2015.
- In 2014, Szegedy et al. already explored adversarial training in the paper that explained adversarial examples: they used the L-BFGS attack to generate adversarial examples for every layer of the network and added them to the training process. Finding: adversarial examples for deeper layers improve robustness more.
- In 2015, Goodfellow et al. proposed training the network on adversarial examples generated by the (single-step) FGSM attack. They did not use adversarial examples at intermediate layers, because these gave no improvement.
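A minimal sketch of Goodfellow-style FGSM adversarial training, which mixes the clean loss with the loss on single-step adversarial examples (the mixing weight `alpha` and `epsilon` are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def fgsm_at_loss(model, x, y, epsilon=8/255, alpha=0.5):
    """FGSM adversarial training objective: alpha * clean loss + (1 - alpha) * loss on
    single-step FGSM adversarial examples generated on the fly."""
    x_req = x.clone().detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x_req), y)
    grad = torch.autograd.grad(clean_loss, x_req, create_graph=False)[0]
    x_adv = (x + epsilon * grad.sign()).clamp(0, 1).detach()
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```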

Min-Max Robust Optimization
Nokland et al. Improving back-propagation by adding an adversarial gradient. arXiv:1510.04189, 2015.
Huang et al. Learning with a strong adversary. ICLR 2016.
Shaham et al. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv:1511.05432, 2015.
The first proposal of min-max optimization for adversarial training:
min_θ E_(x,y)~D [ max_{||δ|| ≤ ε} L(f_θ(x + δ), y) ]
with the inner maximization finding the worst-case perturbation and the outer minimization training the model on it.

Virtual Adversarial Training
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
VAT: a method to improve generalization.
Differences to adversarial training:
- L2-regularized perturbation;
- uses both clean and adversarial examples for training;
- uses a KL divergence to generate the adversarial examples.

Weaknesses of Early AT Methods
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
These methods are fast! They take only about 2x the time of standard training.

PGD Adversarial Training
Athalye et al. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
Athalye et al. Synthesizing robust adversarial examples. ICML, 2018.
We got a survivor!

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
A Saddle Point Problem:
min_θ E_(x,y)~D [ max_{||δ||∞ ≤ ε} L(f_θ(x + δ), y) ]
- inner maximization;
- outer minimization;
- a saddle point (constrained bi-level optimization) problem.
In constrained optimization, Projected Gradient Descent (PGD) is the best first-order solver.

Projected Gradient Descent (PGD)
- PGD is an optimizer.
- PGD is also known as an adversarial attack in the field of AML (adversarial machine learning).
- Update rule (reconstructed): x^(t+1) = Proj_{B_ε(x)}( x^t + α · sign(∇_x L(f_θ(x^t), y)) ), where the projection is a clipping step back into the ε-ball (a PGD sketch follows below).
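A minimal L∞ PGD sketch matching the update rule above (step size, iteration count, and the random start are illustrative choices):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, step_size=2/255, steps=10, random_start=True):
    """L-infinity PGD: repeated signed gradient ascent on the loss, with a projection
    (clipping) back onto the epsilon-ball around x after every step."""
    x_adv = x.clone().detach()
    if random_start:
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Projection step: stay inside the epsilon-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()
```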

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
Projected Gradient Descent (PGD) with random initialization: the perturbation is initialized with uniform noise inside the ε-ball before the PGD steps (training-loop sketch below).
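A minimal sketch of one epoch of PGD adversarial training with a uniform random start (hyperparameters are illustrative; `loader` and `optimizer` are assumed):

```python
import torch
import torch.nn.functional as F

def pgd_at_epoch(model, loader, optimizer, device, epsilon=8/255, step_size=2/255, steps=10):
    """One epoch of PGD adversarial training: solve the inner maximization with PGD
    (random start with uniform noise), then take an outer minimization step."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Inner maximization: PGD with a random start inside the epsilon-ball.
        x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
            x_adv = x_adv.detach() + step_size * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
        # Outer minimization: a standard optimizer step on the adversarial batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv.detach()), y)
        loss.backward()
        optimizer.step()
```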

PGD Adversarial Training
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks." ICLR 2018.
Ilyas et al. "Adversarial examples are not bugs, they are features." NeurIPS, 2019.
Characteristics of PGD adversarial training.
(Figures: decision boundaries and robust features under standard training vs. adversarial training, with adversarial examples marked.)

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML 2019.
(Figures: effect of the PGD step size on robustness; effect of the number of PGD steps on robustness; using weak attacks in the early stage of training.)

How to measure the convergence of the inner maximization?
Definition (First-Order Stationary Condition, FOSC): for a perturbed point x' in the ε-ball B_ε(x),
c(x') = max_{v ∈ B_ε(x)} ⟨v − x', ∇_x L(f_θ(x'), y)⟩ = ε · ||∇_x L(f_θ(x'), y)||₁ + ⟨x − x', ∇_x L(f_θ(x'), y)⟩.
A smaller FOSC value means the inner maximization is closer to convergence (c(x') = 0 at a constrained optimum).
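A sketch of computing the FOSC value under the L∞ threat model, following the closed form given above (treat the exact form as recalled from the paper, not authoritative):

```python
import torch
import torch.nn.functional as F

def fosc(model, x_natural, x_adv, y, epsilon=8/255):
    """First-Order Stationary Condition value for an L-infinity ball of radius epsilon
    around x_natural: epsilon * ||g||_1 + <x_natural - x_adv, g>, where g is the input
    gradient at x_adv. Values near 0 indicate a (nearly) converged inner maximization;
    larger values indicate a weaker attack."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    g = torch.autograd.grad(loss, x_adv)[0].flatten(start_dim=1)
    diff = (x_natural - x_adv).detach().flatten(start_dim=1)
    return epsilon * g.abs().sum(dim=1) + (diff * g).sum(dim=1)
```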

Dynamic Adversarial Training (DART)
Wang et al. "On the Convergence and Robustness of Adversarial Training." ICML 2019.
Dynamic adversarial training: weak attacks for early training, strong attacks for later training.
- Weak attacks improve generalization; strong attacks improve the final robustness.
- Convergence analysis: DART improves robustness.
- Robustness on CIFAR-10 with WideResNet (results table omitted).

TRADES
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Use a distribution loss (KL divergence) for both the inner and outer optimization (sketch below):
min_θ E_(x,y) [ CE(f_θ(x), y) + β · max_{||x' − x|| ≤ ε} KL( f_θ(x) || f_θ(x') ) ].
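A compact sketch of the TRADES loss following the objective above (β, step sizes, and the eval/train toggling are illustrative; see the authors' released code for the reference version):

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, epsilon=8/255, step_size=2/255, steps=10, beta=6.0):
    """TRADES: cross-entropy on the clean input plus beta times the KL divergence between
    the clean prediction and the prediction on a KL-maximizing perturbed input."""
    model.eval()
    p_clean = F.softmax(model(x), dim=1).detach()
    # Inner maximization: perturb x to maximize KL(p_clean || p_adv).
    x_adv = (x + 0.001 * torch.randn_like(x)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction='batchmean')
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    model.train()
    # Outer minimization: natural loss + beta * robust (KL) loss.
    natural_loss = F.cross_entropy(model(x), y)
    robust_loss = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                           F.softmax(model(x), dim=1), reduction='batchmean')
    return natural_loss + beta * robust_loss
```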

TRADES (continued)
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Winning solution of the NeurIPS 2018 Adversarial Vision Challenge.
Characteristics of TRADES:
- Uses the KL divergence to supervise adversarial example generation, with a significant robustness improvement.
- Clean examples also participate in training, which helps model convergence and clean accuracy.
- The KL-based adversarial example generation includes an adaptive process.
- It can train a smoother decision boundary than PGD adversarial training.
- TRADES improves both the inner maximization and the outer minimization.
Experimental results of TRADES (results table omitted).

TRADES vs. VAT vs. ALP
Zhang et al. "Theoretically principled trade-off between robustness and accuracy." ICML, 2019.
Miyato et al. Distributional smoothing with virtual adversarial training. ICLR 2016.
Kannan, Harini, Alexey Kurakin, and Ian Goodfellow. "Adversarial logit pairing." arXiv preprint arXiv:1803.06373 (2018).
TRADES, Virtual Adversarial Training, and Adversarial Logit Pairing use similar optimization frameworks with different loss choices, yet the results differ greatly (objective formulas omitted).

MART: Misclassification Aware adveRsarial Training
Wang et al. "Improving adversarial robustness requires revisiting misclassified examples." ICLR, 2020.
Adversarial examples are only defined on correctly classified examples; what about misclassified examples?

The influence of misclassified and correctly classified examples: misclassified examples have a significant impact on the final robustness!
- Different (inner) maximization techniques have a negligible effect.
- Different (outer) minimization techniques have a significant effect.

Misclassification-aware adversarial risk: the standard adversarial risk is decomposed over correctly classified and misclassified examples, giving a misclassification-aware adversarial risk (formulas omitted).

MART: Misclassification Aware adveRsarial Training
Surrogate loss functions (existing methods and MART) and a semi-supervised extension of MART (formulas omitted; see the sketch below).
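A hedged sketch of the MART surrogate loss as commonly described: a boosted cross-entropy on the adversarial example plus a KL term re-weighted by the clean example's probability of being misclassified (the exact form and the weight `lam` should be checked against the paper):

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, lam=5.0):
    """Sketch of the MART surrogate loss. x_adv is assumed to have been generated
    beforehand (e.g., by PGD on the cross-entropy loss)."""
    p_adv = F.softmax(model(x_adv), dim=1)
    p_clean = F.softmax(model(x), dim=1)
    idx = torch.arange(x.size(0), device=x.device)

    # Boosted cross-entropy on the adversarial prediction: penalize both a low true-class
    # probability and a high probability of the most confident wrong class.
    one_hot = F.one_hot(y, num_classes=p_adv.size(1)).to(p_adv.dtype)
    p_true = p_adv[idx, y]
    p_top_wrong = (p_adv * (1.0 - one_hot)).max(dim=1).values
    bce = -torch.log(p_true + 1e-12) - torch.log(1.0 - p_top_wrong + 1e-12)

    # KL(clean || adv), weighted more heavily for likely-misclassified clean examples.
    kl = (p_clean * (torch.log(p_clean + 1e-12) - torch.log(p_adv + 1e-12))).sum(dim=1)
    weight = 1.0 - p_clean[idx, y]
    return bce.mean() + lam * (kl * weight).mean()
```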

Robustness of MART
- White-box robustness: ResNet-18, CIFAR-10, ε = 8/255.
- White-box robustness: WideResNet-34-10, CIFAR-10, ε = 8/255.
(Results tables omitted.)

Using More Data to Improve Robustness
Alayrac et al. (the remainder of the deck is not included in this extract).
