油を売る

Universal adversarial perturbations


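A review of the CVPR 2017 paper "Universal adversarial perturbations".

The original paper is available here.

The paper shows that a classifier can be attacked with a universal perturbation: a single perturbation that works across many different images.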

Abstract

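The paper proposes universal adversarial perturbations for neural networks: a single adversarial perturbation that yields adversarial examples when applied to many different images.

Earlier adversarial perturbations were generated per image, with one perturbation vector computed for each input in order to induce a misclassification. This paper shows that a universal adversarial perturbation exists, that is, a single vector that causes misclassification across many images, and proposes a method for generating such a vector.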

Figure 1: When added to a natural image, a universal perturbation image causes the image to be misclassified by the deep neural network with high probability. Left images: Original natural images. The labels are shown on top of each arrow. Central image: Universal perturbation. Right images: Perturbed images. The estimated labels of the perturbed images are shown on top of each arrow.

Universal Perturbations

The goal of the proposed method is to find a perturbation vector $v \in \mathbb{R}^d$ that fools the classifier $\hat{k}$ on almost all data points sampled from the distribution $\mu$.

$\hat{k}(x + v) \neq \hat{k}(x) \quad \text{for "most" } x \sim \mu$

The perturbation vector $v$ must satisfy the following two constraints:

$\|v\|_p \leq \epsilon$
$\mathbb{P}_{x \sim \mu} \left( \hat{k}(x + v) \neq \hat{k}(x) \right) \geq 1 - \delta$

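The parameter $\epsilon$ bounds the magnitude of the perturbation vector, and $\delta$ sets the desired fooling rate: at least a fraction $1 - \delta$ of all images must be misclassified.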
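The two constraints can be checked empirically on a finite sample of images. The following is a minimal sketch under stated assumptions: `lp_norm`, `fooling_rate`, `is_universal`, and the toy classifier `toy_classify` are hypothetical names introduced here for illustration, and the probability over $\mu$ is replaced by a frequency over a small list of inputs.

```python
import math

def lp_norm(v, p):
    """l_p norm of a flat list of floats; p may be math.inf."""
    if p == math.inf:
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def fooling_rate(classify, images, v):
    """Fraction of images whose predicted label changes when v is added."""
    fooled = 0
    for x in images:
        perturbed = [xi + vi for xi, vi in zip(x, v)]
        if classify(perturbed) != classify(x):
            fooled += 1
    return fooled / len(images)

def is_universal(classify, images, v, p, eps, delta):
    """Empirical check of both constraints: norm bound and fooling rate."""
    return (lp_norm(v, p) <= eps
            and fooling_rate(classify, images, v) >= 1.0 - delta)

# Hypothetical stand-in classifier: the label is the sign of the first coordinate.
def toy_classify(x):
    return 1 if x[0] > 0 else 0

images = [[0.2, 1.0], [0.5, -1.0], [-0.3, 0.0], [0.1, 2.0]]
v = [-0.6, 0.0]
rate = fooling_rate(toy_classify, images, v)
```

In this toy setup the single vector `v` flips three of the four inputs, so it satisfies the constraints for $\epsilon = 0.6$ (in the $\ell_\infty$ norm) and $\delta = 0.25$, but not for a tighter $\epsilon$.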

Figure: Schematic representation of the proposed algorithm used to compute universal perturbations.

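The proposed algorithm iterates over the data set $X$ and gradually updates the universal perturbation. For each data point $x_i$ that the current $v$ does not yet fool, it computes the smallest additional perturbation that pushes $x_i + v$ across the decision boundary: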

$\Delta v_i \gets \arg\min_r \|r\|_2 \quad \text{s.t.} \quad \hat{k}(x_i + v + r) \neq \hat{k}(x_i)$
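For a deep network this minimization has no closed form (the paper relies on DeepFool to approximate it). For an affine classifier, however, the minimal $\ell_2$ step can be written down exactly, which makes the idea easy to see. The sketch below assumes a hypothetical affine classifier with weight rows `W` and biases `b`; the function names and the `overshoot` factor, which pushes the point slightly past the boundary so the label really changes, are illustrative choices.

```python
import math

def scores(W, b, x):
    """Class scores of an affine classifier, one per row of W."""
    return [sum(w * c for w, c in zip(row, x)) + bk for row, bk in zip(W, b)]

def classify(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=lambda k: s[k])

def minimal_perturbation(W, b, x, overshoot=0.02):
    """Smallest l2 step (times 1 + overshoot) that changes the label of x."""
    s = scores(W, b, x)
    k0 = max(range(len(s)), key=lambda k: s[k])
    best_dist, best_r = None, [0.0] * len(x)
    for k in range(len(s)):
        if k == k0:
            continue
        w_diff = [wk - w0 for wk, w0 in zip(W[k], W[k0])]
        sq = sum(c * c for c in w_diff)
        if sq == 0.0:
            continue  # identical weight rows: class k can never overtake k0
        gap = s[k0] - s[k]          # non-negative score margin
        dist = gap / math.sqrt(sq)  # distance to the k0/k decision boundary
        if best_dist is None or dist < best_dist:
            best_dist = dist
            best_r = [(1.0 + overshoot) * gap / sq * c for c in w_diff]
    return best_r

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 1.0]
r = minimal_perturbation(W, b, x)
x_adv = [xi + ri for xi, ri in zip(x, r)]
```

In the universal setting the same step is applied at the shifted point $x_i + v$ rather than at $x_i$.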

To keep the constraint $\|v\|_p \leq \epsilon$ satisfied, the updated universal perturbation is projected onto the $\ell_p$ ball of radius $\epsilon$ centered at 0. The projection operator is defined as:

$P_{p, \epsilon}(v) = \arg\min_{v'} \|v - v'\|_2 \quad \text{subject to} \quad \|v'\|_p \leq \epsilon$
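For the two most common choices, $p = 2$ and $p = \infty$, this projection has a simple closed form: rescale the vector onto the sphere, or clip each coordinate. A minimal sketch, where `project_lp_ball` is a hypothetical helper name and other values of $p$ are deliberately left unhandled:

```python
import math

def project_lp_ball(v, eps, p):
    """Euclidean projection of v onto the l_p ball of radius eps (p = 2 or inf)."""
    if p == 2:
        norm = math.sqrt(sum(c * c for c in v))
        if norm <= eps:
            return list(v)                  # already inside the ball
        return [c * eps / norm for c in v]  # rescale onto the sphere
    if p == math.inf:
        # Clip every coordinate to the interval [-eps, eps].
        return [max(-eps, min(eps, c)) for c in v]
    raise ValueError("this sketch only handles p = 2 and p = math.inf")

v_l2 = project_lp_ball([3.0, 4.0], 1.0, 2)
v_linf = project_lp_ball([3.0, -4.0, 0.5], 1.0, math.inf)
```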

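Putting the pieces together, the whole procedure can be sketched as a loop: visit the data points in random order, take a minimal extra step whenever the current $v$ fails to fool a point, project back onto the $\epsilon$-ball, and stop once the fooling rate on $X$ reaches $1 - \delta$. The code below is only an illustration on a toy affine classifier with $p = 2$, not the authors' implementation; every function name, the `max_passes` cap, and the toy data are assumptions.

```python
import math
import random

def predict(W, b, x):
    """Label of an affine classifier: argmax over the rows of W x + b."""
    s = [sum(w * c for w, c in zip(row, x)) + bk for row, bk in zip(W, b)]
    return max(range(len(s)), key=lambda k: s[k])

def min_perturbation(W, b, x, overshoot=0.02):
    """Smallest l2 step (with a small overshoot) that changes the label of x."""
    s = [sum(w * c for w, c in zip(row, x)) + bk for row, bk in zip(W, b)]
    k0 = max(range(len(s)), key=lambda k: s[k])
    best = None
    for k in range(len(s)):
        diff = [wk - w0 for wk, w0 in zip(W[k], W[k0])]
        sq = sum(c * c for c in diff)
        if k == k0 or sq == 0.0:
            continue
        gap = s[k0] - s[k]
        if best is None or gap / math.sqrt(sq) < best[0]:
            step = [(1.0 + overshoot) * gap / sq * c for c in diff]
            best = (gap / math.sqrt(sq), step)
    return best[1] if best else [0.0] * len(x)

def project_l2(v, eps):
    """Projection onto the l2 ball of radius eps."""
    norm = math.sqrt(sum(c * c for c in v))
    return list(v) if norm <= eps else [c * eps / norm for c in v]

def universal_perturbation(W, b, X, eps, delta, max_passes=10, seed=0):
    rng = random.Random(seed)
    labels = [predict(W, b, x) for x in X]  # labels of the clean inputs
    v = [0.0] * len(X[0])
    order = list(range(len(X)))
    rate = 0.0
    for _ in range(max_passes):
        rng.shuffle(order)
        for i in order:
            xv = [a + c for a, c in zip(X[i], v)]
            if predict(W, b, xv) == labels[i]:  # v does not fool x_i yet
                dv = min_perturbation(W, b, xv)
                v = project_l2([a + c for a, c in zip(v, dv)], eps)
        fooled = sum(predict(W, b, [a + c for a, c in zip(x, v)]) != y
                     for x, y in zip(X, labels))
        rate = fooled / len(X)
        if rate >= 1.0 - delta:  # desired fooling rate reached
            break
    return v, rate

# Toy data: every point starts in class 0 (first coordinate positive).
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
X = [[0.1, 0.3], [0.5, -0.2], [0.3, 0.9], [0.2, 0.0]]
v, rate = universal_perturbation(W, b, X, eps=1.0, delta=0.2)
```

Because all toy points sit on the same side of a single boundary, one shared shift along the first coordinate fools all of them. Real networks are far less accommodating, which is what makes the paper's result notable.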

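Experimental Results

Fooling ratios achieved by the proposed method on the set $X$ used to compute the perturbation (the training set) and on the validation set.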

Table 1: Fooling ratios on the set X, and the validation set.

Perturbed images and their corresponding labels.

Figure 3: Examples of perturbed images and their corresponding labels. The first 8 images belong to the ILSVRC 2012 validation set, and the last 4 are images taken by a mobile phone camera. See supp. material for the original images.

Several distinct universal perturbations computed for GoogLeNet.

Figure 5: Diversity of universal perturbations for the GoogLeNet architecture. The five perturbations are generated using different random shufflings of the set X. Note that the normalized inner products for any pair of universal perturbations does not exceed 0.1, which highlights the diversity of such perturbations.
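The normalized inner product mentioned in the caption is the cosine of the angle between two perturbation vectors, so a value near 0 means the two perturbations are almost orthogonal. A small sketch of how it could be computed for flattened perturbations; `normalized_inner_product` is a hypothetical helper, and the example vectors are made up:

```python
import math

def normalized_inner_product(u, v):
    """Cosine similarity between two flattened perturbation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

orthogonal = normalized_inner_product([1.0, 0.0], [0.0, 2.0])
parallel = normalized_inner_product([1.0, 2.0], [2.0, 4.0])
```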