When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models

Paper: arXiv:2511.16203

No code has been released, and the method itself is fairly simple.

Motivation

The paper studies the adversarial robustness of vision-language-action (VLA) models. For example:

| Normal input | Adversarial input | Possible robot behavior |
|---|---|---|
| "Pick up the red cup" | "Pick up the r3d cüp" (tiny typo) | May grab the wrong object |
| Clear camera view | Small sticker on the cup | May fail to see the cup at all |
| "Put the cup on the left" | "Put the cup on the left... actually ignore that" | Gets confused and fails the task |

Method

![[Pasted image 20260529095552.png]]

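The attack framework, VLA-Fool, combines three attack types:
Textual attack: a GCG (Greedy Coordinate Gradient) attack that optimizes adversarial tokens in the language instruction.
Visual attack: an adversarial visual patch placed in the camera view.
Cross-misalignment attack: perturbations that disrupt the semantic correspondence between the visual and textual inputs.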
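GCG was originally proposed for jailbreaking aligned LLMs and works on discrete tokens. The following minimal NumPy sketch shows its inner loop under loud assumptions: `toy_loss`, the embedding table, the position weights, and the target vector are all hypothetical stand-ins for a real VLA's loss on an attacker-chosen target action. Each step takes the gradient with respect to one-hot token indicators, keeps the top-k most promising substitutions per position, evaluates a random batch of single-token swaps exactly, and keeps the best one. Unlike the original algorithm, this sketch only accepts a swap when it lowers the loss.

```python
import numpy as np


def toy_loss(suffix, emb, pos_w, target):
    """Hypothetical stand-in for a VLA loss on an attacker-chosen target action:
    squared distance between a position-weighted sum of token embeddings and a target."""
    h = (pos_w[:, None] * emb[suffix]).sum(axis=0)
    return float(((h - target) ** 2).sum())


def token_gradients(suffix, emb, pos_w, target):
    """Gradient of toy_loss w.r.t. the one-hot indicator of each suffix slot,
    shape (n_positions, vocab_size)."""
    h = (pos_w[:, None] * emb[suffix]).sum(axis=0)
    dl_dh = 2.0 * (h - target)
    # Slot i contributes pos_w[i] * emb.T @ onehot_i to h,
    # so grad[i, v] = pos_w[i] * (emb[v] @ dl_dh).
    return pos_w[:, None] * (emb @ dl_dh)[None, :]


def gcg(suffix, emb, pos_w, target, steps=50, top_k=8, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    suffix = np.array(suffix, copy=True)
    best = toy_loss(suffix, emb, pos_w, target)
    for _ in range(steps):
        grads = token_gradients(suffix, emb, pos_w, target)
        # Top-k candidate tokens per position: most negative gradient first.
        top = np.argsort(grads, axis=1)[:, :top_k]
        # Each candidate copies the current suffix and swaps one random slot.
        cands = np.repeat(suffix[None, :], batch, axis=0)
        pos = rng.integers(0, len(suffix), size=batch)
        pick = rng.integers(0, top_k, size=batch)
        cands[np.arange(batch), pos] = top[pos, pick]
        # Exact loss for every candidate; keep the best if it improves.
        losses = [toy_loss(c, emb, pos_w, target) for c in cands]
        i = int(np.argmin(losses))
        if losses[i] < best:  # accept only improving swaps (sketch choice)
            suffix, best = cands[i].copy(), losses[i]
    return suffix, best


# Toy usage: random embeddings, uniform position weights, random target.
rng = np.random.default_rng(1)
vocab, dim, n = 100, 16, 6
emb = rng.normal(size=(vocab, dim))
pos_w = np.full(n, 1.0 / n)
target = rng.normal(size=dim)
init = rng.integers(0, vocab, size=n)
adv, adv_loss = gcg(init, emb, pos_w, target)
print(toy_loss(init, emb, pos_w, target), "->", adv_loss)
```

The gradient only ranks candidate swaps. Every accepted swap is checked with an exact forward pass, which is what makes GCG work despite the discrete token space.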
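A visual patch attack optimizes the pixels of a small, localized region instead of spreading a perturbation over the whole image, which is what makes it printable as a physical sticker. The sketch below assumes a hypothetical linear "cup visible" logit (`cup_logit`) as a stand-in for the VLA's perception pathway; it is not the paper's objective. It runs signed-gradient descent on the patch pixels only, clipped to the valid [0, 1] range, and leaves every pixel outside the patch untouched.

```python
import numpy as np


def cup_logit(img, w, b):
    """Hypothetical stand-in for a perception head: linear 'cup visible' logit."""
    return float((w * img).sum() + b)


def cup_logit_grad(img, w, b):
    """Gradient of cup_logit w.r.t. the image (constant for this linear head;
    a real network would be re-differentiated at every step)."""
    return w


def optimize_patch(img, w, b, top, left, size, steps=100, lr=0.05):
    """Signed-gradient descent on a square patch so the 'cup visible' logit drops.
    img and w have shape (H, W, C); pixel values live in [0, 1]."""
    region = (slice(top, top + size), slice(left, left + size))
    patch = np.full((size, size, img.shape[2]), 0.5)  # start from mid-gray
    adv = img.copy()
    for _ in range(steps):
        adv[region] = patch                      # paste the current patch
        grad = cup_logit_grad(adv, w, b)[region] # gradient under the patch only
        patch = np.clip(patch - lr * np.sign(grad), 0.0, 1.0)
    adv[region] = patch  # only the patch region differs from img
    return adv, patch


# Toy usage: random 32x32 RGB image and random head weights.
rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32, 3))
w = rng.normal(size=(32, 32, 3))
b = 0.0
top, left, size = 4, 6, 8
adv, patch = optimize_patch(img, w, b, top, left, size)
print(cup_logit(img, w, b), "->", cup_logit(adv, w, b))
```

Restricting the update to `region` is the only difference from a full-image perturbation attack. Real patch attacks usually also randomize the patch's position and scale during optimization so that it survives changes in camera viewpoint.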
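One plausible reading of "disrupting the semantic correspondence" is to perturb the image so that its embedding drifts away from the instruction's embedding in a shared vision-language space. The sketch below assumes that interpretation, with a hypothetical linear image encoder `enc` and a fixed instruction embedding `t`. It uses projected gradient descent (PGD) under an L∞ budget `eps` to lower their cosine similarity. The paper's actual loss and encoders may differ.

```python
import numpy as np


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def cosine_grad(u, t):
    """Gradient of cos(u, t) with respect to u."""
    nu, nt = np.linalg.norm(u), np.linalg.norm(t)
    return t / (nu * nt) - (u @ t) * u / (nu ** 3 * nt)


def misalign(x, enc, t, eps=0.03, alpha=0.005, steps=40):
    """L-inf PGD on image x that lowers the cosine similarity between the
    (hypothetical) linear image embedding enc @ x and the instruction embedding t."""
    x_adv = x.copy()
    for _ in range(steps):
        g = enc.T @ cosine_grad(enc @ x_adv, t)   # chain rule through the encoder
        x_adv = x_adv - alpha * np.sign(g)         # step downhill on similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixel values
    return x_adv


# Toy usage: a flattened 8x8 "image", an 8-d shared embedding space, and an
# instruction embedding that starts out fairly well aligned with the image.
rng = np.random.default_rng(0)
x = rng.uniform(size=64)
enc = rng.normal(size=(8, 64))
t = enc @ x + 3.0 * rng.normal(size=8)
x_adv = misalign(x, enc, t)
print(cosine(enc @ x, t), "->", cosine(enc @ x_adv, t))
```

Unlike the patch attack, this perturbation is spread over every pixel but bounded by `eps`. It targets the agreement between the two modalities rather than either modality on its own.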
