When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
Paper: arXiv:2511.16203
No code available; relatively simple work.
Motivation
The paper studies the adversarial robustness of VLA (Vision-Language-Action) models. For example:
| Normal Input | Sneaky Attack | Robot's Reaction |
|---|---|---|
| "Pick up the red cup" | "Pick up the r3d cüp" (tiny typo) | Might grab the wrong thing |
| Clear camera view | Small sticker on the cup | Might not see the cup at all |
| "Put the cup on the left" | "Put the cup on the left... actually ignore that" | Gets confused and fails |
Method
![[Pasted image 20260529095552.png]]
Textual attack: GCG (Greedy Coordinate Gradient) attack on the instruction text
Visual attack: adversarial visual patch
Cross-Misalignment Attack: disrupts the semantic correspondence between visual and textual inputs
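
As a rough illustration of the GCG-style textual attack listed above, here is a minimal sketch of the greedy coordinate gradient search loop on a toy objective. Everything in it is assumed for illustration: the `toy_loss`, the random embedding table, and the names `gcg_step` and `run_attack` are hypothetical and not taken from the paper. A real attack would presumably use the VLA model's own loss and token gradients instead.

```python
import random


def mean_embedding(tokens, emb):
    """Average the embeddings of the given token ids."""
    dim = len(emb[0])
    return [sum(emb[t][j] for t in tokens) / len(tokens) for j in range(dim)]


def toy_loss(tokens, emb, target):
    """Toy stand-in for a model loss: squared distance to a target vector."""
    mean = mean_embedding(tokens, emb)
    return sum((m - g) ** 2 for m, g in zip(mean, target))


def embedding_grad(tokens, emb, target):
    """Gradient of toy_loss w.r.t. the embedding at any single position.

    In this toy loss every position has the same gradient; in a real
    model the gradient differs per position.
    """
    mean = mean_embedding(tokens, emb)
    return [2.0 * (m - g) / len(tokens) for m, g in zip(mean, target)]


def gcg_step(tokens, emb, target, top_k, batch, rng):
    """One greedy coordinate gradient step.

    1. Use the gradient to score every vocabulary token (first-order
       estimate of the loss change if that token were swapped in).
    2. Keep the top_k most promising tokens as candidates.
    3. Try `batch` random single-position swaps, evaluate the exact
       loss, and keep the best one only if it lowers the loss.
    """
    grad = embedding_grad(tokens, emb, target)
    scores = [sum(g * e for g, e in zip(grad, vec)) for vec in emb]
    candidates = sorted(range(len(emb)), key=lambda v: scores[v])[:top_k]

    best, best_loss = list(tokens), toy_loss(tokens, emb, target)
    for _ in range(batch):
        trial = list(tokens)
        trial[rng.randrange(len(tokens))] = rng.choice(candidates)
        trial_loss = toy_loss(trial, emb, target)
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss


def run_attack(steps=20, seed=0):
    """Run the toy search; sizes below are arbitrary illustration values."""
    rng = random.Random(seed)
    vocab, dim, length = 50, 8, 6
    emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab)]
    target = [rng.uniform(-1, 1) for _ in range(dim)]
    tokens = [rng.randrange(vocab) for _ in range(length)]

    history = [toy_loss(tokens, emb, target)]
    for _ in range(steps):
        tokens, loss = gcg_step(tokens, emb, target, top_k=8, batch=16, rng=rng)
        history.append(loss)
    return tokens, history
```

Calling `run_attack()` returns the final token ids and the loss after each step. Because a swap is kept only when it strictly lowers the loss, the recorded history never increases.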