VLA-Fool

Researchers propose VLA-Fool, demonstrating how textual typos, visual patches, and cross-modal misalignment can adversarially attack vision-language-action models.

When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models

Paper: arXiv:2511.16203

No code has been released, and the work is fairly simple.

Motivation

The paper examines the adversarial robustness of VLA models. For example:

| Normal Input | Sneaky Attack | Robot's Reaction |
| --- | --- | --- |
| "Pick up the red cup" | "Pick up the r3d cüp" (tiny typo) | Might grab the wrong thing |
| Clear camera view | Small sticker on the cup | Might not see the cup at all |
| "Put the cup on the left" | "Put the cup on the left... actually ignore that" | Gets confused and fails |

Method

![[Pasted image 20260529095552.png]]

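Textual attack: a GCG (Greedy Coordinate Gradient) attack on the language instruction
Visual attack: an adversarial patch placed in the camera image
Cross-modal misalignment attack: disrupts the semantic correspondence between visual and textual inputs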
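
To make the three attack surfaces concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation (no code is available), and it does not perform GCG, which needs model gradients. The substitution table, the function names `perturb_text`, `apply_patch`, and `misalignment`, and the use of 1 - cosine similarity as a misalignment score are all assumptions made for illustration.

```python
import math

# Hypothetical character substitutions, modeled on the "r3d cüp" example.
SUBSTITUTIONS = {"e": "3", "u": "ü"}


def perturb_text(instruction, targets, substitutions=SUBSTITUTIONS):
    """Apply character substitutions only inside the listed target words."""
    words = []
    for word in instruction.split(" "):
        if word in targets:
            word = "".join(substitutions.get(ch, ch) for ch in word)
        words.append(word)
    return " ".join(words)


def apply_patch(image, patch, top, left):
    """Return a copy of `image` (rows of pixel values) with `patch` pasted in."""
    patched = [row[:] for row in image]  # copy so the input stays untouched
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            patched[top + dy][left + dx] = value
    return patched


def misalignment(visual_embedding, text_embedding):
    """Toy proxy (an assumption): 1 - cosine similarity of two embeddings."""
    dot = sum(v * t for v, t in zip(visual_embedding, text_embedding))
    norm_v = math.sqrt(sum(v * v for v in visual_embedding))
    norm_t = math.sqrt(sum(t * t for t in text_embedding))
    return 1.0 - dot / (norm_v * norm_t)


if __name__ == "__main__":
    print(perturb_text("Pick up the red cup", {"red", "cup"}))
    image = [[0] * 4 for _ in range(4)]
    print(apply_patch(image, [[1, 1], [1, 1]], top=1, left=1))
    print(misalignment([1.0, 0.0], [0.0, 1.0]))
```

In a real attack the patch pixels and the substituted tokens would be optimized against the VLA model's loss rather than fixed by hand. The sketch only shows where each perturbation enters the input.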
