VLA-Fool
Researchers propose VLA-Fool, demonstrating how textual typos, visual patches, and cross-modal misalignment can adversarially attack vision-language-action models.
All posts tagged with "multimodal robustness".
Researchers propose VLA-Fool, demonstrating how textual typos, visual patches, and cross-modal misalignment can adversarially attack vision-language-action models.