Recently - TechBlog

Engineering

2026-06-02

BAGEL Reproduction

ByteDance's BAGEL model uses a Mixture-of-Transformer-Experts architecture for unified image understanding, generation, and editing, and requires Ampere or newer NVIDIA GPUs for native bfloat16 support.

#moe #multimodal #unified model

Read article

Engineering

2026-06-02

VILA-U Reproduction

A technical guide for reproducing the VILA-U multimodal model on AutoDL, covering environment setup, storage optimization, model download, inference, and common troubleshooting.

#model reproduction #multi modal #unified model

Read article

Notes · CS231n Learning Notes

2026-05-29

Regularization & Optimization

Regularization reduces overfitting by penalizing model complexity, while optimizers such as AdamW, combined with learning rate schedules, improve convergence and generalization in neural network training.

#cs231n #optimization #regularization

Read article

Notes · CS231n Learning Notes

2026-05-29

Image Classification with Linear Classifiers

Linear classifiers use a learned weight matrix and bias to assign class scores, which makes inference fast but limits them to linearly separable data.

#cs231n #image classification #linear classifiers

Read article

Research · Adversarial Robustness in VLA Models

2026-05-29

VLA-Fool

Researchers propose VLA-Fool, demonstrating how textual typos, visual patches, and cross-modal misalignment can adversarially attack vision-language-action models.

#VLA #adversarial attacks #multimodal robustness

Read article

Research · Adversarial Robustness in VLA Models

2026-05-29

StableVLA

StableVLA introduces IB-Adapter, a plug-and-play module grounded in information bottleneck theory that enhances vision-language-action model robustness to visual corruptions without requiring extra training data.

#VLA #information bottleneck #robust robotics

Read article