[Paper] Gradient Multi-Normalization for Stateless and Scalable LLM Training
ML engineer/Papers & CS generals · 2025. 9. 11. 14:06
https://arxiv.org/abs/2502.06742

As befits an algorithms paper, compared with the recent LLM model-report papers ..
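The paper's premise is that adaptive optimizers like Adam keep extra state per parameter (first- and second-moment buffers) and thus cost extra memory, while a stateless optimizer applies only a normalization to the current gradient. As a minimal sketch of that contrast — this is sign-normalized SGD for illustration, not the paper's actual multi-normalization procedure — a stateless step might look like:

```python
import numpy as np

def stateless_sign_step(params, grads, lr=1e-3):
    """One stateless update: no momentum or variance buffers survive
    between calls, unlike Adam which stores two extra tensors per
    parameter. The gradient is normalized elementwise by its sign;
    the paper instead applies several normalizations iteratively,
    which is NOT reproduced here."""
    return {name: p - lr * np.sign(grads[name])
            for name, p in params.items()}

# Hypothetical toy parameters and gradients
params = {"w": np.array([1.0, -1.0])}
grads = {"w": np.array([0.5, -2.0])}
new_params = stateless_sign_step(params, grads, lr=0.1)
print(new_params["w"])  # → [ 0.9 -0.9]
```

The memory saving is the whole point: Adam's state roughly triples the optimizer footprint per parameter, whereas a step like the one above carries nothing across iterations.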