Scaling Laws for Neural Language Models (2020) Review

Scaling Laws for Neural Language Models: "We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude." (arxiv.org)

0. Key summary: When using a cross-entropy loss, Transformer-family language models ... model size, dataset si..

2024. 3. 20.
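As a quick orientation for the abstract's power-law claim, a minimal sketch of the functional forms the paper fits is given below. The symbols N (non-embedding parameters), D (dataset size in tokens), and C_min (minimum compute) follow the paper's notation; the exponent values are the paper's approximate fitted constants and are quoted here for illustration, not as exact figures.

```latex
% Approximate power-law scaling forms from Kaplan et al. (2020).
% N: non-embedding parameters, D: dataset tokens, C_min: minimum compute.
% Exponents are the paper's reported approximate fits.
\begin{aligned}
L(N)        &= \left(\frac{N_c}{N}\right)^{\alpha_N},             &\quad \alpha_N &\approx 0.076 \\
L(D)        &= \left(\frac{D_c}{D}\right)^{\alpha_D},             &\quad \alpha_D &\approx 0.095 \\
L(C_{\min}) &= \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C^{\min}}, &\quad \alpha_C^{\min} &\approx 0.050
\end{aligned}
```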