Huggingface Daily Papers - 2025.04.21
https://huggingface.co/papers/date/2025-04-21 (Daily Papers - Hugging Face, huggingface.co)
1. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (published: 21.04.2025, affiliation: LeapLab, Tsinghua University & Shanghai Jiao Tong University) Main task: Reinforcement Le..
Posted: 2025.04.23

Efficiently Modeling Long Sequences with Structured State Spaces (S4) (2021) Review
"A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers h.." (arxiv.org)
0. Key summary: a model that keeps the theoretical strengths of the SSM (State Space Model) while being able to compute it more efficiently.. (the underlying state-space equations are sketched after this list)
Posted: 2024.03.24

Scaling Laws for Neural Language Models (2020) Review
"We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitu.." (arxiv.org)
0. Key summary: when trained with cross-entropy loss, Transformer-family language models follow power laws in model size, dataset si.. (the power-law forms are sketched after this list)
Posted: 2024.03.20

BitNet: Scaling 1-bit Transformers for Large Language Models (2023) Review
"The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed.." (arxiv.org)
0. Key summary: proposes a 1-bit Transformer for LLMs; to train the 1-bit weights, nn.Linear is repl.. (a drop-in layer sketch follows this list)
Posted: 2024.03.13
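For the S4 review teaser above: the "theoretical strengths of the SSM" it alludes to come from the classical continuous-time state space model, which S4 then discretizes and parameterizes efficiently. The equations below are only a sketch of that standard background in the usual A, B, C, D notation; S4's actual contribution (the structured parameterization of A and the efficient computation) is not shown here.

x'(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t)

After discretizing with a step size \Delta (giving discrete matrices \bar{A}, \bar{B}, \bar{C}), the model can be run as a recurrence over a sequence u_1, u_2, \ldots:

x_k = \bar{A}\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k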
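For the scaling-laws review teaser: the quoted abstract says the loss scales as a power law with model size, dataset size, and training compute. As a reference, the functional forms reported in the paper look like the following, where N is the number of (non-embedding) parameters, D the dataset size in tokens, and C the training compute; the fitted constants N_c, D_c, C_c and exponents \alpha_N, \alpha_D, \alpha_C are taken from the paper and not reproduced here (the paper also distinguishes raw from compute-optimal budgets, which this sketch glosses over).

L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}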
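For the BitNet review teaser, which is cut off at "nn.Linear is repl..": the paper's replacement layer is a binarized linear module (BitLinear). Below is a minimal PyTorch sketch of the general idea only; the class name OneBitLinear is made up for illustration, and this is not the paper's exact BitLinear (which also normalizes and quantizes activations). It just shows 1-bit weights in the forward pass trained through a straight-through estimator.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OneBitLinear(nn.Module):
    # Hypothetical sketch of a 1-bit drop-in replacement for nn.Linear,
    # in the spirit of BitNet; not the paper's exact BitLinear.
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        # Full-precision "latent" weights are kept for the optimizer;
        # only the forward pass sees the binarized copy.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Zero-center, binarize to {-1, +1}, and rescale by the mean
        # absolute value so output magnitudes stay roughly comparable.
        alpha = w.abs().mean()
        w_bin = torch.sign(w - w.mean()) * alpha
        # Straight-through estimator: forward uses w_bin, backward passes
        # gradients through to the latent full-precision weights.
        w_ste = w + (w_bin - w).detach()
        return F.linear(x, w_ste, self.bias)

# Usage: swap nn.Linear(512, 512) for OneBitLinear(512, 512) inside a block.
layer = OneBitLinear(512, 512)
out = layer(torch.randn(4, 512))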