
Transformer (4)

[Paper Review] Transformers without Normalization https://arxiv.org/pdf/2503.10622 Summary: Have you ever seen a Transformer model without normalization layers? Normalization layers have become one of the most fundamental components of neural networks, and they are used overwhelmingly often in Transformer models in particular. This post looks at Dynamic Tanh, which the paper proposes as a replacement for normalization layers. Introduction - Is a normalization layer really required? [Current situation] Since Batch Normalization was proposed in 2015, normalization layers have been used in virtually every network. This is opti.. 2025. 4. 15.
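To make the idea concrete, below is a minimal PyTorch-style sketch of Dynamic Tanh (DyT) as described in the review above: an element-wise tanh(alpha * x) with a learnable scalar alpha and LayerNorm-like affine parameters. Parameter names and the initial alpha value here are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """Drop-in replacement for LayerNorm: y = weight * tanh(alpha * x) + bias.

    alpha is a learnable scalar; weight and bias are learnable per-feature
    vectors, mirroring LayerNorm's affine parameters. (Sketch only; names and
    the init value are illustrative.)
    """
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh squashes extreme activations without computing any batch or feature statistics
        return self.weight * torch.tanh(self.alpha * x) + self.bias


# Usage: swap nn.LayerNorm(d_model) for DynamicTanh(d_model) inside a Transformer block.
x = torch.randn(2, 16, 512)
print(DynamicTanh(512)(x).shape)  # torch.Size([2, 16, 512])
```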
[Paper Review] BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. Modeling users' dynamic and evolving preferences from their historical behaviors is challenging and crucial for recommendation systems... 2024. 8. 5.
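For context, the sketch below illustrates the Cloze-style objective that BERT4Rec trains with: items in a user's interaction history are randomly replaced with a mask token, and the bidirectional encoder learns to recover them. The helper function, the mask id, and the masking probability are hypothetical choices for illustration, not the paper's code.

```python
import random
import torch

MASK_ID = 0  # hypothetical id reserved for the [mask] token

def cloze_mask(item_seq, mask_prob=0.2):
    """Randomly replace items with [mask]; the model predicts the originals.

    At inference, BERT4Rec appends a mask at the end of the sequence so that
    predicting it amounts to next-item recommendation. (Sketch only.)
    """
    inputs, labels = [], []
    for item in item_seq:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(item)    # supervise this position with the original item
        else:
            inputs.append(item)
            labels.append(-100)    # ignored position (PyTorch loss convention)
    return torch.tensor(inputs), torch.tensor(labels)


seq = [12, 7, 33, 5, 21]           # a user's historical item ids
print(cloze_mask(seq))
```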
[Paper Review] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision.. 2024. 7. 19.
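The "16x16 words" in the title refers to splitting an image into fixed-size patches and treating each patch as a token. Below is a minimal sketch of that patch-embedding step, assuming standard PyTorch; the Conv2d-as-linear-projection trick and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into 16x16 patches and project each to an embedding.

    A Conv2d with kernel_size = stride = patch_size is equivalent to flattening
    each patch and applying a shared linear layer.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)


imgs = torch.randn(1, 3, 224, 224)
tokens = PatchEmbed()(imgs)
print(tokens.shape)  # torch.Size([1, 196, 768]): 196 "words" for a 224x224 image
```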
[논문 리뷰] (Transformer) Attention Is All You Need: Attention Is All You Need Attention Is All You NeedThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a newarxiv.org Transformer 배경기존 모델 한계 소개한계점을 극복한 제안 모델 소개 제안모델Transformer 모델 아키텍처Multi-Head s.. 2024. 7. 17.
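As a quick reference for the attention mechanism the review outlines, here is a small PyTorch sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the tensor shapes in the usage check are arbitrary examples, not values from the paper.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block disallowed positions
    return F.softmax(scores, dim=-1) @ v


# Toy multi-head shape check: batch 2, 8 heads, 10 tokens, 64 dims per head.
q = k = v = torch.randn(2, 8, 10, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 8, 10, 64])
```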