
Transformers are SSMs: State Space Duality Framework

Curiosity: Are Transformers and State Space Models fundamentally different? What happens when we discover their deep theoretical connections?

This paper shows that Transformers and State Space Models (SSMs) are closely related through the well-studied class of structured semiseparable matrices. The resulting State Space Duality (SSD) framework enables Mamba-2, whose core layer is 2-8× faster than Mamba's selective SSM while remaining competitive with Transformers on language modeling.

Resources:

The Discovery

Retrieve: Transformers and SSMs are more related than previously thought.

Key Finding: These model families are closely related through the following (a short code sketch follows the list):

  • Structured semiseparable matrices
  • Various decomposition methods
  • Theoretical connections between SSMs and attention variants
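To make the semiseparable-matrix point concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): a scalar-decay SSM is evaluated two ways, once as a linear-time recurrence (the SSM view) and once by materializing the lower-triangular semiseparable matrix it defines and multiplying (the quadratic, attention-like view). The names `a`, `B`, `C`, `x` loosely follow standard SSM notation and are my choice.

```python
# Minimal sketch (not the paper's code): one scalar-decay SSM, computed two ways.
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 4                        # sequence length, state dimension
a = rng.uniform(0.5, 1.0, size=T)  # per-step scalar decay a_t
B = rng.standard_normal((T, N))    # input projections B_t
C = rng.standard_normal((T, N))    # output projections C_t
x = rng.standard_normal(T)         # scalar input sequence

# (1) Recurrent / SSM view, linear time: h_t = a_t h_{t-1} + B_t x_t,  y_t = C_t . h_t
h = np.zeros(N)
y_recurrent = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_recurrent[t] = C[t] @ h

# (2) Matrix / attention-like view, quadratic: y = M x with a lower-triangular
#     semiseparable M, where M[j, i] = (a_{i+1} * ... * a_j) * (C_j . B_i) for i <= j.
M = np.zeros((T, T))
for j in range(T):
    for i in range(j + 1):
        decay = np.prod(a[i + 1 : j + 1])   # empty product = 1 when i == j
        M[j, i] = decay * (C[j] @ B[i])
y_matrix = M @ x

print(np.allclose(y_recurrent, y_matrix))   # True: same operator, two computation orders
```

The passing `allclose` check is the duality in miniature: one semiseparable operator, two evaluation orders, one linear in sequence length and one quadratic but matmul-friendly.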

Performance Context

Retrieve: SSMs like Mamba have shown strong performance.

| Model Type | Performance | Scale |
| --- | --- | --- |
| Transformers | Main architecture | All scales |
| SSMs (Mamba) | Match or outperform Transformers | Small-to-medium scale |

Question: How are they related?

State Space Duality Framework

Innovate: SSD framework reveals deep connections.

```mermaid
graph TB
    A[Structured Semiseparable Matrices] --> B[SSMs]
    A --> C[Attention Variants]
    B --> D[State Space Duality]
    C --> D
    D --> E[Mamba-2]
    E --> F[2-8× Faster]
    E --> G[Competitive with Transformers]

    style A fill:#e1f5ff
    style D fill:#fff3cd
    style E fill:#d4edda
```

Mamba-2 Architecture

Retrieve: Mamba-2 improvements enabled by SSD framework.

Core Layer: Refinement of Mamba's selective SSM

Improvements (a chunked-computation sketch follows this list):

  • Core layer is 2-8× faster than Mamba's selective SSM
  • Remains competitive with Transformers on language modeling
  • Better theoretical grounding via the SSD framework
  • Fits into a unified framework spanning SSMs and attention variants
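A rough sketch of where the speedup comes from, under simplifying assumptions I am making for clarity (scalar inputs, scalar per-step decay, a chunk size that divides the sequence length): split the sequence into chunks, handle each chunk with a small dense semiseparable block that maps to matrix multiplications, and carry only a compact state between chunks. This is a toy illustration of the chunkwise idea, not the paper's exact SSD algorithm or Mamba-2's kernels; the function name `chunked_ssd` is mine.

```python
# Toy chunkwise evaluation (my simplification, not the official SSD algorithm):
# dense matmul-friendly work inside each chunk, a small carried state between chunks.
import numpy as np

rng = np.random.default_rng(1)
T, N, chunk = 8, 4, 4              # sequence length, state dim, chunk size (divides T here)
a = rng.uniform(0.5, 1.0, size=T)
B = rng.standard_normal((T, N))
C = rng.standard_normal((T, N))
x = rng.standard_normal(T)

def chunked_ssd(a, B, C, x, chunk):
    T, N = B.shape
    y = np.empty(T)
    h = np.zeros(N)                              # state carried across chunk boundaries
    for s in range(0, T, chunk):
        e = min(s + chunk, T)
        ac, Bc, Cc, xc = a[s:e], B[s:e], C[s:e], x[s:e]
        L = e - s
        pref = np.cumprod(ac)                    # decay from chunk start up to each position
        # (a) contribution of the incoming state to every position in the chunk
        y_inter = (Cc @ h) * pref
        # (b) intra-chunk term: a small dense semiseparable block (matmul territory)
        M = np.zeros((L, L))
        for j in range(L):
            for i in range(j + 1):
                M[j, i] = np.prod(ac[i + 1 : j + 1]) * (Cc[j] @ Bc[i])
        y[s:e] = y_inter + M @ xc
        # (c) push the state to the end of the chunk for the next iteration
        h = pref[-1] * h + sum(np.prod(ac[i + 1 : L]) * Bc[i] * xc[i] for i in range(L))
    return y

# Reference: plain step-by-step recurrence over the whole sequence
h_ref, y_ref = np.zeros(N), np.empty(T)
for t in range(T):
    h_ref = a[t] * h_ref + B[t] * x[t]
    y_ref[t] = C[t] @ h_ref

print(np.allclose(chunked_ssd(a, B, C, x, chunk), y_ref))   # True
```

The intent, per the paper's framing, is that the intra-chunk work becomes batched matrix multiplications that map well onto GPU tensor cores, rather than a purely sequential scan over the whole sequence.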

Theoretical Connections

Innovate: Rich framework connecting SSMs and attention.

Connections Through:

  • Structured semiseparable matrices
  • Various decomposition methods
  • Attention variants
  • State space representations

Impact: A unified understanding of both architectures; one concrete instance of the connection is sketched below.
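As one concrete instance of the SSM-attention connection above: causal linear attention (no softmax) is the no-decay special case of the semiseparable matrix from the earlier sketch, so it too has both a quadratic masked-matmul form and a linear recurrent form. Again a hedged toy illustration with names of my choosing, not code from the paper.

```python
# Toy check: masked linear attention (no softmax) equals a cumulative-state recurrence,
# i.e. the a_t = 1 special case of the semiseparable operator sketched earlier.
import numpy as np

rng = np.random.default_rng(2)
T, N = 6, 4
Q = rng.standard_normal((T, N))   # queries (playing the role of C_t)
K = rng.standard_normal((T, N))   # keys    (playing the role of B_t)
v = rng.standard_normal(T)        # scalar values, to keep shapes simple

# Quadratic / attention form: causal mask applied to Q K^T, no softmax
y_attn = np.tril(Q @ K.T) @ v

# Linear / recurrent form: S_t = S_{t-1} + K_t v_t,  y_t = Q_t . S_t
S = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    S = S + K[t] * v[t]
    y_rec[t] = Q[t] @ S

print(np.allclose(y_attn, y_rec))   # True: same operator viewed from both sides
```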

Key Takeaways

Retrieve: The State Space Duality framework reveals that Transformers and SSMs are closely related through structured semiseparable matrices, enabling better architectures like Mamba-2.

Innovate: By understanding the theoretical connections between SSMs and attention, we can design more efficient architectures that combine the best of both worlds: Mamba-2's core layer achieves a 2-8× speedup while maintaining competitive performance.

Curiosity → Retrieve → Innovation: Start with curiosity about model architectures, retrieve insights from the SSD framework, and innovate by applying these theoretical connections to design better models.

Next Steps:

  • Read the full paper
  • Understand SSD framework
  • Explore Mamba-2
  • Apply to your models

🧙 Paper Authors: Tri Dao* (Department of Computer Science, Princeton University) and Albert Gu* (Machine Learning Department, Carnegie Mellon University)

State Space Models are semiseparable matrix transformers.

Paper Abstract

Generalized Models and Efficient Algorithms Through Structured State Space Duality

Transformers have been the main architecture behind deep learning's success in language modeling, but state space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these model families are in fact very closely related, and we develop a rich framework of theoretical connections between SSMs and attention variants, connected through various decompositions of a well-studied class of structured semiseparable matrices. The state space duality (SSD) framework lets us design a new architecture (Mamba-2) whose core layer refines Mamba's selective SSM and is 2-8× faster, while remaining competitive with Transformers on language modeling.

This post is licensed under CC BY 4.0 by the author.