
Transformers are SSMs

Generalized Models and Efficient Algorithms Through Structured State Space Duality

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured semiseparable matrices. Our state space duality (SSD) framework allows us to design a new architecture (Mamba-2) whose core layer is a refinement of Mamba's selective SSM that is 2-8X faster, while continuing to be competitive with Transformers on language modeling.

Paper Authors: Tri Dao*1 and Albert Gu*2. 1Department of Computer Science, Princeton University; 2Machine Learning Department, Carnegie Mellon University

State space models are semiseparable matrix transformers
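To make this equivalence concrete, here is a minimal sketch (not the paper's implementation) of the simplest scalar case: running a time-varying SSM recurrence over a sequence gives exactly the same output as one multiplication by a lower-triangular semiseparable matrix whose entries are products of the SSM parameters. The variable names (`a`, `b`, `c`, `M`) are illustrative choices, not the paper's notation.

```python
import numpy as np

# A scalar, time-varying SSM
#   h_t = a_t * h_{t-1} + b_t * x_t,    y_t = c_t * h_t
# is equivalent to y = M @ x, where M is lower-triangular semiseparable:
#   M[t, s] = c_t * (a_t * ... * a_{s+1}) * b_s   for s <= t.
rng = np.random.default_rng(0)
T = 6
a, b, c, x = (rng.standard_normal(T) for _ in range(4))

# Recurrent ("linear-time") view: scan over the sequence.
h, y_rec = 0.0, np.empty(T)
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# Matrix ("attention-like") view: materialize M, then one matmul.
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        # empty product a[t+1:t+1] is 1, so the diagonal is c_t * b_t
        M[t, s] = c[t] * np.prod(a[s + 1 : t + 1]) * b[s]
y_mat = M @ x

assert np.allclose(y_rec, y_mat)  # both views agree
```

The two views trade off differently: the scan costs O(T) time, while the materialized matrix costs O(T^2) but exposes the matmul structure that the SSD framework exploits.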


This post is licensed under CC BY 4.0 by the author.