Llama 3 implemented in pure NumPy

🦙 Llama 3 implemented in pure NumPy 👩‍🔬

🚀 Exciting discovery! I came across a fascinating article on a Llama 3 model implemented in NumPy, inspired by @Andrej Karpathy. The Llama 3 model from AI at Meta is making waves with its impressive scale and performance. 🌟

🧑‍💻 Code: https://github.com/likejazz/llama3.np

🔍 With 24K GPUs, 15T tokens of training data, 10M instruction examples, and 1.3M GPU hours, the numbers are truly overwhelming. Apart from the switch to GQA (grouped-query attention), the model structure is unchanged from Llama 2, making it a familiar yet powerful framework.

🧠 To aid understanding, the author focuses on an accurate implementation using NumPy. It uses the stories15M model trained by Andrej Karpathy, converted to NumPy's compressed format so the model structure is easier to follow. The result turns the Karpathy-trained Llama 2 model into executable code while keeping the approach clear and precise.
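As a rough illustration of what "NumPy compressed format" can mean in practice, here is a minimal sketch that round-trips a dictionary of weight arrays through `np.savez_compressed` and `np.load`. The weight names, the tiny shapes, and the helper functions `save_weights` and `load_weights` are made up for this example and are not taken from the repository; an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical weight names and tiny shapes, for illustration only.
rng = np.random.default_rng(0)
weights = {
    "tok_embedding.weight": rng.standard_normal((32, 8)).astype(np.float32),
    "layers.0.attention.q_proj.weight": rng.standard_normal((8, 8)).astype(np.float32),
}


def save_weights(file, weights):
    """Write every array into one compressed .npz archive, keyed by name."""
    np.savez_compressed(file, **weights)


def load_weights(file):
    """Read the archive back into a plain dict of NumPy arrays."""
    with np.load(file) as archive:
        return {name: archive[name] for name in archive.files}


# Round-trip through an in-memory buffer instead of a real file path.
buf = io.BytesIO()
save_weights(buf, weights)
buf.seek(0)
restored = load_weights(buf)

for name, array in restored.items():
    print(name, array.shape, array.dtype)
```

Keeping the weights in a flat name-to-array mapping like this makes it easy to see which tensor feeds which part of the model.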

📊 GQA is included in the code, but the author does not apply it to the model's behavior, which keeps the NumPy implementation simple and easy to interpret. Stay tuned for more insights into this approach!
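To make the GQA point concrete, here is a minimal NumPy sketch of grouped-query attention with a causal mask. It is not the repository's code: the function names, tensor layout, and shapes are assumptions chosen for illustration. The key detail is `n_rep`, the number of query heads that share one key/value head. When a checkpoint has as many key/value heads as query heads (assumed here to be the case for the small model used), `n_rep` is 1 and the GQA path reduces to ordinary multi-head attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def repeat_kv(x, n_rep):
    # x: (batch, seq_len, n_kv_heads, head_dim)
    # Each key/value head is repeated n_rep times so it lines up with
    # the query heads in its group. With n_rep == 1 this is a no-op.
    if n_rep == 1:
        return x
    return np.repeat(x, n_rep, axis=2)


def grouped_query_attention(q, k, v):
    # q: (batch, seq_len, n_heads, head_dim)
    # k, v: (batch, seq_len, n_kv_heads, head_dim)
    n_heads, n_kv_heads = q.shape[2], k.shape[2]
    assert n_heads % n_kv_heads == 0, "n_heads must be a multiple of n_kv_heads"
    n_rep = n_heads // n_kv_heads
    k = repeat_kv(k, n_rep)
    v = repeat_kv(v, n_rep)

    # Move heads ahead of the sequence axis: (batch, n_heads, seq_len, head_dim)
    q, k, v = (t.transpose(0, 2, 1, 3) for t in (q, k, v))
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])

    # Causal mask: position i may only attend to positions <= i.
    seq_len = q.shape[2]
    mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    probs = softmax(scores + mask, axis=-1)

    out = probs @ v
    # Back to (batch, seq_len, n_heads, head_dim)
    return out.transpose(0, 2, 1, 3)


rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 6, 8))

# Same number of key/value heads as query heads: n_rep == 1, plain attention.
k_full = rng.standard_normal((1, 4, 6, 8))
v_full = rng.standard_normal((1, 4, 6, 8))
out_mha = grouped_query_attention(q, k_full, v_full)

# Two key/value heads shared by six query heads: n_rep == 3.
k_gqa = rng.standard_normal((1, 4, 2, 8))
v_gqa = rng.standard_normal((1, 4, 2, 8))
out_gqa = grouped_query_attention(q, k_gqa, v_gqa)

print(out_mha.shape, out_gqa.shape)
```

Both calls go through the same function; only the number of key/value heads in the inputs decides whether any sharing actually happens.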

This post is licensed under CC BY 4.0 by the author.