Llama 3 implemented in pure NumPy

🦙 Llama 3 implemented in pure NumPy 👩‍🔬

🚀 Exciting discovery! I came across a fascinating article on a Llama 3 model implemented in NumPy, inspired by @Andrej Karpathy. The Llama 3 model from AI at Meta is making waves with its impressive scale and performance. 🌟

πŸ§‘β€πŸ’» Code : https://github.com/likejazz/llama3.np

πŸ” With 24K GPUs, 15T training data, 10M instruction data, and 1.3M GPU hours, the numbers are truly overwhelming. Despite the transition to using GQA, the model structure remains unchanged from Llama 2, making it a familiar yet powerful framework.

🧠 To aid understanding, the author focuses on an accurate implementation using NumPy. The stories15M model trained by Andrej Karpathy is converted to a NumPy compressed format, which keeps the model structure more intuitive. The article then turns this Karpathy-trained Llama 2 model into executable code, keeping the approach clear and precise.
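
For a rough idea of what "NumPy compressed format" can look like in practice, here is a minimal sketch using `np.savez_compressed` and `np.load`. The weight names and shapes are hypothetical stand-ins, not the real stories15M tensors, and the article's actual converter may work differently.

```python
import io

import numpy as np

# Hypothetical stand-in weights; the real checkpoint uses other names and shapes.
rng = np.random.default_rng(0)
weights = {
    "tok_embedding": rng.standard_normal((32, 8)).astype(np.float32),
    "layer0_wq": rng.standard_normal((8, 8)).astype(np.float32),
}

# Write every array into one compressed archive (in memory here,
# so the sketch does not touch the filesystem).
buffer = io.BytesIO()
np.savez_compressed(buffer, **weights)

# Read the archive back; arrays are looked up by the names they were saved under.
buffer.seek(0)
restored = np.load(buffer)
restored_names = sorted(restored.files)
```

Passing a file path ending in `.npz` instead of the in-memory buffer would store the same archive on disk.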

📊 Although GQA is included in the code, the author doesn't apply it to the model's behavior, which keeps the NumPy implementation straightforward and easier to interpret. Stay tuned for more insights into this innovative approach!
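
To make GQA a little more concrete, here is a minimal NumPy sketch of grouped-query attention. It is an illustration under assumptions, not the code from the linked repository: the function names (`repeat_kv`, `gqa_attention`) and the tensor layout are chosen for this example only. The key idea is that several query heads share one key/value head, so the key/value heads are repeated before ordinary attention runs.

```python
import numpy as np


def repeat_kv(x, n_rep):
    """Repeat each key/value head n_rep times along the head axis.

    x has shape (batch, seq_len, n_kv_heads, head_dim).
    """
    if n_rep == 1:
        return x  # as many KV heads as query heads: nothing to do
    return np.repeat(x, n_rep, axis=2)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def gqa_attention(q, keys, values):
    """Causal attention where groups of query heads share one KV head.

    q:            (batch, seq_len, n_heads, head_dim)
    keys, values: (batch, seq_len, n_kv_heads, head_dim)
    """
    n_rep = q.shape[2] // keys.shape[2]
    keys = repeat_kv(keys, n_rep)
    values = repeat_kv(values, n_rep)

    # Move the head axis in front of the sequence axis: (batch, heads, seq, dim).
    q, keys, values = (t.transpose(0, 2, 1, 3) for t in (q, keys, values))

    scores = q @ keys.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])

    # Causal mask: position i may only attend to positions <= i.
    seq_len = q.shape[2]
    scores = scores + np.triu(np.full((seq_len, seq_len), -np.inf), 1)

    out = softmax(scores) @ values
    return out.transpose(0, 2, 1, 3)  # back to (batch, seq, heads, dim)


rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 4, 8))   # 4 query heads
kv = rng.standard_normal((1, 4, 2, 8))  # 2 shared key/value heads
out = gqa_attention(q, kv, kv)
```

When the number of key/value heads equals the number of query heads, `repeat_kv` returns its input unchanged and the computation reduces to standard multi-head attention. That is one plausible reading of GQA being present in the code without affecting the model's behavior.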

This post is licensed under CC BY 4.0 by the author.