
The release of NVIDIA NIM on Hugging Face Inference Endpoints

Yesterday at COMPUTEX, Jensen Huang announced the release of NVIDIA NIM on Hugging Face Inference Endpoints!

🚀 NVIDIA NIMs are inference microservices designed to streamline and accelerate the deployment of generative AI models. 👀

  • 1๏ธโƒฃ 1-click deployment from Hugging Face Hub to Inference Endpoints
  • ๐Ÿ†• Starting with Llama 3 8B and Llama 3 70B on AWS, GCP
  • ๐Ÿš€ Up to 9000 tokens/sec for Llama 3 8B
  • ๐Ÿ”œ Adding Mixtral 8x22B, Phi-3, and Gemma and more soon

Learn More: https://developer.nvidia.com/blog/nvidia-collaborates-with-hugging-face-to-simplify-generative-ai-model-deployments
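Once deployed, a NIM endpoint can be queried like any other Inference Endpoint over its OpenAI-compatible chat route. Here is a minimal stdlib-only sketch; the endpoint URL, token, and model id are placeholders, not values from the announcement.

```python
# Hypothetical sketch: calling a deployed NIM endpoint's OpenAI-compatible
# /v1/chat/completions route. Replace ENDPOINT_URL and the token with the
# values from your own Inference Endpoints dashboard.
import json
import urllib.request

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder

def build_request(prompt: str, token: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build the chat-completion HTTP request without sending it."""
    payload = {
        # Assumed model id for the Llama 3 8B deployment mentioned in the post.
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = build_request("What is NVIDIA NIM?", "hf_xxx")  # placeholder token
    with urllib.request.urlopen(req) as resp:  # requires a live endpoint
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The request is built separately from the network call, so the payload shape can be inspected before pointing it at a real endpoint.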


This post is licensed under CC BY 4.0 by the author.