
GPT-4o Released!

🤯 OpenAI's new multimodal LLM can understand and generate across text, audio, and vision in real time.

💬🗣️👀 Here is what we know so far:

Details

  • OpenAI announced the new GPT-4o model, which performs better than GPT-4, but most importantly, it will be free for all users! 🚀
  • An updated interface and a desktop application with voice control and screen-sharing capabilities have been introduced 🎤
  • Natively multimodal: text, image, and voice generation are now handled by ONE model 🗣️
  • Developers are not forgotten either: API support is coming, 2x faster, 50% cheaper, and with 5x higher rate limits than GPT-4 💰 (a minimal API sketch follows this list)
  • Voice mode has been greatly improved: you can now interrupt the model's generation at any time instead of waiting for it to finish. OpenAI has also brought speech generation to real-time latency and, most importantly, made the ChatGPT voice sound genuinely alive! 🎶
  • There is also an interactive mode where you can share your camera feed and talk with ChatGPT about it at the same time! 💬
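
The API details above come from the announcement. As a rough illustration (not taken from the post), here is a minimal sketch of sending a text + image prompt to gpt-4o through the OpenAI Python SDK; the prompt and image URL are placeholders, and at launch the public API exposed text and image input with text output, with audio promised later.

```python
# Minimal sketch (assumed example, not from the post): text + image input
# to gpt-4o via the OpenAI Python SDK. Prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```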

Summary

  • 📥 Input: Text, Text + Image, Text + Audio, Text + Video, Audio (based on the examples)
  • 📤 Output: Image, Image + Text, Text, Audio (based on the examples)
  • 🌐 88.7% on MMLU; 90.2% on HumanEval
  • 🎧 <5% WER for Western European languages in transcription
  • 🖼️ 69.1% on MMMU; 92.8% on DocVQA
  • ⚡ Up to 50% cheaper (probably due to tokenization improvements) and 2x faster than GPT-4 Turbo
  • 🎤 Near real-time audio with 320 ms average response latency, similar to human conversation
  • 🔡 New tokenizer with a 200k-token vocabulary (previously 100k), needing 1.1x - 4.4x fewer tokens across 20 languages (see the tiktoken sketch below)
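
Since the tokenizer vocabularies ship in the open-source tiktoken library (the new o200k_base encoding for gpt-4o alongside the cl100k_base encoding used by GPT-4 Turbo), the token savings can be spot-checked locally. A small sketch, assuming tiktoken >= 0.7.0 and arbitrary sample sentences; real savings depend on the language and text:

```python
# Rough comparison of the GPT-4o tokenizer (o200k_base, ~200k vocab) with the
# GPT-4 Turbo tokenizer (cl100k_base, ~100k vocab). Sample texts are arbitrary.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

samples = {
    "English": "Hello, how is the weather today?",
    "Korean": "안녕하세요, 오늘 날씨가 어떻습니까?",
}

for lang, text in samples.items():
    n_old = len(old_enc.encode(text))
    n_new = len(new_enc.encode(text))
    print(f"{lang}: {n_old} -> {n_new} tokens ({n_old / n_new:.2f}x fewer)")
```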

Blog: https://openai.com/index/hello-gpt-4o/

The multimodal achievement and latency are impressive. 🔥 But I'm not worried about open-source AI. Open source is stronger than ever and just as good for enterprise and company use cases where you don't need 200 ms latency with voice input. ✅


This post is licensed under CC BY 4.0 by the author.