
GPT-4o Release!

🤯 OpenAI's new multimodal LLM can understand and generate across text, audio, and vision in real time.

πŸ’¬πŸ—£οΈπŸ‘€ Here is what we know so far:

Details

  • OpenAI announced the new GPT-4o model, which performs better than GPT-4, but most importantly, it will be free for all users! 🚀
  • An updated interface and a PC application with voice control and screen-sharing capabilities have been introduced 🎤
  • Natively multimodal: text, image, and voice generation are now handled by ONE model 🗣️
  • Developers are not forgotten either: the API is 2x faster, 50% cheaper, and has 5x higher rate limits compared to GPT-4 💰
  • Voice mode has been greatly improved: you can now interrupt the model at any time instead of waiting for it to finish. OpenAI also managed to bring speech generation to real-time speed and, most importantly, to make the ChatGPT voice feel truly alive! 🎶
  • There is also an interactive mode where you can share the image from your camera and talk with ChatGPT about it at the same time! 💬
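As a rough sketch, calling the new model through the OpenAI Python SDK with a text + image prompt should look something like this. `build_payload` is a hypothetical helper of mine; the guarded call at the bottom assumes the `openai` package is installed and an `OPENAI_API_KEY` is set.

```python
import os
from typing import Optional

# Hypothetical helper: builds a request body in the shape used by the
# Chat Completions API, optionally attaching an image to the user message.
def build_payload(prompt: str, image_url: Optional[str] = None) -> dict:
    content = [{"type": "text", "text": prompt}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": "gpt-4o", "messages": [{"role": "user", "content": content}]}

if __name__ == "__main__":
    payload = build_payload("What is in this picture?", "https://example.com/cat.jpg")
    # The real call needs credentials, so it is guarded to keep the sketch runnable.
    if os.environ.get("OPENAI_API_KEY"):
        from openai import OpenAI  # pip install openai
        client = OpenAI()
        resp = client.chat.completions.create(**payload)
        print(resp.choices[0].message.content)
```

Audio in and out goes through the same model, but the examples OpenAI showed for that run through the app rather than this endpoint.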

Summary

  • 📥 Input: Text, Text + Image, Text + Audio, Text + Video, Audio (based on the examples)
  • 📤 Output: Image, Image + Text, Text, Audio (based on the examples)
  • 🌐 88.7% on MMLU; 90.2% on HumanEval
  • 🎧 < 5% WER for Western European languages in transcription
  • 🖼️ 69.1% on MMMU; 92.8% on DocVQA
  • ⚡ Up to 50% cheaper (probably due to tokenization improvements) and 2x faster than GPT-4 Turbo
  • 🎤 Near real-time audio with 320 ms average latency, similar to human conversation
  • 🔑 New tokenizer with a 200k-token vocabulary (up from 100k), needing 1.1x-4.4x fewer tokens across 20 languages
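Back-of-envelope on what those last two bullets mean for API bills: the 50% price cut and the tokenizer savings stack. The per-language token factors come from the numbers above; treating the two effects as simply multiplicative is my own simplification.

```python
def gpt4o_cost_factor(price_factor: float = 0.5, token_factor: float = 1.0) -> float:
    """Relative cost of GPT-4o vs. GPT-4 Turbo for the same text.

    price_factor: per-token price ratio (50% cheaper -> 0.5).
    token_factor: how many times fewer tokens the new 200k tokenizer needs
                  (1.1x-4.4x depending on language, per the post).
    """
    return price_factor / token_factor

# English barely benefits from the new tokenizer (~1.1x fewer tokens):
print(round(gpt4o_cost_factor(token_factor=1.1), 3))  # ~0.455 of the old cost
# Languages at the top of the range (~4.4x fewer tokens):
print(round(gpt4o_cost_factor(token_factor=4.4), 3))  # ~0.114 of the old cost
```

So for the best-compressed languages, the same prompt could cost under an eighth of what it did on GPT-4 Turbo.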

Blog: https://openai.com/index/hello-gpt-4o/

The multimodal achievement and the latency are impressive. 🔥 But I'm not worried about open-source AI. Open source is stronger than ever and remains just as good for enterprise and company use cases that don't need 200 ms voice latency. ✅


This post is licensed under CC BY 4.0 by the author.