ChatGPT rolls out voice and image capabilities

Author: 87福利影视网 | Source: About Us | 2024-09-22 05:39:35

Everyone's favorite chatbot can now see, hear, and speak. On Monday, OpenAI announced new multimodal capabilities for ChatGPT. Users can now have voice conversations or share images with ChatGPT in real time.

Audio and multimodal features have become the next phase in the fierce generative AI competition. Meta recently launched AudioCraft for generating music with AI, and Google Bard and Microsoft Bing have both deployed multimodal features for their chat experiences. Just last week, Amazon previewed a revamped version of Alexa that will be powered by its own LLM (large language model), and even Apple is experimenting with AI-generated voice through Personal Voice.

Voice capabilities will be available on iOS and Android. As with Alexa or Siri, you can tap to speak to ChatGPT, and it will speak back to you in one of five voice options. Unlike current voice assistants, ChatGPT is powered by more advanced LLMs, so what you'll hear is the same type of conversational and creative response that OpenAI's GPT-4 and GPT-3.5 are capable of creating with text. The example that OpenAI shared in the announcement is generating a bedtime story from a voice prompt, so exhausted parents at the end of a long day can outsource their creativity to ChatGPT.

Multimodal recognition has been forecast for a while and is now launching in a user-friendly fashion for ChatGPT. When GPT-4 was released last March, OpenAI showcased its ability to understand and interpret images and handwritten text. Now it will be a part of everyday ChatGPT use. Users can upload an image and ask ChatGPT about it, for example to identify a cloud or to make a meal plan based on a photo of the contents of their fridge. Multimodal features will be available on all platforms.

As with any generative AI advancement, there are serious ethics and privacy issues to consider. To mitigate the risk of audio deepfakes, OpenAI says it is only using its audio recognition technology for the specific "voice chat" use case. It also says the feature was created with voice actors the company has "directly worked with." That said, the announcement doesn't mention whether users' voices can be used to train the model when they opt in to voice chat. For ChatGPT's multimodal capabilities, OpenAI says it has "taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy." But the real extent of nefarious uses won't be known until these features are released into the wild.


Voice chat and images will roll out to ChatGPT Plus and Enterprise users in the next two weeks, and to all users "soon after."
