OpenAI's ChatGPT platform just became a whole lot more interactive, with the launch of GPT-4o. This "flagship model" analyzes audio, visual and/or text input, providing answers via a real-time conversation with a very human-sounding AI agent.
Announced this Monday (May 13) at an online launch event hosted by OpenAI CTO Mira Murati, GPT-4o is described as being "a step towards much more natural human-computer interaction." The o in its name stands for "omni."
Aimed at delivering higher performance to users of the free service, GPT-4o is claimed to match the paid GPT-4 Turbo model's performance at processing text and code input, while also being much faster and 50% cheaper in the API (meaning it can be integrated into third-party apps for less money).
Users start with a simple "Hey, ChatGPT" vocal prompt, receiving a very effervescent spoken response from the agent. Using plain spoken language, the user then submits their query with accompanying text, audio and/or visuals if necessary – the latter can include photos, a live feed from their phone's camera, or pretty much anything else the agent can "see."
When it comes to audio inputs, the AI responds in an average of 320 milliseconds, which the company states is similar to human response time in a human-human conversation. What's more, the system is presently fluent in over 50 languages.
In Monday's announcement and demonstration, there were no awkward lags in the agent's responses, which definitely packed a lot of human-like emotion – HAL 9000 it was not. Additionally, users were able to interrupt the agent's answers without any disruption to the back-and-forth flow of information.
Among other things, the demo saw GPT-4o acting as an interpreter for an Italian-English conversation between two people, helping a person to solve a handwritten algebra equation, analyzing selected sections of programming code, and even ad-libbing a bedtime story about a robot.
Source: OpenAI