Zhipu AI leads global innovation once again: Launches AutoGLM, where a single sentence gets your phone to order takeout, book hotels, and shop for you
Recently, Zhipu AI unveiled a new agent, AutoGLM, built around the idea of "one sentence to handle phone operations." The agent not only completes common tasks such as ordering takeout and booking hotels, but does so by simulating the way a person interacts with the interface, so users can get things done without tapping through each step themselves.
Agent AutoGLM: More than just replacing clicks, it simulates human operations
AutoGLM is an AI assistant for phone and web operations, eliminating the need for users to manually tap through tedious steps. Just say what you need, and AutoGLM will quickly and automatically execute all commands like a human would. For example:
- Like and comment on your boss's WeChat Moments post
- Reorder from order history on Taobao
- Book a hotel on Ctrip
- Order takeout on Meituan
- Automatically like and follow on Xiaohongshu
This agent helps users easily complete various phone tasks in situations where operating a phone is inconvenient, such as while cooking, driving, or working.
Core Technical Innovation: AutoGLM Makes Agents More "Human-Like"
Behind AutoGLM lies the technical breakthrough from the Zhipu team, enabling it to go beyond simple operations and flexibly adapt to complex scenarios. Its core innovations include:
- Decoupling of Task Planning and Action Execution
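AutoGLM separates "task planning" (what to do) from "action execution" (how to operate the interface) and uses natural language as the bridge between the two. This architecture makes each step more precise and avoids mis-taps, so tasks are completed more reliably.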
- Self-Learning and Evolution
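AutoGLM uses a "self-evolving online curriculum reinforcement learning framework" that lets it keep improving as it learns and adapt to a variety of application scenarios. For example, when shopping on Taobao or booking hotels, a self-improvement mechanism similar to working through practice problems steadily raises the success rate of each task.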
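To make the decoupling idea above more concrete, here is a minimal Python sketch. It is not AutoGLM's implementation: the `Action` schema, the `plan` lookup table, and the word-overlap matching in `ground` are all hypothetical stand-ins, chosen only to show a planner that emits natural-language steps and a separate executor that maps each step onto an element of the current screen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Action:
    """One concrete UI action (hypothetical schema, not AutoGLM's)."""
    kind: str       # "tap" or "type"
    target: str     # id of the UI element to act on
    text: str = ""  # text to enter, for "type" actions


def plan(request: str) -> list[str]:
    """Planner: decide WHAT to do, as natural-language steps.

    A real planner would be a language model; this lookup table is a stand-in.
    """
    playbook = {
        "order takeout": [
            "open the food delivery app",
            "search for noodles",
            "tap the order button",
        ],
    }
    return playbook.get(request, [])


def ground(step: str, screen: dict[str, str]) -> Action:
    """Executor: decide HOW to do one step on the current screen.

    `screen` maps element ids to visible labels, a toy stand-in for a UI tree.
    The element whose label shares the most words with the step is chosen.
    """
    words = set(step.lower().split())
    target = max(screen, key=lambda eid: len(words & set(screen[eid].lower().split())))
    prefix = "search for "
    if step.startswith(prefix):
        return Action("type", target, step[len(prefix):])
    return Action("tap", target)


# Toy screen: element id -> visible label (made up for this example).
screen = {"icon_app": "food delivery app", "box_search": "search", "btn_order": "order"}
for step in plan("order takeout"):
    print(step, "->", ground(step, screen))
```

Because the planner only produces text, either half can be swapped out or improved without touching the other, which is the practical appeal of this kind of split.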
Key Problems Solved by AutoGLM
- Improved Action Execution Precision: Through decoupled design, AutoGLM can accurately click interface elements, reducing misoperations.
- Task Planning Flexibility: The self-evolution learning framework enables more flexible responses in complex tasks, no longer getting "stuck."
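The "practice problems" idea behind the self-evolving curriculum can be illustrated with a toy loop. This is only a sketch under strong assumptions and involves no actual reinforcement learning: task difficulty and agent skill are plain integers, the agent learns only from near misses, and the hypothetical `practice` function adds an easier stepping-stone task whenever the next task is out of reach. The article does not describe Zhipu's actual training procedure at this level of detail.

```python
from __future__ import annotations


def practice(difficulties: list[int], skill: int, max_rounds: int = 100) -> tuple[int, list[int]]:
    """Toy curriculum loop (illustrative only).

    Returns the final skill and the difficulties solved, in solving order.
    """
    unsolved = sorted(difficulties)
    solved: list[int] = []
    for _ in range(max_rounds):
        if not unsolved:
            break
        task = unsolved[0]  # curriculum: always try the easiest open task
        if task <= skill:
            solved.append(unsolved.pop(0))  # success: retire the task
        elif task == skill + 1:
            skill += 1  # near miss: learn from the failure
        else:
            unsolved.insert(0, skill + 1)  # too hard: generate a stepping stone
    return skill, solved


print(practice([1, 2, 3, 5], skill=1))
```

In this toy run the gap between difficulty 3 and 5 is bridged by a generated task of difficulty 4, which is the sense in which the curriculum "evolves" alongside the agent.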
Outstanding Performance in Evaluation Benchmarks
AutoGLM demonstrates excellent performance across multiple evaluation benchmarks:
- In the AndroidLab evaluation, AutoGLM's task execution success rate outperforms GPT-4o and Claude-3.5-Sonnet.
- In the WebArena-Lite benchmark, AutoGLM's task success rate improved by approximately 200% compared to GPT-4o, significantly narrowing the gap between humans and AI.
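For readers unsure how to read "improved by approximately 200%", the short sketch below shows the arithmetic, assuming the figure is a relative improvement over the baseline. The success rates used are hypothetical placeholders, not reported benchmark numbers.

```python
def relative_improvement(new_rate: float, base_rate: float) -> float:
    """Relative improvement of new_rate over base_rate, in percent."""
    return (new_rate - base_rate) / base_rate * 100


def implied_rate(base_rate: float, improvement_pct: float) -> float:
    """Success rate implied by a relative improvement over base_rate."""
    return base_rate * (1 + improvement_pct / 100)


# Hypothetical numbers, in percentage points, chosen only for illustration.
print(relative_improvement(45, 15))  # a rise from 15 to 45 is a 200% improvement
print(implied_rate(15, 200))         # a 200% improvement triples the baseline
```

Under that reading, a 200% improvement means a success rate roughly three times the baseline, not twice.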
Open Beta Application
Currently, AutoGLM's web capabilities are available to the public, while the mobile version has been opened for beta testing to select Android users. Interested Android users can apply for access through this link to experience the convenience of intelligent living.
Even more exciting, Zhipu AI has partnered with phone manufacturers like Honor, and in the future, more phones will come with AutoGLM built-in, allowing everyone to enjoy the convenience brought by this ultimate AI assistant.
GLM-4-Voice: Emotional Voice Model Brings a New Interactive Experience
In addition to AutoGLM, Zhipu AI also released GLM-4-Voice, a multimodal voice model with emotional understanding and expression capabilities. It achieves seamless text-to-speech conversion, reducing information loss and latency, bringing users a more natural interactive experience. Its core highlights include:
- Emotional Expression: Can simulate various emotions such as happiness, sadness, and fear.
- Adjustable Speech Rate: Enables fast or slow output within the same conversation.
- Multi-language and Multi-dialect Support: Covers Chinese, English, and various regional dialects (such as Cantonese and Chongqing dialect).
- Flexible Input and Real-time Response: Adjusts output based on user commands, supports video calls, and will soon enable an AI assistant that "can see and speak."
GLM-4-Voice models audio tokens at 12.5Hz, ensuring low-latency end-to-end speech generation capabilities.
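As a quick worked example of what a 12.5Hz token rate implies, the sketch below converts between audio duration and token count. Only the 12.5Hz figure comes from the article; the helper names are made up for illustration.

```python
import math

TOKEN_RATE_HZ = 12.5  # audio tokens per second, as stated in the article


def ms_per_token() -> float:
    """Duration of audio covered by one token, in milliseconds."""
    return 1000 / TOKEN_RATE_HZ


def tokens_for(seconds: float) -> int:
    """Number of tokens needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * TOKEN_RATE_HZ)


print(ms_per_token())  # 80.0
print(tokens_for(10))  # 125
```

Each token therefore spans 80 ms of audio, so ten seconds of speech is only 125 tokens, which helps explain why a low token rate keeps generation latency down.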
Open Source Code
Zhipu AI has open-sourced the GLM-4-Voice code. Developers are welcome to check it out on GitHub: GitHub Repository.
The launch of AutoGLM and GLM-4-Voice once again demonstrates Zhipu AI's technical prowess. Looking ahead, we expect these innovative technologies to bring more convenient and intelligent digital experiences to even more people.
- Author: Dr. Charlii
- Link: https://www.charliiai.com/article/13e00092-b977-81ba-b052-fcac2474d15f
- License: This article is licensed under CC BY-NC-SA 4.0. Please credit the source when republishing.








