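“It can do anything you ask it to. It can be a teacher and look after your kids; it can walk the dog, mow the lawn and buy groceries; it can even be your friend and bring you tea. Whatever you can think of, it can do.”
At last month’s shareholder meeting, he went further still, saying excitedly: “Once AI and robotics mature, we could even grow the global economy tenfold or a hundredfold. Deploying Optimus at scale is the secret to that unlimited gain. Perhaps by then, ‘money’ itself will have become unnecessary.”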
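Google’s triumphant return has sent shockwaves straight to its rivals’ nerve centre. According to The Information, faced with Google’s advancing offensive, OpenAI CEO Sam Altman declared a “code red” in an internal memo this Monday, preparing to marshal every strategic resource for a major upgrade of ChatGPT’s capabilities.
Citing people familiar with the matter, The Verge reports that OpenAI plans to release its GPT-5.2 model as early as the start of next week, well ahead of its original late-December schedule.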
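Logan Kilpatrick: That’s fantastic! That’s the perfect pitch for AI Studio; we’ll clip that and post it online. One important topic you just mentioned is that, alongside the launch of Gemini 3, we also launched the Google Anti-gravity platform. From a model perspective, how important do you think this kind of product architecture is for improving model quality? Obviously, it’s closely tied to tool calling and coding ability.
It’s the same with the Anti-gravity platform as with Gemini and AI Studio. These products keep us closely connected with users and give us real feedback signals, which is enormously valuable. Anti-gravity was one of our key launch partners; although it joined only recently, its feedback was decisive during the two or three weeks of launch preparation.
The same is true of AI Mode in Search, from which we’ve had a great deal of feedback. Benchmarks help us push intelligence forward in areas such as science and mathematics, but understanding real-world use cases matters just as much: the model has to solve actual problems.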
Gemini 3, a model built by the whole of Google working together
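Logan Kilpatrick: Since you became Chief AI Architect, your job has been not only to make sure we have excellent models, but also to push product teams to put those models to work and build great user experiences across all of Google’s products. Gemini 3 landed in every Google product on launch day, which was a huge surprise for users, and we hope it will reach even more products in future. From DeepMind’s perspective, has this cross-team collaboration added extra complexity? After all, things were probably much simpler a year and a half ago.
Koray Kavukcuoglu: But our goal is to build intelligence, right? Many people ask me whether holding two roles, CTO and Chief AI Architect, creates a conflict, but to me the two roles are essentially aligned.
Building intelligence has to happen through products and their connection with users. My core goal is to make sure every Google product can use the most advanced technology. We are not a product team; we are technology developers, responsible for building the models and the technology. Of course we have our own views on products, but what matters most is delivering technology in the best possible way and working with product teams to build the best products of the AI era.
This is a new era, in which new technology is redefining user expectations, product behaviour and the way information is delivered. So I want to drive this kind of technology enablement within Google and work with every product team. That’s good not only for products and users, but also vital for us.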
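Building on the Gemini 3.0 Pro architecture and the lessons of the first-generation model, the team scaled up the model and improved the way it was tuned, producing a more powerful image-generation model, which makes sense. Its core strength is handling complex scenarios: given a large set of complex documents, for example, the model can not only answer questions about them but also generate corresponding infographics, and do it well. That’s multimodal input and multimodal output coming together naturally, which is great.
We’re fortunate to live in this era. Many people have devoted their whole lives to AI, or to whatever field they love, hoping to witness a technological breakthrough, and now it’s really happening. The rise of AI owes not only to advances in machine learning and deep learning, but also to developments in hardware, the internet and data, which together have brought us to where we are today. So I’m proud to have chosen AI, and lucky to be living in this era. It’s truly exciting.
I can say with certainty that in 20 years, the large language model (LLM) architectures we use today will have been superseded. So continuing to explore new directions is the right choice. Google DeepMind, Google Research and the wider academic research community all need to push forward exploration across many areas together.
I don’t think we need to agonise over “what’s right and what’s wrong”; what really matters is what the technology can do, and how it performs, in the real world.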
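Logan Kilpatrick: One last question. During my first year or so at Google, I felt a kind of “Google comeback” atmosphere. Despite Google’s huge infrastructure advantage, we always seemed to be playing catch-up in AI. In the early days of AI Studio, for example, we had no users (that later grew to 30,000), no revenue, and the Gemini models were still at an early stage.
Now, with the launch of Gemini 3, I’ve been getting a lot of feedback from across the ecosystem, and people finally seem to have realised that “Google’s AI era has arrived”. Have you felt this “comeback” too? Did you believe we would get to where we are today? And what does this change of role mean for the team?
Koray Kavukcuoglu: When the potential of large language models (LLMs) was becoming apparent, to be honest, although I believed DeepMind was a frontier AI lab, I also recognised that as researchers we hadn’t invested enough in some areas. That was an important lesson for me: we have to broaden our exploration, and innovation is crucial, rather than confining ourselves to a single architecture.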
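The “retirement” of AI chief John Giannandrea is tied to a string of missteps by Apple in generative AI. Not only has the underlying Apple Intelligence platform been plagued by delays and underwhelming features, but the large-scale overhaul of Siri at the product level, the so-called “2.0” version, is also running about a year and a half behind. Apple now plans to rely on a partnership with Google to fill the capability gap.
Alongside the turmoil among senior executives, Apple’s engineering teams are also losing talent, especially in AI. Meta, OpenAI and various start-ups are aggressively poaching Apple’s software and hardware engineers, making it even harder for Apple to catch up with the AI wave.
Robby Walker, who used to run Siri, left the company last October; his successor, Ke Yang, lasted only a few weeks in the role before leaving to join Meta’s newly formed superintelligence lab.
The departure of AI models chief Ruoming Pang set off a chain reaction: he went to Meta along with colleagues including Tom Gunter and Frank Chu, at a time when Meta was reportedly offering annual packages worth more than $100 million to lure people from Apple, OpenAI and other companies. Morale in Apple’s AI organisation sank badly, and more than a dozen top AI researchers left within a few weeks. Apple’s growing use of external AI technology, such as Google’s Gemini, has also worried the employees working on large language models.
Apple’s AI robotics software team also recently saw a wave of departures, including its head, Jian Zhang, who likewise joined Meta.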
Prompt: The video is shot using a fisheye lens, giving a distorted, wide-angle view of an urban street scene at night in front of a store with a sign reading “DELI • GROCERY • ATM” (English). The lighting is dim, with red neon reflecting off wet pavement. The musical sound is slow, ominous industrial bass with distant sirens. The camera focuses on a tall figure wearing a cracked, porcelain doll mask and a heavy trench coat, looming over the lens. Behind him, two figures in black hoodies stand motionless near the store entrance. The masked figure leans uncomfortably close to the fisheye lens, whispering hoarsely: “Midnight tick, the shadows don’t sleep. Price on the head, and the secrets we keep. You saw the sign, but you didn’t read the print. One wrong step, and you vanish in a tint.” The figure slowly raises a gloved hand to cover the camera lens as the screen fades to black.
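Frankly, the finished clip turned out far better than I had expected.
拍我AI V5.5 shows very polished editing: it knows how to cut smoothly between shot sizes, avoiding the spatial and temporal discontinuity common in AI-generated video, so the flow from shot to shot feels logical.
Of course, today’s AI still can’t be 100% perfect. In the final, highly charged fisheye close-up of the character speaking, for example, the facial detail still shows a few flaws. But it holds the line on obeying the physics of motion; overall the strengths far outweigh the flaws, and the finished clip remains highly polished and usable.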
Cast your mind back to when you learned to drive, ride a bike, speak a foreign language, perform a tracheostomy, or acquire any other skill. Wasn’t confidence the key to your success? Whatever we do in life, confidence is always critical. If you run a business, one of the metrics that are likely to be collected is confidence in your business, as that’s such an important economic indicator. Confidence is every bit as important in computing.
Over the last few weeks I’ve been discovering problems that have been eroding confidence in macOS. From text files that simply won’t show up in Spotlight search, to Clock timers that are blank and don’t function, there’s one common feature: macOS encounters an error or fault, but doesn’t report that to the user, instead just burying it deep in the log.
When you can spare the time, the next step is to contact Apple Support, who seem equally puzzled. You’re eventually advised to reinstall macOS or, in the worst case, to wipe a fairly new Apple silicon Mac and restore it in DFU mode, but have no reason to believe that will stop the problem from recurring. You know that Apple Support doesn’t understand what’s going wrong, and despite the involvement of support engineers, they seem as perplexed as you.
One reason for this is that macOS so seldom reports errors, and when it does, it’s uninformative if not downright misleading. Here’s a small gallery of examples I’ve encountered over the last few years, to bring back unhappy memories.
Maybe you saved an important webpage in Safari 26.1 using its Web Archive format, then a couple of days later discovered you couldn’t open it. There’s no error message, just a blank window, so you try again with the same result. Another site shows the same problem, forcing you to conclude that it’s a bug in Safari. Are you now going to devote your time to obtaining sufficient information to report that to Apple using Feedback? Or to contact Apple Support and pursue its escalation to an engineer who might fortuitously discover the cause?
Silent failures like these are least likely to be reported to Apple. In most cases, we find ourselves a workaround, here to abandon Web Archives and switch to saving webpages as PDF instead. When someone else mentions they too have the same problem, we advise them that Web Archives are broken, and our loss of confidence spreads by contagion.
Honest and understandable error reporting is essential to confidence. It enables us to tackle problems rather than just giving up in frustration, assuming that it’s yet another feature we used to rely on that has succumbed in the rush to get the next version of macOS out of the door.
Eroding confidence is also a problem that the vendors of AI appear to have overlooked, or at least seriously underestimated. It’s all very well using the euphemism of hallucination to play down the severity of errors generated by LLMs, but those errors can only cause users to lose confidence, no matter how ‘intelligent’ you might think your AI is becoming. Go and ask the lawyers who have been caught out by courts for submitting AI fabrications whether they still have full confidence in your product.
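I hope that this year-end’s Zootopia 2 and Avatar 3 can repeat the success of their predecessors, doing well both at the box office and with critics. I hope next year’s Spider-Man: Brand New Day and Avengers 5 can match the quality of this year’s Thunderbolts and revive the Marvel Cinematic Universe and the superhero genre. I hope next year’s big releases, such as The Odyssey, Dune 3, Mario Galaxy and Toy Story 5, all succeed, and push certain undeserving films further down the box-office rankings.
Recent earnings from AI-related companies have all looked very strong, the industry’s demand for AI compute keeps growing, and Suno v5, Sora 2 and Gemini 3 have all wowed the internet, so I don’t think the AI bubble will burst any time soon. But I hope that, as AI develops rapidly, it won’t replace ordinary people’s jobs so quickly, and will instead become a tool that helps ordinary people raise their productivity and their own value. I hope Silicon Valley’s elite will reflect more on the double-edged nature of technological progress: while pursuing productivity and efficiency, they should also care about value rationality and build more humanistic care into their products, truly living up to “don’t be evil”, rather than casting themselves as the creator just because they have built a large language model.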
One of the most common reasons for looking in the log is when an error occurs and isn’t reported sufficiently. It’s also probably the most common reason for frustration with the log, when you can’t identify the error you were looking for. This article explains why the log may not be a good place to identify the cause of errors.
Claude conned
Perhaps the best illustration of the difficulties faced by those using the log to investigate errors is in Claude’s attempt to diagnose problems with the Clock app.
First, it came across what it classed as a memory allocation error in the entry:
00.968273 error com.apple.runningboard [app[application.com.apple.clock.1152921500311884024.1152921500311884029(501)]:1921] Memorystatus failed with unexpected error: Invalid argument (22)
Then it found and misinterpreted a cryptic entry from the kernel that also referred to memory:
10.891949 kernel Clock[19237] triggered unnest of range 0x1e8000000->0x1ea000000 of DYLD shared region in VM map 0x5c946da0d472dbbf. While not abnormal for debuggers, this increases system memory footprint until the target exits.
It continued by misreading perfectly normal sequences of entries made by RunningBoard and FrontBoard, involving jargon such as assertion, as pathological cycles. Like someone who had skimmed quickly through a complex detective novel, Claude then jumped to the wrong conclusions.
Riddled with errors
Perfectly normal logs are full of errors, the great majority being expected or benign, and surprisingly few turn out to be reflected in what actually occurs. To demonstrate this, I took a log extract with a total of 25,159 entries excluding Signposts and found that 820 of them contained the word error in their message. So you can expect around 3% of all log entries to mention errors.
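The 3% figure is simple arithmetic: 820 / 25,159 ≈ 0.0326. A minimal Python sketch of the same count follows, assuming the messages have already been exported as strings and that a case-insensitive whole-word match for “error” is what counts as a mention (the article doesn’t say exactly how its count was made):

```python
import re

# Assumption: a whole-word, case-insensitive match, so "Decoding error:" counts
# but words that merely contain the letters, such as "terror", do not.
ERROR_WORD = re.compile(r"\berror\b", re.IGNORECASE)


def error_mention_rate(messages: list[str]) -> float:
    """Return the fraction of log messages that mention the word "error"."""
    if not messages:
        return 0.0
    hits = sum(1 for message in messages if ERROR_WORD.search(message))
    return hits / len(messages)


# The article's own figures: 820 of 25,159 entries mentioned "error".
print(f"{820 / 25159:.1%}")  # prints 3.3%
```

Counting mentions this way says nothing about whether an entry matters, which is exactly the point made next.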
This is reversed when you look for entries classed as Error or Fault, which are usually rare and seldom contain information relevant to a problem you’re investigating. This is because many significant abnormal conditions and events are reported in entries that aren’t classified as Error or Fault, and often don’t include the word error in the message.
Process killed
The real error that Claude didn’t find (possibly because it wasn’t included in the submitted log extract) occurred when a key process, mobiletimerd, exceeded its memory allowance and so was killed. The diagnostic sequence of log entries for that ran:
03.099138 kernel process mobiletimerd [19118] crossed memory high watermark (15 MB); EXC_RESOURCE
03.099148 kernel memorystatus: mobiletimerd [19118] exceeded mem limit: InactiveHard 15 MB (fatal)
03.100180 kernel mobiletimerd[19118] Corpse allowed 1 of 5
03.100567 kernel 54578.846 memorystatus: killing_specific_process pid 19118 [mobiletimerd] (per-process-limit 0 0s rf:- type:daemon) 15360KB - memorystatus_available_pages: 1327431
03.100665 com.apple.opendirectoryd PID: 19118, Client: 'mobiletimerd', exited with 0 session(s), 0 node(s) and 0 active request(s)
03.100679 gui/501/com.apple.mobiletimerd [19118] exited with exit reason (namespace: 1 code: 0x7) - JETSAM_REASON_MEMORY_PERPROCESSLIMIT, ran for 110ms
03.100708 gui/501 [100015] service inactive: com.apple.mobiletimerd
To the knowledgeable human, that reads clearly, but doesn’t include general terms like error, so could well be lost on AI.
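Because none of those entries uses the word error, searching for it misses them. A rough alternative is to search for the phrases that actually appear in such a sequence. This Python sketch does that; the marker list is a hypothetical selection taken from the mobiletimerd extract above, not an official or complete set of jetsam indicators, and it assumes plain-text entries that give the PID either as [19118] or as pid 19118:

```python
import re

# Hypothetical markers, copied from the mobiletimerd entries above; real logs
# may use other wording for memory-limit kills.
MEMORY_KILL_MARKERS = (
    "crossed memory high watermark",
    "exceeded mem limit",
    "killing_specific_process",
    "JETSAM_REASON_MEMORY",
)

# The PID appears either in brackets, "[19118]", or after "pid ".
PID_PATTERN = re.compile(r"\[(\d+)\]|\bpid (\d+)")


def pid_of(line: str) -> str:
    """Return the first PID found in a log line, or "?" if there isn't one."""
    match = PID_PATTERN.search(line)
    if not match:
        return "?"
    return match.group(1) or match.group(2)


def memory_kills(lines: list[str]) -> dict[str, list[str]]:
    """Group entries that look like memory-limit kills by process ID."""
    found: dict[str, list[str]] = {}
    for line in lines:
        if any(marker in line for marker in MEMORY_KILL_MARKERS):
            found.setdefault(pid_of(line), []).append(line)
    return found
```

Run over the mobiletimerd extract, it should group four of the entries under PID 19118, while the RunningBoard “Memorystatus failed” entry that misled Claude isn’t flagged at all.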
Process failed
This example should be more readily accessible in the log, but could be overlooked. This occurred when a Spotlight service tried to extract content from a text file that started with certain characters such as LG, resulting in an indexing failure:
30.946740 mdwrite Decoding error: Error Domain=NSCocoaErrorDomain Code=4864 UserInfo={NSDebugDescription=[private]} for [private]
30.951004 mds Decoding error: Error Domain=NSCocoaErrorDomain Code=4864 UserInfo={NSDebugDescription=[private]} for [private]
Error code 4864 is NSCoderReadCorruptError, implying that certain characters at the start of a text file may be triggering a bug in RichText.mdimporter, the importer module shipped in macOS that’s responsible for indexing plain text files.
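Decoding errors like these are easier to spot when grouped by process, error domain and code. A minimal Python sketch, assuming the process name is the second whitespace-separated field of each plain-text entry, as in the two mdwrite and mds lines above; the lookup table holds only the one code named here, and any other codes would have to be added from Apple’s documentation:

```python
import re
from collections import Counter

# Matches fragments such as "Error Domain=NSCocoaErrorDomain Code=4864".
DOMAIN_CODE = re.compile(r"Error Domain=(\w+) Code=(-?\d+)")

# Only the code cited in the text; deliberately not a full table.
KNOWN_CODES = {("NSCocoaErrorDomain", 4864): "NSCoderReadCorruptError"}


def summarise_domain_errors(lines: list[str]) -> Counter:
    """Count (process, domain, code) triples across log lines.

    Assumes the layout "timestamp process message...", which holds for the
    Spotlight entries above but not for every entry in the log.
    """
    counts: Counter = Counter()
    for line in lines:
        match = DOMAIN_CODE.search(line)
        if not match:
            continue
        fields = line.split()
        process = fields[1] if len(fields) > 1 else "?"
        counts[(process, match.group(1), int(match.group(2)))] += 1
    return counts


def describe(domain: str, code: int) -> str:
    """Return the symbolic name for a known (domain, code) pair, else "unknown"."""
    return KNOWN_CODES.get((domain, code), "unknown")
```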
Process halted
My third and final example comes from an examination of why Safari was failing to load and display a webarchive, and illustrates how macOS privacy and security features can halt a process that would otherwise complete successfully.
For Safari to load the main frame, it needed to obtain PolicyForNavigationAction approval. What actually happened was:
01.154639 com.apple.WebKit Loading Safari WebKit 0x14c19b818 - [pageProxyID=21, webPageID=22, PID=596] WebPageProxy::decidePolicyForNavigationAction: listener called: frameID=24, isMainFrame=1, navigationID=26, policyAction=0, safeBrowsingWarning=0, isAppBoundDomain=0, wasNavigationIntercepted=0
01.154642 com.apple.WebKit Loading Safari WebKit 0x14c19b818 - [pageProxyID=21, webPageID=22, PID=596] WebPageProxy::receivedNavigationActionPolicyDecision: frameID=24, isMainFrame=1, navigationID=26, policyAction=0
01.154666 com.apple.WebKit Loading Safari WebKit 0x14c19b818 - [pageProxyID=21, webPageID=22, PID=596] WebPageProxy::isQuarantinedAndNotUserApproved: failed to initialize quarantine file with path.
01.154666 com.apple.WebKit Loading Safari WebKit 0x14c19b818 - [pageProxyID=21, webPageID=22, PID=596] WebPageProxy::receivedNavigationActionPolicyDecision: file cannot be opened because it is from an unidentified developer.
01.154799 Error Safari Safari Web view (pid: 596) did fail provisional navigation (Error Domain=NSURLErrorDomain Code=-999 "(null)")
What should have happened instead is that the decision was approval:
00.740168 com.apple.WebKit 0xa4bda0718 - [pageProxyID=19, webPageID=20, PID=1035] WebPageProxy::decidePolicyForNavigationAction: listener called: frameID=4294967298, isMainFrame=1, navigationID=25, policyAction=Use, isAppBoundDomain=0, wasNavigationIntercepted=0
00.740172 com.apple.WebKit 0xa4bda0718 - [pageProxyID=19, webPageID=20, PID=1035] WebPageProxy::receivedNavigationActionPolicyDecision: frameID=4294967298, isMainFrame=1, navigationID=25, policyAction=Use
00.740233 com.apple.WebKit 0xa4bda0718 - [pageProxyID=19, webPageID=20, PID=1035] WebPageProxy::receivedNavigationActionPolicyDecision: Swapping in non-persistent websiteDataStore for web archive.
Although this failure was, for once, reported in an entry classed as Error, its consequences aren’t made clear in subsequent log entries.
Error reporting in macOS
When Apple replaced traditional logs with the Unified log in macOS Sierra, it made it clear that the new log wasn’t intended for advanced users or system administrators, but primarily for engineers. However, no provision was made for significant errors to be reported in any more accessible form. None of my three examples were reported directly to the user, who was left unaware of what had happened, and why.
This failure to report errors to users has only led to more bugs being ill-defined and unreported, and has done Mac users a great disservice by eroding confidence.
Strategy
Identifying the cause of an error using the log has similarities with solving a ‘whodunnit’ detective novel. There’s usually no shortage of suspects and clues, although many of those may prove misleading. Tracing a suspect’s whereabouts can often prove decisive in determining whether they were in the right place at the right time, and sometimes establishing how the crime happened is essential to its solution.
One big difference from detective fiction is that you can establish what is normal, and comparing a normal record of what should happen against an abnormal extract can be valuable.
Suggestions:
Obtain a complete log record, without the use of predicates, saved either as a logarchive or a LogUI JSON file. Although you’ll find it easier to work with filtered versions, only a complete record has all the entries you might need.
When possible, compare a ‘normal’ sequence of events with the abnormal record.
Identify and trace subsystems and processes specific to the malfunctioning component(s).
Identify and trace subsystems and processes with controlling roles, including LaunchServices, RunningBoard, TCC and security.
Process IDs can be invaluable when tracing.
Turn detective.
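Two of those suggestions, comparing a normal sequence with the abnormal one and tracing by process ID, can be partly mechanised. This Python sketch assumes the log has been exported as plain text, one entry per line starting with a timestamp (for example using the log show command); normalise(), only_in_abnormal() and trace_pid() are hypothetical helpers, and replacing every number is a deliberately crude way of making equivalent entries from different runs compare equal:

```python
import re


def normalise(entry: str) -> str:
    """Strip details that differ between runs so equivalent entries compare equal.

    Drops the leading timestamp field, then replaces hex addresses with <hex>
    and any remaining digits (PIDs, IDs, sizes) with <n>.
    """
    parts = entry.split(maxsplit=1)
    text = parts[1] if len(parts) == 2 else ""
    text = re.sub(r"0x[0-9a-fA-F]+", "<hex>", text)
    return re.sub(r"\d+", "<n>", text)


def only_in_abnormal(normal: list[str], abnormal: list[str]) -> list[str]:
    """Return abnormal entries whose normalised form never occurs in the normal record."""
    seen = {normalise(entry) for entry in normal}
    return [entry for entry in abnormal if normalise(entry) not in seen]


def trace_pid(lines: list[str], pid: int) -> list[str]:
    """Return entries mentioning a PID in the forms seen above:
    PID=596, pid 19118, (pid: 596) or [19118]."""
    pattern = re.compile(rf"(?:PID=|pid:?\s|\[){pid}\b")
    return [line for line in lines if pattern.search(line)]
```

This only narrows down the suspects; anything left over still has to be read in context by someone who knows what the subsystems involved normally do.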
AI future
The Unified log might appear an ideal opportunity for AI approaches, but the reality is that we’re still a long way from achieving reliable interpretation by AI.
One severe limitation that’s often overlooked is that current techniques don’t fare well at the scale required. Analysing even a modest log extract involves well over 250,000 tokens, comparable to the long-context assessments made by the NovelQA benchmark. Whereas human performance on those assessments exceeds 90%, few AI systems can attain more than 70%, and some fail to reach even 50%.
Maybe one day, but for the moment at least humans are likely to remain best at using the log to identify the cause of errors.
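For Claude app users, long conversations will no longer be cut off. Claude will automatically summarise earlier context when needed, so the conversation can carry on.
Dianne Na Penn, Anthropic’s head of product management for research, said in an interview:
“We improved Opus 4.5’s overall handling of long context during training, but a longer context window alone isn’t enough. Knowing which information is worth remembering is just as critical.”
These improvements also deliver a feature that Claude users have long asked for: “endless chat”. It lets paying users carry on when a conversation exceeds the context window limit, with the model automatically compressing its context memory without having to alert the user.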
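Anthropic hasn’t described here how this compaction works. As a generic illustration only, and not Claude’s implementation, this Python sketch replaces older turns with a single summary once an estimated token budget is exceeded; Message, compact() and the summarise callback are hypothetical, and the 4-characters-per-token estimate is a rough rule of thumb rather than a real tokeniser:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenisers differ.
    return max(1, len(text) // 4)


def compact(
    history: List[Message],
    budget: int,
    summarise: Callable[[List[Message]], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Fold older turns into one summary message when history exceeds `budget` tokens.

    `summarise` is a hypothetical callback, for instance a request asking the
    model itself to condense the older turns. The last `keep_recent` turns
    are kept verbatim.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(estimate_tokens(m.text) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # within budget, or nothing old enough to fold away
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = Message("system", "Summary of earlier conversation: " + summarise(older))
    return [summary] + recent
```

A single pass can still leave the history over budget if the recent turns are themselves long; a production system would have to handle that case as well.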
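Claude for Chrome is now also available to all Max users, letting Claude carry out tasks directly across multiple browser tabs.
The Claude for Excel beta has been expanded to Max, Team and Enterprise users.
For Claude and Claude Code users with access to Opus 4.5, Anthropic has removed Opus-specific usage caps.
For Max and Team Premium users, Anthropic has also raised overall usage limits, so the number of Opus tokens available is roughly the same as users previously had with Sonnet. As more capable models arrive, these quotas will be updated accordingly.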
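Making the model “smarter and more economical”: Opus 4.5 gets a fundamental upgrade
As models get smarter, they can solve problems in fewer steps: less trial and error, less redundant reasoning and a shorter thinking process.
Compared with earlier models, Claude Opus 4.5 uses noticeably fewer tokens to achieve the same or better results.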
Today, The New York Review of Books has published a 10,000-word essay by Ai Xiaoming, one of China’s foremost public intellectuals. Echoing Hayek’s Road to Serfdom, “The Road to Miaoxi” is a plea against unfettered state power, which she sees as increasing in China–and in fact around the world.
It comes in the form of a travelogue retracing the passion of the late Niu Lihua, whose memoir Broken Dreams at Miaoxi describes his twenty-year path of sorrows from labor camp to labor camp in western Sichuan province.
During her travels by car and foot, Ai uncovers the physical remnants of these camps, and reflects on how they were implemented–the lack of rule of law and the overconcentration of state power.
The exercise took weeks, and made me appreciate the work that professional Chinese-English translators like Michael Berry at UCLA put into translation: not just the literal translation but the interpretation needed for it to make sense to a new audience.
For that, I owe the editing staff at the Review a huge amount of thanks. They found inconsistencies, unclear points, and inelegant phrasing throughout the work. This is an online essay and people sometimes still think that online means some sort of slapdash “blogpost.” In fact, this was more carefully and attentively edited than most print articles I’ve written, for any publication.
The article is behind a paywall but you can access it by creating an account and enjoying one free article, or you can write me for a PDF. It will also be unlocked on the Asia Society’s Chinafile NYRB archive in a month, as part of an agreement with the Review. However, I’d encourage readers to try the NYRB site first–even just registering for a free trial helps the Review, which is a unique, family-run journal. Work like this involves a paid staff and it should be supported in any way possible!