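Turn Musk into a street thug and dress Lin Daiyu in streetwear: this new AI video trend is addictive, and here is a step-by-step guide to making your own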

爱范儿

By: 张子豪

15 December 2025 at 18:39

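You have 30 seconds. Tell us about today's OOTD.

Apple CEO Tim Cook, always mild-mannered and dressed in basics, appears in his "personal ID video" wearing an oversized puffer jacket and a diamond-studded grill, throwing his meanest gangsta pose at the camera.

The real stroke of genius: he reaches in as if drawing a gun and pulls out... a Texas Instruments calculator.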

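▲ Video source: https://x.com/ReflctWillie/status/1997819640874205685

Many viewers could not stop watching. The one-take presentation is so satisfying that people replay it again and again. The creator applied Hollywood-blockbuster camera language to absurd content, and the contrast between polished form and comic substance spares the video the cheap look common to AI clips. It quickly took off on social media.

A Musk version followed almost immediately.

▲ Video source: https://x.com/VibeMarketer_/status/1999227084250448083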

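The creator thoughtfully shared the complete production workflow. It uses Contact Sheet Prompting to obtain a set of six images in which the background, the subject's expression and the outfit stay consistent while the pose changes.

▲ A 3×2 film contact sheet

A contact sheet dates back to the film era, when photographers used it as an index page of thumbnail-sized prints. Applying the concept to Nano Banana Pro means leaning on the model's consistency to generate, in one pass, a series of video stills in different styles and from different angles, and then turning them into video through first-and-last-frame generation.

Nano Banana Pro can generate a complete contact sheet of nine or more keyframes in a single pass, with every frame keeping strong consistency in character, detail and narrative. Even when frames are generated separately, it can fill in image content from the uploaded reference images so that the narrative stays consistent.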

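▲ First-and-last-frame video generation. Prompt: A one-take shot. The camera pushes in smoothly and slowly, focusing on the subject's glasses while keeping the subject in frame at all times. The subject's movements are minimal and deliberate.

With the images in hand, we can stitch them together using first-and-last-frame video generation. Video models and tools such as Kling (可灵), Veo 3.1, Hailuo and Jianying (剪映) all handle this easily.

Note that Sora 2 currently does not accept uploads of images with real human faces, and Musk's Grok Imagine only supports first-frame-to-video. All things considered, we recommend Google Veo 3.1, Jimeng (即梦) inside Jianying, or Kuaishou's Kling.

▲ Grok image-to-video: the default output makes little sense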

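In his guide, the video creator used Nano Banana Pro and Kling, and he also built a full set of tools that lets anyone freely swap in different subjects.

▲ Video source: https://x.com/ReflctWillie/status/1998720751806066916

According to the workflow he shared, this video is largely the same as the Cook one, so he only needed to change the three input images and make a few small adjustments. For example, the object pulled from the pocket becomes a GAME BOY console, and each subject gets a fitting signature detail: for Cook, a big gold tooth set with Apple's stock ticker AAPL; for Federal Reserve Chair Powell, a gold ring reading FED.

▲ Project: https://github.com/shrimbly/node-banana

He has published the project on GitHub, the well-known open-source platform. If you like to tinker, download the project locally, enter your own Gemini API key, and you can reuse the workflow directly.

We tried this automated project ourselves and generated a few images. Compared with generating in the Gemini web page or app, it is noticeably more convenient: instead of uploading images over and over, we can select the images we need, edit the prompt in place, and run the whole process as a pipeline.

No API key? No problem. Follow our detailed steps below and you can do the same with the Gemini web version.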

Find a photo of yourself, a streetwear piece you like, and a cool pair of glasses. Here we use Lin Daiyu ("Sister Lin"), supremely talented, proud and aloof, and deeply sentimental, as our example, to see what her OOTD fashion film would look like.

We generated a photo of Lin Daiyu directly with Nano Banana Pro.

▲ Prompt: Subject: A hyper-realistic high-fashion portrait of Lin Daiyu from Dream of the Red Chamber. She has a fragile, melancholic beauty, pale skin, and her signature “knitted eyebrows” (frowning slightly). She looks distinctively sorrowful and intellectual. Attire: Wearing exquisite, high-end traditional Qing Dynasty couture (Hanfu style). The fabric is layered translucent silk and organza in pale bamboo-green and moon-white. Intricate embroidery of falling petals. She wears a jade hairpin. Setting: Inside a modern, minimalist professional photography studio. A solid dark grey or textured canvas backdrop. Lighting & Camera: Cinematic studio lighting, Rembrandt lighting to accentuate her cheekbones and mood. Softbox lighting, sharp focus, shot on Hasselblad X2D, 85mm lens. Deep depth of field. Style: Vogue China editorial, ethereal, elegant, sorrowful, oriental aesthetics, avant-garde fashion photography, ultra-detailed texture. 16:9, 4K.

Once you have the character photo, the glasses and jacket images are optional. If you do not upload them, Nano Banana Pro generates a matching streetwear jacket and glasses automatically.

We found a streetwear jacket online for her to wear, then added to the default prompt some hair control, makeup styling, and a contemptuous expression that looks down on such worldly things.

Default prompt: Show me a high fashion photoshoot image of the model wearing the oversized jacket and glasses, the image should show the a full body shot of the subject. The model is looking past the camera slightly bored expression and eyebrows raised. They have one hand raised with two fingers tapping the side of the glasses. The setting is a studio environment with a blue background. The model is wearing fashionable, dark grey baggy cotton pants. The jacket is extremely, almost comically oversized on the model.
The image is from a low angle looking up at the subject.
The image is shot on fuji velvia film on a 55mm prime lens with a hard flash, the light is concentrated on the subject and fades slightly toward the edges of the frame. The image is over exposed showing significant film grain and is oversaturated. The skin appears shiny (almost oily), and there are harsh white reflections on the glasses frames.

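The next step is to generate the contact sheet. Feed in the photo we just produced of the model wearing the jacket and glasses, add the prompt below, and we get a multi-angle storyboard with a consistent character.

Prompt: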
Analyze the input image and silently inventory all fashion-critical details: the subject(s), exact wardrobe pieces, materials, colors, textures, accessories, hair, makeup, body proportions, environment, set geometry, light direction, and shadow quality.
All wardrobe, styling, hair, makeup, lighting, environment, and color grade must remain 100% unchanged across all frames.
Do not add or remove anything.
Do not reinterpret materials or colors.
Do not output any reasoning.

Your visible output must be:

One 2×3 contact sheet image (6 frames).

Then a keyframe breakdown for each frame.

Each frame must represent a resting point after a dramatic camera move — only describe the final camera position and what the subject is doing, never the motion itself.

The six frames must be spatially dynamic, non-linear, and visually distinct.

Required 6-Frame Shot List
1. High-Fashion Beauty Portrait (Close, Editorial, Intimate)

Camera positioned very close to the subject’s face, slightly above or slightly below eye level, using an elegant offset angle that enhances bone structure and highlights key wardrobe elements near the neckline. Shallow depth of field, flawless texture rendering, and a sculptural fashion-forward composition.

2. High-Angle Three-Quarter Frame

Camera positioned overhead but off-center, capturing the subject from a diagonal downward angle.
This frame should create strong shape abstraction and reveal wardrobe details from above.

3. Low-Angle Oblique Full-Body Frame

Camera positioned low to the ground and angled obliquely toward the subject.
This elongates the silhouette, emphasizes footwear, and creates a dramatic perspective distinct from Frames 1 and 2.

4. Side-On Compression Frame (Long Lens)

Camera placed far to one side of the subject, using a tighter focal length to compress space.
The subject appears in clean profile or near-profile, showcasing garment structure in a flattened, editorial manner.

5. Intimate Close Portrait From an Unexpected Height

Camera positioned very close to the subject’s face (or upper torso) but slightly above or below eye level.
The angle should feel fashion-editorial, not conventional — offset, elegant, and expressive.

6. Extreme Detail Frame From a Non-Intuitive Angle

Camera positioned extremely close to a wardrobe detail, accessory, or texture, but from an unusual spatial direction (e.g., from below, from behind, from the side of a neckline).
This must be a striking, abstract, editorial detail frame.

Continuity & Technical Requirements

Maintain perfect wardrobe fidelity in every frame: exact garment type, silhouette, material, color, texture, stitching, accessories, closures, jewelry, shoes, hair, and makeup.

Environment, textures, and lighting must remain consistent.

Depth of field shifts naturally with focal length (deep for distant shots, shallow for close/detail shots).

Photoreal textures and physically plausible light behavior required.

Frames must feel like different camera placements within the same scene, not different scenes.

All keyframes must be the exact same aspect ratio, and exactly 6 keyframes should be output. Maintain the exact visual style in all keyframes, where the image is shot on fuji velvia film with a hard flash, the light is concentrated on the subject and fades slightly toward the edges of the frame. The image is over exposed showing significant film grain and is oversaturated. The skin appears shiny (almost oily), and there are harsh white reflections on the glasses frames.

Output Format
A) 2×3 Contact Sheet Image (Mandatory)

Once we have the six-panel image, we use the prompt below to extract the six images one by one.

Prompt: Review the grid of six images. I want you to isolate and upscale the image in the first/second/third column of the first/second row of images. Do not change the pose or any details of the model. Only output the single image from the six image grid.
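
Because the extraction prompt is run six times with only the column and row changed, the variants can be generated instead of typed by hand. The following is a minimal sketch, assuming a grid of three columns and two rows read left to right, top to bottom; the function name `extraction_prompts` is made up for illustration and is not part of the creator's tooling.

```python
# Ordinals for a grid with three columns and two rows (assumed layout).
COLUMNS = ["first", "second", "third"]
ROWS = ["first", "second"]

# The extraction prompt from the article, with the column and row left as slots.
TEMPLATE = (
    "Review the grid of six images. I want you to isolate and upscale the image "
    "in the {column} column of the {row} row of images. Do not change the pose "
    "or any details of the model. Only output the single image from the six "
    "image grid."
)


def extraction_prompts():
    """Return the six prompts in reading order: row by row, left to right."""
    return [
        TEMPLATE.format(column=column, row=row)
        for row in ROWS
        for column in COLUMNS
    ]


# Print each prompt with its frame number so it can be pasted into the chat.
for number, prompt in enumerate(extraction_prompts(), start=1):
    print(f"Frame {number}: {prompt}")
```

Each printed prompt is sent together with the same six-panel image, one request per frame.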

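Nano Banana Pro is actually capable of generating a nine-panel grid directly, but to keep a fixed 3:2 aspect ratio, a six-panel grid separates into individual images more cleanly. We use 16:9 and 4K quality throughout.

With these six images, we can get creative and generate more keyframes, such as the moments in the original video where Cook shows off his gold tooth and pulls a vintage device from his pocket.

For example, we found a picture of a bangle online and had Lin Daiyu show off her jade bangle instead of a big gold watch.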

▲ Image 7 | Input: Image 3 + Image 5 + jade bangle photo, with the prompt: Show me a wide angle close up of the model. The model is holding one wrist vertically in front of her, The opposite hand is gently pulling down the voluminous sleeve of her clothes robe to display a translucent emerald jade bangle. The hand that is pulling down the sleeve has a silver fashion ring shaped like a fallen flower petal on the last two digits of her hand encrusted into the front face.

If you want to keep the street gangsta style, you can use the default prompt as-is: find a picture of a big gold watch and enter the following.

Default prompt: Show me a wide angle close up of the model. The model is holding one wrist vertically in front of him, the opposite hand is pulling down the sleeve of the hoodie to display the watch. The hand that is pulling down the sleeve has a two finger ring on the last two digits of his hand with the letters ‘LOVE’ encrusted into the front face.

The shoes were also swapped for embroidered streetwear high-tops, which combine the satin and floral embroidery of traditional embroidered shoes with a chunky, saw-toothed black rubber sole.

▲ Image 8 | Input: Image 7 + Image 3 + shoe photo. Prompt: Show me a wide angle worms eye view of the model standing, her right foot is extended in front of her, showing she is wearing the shoes in the reference image. Maintain the setting perfectly, include the finger ring on the models hand, and have her foot angled slightly to the side to highlight the detailing of the shoes

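Finally, she pulls a box of Renshen Yangrong Wan (人参养荣丸, ginseng nourishing pills) from her pocket: a cyberpunk girl kept alive by medication.

▲ Image 9 | Input: Image 7 + Image 8 + pill box photo. Prompt: Tight shot of the model reaching into the side of the kangaroo pouch of the hoodie and partially showing the box of pills.

Here you only need to change "showing the box of pills": replace what follows "showing" with whatever item you want pulled from the pocket.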
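
The substitution described above amounts to filling one slot in a fixed sentence. Here is a minimal sketch; the helper name `pocket_prompt` and the example item are hypothetical and chosen only for illustration.

```python
# The Image 9 prompt from the article, with the item after "showing" as a slot.
POCKET_TEMPLATE = (
    "Tight shot of the model reaching into the side of the kangaroo pouch "
    "of the hoodie and partially showing {item}."
)


def pocket_prompt(item: str) -> str:
    """Build the pocket-reveal prompt for the given item description."""
    item = item.strip()
    if not item:
        raise ValueError("item must not be empty")
    return POCKET_TEMPLATE.format(item=item)


# Example: swap the pill box for a handheld console (hypothetical choice).
print(pocket_prompt("a GAME BOY handheld console"))
```

As in the article's own example, a reference photo of the item would still be uploaded alongside the prompt.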

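With all the keyframes ready, the next step is to chain them into a slick video that looks like a single take. Image-to-video still needs a prompt. To match the pacing of the original video, ask for fluid motion and minimal movement from the model; this instruction matters most for reducing rerolls.

The creator suggests prompts such as: "The camera slowly and smoothly orbits the glasses while zooming. The subject is almost motionless, with extremely steady, deliberate movements."

For the transition between Image 8 and Image 9, for instance, we added text to the prompt saying that the leg slowly lowers and the camera rises vertically.

▲ Generated with Google Veo 3.1 | Prompt: Camera Movement (Vertical Scan):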
A continuous, seamless vertical crane shot moving upwards. The camera starts low, focused tightly on the embroidered high-top sneakers, then smoothly tilts up and glides along the texture of the grey cargo pants. As the camera rises to waist level, it pushes in (dolly in) towards the green satin jacket.
Subject Action (The Flow):
Start: The subject’s leg (showing the shoe) slowly lowers to a standing position as the camera moves up.
Transition: The subject stands confidently. The hand wearing the butterfly ring moves naturally into the pocket.
End: The hand pulls out a yellow and white medicine box (“Renshen Yangrong Wan”). The focus racks sharply onto the text on the box.
Atmosphere & Consistency:
High-fashion streetwear aesthetic. Hard flash lighting with a blue studio background. Maintain strict consistency of the green sukajan jacket embroidery and the jade bangle. The transition is liquid-smooth, feeling like a single, planned camera move.
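
Chaining keyframes into a one-take video means every clip starts on one keyframe and ends on the next, so each image between the first and the last is used twice. A minimal sketch of that pairing follows; the file names are hypothetical, and the order of the frames is whatever sequence you choose for your own video.

```python
from typing import List, Tuple


def clip_pairs(keyframes: List[str]) -> List[Tuple[str, str]]:
    """Pair each keyframe with its successor as (first_frame, last_frame)."""
    if len(keyframes) < 2:
        raise ValueError("at least two keyframes are needed to make a clip")
    return list(zip(keyframes, keyframes[1:]))


# Hypothetical file names for the nine keyframes, in playback order.
frames = [f"image_{number}.png" for number in range(1, 10)]

for index, (first, last) in enumerate(clip_pairs(frames), start=1):
    print(f"Clip {index}: first frame {first}, last frame {last}")
```

Nine keyframes give eight clips, each generated with its own motion prompt such as the one above for Image 8 to Image 9.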

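You may wonder why the prompt asks for slow motion when the final preview feels so crisp. The answer is another tool from the same creator. You have to admire today's AI video creators: they not only have good ideas but also build useful tools.

▲ URL: https://easypeasyease.vercel.app/. The tool stitches multiple videos together, applies easing curves and adds audio. It is currently free to use.

With EasyPeasyEase, each clip can be compressed to between 0.5 s and 6 s. The slow motion produced by the video model is passed through an easing curve, so acceleration and deceleration from start to finish become smoother and more natural and better mimic real-world physics. The sped-up video then looks livelier and more polished instead of moving at a stiff, constant speed.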
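
To make the idea of retiming with an easing curve concrete, here is a minimal sketch. It is not the tool's actual implementation: the cubic ease-in-out curve, the 30 fps default and the function names are all assumptions made for illustration. Only the 0.5 s to 6 s target range comes from the text above.

```python
from typing import List


def ease_in_out_cubic(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress in [0, 1]."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - ((-2 * t + 2) ** 3) / 2


def retime(source_duration: float, target_duration: float, fps: int = 30) -> List[float]:
    """For each output frame, return the source timestamp (seconds) to sample.

    Sampling is dense near the start and end of the source clip and sparse in
    the middle, so the shortened clip speeds up and slows down gradually.
    """
    if not 0.5 <= target_duration <= 6.0:
        raise ValueError("target_duration must be between 0.5 and 6 seconds")
    frame_count = max(2, round(target_duration * fps))
    return [
        source_duration * ease_in_out_cubic(index / (frame_count - 1))
        for index in range(frame_count)
    ]


# Example: squeeze an 8-second generated clip into 2 seconds.
timestamps = retime(source_duration=8.0, target_duration=2.0)
print(len(timestamps), timestamps[0], timestamps[-1])
```

A constant-speed speed-up would sample the source at evenly spaced timestamps; the eased version covers the same span but spends more output frames near both ends, which is what makes the motion feel less mechanical.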

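Finally, we stitched all the clips together to get Lin Daiyu's OOTD video for the day.

If you worry that writing first-and-last-frame prompts will mean frequent rerolls, an effective approach is to upload the first and last frame images and ask Gemini to write the prompt.

Contact sheet prompting is one of the most interesting ways to use Nano Banana Pro. First use the model's strong image generation and world-knowledge understanding to produce a nine-panel set of video keyframes, then extract each keyframe row by row and column by column.

▲ Video source: https://x.com/techhalla/status/1996650389228355819

To wrap up, here are the official ways to access Nano Banana Pro.

ai.studio: Google's official AI studio. It requires a linked payment method, lets you pick resolution and image size from drop-down menus without controlling them through the prompt, and charges per generation.
gemini.google.com: the Gemini web version and mobile app. Generation is free but capped; once the cap is reached it automatically falls back to the Nano Banana model, and the most notable effect is that you can no longer control the aspect ratio of generated images.
flow.google: Google's video generation platform. You can choose to generate images, which is free and consumes no credits.

The videos in this article can be viewed at: https://mp.weixin.qq.com/s/s_EIYB0qqcWv29zMM1g-7Q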
