Native Multimodal Image Generation in Gemini 2.0

Google today released multimodal image generation for Gemini 2.0 Flash.

Features

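Tell the story of "Buying the Casket and Returning the Pearl" (买椟还珠) in six consecutive comic panels drawn in a traditional Chinese painting style, generating both text and images.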

Gemini 2.0 Flash can now do more than generate images through chat: it can also edit specific regions of an image directly through conversation.

Turn the person in the image 90 degrees to the left to show their profile, with one hand raised

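For example, given a photo that shows only the head, you can ask it to "extend this into a full-body photo, white shirt, black trousers".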

You can also remove a watermark just by chatting with Gemini.

make a line art out of this sketch

give it some base color

add some soft shading, the source of the low light is on the left upper corner

add some background, indoors, fit the environment with the current source of light and shading, use proper angle

make it monochrome greyscale for light novel illustration
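
The five prompts above form a chain in which each edit is applied to the output of the previous one. A minimal Python sketch of that loop follows. It deliberately does not call any real API: `run_edit_chain`, `edit_image`, and `fake_edit` are hypothetical names introduced here for illustration, and `edit_image` is a placeholder for whatever function sends an image and a prompt to the model and returns the edited image.

```python
from typing import Callable, List

# The five prompts from the walkthrough above, applied in order.
EDIT_STEPS: List[str] = [
    "make a line art out of this sketch",
    "give it some base color",
    "add some soft shading, the source of the low light is on the left upper corner",
    "add some background, indoors, fit the environment with the current "
    "source of light and shading, use proper angle",
    "make it monochrome greyscale for light novel illustration",
]


def run_edit_chain(
    image: bytes,
    steps: List[str],
    edit_image: Callable[[bytes, str], bytes],
) -> List[bytes]:
    """Apply each prompt to the previous step's output.

    Returns every intermediate image so that any step can be inspected
    or used as a restart point.
    """
    results: List[bytes] = []
    current = image
    for prompt in steps:
        current = edit_image(current, prompt)
        results.append(current)
    return results


def fake_edit(image: bytes, prompt: str) -> bytes:
    # Stand-in for a real model call (assumption): it only records which
    # prompt was applied, so the chaining logic can be checked offline.
    return image + b"|" + prompt.encode("utf-8")


if __name__ == "__main__":
    outputs = run_edit_chain(b"sketch", EDIT_STEPS, fake_edit)
    print(len(outputs))
```

Keeping every intermediate result, rather than only the final image, mirrors how the conversation works in practice: if one step goes wrong, you can go back to the previous image and rephrase that single prompt.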

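If you tried Whisk, which I covered earlier, you may already have a feel for Google's ability to generate and blend images; Google has now taken that capability a step further.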

More precisely, it can render text directly inside the image.

Gemini can generate simple posters directly; the official example is a birthday card.

For example, when generating some portraits, it alters the face.