Diffusing Colors: Image Colorization with Text Guided Diffusion

Lightricks, Reichman University

Given a grayscale image, our method colorizes it according to a text prompt.

Abstract

The colorization of grayscale images is a complex and subjective task with significant challenges. Despite recent progress in employing large-scale datasets with deep neural networks, difficulties with controllability and visual quality persist.

To tackle these issues, we present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts. This integration not only produces colorization outputs that are semantically appropriate but also greatly improves the level of control users have over the colorization process. Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.

We leverage a pretrained generative diffusion model and show that it can be fine-tuned for the colorization task without losing its generative power or its attention to text prompts. Moreover, we present a novel CLIP-based ranking model that evaluates color vividness, enabling automatic selection of the most suitable level of vividness based on the specific scene semantics. Our approach holds particular potential for color enhancement and historical image colorization.

Method

During training, we encode the RGB and grayscale images into the latent space and feed the U-Net a random convex combination of the two latents, together with an automatically generated caption of the color image and the timestep 𝑡. At inference time, we encode the input grayscale image and iteratively colorize it. Image credit: Unsplash © David Clode.
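The training input described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the latent shape, the number of steps, the linear mapping from the timestep to the mixing weight, and the name `mix_latents` are all assumptions, and random arrays stand in for the encoded images.

```python
import numpy as np

def mix_latents(z_color, z_gray, t, num_steps):
    """Convex combination of the color and grayscale latents.

    Assumption: the mixing weight grows linearly with the timestep, so
    t = 0 returns the color latent and t = num_steps the grayscale one.
    """
    w = t / num_steps
    return (1.0 - w) * z_color + w * z_gray

rng = np.random.default_rng(0)

# Random stand-ins for the encoder outputs (hypothetical 4x8x8 latents).
z_color = rng.normal(size=(4, 8, 8))
z_gray = rng.normal(size=(4, 8, 8))

num_steps = 50                               # hypothetical schedule length
t = int(rng.integers(0, num_steps + 1))      # random timestep for this sample
z_t = mix_latents(z_color, z_gray, t, num_steps)

# z_t, the caption embedding, and t would then be passed to the U-Net.
```

Because the degraded input is a blend with the grayscale latent rather than with Gaussian noise, inference can start directly from the encoded grayscale image.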

Colorization using natural language

Our method can colorize a grayscale image using a text prompt. The text prompt can be as simple as a color name, or as complex as a full sentence.

Colorizing old photos

Enhancing existing colors in old photos

Automatically choosing color intensity

We can scale each cold diffusion step to control the color intensity. The figure visualizes the effect of latent color scaling for scales ranging from 0.6 to 1.4. The result enclosed by a blue frame uses the scale that our trained color ranker predicts as most preferred. A user study shows that people prefer colors that agree with the prediction of our color ranker. Image credit: ‘Migrant Mother’ by Dorothea Lange.
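The idea of scaling each step and then choosing among the results can be sketched as below. This is a toy illustration under stated assumptions, not the method's actual update rule: the per-step update is assumed to be an equal share of the offset between the predicted color latent and the grayscale latent, `oracle` is a stand-in for the U-Net, and `toy_score` is a stand-in for the trained CLIP-based ranker.

```python
import numpy as np

def colorize(z_gray, predict_color, num_steps, scale=1.0):
    """Iteratively move from the grayscale latent toward the color latent.

    Assumption: each step adds 1/num_steps of the predicted color offset,
    multiplied by `scale` to weaken (<1) or strengthen (>1) the colors.
    """
    z = z_gray.copy()
    for _ in range(num_steps):
        z_color_hat = predict_color(z)          # stand-in for the U-Net
        z = z + scale * (z_color_hat - z_gray) / num_steps
    return z

def pick_scale(results, score_fn):
    """Return the scale whose result gets the highest score."""
    return max(results, key=lambda s: score_fn(results[s]))

rng = np.random.default_rng(0)
z_gray = rng.normal(size=(4, 8, 8))
z_color = z_gray + rng.normal(size=(4, 8, 8))   # hypothetical target latent
oracle = lambda z: z_color                      # toy predictor, always correct

scales = [0.6, 0.8, 1.0, 1.2, 1.4]
results = {s: colorize(z_gray, oracle, num_steps=20, scale=s) for s in scales}

# Toy scorer: prefers results whose color offset matches the target's.
target_offset = np.abs(z_color - z_gray).mean()
toy_score = lambda z: -abs(np.abs(z - z_gray).mean() - target_offset)
best = pick_scale(results, toy_score)
```

In this sketch the final latent is the grayscale latent plus `scale` times the color offset, which makes the role of the scale easy to inspect. A real ranker would score the decoded images rather than the latents.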
