FlexiDiffusion: Enhancing User-Controlled Text-to-Image Generation with Layout-Aware Personalization

Abstract

Diffusion models have shown remarkable progress in text-to-image generation, fostering the development of personalized models that allow users to exercise fine-grained control over the creative process. Such control is essential to enhance user freedom in crafting visually compelling content while ensuring that generated outputs remain faithful to specified themes and details. In this study, we present a novel approach for layout-controllable, personalized diffusion that combines two key innovations: a Variational Detail-Aware Feature Extractor and a Dual Layout Control Mechanism. The feature extractor captures intricate details from reference subjects, ensuring high fidelity in the generated images, while the layout control mechanism allows users to embed specific layout constraints directly into the generation process. Through extensive experimentation, both qualitative and quantitative results consistently underscore the model's superiority in fostering user-driven creativity. To the best of our knowledge, this work is pioneering in enabling users to achieve a new level of personalization, allowing them to "create anything, anywhere."
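The abstract does not detail how the Dual Layout Control Mechanism injects layout constraints. As a rough, hypothetical illustration only (not the paper's actual method), one common way to impose user-specified layouts in diffusion models is to restrict cross-attention so that each subject token can only influence pixels inside its assigned bounding box. The function name, tensor shapes, and `token_boxes` format below are assumptions for the sketch.

```python
import torch

def layout_masked_cross_attention(q, k, v, token_boxes, h, w):
    """Illustrative region-constrained cross-attention (hypothetical sketch,
    not the mechanism described in the paper). Each text token may be
    restricted to a bounding box so its influence lands only inside that
    region of the latent grid.

    q: (n_pixels, d) image-query features, where n_pixels == h * w
    k, v: (n_tokens, d) text key/value features
    token_boxes: dict {token_index: (x0, y0, x1, y1)} in [0, 1] coordinates;
                 tokens not listed are left unconstrained
    """
    d = q.shape[-1]
    scores = q @ k.t() / d ** 0.5                            # (n_pixels, n_tokens)

    # Normalized (x, y) coordinates of every latent-grid pixel.
    ys = torch.arange(h).repeat_interleave(w).float() / h    # (n_pixels,)
    xs = torch.arange(w).repeat(h).float() / w                # (n_pixels,)

    # Forbid attention from pixels outside each token's bounding box.
    mask = torch.zeros_like(scores, dtype=torch.bool)
    for t, (x0, y0, x1, y1) in token_boxes.items():
        outside = (xs < x0) | (xs >= x1) | (ys < y0) | (ys >= y1)
        mask[:, t] = outside

    scores = scores.masked_fill(mask, float("-inf"))
    attn = scores.softmax(dim=-1)
    # Guard against pixels where every allowed token happens to be masked out.
    attn = torch.nan_to_num(attn, nan=0.0)
    return attn @ v                                           # (n_pixels, d)
```

For example, passing `token_boxes={2: (0.0, 0.0, 0.5, 1.0)}` would confine the influence of text token 2 to the left half of the image, which is one simple way a user-drawn layout could be translated into attention constraints.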

Publication
IEEE/CVF Conference on Computer Vision and Pattern Recognition