Key-Locked Rank One Editing for Text-to-Image Personalization
May 02, 2023 ยท Entered Twilight ยท ๐ International Conference on Computer Graphics and Interactive Techniques
"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE, README.md, configs, environment.yaml, evaluation, img, ldm, main.py, merge_embeddings.py, models, scripts, setup.py
Authors
Yoad Tewel, Rinon Gal, Gal Chechik, Yuval Atzmon
arXiv ID
2305.01644
Category
cs.CV: Computer Vision
Cross-listed
cs.AI,
cs.GR
Citations
221
Venue
International Conference on Computer Graphics and Interactive Techniques
Repository
https://github.com/rinongal/textual_inversion
โญ 3051
Last Checked
14 days ago
Abstract
Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime-efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, allowing to portray personalized object interactions in unprecedented ways, even in one-shot settings.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted