GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
November 18, 2024 Β· Declared Dead Β· π ACM Multimedia
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng
arXiv ID
2411.11435
Category
cs.CV: Computer Vision
Citations
2
Venue
ACM Multimedia
Last Checked
3 months ago
Abstract
Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often overshadowed by broader layout generation tasks such as document or poster design. In this paper, we propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts by integrating multi-modal inputs with user-defined constraints, enabling more flexible and robust layout generation for real-world applications. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously, without compromising performance. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets. In addition to geometric annotations (\textit{e.g.}, text masks and character recognition), our datasets include detailed layout descriptions in natural language, enabling the model to reason more effectively in handling complex designs and custom user inputs. Experimental results demonstrate the effectiveness of our proposed framework and datasets, outperforming existing methods on various benchmarks that assess geometric aesthetics and human preferences.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computer Vision
π
π
Old Age
π
π
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
π
π
Old Age
SSD: Single Shot MultiBox Detector
π
π
Old Age
Squeeze-and-Excitation Networks
π
π
Old Age
Fast R-CNN
π
π
Old Age
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted