Images that Sound: Composing Images and Sounds on a Single Canvas
May 20, 2024 · Entered Twilight · Neural Information Processing Systems
"No code URL or promise found in abstract"
"Derived repo from GitHub Pages (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, .project-root, LICENSE, README.md, assets, configs, environment.yml, huggingface_login.py, src
Authors
Ziyang Chen, Daniel Geng, Andrew Owens
arXiv ID
2405.12221
Category
cs.CV: Computer Vision
Cross-listed
cs.LG, cs.MM, cs.SD, eess.AS
Citations
16
Venue
Neural Information Processing Systems
Repository
https://github.com/ificl/images-that-sound
⭐ 250
Last Checked
8 days ago
Abstract
Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these visual spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/
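The joint reverse process described above can be illustrated with a small sketch. It assumes that the two models' noise predictions on the same latent are combined by a weighted average (weight `w_image`) and that sampling uses a deterministic DDIM update with a linear beta schedule. The functions `make_alpha_bars`, `oracle_denoiser` and `ddim_joint_sample` are hypothetical names for illustration. The toy `oracle_denoiser` stands in for the pretrained text-to-image and text-to-spectrogram models. This is not the authors' released code. Because both models are queried on the same latent `z`, the sketch only makes sense when they share a latent space, as the abstract requires.

```python
import numpy as np
from typing import Callable

# A noise predictor maps (noisy latent, timestep index) -> predicted noise.
NoisePredictor = Callable[[np.ndarray, int], np.ndarray]


def make_alpha_bars(num_steps: int) -> np.ndarray:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def oracle_denoiser(target: np.ndarray, alpha_bars: np.ndarray) -> NoisePredictor:
    """Hypothetical stand-in for a pretrained model: returns the noise that
    exactly explains z_t if the clean latent were `target`."""
    def eps_fn(z: np.ndarray, t: int) -> np.ndarray:
        a_t = alpha_bars[t]
        return (z - np.sqrt(a_t) * target) / np.sqrt(1.0 - a_t)
    return eps_fn


def ddim_joint_sample(
    eps_image: NoisePredictor,
    eps_audio: NoisePredictor,
    alpha_bars: np.ndarray,
    shape: tuple,
    w_image: float = 0.5,
    seed: int = 0,
) -> np.ndarray:
    """Denoise one shared latent with two models in parallel (DDIM, eta = 0)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(alpha_bars))):
        # Query both models on the same latent, then combine (assumed weighted average).
        eps = w_image * eps_image(z, t) + (1.0 - w_image) * eps_audio(z, t)
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Estimate the clean latent implied by the combined noise prediction.
        x0_hat = (z - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Deterministic DDIM update toward the previous (less noisy) timestep.
        z = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return z


# Usage with toy oracles: the "image" model wants all ones, the "audio" model all minus ones.
alpha_bars = make_alpha_bars(50)
shape = (4, 8)
eps_img = oracle_denoiser(np.full(shape, 1.0), alpha_bars)
eps_aud = oracle_denoiser(np.full(shape, -1.0), alpha_bars)
z0 = ddim_joint_sample(eps_img, eps_aud, alpha_bars, shape, w_image=0.5)
print(np.allclose(z0, 0.0))  # True
```

With these toy oracles, the result is simply the weighted blend of the two targets. With real pretrained models, the combined prediction would instead steer every reverse step toward latents that both models find plausible. This matches the abstract's description of a sample that is likely under both models.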
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted