Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity

March 06, 2017 · Entered Twilight · 🏛 The Web Conference

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, examples, pretrained_models, requirements.txt, score_example.py, utils.py

Authors: Jack Hessel, Lillian Lee, David Mimno
arXiv ID: 1703.01725
Category: cs.SI: Social & Info Networks
Cross-listed: cs.CL, cs.CV, physics.soc-ph
Citations: 39
Venue: The Web Conference
Repository: https://github.com/jmhessel/catrank ⭐ 11
Last Checked: 13 days ago

Abstract

The content of today's social media is becoming more and more rich, increasingly mixing text, images, videos, and audio. It is an intriguing research question to model the interplay between these different modes in attracting user attention and engagement. But in order to pursue this study of multimodal content, we must also account for context: timing effects, community preferences, and social factors (e.g., which authors are already popular) also affect the amount of feedback and reaction that social-media posts receive. In this work, we separate out the influence of these non-content factors in several ways. First, we focus on ranking pairs of submissions posted to the same community in quick succession, e.g., within 30 seconds; this framing encourages models to focus on time-agnostic and community-specific content features. Within that setting, we determine the relative performance of author vs. content features. We find that victory usually belongs to "cats and captions," as visual and textual features together tend to outperform identity-based features. Moreover, our experiments show that when considered in isolation, simple unigram text features and deep neural network visual features yield the highest accuracy individually, and that the combination of the two modalities generally leads to the best accuracies overall.
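The pairing setup described in the abstract (two submissions to the same community, posted in quick succession, ranked by relative popularity) can be illustrated with a minimal sketch. This is not the authors' code and does not reflect the contents of the linked repository: the `Post` record, its field names, the `build_pairs` helper, and the use of a single integer `score` as the popularity measure are all assumptions made for illustration. The abstract also does not say whether a post may appear in more than one pair; this sketch allows it.

```python
from collections import defaultdict
from typing import NamedTuple


class Post(NamedTuple):
    # Hypothetical record layout; the field names are illustrative only.
    community: str
    timestamp: float  # posting time in seconds
    score: int        # assumed popularity measure
    title: str


def build_pairs(posts, max_gap_seconds=30.0):
    """Return (winner, loser) pairs of posts from the same community
    whose posting times differ by at most max_gap_seconds."""
    by_community = defaultdict(list)
    for post in posts:
        by_community[post.community].append(post)

    pairs = []
    for group in by_community.values():
        group.sort(key=lambda p: p.timestamp)
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                if second.timestamp - first.timestamp > max_gap_seconds:
                    break  # group is sorted, so later posts are further away
                if first.score == second.score:
                    continue  # a tie has no winner, so it yields no pair
                if first.score > second.score:
                    pairs.append((first, second))
                else:
                    pairs.append((second, first))
    return pairs
```

Because both posts in a pair share a community and a posting window, a model trained to pick the winner cannot rely on timing or community identity, which matches the motivation given in the abstract. A pairwise classifier over content features (for example, unigram counts of the two titles) could then be trained on the output of `build_pairs`.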