Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
December 31, 2018 ยท Entered Twilight ยท ๐ arXiv.org
"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, .gitmodules, Dockerfile, LICENSE, README.md, __init__.py, configs, evkit, gibson, habitat-api, habitat-sim, requirements.txt, scripts, submission.sh, tnt
Authors
Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik
arXiv ID
1812.11971
Category
cs.CV: Computer Vision
Cross-listed
cs.AI,
cs.LG,
cs.NE,
cs.RO
Citations
16
Venue
arXiv.org
Repository
https://github.com/alexsax/midlevel-reps
โญ 109
Last Checked
28 days ago
Abstract
How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills. Therefore, we refine our findings into an efficient max-coverage feature set that can be adopted in lieu of raw images. We perform our study in completely separate buildings for training and testing and compare against visually blind baseline policies and state-of-the-art feature learning methods.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted