Combining Label Propagation and Simple Models Out-performs Graph Neural Networks

October 27, 2020 · Entered Twilight · 🏛 International Conference on Learning Representations

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, diffusion_feature.py, gat, gen_models.py, logger.py, norm_spec.jl, outcome_correlation.py, run_experiments.py

Authors: Qian Huang, Horace He, Abhay Singh, Ser-Nam Lim, Austin R. Benson
arXiv ID: 2010.13993
Category: cs.LG (Machine Learning)
Cross-listed: cs.SI
Citations: 311
Venue: International Conference on Learning Representations
Repository: https://github.com/Chillee/CorrectAndSmooth ⭐ 294
Last Checked: 1 month ago

Abstract

Graph Neural Networks (GNNs) are the predominant technique for learning over graphs. However, there is relatively little understanding of why GNNs are successful in practice and whether they are necessary for good performance. Here, we show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs by combining shallow models that ignore the graph structure with two simple post-processing steps that exploit correlation in the label structure: (i) an "error correlation" that spreads residual errors in training data to correct errors in test data and (ii) a "prediction correlation" that smooths the predictions on the test data. We call this overall procedure Correct and Smooth (C&S), and the post-processing steps are implemented via simple modifications to standard label propagation techniques from early graph-based semi-supervised learning methods. Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks, with just a small fraction of the parameters and orders of magnitude faster runtime. For instance, we exceed the best known GNN performance on the OGB-Products dataset with 137 times fewer parameters and greater than 100 times less training time. The performance of our methods highlights how directly incorporating label information into the learning algorithm (as was done in traditional techniques) yields easy and substantial performance gains. We can also incorporate our techniques into big GNN models, providing modest gains. Our code for the OGB results is at https://github.com/Chillee/CorrectAndSmooth.
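The two post-processing steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the code from the authors' repository: the function names (`normalized_adjacency`, `propagate`, `correct_and_smooth`), the fixed `scale` factor, the iteration count, and the `alpha` values are all assumptions chosen for illustration, and the base predictions are simply taken as given rather than produced by a trained shallow model.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization S = D^{-1/2} A D^{-1/2} (isolated nodes stay zero)."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def propagate(S, init, alpha, num_iters=50):
    """Label-propagation iteration: Z <- (1 - alpha) * init + alpha * S @ Z."""
    Z = init.copy()
    for _ in range(num_iters):
        Z = (1 - alpha) * init + alpha * (S @ Z)
    return Z


def correct_and_smooth(adj, base_probs, labels, train_idx, num_classes,
                       alpha_correct=0.8, alpha_smooth=0.8, scale=1.0):
    """Hypothetical simplified C&S; `scale` is an assumed fixed constant."""
    S = normalized_adjacency(adj.astype(float))
    onehot = np.eye(num_classes)[labels[train_idx]]

    # Correct: residual errors are known only on training nodes; spread them.
    errors = np.zeros_like(base_probs)
    errors[train_idx] = onehot - base_probs[train_idx]
    corrected = base_probs + scale * propagate(S, errors, alpha_correct)

    # Smooth: reset training nodes to their true labels, then propagate again.
    guess = corrected.copy()
    guess[train_idx] = onehot
    return propagate(S, guess, alpha_smooth)


# Toy example: a 4-node path graph 0-1-2-3 with labels known only at the ends.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
labels = np.array([0, 0, 1, 1])
train_idx = np.array([0, 3])
base_probs = np.full((4, 2), 0.5)  # an uninformative base predictor
scores = correct_and_smooth(adj, base_probs, labels, train_idx, num_classes=2)
predictions = scores.argmax(axis=1)
```

In this toy setup the base predictor carries no information, so any separation between the two unlabeled middle nodes comes entirely from propagating the training residuals and labels along the edges.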