Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees

Abstract

Graph representation learning is a ubiquitous task in machine learning where the goal is to embed each vertex into a low-dimensional vector space. We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution: the bipartite graph is assumed to be generated by a semiparametric exponential family distribution, whose parametric component is given by the proximity of outputs of two one-layer neural networks that take high-dimensional features as inputs, while nonparametric (nuisance) component is the base measure. In this setting, the representation learning problem is equivalent to recovering the weight matrices, and the main challenges of estimation arise from the nonlinearity of activation functions and the nonparametric nuisance component of the distribution. To overcome these challenges, we propose a pseudo-likelihood objective based on the rank-order decomposition technique and show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate. Moreover, we prove that the sample complexity of the problem is linear in dimensions (up to logarithmic factors), which is consistent with parametric Gaussian models. However, our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.

Publication
International Conference on Machine Learning
Sen Na
Sen Na
Postdoc in Statistics and ICSI

Sen Na is a postdoctoral scholar in the Statistics department and ICSI at UC Berkeley. His research interests broadly lie in the mathematical foundations of data science, with topics including high-dimensional statistics, graphical models, semiparametric models, optimal control, and large-scale and stochastic nonlinear optimization. He is also interested in applying machine learning methods in biology, neuroscience, and engineering.

Yuwei Luo
Yuwei Luo
MS Student (2019-2020)

Yuwei Luo received his M.S. degree in Statistics at University of Chicago in March 2020. Prior to graduate school, he received B.S.degree in Mathematics at University of Science and Technology of China (USTC) in June 2018. His research interests include reinforcement learning, control, optimization and network analysis. Yuwei is continuing his education as a PhD student at Stanford University.

Zhuoran Yang
Zhuoran Yang
Assistant Professor of Statistics and Data Science

Zhuoran Yang is an Assistant Professor of Statistics and Data Science at Yale University. His research interests lie in the interface between machine learning, statistics and optimization. The primary goal of his research is to design efficient learning algorithms for large-scale decision making problems that arise in reinforcement learning and stochastic games, with both statistical and computational guarantees.

Zhaoran Wang
Zhaoran Wang
Assistant Professor in Industrial Engineering & Management Sciences

Zhaoran Wang is an Assistant Professor in the Departments of Industrial Engineering & Management Sciences and Computer Science at Northwestern University. His research is to develop a new generation of data-driven decision-making methods, theory, and systems, which tailor artificial intelligence towards addressing pressing societal challenges.

Mladen Kolar
Mladen Kolar
Associate Professor of Econometrics and Statistics

Mladen Kolar is an Associate Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. His research is focused on high-dimensional statistical methods, graphical models, varying-coefficient models and data mining, driven by the need to uncover interesting and scientifically meaningful structures from observational data.