Geometrical Insights for Implicit Generative Modeling

Abstract: Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences. In particular, we can establish surprising approximate global convergence guarantees for the 1-Wasserstein distance, even when the parametric generator has a nonconvex parametrization.

Léon Bottou, Martin Arjovsky, David Lopez-Paz and Maxime Oquab: Geometrical Insights for Implicit Generative Modeling, Braverman Readings in Machine Learning: Key Ideas from Inception to Current State, 229–268, Edited by Ilya Muchnik, Lev Rozonoer, Boris Mirkin, LNAI Vol. 11100, Springer, 2018.

Just before Section 6.2, the paper claims:

One particularly striking aspect of this result is that it does not depend on the parametrization of the family $F$. Whether the cost function $C(\theta) = f(G_\theta\#\mu_z)$ is convex or not is irrelevant: as long as the family $F$ and the cost function $f$ are convex with respect to a well-chosen set of curves, the level sets of the cost function $C(\theta)$ will be connected, and there will be a nonincreasing path connecting any starting point $\theta_0$ to a global optimum $\theta^*$.

This holds only when the parametrization $\theta \mapsto G_\theta\#\mu_z$ is itself continuous with respect to the distance between the induced distributions. This property is not necessarily easy to establish.
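To make the continuity requirement concrete, here is a minimal Python sketch of what it asks for: that $\theta' \to \theta$ implies $W_1(G_{\theta'}\#\mu_z, G_\theta\#\mu_z) \to 0$. It assumes one-dimensional latent samples and two hypothetical toy generators, `shift_generator` ($G_\theta(z) = \theta + z$, continuous) and `floor_generator` ($G_\theta(z) = \lfloor\theta\rfloor + z$, deliberately discontinuous). Neither comes from the paper. The distance is an empirical estimate computed from a fixed set of latent samples, using the fact that in one dimension the optimal coupling between two equally weighted samples of the same size matches them in sorted order.

```python
import math
import random


def wasserstein1_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equally weighted samples
    of the same size on the real line: the optimal coupling pairs sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def shift_generator(theta, z):
    # Hypothetical continuous parametrization: G_theta(z) = theta + z.
    return theta + z


def floor_generator(theta, z):
    # Hypothetical discontinuous parametrization: G_theta(z) = floor(theta) + z.
    return math.floor(theta) + z


def induced_distance(generator, theta_a, theta_b, zs):
    """Empirical W1 between the pushforwards of the same latent samples zs
    under G_{theta_a} and G_{theta_b}."""
    return wasserstein1_1d([generator(theta_a, z) for z in zs],
                           [generator(theta_b, z) for z in zs])


if __name__ == "__main__":
    rng = random.Random(0)
    zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    # Compare two nearby parameter values under each parametrization.
    for name, gen in [("shift", shift_generator), ("floor", floor_generator)]:
        d = induced_distance(gen, 0.999, 1.0, zs)
        print(f"{name}: W1 between theta=0.999 and theta=1.0 is {d:.4f}")
```

A parameter change of 0.001 moves the shift generator's distribution by about 0.001 in $W_1$, while the floor generator's distribution jumps by 1. For a parametrization like the second one, connectedness of level sets in the space of distributions says nothing about connectedness of the level sets of $C(\theta)$ in parameter space.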

