===== Geometrical Insights for Implicit Generative Modeling =====
//Abstract//: Learning algorithms for implicit generative models can optimize a variety of criteria
that measure how the data distribution differs from the implicit model distribution,
including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy
criterion. A careful look at the geometries induced by these distances on
the space of probability measures reveals interesting differences. In particular, we can
establish surprising approximate global convergence guarantees for the 1-Wasserstein
distance, even when the parametric generator has a nonconvex parametrization.
One particularly striking aspect of this result is that it does not depend on the parametrization of the family $F$. Whether the cost function $C(\theta) = f(G_\theta \# \mu_z)$ is convex or not is irrelevant: as long as the family $F$ and the cost function $f$ are convex with respect to a well-chosen set of curves, the level sets of the cost function $C(\theta)$ will be connected, and there will be a nonincreasing path connecting any starting point $\theta_0$ to a global optimum $\theta^*$. This is only true when the parametrization is itself continuous with respect to the distance between induced distributions. This property is not necessarily easy to establish.
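
To make the three criteria named in the abstract more concrete, here is a minimal sketch that estimates them from one-dimensional samples. It is an illustration under stated assumptions, not the paper's code: the function names, the Gaussian kernel and its ''bandwidth'', and the toy "model" (the data shifted by a constant) are all hypothetical choices. The 1-Wasserstein estimate uses the sorted-sample formula, which is valid only in one dimension with equal sample sizes; the energy distance and MMD are returned in squared form as plug-in (V-statistic) estimates.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the mean absolute gap between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def energy_distance_sq_1d(x, y):
    """Plug-in estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| (squared energy distance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d_xy = np.abs(x[:, None] - y[None, :]).mean()
    d_xx = np.abs(x[:, None] - x[None, :]).mean()
    d_yy = np.abs(y[:, None] - y[None, :]).mean()
    return float(2.0 * d_xy - d_xx - d_yy)


def mmd_sq_gaussian_1d(x, y, bandwidth=1.0):
    """Plug-in estimate of the squared MMD with a Gaussian kernel (assumed choice)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth**2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


# Toy setup (hypothetical): the "model" samples are the data shifted by 2.
rng = np.random.default_rng(0)
data = rng.normal(size=500)
model = data + 2.0

w1 = wasserstein1_1d(data, model)
e2 = energy_distance_sq_1d(data, model)
m2 = mmd_sq_gaussian_1d(data, model)
print(f"W1 = {w1:.4f}, energy^2 = {e2:.4f}, MMD^2 = {m2:.4f}")
```

For a pure shift, the sorted samples differ by exactly the shift, so the 1-Wasserstein estimate grows linearly with it. The Gaussian-kernel MMD, by contrast, saturates once the two samples are far apart relative to the bandwidth. This small difference gives a feel for why the geometries induced by these criteria are not interchangeable.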