The statistical distance between conditional distributions is an essential element of generating target data given source data, as in video prediction, where future frames are generated from observed past frames. We establish how the statistical distance between two joint distributions relates to the statistical distance between the corresponding conditional distributions for three popular families of statistical distances: f-divergences, Wasserstein distance, and integral probability metrics. Such a characterization plays a crucial role in deriving a tractable objective function for learning a conditional generator. For the Wasserstein distance, we show that the distance between joint distributions is an upper bound on the expected distance between conditional distributions, and we derive a tractable representation of this upper bound. Based on this theoretical result, we propose a new conditional generator, the conditional Wasserstein generator. Our algorithm can be viewed as an extension of Wasserstein autoencoders (Tolstikhin et al. 2018) to conditional generation, or as a Wasserstein counterpart of the stochastic video generation (SVG) model of Denton and Fergus (2018). We apply our algorithm to video prediction and video interpolation. Our experiments demonstrate that the proposed algorithm performs well on benchmark video datasets and produces sharper videos than state-of-the-art methods.
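Schematically, the Wasserstein result stated above has the following form; this is only a sketch of the inequality's shape, as the abstract does not fix notation, and the constant $K$ and the exact assumptions (e.g., on the ground metric and the marginals of the conditioning variable) are placeholders standing in for the conditions given in the paper's theorem:
\[
\mathbb{E}_{x \sim P_X}\!\left[\, W_p\!\left(P_{Y \mid X = x},\, Q_{Y \mid X = x}\right) \right] \;\le\; K \, W_p\!\left(P_{X,Y},\, Q_{X,Y}\right),
\]
where $P_{X,Y}$ and $Q_{X,Y}$ are the two joint distributions, $P_{Y \mid X}$ and $Q_{Y \mid X}$ the conditionals they induce, and $W_p$ the order-$p$ Wasserstein distance. Minimizing a tractable representation of the right-hand side then controls the expected conditional distance, which is the quantity that matters when generating $Y$ (e.g., future frames) given $X$ (e.g., past frames).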