leon.bottou.org

On the Effective VC Dimension.

Abstract: The very idea of an effective Vapnik Chervonenkis (VC) dimension (Vapnik et al, 1994) relies on the hypothesis that the relation between the generalization error and the number of training examples can be expressed by a formula algebraically similar to the VC bound. This hypothesis calls for a serious discussion since the traditional VC bound widely overestimates the generalization error. In this paper we describe an algorithm and data dependent measure of capacity. We derive a confidence interval on the difference between the training error and the generalization error. This confidence interval is much tighter than the traditional VC bound. A simple change of the formulation of the problem yields this extra accuracy: our confidence interval bounds the error difference between a training set and a test set, rather than the error difference between a training set and some hypothetical grand truth. This transductive approach allows for deriving a data and algorithm dependent confidence interval.

Léon Bottou, Corinna Cortes and Vladimir Vapnik: On the Effective VC Dimension., Neuroprose (bottou.effvc.ps.Z), 1994.

ftp://archive.cse.ohio-state.edu/pub/neuroprose/bottou-effvc.ps.Z (offline?)
http://ftp.funet.fi/pub/sci/neural/neuroprose/bottou-effvc.ps.Z (mirror)
neuroprose-1994.djvu neuroprose-1994.pdf neuroprose-1994.ps.gz

@techreport{bottou-cortes-vapnik-94,
  author = {Bottou, L\'{e}on and Cortes, Corinna and Vapnik, Vladimir},
  title = {On the Effective VC Dimension.},
  institution = {Neuroprose},
  number = {bottou-effvc.ps.Z},
  year = {1994},
  note = {Also available on http://leon.bottou.org/papers},
  url = {http://leon.bottou.org/papers/bottou-cortes-vapnik-94},
}

Notes

This paper describes very early data-dependent and algorithm-dependent generalization bounds. Obtaining such bounds in the inductive setup is technically difficult [1,2]. This early work completely avoids these difficulties using a striking shortcut. It abandons the standard learning setup for a purely transductive setup. As a result we very easily obtain better bounds without even requiring the standard independence assumption. A similar approach is advocated in [3]. See also [4].

[1] M. Anthony, J. Shawe-Taylor, Sufficient Condition for Polynomial Distribution-dependent Learnability, Discrete Applied Mathematics, 77(1), pages 1-12, (1997).
[2] J. Shawe-Taylor, P. Bartlett, R. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies, IEEE Transactions on Information Theory, 44(5), pages 1926-1940, (1998).
[3] K. V. Vorontsov, Combinatorial substantiation of learning algorithms, Computational Mathematics and Mathematical Physics, 44(11), pages 1997-2009, (2004).
[4] L. Bottou, Y. Le Cun and V. Vapnik: Predicting Learning Curves without the Ground Truth Hypothesis, Tech Report, 1999.