## On the Effective VC Dimension.

*Abstract*:
The very idea of an effective Vapnik Chervonenkis (VC) dimension
(Vapnik et al, 1994) relies on the hypothesis that the relation between
the generalization error and the number of training examples can be
expressed by a formula algebraically similar to the VC bound. This
hypothesis calls for a serious discussion since the traditional VC
bound widely overestimates the generalization error.
In this paper we describe an algorithm and data dependent measure of
capacity. We derive a confidence interval on the difference
between the training error and the generalization error. This
confidence interval is much tighter than the traditional VC bound.
A simple change of the formulation of the problem yields this extra
accuracy: our confidence interval bounds the error difference between
a training set and a test set, rather than the error difference between
a training set and some hypothetical grand truth. This
*transductive approach* allows for deriving a *data and algorithm
dependent* confidence interval.

**, Neuroprose (bottou.effvc.ps.Z), 1994.**

*On the Effective VC Dimension.*
ftp://archive.cse.ohio-state.edu/pub/neuroprose/bottou-effvc.ps.Z (offline?)

http://ftp.funet.fi/pub/sci/neural/neuroprose/bottou-effvc.ps.Z (mirror)

neuroprose-1994.djvu
neuroprose-1994.pdf
neuroprose-1994.ps.gz

@techreport{bottou-cortes-vapnik-94, author = {Bottou, L\'{e}on and Cortes, Corinna and Vapnik, Vladimir}, title = {On the Effective VC Dimension.}, institution = {Neuroprose}, number = {bottou-effvc.ps.Z}, year = {1994}, note = {Also available on http://leon.bottou.org/papers}, url = {http://leon.bottou.org/papers/bottou-cortes-vapnik-94}, }

### Notes

This paper describes very early **data-dependent** and **algorithm-dependent** generalization bounds.
Obtaining such bounds in the inductive setup is technically difficult [1,2].
This early work completely avoids these difficulties using a striking shortcut.
It abandons the standard learning setup for a **purely transductive setup**.
As a result we very easily obtain better bounds without even requiring
the standard independence assumption. A similar approach is advocated in [3].
See also [4].

**[1]**M. Anthony, J. Shawe-Taylor, Sufficient Condition for Polynomial Distribution-dependent Learnability,*Discrete Applied Mathematics*, 77(1), pages 1-12, (1997).**[2]**J. Shawe-Taylor, P. Bartlett, R. Williamson, M. Anthony, Structural risk minimization over data-dependent hierarchies,*IEEE Transactions on Information Theory*, 44(5), pages 1926-1940, (1998).**[3]**K. V. Vorontsov, Combinatorial substantiation of learning algorithms,*Computational Mathematics and Mathematical Physics*, 44(11), pages 1997-2009, (2004).**[4]**L. Bottou, Y. Le Cun and V. Vapnik:*Predicting Learning Curves without the Ground Truth Hypothesis*, Tech Report, 1999.