link.springer.com

Simplified PAC-Bayesian Margin Bounds

Abstract

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Unit-norm margin bounds have been proved previously using fat-shattering arguments and Rademacher complexity. Recently Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.

Preview

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Bartlett, P.: Personal communication (2003)

    Google Scholar 

  2. Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge (1998)

    Google Scholar 

  3. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the the size of the network. IEEE Transactions on Information Theory (March 1998)

    Google Scholar 

  4. Bartlett, P.L., Mendelson, S.: Rademacher and gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research 3, 463–482 (2002)

    Article  MathSciNet  Google Scholar 

  5. Catoni, O.: Gibbs estimators. To appear in Probability Theory and Related fields

    Google Scholar 

  6. Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear clasifiers: Why svms work. In: Advances in Neural Information Processing Systems, NIPS (2001)

    Google Scholar 

  7. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13–30 (1963)

    Article  MATH  MathSciNet  Google Scholar 

  8. Koltchinskii, V., Panchenko, D.: Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics 30 (2002)

    Google Scholar 

  9. Langford, J., Seeger, M., Megiddo, N.: An improved predictive accuracy bound for averaging classifiers. In: ICML 2001 (2001)

    Google Scholar 

  10. Langford, J., Seger, M.: Bounds for averaging classifiers. CMU Technical Report CMU-CS-01-102 (2002)

    Google Scholar 

  11. Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Neural Information Processing Systems (NIPS) (2002)

    Google Scholar 

  12. McAllester, D.: PAC-Bayesian stochastic model selection. Machine Learning 5, 5–21 (2003); A short version appeared as PAC-Bayesian Model Averaging. In: COLT 1999 (1999)

    Google Scholar 

  13. McAllester, D., Ortiz, L.: Concentration inequalities for the missing mass and for histogram rule error. In: Neural Information Processing systems (NIPS) (2002)

    Google Scholar 

  14. Ruben, H.: A new asymptotic expansion for the normal probability integral and mill’s ratio. Journal of the Royal Statistical Society. Series B (Methodological) 24(1), 177–179 (1962)

    MATH  MathSciNet  Google Scholar 

  15. Schapire, R., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Machine Learning: Proceedings of the 14th International Conference (1997)

    Google Scholar 

  16. Seeger, M.: PAC-Bayesian generalization bounds for gaussian processes. Journal of Machine Learning Research 3, 233–269 (2002)

    Article  MathSciNet  Google Scholar 

  17. Shawe-Taylor, J., Bartlett, P., Williamson, R., Anthony, M.: A Framework for structural risk minimization. IEEE Transactions on Information Theory 44(5), 1926–1940 (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations

  1. Toyota Technological Institute at Chicago,  

    David McAllester

Authors

  1. David McAllester

    You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

  1. MPI for Biological Cybernetics, Spemannstr. 38, 72076, Tübingen, Germany

    Bernhard Schölkopf

  2. University of California, Santa Cruz

    Manfred K. Warmuth

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McAllester, D. (2003). Simplified PAC-Bayesian Margin Bounds. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive

Publish with us