Isaac Scientific Publishing
Journal of Advances in Applied Mathematics
JAAM > Volume 2, Number 3, July 2017
Special Issue on Mathematical Optimization

On Sparsity of Soft Margin Support Vector Machines

Download PDF  (457.4 KB)PP. 109-114,  Pub. Date:June 22, 2017
DOI: 10.22606/jaam.2017.23001

Jochen Merker
Faculty of Computer Science, Mathematics and Natural Sciences, University of Applied Sciences Leipzig, Germany
In supervised machine learning, a support vector machine (SVM) constructs from binary classified training data a linear classifier by solving a linearly constrained convex optimization problem. Depending on the number N of training data and the dimension D of the feature space, it either is advantageous to solve the primal problem or the dual problem. In this article, the case D >> N is discussed where D is so large that even a calculation of the dot product of fully occupied vectors in dimension D is too slow for the desired (e.g. real-time) application. Then a way to speed up the classification is to use an SVM which constructs a sparse linear classifier by solving an optimization problem involving the 1-norm, i.e. many components of the classifying vector are zero so that much less than D multiplications are neccessary to calculate the dot product. For a soft-margin SVM, in this article a theorem on the number of non-zero components is shown.
Support vector machines, SVM, sparse classifier, L1 regularization, LASSO, big data, machine learning.
  • [1]  V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.
  • [2]  P. Foremski, “On different ways to classify internet traffic: a short review of selected publications,” Theor. Appl. Inf., vol. 25, pp. 119–136, 2013.
  • [3]  H. Kim, K. Clay, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified: Myths, caveats, and the best practices,” in Proceedings of the 2008 ACM CoNEXT conference. ACM, 2008, pp. 11–14.
  • [4]  P. Velan, M. Cermák, P. Celeda, and M. Dra?ar, “A survey of methods for encrypted traffic classification and analysis,” Int. J. Network Mgmt., pp. 1–24, 2014.
  • [5]  A. Khakpour and A. Liu, “An information-theoretical approach to high-speed flow nature identification,” IEEE/ACM Transactions on Networking, vol. 21, pp. 1076–1089, 2013.
  • [6]  B. Sch?lkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
  • [7]  J. Fehr, K. Arreola, and H. Burkhardt, “Fast support vector machine classification of very large datasets,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006.
  • [8]  E. Carrizosa, “Support vector machines and distance minimization,” in Data mining and mathematical programming / Panos M. Pardalos, Pierre Hansen, editors. CRM proceedings & lecture notes, ISSN 1065-8580; Volume 45. AMS, 2008, pp. 1–13.
  • [9]  T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2009, 2013.
  • [10]  S. Rosset and J. Zhu, “Piecewise linear regularized solution paths,” The Annals of Statistics, vol. 35, no. 3, pp. 1012–1030, 2007.
  • [11]  J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, “1-norm support vector machines,” in Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference, 2004, pp. 49–63.
  • [12]  V. Gómez-Verdejo, M. M. Ramón, J. Arenas-García, and H. Molina-Bulla, “Support vector machines with constraints for sparsity in the primal parameters,” IEEE Transactions on neural networks, vol. 22, no. 8, pp. 1269–1283, 2011.
Copyright © 2017 Isaac Scientific Publishing Co. All rights reserved.