Journal of Advances in Applied Mathematics

JAAM
>
Volume 2, Number 3, July 2017

Special Issue on
Mathematical Optimization

On Sparsity of Soft Margin Support Vector Machines
DOI:

10.22606/jaam.2017.23001
**Author(s)**
Jochen Merker

**Affiliation(s)**
Faculty of Computer Science, Mathematics and Natural Sciences, University of Applied Sciences Leipzig, Germany

**Abstract**
In supervised machine learning, a support vector machine (SVM) constructs from binary
classified training data a linear classifier by solving a linearly constrained convex optimization
problem. Depending on the number

*N *of training data and the dimension

*D* of the feature space,
it either is advantageous to solve the primal problem or the dual problem. In this article, the
case

*D *>>

* N* is discussed where

*D* is so large that even a calculation of the dot product of fully
occupied vectors in dimension

*D* is too slow for the desired (e.g. real-time) application. Then a way
to speed up the classification is to use an SVM which constructs a sparse linear classifier by solving
an optimization problem involving the 1-norm, i.e. many components of the classifying vector are
zero so that much less than

*D* multiplications are neccessary to calculate the dot product. For a
soft-margin SVM, in this article a theorem on the number of non-zero components is shown.

**Keywords**
Support vector machines, SVM, sparse classifier, L1 regularization, LASSO, big data,
machine learning.

**References**
- [1] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.
- [2] P. Foremski, “On different ways to classify internet traffic: a short review of selected publications,” Theor.
Appl. Inf., vol. 25, pp. 119–136, 2013.
- [3] H. Kim, K. Clay, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified:
Myths, caveats, and the best practices,” in Proceedings of the 2008 ACM CoNEXT conference. ACM, 2008,
pp. 11–14.
- [4] P. Velan, M. Cermák, P. Celeda, and M. Dra?ar, “A survey of methods for encrypted traffic classification and
analysis,” Int. J. Network Mgmt., pp. 1–24, 2014.
- [5] A. Khakpour and A. Liu, “An information-theoretical approach to high-speed flow nature identification,”
IEEE/ACM Transactions on Networking, vol. 21, pp. 1076–1089, 2013.
- [6] B. Sch?lkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization,
and Beyond. MIT Press, 2001.
- [7] J. Fehr, K. Arreola, and H. Burkhardt, “Fast support vector machine classification of very large datasets,” in
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006.
- [8] E. Carrizosa, “Support vector machines and distance minimization,” in Data mining and mathematical
programming / Panos M. Pardalos, Pierre Hansen, editors. CRM proceedings & lecture notes, ISSN 1065-8580;
Volume 45. AMS, 2008, pp. 1–13.
- [9] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2009, 2013.
- [10] S. Rosset and J. Zhu, “Piecewise linear regularized solution paths,” The Annals of Statistics, vol. 35, no. 3,
pp. 1012–1030, 2007.
- [11] J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, “1-norm support vector machines,” in Advances in Neural
Information Processing Systems 16: Proceedings of the 2003 Conference, 2004, pp. 49–63.
- [12] V. Gómez-Verdejo, M. M. Ramón, J. Arenas-García, and H. Molina-Bulla, “Support vector machines with
constraints for sparsity in the primal parameters,” IEEE Transactions on neural networks, vol. 22, no. 8, pp.
1269–1283, 2011.