I-BiDaaS project’s ADMM machine learning algorithm implementations

In the context of the H2020 project I-BiDaaS [1], the University of Novi Sad, Faculty of Sciences, and the Barcelona Supercomputing Center have developed a pool of open-source implementations of Machine Learning (ML) algorithms. The implementations are based on the Alternating Direction Method of Multipliers (ADMM) [2]; CVXPY [3], a Python-based toolbox for convex optimization; and COMPSs [4], a programming model for distributed infrastructures that allows parallel algorithm implementations to be created from sequential code. This work has been promoted as an excellent innovation by the EU Innovation Radar [5], and it has been published on the Big Data Value Association (BDVA) Innovation Marketplace [6]. A number of related implementations are available in the I-BiDaaS knowledge repository [7], and the ADMM implementation of the Least Absolute Shrinkage and Selection Operator (LASSO) has been included in dislib [8], an open-source COMPSs-based library for ML.

The solution can be used to train various ML models, including regression, classification, and clustering models, on a computer cluster where COMPSs is installed. Users can run a predefined ML model, or they can encode a new model by setting a different objective function for training. In this sense, the available implementation can be seen as a code template from which a new model can be obtained with minor programming effort (see Figure 1 for an illustration of the concept). The current implementation assumes that the input data are structured, organized into numerical-valued matrices, and split into multiple files, one per machine in the cluster. Parallel, scalable execution over the cluster is achieved through the inherent parallelism of ADMM and the underlying COMPSs runtime system.
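The parallel structure mentioned above can be made concrete with the global consensus form of ADMM described in [2]. The following is only a sketch under stated assumptions: the data are split into N blocks, one per machine; f_i is the training loss on block i; g is a regularizer on the shared model z; and u_i is the scaled dual variable with penalty parameter ρ. The notation follows [2] and is not taken from the I-BiDaaS code.

```latex
\begin{aligned}
&\text{minimize } \sum_{i=1}^{N} f_i(x_i) + g(z)
  \quad \text{subject to } x_i = z,\ i = 1,\dots,N \\[4pt]
x_i^{k+1} &= \arg\min_{x_i} \Big( f_i(x_i)
  + \tfrac{\rho}{2}\,\bigl\| x_i - z^{k} + u_i^{k} \bigr\|_2^2 \Big) \\
z^{k+1} &= \arg\min_{z} \Big( g(z)
  + \tfrac{N\rho}{2}\,\bigl\| z - \bar{x}^{k+1} - \bar{u}^{k} \bigr\|_2^2 \Big) \\
u_i^{k+1} &= u_i^{k} + x_i^{k+1} - z^{k+1}
\end{aligned}
```

Here \bar{x} and \bar{u} denote averages over the N blocks. Each x_i-update touches only the data held on machine i, so the N updates can run in parallel; only the z-update needs to gather results from all machines. Changing the model amounts to changing f_i (and possibly g).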

The developed code is highly reusable, since new models can be supported with small, simple code changes. Furthermore, to the best of our knowledge, it is the first implementation of an ADMM method in COMPSs, and it is a novel addition to dislib. A technological novelty also lies in the integration of COMPSs and CVXPY through the ADMM framework to develop efficient methods for ML training. In this way, we exploit for the first time both the parallel execution provided by COMPSs and the efficient solution of convex optimization problems provided by CVXPY.

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780787.

Fig. 1. Illustration of the ADMM ML algorithms approach through an example involving support vector machine and logistic regression models.

[1] https://ibidaas.eu/

[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”, Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[3] S. Diamond and S. Boyd, “CVXPY: A Python-Embedded Modeling Language for Convex Optimization”, Journal of Machine Learning Research, 17(83):1–5, 2016.

[4] F. Lordan, E. Tejedor, J. Ejarque, R. Rafanell, J. Álvarez, F. Marozzo, D. Lezzi, R. Sirvent, D. Talia, and R. M. Badia, “ServiceSs: an interoperable programming framework for the Cloud”, Journal of Grid Computing, 12(1):67–91, March 2014. DOI: 10.1007/s10723-013-9272-5

[5] https://www.innoradar.eu/innovation/35298

[6] https://marketplace.big-data-value.eu/content/admm-machine-learning-algorithms

[7] https://github.com/ibidaas/knowledge_repository/tree/master/tools_technologies/sources/batch_processing/unspmf

[8] J. Álvarez Cid-Fuentes, S. Solà, P. Álvarez, A. Castro-Ginard, and R. M. Badia, “dislib: Large Scale High Performance Machine Learning in Python”, in Proceedings of the 15th International Conference on eScience, 2019, pp. 96-105.