This project aims to implement the Support Vector Machine (SVM) on a Zynq 7000 or Zynq-MPSoC board. The Xilinx Vitis unified software platform is used for developing the SVM application on the Zybo-Z7-20, a Zynq 7000 embedded system from Digilent.

If you are interested in using Vitis to accelerate applications on Zynq, please refer to here or here.

This project description consists of four sections:

  1. Introduction
  2. Software code
  3. Hardware code
  4. Vitis


Introduction

We can classify objects if we know some of their features.

Let’s consider two groups of flowers: roses and sunflowers. If we know features such as the flower diameter and the stem or petal length, we can decide whether a given flower is a rose or a sunflower.

As another example, consider dogs and cats. If we know features such as the ear length and the nose diameter or snout geometry, we can classify a given animal as a dog or a cat.

Let’s consider two groups of objects that can be classified using two features, x1 and x2.

This table shows the corresponding data for four objects. We have two classes: red, denoted by -1, and blue, denoted by +1.

We can plot these objects in a coordinate system with x1 and x2 as the axes. Now we use a model to classify these objects into two groups. In this simple example, a straight line can separate them.

If we can find the function defining this line, we can use it to classify a new object as red or blue: if the value of the function for the new object is negative, the object is red; otherwise, it is blue. A linear equation with three coefficients, f(x) = w1*x1 + w2*x2 + b, can represent this line. We can use the vector W = (w1, w2) and the constant b to represent the coefficients, so that f(x) = W·x + b.
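The decision rule above can be sketched in a few lines of C++. This is an illustration only: the function names are hypothetical, and the sketch simply evaluates f(x) = W·x + b and returns -1 (red) for a negative value and +1 (blue) otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// f(x) = W.x + b; with two features this is w1*x1 + w2*x2 + b
double decision_value(const std::vector<double>& w, double b,
                      const std::vector<double>& x) {
    assert(w.size() == x.size());  // one coefficient per feature
    double f = b;
    for (std::size_t k = 0; k < w.size(); ++k) f += w[k] * x[k];
    return f;
}

// Negative f(x) -> red (class -1); otherwise blue (class +1)
int classify(const std::vector<double>& w, double b,
             const std::vector<double>& x) {
    return decision_value(w, b, x) < 0.0 ? -1 : +1;
}
```

For example, with made-up coefficients W = (1, 1) and b = -5 (the line x1 + x2 = 5), the object (4, 3) gives f = 2 and is classified as blue, while (1, 2) gives f = -2 and is classified as red.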

We can use our data set to find the coefficients of the model.

This process is called model training (or learning), and the data used for this purpose is called training data. If the training data contains both the features and the corresponding classes, the learning is supervised.

Classifying a new object with the model is a decision-making process called inference. Here, our line model classifies the new object as blue. In our example, there are two groups of objects and a line can separate them. However, many different lines can do this. For a new object, some of these lines classify it as red and others classify it as blue. Which classification line (or model) is better?

The goal of the support vector machine (SVM) is to define the best classifier and to find it.

SVMs are among the most popular machine learning tools for classification and regression analysis. They are supervised learning models with associated learning algorithms.

This figure shows three different classifiers for our example. The SVM uses the concept of margin to find the best classifier. The margin is the minimum distance from the classifier to the members of the classes. The SVM considers the best classifier to be the one with the largest margin, and it provides a mathematical approach to find the model coefficients that maximise the margin on the training data. If you are interested, please refer to the original libSVM paper.
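To make the margin concrete, the sketch below measures it for a given line: the distance from a point x to the line W·x + b = 0 is |W·x + b| / ||W||, and the margin is the smallest such distance over the training objects. The function names are hypothetical, and the sketch assumes the line already separates the two classes; it only compares candidate classifiers and does not search for the maximum-margin one, which is the optimisation problem the SVM solves.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Geometric distance from point x to the line (hyperplane) W.x + b = 0
double distance_to_hyperplane(const std::vector<double>& w, double b,
                              const std::vector<double>& x) {
    assert(w.size() == x.size());
    double f = b, norm2 = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k) {
        f += w[k] * x[k];
        norm2 += w[k] * w[k];
    }
    return std::fabs(f) / std::sqrt(norm2);
}

// Margin = minimum distance from the classifier to the training objects
double margin(const std::vector<double>& w, double b,
              const std::vector<std::vector<double>>& points) {
    double m = std::numeric_limits<double>::infinity();
    for (const auto& p : points) m = std::min(m, distance_to_hyperplane(w, b, p));
    return m;
}
```

For two made-up objects at (0, 0) and (4, 4), the line x1 + x2 = 4 has margin 4/√2 ≈ 2.83, while the line x1 = 1 has margin 1, so the SVM would prefer the first one.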

Figure: C-Support Vector Classification
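For reference, the C-Support Vector Classification (C-SVC) problem as stated in the libSVM paper is restated below. The symbols follow that paper: l training objects x_i with classes y_i in {+1, -1}, a penalty parameter C > 0, slack variables ξ_i, a feature map φ, and e, the vector of all ones; w plays the role of the W vector above. libSVM solves the second (dual) problem, and its matrix Q is what the get_Q function discussed in the next section returns, one row at a time. For the linear kernel, φ(x) = x and K is simply the dot product.

```latex
\begin{aligned}
&\min_{w,\,b,\,\xi}\ \ \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i \\
&\text{subject to}\ \ y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,l
\end{aligned}

\begin{aligned}
&\min_{\alpha}\ \ \frac{1}{2}\alpha^{T} Q \alpha - e^{T}\alpha \\
&\text{subject to}\ \ y^{T}\alpha = 0,\quad 0 \le \alpha_i \le C,\quad i = 1,\dots,l \\
&Q_{ij} = y_i\, y_j\, K(x_i, x_j),\qquad K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j)
\end{aligned}
```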


Software code

libSVM is one of the most successful open-source SVM implementations. You can download this code from here. It consists of two main software tools: one for training and one for inference. After compiling the code, svm-train performs the training and svm-predict performs the inference. In this project, we implement the training part on a Zynq-based embedded system.

A set of training and test data for this software can be found on the libSVM website here.

This is the content of a training data file. Each line consists of an object’s data: first its class and then the list of its features.

Here the class is 1 or -1 as we have only two classes of objects.

The features use a sparse representation: each element is an index:value pair that contains the column (feature) index and its value, so zero-valued features can be omitted.
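A single line of this format can be parsed as sketched below. This is a simplified illustration, not libSVM's own read_problem() (it has no error handling beyond an assert); Node, Sample and parse_line are hypothetical names. Node has the same layout as libSVM's svm_node (an int index and a double value), and, as in libSVM, each row ends with a node whose index is -1.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Same layout as libSVM's svm_node: index == -1 marks the end of a row
struct Node {
    int index;
    double value;
};

struct Sample {
    double label;             // class, e.g. +1 or -1
    std::vector<Node> nodes;  // sparse features, terminated by {-1, 0}
};

// Parse one line of the form "<label> <index>:<value> <index>:<value> ..."
Sample parse_line(const std::string& line) {
    std::istringstream in(line);
    Sample s{};
    in >> s.label;
    std::string token;
    while (in >> token) {
        std::size_t colon = token.find(':');
        assert(colon != std::string::npos);  // every feature is index:value
        s.nodes.push_back({std::stoi(token.substr(0, colon)),
                           std::stod(token.substr(colon + 1))});
    }
    s.nodes.push_back({-1, 0.0});  // end-of-row marker
    return s;
}
```

For example, parse_line("+1 1:0.5 3:-1 10:2") yields the label 1 and the nodes (1, 0.5), (3, -1), (10, 2), (-1, 0); features 2 and 4 to 9 are zero and are simply absent.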

So we have a vector y representing the classes and a sparse matrix x representing the object features, with one row per object.
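To hand the sparse matrix x to hardware, its rows can be packed into compressed sparse row (CSR) arrays, which is what the packing loop in the host code of step 7 does with ptr_values, ptr_col_indices and ptr_row_indices. The sketch below shows the same packing with standard containers; Node, Csr and to_csr are hypothetical names, and Node again mirrors libSVM's svm_node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int index; double value; };  // like libSVM's svm_node

// Compressed sparse row (CSR) form of the feature matrix x
struct Csr {
    std::vector<float> values;              // non-zero values, row by row
    std::vector<unsigned int> col_indices;  // feature index of each value
    std::vector<unsigned int> row_indices;  // row i occupies [row_indices[i], row_indices[i+1])
};

// rows[i] points to the -1-terminated feature list of object i
Csr to_csr(const std::vector<const Node*>& rows) {
    Csr m;
    m.row_indices.push_back(0);
    for (const Node* px : rows) {
        for (; px->index != -1; ++px) {
            assert(px->index >= 0);
            m.values.push_back(static_cast<float>(px->value));
            m.col_indices.push_back(static_cast<unsigned int>(px->index));
        }
        m.row_indices.push_back(static_cast<unsigned int>(m.values.size()));
    }
    return m;
}
```

An object without any non-zero feature produces two equal consecutive entries in row_indices, so an SpMV implementation has to handle empty rows.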

Looking at the libSVM code, the get_Q function of the SVC_Q class is one of the compute-intensive parts that can be run on an FPGA.

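The code calls the dot function several times to compute the dot product of two sparse vectors, and it also calls the SVM kernel function. For the linear kernel, computing one row of Q amounts to multiplying the sparse feature matrix x by a dense copy of one object's feature vector, so get_Q effectively performs a sparse matrix-vector multiplication [1].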
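A software sketch of what get_Q computes for one row, assuming the linear kernel (libSVM also supports polynomial, RBF and sigmoid kernels, which this sketch does not cover). sparse_dot follows the same merge idea as libSVM's dot function; Node, sparse_dot and q_row_linear are hypothetical names, and the real code additionally caches rows and stores them as Qfloat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int index; double value; };  // like libSVM's svm_node

// Dot product of two sparse vectors sorted by index and ending with index -1
double sparse_dot(const Node* px, const Node* py) {
    double sum = 0.0;
    while (px->index != -1 && py->index != -1) {
        if (px->index == py->index) {
            sum += px->value * py->value;
            ++px;
            ++py;
        } else if (px->index > py->index) {
            ++py;
        } else {
            ++px;
        }
    }
    return sum;
}

// Row i of Q for the linear kernel: Q[i][j] = y[i] * y[j] * (x_i . x_j)
std::vector<double> q_row_linear(const std::vector<const Node*>& x,
                                 const std::vector<double>& y, std::size_t i) {
    assert(x.size() == y.size() && i < x.size());
    std::vector<double> row(x.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        row[j] = y[i] * y[j] * sparse_dot(x[i], x[j]);
    return row;
}
```

For a fixed i, the values sparse_dot(x[i], x[j]) over all j are exactly the entries of the product of the sparse matrix x with the dense copy of x_i, which is why this loop can be offloaded as one SpMV call.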

Hardware code

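So to map the get_Q function onto the FPGA in the Zynq, we need a sparse matrix-vector multiplication (SpMV) implementation suitable for hardware. The following code shows an SpMV kernel for the FPGA. It takes the matrix in compressed sparse row (CSR) format (values, col_indices, row_indices) and computes y = A·x. To understand this code, you can refer to here or here.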

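#define DIM 60098
extern "C" {
void spmv_kernel(
		float          *values,
		unsigned int   *col_indices,
		unsigned int   *row_indices,
		float          *x,
		float          *y,
		unsigned int    n,    // number of rows (n <= DIM)
		unsigned int    m) {  // number of columns (m <= DIM)
#pragma HLS INTERFACE m_axi bundle=gmem_0 port=values
#pragma HLS INTERFACE m_axi bundle=gmem_1 port=col_indices
#pragma HLS INTERFACE m_axi bundle=gmem_0 port=x
#pragma HLS INTERFACE m_axi bundle=gmem_0 port=y
#pragma HLS INTERFACE m_axi bundle=gmem_0 port=row_indices
	float x_local[DIM];
	float y_local[DIM];
	unsigned int row_indices_diff_local[DIM];  // number of non-zeros in each row
	unsigned int nnz = 0;

	// Copy the dense input vector into on-chip memory
	for (unsigned int i = 0; i < m; i++) {
		x_local[i] = x[i];
	}

	// Convert the CSR row pointers into per-row non-zero counts
	unsigned int previous_row_index = 0;
	for (unsigned int i = 0; i < n+1; i++) {
		unsigned int row_index = row_indices[i];
		if (i > 0) {
			row_indices_diff_local[i-1] = row_index - previous_row_index;
			nnz += row_index - previous_row_index;
		}
		previous_row_index = row_index;
	}

	// Rows without non-zeros produce 0
	for (unsigned int i = 0; i < n; i++) {
		y_local[i] = 0.0f;
	}

	// Walk all non-zeros in a single loop; row j is finished when its
	// count of remaining non-zeros reaches zero
	double y_previous_break = 0.0;
	double y_all_row = 0.0;
	unsigned int j = 0;                                    // current row
	while (j < n && row_indices_diff_local[j] == 0) j++;   // skip empty rows
	unsigned int remained_row_index = (j < n) ? row_indices_diff_local[j] : 0;
	for (unsigned int i = 0; i < nnz; ++i) {
		unsigned int k = col_indices[i];
		float y_t = values[i] * x_local[k];
		y_all_row += y_t;
		remained_row_index--;
		if (remained_row_index == 0) {
			y_local[j] = y_all_row - y_previous_break;
			y_previous_break = y_all_row;
			j++;
			while (j < n && row_indices_diff_local[j] == 0) j++;  // skip empty rows
			if (j < n) remained_row_index = row_indices_diff_local[j];
		}
	}

	// Write the result back to global memory
	for (unsigned int i = 0; i < n; i++) {
		y[i] = y_local[i];
	}
}
}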
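Before synthesis, the row-tracking logic of the single flattened loop can be checked on a CPU. The sketch below is a plain C++ model of that loop (spmv_flattened) next to the textbook nested-loop CSR SpMV (spmv_reference); both names are hypothetical, and such functions could serve as a golden model in an HLS C-simulation testbench. One deliberate simplification: the model resets its accumulator at the end of each row instead of subtracting running sums as the kernel does, so the kernel output should be compared with a small floating-point tolerance rather than for exact equality.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook CSR SpMV: y[r] = sum of values[i] * x[col_indices[i]] over row r
std::vector<float> spmv_reference(const std::vector<float>& values,
                                  const std::vector<unsigned int>& col_indices,
                                  const std::vector<unsigned int>& row_indices,
                                  const std::vector<float>& x) {
    assert(!row_indices.empty());
    std::size_t n = row_indices.size() - 1;
    std::vector<float> y(n, 0.0f);
    for (std::size_t r = 0; r < n; ++r) {
        double sum = 0.0;
        for (unsigned int i = row_indices[r]; i < row_indices[r + 1]; ++i)
            sum += values[i] * x[col_indices[i]];
        y[r] = static_cast<float>(sum);
    }
    return y;
}

// Single-loop form: visit every non-zero once and close a row when its
// counter of remaining non-zeros reaches zero; empty rows stay 0
std::vector<float> spmv_flattened(const std::vector<float>& values,
                                  const std::vector<unsigned int>& col_indices,
                                  const std::vector<unsigned int>& row_indices,
                                  const std::vector<float>& x) {
    assert(!row_indices.empty());
    std::size_t n = row_indices.size() - 1;
    std::vector<float> y(n, 0.0f);
    auto row_len = [&](std::size_t r) { return row_indices[r + 1] - row_indices[r]; };
    std::size_t row = 0;
    while (row < n && row_len(row) == 0) ++row;  // skip leading empty rows
    unsigned int remaining = row < n ? row_len(row) : 0;
    double acc = 0.0;
    for (unsigned int i = 0; i < row_indices[n]; ++i) {
        acc += values[i] * x[col_indices[i]];
        if (--remaining == 0) {  // last non-zero of this row
            y[row] = static_cast<float>(acc);
            acc = 0.0;
            ++row;
            while (row < n && row_len(row) == 0) ++row;  // skip empty rows
            if (row < n) remaining = row_len(row);
        }
    }
    return y;
}
```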


Vitis

These are the steps to implement libSVM in Vitis.

1- Create a Vitis project and select the Zybo-Z7-20 as the platform.

2- Vitis creates a top-level system project containing three sub-projects: the host, kernel and linker projects.

3- Add three files from libSVM to the host project.

4- Create a new source file under the kernel project and insert the SpMV code.

5- Include the OpenCL header files in the svm.h file of the host project.

6- Then we modify a few functions in the svm-train.cpp file: main(), parse_command_line() and read_problem().

You can follow these changes in the video attached to this tutorial.

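7- Then we change the get_Q() function in svm.cpp so that it calls the SpMV hardware inside the FPGA. In the code below, get_Q() calls a new member function, Q_HARDWARE(), which packs the training data into CSR format, runs the kernel and builds the missing part of row i of the Q matrix.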

	// Member of the SVC_Q class. buffer_* (OpenCL buffers), ptr_* (their host
	// pointers), kernel_spmv, queue, x_hardware, row_size and col_size are
	// members set up by the modified svm-train.cpp code.
	int Q_HARDWARE(Qfloat* data, int i_vector, int start, int len) const {
		// Clear the dense input vector
		for (int i = 0; i < col_size; i++) {
			ptr_x[i] = 0;
		}
		// Pack the sparse training matrix into CSR format
		// (the matrix does not change between calls, so this could be done once)
		int j = 0;
		for (int i = 0; i < row_size; i++) {
			const svm_node* px = x_hardware[i];
			ptr_row_indices[i] = j;
			while (px->index != -1) {
				ptr_values[j] = px->value;
				ptr_col_indices[j] = px->index;
				++px;
				++j;
			}
		}
		ptr_row_indices[row_size] = j;
		// Scatter object i_vector into the dense input vector
		for (int i = ptr_row_indices[i_vector]; i < ptr_row_indices[i_vector+1]; i++) {
			ptr_x[ptr_col_indices[i]] = ptr_values[i];
		}
		cl_int err;
		// Set the kernel arguments in the order of the spmv_kernel signature
		// (start and len are used only on the host)
		int narg = 0;
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, buffer_values));
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, buffer_col_indices));
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, buffer_row_indices));
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, buffer_x));
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, buffer_y));
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, row_size));  // n
		OCL_CHECK(err, err = kernel_spmv->setArg(narg++, col_size));  // m
		OCL_CHECK(err, err = queue->enqueueMigrateMemObjects({buffer_values, buffer_col_indices, buffer_row_indices, buffer_x}, 0 /* 0 means from host */));
		OCL_CHECK(err, err = queue->enqueueTask(*kernel_spmv));
		OCL_CHECK(err, err = queue->enqueueMigrateMemObjects({buffer_y}, CL_MIGRATE_MEM_OBJECT_HOST));
		OCL_CHECK(err, err = queue->finish());  // wait until ptr_y holds the result
		// Q[i][j] = y[i] * y[j] * K(x_i, x_j), with ptr_y[j] = x_j . x_i (linear kernel)
		for (int j = start; j < len; j++) {
			data[j] = (Qfloat)(y[i_vector] * y[j] * ptr_y[j]);
		}
		return 0;
	}

	Qfloat *get_Q(int i, int len) const
	{
		Qfloat *data;
		int start;
		// Only the part of row i that is not already cached is computed
		if ((start = cache->get_data(i, &data, len)) < len)
			Q_HARDWARE(data, i, start, len);
		return data;
	}

8- Now we can compile the code and run it on the emulators or on the actual FPGA.
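The OpenCL buffers used by Q_HARDWARE must be sized from the training data: values and col_indices need one entry per non-zero feature, row_indices needs row_size + 1 entries, x needs col_size entries and y needs row_size entries. The kernel also requires row_size and col_size to be at most DIM. The sketch below is a hypothetical helper, not taken from the project, that computes these sizes from libSVM-style rows. It assumes, as the Q_HARDWARE code does, that feature indices are used directly as column indices, so col_size is the largest index plus one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int index; double value; };  // like libSVM's svm_node

// Sizes needed to allocate the CSR and vector buffers for the SpMV kernel
struct CsrSizes {
    std::size_t row_size;  // number of training objects (kernel argument n)
    std::size_t col_size;  // largest feature index + 1 (kernel argument m)
    std::size_t nnz;       // number of non-zero features
};

// x[i] points to the -1-terminated feature list of object i
CsrSizes csr_sizes(const std::vector<const Node*>& x) {
    CsrSizes s{x.size(), 0, 0};
    for (const Node* px : x) {
        for (; px->index != -1; ++px) {
            assert(px->index >= 0);
            ++s.nnz;
            std::size_t needed = static_cast<std::size_t>(px->index) + 1;
            if (needed > s.col_size) s.col_size = needed;
        }
    }
    return s;
}
```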

[1] M. Hosseinabady and J. L. Nunez-Yanez, “A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 6, pp. 1272-1285, June 2020, doi: 10.1109/TCAD.2019.2912923.