Reducing II in HLS: Balanced-Paths

The goal of this blog is to show the impact of unbalanced conditional paths in high-level synthesis (HLS). For this purpose, I use a synthetic example.

Let’s consider the following code, in which an if statement has no else part. Let’s call that an unbalanced-if.

#define N 2014
int unbalanced_if(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
#pragma HLS PIPELINE
        c[i] += a[i]*b[i];
        if (i == 0 ) {
            c[i] += a[i];
        }
    }
    return 0;
}

If we synthesise this code with Vivado-HLS, we get the following report. The loop initiation interval (II) is 14, and the loop iteration latency is 17. The whole design takes 28200 clock cycles to finish.

As shown in the following performance analysis diagram, the main reason for the high II is that the c array is modified twice in at least one loop iteration.
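To make the two modifications easier to see, here is a minimal sketch that writes out every access to c explicitly. It is my own expansion of the code above, not something the tool emits, and the name unbalanced_if_expanded is made up for this illustration. The point is only that the second read of c[i] has to wait for the first write to the same element.

```c
#define N 2014

/* Same arithmetic as unbalanced_if, with every access to c written out.
 * Illustration only: this is not the schedule the tool produces. */
int unbalanced_if_expanded(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
        float c_first = c[i];             /* read #1 of c[i]  */
        c[i] = c_first + a[i] * b[i];     /* write #1 of c[i] */
        if (i == 0) {
            float c_second = c[i];        /* read #2: needs the value from write #1 */
            c[i] = c_second + a[i];       /* write #2 of c[i] */
        }
    }
    return 0;
}
```

Even though the second read-modify-write only happens for i == 0, it is part of the loop body, so the pipeline has to leave room for it in every iteration.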

Now let’s convert the unbalanced-if into a balanced-if, that is, a condition with both an if part and an else part, and move the computation that was outside the condition into both branches, as shown in the following code.

#define N 2014
int balanced_if(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
#pragma HLS PIPELINE

        if (i == 0 ) {
            c[i] += a[i] + a[i]*b[i];
        } else {
            c[i] += a[i]*b[i];
        }
    }
    return 0;
}

After synthesising the code with Vivado-HLS, the II is 7, which is half of the II in the previous case. The whole design takes 14106 clock cycles to finish. Therefore, the design is about twice as fast as the one with the unbalanced-if.
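The reported cycle counts can be sanity-checked with the usual estimate for a pipelined loop: II * (trip count - 1) + iteration latency. The sketch below is a hedged back-of-the-envelope helper, not part of the HLS flow. The function names are my own, and the estimate ignores any function-level overhead the tool adds.

```c
#include <assert.h>

/* Estimated cycles for a pipelined loop: the first iteration needs the full
 * iteration latency, and each later iteration starts II cycles after the
 * previous one. Function-level overhead is not included. */
unsigned long pipelined_loop_cycles(unsigned long trip_count,
                                    unsigned long ii,
                                    unsigned long iteration_latency) {
    assert(trip_count > 0);
    return ii * (trip_count - 1) + iteration_latency;
}

/* Cycles left over in a reported total once the II * (trip_count - 1) term
 * is subtracted, i.e. iteration latency plus any overhead. */
unsigned long cycles_beyond_ii_term(unsigned long reported_total,
                                    unsigned long trip_count,
                                    unsigned long ii) {
    assert(trip_count > 0);
    assert(reported_total >= ii * (trip_count - 1));
    return reported_total - ii * (trip_count - 1);
}
```

For the unbalanced-if version, pipelined_loop_cycles(2014, 14, 17) gives 28199, one cycle below the reported 28200. For the balanced-if version, cycles_beyond_ii_term(14106, 2014, 7) gives 15. In both cases the II term dominates the total, which is why halving the II roughly halves the run time.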

Taking a close look at the performance analysis diagram, we see that there is only one modification of the c array in each for-loop iteration.

So, the problem was having multiple accesses to c in one iteration. Let’s go one step further and access c in only one place in the loop body by modifying the code as below:

#define N 2014
int balanced_if(float a[N], float b[N], float c[N]) {
  for (unsigned int i = 0; i < N; i++) {
#pragma HLS PIPELINE
    float tmp;
    if (i == 0 ) {
      tmp = a[i] + a[i]*b[i];
    } else {
      tmp = a[i]*b[i];
    }
    c[i] += tmp;
  }
  return 0;
}

Now, if you synthesise the code, you will get II = 1, which is perfect.
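Since the three versions are meant to compute the same result, it is worth checking that in a plain C testbench before comparing synthesis reports. The following is a minimal sketch under a few assumptions: the kernel_* names, the input values and the tolerance are my own choices, and the HLS pragmas are left out because they do not affect software behaviour. A tolerance is used because the rewritten versions add the terms in a different order, (c + a*b) + a versus c + (a + a*b), and floating-point addition is not associative.

```c
#include <assert.h>

#define N 2014

typedef int (*kernel_fn)(float a[N], float b[N], float c[N]);

/* The three loop bodies from this post, restated without pragmas. */
static int kernel_unbalanced(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
        c[i] += a[i] * b[i];
        if (i == 0) c[i] += a[i];
    }
    return 0;
}

static int kernel_balanced(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
        if (i == 0) c[i] += a[i] + a[i] * b[i];
        else        c[i] += a[i] * b[i];
    }
    return 0;
}

static int kernel_single_access(float a[N], float b[N], float c[N]) {
    for (unsigned int i = 0; i < N; i++) {
        float tmp = (i == 0) ? a[i] + a[i] * b[i] : a[i] * b[i];
        c[i] += tmp;
    }
    return 0;
}

/* Runs one kernel on fixed, arbitrarily chosen inputs; result lands in out. */
static void run_kernel(kernel_fn kernel, float out[N]) {
    static float a[N], b[N];
    for (unsigned int i = 0; i < N; i++) {
        a[i] = (float)(i % 7 + 1);
        b[i] = 0.5f;
        out[i] = 1.0f;
    }
    int rc = kernel(a, b, out);
    assert(rc == 0);
}

static float abs_diff(float x, float y) {
    float d = x - y;
    return d < 0.0f ? -d : d;
}

/* Returns how many elements differ from the unbalanced version by more than tol. */
int count_mismatches(float tol) {
    static float ref[N], bal[N], single[N];
    run_kernel(kernel_unbalanced, ref);
    run_kernel(kernel_balanced, bal);
    run_kernel(kernel_single_access, single);

    int mismatches = 0;
    for (unsigned int i = 0; i < N; i++) {
        if (abs_diff(ref[i], bal[i]) > tol || abs_diff(ref[i], single[i]) > tol)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(1e-6f) from a testbench main and checking that it returns 0 gives some confidence that the lower II was not bought with a change in results.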