
Nov 1, 2016

# Covariance

The description of the covariance is taken from [1][2].

Covariance is a common operator in statistics and data mining. It takes the input data as a 2D matrix and calculates the covariance of every pair of columns. Because $cov(i, j) = cov(j, i)$, the output is a symmetric matrix.

data: an $N\times m$ matrix that contains the input data ($N$ rows, $m$ columns)

cov: an $m\times m$ matrix that contains the results.
$cov(i, j)=\frac{\sum^{N-1}_{k=0}{(data(k,i)-mean(i))(data(k,j)-mean(j))}}{N-1}$

where
$mean(i)=\frac{\sum^{N-1}_{k=0}{data(k,i)}}{N}$
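The two formulas above can be checked with a few lines of plain Python. This is only a minimal sketch for illustration, not the code from [1] or [2]; the function names `column_mean` and `covariance` and the sample matrix are assumptions made for this example.

```python
def column_mean(data, i):
    """mean(i): average of column i over all N rows."""
    n_rows = len(data)
    return sum(data[k][i] for k in range(n_rows)) / n_rows


def covariance(data):
    """Return the m x m covariance matrix of an N x m list of rows."""
    n_rows = len(data)
    n_cols = len(data[0])
    mean = [column_mean(data, i) for i in range(n_cols)]
    cov = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        # The matrix is symmetric, so only the upper triangle is computed
        # and each value is mirrored into the lower triangle.
        for j in range(i, n_cols):
            acc = sum(
                (data[k][i] - mean[i]) * (data[k][j] - mean[j])
                for k in range(n_rows)
            )
            cov[i][j] = acc / (n_rows - 1)
            cov[j][i] = cov[i][j]
    return cov


# Hypothetical 3 x 2 input: the second column is twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance(data)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

For this input the column means are 2 and 4, so $cov(0,0)=1$, $cov(0,1)=cov(1,0)=2$ and $cov(1,1)=4$.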

This computation can be divided into three kernels [2]: k_mean, k_reduce and k_covar, as shown in the following figure.

Three memory objects share data between the kernels and the host program: d_data, d_mean and d_covar.

The k_mean kernel calculates the mean value of each column of the 2D matrix d_data and stores it in d_mean. The k_reduce kernel then subtracts the corresponding mean from each element of d_data. Finally, the k_covar kernel calculates the covariance and writes it to d_covar.

The three kernels run sequentially, one after another, as shown in the figure.
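To make the data flow between the three stages concrete, here is a rough host-side model in plain Python. It is a sketch under stated assumptions, not the actual kernel code: each kernel is modelled as an ordinary sequential function, the memory objects are ordinary lists, and the division by $N-1$ follows the formula given earlier in this post. Only the names k_mean, k_reduce, k_covar, d_data, d_mean and d_covar come from the description above; the sample values are made up.

```python
def k_mean(d_data, d_mean):
    """Stage 1: write the mean of every column of d_data into d_mean."""
    n_rows = len(d_data)
    for i in range(len(d_mean)):
        d_mean[i] = sum(row[i] for row in d_data) / n_rows


def k_reduce(d_data, d_mean):
    """Stage 2: subtract the column mean from every element, in place."""
    for row in d_data:
        for i in range(len(row)):
            row[i] -= d_mean[i]


def k_covar(d_data, d_covar):
    """Stage 3: fill d_covar from the already centred d_data."""
    n_rows = len(d_data)
    m = len(d_covar)
    for i in range(m):
        for j in range(i, m):
            acc = sum(row[i] * row[j] for row in d_data)
            d_covar[i][j] = acc / (n_rows - 1)
            d_covar[j][i] = d_covar[i][j]  # mirror: the result is symmetric


# Host side: allocate the three shared buffers (hypothetical 3 x 2 input),
# then launch the stages strictly one after another.
d_data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
d_mean = [0.0] * 2
d_covar = [[0.0] * 2 for _ in range(2)]

k_mean(d_data, d_mean)
k_reduce(d_data, d_mean)
k_covar(d_data, d_covar)
print(d_mean)   # [2.0, 4.0]
print(d_covar)  # [[1.0, 2.0], [2.0, 4.0]]
```

The order matters because each stage consumes what the previous one produced: k_reduce needs the finished d_mean, and k_covar needs the centred d_data. Note that k_reduce overwrites d_data, so the original input is no longer available afterwards.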

The corresponding unoptimised code can be found here.

References

[1] Tomofumi Yuki, Louis-Noel Pouchet, “PolyBench 4.2.1 (pre-release),” May 20, 2016, [online] http://web.cse.ohio-state.edu/~pouchet/software/polybench/

[2] S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula and J. Cavazos, “Auto-tuning a high-level language targeted to GPU codes,” Innovative Parallel Computing (InPar), 2012, San Jose, CA, 2012, pp. 1-10. [online] https://cavazos-lab.github.io/PolyBench-ACC/