How can I use OpenMP and AVX2 together and get the correct answer?
I wrote a matrix-vector product program using OpenMP and AVX2. However, I get the wrong answer because of OpenMP: every element of array c should become 100, but my result is a mix of 98, 99, and 100.

The code is below. I compiled it with Clang using -fopenmp, -mavx, and -mfma.

```c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <omp.h>
#include <x86intrin.h>

void mv(double *a, double *b, double *c, int m, int n, int l)
{
    int k;
    #pragma omp parallel
    {
        __m256d va, vb, vc;
        int i;
        #pragma omp for private(i, va, vb, vc) schedule(static)
        for (k = 0; k < l; k++) {
            vb = _mm256_broadcast_sd(&b[k]);
            for (i = 0; i < m; i += 4) {
                va = _mm256_loadu_pd(&a[m*k+i]);
                vc = _mm256_loadu_pd(&c[i]);
                vc = _mm256_fmadd_pd(va, vb, vc); /* va * vb + vc */
                _mm...
```
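The wrong values come from a data race: `#pragma omp for` splits the `k` loop, so several threads run different `k` iterations at the same time, and every one of them loads `c[i]`, adds its term, and stores `c[i]` back. When two threads load the same `c[i]` before either has stored, one update is lost, which yields 98 or 99 instead of 100. One way to avoid this is to parallelize over blocks of rows of `c` instead, so each thread owns a disjoint slice of `c` (a per-thread partial sum combined by a reduction would be another option). Below is a minimal sketch of the row-block approach, assuming the `a[m*k+i]` layout from the question and that `m` is a multiple of 4. The helper name `mv_block4` is hypothetical, and the GCC/Clang `target("avx,fma")` attribute is only there so the sketch also builds without `-mavx -mfma`; with those flags it is redundant. Note that `_mm256_fmadd_pd(x, y, z)` computes `x * y + z`, so the accumulator goes last.

```c
#include <immintrin.h>

/* Hypothetical helper: c[i..i+3] += sum over k of a[m*k + i .. i+3] * b[k].
   The target attribute enables AVX and FMA for this function only. */
__attribute__((target("avx,fma")))
static void mv_block4(const double *a, const double *b, double *c,
                      int m, int l, int i)
{
    __m256d vc = _mm256_loadu_pd(&c[i]);              /* partial sums stay in a register */
    for (int k = 0; k < l; k++) {
        __m256d va = _mm256_loadu_pd(&a[m * k + i]);  /* column k, rows i..i+3 */
        __m256d vb = _mm256_broadcast_sd(&b[k]);      /* b[k] in all four lanes */
        vc = _mm256_fmadd_pd(va, vb, vc);             /* vc = va * vb + vc */
    }
    _mm256_storeu_pd(&c[i], vc);                      /* one store per 4-row block */
}

/* c += A * b, with A stored as a[m*k + i]. Assumes m % 4 == 0.
   The OpenMP loop runs over row blocks, so no two threads touch the same c[i]. */
void mv(double *a, double *b, double *c, int m, int n, int l)
{
    (void)n; /* not needed in this sketch */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < m; i += 4)
        mv_block4(a, b, c, m, l, i);
}
```

Because each 4-row block keeps its running sum in `vc` and writes `c` only once, this version also does fewer loads and stores of `c` than the original loop order. Without `-fopenmp` the pragma is ignored and the function simply runs serially with the same result.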