107 - List Vector with Pragma annotation
$ cd ../107
$ cat -n listvec2.c
code:listvec2.c
4 #define N 100 * (1 << 20) /* 100 M */
21 double start = get_dtime();
23 for(i = 0; i < N; i++) {
24 a[idx[i]] = a[idx[i]] + b[i];
25 }
26 printf("elapsed time = %.3f sec\n",get_dtime() - start);
---
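The listing calls get_dtime() but the excerpt does not show its definition. A minimal sketch, assuming it simply returns wall-clock time in seconds as a double (the body below is hypothetical, not the author's code):

```c
#include <time.h>

/* Hypothetical stand-in for get_dtime(): the real helper in listvec2.c
   is not shown, so this only assumes "current time in seconds". */
double get_dtime(void)
{
    struct timespec ts;

    /* C11 timespec_get fills ts with the current calendar time. */
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

Taking the difference of two calls, as line 26 of the listing does, gives the elapsed time of the loop in seconds.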
$ gcc -O listvec2.c
$ a.out // 0.370 sec
$ ncc listvec2.c -o n.out
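$ n.out // 0.615 sec = x0.60 relative to gcc (a slowdown)
A possible true (flow) dependency (RAW: Read after Write) prevents vectorization: if idx holds the same value in two iterations, the later iteration must read what the earlier one wrote.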
https://gyazo.com/d92225f417405e4f55fd8da32c4368bb
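Why the compiler must be cautious can be sketched in plain C. The second function below emulates what one vector chunk does (gather all operands, then scatter all results); the function names and CHUNK size are made up for illustration. With a repeated index the two versions disagree, which is exactly the RAW conflict.

```c
#include <stddef.h>

#define CHUNK 256  /* hypothetical chunk size for this illustration */

/* Scalar reference: iterations run in order, so a later iteration
   sees the value written by an earlier one to the same element. */
void add_indexed_scalar(double *a, const int *idx, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + b[i];
}

/* Emulated vector chunk (assumes n <= CHUNK): every read of a[]
   happens before any write, as in a gather/add/scatter sequence. */
void add_indexed_gather_scatter(double *a, const int *idx, const double *b, size_t n)
{
    double tmp[CHUNK];

    for (size_t i = 0; i < n; i++)
        tmp[i] = a[idx[i]] + b[i];   /* gather + add */
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = tmp[i];          /* scatter */
}
```

For example, with idx = {0, 0, 1}, b = {1, 2, 3} and a = {10, 20}, the scalar loop leaves a[0] = 13, while the gather/scatter version leaves a[0] = 12 because the second update never saw the first.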
The ivdep pragma tells the compiler that no true dependency exists among the loop iterations.
$ diff listvec2.c listvec2p.c
https://gyazo.com/5e11c6ca5a3ad60798551e36961ac692
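The actual diff is only visible in the screenshot, so the following is a sketch of what the annotated loop may look like, assuming ncc's `#pragma _NEC ivdep` spelling and a hypothetical wrapper function. The pragma is a promise by the programmer: it is only safe when idx holds no value twice within the loop.

```c
#include <stddef.h>

/* Assumed shape of the annotated loop (not copied from listvec2p.c).
   Compilers other than ncc ignore the pragma and run the loop as-is. */
void add_indexed_ivdep(double *a, const int *idx, const double *b, size_t n)
{
#pragma _NEC ivdep
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + b[i];
}
```

If idx is a permutation, every element of a is touched at most once and the vectorized result matches the scalar one.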
$ ncc listvec2p.c -o np.out
$ np.out // 0.118 sec = x5.2 speed-up against n.out (ncc without the pragma)
https://gyazo.com/e1d7eb3a48c4e1a571abbd289a04fd39
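An additional pragma, '_NEC vovertake' (vector overtake), asserts that the loop's stores do not conflict with later memory accesses (for example, no WAW: Write after Write output dependency). This lets the stores be overtaken by subsequent instructions, enabling out-of-order writes and improving performance further.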
$ diff listvec2p.c listvec2pp.c
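This diff is not reproduced here. A sketch of the doubly annotated loop, assuming both pragmas are simply stacked in front of it; the function names are hypothetical. Since both pragmas are unchecked promises, a small helper that verifies the index list is conflict-free can be useful while testing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if every idx[i] lies in [0, range) and occurs at most once,
   0 otherwise (also 0 if the scratch buffer cannot be allocated). */
int idx_is_conflict_free(const int *idx, size_t n, size_t range)
{
    unsigned char *seen = calloc(range ? range : 1, 1);
    int ok = 1;

    if (!seen)
        return 0;
    for (size_t i = 0; i < n && ok; i++) {
        if (idx[i] < 0 || (size_t)idx[i] >= range || seen[idx[i]])
            ok = 0;              /* out of range or repeated index */
        else
            seen[idx[i]] = 1;
    }
    free(seen);
    return ok;
}

/* Assumed shape of the loop in listvec2pp.c: ivdep plus vovertake. */
void add_indexed_overtake(double *a, const int *idx, const double *b, size_t n)
{
#pragma _NEC ivdep
#pragma _NEC vovertake
    for (size_t i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + b[i];
}
```

Calling idx_is_conflict_free(idx, n, range) once before the timed loop keeps the check out of the measurement.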
$ ncc listvec2pp.c -o npp.out
$ npp.out // 0.031 sec = x19.8 speed-up against n.out (ncc without pragmas), x11.9 against gcc
Improved from the x5.2 obtained with the 'ivdep' pragma alone!
https://gyazo.com/c1099a9f4b981bb7b34fdd0bf61dc052