Ewart Timothée
2015-03-13 09:04:19 UTC
Hello all,
I have an issue/question about using VMX/VSX on a POWER8 processor on a little-endian system.
Using intrinsic functions, if I perform an operation between vec_vsx_ld(…) and vec_vsx_st(), the compiler adds
a permutation after the load and another before the store, even though the memory is correctly aligned:
lxvd2x …
xxpermdi …
operations ….
xxpermdi
stxvd2x …
If I use vec_ld() and vec_st() instead, I get no permutations:
lvx
operations …
stvx
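For reference, the two variants described above can be reproduced with a small sketch like the following. The function names and the "add the vector to itself" operation are hypothetical placeholders chosen for illustration, not my real kernel; the scalar fallback is only there so the file also builds on machines without VSX.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ALTIVEC__) && defined(__VSX__)
#include <altivec.h>
#define USE_POWER_VECTORS 1
#else
#define USE_POWER_VECTORS 0   /* scalar fallback on non-POWER targets */
#endif

/* Both functions compute dst[i] = src[i] + src[i] for four floats.
   The names are hypothetical; src and dst must be 16-byte aligned. */

/* Variant 1: VSX load/store intrinsics (the case showing xxpermdi). */
void double_with_vsx(const float *src, float *dst)
{
    assert((uintptr_t)src % 16 == 0 && (uintptr_t)dst % 16 == 0);
#if USE_POWER_VECTORS
    __vector float v = vec_vsx_ld(0, src);
    v = vec_add(v, v);
    vec_vsx_st(v, 0, dst);
#else
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i] + src[i];
#endif
}

/* Variant 2: VMX (AltiVec) load/store intrinsics (the lvx/stvx case). */
void double_with_vmx(const float *src, float *dst)
{
    assert((uintptr_t)src % 16 == 0 && (uintptr_t)dst % 16 == 0);
#if USE_POWER_VECTORS
    __vector float v = vec_ld(0, src);
    v = vec_add(v, v);
    vec_st(v, 0, dst);
#else
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i] + src[i];
#endif
}
```

Compiling this with something like gcc -O2 -mcpu=power8 -S on a little-endian system and comparing the assembly of the two functions should show the difference; the exact instructions may vary with the compiler and its version.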
Reading the ISA, I do not see a real difference between these two kinds of instructions (or I missed it).
So my 3 questions are:
Why do I have permutations?
What is the cost of these permutations?
What is the performance difference between vec_vsx_ld and vec_ld?
Best
Tim
Timothée Ewart, Ph. D.
http://www.linkedin.com/in/tewart