Wow, this is a great article, although fairly hard to read in this format. But it's worth at least skimming through if you are interested in this area. Some summary excerpts that give a sense of the trajectory:
"we were porting a lot of old SIMD code that we needed a new set of best practices to work better with SSE"
"auto-vectorization is the kind of cool idea"
"..if you bank on this and hope the compiler will always SIMD-ify your loops, you're making a fatal mistake."
"And this is the MAJOR problem with compiler auto-vectorization. It silently breaks during regular maintenance of the code."
"we should have this feature enabled, but it should be considered a small bonus if it works"
"So what do we use? We use intrinsic functions. All our SIMD code is currently written this way."
"To use intrinsics effectively we have to be familiar with how the CPU works. We need to know what its good at doing, and what its bad at doing."
"If you have a 64-bit build, you WILL have SSE2 instructions"
"don't spend time on trying to wrap up SSE functionality."
"doomed to run slowly and waste a lot of time trying to do this."
"Data layout is THE most important part of SIMD programming. If the data layout is poor, then no amount of SIMD is going to help."
Followed by some great specific examples...
And a useful chart at the bottom showing the percentage of targets that support SSE through AVX2.
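To make the data-layout and intrinsics points a bit more concrete, here is a minimal sketch of my own (not code from the article). It assumes a hypothetical `ParticlesSoA` struct that stores each component in its own contiguous array (structure-of-arrays), so four `x` values can be loaded with a single instruction. The names `ParticlesSoA` and `length_squared` are made up for illustration, and the scalar loop doubles as a fallback when SSE2 is not available.

```cpp
#include <cstddef>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE/SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical structure-of-arrays layout: each component is contiguous,
// so four consecutive x (or y, or z) values fill one 128-bit register.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

// Computes x*x + y*y + z*z for every particle, four at a time where possible.
void length_squared(const ParticlesSoA& p, std::vector<float>& out) {
    const std::size_t n = p.x.size();
    out.resize(n);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 4 <= n; i += 4) {
        // Unaligned loads, since std::vector does not guarantee 16-byte alignment.
        __m128 x = _mm_loadu_ps(&p.x[i]);
        __m128 y = _mm_loadu_ps(&p.y[i]);
        __m128 z = _mm_loadu_ps(&p.z[i]);
        __m128 r = _mm_add_ps(_mm_add_ps(_mm_mul_ps(x, x), _mm_mul_ps(y, y)),
                              _mm_mul_ps(z, z));
        _mm_storeu_ps(&out[i], r);
    }
#endif
    // Scalar tail for the leftover elements (or the whole range without SSE2).
    for (; i < n; ++i)
        out[i] = p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i];
}
```

With an array-of-structs layout (`x, y, z, x, y, z, ...`) the same loop would need shuffles to separate the components before any arithmetic could happen, which is roughly the overhead the data-layout quote is warning about.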
Yeah, this is definitely one of the better articles on practical, real-world SIMD programming I've seen. Recommended for anyone interested in optimising their code using SIMD, particularly on x86/x64.