I see they learned clang’s dirty little secret about intrinsics: when lowering them to IR, it deviates (sometimes dramatically where AVX-512 is concerned) from the instructions the intrinsics are documented to produce, and the results are often detrimental.
This is why ffmpeg uses assembly. People get extremely mad when you say it's done for a reason, because they always want to build a fancier abstraction (usually cross-platform), which defeats the purpose: it doesn't actually deliver the speed the hand-written code does.
NB: those abstractions do make sense when you can only afford to write a single implementation of the algorithm, but then you're really just talking about a high-level programming language. They frequently fail to achieve their goal when you're writing a second implementation for the sole purpose of being faster.
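A small illustration of the intrinsics point, as a hedged sketch rather than a benchmark: Intel documents `_mm_slli_epi32` as PSLLD, but clang lowers it to generic LLVM IR first, and at -O2 a shift left by 1 may come out as PADDD (x + x) instead. This particular substitution is harmless; the point is only that the intrinsic doesn't pin the instruction. The function name `double_lanes` is made up for the example, and you should check the output with `clang -O2 -S` on your own toolchain rather than trust the comment.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Doubles four 32-bit lanes using the shift-left intrinsic.
 * The Intel docs map _mm_slli_epi32 to PSLLD, but clang may emit a
 * different instruction (e.g. PADDD for a shift by 1) after lowering
 * to generic IR. Inspect the generated code with: clang -O2 -S */
static void double_lanes(const int32_t in[4], int32_t out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_slli_epi32(v, 1);
    _mm_storeu_si128((__m128i *)out, v);
}
#else
/* Scalar fallback so the sketch still builds on non-x86 targets. */
static void double_lanes(const int32_t in[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = in[i] * 2;
    }
}
#endif
```

Either way the results are the same; what changes is which instruction you get, which is exactly what hand-written assembly lets you control.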