Factoring in, say, a 10x-100x repetition code for error correction, existing silicon tech is already damn near the Landauer limit. General-purpose CPU and GPU architectures still have some data-flow logistics they could optimize, but ASICs are pretty much at the performance limit already.
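
For a rough sense of the numbers involved: the Landauer limit is k_B * T * ln(2) of energy per bit erased, and a repetition code multiplies that floor by the redundancy factor. The sketch below just does that arithmetic. The 300 K operating temperature and the function names are my own assumptions for illustration, and it says nothing about what any real chip actually dissipates.

```python
import math

# Boltzmann constant in J/K (exact value under the 2019 SI definition).
K_B = 1.380649e-23


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k_B * T * ln 2."""
    return K_B * temperature_k * math.log(2)


def redundant_floor_joules(temperature_k: float, repetition: int) -> float:
    """Energy floor per logical bit if each bit is stored `repetition` times.

    Assumes the cost scales linearly with the repetition factor, which is a
    simplification for illustration only.
    """
    return repetition * landauer_limit_joules(temperature_k)


if __name__ == "__main__":
    T = 300.0  # assumed room temperature in kelvin
    print(f"bare limit at {T:.0f} K: {landauer_limit_joules(T):.3e} J/bit")
    for factor in (10, 100):  # the 10x-100x range from the comment above
        print(f"{factor}x repetition: {redundant_floor_joules(T, factor):.3e} J/bit")
```

At 300 K the bare limit comes out to roughly 2.9e-21 J per bit, so the 10x-100x range puts the floor somewhere around 3e-20 to 3e-19 J per logical bit under these assumptions.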