I'm not sure what you're getting at. The point of these (ill-defined) alignment exercises is not to achieve parity with humans, but to constrain AI systems so that they behave in our best interest. Or, more prosaically, so that they don't say or do things that pose a brand-safety or legal risk for their operator.
Still, I think the original paper and this take on it are exercises in excessive anthropomorphizing. There's no special reason to believe that the processes inside an LLM are analogous to human thought. This is not a "stochastic parrot" argument; I think LLMs can be intelligent without being like us. It's just that we're jumping the gun in assuming that LLMs have a single, coherent set of values, or that they "knowingly" employ deception, when the only thing we reward them for is completing text in a way that pleases the judges.