Link to paper: https://arxiv.org/abs/2301.11325 (the one on the site seems broken)

Even with the knowledge that something like this is inevitably coming, it's pretty crazy to see the examples. The fact that it can create fitting music from painting descriptions (including ones such as Guernica) is unexpected. And it seems to generate almost naturalistic vocals with reasonable-sounding lyrics, with no embedded structure for lyrics or anything.

The paper itself is pretty cool. While GPT-3 was just "GPT-2, but bigger", this seems to have required some clever combination of models. Wonder if similar ideas will be needed for longer-form text or video generation.