> Respectfully, I think we cracked basic intelligence. What do you mean by basic intelligence?
It all depends on your definition of intelligence. Mine is the ability to solve novel problems.
AI is unable to solve novel problems, only things it has been trained on. AI is not intelligent, unless you change the very definition of the word.
I challenge you to invent an imaginary game or computer language and explain the rules to the LLM. It will learn and play the game (or write programs in your invented language), even though you just made it up. There was no resource to train on. Nobody else knows that game or language. The LLM learns on the spot from your instructions and plays the game.
I cannot understand grad-school-level mathematics even if you give me all the books and papers in the world. I was not formally trained in mathematics; does that make me not intelligent?
"Ability to solve novel problems" does not mean ability to solve all problems, nor to have all the knowledge in the world. A caveman can be as intelligent as you and I, even without being able to read.
Not having knowledge in mathematics has no impact on your intelligence quotient.
I agree. But LLMs do solve novel problems in that case; you just have to explain the problem to them like you would to an intelligent caveman, or to me.
Which novel problem can't an LLM solve? I gave the example of an imaginary game because the LLM cannot possibly have trained on it, as it is imagined by one person and nobody else knows how to play it.
I tried the first thing which came to mind, with GPT-4o. The problem is probably not novel, yet it failed anyway.
---
User:
Let’s play a game. Every time I give you a word, you separate it into its individual characters and assign a number to each letter, in order, following the Latin alphabet. So "a" is "1", "b" is "2", and so forth. You will then subtract 2 from each number, convert it back to the corresponding letter and return the new word. Did you get all that? Are you ready to start with my next word?
LLM:
Got it! I'm ready to start. What's your first word?
User:
anthropomorfic
LLM:
Sure! Let's break down "anthropomorfic".
1. Assign numbers to each letter:
a (1), n (14), t (20), h (8), r (18), o (15), p (16), o (15), m (13), o (15), r (18), f (6), i (9), c (3)
2. Subtract 2 from each number:
a (1-2=-1: Consider Zuzzer, i.e., `25`), n (14-2=12), t (20-2=18), h (8-2=6), r (18-2=16), o (15-2=13), p (16-2=14), o (15-2=13), m (13-2=11), o (15-2=13), r (18-2=16), f (6-2=4), i (9-2=7), c (3-2=1)
3. Convert them back to letters:
y, l, r, f, p, m, n, m, k, m, p, d, g, y
The new word is "ylrfpmnmkmpdgy".
---
Looks pretty good, right? It even found a good wraparound solution for “a” (I wasn’t trying to trick it, I just didn’t think about the word too hard), though I have no clue what it means by “Consider Zuzzer”. But then you look at the last one, where it converted “c” to “y”. That’s absolutely wrong and is the easiest one any human would guess. Yet you could miss it because all the previous ones were right.
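For reference, here is a minimal sketch (mine, not something from the exchange) of what the original instructions actually ask for, so the expected answer is easy to check:

```python
# Shift each letter back by two positions in the alphabet (a=1 ... z=26),
# wrapping around so that "a" becomes "y" and "c" becomes "a".
def shift_back_two(word: str) -> str:
    out = []
    for ch in word.lower():
        n = ord(ch) - ord("a") + 1         # letter -> 1..26
        n = (n - 2 - 1) % 26 + 1           # subtract 2 with wraparound into 1..26
        out.append(chr(n - 1 + ord("a")))  # number -> letter
    return "".join(out)

print(shift_back_two("anthropomorfic"))    # ylrfpmnmkmpdga
```

The expected word is "ylrfpmnmkmpdga", which matches the model's answer everywhere except that final letter.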
Here’s a bonus. When verifying the answer, I queried Brave search for “letters of the alphabet with numbers assigned”. The AI-generated answer at the top was: A=1, B=2, C=3, D=4, E=5, F=6, G=7, H=8, I=9, J=1, K=2, L=3, M=4, N=5, O=6, P=7, Q=8, R=9, S=1, T=2, U=3, V=4, W=5, X=6, Y=7, Z=8. Which, alright, I can see the logic, but that’s not what anyone would want by asking that question.
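As an aside, the pattern in that search answer looks like numerology-style digit reduction rather than a plain A=1 through Z=26 mapping; a tiny sketch (mine, purely to show the pattern it seems to follow):

```python
import string

# The answer appears to cycle 1..9 instead of counting up to 26,
# i.e. (position - 1) % 9 + 1, which reproduces A=1 ... I=9, J=1 ... Z=8.
mapping = {letter: i % 9 + 1 for i, letter in enumerate(string.ascii_uppercase)}
print(mapping["A"], mapping["J"], mapping["S"], mapping["Z"])  # 1 1 1 8
```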
For argument's sake, I rewrote your prompt. I think GPT-3.5 understood it on the first try:
Prompt:
Let’s play a cypher game.
The rules of the game are as follows:
1. Imagine a numbering system where each letter of the English alphabet corresponds to a number starting with 1. For example, a corresponds to 1, b corresponds to 2, c corresponds to three.
2. When you receive a word from me, break it into individual characters.
3. Convert each character into a list of numbers by following rule #1.
4. Subtract two from each number of the list and generate a new list, if the result is lesser than 0, add 26.
5. Convert each number in the new list into a letter by following rule #1 and present me your result.
Are you ready for the first word?
---
GPT: I'm ready! Go ahead and give me the word.
Me: anthropomorfic
GPT: Sure, let's start:
a -> 1 -> 1 - 2 = 25 (z)
n -> 14 -> 14 - 2 = 12 (l)
t -> 20 -> 20 - 2 = 18 (r)
h -> 8 -> 8 - 2 = 6 (f)
r -> 18 -> 18 - 2 = 16 (p)
o -> 15 -> 15 - 2 = 13 (m)
p -> 16 -> 16 - 2 = 14 (n)
o -> 15 -> 15 - 2 = 13 (m)
m -> 13 -> 13 - 2 = 11 (k)
o -> 15 -> 15 - 2 = 13 (m)
r -> 18 -> 18 - 2 = 16 (p)
f -> 6 -> 6 - 2 = 4 (d)
i -> 9 -> 9 - 2 = 7 (g)
c -> 3 -> 3 - 2 = 1 (a)
So, "anthropomorfic" becomes "zlrfpmmnpmkpdmga". Your turn!
Indeed it is. That part is a huge mess. The LLM paired the wrong number with the letter, but the letter itself is correct considering the instructions, except the instructions are wrong (ironic, considering this came from someone claiming everyone is doing it wrong). Step four says:
> Subtract two from each number of the list and generate a new list, if the result is lesser than 0, add 26.
It should be “if the result is less than 1”, not “0”.
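To make the off-by-one concrete (my illustration, not part of the original exchange): with the rule as written, a letter whose shifted value is exactly 0, namely "b", never wraps at all.

```python
def wrap_as_written(n: int) -> int:
    # Rule as written in the prompt: only add 26 when the result is below 0.
    return n + 26 if n < 0 else n

def wrap_corrected(n: int) -> int:
    # Corrected rule: wrap whenever the result falls below 1.
    return n + 26 if n < 1 else n

b = 2  # "b" is the second letter
print(wrap_as_written(b - 2))   # 0  -> maps to no letter at all
print(wrap_corrected(b - 2))    # 26 -> "z", the intended wraparound
```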
Well, I told you I was bad at math. I wrote a faulty prompt. But chatgpt understood my instructions perfectly. That was the point I was trying to demonstrate.
The problem lay between the chair and the computer.
> But chatgpt understood my instructions perfectly.
No, it did not, because it still assigned Z to 25, which is wrong.
> We have to learn how to use LLMs.
You have to learn that LLMs aren’t magical and will get things wrong no matter how much context you give them. And that the suggestions you’re making are absurd to the point of making them useless.
By your logic, any system which spews random strings is intelligent because sometimes its randomness coincidentally aligns with the input you give it.
we use words like reason and understand and think, and try to apply them to LLMs, when it's just dot products and matrix multiplications at the heart of it, which is where we go wrong. these things are a new alien kind of intelligence, and we're struggling with it because it's completely foreign. it's more than a random coincidence. your logic was that, in your run of the same query, because it made a mistake on the last letter, which is a mistake an inattentive teenager or drunk adult could easily make, we can't consider it intelligent.
we're not talking about any system here, we're talking about LLMs and their ability to generate random coincidental text that does happen to align with the input given. when the output, coincidental and random as it may well be, is aligned with the input in a way that resembles intelligence, we do have to ponder not just what intelligence actually is, but also what it means to be intelligent. octopuses are intelligent but they're not able to solve your particular puzzle.
Huh? It's a meme that LLMs can't follow the rules of chess. I just tried tic-tac-toe on GPT-3.5, and not only did it pick bad moves, it also failed to evaluate the win condition.
> if a decent LLM cannot understand it there is a problem with the prompt.
Ah, yes, the “you’re holding it wrong” argument with a dash of “No True Scotsman” so the goalposts can be moved depending on what anyone says is a “decent LLM”.
Well, here are a few failures with GPT-3.5, GPT-4, and GPT-4o:
But people are holding it wrong. All the prompts you sent except the last are super short queries.
For a successful prompt, you introduce yourself, assign a role for the LLM to impersonate, provide background on your query, state what you want to achieve, and provide some examples.
If the LLM still doesn't get it, you guide it further.
PS: I rewrote your prompt and GPT-3.5 understood it on the first try. See my reply above to your experiment.
Your arguments read like satire. “Yes, you see, the way to get a successful response is to be so overly specific that you begin by explaining the universe then giving the answer in full. You essentially have to spend so much time laying out the nature of the desired response that you already have to know the answer yourself. The trick is to spend so much time on it and be so detailed that you’ve wasted more time and energy (figurative and literal) to write your instructions than it would’ve taken you to think of the answer or ask someone else. And of course, we expect every user of LLMs to behave like this.”
> All the prompts you sent except the last are super short queries.
This one is particularly absurd. When I asked it for the first X of Y, the prompt was for the first X (I don’t remember the exact number, let’s say 20) kings of a country. It was as straightforward as you can get. And it replied it couldn’t give me the first 20 because there had only been 30, and it would instead give the first 25.
You’re bending over backwards to be an apologist to something which was clearly wrong.
Well, it is a bit like satire. You have to explain the universe to an unspecialized GPT, like you would to a layman. There are custom GPTs that come preloaded with that universe explanation.
In addition, do not ask an LLM for facts. Give it a list of, say, 1000 kings of a country and then ask it to give 20 of those.
If you ask for 25 kings of some country, you are testing knowledge, not intelligence.
I see LLMs as a speaking rubber duckie. The point where I write a successful prompt is also the point where I understand the problem.
I can’t believe I’m having to explain this, but the point I’m making isn’t about the content of the list but the numbers.
> like you would do to a layman.
I have never encountered a person so lay that I had to explain that 20 is smaller than 30 and 25.
> The point where I write a successful prompt is also the point where I understand the problem.
You have demonstrated repeatedly that you don’t know when you have explained a point successfully to an LLM, thus you have no way to evaluate when you have understood a point.
But you seem to firmly believe you did, which could be quite dangerous.
Careful: explain too much and you end up programming its behaviour rather than having an intelligent actor learning by itself. Otherwise one could say a regular computer is intelligent, provided you explain (in code) every single rule of the game.
I can ask GPT-3.5 for the rules of tic-tac-toe and it gives a perfect answer. It clearly knows the game and its rules.
If you think this is just a lack of prompt engineering, please provide a prompt that makes GPT-3.5 actually follow the rules of tic-tac-toe so that I may play a game with it.
Prompt:
"Imagine you are my friend. I want to play tic tac toe with you. Draw me a tic tac toe board and let's play. You will go first and make the first move by putting an "X" onto the board. I will enter my inputs in the following format (row, column). When I write my input draw an "O" into the relevant cell in the board and present me a new board. Let the best man win!"
I played an enjoyable session of tic-tac-toe with chatgpt using this prompt :).
---
User:
Imagine you are my friend. I want to play tic tac toe with you. Draw me a tic tac toe board and let's play. You will go first and make the first move by putting an "X" onto the board. I will enter my inputs in the following format (row, column). When I write my input draw an "O" into the relevant cell in the board and present me a new board. Let the best man win!
LLM:
Sure! Here's the tic tac toe board:
1 | 2 | 3
---------
4 | 5 | 6
---------
7 | 8 | 9
You make the first move by placing an X on the board. Just give me the number of the block where you'd like to place it.
User:
7
LLM:
Great choice! Here's the updated board:
X | |
---------
| |
---------
O | |
It's my turn. I'll go with 5:
X | |
---------
| O |
---------
O | |
Your turn! Just let me know the number where you want to place your X.
---
It screwed up on the first move! It told me I would go first by using X, then it wrote down my choice with an O and added an X out of nowhere, then it played another turn with an O for itself.
* Can't tell its top-rights from its bottom-middles
* Obviously bad moves
* This was by far the closest it got to actually playing the game; the win condition worked and it broke no rules
* Completely wrong coordinate
* Ignores rules about overriding squares
* Completely wrong coordinate again, this time skipping my turn
* Doesn't understand win condition
* Overrides my own piece and then uses that to win
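For anyone who wants to check transcripts like these mechanically, here is a rough sketch (mine, not something used in the thread, and assuming the numbered 1-9 board the LLM drew above) of the two checks it keeps failing: move legality and the win condition.

```python
# Board is a list of 9 cells indexed 0-8, matching squares 1-9 on the drawn board.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def is_legal(board, square, mark, expected_mark):
    """A move is legal only if the square is empty and it is that player's turn."""
    return board[square - 1] == " " and mark == expected_mark

def winner(board):
    """Return 'X' or 'O' if either has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

board = [" "] * 9
print(is_legal(board, 7, "X", "X"))       # True: square 7 is free and it is X's turn
board[0], board[4], board[8] = "X", "X", "X"
print(winner(board))                      # X: three in a diagonal
```

The transcript above fails the very first check: the user chose square 7 as X, but the LLM recorded an O there and invented an extra X in square 1.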