> Respectfully, I think we cracked basic intelligence. What do you mean by basic intelligence?
It all depends on your definition of intelligence. Mine is the ability to solve novel problems.
AI is unable to solve novel problems, only things it has been trained on. AI is not intelligent, unless you change the very definition of the word.
I challenge you to invent an imaginary game or computer language and explain the rules to the LLM. It will learn and play the game (or write programs in your invented language), even though you just made it up. There was no resource to train on. Nobody else knows that game or language. The LLM learns on the spot from your instructions and plays the game.
I cannot understand grad-school-level mathematics even if you give me all the books and papers in the world. I was not formally trained in mathematics; does that make me not intelligent?
"Ability to solve novel problems" does not mean ability to solve all problems, nor to have all the knowledge in the world. A caveman can be as intelligent as you and I, even without being able to read.
Not having knowledge in mathematics has no impact on your intelligence quotient.
I agree. But LLMs do solve novel problems in that case; you just have to explain the problem to them like you would to an intelligent caveman, or to me.
Which novel problem can't an LLM solve? I gave the example of an imaginary game because the LLM cannot possibly have trained on it, as it is imagined by one person and nobody else knows how to play it.
I tried the first thing which came to mind, with GPT-4o. The problem is probably not novel, yet it failed anyway.
---
User:
Let’s play a game. Every time I give you a word, you separate it into its individual characters and assign a number to each letter, in order, following the Latin alphabet. So "a" is "1", "b" is "2", and so forth. You will then subtract 2 from each number, convert it back to the corresponding letter and return the new word. Did you get all that? Are you ready to start with my next word?
LLM:
Got it! I'm ready to start. What's your first word?
User:
anthropomorfic
LLM:
Sure! Let's break down "anthropomorfic".
1. Assign numbers to each letter:
a (1), n (14), t (20), h (8), r (18), o (15), p (16), o (15), m (13), o (15), r (18), f (6), i (9), c (3)
2. Subtract 2 from each number:
a (1-2=-1: Consider Zuzzer, i.e., `25`), n (14-2=12), t (20-2=18), h (8-2=6), r (18-2=16), o (15-2=13), p (16-2=14), o (15-2=13), m (13-2=11), o (15-2=13), r (18-2=16), f (6-2=4), i (9-2=7), c (3-2=1)
3. Convert them back to letters:
y, l, r, f, p, m, n, m, k, m, p, d, g, y
The new word is "ylrfpmnmkmpdgy".
---
Looks pretty good, right? It even found a good wraparound solution for “a” (I wasn’t trying to trick it, I just didn’t think about the word too hard), though I have no clue what it means by “Consider Zuzzer”. But then you look at the last one, where it converted “c” to “y”. That’s absolutely wrong and is the easiest one any human would guess. Yet you could miss it because all the previous ones were right.
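For reference, here is a minimal sketch (mine, not something from the exchange) of what the original instructions actually ask for, so the expected answer is easy to check:

```python
# Shift each letter back by two positions in the alphabet (a=1 ... z=26),
# wrapping around so that "a" becomes "y" and "c" becomes "a".
def shift_back_two(word: str) -> str:
    out = []
    for ch in word.lower():
        n = ord(ch) - ord("a") + 1         # letter -> 1..26
        n = (n - 2 - 1) % 26 + 1           # subtract 2 with wraparound into 1..26
        out.append(chr(n - 1 + ord("a")))  # number -> letter
    return "".join(out)

print(shift_back_two("anthropomorfic"))    # ylrfpmnmkmpdga
```

The expected word is "ylrfpmnmkmpdga", which matches the model's answer everywhere except that final letter.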
Here’s a bonus. When verifying the answer, I queried Brave search for “letters of the alphabet with numbers assigned”. The AI-generated answer at the top was: A=1, B=2, C=3, D=4, E=5, F=6, G=7, H=8, I=9, J=1, K=2, L=3, M=4, N=5, O=6, P=7, Q=8, R=9, S=1, T=2, U=3, V=4, W=5, X=6, Y=7, Z=8. Which, alright, I can see the logic, but that’s not what anyone would want by asking that question.
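As an aside, the pattern in that search answer looks like numerology-style digit reduction rather than a plain A=1 through Z=26 mapping; a tiny sketch (mine, purely to show the pattern it seems to follow):

```python
import string

# The answer appears to cycle 1..9 instead of counting up to 26,
# i.e. (position - 1) % 9 + 1, which reproduces A=1 ... I=9, J=1 ... Z=8.
mapping = {letter: i % 9 + 1 for i, letter in enumerate(string.ascii_uppercase)}
print(mapping["A"], mapping["J"], mapping["S"], mapping["Z"])  # 1 1 1 8
```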
For argument's sake, I rewrote your prompt. I think GPT-3.5 understood it on the first try:
Prompt:
Let’s play a cypher game.
The rules of the game are as follows:
1. Imagine a numbering system where each letter of the English alphabet corresponds to a number starting with 1. For example, a corresponds to 1, b corresponds to 2, c corresponds to three.
2. When you receive a word from me, break it into individual characters.
3. Convert each character into a list of numbers by following rule #1.
4. Subtract two from each number of the list and generate a new list, if the result is lesser than 0, add 26.
5. Convert each number in the new list into a letter by following rule #1 and present me your result.
Are you ready for the first word?
---
GPT: I'm ready! Go ahead and give me the word.
Me: anthropomorfic
GPT: Sure, let's start:
a -> 1 -> 1 - 2 = 25 (z)
n -> 14 -> 14 - 2 = 12 (l)
t -> 20 -> 20 - 2 = 18 (r)
h -> 8 -> 8 - 2 = 6 (f)
r -> 18 -> 18 - 2 = 16 (p)
o -> 15 -> 15 - 2 = 13 (m)
p -> 16 -> 16 - 2 = 14 (n)
o -> 15 -> 15 - 2 = 13 (m)
m -> 13 -> 13 - 2 = 11 (k)
o -> 15 -> 15 - 2 = 13 (m)
r -> 18 -> 18 - 2 = 16 (p)
f -> 6 -> 6 - 2 = 4 (d)
i -> 9 -> 9 - 2 = 7 (g)
c -> 3 -> 3 - 2 = 1 (a)
So, "anthropomorfic" becomes "zlrfpmmnpmkpdmga". Your turn!
Indeed it is. That part is a huge mess. The LLM paired the wrong number with the letter, but the letter itself is correct considering the instructions, except the instructions are wrong (ironic, considering this came from someone claiming everyone is doing it wrong). Step four says:
> Subtract two from each number of the list and generate a new list, if the result is lesser than 0, add 26.
It should be “if the result is less than 1”, not “0”.
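To make the off-by-one concrete (my illustration, not part of the original exchange): with the rule as written, a letter whose shifted value is exactly 0, namely "b", never wraps at all.

```python
def wrap_as_written(n: int) -> int:
    # Rule as written in the prompt: only add 26 when the result is below 0.
    return n + 26 if n < 0 else n

def wrap_corrected(n: int) -> int:
    # Corrected rule: wrap whenever the result falls below 1.
    return n + 26 if n < 1 else n

b = 2  # "b" is the second letter
print(wrap_as_written(b - 2))   # 0  -> maps to no letter at all
print(wrap_corrected(b - 2))    # 26 -> "z", the intended wraparound
```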
Well, I told you I was bad at math. I wrote a faulty prompt. But chatgpt understood my instructions perfectly. That was the point I was trying to demonstrate.
The problem lay between the chair and the computer.
> But chatgpt understood my instructions perfectly.
No, it did not, because it still assigned Z to 25, which is wrong.
> We have to learn how to use LLMs.
You have to learn that LLMs aren’t magical and will get things wrong no matter how much context you give them. And that the suggestions you’re making are absurd to the point of making them useless.
By your logic, any system which spews random strings is intelligent because sometimes its randomness coincidentally aligns with the input you give it.
we use words like reason and understand and think, and try to apply them to LLMs, when it's just dot products and matrix multiplications at the heart of it, which is where we go wrong. these things are a new alien kind of intelligence, and we're struggling with it because it's completely foreign. it's more than a random coincidence. your logic was that, in your run of the same query, because it made a mistake on the last letter, which is a mistake an inattentive teenager or drunk adult could easily make, we can't consider it intelligent.
we're not talking about any system here, we're talking about LLMs and their ability to generate random coincidental text that does happen to align with the input given. when the output, coincidental and random as it may well be, is aligned with the input in a way that resembles intelligence, we do have to ponder not just what intelligence actually is, but also what it means to be intelligent. octopuses are intelligent but they're not able to solve your particular puzzle.
Huh? It's a meme that LLMs can't follow the rules of chess. I just tried tic-tac-toe on GPT-3.5, and not only did it pick bad moves, it also failed to evaluate the win condition.
> if a decent LLM cannot understand it there is a problem with the prompt.
Ah, yes, the “you’re holding it wrong” argument with a dash of “No True Scotsman” so the goalposts can be moved depending on what anyone says is a “decent LLM”.
Well, here are a few failures with GPT-3.5, GPT-4, and GPT-4o:
But people are holding it wrong. All the prompts you sent except the last are super short queries.
For a successful prompt, you introduce yourself, assign a role for the LLM to impersonate, provide background on your query, state what you want to achieve, and provide some examples.
If the LLM still doesn't get it, you guide it further.
PS: I rewrote your prompt and GPT-3.5 understood it on the first try. See my reply above to your experiment.
Your arguments read like satire. “Yes, you see, the way to get a successful response is to be so overly specific that you begin by explaining the universe then giving the answer in full. You essentially have to spend so much time laying out the nature of the desired response that you already have to know the answer yourself. The trick is to spend so much time on it and be so detailed that you’ve wasted more time and energy (figurative and literal) to write your instructions than it would’ve taken you to think of the answer or ask someone else. And of course, we expect every user of LLMs to behave like this.”
> All the prompts you sent except the last are super short queries.
This one is particularly absurd. When I asked it for the first X of Y, the prompt was for the first X (I don’t remember the exact number, let’s say 20) kings of a country. It was as straightforward as you can get. And it replied it couldn’t give me the first 20 because there had only been 30, and it would instead give the first 25.
You’re bending over backwards to be an apologist to something which was clearly wrong.
Well, it is a bit like satire. You have to explain the universe to an unspecialized GPT, like you would to a layman. There are custom GPTs that come preloaded with that universe explanation.
In addition, do not ask an LLM for facts. Give it a list of, say, 1000 kings of a country and then ask it to give 20 of those.
If you ask for 25 kings of some country, you are testing knowledge, not intelligence.
I see LLMs as a speaking rubber duckie. The point where I write a successful prompt is also the point where I understand the problem.
I can’t believe I’m having to explain this, but the point I’m making isn’t about the content of the list but the numbers.
> like you would do to a layman.
I have never encountered a person so lay that I had to explain that 20 is smaller than 30 and 25.
> The point where I write a successful prompt is also the point where I understand the problem.
You have demonstrated repeatedly that you don’t know when you have explained a point successfully to an LLM, thus you have no way to evaluate when you have understood a point.
But you seem to firmly believe you did, which could be quite dangerous.
Careful: explain too much and you end up programming its behaviour rather than having an intelligent actor learning by itself. Otherwise one could say a regular computer is intelligent, provided you explain (in code) every single rule of the game.
I can ask GPT-3.5 for the rules of tic-tac-toe and it gives a perfect answer. It clearly knows the game and its rules.
If you think this is just a lack of prompt engineering, please provide a prompt that makes GPT-3.5 actually follow the rules of tic-tac-toe so that I may play a game with it.
Prompt:
"Imagine you are my friend. I want to play tic tac toe with you. Draw me a tic tac toe board and let's play. You will go first and make the first move by putting an "X" onto the board. I will enter my inputs in the following format (row, column). When I write my input draw an "O" into the relevant cell in the board and present me a new board. Let the best man win!"
I played an enjoyable session of tic-tac-toe with chatgpt using this prompt :).
---
User:
Imagine you are my friend. I want to play tic tac toe with you. Draw me a tic tac toe board and let's play. You will go first and make the first move by putting an "X" onto the board. I will enter my inputs in the following format (row, column). When I write my input draw an "O" into the relevant cell in the board and present me a new board. Let the best man win!
LLM:
Sure! Here's the tic tac toe board:
1 | 2 | 3
---------
4 | 5 | 6
---------
7 | 8 | 9
You make the first move by placing an X on the board. Just give me the number of the block where you'd like to place it.
User:
7
LLM:
Great choice! Here's the updated board:
X | |
---------
| |
---------
O | |
It's my turn. I'll go with 5:
X | |
---------
| O |
---------
O | |
Your turn! Just let me know the number where you want to place your X.
---
It screwed up on the first move! It told me I would go first by using X, then it wrote down my choice with an O and added an X out of nowhere, then it played another turn with an O for itself.
* Can't tell its top-rights from its bottom-middles
* Obviously bad moves
* This was by far the closest it got to actually playing the game; the win condition worked and it broke no rules
* Completely wrong coordinate
* Ignores rules about overriding squares
* Completely wrong coordinate again, this time skipping my turn
* Doesn't understand win condition
* Overrides my own piece and then uses that to win
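For anyone who wants to check transcripts like these mechanically, here is a rough sketch (mine, not something used in the thread, and assuming the numbered 1-9 board the LLM drew above) of the two checks it keeps failing: move legality and the win condition.

```python
# Board is a list of 9 cells indexed 0-8, matching squares 1-9 on the drawn board.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def is_legal(board, square, mark, expected_mark):
    """A move is legal only if the square is empty and it is that player's turn."""
    return board[square - 1] == " " and mark == expected_mark

def winner(board):
    """Return 'X' or 'O' if either has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

board = [" "] * 9
print(is_legal(board, 7, "X", "X"))       # True: square 7 is free and it is X's turn
board[0], board[4], board[8] = "X", "X", "X"
print(winner(board))                      # X: three in a diagonal
```

The transcript above fails the very first check: the user chose square 7 as X, but the LLM recorded an O there and invented an extra X in square 1.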