This is probably a semantics problem. You're right: the models don't know how to "do MCP." The harness they run in does, though (Claude Code, Claude Desktop, etc.), and it dynamically exposes MCP tools as tool calls.
HN loves inventing semantics problems around AI. It's gotten really, really annoying and I'm not sure the people doing it are even close to understanding it.
That is an easily falsifiable statement. If I ask ChatGPT or Claude what MCP is, "Model Context Protocol" comes up, and furthermore it can clearly explain what MCP does. That seems unlikely to be a coincidental hallucination.
Both ChatGPT and Claude will perform web searches when you ask them a question; the fact that this confused you is, ironically, on topic.
But you're still misunderstanding the principal point: at some point these models will undoubtedly have access to that data and be trained on it.
But they didn't need to be, because these models are already trained on function & tool calling, and MCP does not augment that functionality in any way.
> But they didn't need to be, because these models are already trained on function & tool calling, and MCP does not augment that functionality in any way.
I think you're making a weird semantic argument. How is MCP use not a tool call?
OP is saying that the models have not been trained on any particular MCP server's tools, which is why MCP servers serve up tool descriptions. Those descriptions are fed to the LLM just like any other text -- that is, they consume tokens and take up precious context.
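To make that overhead concrete, here's roughly the kind of tool entry an MCP server advertises -- the tool name and fields below are made up for illustration -- which the harness serializes into the model's prompt:

    # Hypothetical tool entry, roughly what an MCP server's tools/list
    # response contains; the harness pastes this text into the model's context.
    search_issues_tool = {
        "name": "search_issues",
        "description": "Search the issue tracker for tickets matching a query.",
        "inputSchema": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text search query."},
                "max_results": {"type": "integer", "description": "Cap on returned tickets."},
            },
            "required": ["query"],
        },
    }

Every character of that blob is prompt text the model has to read, on every turn, for every mounted tool -- that's the precious context being consumed.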
Here's a representative example, taken from a real world need I had a week ago. I want to port a code base from one language to another (ReasonML to TypeScript, for various reasons). I figure the best way to go about this would be to topologically sort the files by their dependencies, so I can start with porting files with absolutely zero imports, then port files where the only dependencies are on files I've already ported, and so on. Let's suppose I want to use Claude Code to help with this, just to make the choice of agent concrete.
How should I go about this?
The overhead of the MCP approach here would be analogous to cramming all of the relevant files into the context and asking Claude to sort them. Even if the context window were big enough, that wouldn't matter, because I don't want Claude to "try its best" to produce the topological sort straight out of its nondeterministic LLM "head".
So what did I do?
I gave it enough information about how to consult build metadata files to derive the dependency graph, and then had it write a Python script. The LLM is already trained on a massive corpus of Python code, so there's no need to spoon feed it "here's such and such standard library function", or "here's the basic Python syntax", etc -- it already "knows" that. No MCP tool descriptions required.
And then Claude Code spits out a script that, yes, I could have written myself, but it takes maybe a minute of my time total. I can skim the script and make sure it does exactly what it should be doing. Given that this is code, and not nondeterministic, wishy-washy LLM "reasoning", I know that the result is both deterministic and correct. The total token usage is tiny.
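For concreteness, here's a minimal sketch of the kind of script that came out of this. My real script parsed ReasonML build metadata; the deps.json below is a stand-in for that, mapping each file to the files it imports:

    import json
    from graphlib import TopologicalSorter  # stdlib, Python 3.9+

    # Stand-in for the dependency data derived from the build metadata,
    # e.g. {"src/Util.re": [], "src/App.re": ["src/Util.re"], ...}
    with open("deps.json") as f:
        deps = json.load(f)

    # static_order() yields each file only after all of its dependencies,
    # so zero-import files come first -- exactly the porting order I want.
    # It raises CycleError if the graph has a cycle, which is also worth knowing.
    for path in TopologicalSorter(deps).static_order():
        print(path)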
If you have the LLM write code to interface with the world, it can leverage its training on that kind of code, the code itself will do what code does (precisely what it was written to do), and the only tokens consumed will be those of the final result.
MCP is incredibly wasteful and provides more opportunities for LLMs to make mistakes and/or get confused.
I think most people, even most devs, don't actually realize how crudely an MCP client is built. It's essentially a man-in-the-middle approach: the client sends the LLM on the other end a crappy JSON summary of which tools are mounted and how to call their methods, and then tries to intelligently guess which responses were tool calls.
And that intelligent guess is where it gets interesting for pentesting, because guessing is never fail-safe.
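A stripped-down sketch of that loop, with every name hypothetical -- the point is just the shape: list the tools, paste their descriptions into the prompt, then pattern-match the reply to decide whether it was a tool call:

    import json
    import re

    def run_turn(model, mcp_client, user_message):
        # 1. Ask the server which tools are mounted and paste that into the prompt.
        tools = mcp_client.list_tools()        # hypothetical MCP client call
        prompt = (
            "You may call a tool by replying with JSON like "
            '{"tool": "<name>", "arguments": {...}}.\n'
            "Available tools:\n" + json.dumps(tools) + "\n\nUser: " + user_message
        )
        reply = model.complete(prompt)         # hypothetical model call

        # 2. The "intelligent guess": does this reply look like a tool call?
        match = re.search(r"\{.*\}", reply, re.DOTALL)
        if match:
            try:
                call = json.loads(match.group(0))
                if "tool" in call:
                    return mcp_client.call_tool(call["tool"], call.get("arguments", {}))
            except json.JSONDecodeError:
                pass  # the guess was wrong; fall through and treat it as plain text
        return reply

Production harnesses lean on the structured tool-call fields the model APIs return rather than regexing free text, but some layer always has to decide whether a given output is a tool invocation, and that decision point is exactly what you poke at when pentesting.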
What whoknowsidont is trying to say (IIUC): the models aren't trained on particular MCP use. Yes, the models "know" what MCP is. But the point is that they don't necessarily have MCP details baked in -- if they did, there would be no point in MCP servers serving up prompts / tool descriptions.
Well, arguably descriptions could be beneficial for interfaces that let you interactively test MCP tools, but that's certainly not the main reason. The main reason is that the models need to be informed about what the MCP server provides, and how to use it (where "how to use it" in this context means "what is the schema and intent behind the specific inputs/outputs" -- tool calls are baked into the training, and the OpenAI docs give a good example: https://platform.openai.com/docs/guides/function-calling).
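In other words, the harness translates whatever the MCP server advertises into the tool-definition format the model was actually trained against. The shape below follows the linked docs as I remember them (the weather example is theirs in spirit); the exact field names may drift between API versions:

    # Roughly the function-calling shape the model is trained to consume;
    # the harness maps each MCP tool's metadata into one of these.
    tool_definition = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {  # JSON Schema for the arguments, same idea as the MCP tool's input schema
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }

The model never sees "MCP" as such; it sees tool definitions like that and emits tool calls, and the harness shuttles them to and from the server.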
Most likely! It's hard to say exactly which models and versions I'm talking about because they're constantly being updated.
But the point is that function & tool calling was already built in. If you take a model from before "MCP" was even referenced on the web, it will still interact _PERFECTLY_ with not only MCP servers and clients but any other API as well.
No. It is not. Please understand what the LLMs are doing. Neither Claude nor ChatGPT nor any other major model knows what MCP is.
They know how to do function & tool calling. They have zero training data on MCP.
That is a factual statement, not an opinion.