This reminds me of the article above. People now have diverse ideas about agentic coding. Some suggest keeping a human in the loop, while others suggest writing a detailed specification and letting the agent run freely; some suggest leveraging LLMs' high productivity, and here we get the opinion that an LLM can actually write good code slowly.
It's good to see more practical and varied opinions emerging, turning the LLM into literally a tool instead of something to be hated or hyped.
In my own practice, I find LLMs (SOTA ones) good at medium-level tasks, the kind that need some reasoning and planning. However, their design taste in architecture is unexpectedly bad. What I'd call my best practice so far: I sometimes write the interfaces myself and ask the LLM to fill in the implementations, alongside context-completing tools like context7, deepwiki, docs.rs MCPs, etc., and I give it an escape hatch (e.g. encouraging it to use the AskUser tool in Claude Code).
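To make the "interfaces first" part concrete, here is a minimal Python sketch. All names (`RateLimiter`, `FixedWindowLimiter`, `check_contract`) are hypothetical and made up for illustration: I'd write the `Protocol` and the contract check by hand, and the class body is the kind of thing I'd leave to the agent.

```python
from typing import Protocol


class RateLimiter(Protocol):
    """Human-written interface: the design decision I keep for myself."""

    def allow(self, key: str, now: float) -> bool:
        """Return True if `key` may perform one more action at time `now` (seconds)."""
        ...


class FixedWindowLimiter:
    """Implementation of the kind I would delegate to the agent (hypothetical)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # key -> (window start time, number of actions counted in that window)
        self._windows: dict[str, tuple[float, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window expired, so start a fresh one at `now`.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._windows[key] = (start, count + 1)
        return True


def check_contract(limiter: RateLimiter) -> bool:
    """Human-written check, assuming the limiter allows 2 actions per 10 s window."""
    return (
        limiter.allow("alice", 0.0)
        and limiter.allow("alice", 1.0)
        and not limiter.allow("alice", 2.0)  # third call inside the window is rejected
        and limiter.allow("alice", 10.0)     # a new window has started
    )
```

The agent can rewrite `FixedWindowLimiter` however it likes; the signature and `check_contract` are the parts I actually review.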
> The kid who is right now learning to code by chatting with an agent is not a worse programmer than I was at 12, hunched over Learning Perl, retyping examples that would not run because I missed a semicolon.
To be honest, I'm 17 and I code by chatting with an agent, but it seems to me we can't draw the distinction that absolutely.
The first time I wrote a React app, I forgot to give a file the .tsx extension and used .ts instead. I then saw ugly error lines all across my JSX, got confused, shared it with my friend, and we laughed about this funny little thing all day.
I once spent a whole afternoon choosing a JS linter, reading their docs and getting a feel for their different tastes. In my early twelve-ties (uh, this sounds funny too) I was always captivated by configuring Windows PEs, installing different Linux distributions on my PC, etc. Today I still read tech books, alongside videos, articles and also chatbots. The chatbot is a new tool, but there's no doubt it cannot replace other media and what they bring to us/me.
What I want to say is that a natural interest in programming or computers cannot really be overwhelmed by LLMs. I don't know how to use vim skillfully, since I mostly used Windows when I was younger and I'm not familiar with vim's logic, but in practice this hasn't stopped anything. I still discovered the charm of Linux in the end. And the same goes for LLMs.
It's not the first time we've heard about prompt injection attacks, and for sure this one is Microsoft's fault. Many are talking about the prompt injection itself, whether Copilot should be able to defend against prompt injections, etc. But that's not the problem.
IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.
> “[Copilot] Cowork asks for your permission before taking sensitive actions...” ... when the recipient is the active user, these actions execute immediately without requiring human approval (users do not have a setting to modify this behavior).
> Copilot Cowork can retrieve ‘pre-authenticated download links’ for files the user has access to, which allow anyone who opens the link to download that file.
> Microsoft Copilot Cowork has read access to essentially any resource a user does through Microsoft Graph. As such, the primary mechanism to reduce the blast radius of attacks like this is to restrict excessive permissioning across one’s Microsoft ecosystem.
Take it easy. Throughout the whole attack flow, Microsoft gives Cowork unrestricted access and the ability to bypass approvals. I don't see much of a problem with LLMs here. It's said that the attack is also a threat to Opus 4.7, but several times I've seen Opus 4.7 refuse context7.com's "prompt injections", which only ask Opus to have me create a context7 API key to get more free requests. From my personal experience, such models are indeed trained to detect injections, but these injections could disguise themselves as something like Agent Skills, and red teams will always find ways to win.
We shouldn't pin too much hope on defending against injections, but should concentrate on restricting the LLM's permissions. The popularity of CLIs in agents' workflows (especially coding agents) also concerns me, since most CLI tools an agent can access actually have the same permissions as the user.
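A minimal sketch of what I mean by restricting permissions instead of hoping the model resists injections: a deny-by-default gate that sits between the agent and its tools and is plain code, not an LLM. Everything here is hypothetical (the tool names, `ToolPolicy`, `gate`); it is not how Cowork or any real agent framework works.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    DENY = "deny"


@dataclass(frozen=True)
class ToolPolicy:
    # Hypothetical tools the agent may call without asking (read-only, small blast radius).
    auto_allowed: frozenset = frozenset({"read_file", "search_docs"})
    # Hypothetical tools that always need human approval, whoever the recipient is.
    needs_approval: frozenset = frozenset({"send_email", "create_share_link"})


def gate(tool_name: str, policy: ToolPolicy) -> Decision:
    """Decide on a tool call from its name only, never from model-written text."""
    if tool_name in policy.auto_allowed:
        return Decision.ALLOW
    if tool_name in policy.needs_approval:
        return Decision.ASK_USER
    # Deny by default: a tool that is not listed is never executed.
    return Decision.DENY
```

The point is that the decision never depends on what the prompt says, so injected text claiming that approval can be skipped has nothing to act on. It obviously doesn't cover the CLI case, where one allowed tool can do almost anything the user can.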
“IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.”
This is a fancy way of saying that “the problem is tool calling”, which is obviously true. The problem is that, when it works correctly (99.99% of the time), it adds so much more value to LLMs.
Sandboxing is a step in the right direction, but can also add friction.
Using guardrails is also good, but adds latency, expenses, and also doesn’t solve 100% of the issues.
IMHO there currently does not exist a proper solution to this problem, and it has yet to be discovered. The proper solution, however, should NOT be based on LLMs, so guardrails are the incorrect direction (albeit effective and easier to implement).
By using "ReAct", I just wanted to emphasize the "agentic" perspective of tool calling, which exposes tool calling to the real world and sometimes puts it at risk. So I'm not downplaying the significance of tool calling.
Yes, I'm building agent infra for PCs, so I can fully sense that the protective measures are weak and inadequate, and that it sometimes seems like an unsolvable problem. But according to the article, it's hard to describe what Microsoft did in a polite way. If they had shown even a little security awareness, I could completely understand, but it's like they vibe coded the entire permissions system of Cowork.
Ultimately it all sounds like variations of “don’t blame the tool for situations the tool enables,” which has never been particularly convincing as an argument if you ask me.
The problem is natural language as a medium. It is so ambiguous, and offers so many ways to say literally anything imaginable, that there is no way of protecting against prompt injection without some kind of NLP filter or something. I don't really see how someone can develop a protection against this given these problems.
Yes. It assumes the author of the macro guarantees safety. The common approaches are to not add unsafe{} and leave that to the user, or to rely on audit tools or [highlighters](https://lukaswirth.dev/posts/semantic-unsafe/), etc. However, a macro is indeed allowed to silently add unsafe blocks. I don't work with Rust frequently btw, so there may be mistakes.
Though it's said to be for educational purposes, I keep finding this kind of boundary-pushing playful. I can recall my early days being captivated by "several ways to access private members in C++" lol
I personally hate access controls in general, since they always made me let out a big sigh as I was typing .getClass().getMethod()/getField(), knowing that it hurts performance.
That kind of code doesn't have to hurt performance, as long as monomorphization, inlining or JITting are available to the toolchain. If every single method access is a virtual-table call, then yes, there's an "unnecessary" cost. But you shouldn't be writing high-level looking code in such a language if you care about that level of performance.
It's more about the fact that the servers are Java, and invoking a method via reflection has a non-zero cost that isn't substantial but still makes you sigh, as you either eat the performance cost or spend 10 minutes creating a patch and recompiling the server.
This reminds me of [jj megamerge](https://isaaccorbrey.com/notes/jujutsu-megamerges-for-fun-an...). jj lets you concentrate on development while leaving VCS matters alone, and also lets you resolve VCS issues (conflicts) at the very beginning (megamerge). Really good.
> just like we don’t read assembly, or bytecode, or transpiled JavaScript
This makes sense, since a given piece of higher-level code produces a given piece of lower-level code, while an LLM does not. If the transpiled JS code doesn't work, we can just track down the bug in the minifier, etc., but one cannot figure out why an LLM fails at a task, especially considering that LLMs, even SOTA ones, can be strongly affected by even small prompt changes. Taking this into consideration, I don't think this is sound reasoning for why we don't need to review AI-generated code.
> The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore.
Exactly. However, this could also point to a relaxed review standard instead of dropping review entirely. We could also propose that devs mainly review code design or interfaces, leveraging their *taste*, while leaving strict logical reasoning, validation and testing to other tools or approaches. I'm not persuaded that the nature of LLM code generation must lead to abandoning code review completely.
Anyway, I'm not opposing this article, and its thinking about the shift ahead is really good.
Couldn't we slowly add guardrails that eventually lead to code generation becoming more and more deterministic over time?
I'm seeing in my experience that Claude has become better with every version at producing uniformity in its code output. Especially where the architecture is clear and documented. And even more so in languages with built in uniformity (Go, HTMX, SQL) where there is intentionally only one or two ways of doing things. In such environments, the output is nearly deterministic.
I once thought about this and found that n-shot examples have a greater influence on LLMs. In other words, in a repo with good code quality and architecture (which offers good n-shot examples) and on a task with clear instructions and goals, an LLM's output seems reliable enough, which matches your opinion. And n-shot examples are always better than relying on instruction following, which the article mentions ("specifications") as an approach to dealing with LLMs' productivity, so imo the idea you suggested is another possibility to weigh against the article as well.