Hacker News | Aditya_Garg's comments

Yes, it's ridiculously good at stuff like that now. I dare you to try to trick it.


What bothers me is not that this issue will certainly disappear now that it has been identified, but that we have yet to identify the category of these "stupid" bugs ...

We already know exactly what causes these bugs. They are not a fundamental problem of LLMs; they are a problem of tokenizers. The actual model simply doesn't get to see the same text that you see. It can only infer this stuff from related info it was trained on. It's as if someone asked you how many 1s there are in the binary representation of this text. You'd also need to convert it first to think it through, or use some external tool, even though your computer never saw the text as anything else.

> It's as if someone asked you how many 1s there are in the binary representation of this text.

I'm actually kinda pleased with how close I guessed! I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964.

Then I ran your message through a program to get the actual number, and it turns out it has exactly 1800.
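For anyone curious, the check fits in a few lines of Python (a sketch assuming UTF-8 text, not necessarily the program the commenter used):

```python
def set_bits(text: str) -> int:
    """Count the 1 bits in the UTF-8 encoding of a string."""
    return sum(bin(byte).count("1") for byte in text.encode("utf-8"))

# Printable ASCII averages a bit under 4 set bits per byte
# (a space has only 1), which is why the estimate landed high.
print(set_bits("abc"))  # 3 + 3 + 4 = 10
```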


>I estimated 4 set bits per character, which with 491 characters in your post (including spaces) comes to 1964

And that's exactly the kind of reasoning an LLM does when you ask it about characters in a word. It doesn't come from the word, it comes from other heuristics it picked up during training.


Okay, genuinely not an expert on the latest with LLMs, but isn’t tokenization an inherent part of LLM construction? Kind of like support vectors in SVMs, or nodes in neural networks? Once we remove tokenization from the equation, aren’t we no longer talking about LLMs?

It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that can flawlessly count letters in words, they could train one with a naive tokenizer (like just ASCII characters). But the resulting model would be very bad (for its size) at language or reasoning tasks.

Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.
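To make the tradeoff concrete, here's a toy greedy longest-match tokenizer with a made-up vocabulary (real BPE vocabularies are learned, but the effect on the model is the same: it receives opaque token IDs, not letters):

```python
# Hypothetical vocabulary, for illustration only.
VOCAB = {"straw": 0, "berry": 1, "r": 2, "s": 3, "t": 4,
         "a": 5, "w": 6, "b": 7, "e": 8, "y": 9}

def tokenize(word: str) -> list[int]:
    """Greedy longest-match tokenization."""
    ids, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                ids.append(VOCAB[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return ids

# The model sees two token IDs; the three r's are invisible to it.
print(tokenize("strawberry"))  # [0, 1]
```

A character-level tokenizer would instead emit ten single-letter IDs, making the count trivial, at the cost of much longer sequences for the same text.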


I’ve said this many times before

AI is just a tool

If you used a fancy auto bake cake machine instead of an oven, you still get to claim that you made the cake.

100 years ago someone would be making the claim that using an oven to make cakes “doesn’t count”

All AI did was raise the bar

It’s quite clear here that the author spent a lot of time on this, so he absolutely gets credit as the author.


I think there's a distinction.

Imagine if you had an auto cake making machine that decides on its own the best time to make cake. It adds the ingredients, stirs, turns the oven on, and leaves the finished cake on the counter for you.

People start opening bakeries consisting entirely of cakes baked by the automatic machines. The owners of these machines have no idea whether the cakes have a bit too much flour or were slightly over-stirred. In some cases, they haven't even tried the cakes.

Who gets to claim they made the cake?

By contrast, there are others who carefully tune their machines to make sure everything is perfect. They adjust the mixing settings and ingredient proportions. They experiment and iterate. They taste test throughout the process. And what they give to the public tastes every bit as good as a homemade cake.

The first group is creating slop. The second group, I think, is baking. And OP is in the second group.


Replace "oven" with a dish washer or a washing machine for your clothes. Those things do exactly all of this. Yet we still complain about washing clothes and doing the dishes, even though it is far less effort than anything our parents did, or their parents before them.

If you commission a baker to bake you a cake, did you make the cake? What if you added sprinkles on top?

If you commission a baker, another person, with wants and desires of their own, is involved.

If you use an AI, there isn't.

Either way, it's clear that the author (yes, the author) put a lot of work into this by iterating and shaping it to what he wanted, and that's a lot more than sprinkles.


> If you commission a baker, another person, with wants and desires of their own, is involved.

> If you use an AI, there isn't.

What is the functional difference here? You are commissioning (see: prompting) someone (see: an AI) for a piece of work, or artwork, or whatever. The output is out of your control, and I don't think the presence or absence of a human on the other end materially matters.

If we had hyper-advanced ovens from The Jetsons where we could type a prompt using a fold-out keyboard and it would magically generate whatever cake we ask of it: did we or did we not bake that cake? And I do not think it is clear the author put a lot of work iterating and shaping it into what he wanted; we have zero insight into that.


I didn't say the difference was functional. If you don't think the presence of a human on the other end matters (materially or not), feel free to continue this conversation with an LLM simulation of me. You can even prompt it so that you logically triumph and convince "me".

I'm asking you to explain what the actual difference is and you're avoiding the question.

If we had a complete black box where you submitted Prompt and out came Thing, and you had zero clue what said black box actually did, could you claim creation over Thing? What does knowing that it's a human vs LLM make materially different in terms of whether or not you created it?


And I - or did I turn this thread over to an LLM already? - am asking you a question in return, whose answer should give you the answer you want.

No please, I also agree with parent poster. Talk to the LLM, cause the human ain't listening.

Eh.

Why would I give him the same credit I would give a writer?

Or why would I give a writer the same credit I would give someone who created the AI prompts and scaffolding to generate this?

Being unhappy about not being able to call oneself an author ends up betraying a lack of confidence in the work or process.

In the end writer, dancer, actor, whatever - these titles come from their impact.

There will be a different name for this, and eventually there will be something made that is good enough that people will be spellbound. At which point it's going to be named something else.



Ironically, the story can be read as gesturing in that direction, as it's ostensibly about giving a new title to a particular job.

In general, though, I think part of the mistake people keep making is that they try to imitate what would be valuable to engage with if a human wrote it, in an attempt to claim the role of an author of a book or whatever. There are likely artforms that are unique to what an LLM can facilitate, but trying to imitate human artforms is going to give you stunted results. The AI is very good at imitating the form but not the substance.

Once we stop trying to generate and pass off AI essays, novels, choose your own adventure stories, and all the other human genres as being human writing, we'll have a chance to figure out actually interesting artistic forms.


Yes. In the end what mattered truly was the expression.

However - since we are humans - we also care about the artist.

Creating something without the effort previous works involved can and does affect the context and understanding of it.

Hah - just thought of a good example: how would people feel about talking to OnlyFans creators if they didn’t know it was AI?


> Creating something without the effort previous works involved, can and do affect the context and understanding of it

Not really. Unless you place value on _effort_ rather than being objectively outcome-based. Someone digging a hole with a spoon doesn't make it a better hole than a jackhammer.

I maintain that the work itself - that is, the contents of what is being expressed - is the sole judge of how good the work is. Not the authorship, LLM usage or otherwise.


Eh, by that same argument, how would LLMs fare when the content of the work itself is about “Something made by a human”?

A core fact about information is that signal only exists in the right context.

As an illustrative example: A string of static or gibberish numbers converts to signal when we have the right tools to interpret it.

You could see a bunch of rocks arranged on a beach, while someone who understands the local language may see an SOS.

Culture itself keeps evolving, and teenagers reuse language to create jargon that makes sense to them, but is opaque to others.

I am arguing that your point is true, but its phrasing focuses on the Platonic ideal and avoids the messy practical context of communication.


The context exists whether it's LLM generated or not, because the context sits broadly in society, culture, and manifests in the mind of the reader.

> how would LLMs fare when the content of the work itself is about “Something made by a human”.

It would fare just as well as if the same words had been written by a human, provided the content is sound and has good meaning - conversely, slop is slop, regardless of whether it was written by an LLM or a human.

My point in the grandparent post is that there's a lot of blind discrimination based on the origin of a work - if it was written by or with the help of an LLM, then it automatically deserves less attention, and/or its content's worth is diminished. All without actually discussing the content.


Losers, clueless never had to be productive, just scapegoats. But now losers don't get that buffer window to try and become sociopaths; they just don't get hired at all.


But clueless need losers to exist, so as a second order effect, they lose as well.


Wild stuff and great read

Do you think Karpathy's autoresearch would be useful here?


Based on Karpathy’s writeup, the autoresearch would not have found this. He tells the agent to improve the model and training loop with a five-minute time limit, but honestly this “hack” is so far out of distribution that it seems really unlikely an agent would find it.


Adding, swapping, or duplicating layers has a long history (e.g. StyleGAN, upcycling), and it was pointed out at least as far back as He et al. 2015 (ResNets) that you could ablate or add more layers because they functioned more as just doing some incremental compute iteratively, and many of them were optional. (Or consider Universal Transformers, or heck, just how BPTT works.) So this idea is not far out of distribution, if at all, especially if you're an LLM that knows the literature and past approaches (which most humans would not, because they only just got into this area post-ChatGPT).
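The "layers as incremental compute" intuition is easy to demonstrate with a toy residual stack (plain Python with scalar "layers"; a hypothetical sketch, not any real model's code):

```python
def forward(x: float, layers) -> float:
    """Residual stack: each layer adds a small update, x <- x + f(x)."""
    for f in layers:
        x = x + f(x)
    return x

layers = [lambda x: 0.01 * x] * 3                   # small incremental updates
duplicated = layers[:2] + [layers[1]] + layers[2:]  # duplicate the middle layer

# Duplicating (or ablating) a residual layer only perturbs the output slightly,
# which is what makes frankenmerging / upcycling viable starting points.
print(forward(1.0, layers), forward(1.0, duplicated))  # 1.01**3 vs 1.01**4
```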


I don’t disagree, but it’s worth having a look at the changes the LLM did apply.

https://github.com/karpathy/autoresearch/blob/master/progres...

My opinion is you’d have to go pretty far down the x-axis to get to anything beyond tinkering with things like batch size, learning rate, or positional encodings. There are so many hyperparameter knobs already exposed that duplicating layers is unlikely to be proposed for a long time.

I also just noticed that the last change it applied was changing the random seed. Lol.


My understanding was that Autoresearch was defined as training from scratch (since it's based on the nanogpt speedrun), not using any pretrained models. So it couldn't do anything like upcycling a pretrained model or the Frankenmerge, because it's not given any access to such a thing in the first place. (If it could, the speedrun would be pointless as it would mostly benchmark what is the fastest fileserver you can download a highly compressed pretrained model checkpoint from...) It can increase the number of layers for a new architecture+run, but that's not the same thing.


This is so cute


This is a common misconception

OpenAI and others are already profitable on inference (inference is really really cheap)

They are just heavily investing into the latest frontier

The biggest risk is whether they can stay cutting edge, or if open source or others will catch up quickly.


> OpenAI and others are already profitable on inference (inference is really really cheap)

If it's that cheap I'll soon be doing it self-hosted, or switching to a local provider.

It's a race to the bottom for token providers.


It is that cheap. Look at Deepseek or GLM pricing.


> It is that cheap. Look at Deepseek or GLM pricing.

Then it's a race to the bottom.


Yep.

And unlike competitors, OpenAI has no ecosystem. Just a website and a domain name. Even a VSCode fork like Cursor is an improvement over that state.

Google pays over 15% of search revenue to be the default search engine on various browsers.


If you need to do the latter to be able to make money on the former, then you're not making money, because if the latter requirement disappeared, inference margins would also drop.


At the end of the day, they're still burning cash. Even if inference is cheap, it's also not hard to compete on. They aren't going to be a trillion dollar inference company.

Eventually there will be a race to the bottom on inference price to the customer by companies that aren't trying to subsidize their GPU investments.

OpenAI is spending money because they think they need to for their business to survive. They're hoping that the next big breakthrough just requires more compute and, somehow, that'll build them a moat.


OpenAI, and quite honestly the others, think they are in a race to AGI, not to the bottom. That's why they aren't concerning themselves with moats or cost. This is quite simply a massive bet that we've already cracked AGI and the rest is just funding the engineering to make it happen.

I personally think we haven't cracked AGI yet but it doesn't change their calculus.


>inference is really really cheap

cough Sora cough


Pretty cool

Your goal.md examples are all features for the existing codebase. Any largish goal.md examples where your system is able to one-shot a pretty large app?

The goal.md is what makes this thing either amazing or terrible for the user, so any guidelines or clear examples on writing a good one would go a long way.


Author here! Good suggestion - we should probably come up with some GOAL.md examples. With that said, one-shotting a pretty large app is a somewhat doable task, and that's one of the reasons we introduced the interview step: exactly to let the model pull from you what it needs to know to be able to work autonomously, instead of you pushing a spec document into the model.


This is insane man, great work. Video within video blew my mind

Could you push it even further? Infini video zoom?

I keep zooming and it's just videos all the way down.


I think it would be technically possible, although with videos, a tiny change in pixels at 1000x zoom causes the whole screen to flash different sub-videos rapidly. An infinite photo-zooming effect would remain more consistent.


It absolutely can be a VM. Someone even got it running on a $2 ESP32. It's just making API calls.


^ AI slop

