Hacker News | crorella's comments

Welcome :D


The variety of tasks they can do and will be asked to do is too wide and dissimilar, so it will be very hard to have a transversal measurement. At most we will have area-specific consensus that model X or Y is better. It is like saying one person is the best coder at everything; that person does not exist.

Yeah, we're going to need benchmarks that cover a series of development steps for a particular language and measure how good each model is at each one.

Like can the model take your plan and ask the right questions where there appear to be holes.

How broad is its understanding of architecture and system design around your language.

How does it choose to use algorithms available in the language or common libraries.

How often does it hallucinate features/libraries that aren't there.

How does it perform as context gets larger.

And that's for one particular language.
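The dimensions above could be tracked as a simple per-language scorecard. A minimal sketch, where every dimension name and weight is hypothetical and purely illustrative, not from any real benchmark suite:

```python
from dataclasses import dataclass

# Hypothetical per-language benchmark scorecard; dimension names and
# weights are made up to illustrate the idea.
@dataclass
class CodingBenchmark:
    language: str
    plan_questioning: float      # asks the right questions about plan holes (0-1)
    architecture_breadth: float  # architecture/system-design understanding (0-1)
    library_choice: float        # picks idiomatic algorithms/libraries (0-1)
    hallucination_rate: float    # fraction of invented features/libraries (lower is better)
    long_context_score: float    # quality as context grows (0-1)

    def aggregate(self) -> float:
        # Equal weights for the positive dimensions; hallucination
        # rate scales the whole score down.
        positives = (self.plan_questioning + self.architecture_breadth +
                     self.library_choice + self.long_context_score) / 4
        return positives * (1 - self.hallucination_rate)

score = CodingBenchmark("python", 0.8, 0.7, 0.9, 0.1, 0.6).aggregate()
print(round(score, 3))
```

Running one scorecard per (model, language) pair would make the "best at Python, mediocre at Rust" distinction explicit instead of collapsing everything into a single leaderboard number.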


The thrill of competition

Same here! I think it would be good if the tooling generated this by default. I've seen others using SQL for the same purpose, and even a proposal for a succinct way of representing this handoff data as compactly as possible.
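As a rough illustration of what such a compact handoff record might look like, here is a sketch; every field name and value below is made up, not any real tool's format:

```python
import json

# Hypothetical session-handoff record for the next agent/session.
# All field names are invented for illustration.
handoff = {
    "goal": "add retry logic to the HTTP client",
    "done": ["wrote backoff helper", "unit tests pass"],
    "next": ["wire helper into request path", "update docs"],
    "files": ["client.py", "retry.py"],
    "decisions": ["max 5 retries, jittered backoff"],
}

# Most compact wire form: JSON with no extra whitespace.
blob = json.dumps(handoff, separators=(",", ":"))
print(blob)
```

The same record could just as easily live in a SQL table with one row per session, which is presumably what the SQL-based approaches mentioned above do.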


It’s like having 3 coins and users preferring one over the others when tossing them, because one coin consistently gives more heads (or tails) than the other coins.
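The coin analogy can be sketched in a few lines; the bias values here are arbitrary numbers chosen only to illustrate the point:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Three "coins" (models) with different head biases; over enough
# tosses, users notice which one consistently comes up heads.
biases = [0.50, 0.55, 0.65]
flips = 10_000
heads = [sum(random.random() < p for _ in range(flips)) for p in biases]
best = heads.index(max(heads))
print(best)
```

With enough flips the sampling noise washes out and the most biased coin always wins the comparison, which is exactly why users converge on a favorite model.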

What is better is to build a good set of rules, stick to one, and then refine those rules over time as you gain more experience using the tool, or if the tool evolves and diverges from the results you expect.


<< What is better is to build a good set of rules and

But, unless you are on a local model you control, you literally can't. Otherwise, good rules will work only as long as the next update allows. I will admit that makes me consider some other options, but those probably shouldn't be 'set and iterate' each time something changes.


What I had in mind when I added that comment was coding, with the use of .md files. For the web version of chats, I agree there is little control over how to tailor the way you want the agent to behave, unless you give an initial "setup" prompt.


At this rate, in a few months we will probably have some high-quality shorts entirely generated by this.


It's funny you mention this. I was just thinking the other day that we may eventually be in a future where a group hangout party could look like this:

1. Go to a friend's place
2. Usual drinks, whatever gets-you-going activity
3. Each person writes a prompt
4. Chain them together
5. Watch the resulting movie together

That sounds hilarious and I can't wait to try it.


I'm vaguely reminded of the excellent Jackbox game Tee Fury, in which players submit slogans for T shirts and "art" separately. Players then get to choose from a few options for slogans and designs to make T shirts which are voted on by the group.

I have fond memories of laughing until I was in tears when playing with a group of friends over drinks during the lockdowns in 2020. Something about the process just naturally results in hilarity (especially if you're in a group where you can be offensive).

It's like exquisite corpse for t-shirts. Or, in your case, shorts.


T shirt game is the best jackbox game!

Whenever one of my friend groups is gathered we always make it a point to do an exquisite corpse story on a piece of paper while we’re inebriated in some way xD Video version will be wild


It's seriously so good, in fact it's so good that every other Jackbox game is vaguely disappointing because nothing is half as fun as Tee Fury lol.


In a few months, we'll have some high-quality deepfakes used to ruin people's personal lives.


We'll see. I think we'll see a high-quality feature film first, though; shorts are notoriously difficult to pull off.


Personal experience here in a FAANG, there has been a considerable increase in:

1. Teams exploring how to leverage LLMs for coding.
2. Teams/orgs that already standardized some of the processes for working with LLMs (MCP servers, standardized creation of agents.md files, etc.).
3. Teams actively using it for coding new features, documenting code, increasing test coverage, code reviews, etc.

Again, personal experience, but in my team ~40-50% of the PRs are generated by Codex.


“Teams exploring how to leverage [AI] for [anything]” has been true for about a decade now in every large multinational company, at every level. It’s not new at all. AI has been the driving buzzword for a while now, even well before ChatGPT. I’ve encountered many people who just wanted the stamp that they use AI, no matter how, because my team was one of the main entry points for achieving this at that specific company. But before ChatGPT and co, you had to work for it a lot, so most of them failed miserably or immediately backtracked when they realized this.


I'm sure the MBA folks love stats like that - there's plenty that have infested big tech. I mean, Pichai is an MBA + McKinsey alumnus.

Ready for the impending layoff, fella?


There are places that offer Copilot to any team that wants it, and then behind the scenes they inform managers that if a team (1+ people) adopts it, it will have to shed 10%+ of human capacity (lose a person, move a person, fire a person) in the upcoming quarters next year.


Same, I had a great idea (and a decently detailed plan) to improve an open source project, but never had the time and willpower to dive into the code. With Codex it was one night to set it up and then slowly implement every step of what I had originally planned.


omg, this is something I've had in mind for quite some time, I even bought some I2S devices to test it out. Do you have some pointers on how to do it?

