Hacker News | crorella's comments

Welcome :D


The variety of tasks they can do and will be asked to do is too wide and dissimilar, so it will be very hard to have a transversal measurement. At most we will have area-specific consensus that model X or Y is better. It is like saying one person is the best coder at everything; that person does not exist.

Yeah, we're going to need benchmarks that cover a series of development steps for a particular language and measure how good each model is at each one.

Like can the model take your plan and ask the right questions where there appear to be holes.

How broad is its understanding of architecture and system design around your language.

How does it choose to use algorithms available in the language or common libraries.

How often does it hallucinate features/libraries that aren't there.

How does it perform as context gets larger.

And that's for one particular language.
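The dimensions above could be tracked as a simple per-language scorecard. A minimal sketch, where every dimension name and weight is hypothetical and purely illustrative, not from any real benchmark suite:

```python
from dataclasses import dataclass

# Hypothetical per-language benchmark scorecard; dimension names and
# weights are made up to illustrate the idea.
@dataclass
class CodingBenchmark:
    language: str
    plan_questioning: float      # asks the right questions about plan holes (0-1)
    architecture_breadth: float  # architecture/system-design understanding (0-1)
    library_choice: float        # picks idiomatic algorithms/libraries (0-1)
    hallucination_rate: float    # fraction of invented features/libraries (lower is better)
    long_context_score: float    # quality as context grows (0-1)

    def aggregate(self) -> float:
        # Equal weights for the positive dimensions; hallucination
        # rate scales the whole score down.
        positives = (self.plan_questioning + self.architecture_breadth +
                     self.library_choice + self.long_context_score) / 4
        return positives * (1 - self.hallucination_rate)

score = CodingBenchmark("python", 0.8, 0.7, 0.9, 0.1, 0.6).aggregate()
print(round(score, 3))
```

Running one scorecard per (model, language) pair would make the "best at Python, mediocre at Rust" distinction explicit instead of collapsing everything into a single leaderboard number.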


The thrill of competition

Same here! I think it would be good if the tooling generated this by default. I've seen others using SQL for the same purpose, and even a proposal for a succinct way of representing this handoff data as compactly as possible.
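As a rough illustration of what such a compact handoff record might look like, here is a sketch; every field name and value below is made up, not any real tool's format:

```python
import json

# Hypothetical session-handoff record for the next agent/session.
# All field names are invented for illustration.
handoff = {
    "goal": "add retry logic to the HTTP client",
    "done": ["wrote backoff helper", "unit tests pass"],
    "next": ["wire helper into request path", "update docs"],
    "files": ["client.py", "retry.py"],
    "decisions": ["max 5 retries, jittered backoff"],
}

# Most compact wire form: JSON with no extra whitespace.
blob = json.dumps(handoff, separators=(",", ":"))
print(blob)
```

The same record could just as easily live in a SQL table with one row per session, which is presumably what the SQL-based approaches mentioned above do.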


It’s like having 3 coins and users preferring one over the others when tossing them, because one coin consistently gives more heads (or tails) than the other coins.
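The coin analogy can be sketched in a few lines; the bias values here are arbitrary numbers chosen only to illustrate the point:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Three "coins" (models) with different head biases; over enough
# tosses, users notice which one consistently comes up heads.
biases = [0.50, 0.55, 0.65]
flips = 10_000
heads = [sum(random.random() < p for _ in range(flips)) for p in biases]
best = heads.index(max(heads))
print(best)
```

With enough flips the sampling noise washes out and the most biased coin always wins the comparison, which is exactly why users converge on a favorite model.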

What is better is to build a good set of rules, stick to one, and then refine those rules over time as you gain more experience using the tool, or if the tool evolves and diverges from the results you expect.


<< What is better is to build a good set of rules and

But, unless you are on a local model you control, you literally can't. Otherwise, good rules will work only as long as the next update allows. I will admit that makes me consider some other options, but those probably shouldn't be 'set and iterate' each time something changes.


What I had in mind when I added that comment was coding, with the use of .md files. For the web version of chats, I agree there is little control over how to tailor the way you want the agent to behave, unless you give an initial "setup" prompt.


At this rate, in a few months we will probably have some high-quality shorts entirely generated by this.


It's funny you mention this. I was just thinking the other day that we may eventually be in a future where a group hangout party could look like this:

1. Go to a friend's place
2. Usual drinks, whatever gets-you-going activity
3. Each person writes a prompt
4. Chain them together
5. Watch the resulting movie together

That sounds hilarious and I can't wait to try it.


I'm vaguely reminded of the excellent Jackbox game Tee Fury, in which players submit slogans for T shirts and "art" separately. Players then get to choose from a few options for slogans and designs to make T shirts which are voted on by the group.

I have fond memories of laughing until I was in tears when playing with a group of friends over drinks during the lockdowns in 2020. Something about the process just naturally results in hilarity (especially if you're in a group where you can be offensive).

It's like exquisite corpse for t-shirts. Or, in your case, shorts.


T shirt game is the best jackbox game!

Whenever one of my friend groups is gathered we always make it a point to do an exquisite corpse story on a piece of paper while we’re inebriated in some way xD Video version will be wild


It's seriously so good, in fact it's so good that every other Jackbox game is vaguely disappointing because nothing is half as fun as Tee Fury lol.


In a few months, we'll have some high-quality deepfakes used to ruin people's personal lives.


We'll see. I think we'll see a high-quality feature film first, though; shorts are notoriously difficult to pull off.


Personal experience here in a FAANG, there has been a considerable increase in:

1. Teams exploring how to leverage LLMs for coding.
2. Teams/orgs that already standardized some of the processes for working with LLMs (MCP servers, standardized creation of agents.md files, etc.).
3. Teams actively using it for coding new features, documenting code, increasing test coverage, code reviews, etc.

Again, personal experience, but in my team ~40-50% of the PRs are generated by Codex.


“Teams exploring how to leverage [AI] for [anything]” has been true for about a decade now in every large multinational company, at every level. It’s not new at all. AI has been the driving buzzword for a while now, even well before ChatGPT. I’ve encountered many people who just wanted the stamp that they use AI, no matter how, because my team was one of the main entry points for achieving this at that specific company. But before ChatGPT and co, you had to work for it a lot, so most of them failed miserably or immediately backtracked when they realized this.


I'm sure the MBA folks love stats like that - there's plenty that have infested big tech. I mean, Pichai is an MBA + McKinsey alumnus.

Ready for the impending layoff, fella?


There are places that offer Copilot to any team that wants it, and then behind the scenes they inform managers that if a team (1+ people) adopts it, it will have to shed 10%+ of human capacity (lose a person, move a person, fire a person) in the upcoming quarters next year.


Same, I had a great idea (and a decently detailed plan) to improve an open source project, but never had the time and willpower to dive into the code. With Codex it was one night to set it up and then slowly implement every step of what I had originally planned.


omg, this is something I've had in mind for quite some time, I even bought some I2S devices to test it out. Do you have some pointers on how to do it?

