The variety of tasks they can do and will be asked to do is too wide and dissimilar, so it will be very hard to have a single cross-cutting measurement. At most we will have area-specific consensus that model X or Y is better. It is like saying one person is the best coder at everything; that does not exist.
Same here! I think it would be good if the tooling did this by default. I've seen others use SQL for the same purpose, and even a proposal for representing this handoff data as compactly as possible.
It’s like having 3 coins and users preferring one or the other when tossing them because one coin consistently gives more heads (or tails) than the others.
What is better is to build a good set of rules and stick to one, then refine those rules over time as you get more experience using the tool, or as the tool evolves and its results drift from what you expect.
<< What is better is to build a good set of rules and
But, unless you are on a local model you control, you literally can't. Otherwise, good rules will work only as long as the next update allows. I will admit that makes me consider some other options, but those probably shouldn't be 'set and iterate' each time something changes.
What I had in mind when I added that comment was coding, with the use of .md files.
For the web version of chats, I agree there is little control over how the agent behaves, unless you give an initial "setup" prompt.
It's funny you mention this, I was just thinking the other day that we may eventually be in a future where a group hangout party could look like this:
1. Go to a friend's place
2. Usual drinks, or whatever activity gets you going
3. Each person writes a prompt
4. Chain them together
5. Watch the resulting movie together
I'm vaguely reminded of the excellent Jackbox game Tee K.O., in which players submit slogans for T-shirts and "art" separately. Players then get to choose from a few options for slogans and designs to make T-shirts, which are voted on by the group.
I have fond memories of laughing until I was in tears when playing with a group of friends over drinks during the lockdowns in 2020. Something about the process just naturally results in hilarity (especially if you're in a group where you can be offensive).
It's like exquisite corpse for t-shirts. Or, in your case, shorts.
Whenever one of my friend groups is gathered, we always make it a point to do an exquisite corpse story on a piece of paper while we’re inebriated in some way xD. The video version will be wild.
Personal experience here at a FAANG: there has been a considerable increase in:
1. Teams exploring how to leverage LLMs for coding.
2. Teams/orgs that have already standardized some of the processes for working with LLMs (MCP servers, standardized creation of agents.md files, etc.)
3. Teams actively using it for coding new features, documenting code, increasing test coverage, doing code reviews, etc.
Again, personal experience, but in my team ~40-50% of the PRs are generated by Codex.
“Teams exploring how to leverage [AI]s for [anything]” has been true for about a decade now at every large multinational company, at every level. It’s not new at all. AI has been the driving buzzword for a while now, even well before ChatGPT. I’ve encountered many people who just wanted the stamp that they use AI, no matter how, because my team was one of the main entry points for achieving this at that specific company. But before ChatGPT and co, you had to work for it a lot, so most of them failed miserably, or immediately backtracked when they realized this.
There are places that offer Copilot to any team that wants it, and then behind the scenes inform the managers that if the team (1+ people) adopts it, they will have to shed 10%+ of human capacity (lose a person, move a person, fire a person) in the upcoming quarters of next year.
Same. I had a great idea (and a decently detailed plan) to improve an open source project, but never had the time and willpower to dive into the code. With Codex it took one night to set it up, and then I slowly implemented every step of what I had originally planned.