The o1 model is really remarkable. I was able to get very significant speedups to my already highly optimized Rust code in my fast vector similarity project, all verified with careful benchmarking and validation of correctness.
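
To give a sense of what "verified with careful benchmarking and validation of correctness" means in practice: every optimized routine had to agree with a naive reference implementation to within a tight tolerance before I'd accept it. Here's a minimal sketch of that pattern (the function names and the fused-loop "optimization" are my own illustration, not the library's actual code):

    // Reference: textbook cosine similarity in three separate passes.
    fn cosine_reference(a: &[f64], b: &[f64]) -> f64 {
        let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
        let nb: f64 = b.iter().map(|y| y * y).sum::<f64>().sqrt();
        dot / (na * nb)
    }

    // "Optimized" variant: one fused pass that accumulates all three
    // sums together, staying in cache and auto-vectorizing well.
    fn cosine_fused(a: &[f64], b: &[f64]) -> f64 {
        let (mut dot, mut na, mut nb) = (0.0, 0.0, 0.0);
        for (x, y) in a.iter().zip(b) {
            dot += x * y;
            na += x * x;
            nb += y * y;
        }
        dot / (na.sqrt() * nb.sqrt())
    }

    #[test]
    fn fused_matches_reference() {
        // Deterministic inputs so any failure is reproducible.
        let a: Vec<f64> = (0..15_000).map(|i| (i as f64 * 0.37).sin()).collect();
        let b: Vec<f64> = (0..15_000).map(|i| (i as f64 * 0.11).cos()).collect();
        let (fast, slow) = (cosine_fused(&a, &b), cosine_reference(&a, &b));
        assert!((fast - slow).abs() < 1e-9, "fast={fast} slow={slow}");
    }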

Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I originally tried to include in the library but couldn’t find an approach fast enough for large vectors (say, 15,000 dimensions and up).
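
To make the shape of those two measures concrete, here's a heavily simplified sketch (my own illustration, with made-up function names and naive equal-width binning, not the actual code from the repo): treat the two vectors as paired samples, build a joint histogram, and then compare the joint distribution against the product of its marginals.

    fn min_max(v: &[f64]) -> (f64, f64) {
        v.iter().fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &x| {
            (lo.min(x), hi.max(x))
        })
    }

    // Map each value to an equal-width bin index in [0, bins).
    fn bin_indices(v: &[f64], bins: usize) -> Vec<usize> {
        let (lo, hi) = min_max(v);
        let width = (hi - lo).max(f64::EPSILON);
        v.iter()
            .map(|&x| (((x - lo) / width) * bins as f64).min(bins as f64 - 1.0) as usize)
            .collect()
    }

    // Joint probability table p(x, y), flattened row-major.
    fn joint_dist(x: &[f64], y: &[f64], bins: usize) -> Vec<f64> {
        assert_eq!(x.len(), y.len());
        let (xi, yi) = (bin_indices(x, bins), bin_indices(y, bins));
        let mut p = vec![0.0; bins * bins];
        for (&i, &j) in xi.iter().zip(yi.iter()) {
            p[i * bins + j] += 1.0 / x.len() as f64;
        }
        p
    }

    // Shannon entropy in nats.
    fn entropy(p: &[f64]) -> f64 {
        p.iter().filter(|&&q| q > 0.0).map(|&q| -q * q.ln()).sum()
    }

    fn marginals(pxy: &[f64], bins: usize) -> (Vec<f64>, Vec<f64>) {
        let (mut px, mut py) = (vec![0.0; bins], vec![0.0; bins]);
        for i in 0..bins {
            for j in 0..bins {
                px[i] += pxy[i * bins + j];
                py[j] += pxy[i * bins + j];
            }
        }
        (px, py)
    }

    // Jensen-Shannon divergence between two distributions (in nats).
    fn jsd(p: &[f64], q: &[f64]) -> f64 {
        let m: Vec<f64> = p.iter().zip(q).map(|(&a, &b)| 0.5 * (a + b)).collect();
        entropy(&m) - 0.5 * (entropy(p) + entropy(q))
    }

    // Dependency score: JSD between the joint distribution and the
    // product of its marginals, divided by ln(2) so it lies in [0, 1].
    // Zero means the binned variables look independent.
    fn jsd_dependency(x: &[f64], y: &[f64], bins: usize) -> f64 {
        let pxy = joint_dist(x, y, bins);
        let (px, py) = marginals(&pxy, bins);
        let mut indep = vec![0.0; bins * bins];
        for i in 0..bins {
            for j in 0..bins {
                indep[i * bins + j] = px[i] * py[j];
            }
        }
        jsd(&pxy, &indep) / std::f64::consts::LN_2
    }

    // NMI = I(X;Y) / sqrt(H(X) * H(Y)), built from the same kind of
    // joint histogram, so the two measures share most of their work.
    fn normalized_mutual_information(x: &[f64], y: &[f64], bins: usize) -> f64 {
        let pxy = joint_dist(x, y, bins);
        let (px, py) = marginals(&pxy, bins);
        let (hx, hy, hxy) = (entropy(&px), entropy(&py), entropy(&pxy));
        (hx + hy - hxy).max(0.0) / (hx * hy).sqrt().max(f64::EPSILON)
    }

    fn main() {
        // Strongly dependent pair: y is a noisy monotone function of x.
        let x: Vec<f64> = (0..10_000).map(|i| i as f64 / 10_000.0).collect();
        let y: Vec<f64> = x.iter().map(|&v| v * v + 0.01 * (v * 123.4).sin()).collect();
        println!("JSD dependency: {:.4}", jsd_dependency(&x, &y, 64));
        println!("NMI:            {:.4}", normalized_mutual_information(&x, &y, 64));
    }

The real implementations in the diff are far more involved; this just shows the underlying idea of scoring dependency as the divergence between the joint distribution and what independence would predict.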

While it wasn’t able to produce perfect Rust code that compiled on the very first try, it fixed all the bugs in one more attempt after I pasted in the compiler errors and warnings from VS Code. In contrast, GPT-4o would usually take dozens of tries to fix all the type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude 3.5 Sonnet is just plain stupid when it comes to Rust for some reason.

I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard-pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation).

And it’s not just the performance optimization and relatively bug-free code; it’s the creative problem solving and the synthesis of a huge amount of core mathematical and algorithmic knowledge, plus contemporary research results, combined with a strong ability to understand what you’re trying to accomplish and make it happen.

Here is the diff to the code file showing the changes:

https://github.com/Dicklesworthstone/fast_vector_similarity/...



But a lot of what you pay humans $500k a year for is to work with enormous existing systems that an LLM cannot understand just yet. Optimizing small libraries and implementing fast functions, though, is a huge addition to any programmer's toolbox.


Yes, that’s certainly true, and that’s why I selected that library in particular to try with it. The fact that it’s mathematical (not many lines of code, but each line packs a lot of punch and requires careful thought to optimize) makes it a perfect test bed for this model. For larger projects that are simpler, you’re probably better off with Claude 3.5 Sonnet, since it has double the context window.


Can’t Gemini work with a million+ input tokens?


Yes, but its reasoning ability is extremely poor in my experience with real-world programming tasks. I’m talking about stuff that Claude 3.5 Sonnet handles easily, and that GPT-4o can also handle if it fits in its smaller context window, where Gemini 1.5 Pro just completely fails.

Bigger context is definitely helpful, but not if it comes at the expense of reasoning/analytical ability. I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.


> I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.

How do you mean?

Half of writing code within a codebase is knowing what functions already exist in the codebase for you to call in your own code, and/or what code you'll have to change upstream and downstream of the code you're modifying within the same codebase (or even by forking your dependencies and changing them) to get what you want to happen, to happen.

And half of, say, writing a longform novel is knowing all the promises you've made to the reader, the active Chekhov's guns, and all the other constraints you placed on yourself hundreds of pages, or even several books, ago that just became relevant again as of this very sentence. Or, moreover, knowing which of those details it's the proper time to make relevant again for maximum impact and proper first-in-last-out narrative bridging structure.

In both cases, these aren't really literal "needle in a haystack" stress tests; they should properly be tests of the model's ability to perform some kind of "associational priority indexing" on the context, allowing it to build concepts into associational subnetworks and then make long-distance associations where the nodes are entire subnetworks. (Which isn't something we really see yet, in any model.)


Yes, agreed. I wasn’t trying to say it’s totally useless, just that it’s not as helpful as synthesizing all that context intelligently would be. It’s more of a parlor trick, though that trick can be handy if you need something like that. Really, the main issue with Gemini is that it’s simply not very smart compared to the competition, and the big context doesn’t make up for that in the slightest.


It doesn't work well, though. You can't just stuff your entire codebase into it and get good results. I work somewhere that tries to do this internally.


> 1,337 additions

cough


> you would be hard-pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation).

And now we have a $number we can relate, and refer, to.



