
What do you mean boring? MOFs are a fascinating area of chemistry. Outside of nature, they are most likely our best example of rationally designed nanoscale systems. In chemistry, rational design, meaning design based on rules, is a rare thing. Molecules bump around and stick together in unpredictable ways, but MOFs allow us to create very well defined nanoscale frameworks. They’re famously tricky, though!


Ok so I am always interested in these papers as a chemist. Often, we find that LLMs are terrible at chemistry. This is because the lived experience of a chemist is fundamentally different from the education they receive. Often, a master's student takes 6 months to become productive at research in a new subfield. A PhD student, around 3 months.

Most chemists will begin to develop an intuition. This is where the issues develop.

This intuition is a combination of the chemist's mental model and how the sensory environment stimulates it. As a polymer chemist, in a certain system a brown colour might mean I'm seeing scattering, hence particles. My system is supposed to be homogeneous, so I bin the reaction.

It is well known that good grades don’t make good researchers. That’s because researchers aren’t doing rote recall.

So the issue is this: we ask the LLM how many proton environments are in this NMR?

We should ask: I’m intercalating Li into a perovskite using BuLi. Why does the solution turn pink?


I think a huge reason why LLMs are so far ahead in programming is because programming exists entirely in a known and totally severed digital environment outside our own. To become a master programmer all you need is a laptop and an internet connection. The nature of it existing entirely in a parallel digital universe just lends itself perfectly to training.

All of that is to say that I don't think the classic engineering fields have some kind of knowledge or intuition that is truly inaccessible to LLMs, I just think that it is in a form that is too difficult right now to train on. However if you could train a model on them, I strongly suspect they would get to the same level they are at today with software.


> I think a huge reason why LLMs are so far ahead in programming

Are they? Last time I checked (couple of seconds ago), they still made silly mistakes and hallucinated wildly.

Example: https://imgur.com/a/Cj2y8km (AI teaching me about the Coltrane operator, that obviously does not exist).


You're using the worst model when it comes to programming, not sure what point you're trying to prove here. That's why when someone starts ranting about how useless AI models are when it comes to coding, I always assume they're just using inferior models.


My question was very simple. Suitable for a simpler model.

I can come up with prompts that make better models hallucinate (see post below).

I don't understand your objection. This is a known fact: LLMs hallucinate shit regardless of the model size.


LLMs are getting better. Are you?

Nothing matters in this business except the first couple of time derivatives.


Maybe I'm not.

However, I'm discussing this within the context of the study presented in the paper, not some future yet-to-be-achieved performance expectation.

If we step outside the context of the paper (not advised), I think any average developer is better than an LLM at energy efficiency. LLMs cheat by consuming more resources than a human. "Better" is quite relative. So let's keep this reasonable.


Are you intentionally sandbagging the LLMs to prove a point, or do you really think 4o-mini is good enough for programming?

Even 2.5 flash easily gets this https://imgur.com/a/OfW30eL


The point is that I can make them hallucinate quite easily. And they don't demonstrate knowing their own limitations.

For example, 2.5 Flash fails to explain the difference between the short ternary operator (null coalescing) and the Elvis operator.

https://imgur.com/a/xKjuoqV

Even when I specify a language (therefore clearing the confusion, supposedly), it still fails to even recognize the Elvis operator by its toupee, and mixes up the explanation (it doesn't even understand what I asked).

https://imgur.com/a/itr87hM

So, the point I'm trying to make is that they're not any better for programming than they are for chemistry.


Flash is the wrong model for questions like that -- not that you care -- but if you'd like to share the actual prompt you gave it, I'll try it in 2.5 Pro.


"explain me the difference between the short ternary operator and the Elvis operator"

When it failed, I replied: "in PHP".

You don't seem to understand what I'm trying to say and instead are trying to defend LLMs for a fault that is well known in the industry at large.

I'm sure that in short time, I could make 2.5 Pro hallucinate as well. If not on this question, on others.

This behavior is in line with the paper's conclusions:

> many models are not able to reliably estimate their own limitations.

(see Figure 3, they tested a variety of models of different qualities).

This is the kind of question a junior developer can answer with simple Google searches, or by reading the PHP manual, or just by testing it in a REPL. Why do we need a fancy model in order to answer such a simple inquiry? Would a beginner know that the answer is incorrect and that they should use a different model?

Also, from the paper:

> For very relevant topics, the answers that models provide are wrong.

> Given that the models outperformed the average human in our study, we need to rethink how we teach and examine chemistry.

That's true for programming as well. It outperforms the average human, but then it makes silly mistakes that could confuse beginners. It displays confidence in being plain wrong.

The study also used manually curated questions for evaluation, so my prompt is not some dirty trick. It's totally in line with the context of this discussion.


It's better than it was a year ago, as you'd have discovered for yourself if you used current models. Nothing else matters.

See if this looks any better (I don't know PHP): https://g.co/gemini/share/7849517fdb89

If it doesn't, what specifically is incorrect?


What I expect from a human is to ask "in which language?", because it makes a difference. If no language was supplied, I expect a brief summary of null coalescing and shorthand ternary options with useful examples in the most popular languages.

--

The JavaScript example should have mentioned the use of `||` (the or operator) to achieve the same effect as a shorthand ternary. It's common knowledge.

In PHP specifically, `??` allows you to null coalesce array keys and other types of complex objects. You don't need to write `isset($arr[1]) ? $arr[1] : "ipsum"`, you can just `$arr[1] ?? "ipsum"`. TypeScript has it too and I would expect anyone answering about JavaScript to mention that, since it's highly relevant for the ecosystem.

Also in PHP, there is the `?:` that is similar to what `||` does in JavaScript in an assignment context, but due to type juggling, it can act as a null coalesce operator too (although not for arrays or complex types).

The PHP example they present, therefore, is plain wrong and would lead to a warning for trying to access an unset array key. Something that the `??` operator (not mentioned in the response) would solve.

I would go as far as explaining null-conditional accessors as well: `$foo?->bar` or `foo?.bar`. Those are often colloquially called Elvis operators and fall within the same overall problem-solving category.

The LLM answer is a dangerous mix of incomplete and wrong. It could lead a beginner to adopt an old bad practice, or leave a beginner without a more thorough explanation. Worst of all, the LLM makes those mistakes with confidence.
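To make that concrete, here is a minimal PHP sketch of the three operators being discussed (assuming PHP 8 for the nullsafe accessor; the variable names and values are made up purely for illustration):

```php
<?php
$arr = ["lorem"];   // $arr[1] is intentionally not set
$empty = "";

// Null coalescing (??): falls back when the left side is null or unset,
// and accessing the missing array key raises no warning.
echo $arr[1] ?? "ipsum";      // prints "ipsum"

// Shorthand ternary / Elvis (?:): falls back on any falsy value, but
// writing $arr[1] ?: "ipsum" would still emit an "Undefined array key" warning.
echo $empty ?: "fallback";    // prints "fallback" (empty string is falsy)

// Nullsafe accessor (?->), PHP 8+: short-circuits to null instead of erroring.
$foo = null;
var_dump($foo?->bar);         // NULL
```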

--

What I think is going on is that null handling is such a basic task that programmers learn it in the first few years of their careers and almost never write about it. There's no need to. I'm sure a code-completion LLM can code using those operators effectively, but LLMs cannot talk about them consistently. They'll only get better at it if we get better at it, and we often don't need to write about it.

In this particular Elvis operator case, there has been no significant improvement in the correctness of the answer in __more than 2 whole years__. Samples from ChatGPT in 2023 (note my image date): https://imgur.com/UztTTYQ https://imgur.com/nsqY2rH.

So, _for some things_, contrary to what you suggested before, LLMs are not getting that much better.


Having read the reply in 2.5 Pro, I have to agree with you there. I'm surprised it whiffed on those details. They are fairly basic and rather important. It could have provided a better answer (I fed your reply back to it at https://g.co/gemini/share/7f87b5e9d699 ), but it did a crappy job deciding what to include in its initial response.

I don't agree that you can pick one cherry-picked example and use it to illustrate anything about the progress of the models in general, though. There are far too many counterexamples to enumerate.

(Actually I suspect what will happen is that we'll change the way we write documentation to make it easy for LLMs to assimilate. I know I'm already doing that myself.)


> I don't agree that you can pick one cherry example

Benchmarks and evaluations are made of cherry-picked examples. What makes my example invalid, and benchmark prompts valid? (It's a rhetorical question, you don't need to answer.)

> write documentation to make it easy for LLMs to assimilate.

If we ever do that, it means LLMs failed at their job. They are supposed to help and understand us, not the other way around.


> If we ever do that, it means LLMs failed at their job. They are supposed to help and understand us, not the other way around.

If you buy into the whole AGI thing, I guess so, but I don't. We don't have a good definition of intelligence, so it's a meaningless question.

We do know how to make and use tools, though. And we know that all tools, especially the most powerful and/or hazardous ones, reward the work and care that we put into using them. Further, we know that tool use is a skill, and that some people are much better at it than others.

> What makes my example invalid, and benchmark prompts valid?

Your example is a valid case of something that doesn't work perfectly. We didn't exactly need to invent AI to come up with something that didn't work perfectly. I have examples of using LLMs to generate working, useful code in advanced, specialized disciplines, code that I frankly don't understand myself and couldn't have written without months of study, but that I can validate.

Just one of those examples is worth a thousand examples like yours, in my book. I can now do things that were simply impossible for me before. It would take some nerve to demand godlike perfection on top of that, or to demand useful results with little or no effort on my part.


> We do know how to make and use tools

It's the same principle. A tool is supposed to assist us, not the other way around.

An LLM, "AGI magic" or not, is supposed to write for me. It's a tool that writes for me. If I am writing for the tool, there's something wrong with it.

> I have examples [...] Just one of those examples is worth a thousand examples like yours

Please, share them. I shared my example. It can be a very small "bug report", but it's real and reproducible. Other people can build on it, either to improve their "tool skills" or to improve LLMs themselves.

An example that is shared is worth much more than an anecdote.


It's hard to get too specific without running afoul of NDAs and such, since most of my work is for one customer or another, but the case that really blew me away was when I needed to find a way to correct an oscillator that had inherent stability problems due to a combination of a very good crystal and very poor thermal engineering on the OEM's part. The customer uses a lot of these oscillators, and they are a massive pain point in production test because they often perform so much worse than they should.

I started out brainstorming with o1-pro, trying to come up with ways to anticipate drift on multiple timescales, from multiple influences with differing lag times, and correct it using temperature trends measured a couple of inches away on a different component. It basically said, "Here, train this LSTM model to predict your drift observations from your observed temperature," and spewed out a bunch of cryptic-looking PyTorch code. It would have been familiar enough to an ML engineer, I'm sure, but it was pretty much Perl to me.

I was like, Okaaaaayyy....? but I tried it anyway, suggested hyperparameters and all, and it was a real road-to-Damascus moment. Again, I can't share the plots and they wouldn't make sense anyway without a lot of explanation, but the outcome of my initial tests was freakishly good.

Another model proved to be able to translate the Python to straight C for use by the onboard controller, which was no mean feat in itself (and also allowed me to review it myself), and now that problem is just gone. Basically for free. It was a ridiculous, silly thing to try, and it worked.

When this tech gets another 10x better, the customer won't need me anymore... and that is fucking awesome.


I too have all sorts of secret stuff that I wouldn't share. I'm not asking for that. Isolating and reproducing example behavior is different from sharing your whole work.

> It would have been familiar enough to an ML engineer, I'm sure, but it was pretty much Perl to me.

How can you be sure that the solution doesn't have obvious mistakes that an ML engineer would spot right away?

> When this tech gets another 10x better

A chainsaw is way better than a regular saw, but it's also more dangerous. Learning to use it can be fun. Learning not to cut your toes is also important.

I am looking for ways in which LLMs could potentially cut people's toes.

I know you don't want to hear that your favorite tool can backfire, and you're still skeptical despite having experienced firsthand the example I gave you. However, I was still hopeful that you could understand my point.


They aren't getting any better at programming, so they naturally assume the LLMs aren't, either.


>the lived experience of a chemist is fundamentally different from the education they receive. Most chemists will begin to develop an intuition.

Is this a documentation problem? The LLMs are only trained on what is written down. Seems to track with another comment further down quoting:

"Models are limited in ability to answer knowledge-intensive questions, probably because the required knowledge cannot easily be accessed via papers but rather by lookup in specialized databases, which the humans used to answer such questions"


>using BuLi. Why does the solution turn pink?

I would say odds are it's because of an impurity. My first guess might be the solvent, if there is more in action than reagents or reactants. Maybe it could be confirmed or ruled out by some carefully planned filtration beforehand, which might not even be that difficult. I doubt I would try much further than that unless it was a bad problem.

Although, for instance, an alternative simple purification like distillation is pretty much routine for getting colorless material from pure aniline, and that's some pretty rough stuff to handle.

Now, I was once a young chemist facing AI. I ended up highly focused on going forward in ways that would not be "taken over" by AI, and I knew I couldn't be slow or a recession might still catch up with me, plus the 1990s were approaching fast ;)

By the mid-1990s I figured there was no way the stuff they have in this paper had not been well investigated.

I always knew it would take people that had way more megabytes than I could afford.

Sheesh, did I overestimate the progress people were making when I wasn't looking.


Just out of curiosity (not knowing anything about butyllithium other than what I've read on 'Things I Won't Work With'), is this answer from o3-pro even close?

https://chatgpt.com/share/685041db-c324-800b-afc6-5cb2c5ef31...


Loads of shit in the basement of the chemistry department, physics dept., etc. There are quite a few lead sarcophagi that we've labelled no-go, ha.


I don't remember a basement in the Oliver Lodge building, but I don't think there was much that was highly active after the tandem went, other than small short-lived sources made in the University's research reactor.


Now repeat this study for meditators. I used to vocalise, and the total volume of internal vocalisation decreased with increasing practice. I have a high level of verbal intelligence, so I wonder what impact this has had.


You are completely right. I am a chemist, and this isn't a self-indulgent rant, but there are those who “get” chemistry and those who don’t. We can teach and train a chemist to work in a lab - but one who groks it? Difficult to create. Sadly, there are scant opportunities for glory in chemistry. Salaries are usually low, issues with mental health are rampant, and it’s generally a career of high suffering (for a white-collar role). Many of us regret our choice, because we all feel like Walter White, funnily enough. Talented, but little to show for it. Most of us don’t start cooking, though.


> Salaries are usually low, issues with mental health are rampant, and it’s generally a career of high suffering. (For a white collar role)

Are you specifically referring to grad school and work in academia, or is this very location specific for people who've started their career? Because I know tons of chemists who went into industry after their Ph.D., and they earn on the high side of STEM degrees overall.

Pharma, polymer producers, chemical bulk goods, petro, ... They all pay 6-figure salaries before mid-career. Of course it's not FAANG, but it's very comfortable.

So either my chemistry friends purposely got skills in grad school that transfer well into industry, or the German language region has unusually strong pharma/chemical industry.


I would say your last point. The chemical field is of extraordinarily high status in Germanic countries. It's like math for Francophones last century (people learning the language for the field). Merkel? Angewandte Chemie? BASF? Not sure if that will last, given that a lot of it is based on fossil fuels.


> We can teach and train a chemist to work in a lab - but one who groks it?

This happens a lot in mature fields. Mechanics generally can't design cars. Doctors generally can't come up with drugs. IT staff can't generally write software. Pilots can't generally design aeroplanes. Homeowners can't generally build houses. The operator/builder split is real!


Aren't your examples also true about immature fields?

Very few doctors have ever come up with drugs.

Few of the early pilots after the Wrights designed their own airplanes, but airplanes were certainly not mature by 1912.

When did home building change from an immature field to a mature one? I struggle to think of when most homeowners built houses.

Saying "IT staff can't generally write software" sounds like saying "sailors can't generally pilot a large vessel" - both are specialized abilities in a larger field.


> Saying "IT staff can't generally write software" sounds like saying "sailors can't generally pilot a large vessel" - both are specialized abilities in a larger field.

I'm not saying they aren't. That's why I said that there's a gap between build and operate.


It still has nothing to do with the maturity of the field.

How many astronauts could design a rocket that could get to space? I believe the answer is zero, irrespective of maturity.


Yeah, that’s it.

I didn’t "get" chemistry back in my school days, and it was a real frustration because, well, of course I had bad grades, but what saddened me was that I still found the topic very interesting, and I loved physics, which I was pretty good at, and I could see how the two were magically interconnected. But I failed to grasp the "logic" behind the system.

That’s truly one of my regrets, because chemistry is probably one of the most important fundamental sciences for humanity’s future.

But that’s ok, I love computer science too and I did "get" it (mostly). Thanks for the awesome semiconductors, chemists!


The way chemistry is taught in schools isn't very logical in a lot of ways; it's based on heuristics. It's mostly some empirical rules, but it feels like you have more exceptions to those rules than real applications. The reasons are that 1) each molecule is a complex and complicated quantum mechanical system and 2) each observable unit of chemistry (one or more substances and their transitions in reactions or changes of state) is a thermodynamic system with complex statistics. High schoolers, not unlike alchemists, lack the math to describe these problems, so they are taught heuristics that are useful for understanding a lot (but not all) of everyday chemistry.


Your phrase “logic behind the system” resonates with me. I loved and thrived in organic chemistry, but it took hours and hours of sitting and working through syntheses over and over, pulling books off library shelves looking for more examples. When things clicked it was beautiful. Many of my pre-med classmates were determined to memorize their way out of it, and I pleaded with them that it was impossible. It’s not a list of parameters; it is a way of thinking and more complex pattern recognition. They were likely busy with other pre-med stuff and had to allocate time elsewhere. I ended up pursuing microbiology/molecular biology, as I didn’t have the financial runway to switch or expand majors at that point in my life. No regrets, but I loved the logic of chemistry, and doing both probably could have led to some cool things.


So basically half of this is an indirect way to practice the skills of meditation?


I was taught by J. M. Barrie’s nephew. A wonderful person that had a great impact on my desire to become a scientist despite my earlier failings and shortcomings. He was so well regarded.


Or you could just drop their coffee mug…?


Had the same issue on Reddit recently. Toxibaccin was discovered in soils in 2018. Hell, I know someone who developed one recently at a university. It’s AMAZING AI can do this, but it certainly doesn’t destroy prior approaches. Yet.

Disclaimer: spelling and name of antibiotic may be incorrect. It is not my field, and the spelling is phonetic from myself. If pressed, I will ask the scientist for the specific name.


I expect you are referencing https://en.wikipedia.org/wiki/Teixobactin


Discovered in some random dirt from Maine.


"Discovered" like penicillin? It irks me when it's presented as accidental, when it does go through testing after


Penicillin was also accidental? I don't understand the point.


Maybe he conflates discovery with accident. Materials science searches chemical space and makes discoveries. There is no accident in that kind of discovery?


The author is leveraging mental inflexibility to generate an emotional response of denial. Sure, his points are correct, but they are constrained. Let’s remove two constraints and reevaluate:

1 - Babies learn much more with much less

2 - Video training data can be made in theory at incredible rates

The question becomes: why is the author focusing on AI approaches investigated in, like, 2012? Does the author think SOTA is text only? Are OpenAI or other market leaders only focusing on text? Probably not.


Isn't 1 a point for their "skeptic" persona?

If babies learn much more from much less, isn't that evidence that the LLM approach isn't as efficient as whatever approach humans implement biologically, so it's likely LLM processes won't "scale to AGI"?

For video data, that's not how LLMs work (or any NNs, for that matter). You have to train them on what you want them to look at, so if you want them to predict the next token of text given an input array, you need to train them on the input arrays and output tokens.

You can extract the data in the form you need from the video content, but presumably that's already been done for the most part, since video transcripts are likely included in the training data for GPT.

