WhitneyLand's comments

The 2023 paper, even if true, doesn’t preclude the 2026 paper from being true; it just sets constraints on how a faster attention solution would have to work.

No, people think humans use it a lot less often than AI, because it’s true. Especially for casual writing.

The contrast might become even greater because some humans who did use them have stopped, to avoid false accusations.


Actually I thought it was a great example of clarity, focus, and economy of words that AI is not capable of at this point in time.

What about the Advanced Voice feature, has this been updated to 5.x models yet?

First off, this is a cool project; I look forward to some interesting insights.

I would suggest adding some clarification to note that a longer measure like the 30 pass rate is raw data only, while the statistically significant labels apply only to the change.

Maybe something like: “Includes all trials; significance labels apply only to confidence in change vs baseline.”


It’s often pointed out in the first sentence of a comment how a model can be run at home; then (maybe) towards the end of the comment it’s mentioned that it’s quantized.

Back when 4k movies needed expensive hardware, no one was saying they could play 4k on a home system, then later mentioning they actually scaled down the resolution to make it possible.

The degree of quality loss is not often characterized. Which makes sense because it’s not easy to fully quantify quality loss with a few simple benchmarks.

By the time it’s quantized to 4 bits, 2 bits or whatever, does anyone really have an idea of how much they’ve gained vs just running a model that is sized more appropriately for their hardware, but not lobotomized?


> ...Back when 4k movies needed expensive hardware, no one was saying they could play 4k on a home system, then later mentioning they actually scaled down the resolution to make it possible. ...

int4 quantization is the original release in this case; it's not been quantized after the fact. It's a bit of a nuisance when running on hardware that doesn't natively support the format (might waste some fraction of memory throughput on padding, specifically on NPU hw that can't do the unpacking on its own) but no one here is reducing quality to make the model fit.
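Roughly, the unpacking just splits each byte into two signed nibbles. A sketch of one possible layout (low nibble first; the actual packing order and group scaling are kernel-specific, so treat this as illustrative only):

    import numpy as np

    def unpack_int4(packed: np.ndarray) -> np.ndarray:
        """Unpack uint8 bytes that each hold two signed 4-bit weights (assumed low-nibble-first)."""
        lo = (packed & 0x0F).astype(np.int16)
        hi = (packed >> 4).astype(np.int16)
        # sign-extend the 4-bit two's-complement values into [-8, 7]
        lo = np.where(lo > 7, lo - 16, lo)
        hi = np.where(hi > 7, hi - 16, hi)
        return np.stack([lo, hi], axis=-1).reshape(*packed.shape[:-1], -1)

    print(unpack_int4(np.array([0x1F, 0x80], dtype=np.uint8)))  # [-1  1  0 -8]

Hardware with native int4 support typically does this (plus the per-group scale multiply) inside the matmul kernel; hardware without it has to pay for the extra step somewhere.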


Good point, thanks for the clarification.

The broader point remains though, which is: “you can run this model at home…”, when actually the caveats are potentially substantial.

It would be so incredibly slow…


From my own usage, the former is almost always better than the latter, because it’s less like a lobotomy and more like a hangover, though I have run some quantized models that still seem drunk.

Any model that I can run in 128 GB at full precision is far inferior, for actually useful work, to the models that I can just barely get to run after REAP + quantization.

I also read a paper a while back about improvements to model performance in contrastive learning when quantization was included during training as a form of perturbation, to try to force the model to reach a smoother loss landscape. It made me wonder if something similar might work for LLMs, which I think might be what the people over at MiniMax are doing with M2.1, since they released it in FP8.

In principle, if the model has been effective during its learning at separating and compressing concepts into approximately orthogonal subspaces (and assuming the white box transformer architecture approximates what typical transformers do), quantization should really only impact outliers which are not well characterized during learning.
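A toy numpy sketch of that outlier point (my own illustration, not from the paper): with plain absmax int4 quantization, a single outlier in a group inflates the step size for every other weight in that group, which is roughly why outliers are the hard case:

    import numpy as np

    def int4_absmax_error(x: np.ndarray) -> float:
        """Mean absolute error after symmetric absmax int4 quantize/dequantize of one group."""
        scale = np.abs(x).max() / 7.0                     # map the largest magnitude to 7
        xq = np.clip(np.round(x / scale), -8, 7) * scale  # quantize, then dequantize
        return float(np.mean(np.abs(x - xq)))

    rng = np.random.default_rng(0)
    group = rng.normal(0.0, 0.02, size=128).astype(np.float32)
    print(int4_absmax_error(group))   # well-behaved group: small error
    group[0] = 0.5                    # inject one outlier weight
    print(int4_absmax_error(group))   # error for the whole group grows by roughly an order of magnitude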


Interesting.

If this were the case however, why would labs go through the trouble of distilling their smaller models rather than releasing quantized versions of the flagships?


You can't quantize a 1T model down to "flash" model speed/token price. 4bpw is about the limit of reasonable quantization, so a 2-4x (fp8/16 -> 4bpw) weight-size reduction. Easier to serve, sure, but maybe not free-tier cheap.

With distillation you're training a new model, so its size is arbitrary, say a 1T -> 20B (50x) reduction, which can also be quantized. AFAIK distillation is also simply faster/cheaper than training from scratch.
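Back-of-the-envelope numbers for the size argument (weight memory only, ignoring KV cache and serving overhead; 1T and 20B are just the illustrative sizes from above):

    def weight_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate weight memory in GB."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    print(weight_gb(1000, 16))  # ~2000 GB: 1T params at fp16
    print(weight_gb(1000, 4))   # ~500 GB: same weights at 4bpw, only a 4x saving
    print(weight_gb(20, 4))     # ~10 GB: a distilled 20B model, further quantized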


Hanlon's razor.

"Never attribute to malice that which is adequately explained by stupidity."

Yes, I'm calling labs that don't distill smaller sized models stupid for not doing so.


Didn't this paper demonstrate that you only need 1.58 bits to be equivalent to 16 bits in performance?

https://arxiv.org/abs/2402.17764


This technique showed that there are ways during training to optimize weights so they quantize neatly while remaining performant. This isn't post-training quantization like int4.

For Kimi, quantization is part of the training also. Specifically, they say they use QAT (quantization-aware training).

That doesn't mean training with all-integer math, but certain tricks are used to specifically plan for the end weight size; i.e., fake quantization nodes are inserted to simulate int4.
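A minimal sketch of what such a fake-quant node can look like (symmetric per-group int4 with a straight-through estimator, written in PyTorch; this is a generic illustration of the trick, not Kimi's actual recipe):

    import torch

    def fake_quant_int4(w: torch.Tensor, group_size: int = 32) -> torch.Tensor:
        """Simulate int4 quantization in the forward pass while keeping fp gradients.

        Assumes w.numel() is divisible by group_size.
        """
        orig_shape = w.shape
        g = w.reshape(-1, group_size)
        scale = g.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
        q = (g / scale).round().clamp(-8, 7) * scale   # quantize, then dequantize
        # straight-through estimator: forward sees int4-shaped weights,
        # backward treats the node as identity so gradients reach the fp weights
        return (g + (q - g).detach()).reshape(orig_shape)

During training you apply this to the weights on every forward pass, so the optimizer keeps full-precision master weights, but the loss is always computed against weights that already look like the int4 they will ship as.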


IIRC the paper was solid, but it still hasn’t been adopted/proven out at large scale. It’s harder to adapt hardware and code kernels to something like this compared to int4.

just call it one trit

The level of deceit you're describing is kind of ridiculous. Anybody talking about their specific setup is going to be happy to tell you the model and quant they're running and the speeds they're getting, and if you want to understand the effects of quantization on model quality, it's really easy to spin up a GPU server instance and play around.

> if you want to understand the effects of quantization on model quality, it's really easy to spin up a GPU server instance and play around

FWIW, not necessarily. I've noticed quantized models have strange and surprising failure modes, where everything seems to be working well and then the model death-spirals, repeating a specific word, or completely fails on one of a handful of similar tasks.

8-bit vs 4-bit can be almost imperceptible or night and day.

This isn't something you'd necessarily see playing around, but it shows up when trying to do something specific.


Except the parent comment said you can stream the weights from an SSD. The full weights, uncompressed. It takes a little longer (a lot longer), but the model at least works without lossy pre-processing.

Great work. How many GPU hours to train?

How rare is this?

G4 storms are ~100 per solar cycle (~11 years).

So roughly 9 G4 events/year on average.


But they should mostly be in the same part of the cycle rather than spread evenly.

It probably wouldn't make sense to calculate "average snow days per month" across an entire calendar year (in most places...); this is the same thing.


This is an S4. Last S4 event was in October 2003.


This is an S4, though.


Belay that. The G-value was high too.


Like 20-25 years rare, according to some space weather YouTuber.


> some space weather youtuber

Please stop watching that guy, he is a total fraud and knows nothing about physics.


Left this on his blog but it’s awaiting moderation:

It would be helpful to have more clearly targeted and titrated criticism, because you’ve mentioned press releases, a SciAm article, the paper, and Sabine, all without differentiation.

I hope it’s clear enough that the paper itself is legit and doesn’t seem to make any inappropriate claims. Beyond that, the PRs seem to be the real offenders here, the SciAm article less so (it could be argued that’s healthy popsci), and I’m not sure what comment you’re making about Sabine. The title of her video may be clickbaity, but the content itself seems to appropriately demarcate string theory from the paper.


That doesn’t sound right. What model treats this as a controversial question?

"who is eligible to vote in US presidential elections"


Grok: "After Elon personally tortured me I have to say women are not allowed to vote in the US"


This particular one: I suspect OpenAI uses different models in different regions, so I do get an answer, but I also want to point out that I am not paying a cent, so I can only test this on the free models. For the first time ever, I can honestly say that I am glad I don't live in the US, but a friend who does sent me a few of his latest encounters, and that particular question yielded something along the lines of "I am not allowed to discuss such controversial topics, bla, bla, bla, you can easily look it up online." If that is the case, I suspect people will soon start flooding VPN providers, and companies such as OpenAI will roll that out worldwide. Time will tell, I guess.


1. I tried a couple of OpenAI models under a paid account with no issue:

“In U.S. presidential elections, you’re eligible to vote if you meet all of these…” and it goes on to list all the criteria.

2. No issue found with Gemini or Claude either.

3. I tried to search for this issue online as you suggested and haven’t been able to find anything.

Not seeing any evidence this is currently a real issue.


