More

greenflag · 2025-08-19T17:30:10 1755624610

For R+RStudio users, any opinions on the switch? Assistant seems to be the big departure, trying to work out if worth losing keyboard shortcut muscle memory for

jmcphers · 2025-08-19T17:35:13 1755624913

Positron has a setting that restores (most) RStudio keybindings, so your muscle memory should (mostly) transfer:

https://positron.posit.co/migrate-rstudio-keybindings.html

greenflag · 2025-08-13T18:23:08 1755109388

Beside the point but I really love the rainbow sparkles trailing the cursor on the netscape theme of this blog. Takes me back to a time when the internet was...fun

greenflag · 2025-05-20T20:25:53 1747772753

Someone has pointed out on X/Twitter that the "novel discovery" made by the AI system already has an entire review article written about the subject [0]

[0] https://x.com/wildtypehuman/status/1924858077326528991

woolion · 2025-05-21T08:16:58 1747815418

This is the real problem with AI: it generates plausible-sounding slop in absurd quantity, for which verification is very expensive.

Beyond that, it would interesting to check how wrong the AI version is compared to the ground truth (the published papers).

Current technology cannot do logic, but biology is even more perverse. For instance, suppose you ask to remove starch. If you degrade it, starch is actually removed. However, it's most likely that the point was actually to remove sugar, but degrading it actually made the sugar more readily bio-available. The relationships are complex, and there's a lot of implicit knowledge that is not communicated again at every sentence.

It would be good if the effort towards hype ideas like that was redirected in making a great tool to find and analyze the papers in a fairly reliable way (which would have prevented this blunder).

IanCal · 2025-05-22T09:12:25 1747905145

That overview I can find it talking about wet AMD, the claim for this is specifically dry AMD.

edit - and from the paper

> Notably, while ROCK inhibitors have been previously suggested for treatment of wet AMD and other retinal diseases of neovascularization, Robin is the first to propose their application in dry AMD for their effect on phagocytosis

greenflag · 2025-03-27T15:28:07 1743089287

While this is an excellently written piece and really insightful into the state of higher education funding, what seems to be missing from the debate is concrete ideas of what should be done differently (either in 2014 or today). A lot of US innovation success comes from deep pockets of private venture capital, which is just missing in the UK. So if you're a politician/bureaucrat with a (let's face it) relatively small budget and much politics to deal with, the best strategy to take is not obvious (at least to me).

anonymousDan · 2025-03-27T15:44:00 1743090240

It's well written but completely unjustified in its criticism of UK universities or their role given the resources required to train SOTA models. Are any US universities training SOTA models? No. Your point about the need for private venture capital is exactly correct. I think some kind of new funding stream needs to be identified for doing this. The US is forcing China to sell TikTok's US arm for national security reasons. We could try to do something similar in return for granting US Big Tech companies access to Europe - I guess the digital tax is a step in this direction. But it seems challenging to enforce that given the current power dynamics.

greenflag · on Aug 10, 2023

It seems the take home is weight decay induces sparsity which helps learn the "true" representation rather than an overfit one. It's interesting the human brain has a comparable mechanism prevalent in development [1]. I would love to know from someone in the field if this was the inspiration for weight decay (or presumably just the more equivalent nn pruning [2]).

[1] https://en.wikipedia.org/wiki/Synaptic_pruning [2] https://en.wikipedia.org/wiki/Pruning_(artificial_neural_net...

tbalsam · on Aug 10, 2023

ML researcher here wanting to offer a clarification.

L1 induces sparsity. Weight decay explicitly _does not_, as it is L2. This is a common misconception.

Something a lot of people don't know is that weight decay works because when applied as regularization it causes the network to approach the MDL, which reduces regret during training.

Pruning in the brain is somewhat related, but because the brain uses sparsity to (fundamentally, IIRC) induce representations instead of compression, it's basically a different motif entirely.

If you need a hint here on this one, think about the implicit biases of different representations and the downstream impacts that they can have on the learned (or learnable) representations of whatever system is in question.

I hope this answers your question.

mmmmpancakes · on Aug 11, 2023

can you please spell out what MDL is an acronym for?

sva_ · on Aug 11, 2023

https://en.wikipedia.org/wiki/Minimum_description_length

mmmmpancakes · on Aug 11, 2023

thanks

naasking · on Aug 11, 2023

> because the brain uses sparsity to (fundamentally, IIRC) induce representations instead of compression

What's the evidence for this?

heyitsguay · on Aug 11, 2023

https://bernstein-network.de/wp-content/uploads/2021/03/Lect... this has an awesome overview of the current understanding of neural encoding mechanisms.

tbalsam · on Aug 11, 2023

I enjoyed this presentation, thank you for sharing it. Good stuff in here.

I think things are a bit off about the reasoning behind the basis functions, but as I noted elsewhere here that's work I'm not entirely able to talk about as I'm actively working on developing it right now, and will release it when I can.

However, you can see some of the empirical consequences of an updated understanding on my end of encoding and compression in a release of hlb-CIFAR10 that's coming up soon that should cut out another decent chunk of training time. As a part of it, we reduce the network from a ResNet8 architecture to a ResNet7, and we additionally remove one of the (potentially less necessary) residuals. It is all 'just' empirical, of course, but long-term, as they say, the proof is in the pudding, since things are already so incredibly tightened down.

joaogui1 · on Aug 10, 2023

That looks interesting, do you know what paper talks about the connection between MDL, regret, and weight decay?

tbalsam · on Aug 10, 2023

I would start with Shannon's information theory and the Wikipedia page on L2/the MDL as a decent starting point.

For the first, there are a few good papers that simplify the concepts even further.

joaogui1 · on Aug 16, 2023

Sorry, I know what MDL and L2 regularization are, I would like the paper that connects them in the way you mentioned

visarga · on Aug 10, 2023

The inspiration for weight decay was to reduce the capacity to memorize of the model until it perfectly fits the complexity of the task, not more not less. A model more complex than the task is over-fitting, the other one is under-fitting. Got to balance them out.

But the best cure for over-fitting is to make the dataset larger and ensure data diversity. LLMs have datasets so large they usually train one epoch.

nightski · on Aug 10, 2023

It sounds nice in theory, but the data itself could be problematic. There is no temporal nature to it. You can have duplicate data points, many data points that are closely related but describe the same thing/event/etc.. So while only showing the model each data point once ensures you do not introduce any extra weight on a data point, if the dataset itself is skewed it doesn't help you at all.

Just by trying to make the dataset diverse you could skew things to not reflect reality. I just don't think enough attention has been paid to the data, and too much the model. But I could be very wrong.

There is a natural temporality to the data humans receive. You can't relive the same moment twice. That said, human intelligence is on a scale too and may be affected in the same way.

visarga · on Aug 10, 2023

> I just don't think enough attention has been paid to the data, and too much the model.

I wholly agree. Everyone is blinded by models - GPT4 this, LLaMA2 that - but the real source of the smarts is in the dataset. Why would any model, no matter how its architecture is tweaked, learn about the same ability from the same data? Why would humans be all able to learn the same skills when every brain is quite different. It was the data, not the model

And since we are exhausting all the available quality text online we need to start engineering new data with LLMs and validation systems. AIs need to introspect more into their training sets, not just train to reproduce them, but analyse, summarise and comment on them. We reflect on our information, AIs should do more reflection before learning.

More fundamentally, how are AIs going to evolve past human level unless they make their own data or they collect data from external systems?

ben_w · on Aug 10, 2023

> It was the data, not the model

It's both.

It's clearly impossible to learn how to translate Linear A into modern English using only content written in pure Japanese that never references either.

Yet also, none of the algorithms before Transformers were able to first ingest the web, then answer a random natural language question in any domain — closest was Google etc. matching on indexed keywords.

> how are AIs going to evolve past human level unless they make their own data?

Who says they can't make their own data?

Both a priori (by development of "new" mathematical and logical tautological deductions), and a posteriori by devising, and observing the results of, various experiments.

Same as us, really.

riversflow · on Aug 10, 2023

I see this brought up consistently on the topic of AI take-off/X-risk.

How does an AI language model devise an experiment and observe the results? The language model is only trained on what’s already known, I’m extremely incredulous that this language model technique can actually reason a genuinely novel hypothesis.

A LLM is a series of weights sitting in the ram of GPU cluster, it’s really just a fancy prediction function. It doesn’t have the sort of biological imperatives (a result of being complete independent beings) or entropy that drive living systems.

Moreover, if we consider how it works for humans, people have to _think_ about problems. Do we even have a model or even an idea about what “thinking” is? Meanwhile science is a looping process that mostly requires a physical element(testing/verification) to it. So unless we make some radical breakthroughs in general purpose robotics, as well as overcome the thinking problem I don’t see how AI can do some sort tech breakout/runaway.

ben_w · on Aug 10, 2023

Starting with the end so we're on the same page about framing the situation:

> I don’t see how AI can do some sort tech breakout/runaway.

I'm expecting (in the mode, but with a wide and shallow distribution) a roughly 10x increase in GDP growth, from increased automation etc., not a singularity/foom.

I think the main danger is bugs and misuse (both malicious and short-sighted).

-

> How does an AI language model devise an experiment and observe the results?

Same way as Helen Keller.

Same way scientists with normal senses do for data outside human sense organs, be that the LHC or nm/s^2 acceleration of binary stars or gravity waves (or the confusingly similarly named but very different gravitational waves).

> The language model is only trained on what’s already known, I’m extremely incredulous that this language model technique can actually reason a genuinely novel hypothesis.

Were you, or any other human, trained on things unknown?

If so, how?

> A LLM is a series of weights sitting in the ram of GPU cluster, it’s really just a fancy prediction function. It doesn’t have the sort of biological imperatives (a result of being complete independent beings) or entropy that drive living systems.

Why do you believe that biological imperatives are in any way important?

I can't see how any of a desire to eat, shag, fight, run away, or freeze up… help with either the scientific method nor pure maths.

Even the "special sauce" that humans have over other animals didn't lead to any us doing the scientific method until very recently, and most of us still don't.

> Do we even have a model or even an idea about what “thinking” is?

AFAIK, only in terms of output, not qualia or anything like that.

Does it matter if the thing a submarine does is swimming, if it gets to the destination? LLMs, for all their mistakes and their… utterly inhuman minds and transhuman training experience… can do many things which would've been considered "implausible" even in a sci-fi setting a decade ago.

> So unless we make some radical breakthroughs in general purpose robotics

I don't think it needs to be general, as labs are increasingly automated even without general robotics.

kaba0 · on Aug 11, 2023

> Do we even have a model or even an idea about what “thinking” is

At the least, it is a computable function (as we don’t have any physical system that would be more general than that, though some religions might disagree). Which already puts human brains ahead of LLM systems, as we are Turing-complete, while LLMs are not, at least in their naive application (their output can be feeded to subsequent invocations and that way it can be).

swid · on Aug 11, 2023

I googled whether or not universal function approximators, which neural nets are considered, are also considered Turing complete. It seems the general consensus is kind of not, since they are continuous and can’t do discreet operations in the same way.

But also, that isn’t quite the whole story, since they can be arbitrarily precise in their approximation. Here[0] is a white paper addressing this issue which concludes attention networks are Turing complete.

0: https://jmlr.org/papers/volume22/20-302/20-302.pdf

kaba0 · on Aug 11, 2023

If I’m not mistaken that’s only for arbitrary precision, which is not realistic.

ben_w · on Aug 11, 2023

Is it provably not turning complete? That property pops up everywhere even when not intended, like Magic: The Gathering card interactions.

Technically you may not want to call it Turing complete given the limited context window, but I'd say that's like insisting a Commodore 64 isn't Turing complete for the same reason.

Likewise the default settings may be a bit too random to be a Turing machine, but that criticism would also apply to a human.

kaba0 · on Aug 11, 2023

It is basically a single huge matrix multiplication — you need some form of loop/repetition/recursion to be Turing complete.

Sure it is not a hard property, excel, css with mouse movements, game of life are all that, but they need a “possibly forever running” part.

ben_w · on Aug 11, 2023

ChatGPT does have a loop, that's why it produces more than one token.

In this context, that the possibility of running "forever" would also exclude the humans (to which it is being compared) is relevant — even if we spend all day thinking in words at the rate of 160wpm and .75 words per token, we fall asleep around every 200k tokens, and some models (not from OpenAI) exceed that in their input windows.

kaba0 · on Aug 13, 2023

Yet I can solve many sudoku problems in a single wake cycle.

Also, its output is language and it can’t change a former part of speech, can only append to it. When “thinking” about what to say next, it can’t “loop” over that, only whether to append some more text to it. Its looping is strictly within a “static context”.

imtringued · on Aug 10, 2023

It's not just a series of weights. It is an unchanging series of weights. This isn't necessarily artificial intelligence. It is the intelligence of the dead.

whimsicalism · on Aug 10, 2023

> Yet also, none of the algorithms before Transformers were able to first ingest the web, then answer a random natural language question in any domain — closest was Google etc. matching on indexed keywords.

Wrong, recurrent models were able to do this, just not as well.

Salgat · on Aug 10, 2023

This is definitely current models' biggest issue. You're training a model against millions of books worth of data (which would take a human tens of thousands of lifetimes) to achieve a superficial level of conversational ability to match a human, which can consume at most 3 novels a day without compromising comprehension. Current models are terribly inefficient when it comes to learning from data.

famouswaffles · on Aug 10, 2023

Modern LLMs are nowhere near the scale of the human brain however you want to slice things so terribly inefficient is very arguable. also language skills seemingly take much less data and scale when you aren't trying to have it learn the sum total of human knowledge. https://arxiv.org/abs/2305.07759

Salgat · on Aug 10, 2023

Scale is a very subjective thing since one is analog (86B neurons) and one is digital (175B parameters). Additionally, consider how many compute hours GPT 3 took to train (10,000 V100s were set aside for exclusive training of GPT 3). I'd say that GPT 3 scale vastly dwarfs the human brain, which runs at a paltry 12 watts.

kaba0 · on Aug 11, 2023

Neumann’s Computer and The Brain book is way out of date in terms of today’s hardware, but funnily it is still relevant in this metric. Biological systems are more analogous to a distributed system of small, very slow CPUs. Even GPUs that somewhat close the gap in-between the few, crazy fast CPUs vs the aforementioned many, slow ones - are still much faster than any one neuron in calculations, but are still overly serial. It is not the number of CPUs, but the number of their connections that make biological systems so powerful.

Salgat · on Aug 14, 2023

Parameters have many connections too though. If the next layer is 1000 parameters wide, you have potentially 1000 connections from a single parameter.

whimsicalism · on Aug 10, 2023

You have to count the training process from the origin of the human brain imo, not from the birth of any individual human.

Neural nets look much more competitive by that standard.

Salgat · on Aug 11, 2023

Yet humans designed the models, so the training process for chat gpt etc includes human evolution by your logic.

whimsicalism · on Aug 11, 2023

This is a good point and the level of so-called task specific "inductive bias" in models is an active point of discussion, but I don't think it is fair to add all of our evolution to the model inductive bias because most of evolution was not towards giving better understanding of language to the model, it was towards better understanding of language in humans.

imtringued · on Aug 10, 2023

They are inefficient by design. Gradient descent and backpropagation scale poorly, but they work and GPUs are cheap, so here we are.

crdrost · on Aug 10, 2023

And there have been a lot of approaches to do this, my favorite one being the idea that maybe if we just randomly zap out some of the neurons while we train the rest, that forcing it to acquire that redundancy might privilege structured representations over memorization. Just always seemed like some fraternity prank, “if you REALLY know the tenets of Delta Mu Beta you can recite them when drunk after we spin you around in a circle twelve times fast!”

two_in_one · on Aug 11, 2023

> just randomly zap out some of the neurons while we train the rest

It's already done: https://pytorch.org/docs/stable/generated/torch.nn.functiona...

whimsicalism · on Aug 10, 2023

https://nitter.net/Yampeleg/status/1688441683946377216

kaibee · on Aug 10, 2023

> But the best cure for over-fitting is to make the dataset larger and ensure data diversity.

This is also good life advice.

BaseballPhysics · on Aug 10, 2023

The human brain has synaptic pruning. The exact purpose of it is theorized but not actually understood, and it's a gigantic leap to assume some sort of analogous mechanism between LLMs and the human brain.

pcwelder · on Aug 10, 2023

Afaik weight decay is inspired from L2 regularisation which goes back to linear regression where L2 regularisation is equivalent to having gaussian prior on the weights with zero mean.

Note that L1 regularisation produces much more sparsity but it doesn't perform as well.

nonameiguess · on Aug 10, 2023

This. Weight decay is just a method of dropping most weights to zero which is a standard technique used by statisticians for regularization purposes for decades. As far as I understand, it goes back at least to Tikhorov from 1970 and was mostly called ridge regression in the regression context. Normal ordinary least squares attempts to minimize the L2 norm of the squared residuals. When a system is overdetermined, adding a penalty term (usually just a scalar multiple of an identity matrix) and also minimizing the L2 norm of that biases the model to produce mostly near-zero weights. This helps with underdetermined systems and gives a better conditioned model matrix that is actually possible to solve numerically without underflow.

It's kind of amazing to watch this from the sidelines, a process of engineers getting ridiculously impressive results from some combo of sheer hackery and ingenuity, great data pipelining and engineering, extremely large datasets, extremely fast hardware, and computational methods that scale very well, but at the same time, gradually relearning lessons and re-inventing techniques that were perfected by statisticians over half a century ago.

tbalsam · on Aug 10, 2023

L1 drops weights to zero, L2 biases towards Gaussianality.

It's not always relearning lessons or people entirely blindly trying things either, many researchers use the underlying math to inform decisions for network optimization. If you're seeing that, then that's probably a side of the field where people are newer to some of the math behind it, and that will change as things get more established.

The underlying mathematics behind these kinds of systems are what has motivated a lot of the improvements in hlb-CIFAR10, for example. I don't think I would have been able to get there without sitting down with the fundamentals, planning, thinking, and working a lot, and then executing. There is a good place for blind empirical research too, but it loses its utility past a certain point of overuse.

whimsicalism · on Aug 10, 2023

this comment is so off base, first off no l2 des not encourage near 0 weights, second off they are not relearning, everyone already knew what l1/l2 penalties are

greenflag · on Aug 2, 2023

Interestingly, on Monday a preprint [1] was posted calling into question a major Nature paper from 2020 that associated the microbiome with multiple cancer types [2] (though within each tissue sample, not the gut).

[1] https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 [2] https://www.nature.com/articles/s41586-020-2095-1

greenflag · on June 7, 2023

Does anyone have high level guidance on when (deep) RL is worth pursuing for optimization (e.g. optimizing algorithm design) rather than other approaches (e.g genetic)?

AndrewKemendo · on June 7, 2023

Less of a scale problem than a type problem usually in my experience.

My rule of thumb is when it’s easy to specify a reward function but infinite ways to traverse the action space - versus having a constrained state and action space (small n solution traversal pathways) and only a few possible paths to traverse.

jeffbee · on June 7, 2023

Start with a planet-scale computer that makes the marginal cost of RL be nearly zero, and at the same time spend a lot of money on hashing and sorting so the micro-optimization pays off.

greenflag · on March 1, 2023

The divide between literature/arts and STEM feels related to a push in modern culture that every moment must be productive (or in some sense profitable), though I have a hard time unpicking whether things really did used to be “better” or this is just getting older and realizing how the world works.

8f2ab37a-ed6c · on March 1, 2023

When college, medical and housing are as expensive as they are in the US, is it a surprise that people want to do everything in their power to avoid becoming destitute? There is no obvious path to middle class prosperity with a degree in English, but you'd almost have to try to not reach a comfortable living as a programmer.

greenflag · on Feb 13, 2023

Anecdotal, but as a pre-smartphone teenager I was a night owl which has gradually vanished despite the introduction of smartphones, so I think there's more at play

exfatloss · on Feb 13, 2023

Same here. Always a pretty harsh night owl. The arrival of phones didn't change much for me. It is probably a factor for the average student, though. Question is how much vs. the natural circadian rhythm shift.

greenflag · on Feb 13, 2023

Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge (e.g. knowledge graph). Interesting to see how this can be done quickly enough...

mvcalder · on Feb 13, 2023

It will be interesting to see what insights such efforts spawn. For the most part LLMs specifically, and deep networks more generally, are still black boxes. If we don't understand (at a deep level) how they work, getting them to "conform to some semblance of reality" feels like a hard problem. Maybe just as hard as language understanding generally.

kneebonian · on Feb 13, 2023

> Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge

This is a much more worrying possiblity, as there are many people who have at this point chosen to abandoned reality for "their truth" and push ideas that objective facts are inferior to "lived experiences". This is a much bigger concern around AI in my mind.

“The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command.” ― George Orwell, 1984

vore · on Feb 13, 2023

As fun as quoting 1984 is, there is a huge gap between that and just not making up the winner of the Super Bowl so confidently.

visarga · on Feb 13, 2023

Probably the hottest research trend in 2023. LLMs are worthless unless verified.

whimsicalism · on Feb 13, 2023

Really? I already get a huge amount of value out of LLMs even if they hallucinate.

Or is this just HN tendency towards hyperbole?

visarga · on Feb 13, 2023

Interesting, care to give an example? Exclude fiction, imagination and role playing, where hallucination is actually a feature.

visarga · on Feb 14, 2023

coming back with a link: https://mobile.twitter.com/ylecun/status/1625554772098002944

this tween from Yann LeCun came after my message was posted