root-parent · 2026-06-06T14:21:42 1780755702

"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games..."

This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"

throw0101a · 2026-06-06T14:57:34 1780757854

> This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"

Digging into this:

> In conclusion, there is evidence that Ken Olsen did doubt the need for computers in the home, but the evidence is based primarily on the testimony of David Ahl who was perturbed when the personal computer project he championed at DEC was not supported by Olsen in 1974.

> Olsen’s resistance may have been similar to that expressed by another DEC executive, Gordon Bell. In 1980 Bell thought home terminals would act as gateways to remote computers which would provide appropriate services.

* https://quoteinvestigator.com/2017/09/14/home-computer/

It was supposedly said in 1977, when most computers were not small, so it would not be surprising that people did not expect the general public to want a large, power-hungry, noisy apparatus in their house.

wccrawford · 2026-06-06T17:22:14 1780766534

That's exactly the point. Until recently, AI models that could run on home machines were so bad that it was very hard to imagine anyone wanting to.

And, like the overly large machines of 1977, models are getting faster, leaner, and better. It's happening a lot quicker, though.

Silhouette · 2026-06-06T19:01:03 1780772463

This is why I'm bearish on Anthropic, OpenAI, and friends. I am not confident that we will continue to see the same pace of improvement in frontier model capabilities as we have seen over the past year or two - not using similar mathematics at least. But I think that getting results that are close enough to the same standard to be a realistic substitute but in a model small enough to run locally may well happen quite quickly. And if it does - where is the moat to defend these AI organisations with their astronomical budgets when they're already starting to price more realistically and that's already killing a lot of the hype they've enjoyed until very recently? They have an accidental moat because they bought up the global supply chain for storage but that surely isn't going to last once the data centres to hold that storage are becoming liabilities.

api · 2026-06-06T19:37:58 1780774678

If model performance asymptotes and CPU/GPU and RAM keep growing, even slowly, then eventually we will have frontier models on desktop that are totally competitive with hosted. It’s only a matter of time.

You already can if you’re willing to spend many thousands of dollars on a beast of a machine. I’m talking about middle tier desktops and laptops here. Maybe eventually even phones.

The only way hosted stays strongly competitive in that world is if they can keep pushing the frontier or by playing the classic social media and SaaS games of network effect building and integrations.

Many people might still use hosted, of course, but what I really mean is that their multiples won’t be justified and they will have little to no moat. AI will become commoditized, like a sophisticated next generation form of an encyclopedia with search.

throw0101a · 2026-06-07T15:02:27 1780844547

> This is why I'm bearish on Anthropic, OpenAI, and friends.

Just because you can do more and more things at home (thanks Moore and Dennard), doesn't preclude needing things also done remotely. The number of at-home systems seems to have fed a growing number of remote systems (especially once always-on connectivity became ubiquitous).

It's basically the angle Apple is going for: do as much locally (for the sake of privacy), and then offload when it becomes "too much".

Silhouette · 2026-06-09T02:03:35 1780970615

I agree that one doesn't preclude the other. But the sky high valuations we've been seeing for the AI industry recently can only be justified if they bring about a fundamental change in our society and those companies continue to bring in the lion's share of the resulting profits. I don't see why everyone else in our society - particularly other large businesses with lots of money to invest - is going to play a game by the AI companies' rules once they can take their ball and go home and still have most of the fun without paying much for it by comparison.

kristov · 2026-06-06T16:54:01 1780764841

We kinda ended up with terminals connected to mainframes anyway: the terminal is the web browser, and the mainframe is SaaS. So it wasn't that far off.

supermatt · 2026-06-06T17:41:04 1780767664

the network is the computer

parineum · 2026-06-06T15:45:24 1780760724

It doesn't really need this much explanation.

People take these quotes out of context all the time. Said in a business context, there was no need, at that time, for someone to have a personal computer.

There was no business justification in 1977 for a personal computer department at a business. It's similar to the Gates quote about RAM (I think it was 64KB?).

These statements aren't meant to be forever quotes. They're business plan quotes.

michaelcampbell · 2026-06-06T16:37:00 1780763820

> It's similar to the gates quote about RAM (I think it was 64KB?)

640KB, and Bill Gates has said he either never said that, or at least doesn't remember having said it. I don't think there is any evidence that he did.

https://www.computerworld.com/article/1563853/the-640k-quote...

Shorel · 2026-06-06T18:37:35 1780771055

That exact quote? No, never. He said something like: computers at the time had 64KB of RAM, so the OS was designed with a limit of 640KB, and he believed this would give them 10 years of future-proofing. As it happened, that limit was reached much faster, in about 6 years.

valleyer · 2026-06-07T08:53:32 1780822412

MS-DOS didn't create that limit; the physical memory map of the 5150 did. So Microsoft (and Gates) would not have made that decision.

Shorel · 2026-06-08T12:17:36 1780921056

You are right. The quote must have been slightly different then. I'm sure about the 10 years part.

glimshe · 2026-06-06T16:00:02 1780761602

Or maybe he simply made a mistake. Big deal. This doesn't speak negatively of his other achievements.

shermantanktop · 2026-06-06T16:17:24 1780762644

He had a long career and presumably many successes, and is fallible like the rest of us. But a half-remembered zinger with no context makes for zippier posts I guess.

The early popularity of Minitel, the continued popularity of ssh/tmux, and the web browser itself indicates that bespoke client applications are not the only way. He wasn’t directionally wrong.

wslh · 2026-06-06T16:42:10 1780764130

The simple explanation is that predicting the future is generally impossible. It doesn't matter if it's Olsen or anybody else.

dakolli · 2026-06-06T20:18:51 1780777131

I will not be spending thousands on hardware to run the world's most mediocre LLMs at meh speeds. Sorry. I know LLM bros think every output an LLM makes is magic, like every NFT guy thought every NFT collection was game-changing, but there's nothing useful you can do with LLMs and 128GB of RAM (and there never will be) unless you have LLM psychosis. Who cares.

Gigachad · 2026-06-07T00:04:45 1780790685

"Nothing" isn't quite right, but you wouldn't be using it like the hosted ones. 128GB is more than enough to run models to index my files and photos, denoise photos, do AI photo masking and magic-eraser-type tasks for images, frame generation for gaming, etc.

Even for a lot of LLM-type tasks, 128GB is likely more than enough to control a lot of PC configuration and automation with natural language.
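For a sense of what "control PC configuration with natural language" usually means in practice: the model emits a structured tool call and a small harness runs only whitelisted actions. This is a minimal sketch assuming the model outputs JSON shaped like `{"tool": ..., "args": {...}}`; the tool names `set_volume` and `set_dark_mode` are hypothetical, and real harnesses each define their own schema.

```python
import json

# Hypothetical tools; a real setup would call OS APIs here
# instead of just returning a description of the change.
def set_volume(level: int) -> str:
    if not 0 <= level <= 100:
        raise ValueError("volume must be between 0 and 100")
    return f"volume set to {level}"

def set_dark_mode(enabled: bool) -> str:
    return f"dark mode {'on' if enabled else 'off'}"

# Whitelist: the model can only trigger functions listed here.
TOOLS = {"set_volume": set_volume, "set_dark_mode": set_dark_mode}

def dispatch(model_output: str) -> str:
    """Parse a (hypothetical) JSON tool call from the model and run it."""
    call = json.loads(model_output)
    name = call.get("tool")
    if name not in TOOLS:
        return f"refused: unknown tool {name!r}"
    return TOOLS[name](**call.get("args", {}))

print(dispatch('{"tool": "set_volume", "args": {"level": 30}}'))
# volume set to 30
print(dispatch('{"tool": "format_disk", "args": {}}'))
# refused: unknown tool 'format_disk'
```

The whitelist is the important design choice: a small local model only needs to pick a tool and fill in arguments, not to be trusted with arbitrary commands.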

joering2 · 2026-06-06T16:12:14 1780762334

or "640K ought to be enough for anybody."

shermantanktop · 2026-06-06T16:21:52 1780762912

https://quoteinvestigator.com/2011/09/08/640k-enough/

Nobody ever said that, at least not as an assertion or prediction. The actual instances of similar language are from multiple people describing their earlier thoughts before they learned it wasn’t true.

throw1234567891 · 2026-06-06T16:21:19 1780762879

There’s no public proof this has ever been said, and if it was, it may well have been taken out of context.

DonHopkins · 2026-06-06T17:46:27 1780767987

I have that many browser tabs.

fg137 · 2026-06-06T16:23:39 1780763019

You seriously think running an LLM is the same thing as general computing?

ako · 2026-06-06T17:37:17 1780767437

It’s better: it’s useful even for those who don’t have deep knowledge of computers. I’d expect there to be more AI users than programmers, MS Word users, or Excel users.

fg137 · 2026-06-07T11:40:27 1780832427

You are confusing "using AI" with "running LLMs locally".

AaronAPU · 2026-06-06T14:28:48 1780756128

That’s too strong of an assertion.

Local models aren’t deterministically equivalent in capabilities to foundation models. Home computers are Turing complete, just like a mainframe; they are just slower. Often not slow enough to matter.

sandworm101 · 2026-06-06T14:57:58 1780757878

Most people are OK with slower. An AI that lets you edit a family picture locally in, say, 30 seconds is preferable to one that is instantaneous but requires you to submit that picture to examination/storage/training/sale in someone else's AI ecosystem. If I want to crop my ex out of family photos, I should not have to first give that photo to Microsoft. If I want an LLM to write a book report for me, I don't want it also alerting my school. And if I write a memo for a client and want an LLM to check the spelling, I don't want that memo leaked either.

Pxtl · 2026-06-06T16:32:13 1780763533

I'd like to think so but the existence of Google and Apple and Microsoft's cloud based photo tools with phone integration suggests that's false.

You could run a pretty good home server on $50 of gear and yet we never saw any real adoption of OwnCloud/NextCloud style products as an alternative to Google Drive/Photos or Apple Cloud.

Why should LLM/Transformers be any different? Especially when you need a proper expensive GPU to run them instead of a Raspberry Pi?

thewebguyd · 2026-06-06T16:43:26 1780764206

Apple's photo tools run on device, and they'll probably ship more on device foundation models at WWDC too.

On-device AI is going to be important, I think. It doesn't have to take the form of a chatbot UI to be useful.

com2kid · 2026-06-06T17:44:30 1780767870

After the latest round of cloud storage price increases my non technical wife has been asking if we can do local backups instead...

parineum · 2026-06-06T15:47:25 1780760845

> Most people are ok with slower. An AI that lets you edit a family picture, in say 30 seconds, locally is preferable to one that is instantaneous but requires you to submit that picture to examination/storage/training/sale in someone else's AI ecosystem.

Maybe if you ask them that question, but if you show them two products, they'll definitely prefer the faster one. 30 seconds is a long time to watch a progress bar.

spwa4 · 2026-06-06T16:44:06 1780764246

Plus there's the other question. If this thing is slower ... what's the price? The desktop/mini-pc version of this is $3000, after all. At this performance level what is an acceptable price for the laptops?

People definitely aren't going to accept more expensive + slower ...

sandworm101 · 2026-06-06T17:46:37 1780767997

Fast and public, or slow and private. Not everyone wants (or is allowed) to share their data with the AI world. And do not doubt that every bit shared with an AI service will be used for training.

parineum · 2026-06-06T20:37:19 1780778239

The question here is about markets though. Not everyone wants x but if the vast majority of people want y, x is going to be niche and expensive.

You don't think the commercials for Google's AI photo features are going to have an impact on Apple users if their phones can only do a worse version of that feature, and it takes longer?

robotresearcher · 2026-06-06T18:59:35 1780772375

It’s completely technically possible to have cloud services where customer data is opaque to the provider. Some of Apple’s services are like this already, for example.

I think there’s a sweet spot currently with munging your data blindly on the server so that your client device battery still lasts all day.

Meanwhile Apple and others push on with making client side models more efficient so that eventually the server costs and complexities go away.

fg137 · 2026-06-07T11:48:01 1780832881

This.

If asked to choose between photo editing done within 3s using cloud provider vs an average of 30s using local compute, most consumers will choose the former without hesitation.

Most users' usage is also going to fall nicely in the free tier of a typical freemium pricing model, like ChatGPT today.

People who talk endlessly about local inference have no idea about user workflows and usability.

dominotw · 2026-06-06T20:12:27 1780776747

I don't want to share my pics with "cloud services".

wolvesechoes · 2026-06-08T10:27:14 1780914434

You may not, but experience shows that most people are just fine sharing the most personal stuff not only with cloud services, but with the whole world through anti-social media.

flatline · 2026-06-06T15:48:12 1780760892

The HN crowd is, by and large, not the target audience for his self promotion. I guarantee there is one and this is more or less effective.

smcleod · 2026-06-06T16:19:35 1780762775

Qwen 3.6 is far ahead of Gemma for most (but not all) things. I've deployed it across a number of M5 MacBooks and it's genuinely useful for many tasks. It won't replace an Opus or current-gen Sonnet-sized model, but it's still amazingly good for its size, probably on par with or just short of the Sonnet 4 era. It's far more reliable for tool calling, coding, and agentic tasks, and faster than the Gemma models, especially with MTP (multi-token prediction).

zozbot234 · 2026-06-06T17:02:24 1780765344

Qwen 3.6 is a toy compared to DeepSeek V4 Flash or Pro. These models can now run on Apple Silicon hardware with as little as 32GB RAM for the Flash (with 2-bit quant, which is still quite capable) using SSD offloading, with just-about-reasonable performance for interactive use, and far better performance on longer contexts than Qwen (due to the more efficient KV cache/attention mechanisms in DeepSeek).

Very significant improvements may be viable for unattended inference via large-scale batches, which can reuse sparse experts and thereby mask some of the latency involved - this is quite unique to DeepSeek, again due to its efficient KV cache.
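A toy illustration of the expert-reuse part of this argument: if several sequences in a batch route to overlapping experts, each unique expert only has to be fetched once per step instead of once per sequence. The routing lists are made up, and this sketch says nothing about the KV-cache side of the claim.

```python
def expert_loads(batch_routes: list[list[int]]) -> tuple[int, int]:
    """Return (naive_loads, batched_loads) for one decoding step.

    batch_routes[i] holds the expert IDs chosen for sequence i.
    Naively each sequence fetches its own experts; batched, each
    unique expert is fetched once and shared by every sequence using it.
    """
    naive = sum(len(r) for r in batch_routes)
    batched = len({e for r in batch_routes for e in r})
    return naive, batched

# Hypothetical top-2 routing for a batch of 4 sequences at one layer.
routes = [[3, 7], [3, 12], [7, 3], [12, 40]]
print(expert_loads(routes))  # (8, 4)
```

The more the batch's routing overlaps, the more of the SSD traffic is amortized across sequences, which is why this favors unattended, batched workloads over single interactive chats.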

greenavocado · 2026-06-06T17:16:50 1780766210

Qwen 3.6 27B still curb stomps Deepseek V4 in coding

epolanski · 2026-06-06T17:50:06 1780768206

1. Deepseek V4 is still in preview (training is not finished)

2. Qwen is much more demanding and borderline unusable on consumer hardware because it's a dense model: all 27B parameters are active for every token. It's not an MoE architecture, where a router activates only some of them.

3. Qwen doesn't like quantization at all.
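A minimal sketch of the dense-vs-MoE difference from point 2: a dense model runs every weight for every token, while an MoE router scores all experts and runs only the top k. This uses softmax over the selected experts' scores, one common gating variant; the scores are made up and real models differ in gating details.

```python
import math

def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights.

    logits: one router score per expert for the current token.
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over only the selected experts' scores (max-subtracted for stability).
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router scores for 8 experts; only 2 run for this token.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
print(top_k_route(scores, k=2))
```

Only the two selected experts' weights need to be read and multiplied for this token, which is where an MoE's lower per-token cost comes from.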

kgeist · 2026-06-06T18:39:48 1780771188

I have to disagree with most claims. I run Qwen3.6-27b at 260k context and 40-60 tok/sec. It handles most coding problems as well as Sonnet 4.6 under OpenCode on our production tasks. (As an experiment, I run the same prompts for the same issues in parallel for Qwen 3.6 and Sonnet 4.6 and usually see little difference in performance). I see zero degradation from quantization in practice.

Settings: RTX 5090, 5-bit weights (Unsloth), FP8 KV cache.

Last time I tried running large MoEs on this PC, they had inferior quality at 2-3 bits compared to much smaller dense models at 5-6 bits, and were slower anyway.
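A back-of-envelope check on why a 27B model at 5 bits fits on one consumer GPU while large MoEs need 2-3 bit quants: weight memory is roughly parameters × bits / 8. The 200B model below is hypothetical, and real quantized files carry extra overhead (scales, some tensors kept at higher precision), so actual sizes run somewhat larger.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1e9 bytes).

    Ignores quantization scales, higher-precision tensors and
    runtime buffers, so it is a lower bound on real usage.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(f"{weight_gb(27, 5):.1f}")    # 16.9  (27B dense, 5-bit)
print(f"{weight_gb(27, 16):.1f}")   # 54.0  (27B dense, 16-bit)
print(f"{weight_gb(200, 2.5):.1f}") # 62.5  (hypothetical 200B MoE, ~2.5-bit)
```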

zozbot234 · 2026-06-06T19:23:11 1780773791

A 260k context (close to the stock maximum for Qwen, though it's possible to extend it) will take ~16GB RAM for storing the KV cache, barring quantization tricks which severely degrade quality. That's a whole lot more than what DeepSeek requires for a similar context length, and makes it infeasible to batch multiple inferences together. This used to be the status quo for consumer inference, in fact it still is for models like Kimi and GLM (which can sometimes be smarter than even DeepSeek V4 Pro!) but we can also do better nowadays.
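The ~16GB figure is consistent with the usual KV-cache formula for standard attention: 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element. The architecture numbers below are hypothetical, chosen only to show the arithmetic; they are not Qwen 3.6's published configuration, and models with hybrid or compressed attention (as claimed for DeepSeek here) don't follow this formula directly.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int) -> int:
    """Standard-attention KV cache size: one K and one V vector
    per layer, per KV head, per cached token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem

GiB = 1024 ** 3
# Hypothetical config: 16 full-attention layers, 4 KV heads (GQA), head_dim 256.
fp16 = kv_cache_bytes(16, 4, 256, 262_144, 2) / GiB
fp8 = kv_cache_bytes(16, 4, 256, 262_144, 1) / GiB
print(fp16, fp8)  # 16.0 8.0
```

Halving bytes per element (e.g. the FP8 KV cache mentioned upthread) halves the cache, and the cost scales linearly with context length and with the number of sequences batched together.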

ColonelPhantom · 2026-06-06T20:45:48 1780778748

Deepseek V4 Flash still has 13B active params though? That is about half as many as Qwen3.6-27B (and much more than Qwen3.6-35B-A3B). Given that RAM (even on a base M4 or 'regular' Intel/AMD system) is like an order of magnitude faster than an SSD, even Qwen 27B running from RAM will be much faster than any Deepseek V4 model with SSD offloading. And the MoE will be much faster still.
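The reasoning here is the standard memory-bandwidth bound on decoding: each generated token has to read every active weight once, so tokens/sec can't exceed bandwidth ÷ bytes of active weights. The bandwidth figures below are assumptions for illustration (roughly laptop-RAM and NVMe-class), and the bound ignores KV-cache reads, compute limits, and expert caching, so real speeds are lower.

```python
def max_tokens_per_sec(active_params_billion: float, bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth
    bound: all active weights are read once per generated token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed bandwidths: ~100 GB/s laptop RAM, ~7 GB/s NVMe SSD.
print(max_tokens_per_sec(27, 4.5, 100))  # dense 27B from RAM, ~6.6
print(max_tokens_per_sec(13, 2, 7))      # 13B active, all streamed from SSD, ~2.2
print(max_tokens_per_sec(3, 4.5, 100))   # 3B-active MoE from RAM, ~59
```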

Qwen 27B is also small enough to completely fit in a high-end consumer or mid-end pro GPU, like an RTX 5090 or Radeon PRO R9700. I found results claiming 30 tokens per second generation for 27B(-Q4_K_XL) on an R9700. I doubt you get more than 5 tokens per second doing SSD MoE streaming.

Even for relatively short contexts, I honestly already find the ~30B class MoE models to be only borderline acceptable in terms of speed on my laptop (Ryzen 7 7840U, 64 GB LPDDR5-6400), though I use Gemma 4 26B-A4B more than Qwen3.6 35B-A3B.

zozbot234 · 2026-06-06T21:05:09 1780779909

> even Qwen 27B running from RAM will be much faster than any Deepseek V4 model with SSD offloading.

If you have reasonable amounts of RAM to cache the most likely experts, that's not true at all. Qwen 27B is marginally faster on a nearly empty context, then falls behind as context length increases due to the different attention mechanisms. Prefill for Qwen is much faster, but you're still comparing vastly different model sizes and capabilities. DeepSeek Flash is the best deal overall.

> completely fit in a high-end consumer or mid-end pro GPU

Or you could fit the dense portion of a much more capable model and still take advantage of that hardware.

ColonelPhantom · 2026-06-06T21:42:49 1780782169

> the most likely experts

Is that how MoEs work? I thought an important constraint for MoEs is that experts need to be used roughly uniformly for them to be used effectively. If there is a 'common subset', that, if anything, sounds like a symptom of undertraining (i.e. the same trick will not work as well for Deepseek V4.1).

Also, even if your MoE hitrate is 90%, you still spend half your time waiting for the SSD, giving similar total speed to a 27B model!
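The "half your time" arithmetic, assuming (as the comment does) that RAM is about 10× faster than SSD and every token needs the same volume of expert weights: at a 90% hit rate, misses cost 0.1 × 10 = 1.0 units against 0.9 units of RAM reads.

```python
def ssd_time_share(hit_rate: float, ssd_slowdown: float) -> float:
    """Fraction of weight-loading time spent on cache misses.

    A fraction hit_rate of the expert bytes comes from RAM (cost 1 per
    unit) and the rest from SSD (cost ssd_slowdown per unit).
    """
    ram_time = hit_rate * 1.0
    ssd_time = (1 - hit_rate) * ssd_slowdown
    return ssd_time / (ram_time + ssd_time)

print(round(ssd_time_share(0.9, 10), 3))  # 0.526
```

With the same assumptions, a 99% hit rate brings the SSD share down to about 9%, so the outcome depends heavily on how predictable expert usage really is.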

Finally, it looks like Deepseek V4 is pretty much only runnable with antirez's ds4, and SSD streaming only works with Metal; but I would like to try what you say with llama.cpp which uses mmap to also potentially do SSD streaming. (I can maybe try the large Qwen3.5 MoEs?)

> as context length increases

What kind of context length do you consider reasonable, though? From what I know, all models (even frontier ones) start degrading once you pass a few hundred thousand tokens. So realistically, limiting context size might even improve quality, especially if you use token-efficient harnesses.

> Or you could fit the dense portion of a much more capable model and still take advantage of that hardware.

Your point about consumer hardware was that it would be "borderline unusable" when running Qwen 3.6 27B. However, you need much less hardware to run a 27B than DSv4 Flash. In addition, you can do the same 'trick' with low-end GPUs and small MoEs: my desktop with 32 GB DDR4-3200 and an RTX 2070 8GB can run the ~30B class MoEs at 20-30 tokens per second and similar speeds to my laptop.

zozbot234 · 2026-06-06T22:01:23 1780783283

> Is that how MoEs work?

For any given workload/session? Empirically, yes, that's what has been found across different models. There's quite a bit of predictability that makes caching helpful.
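A minimal sketch of the caching idea, assuming a plain LRU eviction policy over expert IDs (the thread doesn't say which policy any real engine uses): hot experts stay in RAM and only misses go to the SSD. The routing trace is made up.

```python
from collections import OrderedDict

class ExpertCache:
    """LRU cache of expert weights held in RAM; misses stand in for SSD reads."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: OrderedDict[int, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> str:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.cache[expert_id] = f"weights-{expert_id}"  # placeholder for an SSD read
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[expert_id]

# Hypothetical routing trace where a few "hot" experts recur.
trace = [1, 2, 1, 3, 1, 2, 4, 1, 2, 3]
cache = ExpertCache(capacity=3)
for e in trace:
    cache.get(e)
print(cache.hits, cache.misses)  # 5 5
```

Feeding a real routing log through something like this is one way to measure whether expert usage in a given session is skewed enough for caching to pay off.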

> Also, even if your MoE hitrate is 90%, you still spend half your time waiting for the SSD, giving similar total speed to a 27B model!

There are ways of masking some of that latency, though it requires some architecture-specific cleverness which is less directly applicable to a generic engine like llama.cpp.

> Finally, it looks like Deepseek V4 is pretty much only runnable with antirez's ds4, and SSD streaming only works with Metal

The llama.cpp folks are working on adding support, and the ds4 project is working on CUDA support for streaming inference, targeting the DGX Spark.

> From what I know, all models (even frontier ones) start degrading once you pass a few hundred thousand tokens.

DeepSeek V4 seems to do quite well on recall tasks even with large context. That's one plausible benefit of its compressed attention mechanism, compared to earlier models. Some degradation will likely still be there, but it's not necessarily obvious.

As for why people are calling Qwen 27B "borderline unusable" that may have to do with it being a dense model which makes for an increased compute intensity and pushes users towards discrete GPU platforms, since those tend to have the most compute overall as far as consumer hardware is concerned. I might agree that Qwen 27B is quite ideally tailored towards these platforms, but that does come with some limitations.

trollbridge · 2026-06-06T18:41:19 1780771279

You can run the 35B A3B model which is an MoE. Runs great on a 5090.

Pxtl · 2026-06-06T16:28:09 1780763289

I've got a Qwen 3.5 running on a 12GB 3060 and it's dumb as a stump, but still smart enough to get some useful work done. Since it's my daily-driver desktop, I haven't moved to 3.6: the last time I tried, I quickly ran out of VRAM and locked up the desktop environment.

But yeah, the Qwen line is pretty impressive on commodity hardware.

derefr · 2026-06-06T16:43:46 1780764226

I must be using LLMs very differently than y'all, because I can't think of a single thing I would rely on an LLM that's "dumb as a stump" to do for me.

To me, LLMs are for asking research questions + exploring design spaces + pointing at codebases to investigate bugs. And those all benefit from the model being as "smart" (in terms of both fluid intelligence and burned-in knowledge) as possible.

I'm guessing there exist problems where "intelligence past a certain point" doesn't matter, so these medium-sized models can match the performance of the bigger models. But what problems might those be?

Pxtl · 2026-06-06T21:05:38 1780779938

Things that are tedious but simple but I'm unfamiliar with.

"Go add a gh action to compile and deploy this thing and run its tests" is one I've found it's good at. Yes I know how to make a gh pipeline but it's always a hassle to remember what goes where.

Cranking out unit tests is okay. It's good at summarizing things so it's not half bad at writing jsdoc/xmldoc comments.

epolanski · 2026-06-06T17:48:35 1780768115

Qwen suffers a lot from quantization, rendering it borderline unusable.

unmole · 2026-06-06T13:54:47 1780754087

> you may run some models locally if only from a cost perspective

I have a hard time believing running a model on a laptop will be cheaper than running it in a datacenter. Why wouldn't economies of scale apply here as with every other computation?

TylerE · 2026-06-06T16:25:47 1780763147

Because economy of scale isn't really the right metric here. A machine you were going to buy anyway essentially has a TCO of $0.

dofm · 2026-06-06T18:17:42 1780769862

AI models will pretty undeniably affect your electricity bill; yes you already own the computer, but it will cost more to run it if it's doing inference!

TylerE · 2026-06-06T21:28:22 1780781302

To a point, but we're talking a laptop, not a server farm. Even if you're going fullbore wide open 24/7 that's about $150/yr in electricity bills at average rates. Not quite nothing but in terms of AI costs that's pretty close to rounding to zero.
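The ~$150/yr figure is easy to reproduce: watts ÷ 1000 × 8,760 hours × price per kWh. The 100 W draw and $0.17/kWh rate below are assumptions chosen for illustration, not figures from the comment; plug in your own hardware and tariff.

```python
def yearly_cost(watts: float, price_per_kwh: float, hours: float = 24 * 365) -> float:
    """Electricity cost of a constant load running for the given hours."""
    kwh = watts / 1000 * hours
    return kwh * price_per_kwh

# Assumed: laptop drawing 100 W flat out 24/7 at $0.17/kWh.
print(round(yearly_cost(100, 0.17)))  # 149
```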

wazdra · 2026-06-06T14:25:57 1780755957

This assumes you'll be charged for the fraction of computing you consumed. But you are actually paying for their infrastructure, for the R&D (and also the computation that went into training the model), etc. It is not clear that these kinds of costs are needed for your own small computations, but you will still pay your share of the investment the provider made so that they could serve everyone's computation needs.

hungryhobbit · 2026-06-06T15:58:24 1780761504

But, currently ... you're not. AI companies are operating at a loss, and are being subsidized by their investors.

Local may or may not be cheaper than remote now, depending on the details, but the factors you describe won't affect the math nearly as much as they will once that subsidization ends.

dannyw · 2026-06-07T01:12:48 1780794768

Not for API pricing. The latest models are not subsidised API wise anymore.

Qwen3.6 is practically indistinguishable from Sonnet 4.6, at least in my personal experience. And Sonnet 4.6 is not that cheap.

wjnc · 2026-06-06T14:50:27 1780757427

In that analogy, big tech AI is currently investing in cleaner air for all of us? We _could_ breathe it through their hose, but might as well breathe it outside.

zozbot234 · 2026-06-06T17:08:04 1780765684

The datacenter setting has huge economies of scale for low-latency, just-in-time inference using extremely large models, but that's not the only viable use of AI. Batched, unattended inference of possibly smaller and weaker models, while theoretically viable in a datacenter setting, is far from the best use of that hardware. This is where local AI is at its best.

dgellow · 2026-06-06T14:01:07 1780754467

A laptop is really a pretty bad form factor to run LLMs: the worst cooling, more expensive memory that you cannot replace, and fast-depreciating resale value. It’s fine for tinkering, small-scale research, and demos, but it’s definitely niche.

The vision NVIDIA is selling is pure marketing IMHO

lrae · 2026-06-07T03:46:53 1780804013

Does it apply for every other computation? Purely for the computation part? You can host all kinds of things locally cheaper right now than in the cloud, no? (At least pre memory price hikes.) It does, of course, come with its downsides like availability/reliability, less convenience, scaling options,..., but purely the computing price - I don't see why it wouldn't be cheaper in the future - at least for some use cases.

itishappy · 2026-06-06T16:56:52 1780765012

It's cheaper for the AI provider to use your laptop instead of their datacenter.

jerf · 2026-06-06T16:04:50 1780761890

What "every other computation"? I seem to have a lot of processing power at my disposal here, between my cell phones, laptops, gaming PCs, and various other hardware devices.

You're going to need to analyze the problem much more deeply, because it sounds like the standards you are implicitly applying would result in "economically, everything should be centrally hosted", but that is clearly not what happens. Even a modern mid-grade cell phone is no slouch; you may not be running a current-gen frontier AI on it, but you can certainly do a lot of other rather intense things locally that would have been laughable 10 years ago, like surprisingly high-powered games.

strictnein · 2026-06-06T17:57:28 1780768648

I also don't get why this twitter user is linked here, versus all the news articles about this new hardware that have been everywhere over the past number of days.

latch · 2026-06-07T05:53:38 1780811618

I also dislike his self-promotion, but his work _is_ well known and, as far as I know, well regarded. I think he has more expertise and knowledge in this area than most (including what you'd find in the news).

strictnein · 2026-06-08T19:19:02 1780946342

Ahh, thanks, that explains things a little more. I wasn't familiar with the author, and his tweet just read like one of those people on LinkedIn who regurgitate knowledge and pass it off as their own insight.

bespokedevelopr · 2026-06-06T16:07:35 1780762055

The security aspect is the main reason I’m seeing so many businesses invest in local hardware. They know the models aren’t as good (with the caveat that they also can’t run Chinese models), and that’s OK. Places that really care about security and data governance already aren’t on the bleeding edge. They wait for the nice stable LTS version, lock down dev machines in frustrating ways, and have lots of IT admin layers.

But they also want to taste the sweet fruit of AI so the only way to do this that a CISO will approve is on local air gapped hardware. It’s a niche but still a billion dollar niche.

thewebguyd · 2026-06-06T16:45:31 1780764331

Microsoft is working on this with their new execution containers (https://github.com/microsoft/mxc)

unstatusthequo · 2026-06-06T16:20:31 1780762831

I hope a family-level AI appliance is a thing later. Local non-cloud assistant that lives in the house, families interact via voice or phones or whatever. Knows the contextual family stuff you need, etc.

Pxtl · 2026-06-06T16:34:41 1780763681

We didn't get people buying family-level file servers for the family photo gallery and documents at any real scale, so i doubt we'll see similar for AI especially when the cost is that much higher for GPUs vs an SBC machine.

JMiao · 2026-06-06T20:02:12 1780776132

because NAS hardware and software suck, and everything else was a poorly executed subscription product... i think one was called Helm, another was by early Twitter alumni. imagine a home device that manages and maintains itself and is a joy to interact with.

Pxtl · 2026-06-06T21:02:36 1780779756

And why would the hypothetical "OwnAI" product be any different?

JMiao · 2026-06-06T21:37:09 1780781829

not automatically, but a meaningful step up in ease of use (managing photo/video backup from all family devices) without a subscription would be a solid foundation

sandworm101 · 2026-06-06T13:54:53 1780754093

Lots of people are already running AI locally. They are the people buying up all the consumer-grade Nvidia GPUs. What are they doing with them? Well, the same things people with home media or email servers are doing: stuff they don't want to share with the general public.

Zetaphor · 2026-06-06T14:23:57 1780755837

I want to reduce my dependency on companies like Google, OpenAI, and Anthropic. Aside from the concerns of data sharing I'm also not a fan of how they run their operations, for example Anthropic now using xAI's Colossus data center which is poisoning a marginalized community, or OpenAI getting in bed with the military.

Not everything I want to use an LLM for requires "PhD level intelligence", and increasingly I'm finding more uses that involve sharing my personal data.

Yesterday my local model helped me when looking for a doctor who is in-network for my insurance. I threw it a screenshot from the providers search results and it looked up reviews for all of them.

sandworm101 · 2026-06-06T15:06:37 1780758397

My local AI is currently upscaling an old British comedy from sub-DVD quality to 1k. (It is not available other than on DVD.) It looks like it will take about a week for my pair of 5060s to chew through the task.

eszed · 2026-06-06T15:55:07 1780761307

Which show?

sandworm101 · 2026-06-06T17:09:58 1780765798

Chelmsford 123

I own the DVDs so I'm OK upscaling/editing my own copies for my own use. But if I ran the task on an ai service I would no doubt trigger copyright issues.

pratnala · 2026-06-06T15:17:10 1780759030

Which model are you running?

Zetaphor · 2026-06-06T16:00:02 1780761602

Qwen 3.6 35B-A3B and 27B both at Q8 on a Strix Halo machine

bredren · 2026-06-06T18:58:23 1780772303

> not even considering business security.

I suspect personal privacy, and the need to run AI workflows to handle the litany of household administration tasks, will be what results in a regular need for local AI.

Apple is already out front with this on a personal, individual level, but they are not obviously headed toward multiuser/family-level ~biz admin with a persistent server running local LLM.

epolanski · 2026-06-06T17:47:57 1780768077

DeepSeek Flash v4 is the leading local AI on 128GB machines, and DS4 is still in preview (training not finished), no?

Especially on Dwarfstar.

voidfunc · 2026-06-06T15:45:33 1780760733

> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers"

This made me laugh. I can only imagine how insufferable this person is to deal with.

falsemyrmidon · 2026-06-06T16:31:31 1780763491

> this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

Do you think he's in Mensa too?

cyanydeez · 2026-06-06T16:12:54 1780762374

128GB seems the sweet spot for local models. I can program and install most GitHub projects with opencode and Qwen 32B with MTP.

Anyone who's addicted to token throughput is losing the operational knowledge and offline capabilities.

If you aren't moving to the AMD 395 or Macs, then you're hitching a ride on the expensive calorie ride.

throw1234567891 · 2026-06-06T16:24:04 1780763044

If you could buy a 256GB machine, you’d be claiming that 256GB is the sweet spot. But I agree with you. Crack-tokens are not the future.

cyanydeez · 2026-06-06T16:57:59 1780765079

No, it's the fact that Macs, x86, and soon ARM are all going to have 128GB models in every sector. Yeah, sure.

But I'm watching everyone flounder because Claude goes down or because they're forced onto API costs.

I'm programming things that'd take me days, with a PC that, without OpenAI's VRAM shenanigans, would cost you $2k.

It's more than just 'this is what I could do'; it's definitely about 'this is what anyone could do with a new PC purchase'.

throw1234567891 · 2026-06-06T17:08:17 1780765697

You must be unaware that System76 was already selling 192GB machines, and Mac Studios used to max out at 512GB. The only reason we don’t have them anymore is that we are in a RAM shortage.

cyanydeez · 2026-06-06T17:40:41 1780767641

I'm aware you can have more. The term "SWEET SPOT" refers to an area that anyone/everyone can get to and isn't some magical expensive unicorn.

You're doing what the IT industry has been addicted to for decades: number goes up.

throw1234567891 · 2026-06-06T18:41:05 1780771265

> You're doing what the IT industry has been addicted to for decades: number goes up.

No, I have hands-on experience with bigger models, and I understand the advantages of using them.

cyanydeez · 2026-06-06T20:16:06 1780776966

You mean you're addicted to not understanding anything you do. That's fine. The rest of us aren't going to experience the glory of API bills going up.

You also probably believe you need to 'escape the permanent underclass'.

throw1234567891 · 2026-06-06T20:36:08 1780778168

You assume I use a subscription. There are other options but they require more than 128GB unified RAM. You also assume a lot about how I work. And those final assumptions about what and how I think of others speak more about your anxieties rather than what I think.

You assume a lot. Sometimes it’s good to simply ask a question.

speed_spread · 2026-06-06T18:48:00 1780771680

Those 192GB aren't unified memory, though. The 128GB on a Mac or a 395 can be used by both the CPU and the GPU. It's the GPU plus large memory that opens up fast local LLM inference.

throw1234567891 · 2026-06-06T18:56:40 1780772200

Yes, true. But if we had the ability to buy that much RAM in a laptop, everyone would be looking in that direction. Until the thing discussed here comes to market, “we didn’t have computers with unified 128GB RAM either” (except Macs).

GeekyBear · 2026-06-06T16:51:37 1780764697

> However, it will make decent machines to play video games."

Where you will need games to be rewritten for ARM to get full performance, just like on Apple's M series chips.

jb1991 · 2026-06-06T16:40:14 1780764014

He’s just a braggart. When you see something like this in somebody’s personal bio on social media, it’s basically a banner that means “take everything I say in the context of me promoting myself.”

iLoveOncall · 2026-06-06T13:45:26 1780753526

> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.

Lol yeah seriously, that stinks of "I ask AI to generate a huge amount of bullshit and upload it to pad irrelevant stats".

Absolute loser.

nkurz · 2026-06-06T14:30:49 1780756249

I agree that it sends the wrong signal, but actually Daniel is great. He cares tremendously about doing work that is actually real-world useful. I've co-written a few papers with him, and he's really hard working and open to outside suggestions. The danger is that if you send him comments, he'll eventually manage to rope you into writing a new and improved version. Seriously, if you are a non-academic computer scientist with a good idea that you want to publish, he'd be incredibly open to working with you.

As to why he now has this on his blog? I also cringe when I read it. I presume someone told him he should self-promote more, and this is his lame attempt to do so. He's almost certainly the most cited person in his department, but it's entirely possible that none of his colleagues actually know this. Cut him some slack. Self-promotion is not his strength. He's a nerd's nerd, and not a marketer. I'll mention to him that his attempt here might be backfiring when I'm next in contact with him.

infecto · 2026-06-06T16:08:42 1780762122

I cringe calling it out, but it stood out because it was plastered everywhere, and I had actually never seen his links before.

hgoel · 2026-06-06T16:46:26 1780764386

I kind of get it in the sense that every academic has to make themselves somewhat comfortable with self-promotion even if they don't like it. It's an important part of getting funding, but putting a blurb like that everywhere just hurts his credibility I think.

iLoveOncall · 2026-06-06T15:13:45 1780758825

> As to why he now has this on his blog?

He doesn't just have it on his blog, he has it EVERYWHERE. Sometimes 2 or 3 times on the same page.

dgacmu · 2026-06-06T16:53:18 1780764798

He's not a loser; he's done some really fun work that many people use daily. I've used his range mapping trick in multiple projects/papers. It's elegant.

It sounds like he's gotten bad advice about how to market himself /or/ this is being marketed to people who have bigger checks to write and whom he believes will be responsive to this kind of marketing. As an academic, it rubs me very wrong - I think it's detrimental to the field when we get into h-index stacking contests or citation count comparisons. But I don't know what incentives he's responding to, which seems important for putting this stuff in context.

(as an aside, it turns out that polars + fastexcel is about 10x faster than pandas + openpyxl for searching that dataset, if anyone else is curious what he was actually talking about. :)
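For anyone who hasn't seen it: assuming the "range mapping trick" refers to Lemire's well-known multiply-and-shift replacement for `x % n`, the idea is to map a 32-bit value `x` into `[0, n)` by computing `floor(x * n / 2**32)`. In C that is one 64-bit multiply and a right shift, which is typically much cheaper than an integer division. The result is not the same number as `x % n`, but for uniformly random `x` it spreads values across the range about as evenly. The sketch below is in Python purely for illustration (Python ints are arbitrary precision, so the product is exact); the function name `fast_range32` and the bucket example are hypothetical.

```python
import random


def fast_range32(x: int, n: int) -> int:
    """Map a 32-bit unsigned value x into [0, n) without a modulo.

    Computes floor(x * n / 2**32). The C equivalent is
    (uint32_t)(((uint64_t)x * (uint64_t)n) >> 32).
    """
    if not (0 <= x < 2**32):
        raise ValueError("x must be a 32-bit unsigned value")
    if not (0 < n <= 2**32):
        raise ValueError("n must be in 1..2**32")
    return (x * n) >> 32  # product < 2**64, so it fits in 64 bits in C


if __name__ == "__main__":
    # Hypothetical usage: assign random 32-bit hashes to 10 buckets.
    buckets = 10
    counts = [0] * buckets
    rng = random.Random(0)
    for _ in range(100_000):
        counts[fast_range32(rng.getrandbits(32), buckets)] += 1
    print(counts)  # each count should land close to 10,000
```

Note that the trick uses the high bits of `x`, whereas `x % n` leans on the low bits, so it pairs best with hash functions whose upper bits are well mixed.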

netsharc · 2026-06-06T13:55:47 1780754147

I found his website, https://www.lemire.me/en/ , and the "2%" brag is the very first sentence, geez.

Being the top x% is what OnlyFans girls brag about, professor...

And it's not exactly brain surgery, is it? https://www.youtube.com/watch?v=THNPmhBl-8I

Zetaphor · 2026-06-06T14:13:21 1780755201

> Daniel Lemire’s blog is one of the top 50 most popular blogs on Hacker News, the standard tech news aggregation site.

Citation needed

nkurz · 2026-06-06T14:38:11 1780756691

https://refactoringenglish.com/tools/hn-popularity/

thg · 2026-06-06T15:58:42 1780761522

For posterity: It's rank 34 at the time of this comment

SkiFire13 · 2026-06-06T15:48:45 1780760925

That line looks very cringe indeed, but the guy has some crazy good blog posts on SIMD stuff.

jayd16 · 2026-06-06T16:29:37 1780763377

Maybe they just mean from a "it can run a lot of DLSS" perspective.

SwtCyber · 2026-06-06T13:48:00 1780753680

I think the local-model use case is going to become less niche pretty quickly if the models keep getting smaller and more capable. Even if most people do not care about privacy or offline use, the cost argument is pretty strong.