I’ve not kept up with Intel in a while, but one thing that stood out to me is that these are all E-cores, meaning no hyperthreading. Is something like this competitive, or preferred, in certain applications? Also, does anyone know if there have been any benchmarks against AMD's 192-core Epyc CPU?
"Is something like this competitive, or preferred, in certain applications?"
They cite a very specific use case in the linked story: Virtualized RAN. This is using COTS hardware and software for the control plane of a 5G+ cell network operation. A large number of fast, low-power cores would indeed suit such an application, where large numbers of network nodes are coordinated in near real time.
It's entirely possible that this is the key use case for this device: 5G networks are huge money makers and integrators will pay full retail for bulk quantities of such devices fresh out of the foundry.
> how do you get them off the shelf if you also need TB of memory
You make products for well capitalized wireless operators that can afford the prevailing cost of the hardware they need. For these operations, the increase in RAM prices is not a major factor in their plans: it's a marginal cost increase on some of the COTS components necessary for their wireless system. The specialized hardware they acquire in bulk is at least an order of magnitude more expensive than server RAM.
Intel will sell every one of these CPUs and the CPUs will end up in dual CPU SMP systems fully populated with 1-2 TB of DDR5-8000 (2-4GB/core, at least) as fast as they can make them.
I do like the idea that capitalism can always ignore the broader base of consumers and just raise prices. Eventually, there'll only be one Viagra pill, bought by trillionaires at $1 million a pop.
In HPC, like physics simulation, they are preferred. There's almost no benefit from HT. What's also preferred is high clock frequencies. These high-core-count CPUs nerf their clock frequencies, though.
It all depends on your exact workload, and I’ll wait to see benchmarks before making any confident claims, but in general if you have two threads of execution which are fine on an E-core, it’s better to actually put them on two E-cores than one hyperthreaded P-core.
Without the hyperthreading (E-cores) you get more consistent performance between running tasks, and cloud providers like this because they sell "vCPUs" that should not fluctuate when someone else starts a heavy workload.
Sort of. They can just sell even numbers of vCPUs, and dedicate each hyper-thread pair to the same tenant. That prevents another tenant from creating hyper-threading contention for you.
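For the curious, here's a minimal sketch of that pairing idea on Linux (Python; the sysfs path and os.sched_setaffinity are Linux-specific, and the 2-way SMT assumption is mine, not anything a real hypervisor scheduler is limited to):

```python
import os

BASE = "/sys/devices/system/cpu"

def sibling_pairs():
    """Group logical CPUs into hyper-thread sibling sets via sysfs."""
    pairs = set()
    for entry in os.listdir(BASE):
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        try:
            with open(f"{BASE}/{entry}/topology/thread_siblings_list") as f:
                text = f.read().strip()  # typically "0,4" or "0-1" for 2-way SMT
        except FileNotFoundError:
            continue
        sep = "," if "," in text else "-"
        pairs.add(tuple(sorted(int(x) for x in text.split(sep))))
    return sorted(pairs)

# Pin this process (standing in for one tenant's vCPUs) to a single
# sibling pair, so any hyper-thread contention stays within the tenant.
pair = sibling_pairs()[0]
os.sched_setaffinity(0, set(pair))
print(f"pinned to logical CPUs {pair}")
```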
For those, wouldn't hyperthreading be a win? Some fraction of the time, you'd get evicted to the hyperthread that shares your L1 cache (and the hypervisor could strongly favor that).
I don't know the nitty-gritty of why, but some compute intensive tasks don't benefit from hyperthreading. If the processor is destined for those tasks, you may as well use that silicon for something actually useful.
It's a few things; mostly along the lines of data caching (i.e. hyper-threading may mean the other thread needs a cache sync/barrier/etc.).
That said, I'll point to the Intel Atom: the first version and its refresh were 'in-order' designs, where hyper-threading was the cheapest option (both silicon- and power-wise) to provide performance; with Silvermont, however, they switched to out-of-order execution but ditched hyper-threading.
I think part of the why is die area: 288 E-cores vs. 72 P-cores in the same silicon.
Also, there have been so many hyperthreading vulnerabilities of late (to the point of it being disabled on data center machines) that I'd imagine this de-risks that entirely.
For an application like a build server, the only metric that really matters is total integer compute per dollar and per watt. When I compile e.g. a Yocto project, I don't care whether a single core compiles a single C file in a millisecond or a minute; I care how fast the whole machine compiles what's probably hundreds of thousands of source files. If E-cores give me more compute per dollar and watt than P-cores, give me E-cores.
Of course, having fewer, faster cores does have the benefit that you require less RAM... Not a big deal before, when you could get 512GB or 1TB of RAM fairly cheap, but these days it might actually matter? But then at the same time, if two E-cores are more powerful than one hyperthreaded P-core, maybe you actually save RAM by using E-cores? Hyperthreading is, after all, only a benefit if you spawn one compiler process per CPU thread rather than per core.
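To make the trade-off concrete, here's a rough sketch (Linux, Python) of sizing -j by physical cores instead of logical threads; the 2 GB-per-compile-job figure is just an illustrative assumption, not a measured number:

```python
import os

BASE = "/sys/devices/system/cpu"

def physical_cores():
    """Count physical cores by collapsing hyper-thread siblings (Linux sysfs)."""
    cores = set()
    for entry in os.listdir(BASE):
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        try:
            with open(f"{BASE}/{entry}/topology/physical_package_id") as f:
                pkg = f.read().strip()
            with open(f"{BASE}/{entry}/topology/core_id") as f:
                core = f.read().strip()
            cores.add((pkg, core))
        except FileNotFoundError:
            continue
    return len(cores) or os.cpu_count()

logical, physical = os.cpu_count(), physical_cores()
gb_per_job = 2  # illustrative assumption for a heavy C++ translation unit
print(f"one job per core:   -j{physical}, ~{physical * gb_per_job} GB peak")
print(f"one job per thread: -j{logical}, ~{logical * gb_per_job} GB peak")
```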
EDIT: Why in the world would someone downvote this perspective? I'm not even mad, just confused
It's for building embedded Linux distros, and your typical Linux distro contains quite a lot of C++ and Rust code these days (especially if you include, say, a browser, or Qt). But you have parallelism across packages, so even if one core is busy doing a serial linking step, the rest of your cores are busy compiling other packages (or maybe even linking other packages).
That said, there are sequential steps in Yocto builds too, notably installing packages into the rootfs (it uses dpkg, opkg or rpm, all of which are sequential) and any code you have in the rootfs postprocessing step. These steps usually aren't a significant part of a clean build, but can be a quite substantial part of incremental builds.
That's finally set to be resolved with Nova Lake later this year, which will support AVX10 (the new iteration of AVX512) across both core types. Better very late than never.
E-cores didn't just ruin P-cores; they ruined AVX-512 altogether. We were getting so close to near-universal AVX-512 support; close enough to bother actually writing AVX-512 versions of things. Then Intel killed it.
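The usual coping pattern is runtime detection plus dispatch; a minimal Linux-only sketch (flag names as the kernel reports them in /proc/cpuinfo):

```python
def cpu_flags():
    """Parse the feature flags line from /proc/cpuinfo (Linux-specific)."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
if "avx512f" in flags:      # AVX-512 Foundation
    kernel = "avx512 kernel"
elif "avx2" in flags:
    kernel = "avx2 fallback"
else:
    kernel = "scalar fallback"
print(f"dispatching to: {kernel}")
```

On the hybrid parts where AVX-512 is fused off, this always picks the avx2 path, which is exactly the fragmentation problem.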
I love the AVX512 support in Zen 5 but the lack of Valgrind support for many of the AVX512 instructions frustrates me almost daily. I have to maintain a separate environment for compiling and testing because of it.
There was someone at Intel working on AVX512 support in Valgrind. She is/was based in St Petersburg. Intel shuttered their Russian operations when Putin invaded Ukraine and that project stalled.
If anyone has the time and knowledge to help with AVX512 support then it would be most welcome. Fair warning, even with the initial work already done this is still a huge project.
I've seen scenarios where HT doesn't help, iirc very CPU-heavy things without much waiting on memory access. Which makes sense, because the vcores are sharing the ALUs.
Also have seen it disabled in academic settings where they want consistent performance when benchmarking stuff.
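If you want to see this on your own machine, here's the rough shape of the experiment (Python's interpreter overhead muddies the microarchitectural picture, so treat this as a sketch rather than a real measurement; the 2-way SMT assumption is mine):

```python
import multiprocessing as mp
import os
import time

N = 5_000_000

def alu_kernel(n: int) -> int:
    # Pure integer arithmetic with no memory stalls for HT to hide.
    acc = 1
    for _ in range(n):
        acc = (acc * 1103515245 + 12345) & 0xFFFFFFFF
    return acc

def throughput(workers: int) -> float:
    start = time.perf_counter()
    with mp.Pool(workers) as pool:
        pool.map(alu_kernel, [N] * workers)
    return workers * N / (time.perf_counter() - start)

if __name__ == "__main__":
    logical = os.cpu_count()
    physical = logical // 2  # assumes 2-way SMT; adjust for your topology
    for w in (physical, logical):
        print(f"{w:>3} workers: {throughput(w) / 1e6:.1f}M iters/s")
```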
I could be doing something wrong, but I have not had any success with one-shot feature implementations for any of the current models. There are always weird quirks, undesired behaviors, bad practices, or just egregiously broken implementations. A week or so ago, I had instructed Claude to do something at compile time and it instead burned a phenomenal amount of tokens before yeeting the most absurd and convoluted runtime implementation, which didn't even work. At work I use it (or Codex) for specific tasks, delegating specific steps of the feature implementation.
The more I use the cloud-based frontier models, the more virtue I find in using local, open source/weights models, because they tend to create much simpler code. They require more direct interaction from me, but the end result tends to be less buggy, easier to refactor/clean up, and more precisely what I wanted. I am personally excited to try this new model out here shortly on my 5090. If I read the article correctly, it sounds like even the quantized versions have a "million"[1] token context window.
And to note, I’m sure I could use the same interaction loop for Claude or GPT, but the local models are free (minus the power) to run.
[1] I'm dubious it won't shite itself at even 50% of that. But even 250k would be amazing for a local model when I "only" have 32GB of VRAM.
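For a sense of why I'm dubious: at long contexts the KV cache dominates VRAM. A back-of-envelope with a hypothetical mid-size model shape (48 layers, 8 KV heads via GQA, head_dim 128, fp16; none of these numbers are from the article):

```python
def kv_cache_gib(tokens, layers=48, kv_heads=8, head_dim=128, bytes_per=2):
    """Rough KV cache size: K and V, per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per / 2**30

for tokens in (50_000, 250_000, 1_000_000):
    print(f"{tokens:>9} tokens: ~{kv_cache_gib(tokens):5.1f} GiB of KV cache")
# ->   50,000: ~9.2 GiB    250,000: ~45.8 GiB    1,000,000: ~183.1 GiB
```

So on 32GB, even 250k only fits with an aggressively quantized cache on a shape like this, never mind a million, plus the weights themselves.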
WWII didn't start overnight. The Sturmabteilung (SA), also known as "The Brownshirts," bear a strong similarity to what we're seeing with ICE and CBP. The SA were Hitler's enforcers before the SS, during the 1920s and early 1930s. They were eventually usurped by the SS during the "Night of the Long Knives," when SA leadership were executed by the SS, largely because Hitler had felt threatened by the power Ernst Röhm had amassed (among other reasons). And the SA, like ICE, was made up largely of untrained sycophants and thugs who enjoyed violence. They committed violence, harassed citizens, and faced no consequences for doing so. They were also instrumental in laying the foundation for the genocide and atrocities committed by the Nazi party.
It's not a dishonor to their memories, or to the atrocities committed, to call that out. It is not a dishonor to say there are stark and real similarities in the way the US is operating and treating civilians.
I personally find the opposite; IMHO it dishonors their memories to refuse to acknowledge the similarities.
I've posted a comment similar to this one here before, and like how I ended it. I strongly encourage you to read about the history of Nazi Germany and how it came to happen. It wasn't zero to death camps overnight; it was 15 years in the making. That history is as shocking as it is depressing, because the parallels and timelines between it and the US are too similar for anything besides outright discomfort, sadness, and fear. But without knowing that history, we are ever more likely to repeat it.
One final thing to note: the US has a history of extreme violence; slave patrols and the treatment of non-whites in the 19th century were an inspiration for Hitler.
I think we're at the peak, or close to it, for these memory shenanigans. OpenAI, which is largely responsible for the shortage, just doesn't have the capital to pay for it. It's only a matter of time before the chickens come home to roost and the bill is due. OpenAI is promising hundreds of billions in capex but has nowhere near that cash on hand, and its cash flow is abysmal considering the spend.
Unless there is a true breakthrough, beyond AGI into superintelligence, on existing or near-term hardware, I just don't see how "trust me bro" can keep its spending party going. Competition is incredibly stiff, and it's pretty likely we're at the point of diminishing returns without an absolute breakthrough.
The end result is going to be RAM prices tanking in 18-24 months. The only upside will be for consumers who will likely gain the ability to run much larger open source models locally.
I’m not sure I get why this is better. Something like Tailscale makes it trivial to connect to your own machines and is likely more secure than this will be. Tailscale even has a free plan these days. Combine that with something like this that was shared on HN a few days ago: https://replay.software/updates/introducing-echo
Then you're all in for like $3. What about WebRTC makes this better?
Ah! Thanks for explaining that. I totally keep forgetting, to my own detriment, libghostty exists. It’s mighty cool to see it being used more and more to build cool new terminals (like yours and the mobile terminal that showed up here the other day).
I missed the mobile terminal and I've been hunting for a good one; I did a search for the past week but found nothing. If you have a link handy, that would be great - thank you.
Honestly this is one of the biggest reasons I stick with Elixir. Between Elixir's standard library, the BEAM/OTP, and Phoenix (with Ecto), I honestly have very few dependencies for web projects. I rarely, at this point, find the need to add anything to new projects except for maybe Mox (mocking library) and Faker (for generating bits of test data). And now that the Jason (JSON) library has been more or less integrated into OTP I don't even have to pull it in. The Elixir dev experience is truly unmatched (IMHO) these days.
> Oh yeah, and just wait until you see you have to pay the US taxes on your income too.
No, you don't.
You still have to file, but you get the "Foreign Tax Credit" for income tax paid abroad, and seeing how an EU country's income tax will almost certainly be higher than the US's, you'll end up paying nothing.
There are also tax treaties to avoid double taxation in other ways.
I’ve seen plenty of videos covering it from expats stating they still do in fact pay taxes back to the US. Maybe the info is outdated or things have changed recently, but a cursory google makes it seem like that "No, you don't" isn't true. It looks like the Foreign Earned Income Exclusion only covers up to $130,000 per year of income. Then you pay on whatever you make over that (assuming you don't have other credits).
> I’ve seen plenty of videos covering it from expats stating they still do in fact pay taxes back to the US.
"Expats" living in Europe?
I ask because "expat" usually refers to someone who moved to a lower cost of living country that may also have significantly lower income tax compared to the EU.
> It looks like the Foreign Earned Income Exclusion only covers up to $130,000 per year of income.
$130k/yr is absolute bank in Europe.
From a quick Google search, that would put you well in the top 5% of earners in Berlin, just as an example.
So, this shouldn't be much of an issue.
Not tax advice, but AFAIK, if you had to pay $1000 to the US IRS and already paid $800 to another country, then you owe the US $200.
The country must have a tax treaty with the US, so they exchange the info about your taxes in the background. And since many countries in the EU have higher tax rates than the US, you often owe $0.
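Spelled out, the arithmetic is just this (a simplified sketch; the real Foreign Tax Credit has per-category limits and carryovers):

```python
def us_tax_owed(us_liability: float, foreign_tax_paid: float) -> float:
    """Simplified foreign tax credit: credit is capped at the US liability."""
    return us_liability - min(foreign_tax_paid, us_liability)

print(us_tax_owed(1000, 800))   # 200.0 -> you owe the IRS the difference
print(us_tax_owed(1000, 1500))  # 0.0   -> higher EU tax, nothing owed to the US
```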
I got a big laugh at the “only” part of that. I do have a sincere question about that number, though: isn't time relative? How would we know that number to be true or consistent? My incredibly naive assumption would be that with less matter, time moves faster, sort of accelerating; so, as matter “evaporates,” the process accelerates and converges on that number (or close to it)?
Times for things like "age of the universe" are usually given as "cosmic time" for this reason. If it's about a specific object (e.g. "how long until a day on Earth lasts 25 hours") it's usually given in "proper time" for that object. Other observers/reference frames may perceive time differently, but in the normal relativistic sense rather than a "it all needs to wind itself back up to be equal in the end" sense.
The local reference frame (which is what matters for proton decay) doesn't see an outside world moving slower or faster depending on how much mass is around it to any significant degree until you start adding a lot of mass very close around.
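To put a number on "any significant degree": the standard weak-field time dilation factor at Earth's surface, for example, works out to parts in a billion (constants rounded from memory):

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24   # Earth's mass, kg
r = 6.371e6    # Earth's radius, m
c = 2.998e8    # speed of light, m/s

# Weak-field gravitational time dilation: dtau/dt = sqrt(1 - 2GM/(r c^2))
factor = math.sqrt(1 - 2 * G * M / (r * c**2))
print(f"surface clock rate vs. far away: {factor:.12f}")
print(f"deficit: ~{1 - factor:.1e}")  # roughly 7e-10
```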
I don’t see how AI won’t end up running on personal devices. It’s like how mainframes were the original computing platform and then we had the PC revolution. If anything, I think Apple is uniquely positioned to pull the rug on a lot of these cloud models. It might take ten or 15 years, but eventually we’ll see an arms race to do so. There’s too much money on the table, and once cloud providers are tapped out the next logical step is home users. It also makes scaling a lot easier because you don’t need increasingly expensive, complex, and power hungry data centers.
It wasn't that long ago (ignoring the current DRAM market shenanigans) that it was unthinkable to have a single machine with over a terabyte of RAM and 192 physical cores. Now that's absolutely doable in a single workstation. Heck, even my comparatively paltry 96GB of RAM would've been absurd in 2010; now there are single prosumer GPUs with that.
With the rate of progress (and, in the opposite direction, the physical limitations Intel/AMD/TSMC/etc. are bumping into), there are no guarantees about what a machine will look like a decade from now. But simple logic applies: if the user's machine scales to X amount of RAM, the hyperscaler's rack scales to X*Y RAM, and assuming the performance/scaling relationship we've seen holds true, it will be correspondingly far smarter/better/more powerful compared to the user's AI.
Maybe that won't matter when the user is asking it a 5th grade question, but for any more complex application of AI than "what's the weather" or "turn on a light", users should want a better AI, particularly if they don't have to pay for all that silicon sitting around unused in their machine for most of the day?
This argument would sound nearly identical if you made it in the 70s or early 80s about mainframes and personal computers.
It's not that mainframes (or supercomputers, or servers, or the cloud) stopped existing, it's that there was a "good enough" point where the personal computer was powerful enough to do all the things that people care about. Why would this be different?*
And aren't we all paying for a bunch of silicon that sits mostly unused? I have a full modern GPU in my Apple SoC capable of throwing a ridiculous number of polygons per second at the screen and I'm using it to display two terminal emulator windows.
* (I can think of a number of reasons why it would in fact turn out different, but none of them have to do with the limits of technology -- they are all about control or economic incentives)
It’s different because of the ubiquity of the internet and the financial incentives of the companies involved.
Right now you can get 20TB hard drives for cheap and set up your own NAS, but way more people spend money every month on Dropbox/iCloud/onedrive - people value convenience and accessibility over “owning” the product.
Companies also lean into this. Just consider Photoshop. It used to be a one-time purchase, then it became a cloud subscription, now virtually every new AI feature uses paid credits. Despite having that fast SoC, Photoshop will still throw your request to their cloud and charge you for it.
The big point still remains: by the time you can run that trillion parameter model at home, it’s old news. If the personal computer of the 80s was good enough, why’s nobody still using one? AI on edge devices will exist, but will forever remain behind data center AI.
> Right now you can get 20TB hard drives for cheap and set up your own NAS, but way more people spend money every month on Dropbox/iCloud/onedrive - people value convenience and accessibility over “owning” the product.
Yes, this is a convenience argument, not a technical one. It's not that your PC doesn't have or could have more than enough storage -- it likely does -- it's that there are other factors that make you use Dropbox.
So now the question becomes: do we not believe that personal devices will ever become good enough to run a "good enough" LLM (technical barrier), or do we believe that other factors will make it seem less desirable to do so (social/financial/legal barrier)?
I think there's a very decent chance that the latter will be true, but the original argument was a technical one -- that good-enough LLMs will always require so much compute that you wouldn't want to run one locally even if you could.
> If the personal computer of the 80s was good enough, why’s nobody still using one?
What people want to do changes with time, and therefore your PC XT will no longer hack it in the modern workplace, but the point is that from the point that a personal computer of any kind was good enough, people kept using personal computers. The parallel argument here would be that if there is a plateau where LLM improvement slows and converges with ability to run something good enough on consumer hardware, why would people not then just keep running those good enough models on their hardware? The models would get better with time, sure, but so would the hardware running them.
The original point that I was making was never purely a technical one. Performance, economics, convenience, and business trends all play a part in what I think will happen.
Even if LLM improvement slows, it’ll probably result in the same treadmill effect we see in other software.
Consider MS Office, Adobe Creative (Cloud), or just about any pro level software. The older versions aren’t really used, for various reasons, including performance, features, compatibility, etc. Why would LLMs, which seem to be on an even faster trajectory than conventional software, be any different? Users will want to continue upgrading, and in the case of AI, that’ll mean continuing to access the latest cloud model.
No doubt that someone can run gpt-oss-120b five years from now on device, but outside of privacy, why would they when you can get a faster, smarter answer (for free, likely) from a service?