Yeah I agree. This is a classic “uninitialized variable has garbage memory value” bug. But it is not an “undefined nasal demons behavior” bug.
That said, we all learn this one! I spent like two weeks debugging a super rare desync bug in a multiplayer game with a P2P lockstep synchronous architecture.
Suffice to say I am now a zealot about providing default values all the time. Thankfully it’s a lot easier since C++11 came out and lets you define default values at the declaration site!
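For anyone who hasn't used them, a minimal sketch of those declaration-site defaults (non-static data member initializers):

    struct Connection {
        int retries = 3;           // the default lives at the declaration,
        bool connected = false;    // so every constructor that doesn't
        double timeout_s = 30.0;   // override it gets a deterministic value
    };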
I prefer that the language define new storage as zero-initialized. It doesn't prevent all bugs (e.g. application logic bugs) but it at least gives deterministic results. These days it's zero cost for local variables and near-zero cost for fields. This is the case in Virgil.
That makes things worse if all-zero is not a valid value for the datatype. I'd much prefer a set-up that requires you to initialise explicitly. Rust, for example, has a `Default` trait that you can implement if there is a sensible default, which may well be all-zero. It also has a `MaybeUninit` holder which doesn't do any initialisation, but needs an `unsafe` to extract the value once you've made sure it's OK. But if you don't have a suitable default, and don't want/need to use `unsafe`, you have to supply all the values.
I think it's acceptable to leave an escape hatch for these situations instead of leaving it to easy-to-misunderstand nooks and crannies of the standard.
You don't want to zero out the memory?
Slap a "foo = uninitialized" in there to have that exact behavior and get the here be demons sign for free.
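If I'm remembering the spelling right, C++26 pairs its new default (erroneous behavior) with an opt-out attribute that is pretty much this; a rough sketch, exact compiler support varies:

    #include <cstdio>

    int main() {
        int x [[indeterminate]];  // explicit "here be demons" opt-out:
                                  // reads before a write are plain UB again
        x = 42;                   // keep the promise: write before reading
        std::printf("%d\n", x);
    }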
Yeah this issue is super obvious and non-controversial.
Uninitialized state is totally fine as an opt-in performance optimization. But having a well defined non-garbage default value should obviously be the default.
Did C fuck that up 50 years ago? Yeah probably. They should have known better even then. But that’s ok. It’s a historical artifact. All languages are full of them. We learn and improve!
I don't know, I expect all variables to be uninitialized until proven otherwise. It makes it easier for me to reason about code, especially convoluted code. But I also like C a lot and actually explicitly invoke UB quite often, so there is that.
I like C and it's great. I wish more people wrote C instead of C++. But there's a reason that literally no modern language makes this choice.
If uninitialization were opt-in you would still be free to "assume uninitialized until proven otherwise". But uninitialized memory is such a monumental, catastrophic footgun that that is really not a justifiable reason to make it the default behavior. Which, again, is why no modern language makes that (terrible) design choice.
There are non-standard mechanisms to control variable initialization. GCC has -ftrivial-auto-var-init=zero for zero-init of locals (with some caveats). For globals, you can link them into a section other than .bss to disable zero-init.
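Roughly what that looks like in practice (a sketch; the standard still calls the read UB/erroneous, the flag just pins the observed value):

    // g++ -ftrivial-auto-var-init=zero demo.cpp   (recent Clang accepts it too)
    #include <cstdio>

    int main() {
        int n;                   // no initializer in the source
        std::printf("%d\n", n);  // observes 0 under the flag instead of garbage
    }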
I am talking about random convoluted code that I neither wrote nor control. The UB doesn't only help the compiler, it also helps me, the reverse engineer, since I can assume that an access without a previous write is either a bug or a sign that I misinterpreted the control flow.
You can assume whatever initialization you want when reading code, even if it's not in the standard. Is your concern that people would start writing code assuming zero-init behavior (as they already do)?
That purpose would be better served by reclassifying uninitialized reads as erroneous behavior, which is what they are from C++26 onwards. What useful purpose is served by having them be UB specifically?
> Is your concern that people would start writing code assuming zero-init behavior (as they already do)?
Yes, I couldn't assume that such code can be deleted safely. Not sure if people really rely on it, given that it doesn't work.
> erroneous behavior
So they finally did the thing and made the crazy optimizations illegal?
> If the execution of an operation is specified as having erroneous behavior, the implementation is permitted to issue a diagnostic and is permitted to terminate the execution of the program.
> Recommended practice: An implementation should issue a diagnostic when such an operation is executed.
[Note 3: An implementation can issue a diagnostic if it can determine that erroneous behavior is reachable under an implementation-specific set of assumptions about the program behavior, which can result in false positives.
— end note]
I don't get it at all. The implementation is already allowed to issue diagnostics as it likes, including when the line number of the input file changes. In the case of UB it is also permitted to emit code that terminates the program. This all sounds like saying nothing. The question is what the implementation is NOT allowed to do for erroneous behaviour that would be allowed for undefined behaviour.
Also if they do this, does that mean that most optimizations are suddenly illegal?
Well, yeah, the compiler can assume UB never happens, optimize accordingly, and that can sometimes surprise the programmer. But I, the programmer, also program based on that assumption. I don't see how defining all the UB serves me.
UB doesn't mean there will be nasal demons. It means there can be nasal demons, if the implementation says so. It means the language standard does not define a behavior. POSIX can still define the behavior. The implementation can still define the behavior.
Plenty of things are UB just because major implementations do things wildly differently. For example:
realloc(p, 0)
Having uninitialized reads be UB means that implementations where it's zero cost can initialize variables to zero, or implementations designed for safety-critical systems can initialize them to zero, or what have you, without the standard forcing all implementations to do so.
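To make the realloc(p, 0) divergence concrete, a small sketch (the behaviors in the comments are what implementations have done historically, not a guarantee):

    #include <cstdio>
    #include <cstdlib>

    int main() {
        void* p = std::malloc(16);
        void* q = std::realloc(p, 0);  // implementations diverge: some free p
                                       // and return a null pointer, others
                                       // return a unique minimal allocation;
                                       // C23 gave up and made it undefined
        std::printf("%p\n", q);
        std::free(q);                  // fine either way: free(nullptr) is a no-op
    }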
> UB doesn't mean there will be nasal demons. It means there can be nasal demons, if the implementation says so.
Rather "if the implementation doesn't say otherwise".
Generally speaking compiler writers are not mustache-twirling villains stroking a white cat thinking of the most dastardly miscompilation they could implement as punishment. Rather they implement optimisation passes hewing as close as they can to the spec's requirements. Which means if you're out of the spec's guarantees you get whatever emergent behaviour occurs when the optimisation passes run rampant.
All of that implementation freedom is also available if the behavior is erroneous instead. Having it defined as UB just gets you nasal demons, which, incidentally, this rule does lead to on modern compilers. For example:
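(an illustrative sketch of the kind of thing optimizers are allowed to do, not a claim about any specific compiler version:)

    #include <cstdio>

    int main(int argc, char**) {
        bool b;                       // never written when argc == 1,
        if (argc > 1) b = true;       // so the reads below are UB on that path
        if (b)  std::puts("armed");
        if (!b) std::puts("disarmed");
        // Since the optimizer may assume the UB read never happens, it can
        // legally emit code that prints both lines, neither, or folds the
        // branches away entirely -- the "nasal demons" in question.
    }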
Yeah that’s just really bad language design. Which, again, literally no modern languages do because it’s just terrible horrible awful no good very bad design.
It's describing rather than prescribing, which yeah isn't really design. Most modern languages don't even (plan to) have multiple implementations, much less a standard.
In terms of compile times, Boost.Geometry is somehow worse. You're encouraged to include boost/geometry.hpp, which pulls in every module and adds several seconds to the compile just to parse all the templates. It's not terrible if you include just the headers you need, but that's not the "default" that most people use.
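A sketch of the finer-grained include style, for comparison (the exact header paths and the minimal set needed have shifted between Boost versions, so treat this as illustrative rather than authoritative):

    // #include <boost/geometry.hpp>                    // the slow monolith
    #include <boost/geometry/geometries/point_xy.hpp>   // just the point model
    #include <boost/geometry/algorithms/distance.hpp>   // just one algorithm
    #include <boost/geometry/strategies/strategies.hpp> // default strategies

    namespace bg = boost::geometry;

    int main() {
        bg::model::d2::point_xy<double> a(0.0, 0.0), b(3.0, 4.0);
        return bg::distance(a, b) > 4.9 ? 0 : 1;  // expect 5.0
    }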
If you have 25gb of executables then I don’t think it matters if that’s one binary executable or a hundred. Something has gone horribly horribly wrong.
I don’t think I’ve ever seen a 4gb binary yet. I have seen instances where a PDB file hit 4gb and that caused problems. Debug symbols getting that large is totally plausible. I’m ok with that at least.
Llamafile (https://llamafile.ai) can easily exceed 4GB due to containing LLM weights inside. But remember, you cannot run >4GB executable files on Windows.
> A few ps3 games I've seen had 4GB or more binaries.
Is this because they are embedding assets into the binary? I find it hard to believe anyone was carrying around enough code to fill 4GB in the PS3 era...
I assume so, there were rarely any other files on the disc in this case.
It varied between games, one of the battlefields (3 or bad company 2) was what I was thinking of. It generally improved with later releases.
The 4GB file size was significant, since it meant I couldn't run them from a backup on a fat32 usb drive. There are workarounds for many games nowadays.
Not quite. I very much work in large, templated, C++ codebases. But I do so on windows where the symbols are in a separate file the way the lord intended.
Evidence of no exploitation? It's usually hard to prove a negative, except when you have all the logs at your fingertips to sift through. Unless they don't, of course. In which case the point stands: they don't actually know at this point in time, if they can even know at all.
Specifically, it looks like the exfiltration primitive relies on errors being emitted, and those errors are what leak the data. They're also rather characteristic. One wouldn't reasonably expect MongoDB to hold onto all raw traffic data flowing in and out, but would absolutely expect them to have the error logs, at least for some time back.
I feel like that's an issue not with what they said, but with what they did. It would be better for them to have checked this quickly, but it would have been worse for them to have said they did when they hadn't. What you're saying isn't wrong, but it's not really an answer to the question you're replying to.
> "No evidence of exploitation” is a pretty bog standard report
It is standard, yes. The problem with it as a statement is that it's true even if you've collected exactly zero evidence. I can say I don't have evidence of anyone being exploited, and it's definitely true.
It's not really my bar, I just explored this on behalf of the person you were replying to because I found it mildly interesting.
It is also a pretty standard response indeed. But now that it was highlighted, maybe it does deserve some scrutiny? Or is saying silly, possibly misleading things okay if that's what everyone has always been doing?
Timesync isn’t a nightmare at all. But it is a deep rabbit hole.
The best approach, imho, is to abandon the concept of a global time. All timestamps are wrt a specific clock. That clock will skew at a rate that varies with time. You can, hopefully, rely on any particular clock being monotonic!
My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time. The fewer stops the better.
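Concretely, one minimal way to model that graph is an estimated affine mapping per edge, composed along a path (every hop adds jitter, hence the fewer stops the better):

    #include <cstdint>
    #include <vector>

    struct ClockEdge {
        double skew;       // estimated rate of dst clock relative to src clock
        double offset_ns;  // estimated offset, in destination nanoseconds
        int64_t map(int64_t t_src_ns) const {
            return static_cast<int64_t>(skew * t_src_ns + offset_ns);
        }
    };

    // Convert a timestamp along a path of edges in the clock graph.
    int64_t convert(int64_t t_ns, const std::vector<ClockEdge>& path) {
        for (const ClockEdge& e : path) t_ns = e.map(t_ns);
        return t_ns;
    }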
I kinda don’t like PTP. Too complicated and requires specialized hardware.
This article only touches on one class of timesync. An entirely separate class is timesync within a device. Your phone is a highly distributed compute system with many chips each of which has their own independent clock source. It’s a pain in the ass.
You also have local timesync across devices such as wearables or robotics. Connecting to a PTP system with GPS and atomic clocks is not ideal (or necessary).
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
At this stage, it's difficult to find a half-decent Ethernet MAC that doesn't have PTP timestamping. It's not a particularly complicated protocol, either.
I needed to distribute PPS and 10MHz into a GNSS-denied environment, so last summer I designed a board to do this using 802.1AS gPTP with a uBlox LEA-M8T GNSS timing receiver, a 10MHz OCXO and an STM32F767 MCU. This took me about four weeks. Software is written in C, and the PTP implementation accounts for 1500 LOC.
> I kinda don’t like PTP. Too complicated and requires specialized hardware.
In my view the specialised hardware is just a way to get more accurate transmission and arrival timestamps. That's useful whether or not you use PTP.
> My mental model is that you form a connected graph of clocks and this allows you to convert arbitrary timestamps from any clock to any clock. This is a lossy conversion that has jitter and can change with time.
This sounds like the "peer to peer" equivalent to PTP. It would require every node to maintain state about its estimate (skew, slew, variance) of every other clock. I like the concept, but obviously it adds complexity to end-stations beyond what PTP requires (i.e. it increases the hardware cost of embedded implementations). Such a system would also need to model the network topology, or control routing (as PTP does), because packets traversing different routes to the same host will experience different delay and jitter statistics.
> TicSync is cool
I hadn't seen this before, but I have implemented similar convex-hull based methods for clock recovery. I agree this is obviously a good approach. Thanks for sharing.
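For anyone curious, the core trick, heavily simplified (real TicSync also estimates skew so the bounds stay valid as the clocks drift), is that each request/response bounds the offset:

    #include <algorithm>
    #include <limits>

    // Local send time t1, remote timestamp t2, local receive time t3.
    // With non-negative one-way delays, the remote-minus-local offset theta
    // satisfies  t2 - t3 <= theta <= t2 - t1; every sample tightens the bounds.
    struct OffsetEstimator {
        double lo = -std::numeric_limits<double>::infinity();
        double hi =  std::numeric_limits<double>::infinity();

        void addSample(double t1, double t2, double t3) {
            lo = std::max(lo, t2 - t3);
            hi = std::min(hi, t2 - t1);
        }
        double estimate()    const { return 0.5 * (hi + lo); }
        double uncertainty() const { return 0.5 * (hi - lo); }
    };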
> This sounds like the "peer to peer" equivalent to PTP. It would require every node to maintain state about it's estimate (skew, slew, variance) of every other clock.
Well, it requires having the conversion function for each edge in the traversed path. And such a function needs to exist only at the location(s) performing the conversion.
> obviously it adds complexity to end-stations beyond what PTP requires
If you have PTP and it works then stick with it. If you’re trying to timesync a network of wearable devices then you don’t have PTP stamping hardware.
> because packets traversing different routes
Fair callout. It’s probably a more useful model for less internet-y use cases. Of which there are many!
For example when trying to timesync a collection of different sensors on different devices/microcontrollers.
Roboticists like CAN bus and EtherCAT. But even that is kinda overkill imho. TicSync can get you tens of microseconds of precision in user space.
Is there a Clang-based build for Windows? I’ve been slowly moving my Windows builds from MSVC to Clang, which still uses the Microsoft STL implementation.
So far I think using Clang instead of the MSVC compiler is a strict win? Not a huge difference, mind you. But a win nonetheless.