> In the steady state, a webserver would have almost no garbage collector activity
I recently wrote my own zero-allocation HTTP server, and while the above statement is possible to achieve, at some point you need to decide how to handle pipelined requests that aren't resolved synchronously. Depending on your appetite for memory consumption per connection, this often leads to allocations in the general case, though custom memory pools can alleviate some of the burden.
I didn't see anything in the article about that case specifically, which would have been interesting to hear given it's one of the challenges I've faced.
Good point; I've decided to simply not support HTTP/1.1 pipelining, and to have a connection pooling layer for HTTP/2 instead that takes care of this.
OxCaml has support for the effect system that we added in OCaml 5.0 onwards, which allows a fiber to suspend itself and be restarted via a one-shot continuation. So it's possible to have a pipelined connection stash away a continuation for a response calculation and be woken up later on when it's ready.
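Roughly what that looks like with OCaml 5 effects; this is only a minimal sketch, where `Await_response`, `pending` and `wake` are made-up names for illustration, not the server's actual API:

```ocaml
open Effect
open Effect.Deep

type request = { id : int }
type response = { body : string }

(* Hypothetical effect: a fiber performs this when it needs a response
   that isn't ready yet. *)
type _ Effect.t += Await_response : request -> response Effect.t

(* Suspended fibers, keyed by request id (a Hashtbl just for illustration). *)
let pending : (int, (response, unit) continuation) Hashtbl.t = Hashtbl.create 16

(* Inside [serve], the fiber would suspend with:
     let resp = perform (Await_response req) in ... *)
let handle_pipelined (serve : request -> unit) (req : request) =
  match_with serve req
    { retc = (fun () -> ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Await_response r ->
            Some (fun (k : (a, unit) continuation) ->
              (* Stash the one-shot continuation; the fiber is now suspended. *)
              Hashtbl.replace pending r.id k)
        | _ -> None) }

(* Later, when the response for [id] is ready, wake the fiber exactly once. *)
let wake id resp =
  match Hashtbl.find_opt pending id with
  | Some k -> Hashtbl.remove pending id; continue k resp
  | None -> ()
```

A connection fiber just performs `Await_response` at the point where it needs the result, and whatever computes the response calls `wake` once it's done.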
All continuations have to be either discarded explicitly or resumed exactly once; forgetting to do either can lead to memory leaks in OCaml 5, but OxCaml has an emerging lifetime system that guarantees this is safe: see https://oxcaml.org/documentation/parallelism/01-intro/ or https://gavinleroy.com/oxcaml-tutorial-icfp25/ for a taste of that. Beware though; it's cutting-edge stuff and the interfaces are still emerging, but it's great fun if you don't mind some pretty hardcore ML typing ;-) When it all settles down it should be very ergonomic to use, but right now you do get some interesting type errors.
> So it's possible to have a pipelined connection stash away a continuation for a response calculation and be woken up later on when it's ready.
Ahh, that's interesting. I think you still run into the issue where you have a case like this:
1. You get 10 pipelined requests from a single connection, each with a POST body to update some record in a Postgres table.
2. All 10 requests are independent and can be resolved at the same time, so you should make use of Postgres pipelining and send them all as you receive them.
3. When finishing the requests, you likely need the information provided in the request object. Let's assume it's a lot of data in the body, to the point where you've reached your per-connection buffer limit. You either allocate here to unblock the read, or you block new reads, impacting response latency, until all requests are completed. The allocation is the better choice at that point, but the heuristic decision engine with the goal of peak performance is definitely nuanced, if not complicated (roughly the trade-off sketched below).
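Purely as an illustration of the shape of that decision, with made-up names and a naive threshold rather than anything from the article:

```ocaml
(* Illustrative only: per-connection policy when the fixed read buffer fills up
   while earlier pipelined requests are still in flight. *)
type read_action =
  | Keep_reading              (* room left in the fixed buffer *)
  | Allocate_overflow of int  (* grow the buffer: unblocks the read at a memory cost *)
  | Block_reads               (* back-pressure: later pipelined requests wait *)

let next_action ~buffer_used ~buffer_cap ~inflight ~overflow_budget =
  if buffer_used < buffer_cap then Keep_reading
  else if inflight > 0 && overflow_budget > 0 then
    (* Earlier requests still pending: pay for an allocation so reads can continue. *)
    Allocate_overflow (min overflow_budget buffer_cap)
  else
    (* Overflow budget spent (or nothing in flight to wait on): stop reading and
       let completed responses drain the buffer first. *)
    Block_reads
```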
It's a cool problem space though, so I'm always interested in learning how others attack it.
It is a cool problem space! What I'm doing is using a single buffer for body handling (since you dispatch that away and then reuse it for chunked encoding), so it never takes unbounded stack space. This might be a bit different in HTTP/3, where you can have multiple body transmissions multiplexed; I have to look into how this works (and it's over UDP as well).
What we never need to do in OxCaml is keep a giant list of body buffers on the stack; with effects, we can fork the stack at any time, so the request object is shared naturally. The only way to free the stack is to return from a function, but you can have a tree of these that share values from earlier in the call chain.
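The single reusable buffer part looks roughly like this; hypothetical names and sizes, just to show the shape, not the real server's types:

```ocaml
(* Sketch of one reusable per-connection buffer: the same bytes hold the request
   body while it's dispatched, then get reused to assemble the chunked-encoded
   response, so memory doesn't grow with pipeline depth. *)
type conn = {
  fd : Unix.file_descr;
  scratch : Bytes.t;  (* one fixed buffer per connection *)
}

let make_conn fd = { fd; scratch = Bytes.create 16384 }

let read_body conn ~len handle =
  let n = Unix.read conn.fd conn.scratch 0 (min len (Bytes.length conn.scratch)) in
  (* The handler borrows [scratch]; it must finish with it before the buffer is
     reused for the response below. *)
  handle conn.scratch n

let write_chunk conn payload =
  (* Reuse the same scratch buffer to build one chunked-encoding frame. *)
  let header = Printf.sprintf "%x\r\n" (String.length payload) in
  let total = String.length header + String.length payload + 2 in
  assert (total <= Bytes.length conn.scratch);
  Bytes.blit_string header 0 conn.scratch 0 (String.length header);
  Bytes.blit_string payload 0 conn.scratch (String.length header) (String.length payload);
  Bytes.blit_string "\r\n" 0 conn.scratch (String.length header + String.length payload) 2;
  ignore (Unix.write conn.fd conn.scratch 0 total)
```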
Everyone else is using LLMs to assist their development, which makes it a lot harder to work without them, especially when just building enterprise apps. It doesn't feel like I'm creating something anymore; rather, it feels like a fuzzy amalgamation of all the developers in the training data is. Working with LLMs sometimes feels like information overload: reading the massive volume of code that scrolls past as the agent makes its changes is exhausting. I don't like that the new "power tools" of software engineering mean that my career, our career, is now monetizable. I liked feeling like a craftsman, and that is lost.
I’m curious if you’ve tried using these tools the other way.
Whenever I've experimented, I found the tab completion annoying and the agent got so much wrong that I was basically fighting it at every step. But when I went back to VS Code and treated the LLM as a super-fast inline Stack Overflow (give me an example, look up this API, find my dumb syntax error), I could use it to support deep work and staying in the zone rather than supplant it, and the resulting code isn't slop because I wrote it.
It seems to me a lot of developers are operating this way instead, treating the machine as an electric bicycle for lots of little boosts rather than FSD.
I have, and it's entirely my own fault, but the way my brain works is I can't justify doing something slower than I possibly could with other tools. I'm not one to completely give in to vibe coding. I do still very manually drive the LLM when I work, but I don't even feel like learning tech anymore.
I can't help but ask myself: what's the point of learning another programming language, or another library, or another paradigm, when a lot of this information and knowledge is encoded in the model weights of the LLM?
Not the OP, but I feel like using LLMs to code is much more like management than coding. And the "person" you're managing is a not-very-smart coder with severe memory issues.
My guess is because it’s turning development into a Red Queen’s Race [0] where everyone has to run faster and faster just to stay in the same place. If everyone else is using LLMs, how can you stay competitive without using them?
It encourages good designs, but it doesn't make them easy to write; that's somewhat the point. It's not trivial to design a safe API that pushes performance limits.
> just write a lockless triple buffer for efficient memory sharing and wrap the unsafe usage of pointers with a safe API
This isn't practical or pragmatic.
And I say this as someone who likes Rust and develops in it every day.
> Here's where SierraDB diverges from traditional distributed databases: reads don't require quorum.
Would we say this is divergent? Cassandra, DynamoDB, and many others allow you to specify the consistency of reads at the request level.
> Here's where SierraDB diverges from traditional distributed databases: reads don't require quorum. Instead, each event stores a confirmation count in its metadata. When a write achieves quorum, a background process broadcasts this confirmation to all replicas, updating their local confirmation counts. This means any single node can serve consistent reads without network round-trips - a massive performance win.
I have no context outside of this blog post, but this seems actually divergent from the typical definition of consistency, given it's not linearizable. What systems benefit most from this low-latency, stale-but-ordered consistency guarantee?
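If I'm reading the quoted description right, the read path would look roughly like this; a sketch of my interpretation only, with made-up names, not SierraDB's actual code:

```ocaml
(* Each replica keeps a per-event confirmation count, bumped by the background
   broadcast, and can decide locally whether an event is safe to serve. *)
type event = {
  payload : string;
  mutable confirmations : int;
}

let quorum ~replicas = replicas / 2 + 1

(* What the background process does on each replica after a write reaches quorum. *)
let apply_confirmation (store : (int, event) Hashtbl.t) id count =
  match Hashtbl.find_opt store id with
  | Some ev -> ev.confirmations <- max ev.confirmations count
  | None -> ()

(* Local read: no network round-trip, but only quorum-confirmed events are
   returned, which is why it's stale-but-ordered rather than linearizable. *)
let read_confirmed ~replicas (store : (int, event) Hashtbl.t) id =
  match Hashtbl.find_opt store id with
  | Some ev when ev.confirmations >= quorum ~replicas -> Some ev.payload
  | _ -> None
```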
There's been a massive talent exodus, especially among the principal and senior principal engineering roles, across all Amazon orgs since the RTO policies have been enforced. It's demoralizing to lose key engineers that you look up to and want to continue to learn from, all because a few people far removed from the day-to-day made a bad call.
RTO, combined with Amazon being in last place in AI innovation, has led to anyone who can leave, leaving.