Hacker News | andersmurphy's comments

NetBSD! ... oh wait, not Linux... damn

Biggest displacement has to be commenting on HN.

Yeah, you really want NVMe drives directly attached to your machine/VPS. It can make an orders-of-magnitude difference.

I mean it has blob types. Which basically means you can implement any type you want. You can also trivially implement custom application functions to work on these blob types in your queries. [1]

- [1] https://sqlite.org/appfunc.html
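To make that concrete, here is a minimal sketch using Python's standard sqlite3 module, whose create_function registers an application-defined SQL function. The "point" blob layout (two little-endian doubles), the places table, and the point_distance function are made up for illustration; none of them come from SQLite itself.

```python
import math
import sqlite3
import struct

# Hypothetical custom type: a 2D point stored as a 16-byte blob
# (two little-endian doubles). This layout is an assumption for the example.
def pack_point(x: float, y: float) -> bytes:
    return struct.pack("<dd", x, y)

def point_distance(a: bytes, b: bytes) -> float:
    """Euclidean distance between two packed points."""
    ax, ay = struct.unpack("<dd", a)
    bx, by = struct.unpack("<dd", b)
    return math.hypot(ax - bx, ay - by)

conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL with 2 arguments.
conn.create_function("point_distance", 2, point_distance)

conn.execute("CREATE TABLE places (name TEXT, location BLOB)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("origin", pack_point(0.0, 0.0)), ("far", pack_point(3.0, 4.0))],
)

# The custom function works on the blob column directly inside the query.
rows = conn.execute(
    "SELECT name, point_distance(location, ?) FROM places ORDER BY name",
    (pack_point(0.0, 0.0),),
).fetchall()
print(rows)
```

The same pattern applies in any language binding that exposes SQLite's application-defined function interface.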


The thing is, sqlite can scale further vertically than most network databases. In some contexts, like writes and interactive transactions, it outright scales further. [1]

That's before you even get into sharding sqlite.

[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...


Sqlite isn't the part that needs to scale in most cases, though. As soon as you need multiple servers to handle the traffic you're getting (serializing data, concatenating strings for HTML, lots of network throughput, or even just handling amounts of data that press you up against your memory limit), you're probably not going to have a great time with sqlite. Having multiple boxes talk to the same sqlite file is not something I've ever seen anyone do well at scale.

Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero downtime deployments essentially requires it. Vertically scaling is generally a whole lot less cost effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.


I'd argue the opposite: vertical scaling is a whole lot more cost effective than horizontal scaling if you're using a language that has both real threads and green/virtual threads (Go or anything on the JVM). You get such insane bang for your buck these days that even overprovisioning is cheap. Hell, directly attached NVMe can easily give 10-100x vs the crappy network drives AWS provides.

Zero downtime deploys have been solved for single machines. But, even then I'd argue most businesses can have an hour of downtime a month. I mean that's the same reliability as AWS these days.

Really, there are a handful of cases where you need multiple servers:

- You're network limited (basically you're a CDN).

- You're drive limited: you need to get data off drives faster than their bandwidth allows.

- Some legal requirement.

This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
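A rough sketch of sharding by customer, assuming one sqlite file per customer in a single directory. The ShardedStore class, the events table, and the customer ids are hypothetical names for illustration. Each file has its own write lock, so writes to different customers don't contend with each other.

```python
import os
import sqlite3
import tempfile

class ShardedStore:
    """Hypothetical router: one sqlite file (shard) per customer."""

    def __init__(self, root):
        self.root = root
        self._conns = {}  # customer_id -> open connection

    def _conn(self, customer_id):
        # Ids become file names, so only accept simple alphanumeric ids.
        if not customer_id.isalnum():
            raise ValueError(f"invalid customer id: {customer_id!r}")
        if customer_id not in self._conns:
            path = os.path.join(self.root, f"{customer_id}.db")
            conn = sqlite3.connect(path)
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events "
                "(id INTEGER PRIMARY KEY, body TEXT)"
            )
            self._conns[customer_id] = conn
        return self._conns[customer_id]

    def add_event(self, customer_id, body):
        conn = self._conn(customer_id)
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO events (body) VALUES (?)", (body,))

    def count(self, customer_id):
        row = self._conn(customer_id).execute(
            "SELECT COUNT(*) FROM events"
        ).fetchone()
        return row[0]

root = tempfile.mkdtemp()
store = ShardedStore(root)
store.add_event("acme", "signup")
store.add_event("acme", "login")
store.add_event("globex", "signup")
print(store.count("acme"), store.count("globex"))
```

Sharding by region works the same way with a region key in place of the customer id. Queries that span shards need extra work (for example ATTACH or application-side merging), which this sketch does not cover.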


Is Postgres with "no network" running over a unix socket or an IP socket on the same machine?

Yes, a Unix socket using the Java 16 socket channels. Interestingly, there was only a 5-10% improvement vs IP sockets (with no SSL).

Wonder if that inspired the Tomb of the Eaters in Caves of Qud.

Forth


With the trend towards immutable single-writer databases, mmap seems like a massive win.


Andy is very critical of using mmap in database implementations.


Andy's critiques are only valid on dedicated database servers.

https://www.symas.com/post/are-you-sure-you-want-to-use-mmap...

LMDB uses mmap and Andy recommends LMDB, in the very article this thread is about.


Why? Sqlite and LMDB make fantastic use of it. For anyone doing a single-writer db it's a no-brainer. It does so much for you and it does it very well. All the things you don't have to implement because it does it for you:

- Reading the data from disk

- Concurrency between different threads reading the same data

- Caching and buffer management

- Eviction of pages from memory

- Playing nice with other processes in the machine

Why would you not leverage it? It's such a great fit for scaling reads.
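As a small illustration of the list above, here is a sketch using Python's standard mmap module (not LMDB or sqlite internals). The file layout of fixed-width 8-byte records is an assumption for the example. Several threads read from one read-only mapping with no application-level read path, cache, or eviction code; the OS page cache handles all of it.

```python
import mmap
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD = struct.Struct("<Q")  # hypothetical layout: one unsigned 64-bit int per record

# Write a small data file: record i holds i * i.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for i in range(1000):
        f.write(RECORD.pack(i * i))

def read_record(view, index):
    # Reading is just a memory access; the OS pages data in on demand.
    return RECORD.unpack_from(view, index * RECORD.size)[0]

with open(path, "rb") as f:
    # Map the whole file read-only; readers share the same pages.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda i: read_record(view, i), range(1000)))
    view.close()

os.remove(path)
print(results[:5])
```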


The strongest argument as far as I can see it is... the problem is you now lose control over all those things. It's a black box with effectively no knobs.

Anyways, read for yourself. Pavlo & Leis get into it in detail, and there are benchmarks:

https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf

https://db.cs.cmu.edu/mmap-cidr2022/


What am I missing? The transactional safety problem (the bulk of the paper) is solved simply with a single writer. Which is where you want to be anyway for efficient batching throughput (and isolation).

The other concerns seem to imply there are no other programs running on the same machine as the database. The minute that's not true (is it ever true?), the OS will do a better job (as seen with LMDB etc.).

I think it's telling that the paper focuses on MongoDB, not LMDB.
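A minimal sketch of the single-writer batching idea, assuming sqlite as the store. The writer_loop function, the kv table, and the None sentinel are illustrative choices, not taken from any particular database. Many producers enqueue writes; one thread owns the connection and commits each batch as a single transaction.

```python
import os
import queue
import sqlite3
import tempfile
import threading

def writer_loop(db_path, jobs, batch_size=100):
    """The only thread that ever writes. A None job tells it to stop."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    done = False
    while not done:
        batch = [jobs.get()]  # block until there is at least one job
        while len(batch) < batch_size:
            try:
                batch.append(jobs.get_nowait())  # drain whatever is queued
            except queue.Empty:
                break
        if None in batch:
            done = True
            batch = [job for job in batch if job is not None]
        with conn:  # one transaction (one commit) per batch
            conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", batch)
    conn.close()

db_path = os.path.join(tempfile.mkdtemp(), "single_writer.db")
jobs = queue.Queue()
writer = threading.Thread(target=writer_loop, args=(db_path, jobs))
writer.start()

def producer(n):
    for i in range(250):
        jobs.put((f"p{n}-{i}", "value"))

producers = [threading.Thread(target=producer, args=(n,)) for n in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()
jobs.put(None)  # all producers are finished, so the sentinel is the last job
writer.join()

check = sqlite3.connect(db_path)
total = check.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
check.close()
print(total)
```

Because only one thread writes, there is no write-write conflict to resolve, and the per-commit cost is spread over every job in the batch.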


Fun footnote: SQLite only got on board with mmap after I demonstrated how slow their code was without it. I.e., getting a 22x speedup by replacing SQLite's btree code with LMDB https://github.com/LMDB/sqlightning


Thank you for beating the mmap drum and LMDB! It's truly an incredible piece of tech.


“It's such a great fit for scaling reads.”

And losing them.


How so? LMDB, boltdb/bbolt and sqlite (with mmap) are all rock solid. Just because mongodb used mmap badly does not make it any less valuable.


From what I remember, if AWS loses your data they basically give you some credits and that's it.


Yup, often orders of magnitude better.

