I mean it has blob types. Which basically means you can implement any type you want. You can also trivially implement custom application functions to work on these blob types in your queries. [1]
The thing is sqlite can scale further vertically than most network databases. In some context's like writes and interactive transactions it outright scales further. [1]
Sqlite isn't the part that needs to scale in most cases, though. As soon as you need multiple servers to handle the traffic you're getting (serializing data, concatenating strings for HTML, lots of network throughout, or even just handling amounts of data that press you up against your memory limit), you're probably not going to have a great time with sqlite. Having multiple boxes talk to the same sqlite file is not something I've ever seen anyone do well at scale.
Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero downtime deployments essentially requires it. Vertically scaling is generally a whole lot less cost effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.
I'd argue the opposite vertical scaling us a whole lot more effective than horizontal scaling if your using a language that has both real threads and green/virtual threads (go or anything on the JVM). You get such insane bang for your buck these days even over provisioning is cheap. Hell direct NVME can easily give 10-100x vs the crappy network drives AWS provides.
Zero downtime deploys have been solved for single machines. But, even then I'd argue most businesses can have an hour of downtime a month. I mean that's the same reliability as AWS these days.
Really, there are a handful of cases where you need multiple servers:
- You're network limited (basically you're a CDN).
- You are drive limited you need to get data off dirves faster than their bandwidth.
- Some legal requirement.
This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
Why? Sqlite and LMDB make fantastic use of it. For anyone doing a single writer db it's a no brainer. It does so much for you and it does it very well. All the things you don't have to implement because it does it for you:
- Reading the data from disk
- Concurrency between different threads reading the same data
- Caching and buffer management
- Eviction of pages from memory
- Playing nice with other processes in the machine
Why would you not leverage it? It's such a great fit for scaling reads.
The strongest argument as far as I can see it is... the problem is you now lose control over all those things. It's a black box with effectively no knobs.
Anyways, read for yourself, Pavlo & Leis get into it in detail, and there's benchmarks:
What am I missing? The transactional safety problem (the bulk of the paper) is solved simply with a single writer. Which is where you want to be anyway for efficient batching throughput (and isolation).
The other concerns seem to imply there are no other programs running on the same machine as the database. The minute that's not true (is it ever true?). Then OS will do a better job (as seen with LMDB etc).
I think it's telling that the paper focuses on mongoDB not LMDB.
Fun footnote: SQLite only got on board with mmap after I demonstrated how slow their code was without it. I.e., getting a 22x speedup by replacing SQLite's btree code with LMDB https://github.com/LMDB/sqlightning
reply