The Wikipedia data dumps [0] are multistream bz2. This makes them relatively easy to partially ingest, and I'm happy to be able to remove the C dependency from the Rust code I have that deals with said dumps.
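To illustrate why multistream makes partial ingestion easy, here is a minimal sketch in Python using only the standard-library `bz2` module. It builds a tiny multistream blob in memory and then decompresses a single stream by byte offset, without touching the rest. The sample chunks and the `read_stream` helper are hypothetical; for the real dumps, my understanding is that the offsets come from the companion index file rather than being computed like this.

```python
import bz2

# Hypothetical stand-ins for groups of pages; each is compressed as its own
# independent bz2 stream and the streams are simply concatenated.
chunks = [b"<page>alpha</page>\n", b"<page>beta</page>\n", b"<page>gamma</page>\n"]

offsets = []  # byte offset at which each stream starts
blob = b""
for chunk in chunks:
    offsets.append(len(blob))
    blob += bz2.compress(chunk)


def read_stream(data: bytes, offset: int) -> bytes:
    """Decompress only the single bz2 stream that starts at `offset`."""
    decomp = bz2.BZ2Decompressor()
    # The decompressor stops at the end of the first stream; whatever
    # follows is left untouched in `decomp.unused_data`.
    return decomp.decompress(data[offset:])


# Pull out just the second stream without decompressing the first.
print(read_stream(blob, offsets[1]))
```

With a file on disk you would `seek` to the offset and feed the decompressor in blocks until `decomp.eof` is true, instead of slicing a bytes object.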
The same could be said of many things that, nonetheless, are still used by many, and will continue to be used by many for decades to come. A thing does not need to be best to justify someone wanting to make it a bit better.
“Best” is measured along a lot more axes than just performance. And you don’t always get to choose what format you use. It may be dictated to you by some third party you can’t influence.
bzip2 is still pretty good if you want to optimize for:
- better compression ratio than gzip
- faster compression than many better-than-gzip competitors
- lower CPU/RAM usage for the same compression ratio/time
This is a niche, but it does crop up sometimes. The downside to bzip2 is that it is slow to decompress, but for write-heavy workloads, that doesn't matter too much.
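If you want to check whether your own data falls in that niche, a rough sketch like the following is enough to get a first impression. It uses only Python's standard library, compares gzip, bz2 and lzma at their default settings, and reports compressed size plus compression time. The `measure` helper and the sample input are hypothetical, and the numbers depend heavily on the data and the compression levels, so treat it as a starting point rather than a benchmark.

```python
import bz2
import gzip
import lzma
import time


def measure(data: bytes) -> dict:
    """Return {codec name: (compressed size in bytes, compression seconds)}."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the codec must round-trip the input exactly.
        assert decompress(packed) == data
        results[name] = (len(packed), elapsed)
    return results


# Hypothetical sample; substitute a representative slice of your real data.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
for name, (size, seconds) in measure(sample).items():
    print(f"{name}: {size} bytes in {seconds:.4f} s")
```

To cover the decompression side as well, time the `decompress` call the same way.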
So? If I need to consume a resource compressed with bz2, I'm not going to sit around waiting for whoever publishes it to switch to zstd. I'm going to break out bz2. And if I can use a modern rewrite that's faster, I'll take every advantage I can get.