From what I've seen, performance is still much worse than Clickhouse, which has always been distributed, open source, warehouse-like, and feature-rich.
Why should I use timescale?
I'm really asking, I'm not being rhetorical.
Clickhouse and Timescale are different types of databases -- Clickhouse is a columnar store and Timescale is a row-oriented store that is specialized for time series data with some benefits of columnar stores[0].
Something like InfluxDB is a better thing to compare to TimescaleDB (and TimescaleDB does very well, though the benchmark was a bit old[1] and Influx might have improved in the meantime).
Database types aside, what really gets me excited about Timescale is that it's just another Postgres extension. If you're already running a Postgres cluster for your OLTP workloads (web-app-y workloads) and have just a bit of fast-moving time series data (ex. logs, audit logs, event streams, etc.), Timescale is only an extension away. You get the usual time-tested, battle-hardened Postgres, with all its features, plus support for your time series workloads. Yeah, you could set up declarative partitioning yourself (it is a Postgres feature after all), but why bother when Timescale has done the heavy lifting?
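To make that concrete, here's a rough sketch of the "only an extension away" setup, assuming the extension is installed on the server and using a hypothetical `metrics` table for illustration:

```sql
-- Enable the extension (must already be installed on the server).
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- A hypothetical table for illustration.
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT        NOT NULL,
    temperature DOUBLE PRECISION
);

-- Turn it into a hypertable, partitioned on the time column.
-- Timescale creates and manages the chunks for you, instead of
-- you maintaining declarative partitions by hand.
SELECT create_hypertable('metrics', 'time');
```

From then on you insert and query `metrics` like any ordinary Postgres table.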
[EDIT] - see the response below -- the benchmark is up to date, and Timescale does even better against the purpose-built tool that is InfluxDB.
> Note: This study was originally published in August 2018, updated in June 2019 and last updated on 3 August 2020.
One thing we continually hear is that the familiarity, trust, ecosystem, maturity, etc. that folks love about Postgres is a huge boon and one deciding factor for their adoption of TimescaleDB.
Just to clarify: those benchmarks of InfluxDB vs. TimescaleDB were fully redone just 2 months ago (August 2020) with the latest versions of both, so they should be quite up-to-date. In fact, in the ~year since our last benchmarking, TimescaleDB's performance _relative to InfluxDB_ has only increased, significantly.
"Version: TimescaleDB version 1.7.1, community edition, with PostgreSQL 12, InfluxDB version 1.8.0 Open Source Edition (the latest non-beta releases for both databases at the time of publishing)."
> One thing we continually hear is that the familiarity, trust, ecosystem, maturity, etc. that folks love about Postgres is a huge boon and one deciding factor for their adoption of TimescaleDB.
Yeah, I think this is huge; I also personally like that you get the extensibility of Postgres as well. Right now custom table access methods are still "cooking," but I think being able to combine a true Postgres-native columnar access method with Timescale's benefits would be a game changer. There's also zheap, which is still being worked on, but if/when it lands, Postgres will be even better at OLTP workloads and possibly an even better base for Timescale to stand on.
> Just to clarify: those benchmarks of InfluxDB vs. TimescaleDB were fully redone just 2 months ago (August 2020) with the latest versions of both, so they should be quite up-to-date. In fact, in the ~year since our last benchmarking, TimescaleDB's performance _relative to InfluxDB_ has only increased, significantly.
That's fantastic to hear -- I am already sold on Timescale, since getting close to a purpose-built tool with a solution built on top of a more general platform is already very impressive, but I will be re-reading the article closely to get more details on the exact trade-offs.
I mean using Clickhouse for time-series, of course.
I understand your point on adding a new feature to your already existing Postgres solution.
It's kinda what I do by using MySQL engine and dictionaries with Clickhouse, I assume.
Yes -- slightly different, but without the network hop!
The MySQL engine for Clickhouse sounds like dblink[0] or foreign data wrappers (FDW)[1] in Postgres. Doing it with Postgres allows for way more flexibility (the data could be local or remote) in this case, and the data will be at home in Postgres, with all the stability, features, and operational knowledge (and also bugs/warts, of course) that come with Postgres.
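For comparison, here's a rough sketch of the FDW approach with `postgres_fdw` (which ships with Postgres); the server name, host, and credentials below are all hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical remote server and credentials.
CREATE SERVER remote_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'analytics');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_db
    OPTIONS (user 'app', password 'secret');

-- The remote tables then look like local ones and can be
-- joined against local data in ordinary queries.
IMPORT FOREIGN SCHEMA public FROM SERVER remote_db INTO public;
```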
You may never get 100% of the performance you'd get from a purpose-built database that doesn't make the choices Postgres makes but the idea of getting 80/90% of the way there, with only one thing to maintain is very exciting to me.
Native column-oriented data warehouses designed for OLAP queries will always be faster. There are multiple faster alternatives, from Clickhouse to Redshift.
Originally I didn't like Timescale because it didn't offer anything new but the product has improved greatly over the years. Today it's close on performance by using a custom column-oriented data layer that stores the actual chunks in PostgreSQL rows and has several time-related processing and analytical features (continuous aggregates, time bucketing, smoothing values, etc) that make it easier than doing it yourself in raw SQL.
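To illustrate those time-related features, here's a sketch of a continuous aggregate with time bucketing, assuming a hypothetical `metrics` hypertable (the `time_bucket` function and `timescaledb.continuous` option are Timescale's):

```sql
-- Hourly averages, maintained incrementally by TimescaleDB
-- rather than recomputed from scratch on every query.
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    avg(temperature) AS avg_temp
FROM metrics
GROUP BY bucket, device_id;
```

Doing the equivalent in raw SQL means hand-rolling the bucketing, the materialization, and the refresh logic yourself.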
One of the big advantages is that it allows you to use Postgres which means you can continue to use it as your main OLTP operational database as well. This avoids a lot of complicated polyglot issues like syncing datasets or using different querying systems with different syntax. It's one of the better examples of using Postgres as a data platform rather than a simple database.
There are other alternatives that combine this OLAP+OLTP functionality like Citus (another automatic sharding distributed database extension for Postgres), Vitess (automatic sharded mysql), TiDB (natively distributed mysql interface on top of key/value store), MemSQL (proprietary distributed mysql interface with ram-based rowstores and disk-based columnstores) and SQL Server (with hekaton column-stores, in-memory tables, and scale out).
When I tried Clickhouse, I managed to segfault it with a NULL pointer dereference. OK, it's anecdotal, and maybe I just had bad luck -- the same could happen with Postgres, etc. But anyway: it can be a deciding factor.
(And they fixed it quickly -- issue 7955 on GitHub.)