Getting started with TimescaleDB
TimescaleDB is a PostgreSQL extension (it uses PostgreSQL as its underlying database engine) that adds features specifically for handling time-series data. As such, the integration of TimescaleDB in the DBOD service should be seen as an enhancement of the existing PostgreSQL offering, rather than an entirely new database type.
TimescaleDB extends PostgreSQL with features such as hypertables and continuous aggregates, and with more efficient storage mechanisms for time-series data such as chunking and compression. While PostgreSQL itself can handle time-series data, TimescaleDB provides performance optimizations and ease-of-use features specifically for these workloads, making it similar in use cases to InfluxDB.
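As an illustration of these concepts, the sketch below shows what a hypertable and an hourly continuous aggregate could look like. The table `sensor_data` and its columns are hypothetical and should be adapted to your own schema:

```sql
-- Hypothetical example table; adjust names and columns to your own schema.
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    value       DOUBLE PRECISION
);

-- Turn the plain table into a hypertable partitioned on the time column.
SELECT create_hypertable('sensor_data', 'time');

-- A continuous aggregate keeps a pre-computed hourly rollup up to date.
CREATE MATERIALIZED VIEW sensor_data_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       sensor_id,
       avg(value) AS avg_value
FROM sensor_data
GROUP BY bucket, sensor_id;
```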
TimescaleDB Community Edition is now available in DBOD for PostgreSQL release 14 and higher. As described in our PG extensions documentation, if you would like to use TimescaleDB, please open a SNOW ticket specifying the instance, database, and schema (if not the default one) in which you need it installed. We will install and enable it for you, as superuser privileges are required.
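Once we confirm the installation, you can verify from your own session that the extension is enabled and which version is in use, for example:

```sql
-- Check that the TimescaleDB extension is installed in the current database
-- and which version is enabled.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'timescaledb';
```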
Please refer to the Timescale Docs for the latest documentation and tutorials.
Bear in mind that TimescaleDB Community is made available under the Timescale License ("TSL"). To permit its usage as part of the DBOD service offering, CERN has entered into a Licensing and Publicity Agreement with Timescale Inc. This agreement includes the following key aspects relevant to users requesting TimescaleDB at CERN:
- License: CERN holds a non-exclusive, worldwide, limited license to use TimescaleDB Community Edition. This license also extends to personnel from collaborating institutions working on Approved Experiments at CERN. Usage outside of these experiments is not permitted.
- Term and Renewal: The agreement has an initial term of three years, beginning from September 24, 2024, and will auto-renew for two additional years unless either party gives a one-year notice of non-renewal.
- Early Termination: Either party may terminate the agreement if the other is in material breach of its terms and fails to remedy the breach within 30 days of receiving notice.
Initial configuration tuning and data loading
Because time-series workloads aim at storing series of data points to identify trends and patterns over time, their data volumes can easily become very large. Historical data normally accumulates gradually, but deployments often start with a massive initial load of data already gathered in a temporary buffer or alternative storage.
Large data volumes have a direct impact on storage space needs, but they can also amplify CPU and memory requirements, depending on the level and type of expected activity.
PostgreSQL and TimescaleDB expose a number of configuration parameters controlling many different aspects, which allow the instance to be tuned according to the data volumes and other requirements.
Estimating initial data size and growth rate
The first task to accomplish when requesting a TimescaleDB instance is to estimate the initial data size and the expected growth rate for the coming months and years, together with the expected type and level of activity in terms of concurrent connections and queries (time window, complexity, joins, frequency, response time). In particular, estimate a reasonable working set and the corresponding memory size, i.e. the data and index size that should stay in memory to avoid slow data retrieval, especially when the storage is based on hybrid spinning disks rather than pure SSD.
A good way to estimate the initial load size and the future growth rate is to load a significant and representative sample of real or synthetic data and verify the resulting sizes before and after compression and with different chunk sizes.
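TimescaleDB ships with size-reporting functions that can help with this exercise. The queries below are a minimal sketch, again assuming the hypothetical `sensor_data` hypertable:

```sql
-- Total on-disk size of a hypertable (table, indexes and toast).
SELECT pg_size_pretty(hypertable_size('sensor_data'));

-- Per-chunk breakdown, useful when comparing different chunk intervals.
SELECT chunk_name, pg_size_pretty(total_bytes) AS total
FROM chunks_detailed_size('sensor_data')
ORDER BY chunk_name;

-- Before/after compression statistics, once compression has been enabled.
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('sensor_data');
```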
Allocate CPU, memory and storage according to requirements and tune PostgreSQL and TimescaleDB parameters
Once the initial data size and growth rate estimates are translated into reasonably correct figures, the next step is to have the required resources in place. You can start with a smaller configuration, experiment with data ingestion and queries on a representative sample, and request more resources when you collect evidence of poor performance due to a lack of resources. After every change in resource allocation, the configuration parameters must be re-tuned, using the timescaledb-tune tool, in order to properly leverage and benefit from the resources allocated.
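After such a tuning pass, it can be useful to verify the resulting settings from within the database. The query below is only a sketch, listing some of the memory and worker settings that timescaledb-tune typically adjusts:

```sql
-- A few of the memory and parallelism settings typically adjusted after a
-- change in resource allocation; useful to verify the new configuration.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers',
               'effective_cache_size',
               'work_mem',
               'maintenance_work_mem',
               'max_worker_processes',
               'timescaledb.max_background_workers');
```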
Identifying the best chunk size
Hypertables are expected to be partitioned into chunks. One of the main initial tuning efforts should be aimed at identifying a chunk size which is a good compromise between easy management, continuous aggregate performance, compression rate and speed of data retrieval, possibly across all the different types and levels of activity, but mostly optimizing the most important and critical queries.
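For reference, the chunk interval can be set when the hypertable is created or adjusted later; the sketch below again assumes the hypothetical `sensor_data` hypertable:

```sql
-- Set the chunk interval when creating the hypertable...
SELECT create_hypertable('sensor_data', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- ...or change it later; the new interval applies only to chunks
-- created after this call.
SELECT set_chunk_time_interval('sensor_data', INTERVAL '12 hours');
```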
This exercise cannot be done properly until the sizing of the instance and the corresponding resource allocation are in place. Insufficient resources and a poor configuration can produce very different results when testing different chunk sizes, leading to suboptimal choices.
You might need to load and reload data many times, and this can result in significant WAL file production
More often than not, the initial phases of setting up an instance are the result of guess estimates and trial and error. A cautious approach is to proceed incrementally and not always load the full amount of data, as this can easily produce massive amounts of (mostly useless) WAL files, which can affect the shared storage in terms of saturation and allocated space. The final full load should ideally be executed only after many relatively smaller loads have been successfully and thoroughly checked in all aspects.
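One way to keep an eye on WAL production during such test loads is to compare the WAL position before and after a load; this is a generic PostgreSQL technique (shown here using psql variables), not something specific to TimescaleDB:

```sql
-- In psql: capture the WAL position before a test load into a variable.
SELECT pg_current_wal_lsn() AS wal_before \gset

-- ... run the test data load here ...

-- Then measure how much WAL the load generated.
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'wal_before'))
       AS wal_generated;
```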
Use compression and load data in chronological order
Compression in TimescaleDB often achieves rates of up to 80-90%. Enabling this feature is paramount to reduce the storage footprint but, more importantly, to significantly improve query performance on historical data. Because of the way it works, it does not make sense to compress live data (i.e. the chunks into which data is currently being loaded), as they would be continuously uncompressed to allow insertions and later compressed again, generating a huge amount of WAL. Fully loaded, read-only chunks are instead very good candidates, BUT it is paramount to load data in chronological order so that compression can kick in once a chunk is complete, avoiding recurring compress/uncompress operations.
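As an illustration, compression can be enabled and automated roughly as follows, again on the hypothetical `sensor_data` hypertable; the segmentby and orderby columns are only examples and should match your own query patterns:

```sql
-- Enable compression on the hypertable; segmentby/orderby columns shown
-- here are illustrative and should reflect how the data is queried.
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Automatically compress chunks once they are older than seven days,
-- i.e. once they are no longer receiving new rows.
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
```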