LogTide

Storage Engines

LogTide uses Reservoir, a pluggable storage abstraction that lets you choose between TimescaleDB and ClickHouse for log and span storage. Both engines share the same API — your application code, queries, and dashboards work identically regardless of which engine you run.

Overview

One API, Two Engines

Reservoir is LogTide's internal storage layer (@logtide/reservoir). It provides a single, typed interface for all log and span operations — ingestion, querying, aggregation, deletion — and translates them into optimized native queries for the chosen engine.

App → Reservoir API → TimescaleDB (PostgreSQL)
                    → ClickHouse (columnar OLAP)

All log operations go through Reservoir: ingestion, search, filtering, aggregation, retention cleanup, alert evaluation, Sigma detection, correlation, dashboard stats, and admin monitoring. The engine choice is transparent to the rest of the platform.
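The engine-agnostic design can be sketched as a single interface with one implementation per backend. This is an illustrative model only — the names below are not the real @logtide/reservoir API, and the in-memory engines stand in for the actual database clients:

```typescript
// Hypothetical sketch of Reservoir's engine-agnostic pattern.
// Interface and class names are illustrative, not LogTide's real API.

interface LogRecord {
  projectId: string;
  time: string;
  message: string;
}

interface StorageEngine {
  name: "timescale" | "clickhouse";
  ingest(logs: LogRecord[]): Promise<number>;
}

// Stand-in engines: the real ones would talk to PostgreSQL/ClickHouse.
class InMemoryTimescale implements StorageEngine {
  name = "timescale" as const;
  private rows: LogRecord[] = [];
  async ingest(logs: LogRecord[]): Promise<number> {
    this.rows.push(...logs);
    return logs.length;
  }
}

class InMemoryClickHouse implements StorageEngine {
  name = "clickhouse" as const;
  private rows: LogRecord[] = [];
  async ingest(logs: LogRecord[]): Promise<number> {
    this.rows.push(...logs);
    return logs.length;
  }
}

// Application code depends only on the interface; the engine is
// selected once at startup from STORAGE_ENGINE.
async function ingestBatch(engine: StorageEngine, logs: LogRecord[]) {
  return engine.ingest(logs);
}
```

Because every caller goes through the interface, swapping engines is a configuration change rather than a code change.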

Choosing an Engine

TimescaleDB

Default — recommended for most users

  • Single database for everything (logs + app data)
  • No extra services to run
  • Continuous aggregates for fast dashboards
  • Built-in compression (10x+ reduction)
  • Full PostgreSQL ecosystem (extensions, tooling)
  • Ideal for up to ~50K logs/sec

ClickHouse

High-volume & analytics-heavy workloads

  • Columnar storage — blazing-fast analytical queries
  • Handles 100K+ logs/sec with ease
  • Superior compression ratios (20-40x)
  • Native n-gram indexes for full-text search
  • Async inserts for maximum write throughput
  • Best when you already operate ClickHouse

Not Sure? Start with TimescaleDB

TimescaleDB is the default and requires zero extra configuration. It runs inside the same PostgreSQL instance that LogTide already uses for application data. You can always switch to ClickHouse later if your volume grows.

TimescaleDB (Default)

TimescaleDB is a PostgreSQL extension for time-series data. LogTide stores logs in a hypertable that is automatically partitioned by time, compressed, and pruned according to your retention policy.

Key Features

Hypertables: Automatic time-based partitioning (chunks). Queries only scan relevant chunks for the requested time range.

Continuous Aggregates: Pre-computed hourly and daily stats (logs_hourly_stats, logs_daily_stats) that power dashboard charts in milliseconds.

Compression: Chunks older than 1 day are automatically compressed, reducing storage by 10x or more.

Retention: Configurable per-organization. Uses drop_chunks() for efficient bulk deletion.

Full-text Search: PostgreSQL tsvector for word-based search, pg_trgm GIN index for substring search.

Batch Inserts: Uses UNNEST-based batch inserts for high-throughput ingestion.
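The UNNEST pattern turns a batch of rows into one statement with a single array parameter per column, which PostgreSQL expands back into rows server-side. A minimal sketch (table and column names are assumptions, not LogTide's actual schema):

```typescript
// Illustrative UNNEST batch-insert builder. Column/table names are
// hypothetical, not LogTide's real schema.

interface LogRow {
  projectId: string;
  time: string;
  level: string;
  message: string;
}

// One round-trip inserts the whole batch: each column becomes a single
// array parameter that PostgreSQL unnests into rows.
function buildUnnestInsert(rows: LogRow[]): { sql: string; params: unknown[] } {
  const sql = `
    INSERT INTO logs (project_id, time, level, message)
    SELECT * FROM UNNEST(
      $1::uuid[], $2::timestamptz[], $3::text[], $4::text[]
    )`;
  const params = [
    rows.map((r) => r.projectId),
    rows.map((r) => r.time),
    rows.map((r) => r.level),
    rows.map((r) => r.message),
  ];
  return { sql, params };
}
```

Compared with multi-row VALUES lists, this keeps the parameter count constant regardless of batch size, avoiding prepared-statement limits at high throughput.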

TimescaleDB is included in the default docker-compose.yml — no extra setup needed.

ClickHouse

ClickHouse is a columnar OLAP database built for analytical queries at scale. When selected as the storage engine, LogTide creates the logs, spans, and traces tables in ClickHouse automatically on startup.

Key Features

Columnar Storage: Only reads the columns needed for each query, making aggregations and scans extremely fast.

MergeTree Engine: Logs are stored in a MergeTree table ordered by (project_id, time) for optimal query performance.

Async Inserts: Writes are batched asynchronously by ClickHouse for maximum throughput without blocking the application.

N-gram Indexes: ngrambf_v1 bloom filter indexes on the message column for fast full-text search.

PREWHERE: Reservoir uses PREWHERE clauses for partition-level filtering before reading data blocks.

Compression: Native LZ4/ZSTD compression achieves 20-40x reduction on log data.
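The JSONEachRow format mentioned later in the optimization table is simply newline-delimited JSON, one object per row. A small sketch of producing that payload (field names are illustrative):

```typescript
// Sketch of the JSONEachRow wire format ClickHouse consumes for
// batched inserts. Field names here are illustrative.

interface LogRow {
  project_id: string;
  time: string;
  message: string;
}

// JSONEachRow is newline-delimited JSON: one serialized object per
// row. Combined with ClickHouse's async_insert setting, the server
// buffers and flushes these batches itself, so the application never
// blocks on storage-side merges.
function toJsonEachRow(rows: LogRow[]): string {
  return rows.map((r) => JSON.stringify(r)).join("\n");
}
```

The payload would be POSTed to the HTTP interface (port 8123 by default) with a query such as `INSERT INTO logs FORMAT JSONEachRow`.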

ClickHouse Considerations

  • Extra service: ClickHouse runs as a separate container alongside PostgreSQL (which is still needed for app data like users, orgs, alert rules, etc.).
  • No continuous aggregates: Dashboard queries run against raw data. Still fast for most datasets thanks to columnar storage.
  • Async deletes: DELETE operations in ClickHouse are asynchronous mutations — they execute in the background and may not reflect immediately.

Configuration

The storage engine is selected via a single environment variable. All other settings are optional and have sensible defaults.

Engine Selection

# In your .env file

# Use TimescaleDB (default — no configuration needed)
STORAGE_ENGINE=timescale

# Or use ClickHouse
STORAGE_ENGINE=clickhouse

ClickHouse Environment Variables

Required only when STORAGE_ENGINE=clickhouse:

Variable              Description                  Default
CLICKHOUSE_HOST       ClickHouse server hostname   localhost
CLICKHOUSE_PORT       HTTP interface port          8123
CLICKHOUSE_DATABASE   Database name                logtide
CLICKHOUSE_USERNAME   Authentication username      default
CLICKHOUSE_PASSWORD   Authentication password      (empty)
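A minimal sketch of resolving these settings with the documented defaults (the function name and config shape are assumptions, not LogTide's actual code):

```typescript
// Hypothetical config resolver applying the documented defaults when
// a variable is unset. Shape and names are illustrative.

type Env = Record<string, string | undefined>;

interface ClickHouseConfig {
  host: string;
  port: number;
  database: string;
  username: string;
  password: string;
}

function clickhouseConfig(env: Env): ClickHouseConfig {
  return {
    host: env.CLICKHOUSE_HOST ?? "localhost",
    port: Number(env.CLICKHOUSE_PORT ?? 8123),
    database: env.CLICKHOUSE_DATABASE ?? "logtide",
    username: env.CLICKHOUSE_USERNAME ?? "default",
    password: env.CLICKHOUSE_PASSWORD ?? "",
  };
}
```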

Docker Compose Setup

TimescaleDB (Default)

No changes needed — TimescaleDB is the default engine. Just run:

docker compose up -d

ClickHouse

The default docker-compose.yml includes a ClickHouse service behind a Docker Compose profile. To enable it:

# 1. Set the storage engine in your .env
STORAGE_ENGINE=clickhouse

# 2. Start with the ClickHouse profile
docker compose --profile clickhouse up -d

# With Fluent Bit log collection:
docker compose --profile clickhouse --profile logging up -d

This starts a clickhouse/clickhouse-server:24.1 container alongside the existing PostgreSQL instance. PostgreSQL is still required for application data (users, organizations, projects, alert rules, etc.) — only logs, spans, and traces are stored in ClickHouse.

Auto-Initialization

When using ClickHouse, Reservoir automatically creates the required tables (logs, spans, traces) on first startup. No manual migration is needed. For TimescaleDB, tables are managed by the existing Kysely migration system.
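To make the auto-created schema concrete, here is a hypothetical shape for the logs table. The real DDL lives inside @logtide/reservoir; this sketch only reflects the features described above (MergeTree ordering by (project_id, time), n-gram bloom filter on message), and the exact column types and index parameters are assumptions:

```typescript
// Hypothetical DDL sketch — NOT the actual table definition shipped
// with @logtide/reservoir. It illustrates the documented features:
// MergeTree ordered by (project_id, time) and an ngrambf_v1 index.
const CREATE_LOGS_TABLE = `
CREATE TABLE IF NOT EXISTS logs (
  id UUID,
  project_id UUID,
  time DateTime64(3),
  level LowCardinality(String),
  message String,
  -- bloom-filter n-gram index for substring search on message
  INDEX idx_message message TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY (project_id, time)
`;
```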

Switching Engines

Data Is Not Migrated Automatically

Changing the STORAGE_ENGINE variable only changes where new logs are written. Existing data stays in the previous engine. Use the migration script to copy historical logs, spans, and traces to the new engine.

TimescaleDB → ClickHouse

  1. Stop the backend and worker: docker compose stop backend worker
  2. Set STORAGE_ENGINE=clickhouse in your .env
  3. Start with the ClickHouse profile: docker compose --profile clickhouse up -d
  4. Reservoir creates the ClickHouse tables on startup
  5. Run the migration script to copy historical data
  6. Start the backend and worker — new logs flow into ClickHouse

ClickHouse → TimescaleDB

  1. Stop the backend and worker
  2. Set STORAGE_ENGINE=timescale (or remove the variable entirely)
  3. Run the migration script to copy historical data
  4. Restart: docker compose up -d
  5. Optionally stop the ClickHouse container if no longer needed

Data Migration

LogTide includes a built-in migration script that copies logs, spans, and traces between engines. It preserves log IDs so that existing references (detection events, exceptions, error groups) continue to work after migration.

What Gets Migrated

Only the 3 tables managed by Reservoir are migrated:

  • logs — core log data (IDs preserved)
  • spans — OpenTelemetry spans
  • traces — aggregated trace records

Everything else (users, organizations, projects, alert rules, Sigma rules, incidents, etc.) stays in PostgreSQL and is not affected.

Basic Usage

# From the backend directory
cd packages/backend

# Auto-detect direction (current STORAGE_ENGINE → the other)
pnpm migrate:storage

# Explicit direction
pnpm migrate:storage -- --from timescale --to clickhouse

# Dry run — show record counts without migrating
pnpm migrate:storage -- --dry-run

Options

Flag                  Description                            Default
--from <engine>       Source engine                          current STORAGE_ENGINE
--to <engine>         Destination engine                     the other engine
--dry-run             Show record counts only, don't write   false
--project-id <uuid>   Migrate a single project               all projects
--batch-size <n>      Records per batch                      5000
--skip-validation     Skip post-migration count check        false

Resumable

The migration is safe to interrupt and resume. It tracks how many records already exist in the destination and picks up where it left off. If you stop the script mid-way (Ctrl+C), just run the same command again to continue.
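The resume logic can be modeled in a few lines: count what already exists in the destination, skip that many source records, then copy forward in batches. This is a toy model — the real script works per table and per project, and the store interface below is invented for illustration:

```typescript
// Toy model of resumable migration. The Store interface and helper
// names are illustrative, not LogTide's actual migration code.

interface Store<T> {
  count(): number;
  read(offset: number, limit: number): T[];
  write(batch: T[]): void;
}

// Resume point = number of records already in the destination.
// Re-running after an interruption picks up exactly where it stopped.
function migrate<T>(src: Store<T>, dst: Store<T>, batchSize = 5000): number {
  let offset = dst.count();
  let copied = 0;
  while (offset < src.count()) {
    const batch = src.read(offset, batchSize);
    dst.write(batch);
    offset += batch.length;
    copied += batch.length;
  }
  return copied;
}

// In-memory store for demonstration.
function arrayStore<T>(rows: T[]): Store<T> {
  return {
    count: () => rows.length,
    read: (o, l) => rows.slice(o, o + l),
    write: (b) => void rows.push(...b),
  };
}
```

Note this scheme assumes records are copied in a stable order, which is why interrupting mid-batch at worst re-copies one partial batch.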

ID Preservation

Log IDs are preserved during migration. This is important because detection events, exceptions, and error groups reference logs by ID. Without ID preservation, those cross-references would break silently after switching engines.

Example Output

Storage Migration: timescale → clickhouse
==================================================

Found 3 projects across 2 organizations

[1/3] api-service (acme-corp)
  Logs:   source=1,234,567 | dest=500,000 | to migrate=734,567
  Spans:  source=456,789   | dest=0       | to migrate=456,789
  Traces: source=12,345    | dest=0       | to migrate=12,345

  Logs: 100.0% | 734,567/734,567 | done in 2m 24s
  Spans: 100.0% | 456,789/456,789 | done in 58s
  Traces: 100.0% | 12,345/12,345 | done in 4s
  Validation: logs OK | spans OK | traces OK

[2/3] web-app (acme-corp)
  Already migrated.

Summary
==================================================
Projects: 3/3 completed
Logs:     2,500,000 migrated
Spans:    800,000 migrated
Traces:   25,000 migrated
Time:     5m 12s
Errors:   0

How Reservoir Works

Reservoir is an internal package (@logtide/reservoir) that abstracts all log and span I/O behind a unified TypeScript API. Here's what it handles:

Log Operations
  • ingest() — Batch insert logs
  • query() — Search with filters, pagination
  • aggregate() — Time-bucketed stats
  • count() — Filtered log counts
  • distinct() — Unique field values
  • topValues() — Top N values by count
  • getById() / getByIds() — Direct lookups
  • deleteByTimeRange() — Retention cleanup

Span & Trace Operations
  • ingestSpans() — Batch insert spans
  • upsertTrace() — Create/update trace records
  • querySpans() — Search spans
  • queryTraces() — Search traces
  • getSpansByTraceId() — Trace detail view
  • getServiceDependencies() — Service map
  • healthCheck() — Engine connectivity
  • getEngineType() — Runtime engine info

Engine-Specific Optimizations

Reservoir doesn't just wrap SQL — each engine has its own query translator that generates optimized native queries:

Feature            TimescaleDB               ClickHouse
Batch insert       UNNEST($1::text[], ...)   INSERT ... FORMAT JSONEachRow
Full-text search   tsvector @@ tsquery       hasToken() / positionCaseInsensitive()
Substring search   pg_trgm GIN index         ngrambf_v1 bloom filter
Time filtering     Chunk pruning             PREWHERE partition key
Aggregation        Continuous aggregates     Raw GROUP BY (columnar scan)
Deletion           drop_chunks() / DELETE    ALTER TABLE DELETE (async)

Engine Branching

Some features have engine-specific fast paths. For example, dashboard charts use TimescaleDB continuous aggregates when available, but fall back to raw Reservoir queries on ClickHouse. The application detects the engine at runtime via reservoir.getEngineType() and chooses the optimal code path automatically.
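The runtime branch can be sketched as follows. Only getEngineType() is taken from the doc; the surrounding function and table names are illustrative:

```typescript
// Sketch of engine branching at runtime. getEngineType() is the
// documented call; everything else here is a hypothetical example.

type EngineType = "timescale" | "clickhouse";

interface Reservoir {
  getEngineType(): EngineType;
}

// Dashboard stats: read the pre-computed continuous aggregate on
// TimescaleDB, fall back to aggregating the raw table on ClickHouse
// (fast anyway, thanks to columnar scans).
function statsQuerySource(reservoir: Reservoir): string {
  return reservoir.getEngineType() === "timescale"
    ? "logs_hourly_stats" // continuous aggregate
    : "logs";             // raw table, aggregated on the fly
}
```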