LogTide

Storage Engines

LogTide uses Reservoir, a pluggable storage abstraction that lets you choose between TimescaleDB and ClickHouse for log and span storage. Both engines share the same API — your application code, queries, and dashboards work identically regardless of which engine you run.

Overview

One API, Two Engines

Reservoir is LogTide's internal storage layer (@logtide/reservoir). It provides a single, typed interface for all log and span operations — ingestion, querying, aggregation, deletion — and translates them into optimized native queries for the chosen engine.

App → Reservoir API → TimescaleDB (PostgreSQL)
                    → ClickHouse (columnar OLAP)

All log operations go through Reservoir: ingestion, search, filtering, aggregation, retention cleanup, alert evaluation, Sigma detection, correlation, dashboard stats, and admin monitoring. The engine choice is transparent to the rest of the platform.
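The engine-agnostic design can be sketched as a single interface with one implementation per backend. This is an illustrative model only — the names below are not the real @logtide/reservoir API, and the in-memory engines stand in for the actual database clients:

```typescript
// Hypothetical sketch of Reservoir's engine-agnostic pattern.
// Interface and class names are illustrative, not LogTide's real API.

interface LogRecord {
  projectId: string;
  time: string;
  message: string;
}

interface StorageEngine {
  name: "timescale" | "clickhouse";
  ingest(logs: LogRecord[]): Promise<number>;
}

// Stand-in engines: the real ones would talk to PostgreSQL/ClickHouse.
class InMemoryTimescale implements StorageEngine {
  name = "timescale" as const;
  private rows: LogRecord[] = [];
  async ingest(logs: LogRecord[]): Promise<number> {
    this.rows.push(...logs);
    return logs.length;
  }
}

class InMemoryClickHouse implements StorageEngine {
  name = "clickhouse" as const;
  private rows: LogRecord[] = [];
  async ingest(logs: LogRecord[]): Promise<number> {
    this.rows.push(...logs);
    return logs.length;
  }
}

// Application code depends only on the interface; the engine is
// selected once at startup from STORAGE_ENGINE.
async function ingestBatch(engine: StorageEngine, logs: LogRecord[]) {
  return engine.ingest(logs);
}
```

Because every caller goes through the interface, swapping engines is a configuration change rather than a code change.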

Choosing an Engine

TimescaleDB

Default — recommended for most users

  • Single database for everything (logs + app data)
  • No extra services to run
  • Continuous aggregates for fast dashboards
  • Built-in compression (10x+ reduction)
  • Full PostgreSQL ecosystem (extensions, tooling)
  • Ideal for up to ~50K logs/sec

ClickHouse

High-volume & analytics-heavy workloads

  • Columnar storage — blazing-fast analytical queries
  • Handles 100K+ logs/sec with ease
  • Superior compression ratios (20-40x)
  • Native n-gram indexes for full-text search
  • Async inserts for maximum write throughput
  • Best when you already operate ClickHouse

Not Sure? Start with TimescaleDB

TimescaleDB is the default and requires zero extra configuration. It runs inside the same PostgreSQL instance that LogTide already uses for application data. You can always switch to ClickHouse later if your volume grows.

TimescaleDB (Default)

TimescaleDB is a PostgreSQL extension for time-series data. LogTide stores logs in a hypertable that is automatically partitioned by time, compressed, and pruned according to your retention policy.

Key Features

Hypertables: Automatic time-based partitioning (chunks). Queries only scan relevant chunks for the requested time range.

Continuous Aggregates: Pre-computed hourly and daily stats (logs_hourly_stats, logs_daily_stats) that power dashboard charts in milliseconds.

Compression: Chunks older than 1 day are automatically compressed, reducing storage by 10x or more.

Retention: Configurable per-organization. Uses drop_chunks() for efficient bulk deletion.

Full-text Search: PostgreSQL tsvector for word-based search, pg_trgm GIN index for substring search.

Batch Inserts: Uses UNNEST-based batch inserts for high-throughput ingestion.
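The UNNEST pattern turns a batch of rows into one statement with a single array parameter per column, which PostgreSQL expands back into rows server-side. A minimal sketch (table and column names are assumptions, not LogTide's actual schema):

```typescript
// Illustrative UNNEST batch-insert builder. Column/table names are
// hypothetical, not LogTide's real schema.

interface LogRow {
  projectId: string;
  time: string;
  level: string;
  message: string;
}

// One round-trip inserts the whole batch: each column becomes a single
// array parameter that PostgreSQL unnests into rows.
function buildUnnestInsert(rows: LogRow[]): { sql: string; params: unknown[] } {
  const sql = `
    INSERT INTO logs (project_id, time, level, message)
    SELECT * FROM UNNEST(
      $1::uuid[], $2::timestamptz[], $3::text[], $4::text[]
    )`;
  const params = [
    rows.map((r) => r.projectId),
    rows.map((r) => r.time),
    rows.map((r) => r.level),
    rows.map((r) => r.message),
  ];
  return { sql, params };
}
```

Compared with multi-row VALUES lists, this keeps the parameter count constant regardless of batch size, avoiding prepared-statement limits at high throughput.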

TimescaleDB is included in the default docker-compose.yml — no extra setup needed.

ClickHouse

ClickHouse is a columnar OLAP database built for analytical queries at scale. When selected as the storage engine, LogTide creates the logs, spans, and traces tables in ClickHouse automatically on startup.

Key Features

Columnar Storage: Only reads the columns needed for each query, making aggregations and scans extremely fast.

MergeTree Engine: Logs are stored in a MergeTree table ordered by (project_id, time) for optimal query performance.

Async Inserts: Writes are batched asynchronously by ClickHouse for maximum throughput without blocking the application.

N-gram Indexes: ngrambf_v1 bloom filter indexes on the message column for fast full-text search.

PREWHERE: Reservoir uses PREWHERE clauses for partition-level filtering before reading data blocks.

Compression: Native LZ4/ZSTD compression achieves 20-40x reduction on log data.
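The JSONEachRow format mentioned later in the optimization table is simply newline-delimited JSON, one object per row. A small sketch of producing that payload (field names are illustrative):

```typescript
// Sketch of the JSONEachRow wire format ClickHouse consumes for
// batched inserts. Field names here are illustrative.

interface LogRow {
  project_id: string;
  time: string;
  message: string;
}

// JSONEachRow is newline-delimited JSON: one serialized object per
// row. Combined with ClickHouse's async_insert setting, the server
// buffers and flushes these batches itself, so the application never
// blocks on storage-side merges.
function toJsonEachRow(rows: LogRow[]): string {
  return rows.map((r) => JSON.stringify(r)).join("\n");
}
```

The payload would be POSTed to the HTTP interface (port 8123 by default) with a query such as `INSERT INTO logs FORMAT JSONEachRow`.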

ClickHouse Considerations

  • Extra service: ClickHouse runs as a separate container alongside PostgreSQL (which is still needed for app data like users, orgs, alert rules, etc.).
  • No continuous aggregates: Dashboard queries run against raw data. Still fast for most datasets thanks to columnar storage.
  • Async deletes: DELETE operations in ClickHouse are asynchronous mutations — they execute in the background and may not reflect immediately.

Configuration

The storage engine is selected via a single environment variable. All other settings are optional and have sensible defaults.

Engine Selection

# In your .env file

# Use TimescaleDB (default — no configuration needed)
STORAGE_ENGINE=timescale

# Or use ClickHouse
STORAGE_ENGINE=clickhouse

ClickHouse Environment Variables

Required only when STORAGE_ENGINE=clickhouse:

Variable              Description                  Default
CLICKHOUSE_HOST       ClickHouse server hostname   localhost
CLICKHOUSE_PORT       HTTP interface port          8123
CLICKHOUSE_DATABASE   Database name                logtide
CLICKHOUSE_USERNAME   Authentication username      default
CLICKHOUSE_PASSWORD   Authentication password      (empty)
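A minimal sketch of resolving these settings with the documented defaults (the function name and config shape are assumptions, not LogTide's actual code):

```typescript
// Hypothetical config resolver applying the documented defaults when
// a variable is unset. Shape and names are illustrative.

type Env = Record<string, string | undefined>;

interface ClickHouseConfig {
  host: string;
  port: number;
  database: string;
  username: string;
  password: string;
}

function clickhouseConfig(env: Env): ClickHouseConfig {
  return {
    host: env.CLICKHOUSE_HOST ?? "localhost",
    port: Number(env.CLICKHOUSE_PORT ?? 8123),
    database: env.CLICKHOUSE_DATABASE ?? "logtide",
    username: env.CLICKHOUSE_USERNAME ?? "default",
    password: env.CLICKHOUSE_PASSWORD ?? "",
  };
}
```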

Docker Compose Setup

TimescaleDB (Default)

No changes needed — TimescaleDB is the default engine. Just run:

docker compose up -d

ClickHouse

The default docker-compose.yml includes a ClickHouse service behind a Docker Compose profile. To enable it:

# 1. Set the storage engine in your .env
STORAGE_ENGINE=clickhouse

# 2. Start with the ClickHouse profile
docker compose --profile clickhouse up -d

# With Fluent Bit log collection:
docker compose --profile clickhouse --profile logging up -d

This starts a clickhouse/clickhouse-server:24.1 container alongside the existing PostgreSQL instance. PostgreSQL is still required for application data (users, organizations, projects, alert rules, etc.) — only logs, spans, and traces are stored in ClickHouse.

Auto-Initialization

When using ClickHouse, Reservoir automatically creates the required tables (logs, spans, traces) on first startup. No manual migration is needed. For TimescaleDB, tables are managed by the existing Kysely migration system.
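To make the auto-created schema concrete, here is a hypothetical shape for the logs table. The real DDL lives inside @logtide/reservoir; this sketch only reflects the features described above (MergeTree ordering by (project_id, time), n-gram bloom filter on message), and the exact column types and index parameters are assumptions:

```typescript
// Hypothetical DDL sketch — NOT the actual table definition shipped
// with @logtide/reservoir. It illustrates the documented features:
// MergeTree ordered by (project_id, time) and an ngrambf_v1 index.
const CREATE_LOGS_TABLE = `
CREATE TABLE IF NOT EXISTS logs (
  id UUID,
  project_id UUID,
  time DateTime64(3),
  level LowCardinality(String),
  message String,
  -- bloom-filter n-gram index for substring search on message
  INDEX idx_message message TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY (project_id, time)
`;
```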

Switching Engines

Data Is Not Migrated Automatically

Changing the STORAGE_ENGINE variable only changes where new logs are written. Existing data stays in the previous engine. Use the migration script to copy historical logs, spans, and traces to the new engine.

TimescaleDB → ClickHouse

  1. Stop the backend and worker: docker compose stop backend worker
  2. Set STORAGE_ENGINE=clickhouse in your .env
  3. Start with the ClickHouse profile: docker compose --profile clickhouse up -d
  4. Reservoir creates the ClickHouse tables on startup
  5. Run the migration script to copy historical data
  6. Start the backend and worker — new logs flow into ClickHouse

ClickHouse → TimescaleDB

  1. Stop the backend and worker
  2. Set STORAGE_ENGINE=timescale (or remove the variable entirely)
  3. Run the migration script to copy historical data
  4. Restart: docker compose up -d
  5. Optionally stop the ClickHouse container if no longer needed

Data Migration

LogTide includes a built-in migration script that copies logs, spans, and traces between engines. It preserves log IDs so that existing references (detection events, exceptions, error groups) continue to work after migration.

What Gets Migrated

Only the 3 tables managed by Reservoir are migrated:

  • logs — core log data (IDs preserved)
  • spans — OpenTelemetry spans
  • traces — aggregated trace records

Everything else (users, organizations, projects, alert rules, Sigma rules, incidents, etc.) stays in PostgreSQL and is not affected.

Basic Usage

# From the backend directory
cd packages/backend

# Auto-detect direction (current STORAGE_ENGINE → the other)
pnpm migrate:storage

# Explicit direction
pnpm migrate:storage -- --from timescale --to clickhouse

# Dry run — show record counts without migrating
pnpm migrate:storage -- --dry-run

Options

Flag                  Description                            Default
--from <engine>       Source engine                          current STORAGE_ENGINE
--to <engine>         Destination engine                     the other engine
--dry-run             Show record counts only, don't write   false
--project-id <uuid>   Migrate a single project               all projects
--batch-size <n>      Records per batch                      5000
--skip-validation     Skip post-migration count check        false

Resumable

The migration is safe to interrupt and resume. It tracks how many records already exist in the destination and picks up where it left off. If you stop the script mid-way (Ctrl+C), just run the same command again to continue.
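The resume logic can be modeled in a few lines: count what already exists in the destination, skip that many source records, then copy forward in batches. This is a toy model — the real script works per table and per project, and the store interface below is invented for illustration:

```typescript
// Toy model of resumable migration. The Store interface and helper
// names are illustrative, not LogTide's actual migration code.

interface Store<T> {
  count(): number;
  read(offset: number, limit: number): T[];
  write(batch: T[]): void;
}

// Resume point = number of records already in the destination.
// Re-running after an interruption picks up exactly where it stopped.
function migrate<T>(src: Store<T>, dst: Store<T>, batchSize = 5000): number {
  let offset = dst.count();
  let copied = 0;
  while (offset < src.count()) {
    const batch = src.read(offset, batchSize);
    dst.write(batch);
    offset += batch.length;
    copied += batch.length;
  }
  return copied;
}

// In-memory store for demonstration.
function arrayStore<T>(rows: T[]): Store<T> {
  return {
    count: () => rows.length,
    read: (o, l) => rows.slice(o, o + l),
    write: (b) => void rows.push(...b),
  };
}
```

Note this scheme assumes records are copied in a stable order, which is why interrupting mid-batch at worst re-copies one partial batch.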

ID Preservation

Log IDs are preserved during migration. This is important because detection events, exceptions, and error groups reference logs by ID. Without ID preservation, those cross-references would break silently after switching engines.

Example Output

Storage Migration: timescale → clickhouse
==================================================

Found 3 projects across 2 organizations

[1/3] api-service (acme-corp)
  Logs:   source=1,234,567 | dest=500,000 | to migrate=734,567
  Spans:  source=456,789   | dest=0       | to migrate=456,789
  Traces: source=12,345    | dest=0       | to migrate=12,345

  Logs: 100.0% | 734,567/734,567 | done in 2m 24s
  Spans: 100.0% | 456,789/456,789 | done in 58s
  Traces: 100.0% | 12,345/12,345 | done in 4s
  Validation: logs OK | spans OK | traces OK

[2/3] web-app (acme-corp)
  Already migrated.

Summary
==================================================
Projects: 3/3 completed
Logs:     2,500,000 migrated
Spans:    800,000 migrated
Traces:   25,000 migrated
Time:     5m 12s
Errors:   0

How Reservoir Works

Reservoir is an internal package (@logtide/reservoir) that abstracts all log and span I/O behind a unified TypeScript API. Here's what it handles:

Log Operations
  • ingest() — Batch insert logs
  • query() — Search with filters, pagination
  • aggregate() — Time-bucketed stats
  • count() — Filtered log counts
  • distinct() — Unique field values
  • topValues() — Top N values by count
  • getById() / getByIds() — Direct lookups
  • deleteByTimeRange() — Retention cleanup

Span & Trace Operations
  • ingestSpans() — Batch insert spans
  • upsertTrace() — Create/update trace records
  • querySpans() — Search spans
  • queryTraces() — Search traces
  • getSpansByTraceId() — Trace detail view
  • getServiceDependencies() — Service map
  • healthCheck() — Engine connectivity
  • getEngineType() — Runtime engine info

Engine-Specific Optimizations

Reservoir doesn't just wrap SQL — each engine has its own query translator that generates optimized native queries:

Feature            TimescaleDB               ClickHouse
Batch insert       UNNEST($1::text[], ...)   INSERT ... FORMAT JSONEachRow
Full-text search   tsvector @@ tsquery       hasToken() / positionCaseInsensitive()
Substring search   pg_trgm GIN index         ngrambf_v1 bloom filter
Time filtering     Chunk pruning             PREWHERE partition key
Aggregation        Continuous aggregates     Raw GROUP BY (columnar scan)
Deletion           drop_chunks() / DELETE    ALTER TABLE DELETE (async)

Engine Branching

Some features have engine-specific fast paths. For example, dashboard charts use TimescaleDB continuous aggregates when available, but fall back to raw Reservoir queries on ClickHouse. The application detects the engine at runtime via reservoir.getEngineType() and chooses the optimal code path automatically.
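The runtime branch can be sketched as follows. Only getEngineType() is taken from the doc; the surrounding function and table names are illustrative:

```typescript
// Sketch of engine branching at runtime. getEngineType() is the
// documented call; everything else here is a hypothetical example.

type EngineType = "timescale" | "clickhouse";

interface Reservoir {
  getEngineType(): EngineType;
}

// Dashboard stats: read the pre-computed continuous aggregate on
// TimescaleDB, fall back to aggregating the raw table on ClickHouse
// (fast anyway, thanks to columnar scans).
function statsQuerySource(reservoir: Reservoir): string {
  return reservoir.getEngineType() === "timescale"
    ? "logs_hourly_stats" // continuous aggregate
    : "logs";             // raw table, aggregated on the fly
}
```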