Real-Time Log Alerting

Configure intelligent log-based alerts with LogTide. Threshold alerts, anomaly detection, and notification routing to Slack and email.

  • Threshold-based alerts
  • Email & webhook notifications
  • Sigma detection rules
  • Alert fatigue reduction

Good alerting tells you about problems before your users do. Bad alerting wakes you up at 3 AM for nothing. This guide shows how to build effective log-based alerting with LogTide that catches real issues without alert fatigue.

The Alerting Problem

Too Many Alerts

🔔 3:01 AM - Error count > 5 in service: worker (6 errors)
🔔 3:01 AM - Error count > 5 in service: worker (7 errors)
🔔 3:02 AM - Error count > 5 in service: api (7 errors)
🔔 3:02 AM - Error count > 5 in service: worker (8 errors)
🔔 3:03 AM - Error count > 5 in service: api (12 errors)
... 47 more alerts ...

Result: Engineer mutes notifications, misses the real incident next week.

Too Few Alerts

(silence)

User report at 9 AM: “We haven’t been able to log in since midnight.”

Result: 9-hour outage, SLA violation, angry customers.

The Right Balance

Effective alerting has three properties:

  1. Actionable - every alert requires human action
  2. Timely - fires within minutes of the issue
  3. Contextual - tells you what to investigate

LogTide Alerting Features

LogTide provides two alerting mechanisms:

  1. Alert Rules - Threshold-based alerts on log volume, error rates, or patterns
  2. Sigma Detection Rules - Pattern-based security detection (brute force, anomalies)

Both support:

  • Email notifications
  • Webhook notifications (Slack, PagerDuty, Teams, etc.)
  • Configurable time windows and thresholds

Implementation

1. Alert Rules via API

Create alert rules programmatically:

# Create an alert rule
curl -X POST "http://logtide.internal:8080/api/v1/alerts" \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "organizationId": "your-org-id",
    "projectId": "your-project-id",
    "name": "High Error Rate - API",
    "enabled": true,
    "service": "api",
    "level": ["error", "critical"],
    "threshold": 50,
    "timeWindow": 5,
    "emailRecipients": ["oncall@example.com"],
    "webhookUrl": "https://hooks.slack.com/services/xxx/yyy/zzz"
  }'
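
The same request can be sketched in TypeScript for use in provisioning scripts. This is a minimal sketch, not an official client: the field names and endpoint are copied from the curl example above, while `AlertRule`, `buildAlertRequest`, and `createAlertRule` are hypothetical names, and the shape of the response body is not assumed.

```typescript
// Field names mirror the curl example; the type and helper names are hypothetical.
interface AlertRule {
  organizationId: string;
  projectId: string;
  name: string;
  enabled: boolean;
  service?: string;
  level?: string[];
  searchQuery?: string;
  threshold: number;
  timeWindow: number; // minutes, as in the examples in this guide
  emailRecipients?: string[];
  webhookUrl?: string;
}

// Build the request without sending it, so it can be inspected or tested.
function buildAlertRequest(baseUrl: string, token: string, rule: AlertRule) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/alerts`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(rule),
    },
  };
}

// Send the request and fail loudly on a non-2xx response.
async function createAlertRule(
  baseUrl: string,
  token: string,
  rule: AlertRule,
): Promise<unknown> {
  const { url, init } = buildAlertRequest(baseUrl, token, rule);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Alert rule creation failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping request construction separate from the network call makes it easy to review the exact payload before it reaches a production LogTide instance.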

2. Essential Alert Rules

Here are the alert rules every production system should have:

Error Rate Alert (per service)

{
  "name": "High Error Rate - API",
  "enabled": true,
  "service": "api",
  "level": ["error", "critical"],
  "threshold": 50,
  "timeWindow": 5,
  "emailRecipients": ["oncall@example.com"],
  "webhookUrl": "https://hooks.slack.com/services/xxx"
}

Critical Error Alert (any service)

{
  "name": "Critical Errors",
  "enabled": true,
  "level": ["critical"],
  "threshold": 1,
  "timeWindow": 1,
  "emailRecipients": ["oncall@example.com", "escalation@example.com"],
  "webhookUrl": "https://hooks.slack.com/services/xxx"
}

Service Down Alert

{
  "name": "Health Check Failures",
  "enabled": true,
  "service": "health",
  "searchQuery": "critical health check failure",
  "threshold": 3,
  "timeWindow": 5,
  "emailRecipients": ["oncall@example.com"],
  "webhookUrl": "https://hooks.slack.com/services/xxx"
}
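
Since the error rate rule is meant to exist once per service, it may help to generate those rules from a list instead of copying JSON by hand. The sketch below assumes the rule fields shown above; `errorRateRules` and its `overrides` parameter are hypothetical helpers, not part of LogTide.

```typescript
interface ErrorRateRule {
  name: string;
  enabled: boolean;
  service: string;
  level: string[];
  threshold: number;
  timeWindow: number; // minutes
}

// Hypothetical helper: one error-rate rule per service,
// with an optional per-service threshold override.
function errorRateRules(
  services: string[],
  defaultThreshold = 50,
  overrides: Record<string, number> = {},
): ErrorRateRule[] {
  return services.map((service) => ({
    name: `High Error Rate - ${service}`,
    enabled: true,
    service,
    level: ["error", "critical"],
    threshold: overrides[service] ?? defaultThreshold,
    timeWindow: 5,
  }));
}

// Example: the worker service is quieter, so it gets a lower threshold.
const generatedRules = errorRateRules(["api", "worker"], 50, { worker: 20 });
```

Each generated object can be merged with your organization ID, project ID, and notification targets before being posted to the alerts endpoint.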

3. Slack Webhook Integration

LogTide sends webhook payloads you can route to Slack:

{
  "alert": {
    "name": "High Error Rate - API",
    "threshold": 50,
    "timeWindow": 5,
    "currentCount": 73
  },
  "triggeredAt": "2025-02-01T03:15:00Z",
  "service": "api",
  "level": ["error", "critical"],
  "sampleLogs": [
    {
      "timestamp": "2025-02-01T03:14:58Z",
      "level": "error",
      "message": "Database connection timeout",
      "metadata": { "host": "db-primary", "timeout_ms": 5000 }
    }
  ]
}

To integrate with Slack, use an incoming webhook URL from your Slack workspace settings.
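
If you receive this payload in your own service, a typed description and a runtime check can guard against malformed requests. This is a sketch inferred only from the sample payload above: which fields are optional is an assumption, and the type and function names are hypothetical.

```typescript
interface LogTideSampleLog {
  timestamp: string;
  level: string;
  message: string;
  metadata?: Record<string, unknown>;
}

interface LogTideAlertPayload {
  alert: {
    name: string;
    threshold: number;
    timeWindow: number;
    currentCount: number;
  };
  triggeredAt: string;
  service?: string; // assumed optional: some rules are not scoped to a service
  level?: string[];
  sampleLogs?: LogTideSampleLog[];
}

// Narrow an unknown request body to the payload shape shown above.
function isLogTideAlertPayload(body: unknown): body is LogTideAlertPayload {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  const alert = b.alert as Record<string, unknown> | undefined;
  return (
    typeof alert === "object" &&
    alert !== null &&
    typeof alert.name === "string" &&
    typeof alert.threshold === "number" &&
    typeof alert.timeWindow === "number" &&
    typeof alert.currentCount === "number" &&
    typeof b.triggeredAt === "string"
  );
}
```

A webhook handler can call `isLogTideAlertPayload(req.body)` first and reject anything that fails the check before touching nested fields.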

4. PagerDuty Integration via Webhook

Route critical alerts to PagerDuty by using a webhook middleware:

// webhook-router/index.ts
import express from 'express';

const app = express();
app.use(express.json());

app.post('/logtide-alert', async (req, res) => {
  const alert = req.body;

  try {
    // Route based on severity
    if (alert.alert.name.includes('Critical')) {
      // PagerDuty for critical alerts
      await fetch('https://events.pagerduty.com/v2/enqueue', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          routing_key: process.env.PAGERDUTY_ROUTING_KEY,
          event_action: 'trigger',
          dedup_key: `logtide-${alert.alert.name}-${alert.service}`,
          payload: {
            summary: `${alert.alert.name}: ${alert.alert.currentCount} events in ${alert.alert.timeWindow}m`,
            severity: 'critical',
            source: 'LogTide',
            custom_details: alert,
          },
        }),
      });
    }

    // Slack for all alerts
    await fetch(process.env.SLACK_WEBHOOK_URL!, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `🚨 *${alert.alert.name}*\nService: ${alert.service}\nCount: ${alert.alert.currentCount} (threshold: ${alert.alert.threshold})\nTime: ${alert.triggeredAt}`,
      }),
    });

    res.json({ ok: true });
  } catch (err) {
    // Express 4 does not catch errors thrown in async handlers, so respond
    // explicitly instead of leaving the request hanging
    console.error('Failed to forward alert', err);
    res.status(500).json({ ok: false });
  }
});

app.listen(3001);

5. Sigma Rules for Security Alerts

LogTide’s built-in Sigma support detects security patterns:

# Brute force detection
title: Brute Force Login Attempt
status: stable
level: high
logsource:
  category: authentication
detection:
  selection:
    message|contains: "login failed"
  timeframe: 5m
  condition: selection | count() > 10
tags:
  - attack.credential_access
  - attack.t1110
---
# Privilege escalation
title: Suspicious Admin Role Assignment
status: stable
level: critical
logsource:
  category: audit
detection:
  selection:
    action: "role.update"
    new_state|contains: "admin"
  condition: selection
tags:
  - attack.privilege_escalation
  - attack.t1078

Import Sigma rules via the LogTide UI at /dashboard/security/sigma.
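
To make the brute force rule's condition concrete, here is a sketch of what "more than 10 matches within 5 minutes" means when applied to a list of log entries. It illustrates the counting logic only and is not how LogTide evaluates Sigma rules internally; `AuthLog` and `exceedsCount` are hypothetical names, and the sliding-window interpretation of `timeframe` is an assumption.

```typescript
interface AuthLog {
  timestamp: string; // ISO 8601
  message: string;
}

// Returns true if more than `limit` matching events fall inside
// any window of `windowMs` milliseconds.
function exceedsCount(
  logs: AuthLog[],
  needle = "login failed",
  limit = 10,
  windowMs = 5 * 60 * 1000,
): boolean {
  const wanted = needle.toLowerCase();
  const times = logs
    // Sigma's `contains` modifier matches case-insensitively by default
    .filter((l) => l.message.toLowerCase().includes(wanted))
    .map((l) => Date.parse(l.timestamp))
    .sort((a, b) => a - b);

  let start = 0;
  for (let end = 0; end < times.length; end++) {
    // Advance the left edge until the window spans at most windowMs
    while (times[end] - times[start] > windowMs) start++;
    if (end - start + 1 > limit) return true;
  }
  return false;
}
```

Running a function like this against historical authentication logs is one way to check how often a candidate threshold would have fired before enabling the rule.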

Alert Design Patterns

Pattern 1: Tiered Severity

Don’t treat all errors the same:

Tier | Condition | Notification | Response
P1 (Critical) | Service down, data loss, security breach | PagerDuty + Slack + Email | Wake up on-call immediately
P2 (High) | Error rate > 5x normal, degraded performance | Slack + Email | Investigate within 30 minutes
P3 (Medium) | Elevated errors, non-critical failures | Slack channel | Investigate during business hours
P4 (Low) | Warnings, unusual patterns | Weekly digest | Review in next sprint

Pattern 2: Alert on Symptoms, Not Causes

// ❌ BAD: Alert on cause (too specific, many false positives)
{
  "name": "Database Error",
  "searchQuery": "ECONNREFUSED",
  "threshold": 1,
  "timeWindow": 1
}

// ✅ GOOD: Alert on symptom (catches the real impact)
{
  "name": "API Error Rate High",
  "service": "api",
  "level": ["error"],
  "threshold": 50,
  "timeWindow": 5
}

Pattern 3: Percentage-Based Thresholds

Absolute thresholds break when traffic changes. Use percentages when possible:

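// Alert setup: monitor error percentage.
// Assumes `logtide` (a query client) and `logger` are initialized elsewhere.
// Run this on a schedule, for example every 5 minutes.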
async function checkErrorRate(service: string) {
  const total = await logtide.count({
    service,
    from: '-5m',
  });

  const errors = await logtide.count({
    service,
    level: 'error',
    from: '-5m',
  });

  const errorRate = total > 0 ? (errors / total) * 100 : 0;

  if (errorRate > 5) { // > 5% error rate
    logger.critical('Error rate threshold exceeded', {
      service,
      errorRate: errorRate.toFixed(2),
      totalRequests: total,
      errorCount: errors,
    });
  }
}
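
One caveat with percentages is that they become noisy at low traffic: 2 errors out of 10 requests is a 20% error rate but rarely an incident. A possible refinement, sketched here as a pure function, is to skip evaluation below a minimum request volume. The `minRequests` guard and the names below are assumptions added for illustration, not LogTide features.

```typescript
interface ErrorRateOptions {
  maxErrorPct: number; // e.g. 5 means 5%
  minRequests: number; // skip evaluation below this volume
}

// Decide whether the error percentage is high enough to alert on.
function errorRateExceeded(
  total: number,
  errors: number,
  opts: ErrorRateOptions,
): boolean {
  // Too little traffic to draw a conclusion from a percentage
  if (total <= 0 || total < opts.minRequests) return false;
  // Same as (errors / total) * 100 > maxErrorPct, without the division
  return errors * 100 > opts.maxErrorPct * total;
}

// Example: 60 errors in 1000 requests is 6%, above a 5% limit.
const overLimit = errorRateExceeded(1000, 60, { maxErrorPct: 5, minRequests: 100 });
```

The right value for `minRequests` depends on the service; a quiet internal tool and a public API will need very different floors.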

Pattern 4: Alert Deduplication

Avoid alert storms by deduplicating:

// Use consistent dedup keys
// LogTide will only fire once per unique combination
// within the time window
{
  "name": "Service Error Rate",
  "service": "api",        // Scoped to service
  "timeWindow": 5,         // 5-minute window
  // Only one alert per service per 5 minutes
}
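
If alerts pass through your own webhook router, the same idea can be applied there as a second line of defense. The sketch below keeps an in-memory record of when each key last fired and suppresses repeats inside the window. `AlertDeduplicator` is a hypothetical helper; state is lost on restart and is not shared between instances.

```typescript
class AlertDeduplicator {
  private readonly windowMs: number;
  private readonly lastFired = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the alert should be forwarded, false if it is a repeat
  // of the same key inside the suppression window.
  shouldFire(key: string, nowMs: number = Date.now()): boolean {
    const last = this.lastFired.get(key);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastFired.set(key, nowMs);
    return true;
  }
}

// Example: at most one notification per alert name and service every 5 minutes.
const dedup = new AlertDeduplicator(5 * 60 * 1000);
const dedupKey = (name: string, service: string) => `${name}:${service}`;
```

Building the key from the alert name and service matches the scoping in the rule above: a storm in `api` does not suppress a separate alert for `worker`.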

Reducing Alert Fatigue

Step 1: Audit Current Alerts

List all your alerts and classify:

Alert | Last Triggered | Actionable? | Keep/Modify/Delete
Error > 5 | Daily | No (too sensitive) | Modify: threshold to 50
CPU > 90% | Never | N/A | Delete
5xx rate > 1% | Weekly | Yes | Keep
Disk > 80% | Monthly | Yes | Keep

Step 2: Apply the SRE Alert Framework

For each alert, ask:

  1. Does this alert require human action? If no, delete it.
  2. Can this wait until business hours? If yes, make it P3/P4.
  3. Is the threshold set correctly? Tune based on historical data.
  4. Is the alert well-documented? Add a runbook link.
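
For question 3, one simple way to tune a threshold from historical data is to take a high percentile of past per-window event counts and add some headroom. The sketch below assumes you have already exported those counts (for example, errors per 5-minute window over the last month); `suggestThreshold` and its defaults are illustrative choices, not a LogTide feature.

```typescript
// counts: number of matching log events per time window, taken from history.
function suggestThreshold(
  counts: number[],
  percentile = 99,
  headroom = 1.2,
): number {
  if (counts.length === 0) {
    throw new Error("need at least one historical window");
  }
  const sorted = [...counts].sort((a, b) => a - b);
  // Nearest-rank percentile: the smallest value with at least
  // `percentile` percent of the windows at or below it
  const rank = Math.ceil((percentile * sorted.length) / 100);
  const value = sorted[Math.max(0, rank - 1)];
  // Leave headroom so normal peaks do not trigger the alert
  return Math.ceil(value * headroom);
}
```

A threshold derived this way fires only when a window is busier than nearly all of the history, which is a reasonable starting point before adjusting by hand.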

Step 3: Implement Alert Routing

Not every alert needs to page someone:

P1 (Critical) → PagerDuty → Phone call
P2 (High)     → Slack #incidents → Investigate within 30m
P3 (Medium)   → Slack #alerts → Business hours
P4 (Low)      → Email digest → Weekly review
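
The routing table above can be expressed as a small lookup that a webhook router consults for each incoming alert. This sketch assumes a naming convention in which the alert name starts with its tier, such as "[P2] High Error Rate - API"; that convention, the channel strings, and the function names are all hypothetical.

```typescript
type Tier = "P1" | "P2" | "P3" | "P4";

interface Route {
  channel: string;
  pageOnCall: boolean;
}

const ROUTES: Record<Tier, Route> = {
  P1: { channel: "pagerduty", pageOnCall: true },
  P2: { channel: "slack:#incidents", pageOnCall: false },
  P3: { channel: "slack:#alerts", pageOnCall: false },
  P4: { channel: "email-digest", pageOnCall: false },
};

// Hypothetical convention: the alert name starts with its tier in brackets.
function tierOf(alertName: string): Tier {
  const match = /^\[(P[1-4])\]/.exec(alertName);
  // Unlabelled alerts default to P3 so they are seen but never page anyone
  return match ? (match[1] as Tier) : "P3";
}

function routeFor(alertName: string): Route {
  return ROUTES[tierOf(alertName)];
}
```

Defaulting unlabelled alerts to P3 is a deliberate choice: a new rule has to be explicitly promoted before it can wake someone up.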

Alert Configuration Checklist

  • Essential Alerts

    • Error rate per critical service
    • Critical/fatal errors (any service)
    • Health check failures
    • Authentication failures (brute force)
    • Deployment events (verify after deploy)
  • Notification Routing

    • P1 alerts → PagerDuty (phone)
    • P2 alerts → Slack #incidents
    • P3 alerts → Slack #alerts
    • P4 alerts → Email digest
  • Alert Quality

    • Every alert has a runbook
    • Thresholds tuned to avoid false positives
    • Alert fatigue audit quarterly
    • On-call handoff process documented
  • Security Alerts (Sigma)

    • Brute force detection enabled
    • Privilege escalation monitoring
    • Suspicious IP detection
    • After-hours admin activity

Common Pitfalls

1. “Alert on every error”

You’ll get 500 alerts per day and ignore them all.

Solution: Alert on error rates, not individual errors.

2. “Set thresholds once and forget”

As traffic grows, static thresholds become useless.

Solution: Review thresholds quarterly. Consider percentage-based thresholds.

3. “No runbook for alerts”

An alert without a runbook just says “something is wrong, figure it out.”

Solution: Every alert should link to a runbook with investigation steps.

4. “Same notification for everything”

If everything pages, nothing is important.

Solution: Tier your alerts. Only P1 should wake someone up.

Frequently Asked Questions

Does LogTide support real-time alerting on log events?

Yes. LogTide provides threshold-based alert rules that evaluate log volume, error rates, and search patterns within configurable time windows. When a threshold is crossed, LogTide immediately delivers notifications via email and webhooks, enabling routing to Slack, PagerDuty, Microsoft Teams, or any webhook-compatible tool.

How does LogTide help reduce alert fatigue for on-call engineers?

LogTide recommends alerting on error rates within a time window rather than on every individual error, and supports per-service scoping so alerts only fire when a meaningful volume of errors accumulates. Combining tiered severity levels with deduplication within the alert time window prevents the same issue from generating dozens of duplicate notifications.

What is Sigma rule support in LogTide alerting?

LogTide has built-in support for Sigma detection rules, an open standard for log-based threat detection. You can import Sigma rules directly through the LogTide dashboard to detect patterns such as brute force login attempts or suspicious privilege escalation, without writing custom alerting logic from scratch.

Can LogTide route critical alerts to PagerDuty and lower-priority alerts to Slack?

Yes. LogTide fires alerts to a webhook endpoint of your choice, and you can build a lightweight webhook router that inspects the alert payload and forwards it to PagerDuty for critical events while sending all other alerts to a Slack channel. The alert payload includes the alert name, service, threshold, current count, and sample log entries to give responders immediate context.