# Performance & Capacity Planning

> Hardware requirements, benchmarks, and tuning guidelines for LogFleet deployments

## Overview

LogFleet is designed for high-throughput edge deployments. This guide covers hardware requirements, performance benchmarks, capacity planning, and tuning recommendations for different scales.

<Note>
  All benchmarks were conducted on standard hardware configurations. Your results may vary based on log complexity, network conditions, and workload patterns.
</Note>

***

## Hardware Requirements

### Minimum Requirements (Development/Testing)

For local development and small-scale testing:

| Component | Specification |
| --------- | ------------- |
| CPU       | 2 cores       |
| RAM       | 4 GB          |
| Storage   | 20 GB SSD     |
| Network   | 10 Mbps       |

```bash theme={null}
# Docker resource limits for development
docker run -d \
  --cpus="2" \
  --memory="4g" \
  logfleet/edge-agent
```

### Recommended (Single Edge Location)

For production single-location deployments handling typical retail/IoT workloads:

| Component | Specification                  | Notes                                       |
| --------- | ------------------------------ | ------------------------------------------- |
| CPU       | 4 cores (Intel i5/AMD Ryzen 5) | Vector benefits from multiple cores         |
| RAM       | 8 GB                           | 4 GB for Vector, 2 GB for Loki, 2 GB OS     |
| Storage   | 100 GB NVMe SSD                | Scales with retention period                |
| Network   | 100 Mbps                       | For metric shipping and on-demand streaming |

**Expected throughput**: 10,000-50,000 logs/second

### Production (High-Volume Location)

For high-volume locations (large retail stores, manufacturing floors):

| Component | Specification           | Notes                                   |
| --------- | ----------------------- | --------------------------------------- |
| CPU       | 8 cores (Intel i7/Xeon) | Enables parallel processing             |
| RAM       | 16 GB                   | Larger buffers, more concurrent queries |
| Storage   | 500 GB NVMe SSD         | 30-day retention at high volume         |
| Network   | 1 Gbps                  | Burst capacity for streaming            |

**Expected throughput**: 50,000-200,000 logs/second

### Enterprise (3-Node Cluster)

For mission-critical deployments requiring high availability:

| Component | Per Node  | Total Cluster                |
| --------- | --------- | ---------------------------- |
| CPU       | 8 cores   | 24 cores                     |
| RAM       | 32 GB     | 96 GB                        |
| Storage   | 1 TB NVMe | 3 TB (with replication)      |
| Network   | 10 Gbps   | Dedicated management network |

**Expected throughput**: 500,000+ logs/second with HA

***

## Performance Benchmarks

### Log Ingestion Throughput

Measured on recommended single-location hardware (4 cores, 8 GB RAM):

| Log Size  | Throughput    | CPU Usage | Memory |
| --------- | ------------- | --------- | ------ |
| 256 bytes | 85,000 logs/s | 65%       | 2.1 GB |
| 512 bytes | 62,000 logs/s | 72%       | 2.4 GB |
| 1 KB      | 45,000 logs/s | 78%       | 2.8 GB |
| 4 KB      | 18,000 logs/s | 85%       | 3.2 GB |
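
Because the table reports throughput in logs per second, it can help to convert each row into bytes per second before comparing it with disk or network limits. The following is a minimal sketch of that arithmetic using the figures above; the function name is illustrative, and it assumes 1 KB means 1,024 bytes and 1 MB means 10^6 bytes.

```python
def ingest_mb_per_sec(logs_per_sec: int, log_size_bytes: int) -> float:
    """Convert a logs/s figure into payload megabytes per second (1 MB = 10**6 bytes)."""
    return logs_per_sec * log_size_bytes / 1_000_000


# (log size in bytes, measured logs/s) copied from the benchmark table
BENCHMARKS = [(256, 85_000), (512, 62_000), (1024, 45_000), (4096, 18_000)]

for size, rate in BENCHMARKS:
    print(f"{size:>5} B: {ingest_mb_per_sec(rate, size):.1f} MB/s")
```

By this arithmetic, larger log lines lower the event rate but raise the byte rate, from roughly 22 MB/s at 256 bytes to roughly 74 MB/s at 4 KB.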

### Log-to-Metric Extraction

Vector's `log_to_metric` transform performance:

| Metrics per Log | Throughput Impact | CPU Overhead |
| --------------- | ----------------- | ------------ |
| 1 metric        | -5%               | +8%          |
| 3 metrics       | -12%              | +15%         |
| 5 metrics       | -18%              | +22%         |
| 10 metrics      | -28%              | +35%         |

<Tip>
  Keep metric extractions under 5 per log for optimal performance. Use aggregation for high-cardinality data.
</Tip>
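
To see what the extraction overhead means for a given pipeline, the table can be applied to a baseline throughput. This is a rough sketch, not a LogFleet tool: the function name is hypothetical, it rounds up to the next benchmarked row rather than interpolating, and it assumes the baseline figure was measured without any metric extraction.

```python
# Throughput impact from the table above: metrics per log -> fractional loss
THROUGHPUT_IMPACT = {1: 0.05, 3: 0.12, 5: 0.18, 10: 0.28}


def effective_throughput(base_logs_per_sec: float, metrics_per_log: int) -> float:
    """Estimate logs/s after log_to_metric overhead, rounding up to a benchmarked row."""
    if metrics_per_log < 0:
        raise ValueError("metrics_per_log must be non-negative")
    if metrics_per_log == 0:
        return float(base_logs_per_sec)
    for benchmarked in sorted(THROUGHPUT_IMPACT):
        if metrics_per_log <= benchmarked:
            return base_logs_per_sec * (1 - THROUGHPUT_IMPACT[benchmarked])
    raise ValueError("no benchmark data beyond 10 metrics per log")


# Hypothetical baseline of 45,000 logs/s with 3 extracted metrics per log
print(f"{effective_throughput(45_000, 3):,.0f} logs/s")
```

With these inputs the estimate is 39,600 logs/s, a 12% reduction from the baseline.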

### Query Latency (Loki)

Query performance with 7-day retention and 50 GB of stored data:

| Query Type                        | Latency (p50) | Latency (p99) |
| --------------------------------- | ------------- | ------------- |
| Simple filter (`{service="api"}`) | 45ms          | 180ms         |
| Regex match (`\|~ "error"`)       | 120ms         | 450ms         |
| JSON parsing (`\| json`)          | 200ms         | 800ms         |
| Aggregation (`count_over_time`)   | 350ms         | 1.2s          |
| Full-text search                  | 500ms         | 2.5s          |

### Network Bandwidth

Metric shipping bandwidth (compressed, to cloud):

| Locations | Metrics/min | Bandwidth |
| --------- | ----------- | --------- |
| 10        | 6,000       | 50 KB/s   |
| 100       | 60,000      | 500 KB/s  |
| 1,000     | 600,000     | 5 MB/s    |
| 10,000    | 6,000,000   | 50 MB/s   |
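
The rows above scale linearly, which works out to about 600 metrics per minute and 5 KB/s per location. A small sketch for fleet sizes that are not listed follows; the function names are illustrative, and it assumes the linear trend continues to hold and that 1 KB is 1,000 bytes.

```python
# Per-location figures derived from the first table row (10 locations)
METRICS_PER_MIN_PER_LOCATION = 6_000 / 10   # 600 metrics/min
KB_PER_SEC_PER_LOCATION = 50 / 10           # 5 KB/s


def metric_shipping_kb_per_sec(locations: int) -> float:
    """Compressed metric-shipping bandwidth in KB/s, assuming linear scaling."""
    return locations * KB_PER_SEC_PER_LOCATION


def bytes_per_metric() -> float:
    """Average compressed bytes per shipped metric (1 KB = 1,000 bytes assumed)."""
    metrics_per_sec = METRICS_PER_MIN_PER_LOCATION / 60
    return KB_PER_SEC_PER_LOCATION * 1_000 / metrics_per_sec


print(metric_shipping_kb_per_sec(250))  # a fleet size not listed in the table
print(bytes_per_metric())
```

Under these assumptions, 250 locations need about 1,250 KB/s, and each shipped metric costs roughly 500 bytes after compression.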

**Log streaming bandwidth** (when enabled):

* Typical: 1-10 MB/s per location
* Peak: 50-100 MB/s during incident investigation

***

## Capacity Planning

### Storage Calculator

Estimate storage requirements based on your workload:

```
Daily Storage = (logs_per_second × avg_log_size × 86400) ÷ compression_ratio

Where:
- compression_ratio ≈ 5-10x for Loki (typical logs)
- Add 20% overhead for indexes
```
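
The formula above can be sketched as a small function. The names are illustrative, the default compression ratio of 8x is an assumed value inside the stated 5-10x range, the 20% index overhead is applied to the compressed size (one reading of the formula), and the `retention_days` multiplier extends the daily formula to a full retention window.

```python
SECONDS_PER_DAY = 86_400


def estimate_storage_bytes(
    logs_per_sec: float,
    avg_log_size_bytes: float,
    retention_days: float = 1,
    compression_ratio: float = 8.0,  # assumed value within the 5-10x range
    index_overhead: float = 0.20,    # 20% index overhead from the formula
) -> tuple[float, float]:
    """Return (raw_bytes, on_disk_bytes) for the given retention window."""
    raw = logs_per_sec * avg_log_size_bytes * SECONDS_PER_DAY * retention_days
    on_disk = raw / compression_ratio * (1 + index_overhead)
    return raw, on_disk


raw, on_disk = estimate_storage_bytes(1_000, 512, retention_days=7)
print(f"raw: {raw / 1e9:.0f} GB, on disk: {on_disk / 1e9:.0f} GB")
```

Passing a lower `compression_ratio` (for example 5.0) gives a more conservative estimate for logs that compress poorly.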

**Example calculations** (raw data for the full retention period; compressed figures assume 8x compression plus 20% index overhead):

| Logs/sec | Avg Size | Retention | Raw Data | Compressed |
| -------- | -------- | --------- | -------- | ---------- |
| 1,000    | 512 B    | 7 days    | 310 GB   | 46 GB      |
| 5,000    | 512 B    | 7 days    | 1.5 TB   | 232 GB     |
| 10,000   | 256 B    | 14 days   | 3.1 TB   | 464 GB     |
| 50,000   | 256 B    | 7 days    | 7.7 TB   | 1.2 TB     |

### Memory Sizing

```
Minimum RAM = Vector (1.5 GB) + Loki (1 GB) + OS (1 GB) + Buffer (20%)

Recommended RAM = Vector (3 GB) + Loki (3 GB) + OS (2 GB) + Query Cache (2 GB)
```

**Memory scaling guidelines:**

| Throughput   | Vector | Loki | Total Recommended |
| ------------ | ------ | ---- | ----------------- |
| 10K logs/s   | 2 GB   | 2 GB | 6 GB              |
| 50K logs/s   | 4 GB   | 4 GB | 12 GB             |
| 100K logs/s  | 6 GB   | 6 GB | 16 GB             |
| 200K+ logs/s | 8 GB   | 8 GB | 24 GB             |
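
As a rough sketch, the minimum-RAM formula and the scaling table can be expressed in code. The function names are hypothetical; the sketch reads "Buffer (20%)" as 20% headroom on top of the component sum, and it rounds any throughput between two table rows up to the larger row, with the "200K+" row covering everything above 100K logs/s.

```python
def minimum_ram_gb(vector_gb=1.5, loki_gb=1.0, os_gb=1.0, buffer_fraction=0.20):
    """Minimum RAM formula, reading 'Buffer (20%)' as 20% headroom on the sum."""
    return (vector_gb + loki_gb + os_gb) * (1 + buffer_fraction)


# (upper bound in logs/s, total recommended GB) from the scaling table
MEMORY_TIERS = [(10_000, 6), (50_000, 12), (100_000, 16)]
TOP_TIER_GB = 24  # the "200K+" row


def recommended_ram_gb(logs_per_sec: int) -> int:
    """Pick the first table row that covers the throughput, rounding up between rows."""
    for upper_bound, total_gb in MEMORY_TIERS:
        if logs_per_sec <= upper_bound:
            return total_gb
    return TOP_TIER_GB


print(f"{minimum_ram_gb():.1f} GB minimum")
print(f"{recommended_ram_gb(30_000)} GB recommended at 30K logs/s")
```

With the default inputs the minimum comes to 4.2 GB, and a 30K logs/s workload maps to the 12 GB row.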

### CPU Sizing

```
Base CPU = 2 cores (Vector) + 1 core (Loki) + 1 core (OS)

Scale factor:
- +1 core per 25K logs/s above baseline
- +1 core per 3 metric extractions
- +2 cores if using complex transforms (grok, VRL scripts)
```
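
The sizing rules above can be turned into a quick estimate. This is only a sketch under stated assumptions: the function name is hypothetical, the baseline throughput is not defined in this guide and so must be supplied by the caller, and partial increments are rounded up (for example, 4 metric extractions count as 2 extra cores).

```python
import math

BASE_CORES = 2 + 1 + 1  # Vector + Loki + OS


def required_cores(
    logs_per_sec: int,
    *,
    baseline_logs_per_sec: int,      # assumption: caller supplies the baseline
    metric_extractions: int = 0,
    complex_transforms: bool = False,
) -> int:
    """Apply the CPU scale factors, rounding partial increments up."""
    cores = BASE_CORES
    extra_logs = max(0, logs_per_sec - baseline_logs_per_sec)
    cores += math.ceil(extra_logs / 25_000)       # +1 core per 25K logs/s
    cores += math.ceil(metric_extractions / 3)    # +1 core per 3 extractions
    if complex_transforms:
        cores += 2                                # grok, VRL scripts
    return cores


# Hypothetical workload: 100K logs/s against an assumed 25K logs/s baseline
print(required_cores(100_000, baseline_logs_per_sec=25_000,
                     metric_extractions=3, complex_transforms=True))
```

For this hypothetical workload the result is 10 cores: 4 base, 3 for throughput, 1 for metric extraction, and 2 for complex transforms.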

***

## Tuning Guidelines

### Vector Configuration

Optimize Vector for your workload:

```yaml theme={null}
# vector.yaml - High throughput configuration
data_dir: /var/lib/vector

# HTTP source tuning
sources:
  http_logs:
    type: http_server
    address: "0.0.0.0:8080"
    # Recycle long-lived client connections every 5 minutes
    keepalive:
      max_connection_age_secs: 300
    # Split each request body into one event per line
    framing:
      method: newline_delimited

# Batch sink writes
sinks:
  loki:
    type: loki
    # inputs and encoding are required; the label below is a placeholder
    inputs: ["http_logs"]
    endpoint: "http://loki:3100"
    encoding:
      codec: json
    labels:
      source: http_logs
    # Buffers are configured per sink; use a disk buffer for high volume
    buffer:
      type: disk
      max_size: 5368709120  # 5 GB
      when_full: block
    batch:
      max_bytes: 10485760  # 10 MB
      max_events: 100000
      timeout_secs: 5
    # Compression
    compression: snappy
    # Request tuning
    request:
      concurrency: 10
      rate_limit_num: 100
      retry_max_duration_secs: 300
```

### Loki Configuration

Optimize Loki for edge deployments:

```yaml theme={null}
# loki-config.yaml - Edge optimized
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  # Raise gRPC message size limits to 100 MiB for large query responses
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  # Tune chunk sizing
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_chunk_age: 1h
  chunk_target_size: 1572864  # 1.5 MB
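  # Disable chunk hand-off to other ingesters on shutdown and rely on the WAL.
  # This option is deprecated in newer Loki releases; remove it if your version rejects it.
  max_transfer_retries: 0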
  wal:
    enabled: true
    dir: /loki/wal

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  # Ingestion limits
  ingestion_rate_mb: 50
  ingestion_burst_size_mb: 100
  per_stream_rate_limit: 10MB
  per_stream_rate_limit_burst: 30MB
  # Query limits
  max_query_parallelism: 32
  max_query_series: 10000
  max_entries_limit_per_query: 50000
  # Retention
  retention_period: 168h  # 7 days

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 500
```

### OS-Level Tuning

For high-throughput Linux deployments:

```bash theme={null}
# /etc/sysctl.d/99-logfleet.conf

# Network tuning
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1

# File descriptor limits
fs.file-max = 2097152
fs.nr_open = 2097152

# Memory tuning
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2

# Apply changes by running this command in a shell (it is not part of the file):
#   sudo sysctl -p /etc/sysctl.d/99-logfleet.conf
```

```bash theme={null}
# /etc/security/limits.d/99-logfleet.conf
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 65535
* hard nproc 65535
```

***

## Monitoring & Alerting

### Key Metrics to Monitor

| Metric              | Warning | Critical | Action                          |
| ------------------- | ------- | -------- | ------------------------------- |
| CPU usage           | >70%    | >90%     | Scale up or reduce transforms   |
| Memory usage        | >75%    | >90%     | Increase RAM or reduce buffers  |
| Disk usage          | >70%    | >85%     | Reduce retention or add storage |
| Ingestion rate drop | >20%    | >50%     | Check sources and network       |
| Query latency p99   | >2s     | >5s      | Optimize queries or add cache   |
| Buffer backpressure | >50%    | >80%     | Scale sink capacity             |
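
The thresholds in this table can be encoded directly when wiring up a custom health check. The sketch below is illustrative only: the metric keys and the `classify` function are hypothetical names, not part of LogFleet, and values exactly at a threshold are treated as not exceeding it.

```python
# (warning, critical) thresholds from the table above; key names are hypothetical
THRESHOLDS = {
    "cpu_usage_pct": (70, 90),
    "memory_usage_pct": (75, 90),
    "disk_usage_pct": (70, 85),
    "ingestion_drop_pct": (20, 50),
    "query_latency_p99_s": (2, 5),
    "buffer_backpressure_pct": (50, 80),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for one observed value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


print(classify("disk_usage_pct", 80))       # between 70 and 85
print(classify("query_latency_p99_s", 6))   # above 5 seconds
```

The first call reports `warning` and the second `critical`, matching the corresponding table rows.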

### Vector Metrics Endpoint

```yaml theme={null}
# Enable internal metrics
sources:
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 15

sinks:
  prometheus:
    type: prometheus_exporter
    inputs: ["internal_metrics"]
    address: "0.0.0.0:9598"
```

**Key Vector metrics:**

* `vector_component_received_events_total` - Ingestion rate
* `vector_buffer_events` - Buffer pressure
* `vector_component_sent_events_total` - Output rate
* `vector_component_errors_total` - Error rate
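
The `_total` metrics are counters, so a rate has to be derived from two scrapes. The following is a minimal sketch of that calculation on the Prometheus text format; the helper names and the sample scrape bodies are made up for illustration, and it does not handle counter resets.

```python
import re


def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of one metric in Prometheus text exposition format."""
    # Matches "name 12" or "name{labels} 12"; a longer name such as name_total does not match
    pattern = re.compile(r"^" + re.escape(metric_name) + r"(?:\{.*\})?\s+(\S+)")
    total = 0.0
    for line in exposition_text.splitlines():
        match = pattern.match(line)
        if match:
            total += float(match.group(1))
    return total


def per_second_rate(earlier: float, later: float, interval_secs: float) -> float:
    """Average per-second increase of a counter between two scrapes."""
    return (later - earlier) / interval_secs


# Made-up scrape bodies, 15 seconds apart, for illustration only
first = 'vector_component_received_events_total{component_id="http_logs"} 120000\n'
second = 'vector_component_received_events_total{component_id="http_logs"} 870000\n'

name = "vector_component_received_events_total"
rate = per_second_rate(sum_metric(first, name), sum_metric(second, name), 15)
print(f"{rate:,.0f} events/s")
```

For the sample data this gives 50,000 events/s. In a real scrape the counter is reported per component, so summing every series counts the same event once for each component it passes through; filtering to a single source component avoids that. In practice a Prometheus `rate()` query does the same job and also handles resets.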

### Loki Metrics

Loki exposes Prometheus metrics at `/metrics` on its HTTP port (3100 in the configuration above).

**Key Loki metrics:**

* `loki_ingester_chunks_stored_total` - Storage growth
* `loki_request_duration_seconds` - Query latency
* `loki_ingester_memory_chunks` - Memory pressure
* `loki_distributor_bytes_received_total` - Ingestion rate

### Sample Prometheus Alerts

```yaml theme={null}
groups:
  - name: logfleet
    rules:
      - alert: HighCPUUsage
        expr: avg by (instance) (rate(process_cpu_seconds_total[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 15% on {{ $labels.instance }}"

      - alert: IngestionDrop
        expr: rate(loki_distributor_bytes_received_total[5m]) < 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Log ingestion dropped significantly"

      - alert: HighQueryLatency
        expr: histogram_quantile(0.99, rate(loki_request_duration_seconds_bucket[5m])) > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Query latency p99 > 5s"
```

***

## Scaling Strategies

### Vertical Scaling

When to scale up a single node:

| Symptom               | Solution                          |
| --------------------- | --------------------------------- |
| CPU consistently >80% | Add cores or upgrade CPU          |
| Memory pressure / OOM | Add RAM, reduce buffers           |
| Disk I/O bottleneck   | Upgrade to NVMe, add RAID         |
| Query timeouts        | Add RAM for cache, faster storage |

### Horizontal Scaling (Multi-Node)

When to deploy a cluster:

* **High availability requirement** - Deploy 3+ nodes with replication
* **Throughput >200K logs/s** - Distribute ingestion load
* **Multi-tenant isolation** - Separate workloads
* **Geographic distribution** - Regional edge clusters

```yaml theme={null}
# Example 3-node cluster topology
Node 1 (Ingester):
  - Vector (primary)
  - Loki Ingester

Node 2 (Ingester):
  - Vector (replica)
  - Loki Ingester

Node 3 (Query):
  - Loki Querier
  - Grafana
```

***

## Best Practices

<AccordionGroup>
  <Accordion title="Right-size your hardware">
    Start with the recommended specs and monitor for two weeks before scaling. Over-provisioning wastes resources; under-provisioning risks backpressure and data loss.
  </Accordion>

  <Accordion title="Use SSDs, not HDDs">
    Loki's write patterns require fast random I/O. NVMe SSDs typically deliver 10-100x better random I/O performance than spinning disks.
  </Accordion>

  <Accordion title="Set retention limits">
    Always configure retention limits to prevent disk exhaustion. Ring buffer semantics ensure that the oldest logs are deleted first.
  </Accordion>

  <Accordion title="Batch sink writes">
    Configure Vector sinks to batch writes. Larger batches reduce network overhead and improve throughput.
  </Accordion>

  <Accordion title="Limit metric cardinality">
    High-cardinality labels (user IDs, request IDs) multiply the number of streams and inflate index and storage size. Keep high-cardinality data in log fields and reserve labels for low-cardinality values.
  </Accordion>

  <Accordion title="Monitor buffer backpressure">
    Buffer backpressure indicates sinks can't keep up. Investigate sink bottlenecks before increasing buffer sizes.
  </Accordion>
</AccordionGroup>
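
The cardinality guidance above can be made concrete with a little arithmetic: in Loki, each unique combination of label values is a separate stream, so the worst-case stream count is the product of the distinct values per label. A brief sketch with hypothetical label names and counts:

```python
from math import prod


def max_streams(label_value_counts: dict) -> int:
    """Upper bound on streams: the product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical label sets: distinct values per label
low_cardinality = {"service": 10, "level": 4}
with_user_id = {**low_cardinality, "user_id": 10_000}

print(max_streams(low_cardinality))  # 40
print(max_streams(with_user_id))     # 400000
```

Adding a single `user_id` label takes the upper bound from 40 streams to 400,000, which is why such values are better kept in the log line itself.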

## Next Steps

<CardGroup cols={2}>
  <Card title="Edge Agent Setup" icon="server" href="/guides/edge-agent-setup">
    Deploy the LogFleet agent
  </Card>

  <Card title="Custom Metrics" icon="chart-bar" href="/guides/custom-metrics">
    Extract metrics from logs
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/guides/troubleshooting">
    Debug common issues
  </Card>

  <Card title="Code Examples" icon="code" href="/guides/code-examples">
    Integration examples
  </Card>
</CardGroup>
