
Overview

LogFleet is designed for high-throughput edge deployments. This guide covers hardware requirements, performance benchmarks, capacity planning, and tuning recommendations for different scales.
All benchmarks were conducted on standard hardware configurations. Your results may vary based on log complexity, network conditions, and workload patterns.

Hardware Requirements

Minimum Requirements (Development/Testing)

For local development and small-scale testing:
| Component | Specification |
| --- | --- |
| CPU | 2 cores |
| RAM | 4 GB |
| Storage | 20 GB SSD |
| Network | 10 Mbps |
# Docker resource limits for development
docker run -d \
  --cpus="2" \
  --memory="4g" \
  logfleet/edge-agent

Recommended Requirements (Production Single Location)

For production single-location deployments handling typical retail/IoT workloads:
| Component | Specification | Notes |
| --- | --- | --- |
| CPU | 4 cores (Intel i5/AMD Ryzen 5) | Vector benefits from multiple cores |
| RAM | 8 GB | 4 GB for Vector, 2 GB for Loki, 2 GB for the OS |
| Storage | 100 GB NVMe SSD | Scales with retention period |
| Network | 100 Mbps | For metric shipping and on-demand streaming |
Expected throughput: 10,000-50,000 logs/second
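For container-based installs, the same limits can be mirrored in Docker. The docker-compose sketch below is illustrative: it assumes the single logfleet/edge-agent image from the development example, so adapt the service layout to however you split Vector and Loki across containers.
# docker-compose.yml - resource limits for a single-location deployment (illustrative)
services:
  edge-agent:
    image: logfleet/edge-agent
    deploy:
      resources:
        limits:
          cpus: "4"      # matches the 4-core recommendation
          memory: 8g     # matches the 8 GB recommendation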

Production (High-Volume Location)

For high-volume locations (large retail stores, manufacturing floors):
| Component | Specification | Notes |
| --- | --- | --- |
| CPU | 8 cores (Intel i7/Xeon) | Enables parallel processing |
| RAM | 16 GB | Larger buffers, more concurrent queries |
| Storage | 500 GB NVMe SSD | 30-day retention at high volume |
| Network | 1 Gbps | Burst capacity for streaming |
Expected throughput: 50,000-200,000 logs/second

Enterprise (3-Node Cluster)

For mission-critical deployments requiring high availability:
| Component | Per Node | Total Cluster |
| --- | --- | --- |
| CPU | 8 cores | 24 cores |
| RAM | 32 GB | 96 GB |
| Storage | 1 TB NVMe | 3 TB (with replication) |
| Network | 10 Gbps | Dedicated management network |
Expected throughput: 500,000+ logs/second with HA

Performance Benchmarks

Log Ingestion Throughput

Measured on recommended single-location hardware (4 cores, 8 GB RAM):
| Log Size | Throughput | CPU Usage | Memory |
| --- | --- | --- | --- |
| 256 bytes | 85,000 logs/s | 65% | 2.1 GB |
| 512 bytes | 62,000 logs/s | 72% | 2.4 GB |
| 1 KB | 45,000 logs/s | 78% | 2.8 GB |
| 4 KB | 18,000 logs/s | 85% | 3.2 GB |

Log-to-Metric Extraction

Vector’s log_to_metric transform performance:
| Metrics per Log | Throughput Impact | CPU Overhead |
| --- | --- | --- |
| 1 metric | -5% | +8% |
| 3 metrics | -12% | +15% |
| 5 metrics | -18% | +22% |
| 10 metrics | -28% | +35% |
Keep metric extractions under 5 per log for optimal performance. Use aggregation for high-cardinality data.
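For reference, a minimal log_to_metric transform is sketched below. The input name http_logs matches the HTTP source used in the tuning section of this guide, while the status and service fields are assumptions about your log schema:
# Sketch: extract a counter from parsed logs (field names are assumptions)
transforms:
  http_status_metrics:
    type: log_to_metric
    inputs: ["http_logs"]
    metrics:
      - type: counter
        field: status                # one increment per log that contains this field
        name: http_responses_total
        tags:
          status: "{{ status }}"     # keep tag values low-cardinality
          service: "{{ service }}"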

Query Latency (Loki)

Query performance with 7-day retention and 50 GB of data:
| Query Type | Latency (p50) | Latency (p99) |
| --- | --- | --- |
| Simple filter (`{service="api"}`) | 45ms | 180ms |
| Regex match (`\|~ "error"`) | 120ms | 450ms |
| JSON parsing (`\| json`) | 200ms | 800ms |
| Aggregation (`count_over_time`) | 350ms | 1.2s |
| Full-text search | 500ms | 2.5s |

Network Bandwidth

Metric shipping bandwidth (compressed, to cloud):
| Locations | Metrics/min | Bandwidth |
| --- | --- | --- |
| 10 | 6,000 | 50 KB/s |
| 100 | 60,000 | 500 KB/s |
| 1,000 | 600,000 | 5 MB/s |
| 10,000 | 6,000,000 | 50 MB/s |
Log streaming bandwidth (when enabled):
  • Typical: 1-10 MB/s per location
  • Peak: 50-100 MB/s during incident investigation

Capacity Planning

Storage Calculator

Estimate storage requirements based on your workload:
Daily Storage = (logs_per_second × avg_log_size × 86400) ÷ compression_ratio

Where:
- compression_ratio ≈ 5-10x for Loki (typical logs)
- Add 20% overhead for indexes
Example calculations:
| Logs/sec | Avg Size | Retention | Raw Data | Compressed |
| --- | --- | --- | --- | --- |
| 1,000 | 512 B | 7 days | 302 GB | 45 GB |
| 5,000 | 512 B | 7 days | 1.5 TB | 225 GB |
| 10,000 | 256 B | 14 days | 2.4 TB | 360 GB |
| 50,000 | 256 B | 7 days | 8.6 TB | 1.3 TB |
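As a worked check against the first row, assuming a ~7x compression ratio:
Daily raw  = 1,000 logs/s × 512 B × 86,400 s ≈ 44 GB/day
7-day raw  ≈ 44 GB × 7 ≈ 300 GB
Compressed ≈ 300 GB ÷ 7 ≈ 43-45 GB, plus ~20% for indexes when provisioning disks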

Memory Sizing

Minimum RAM = Vector (1.5 GB) + Loki (1 GB) + OS (1 GB) + Buffer (20%)

Recommended RAM = Vector (3 GB) + Loki (3 GB) + OS (2 GB) + Query Cache (2 GB)
Memory scaling guidelines:
| Throughput | Vector | Loki | Total Recommended |
| --- | --- | --- | --- |
| 10K logs/s | 2 GB | 2 GB | 6 GB |
| 50K logs/s | 4 GB | 4 GB | 12 GB |
| 100K logs/s | 6 GB | 6 GB | 16 GB |
| 200K+ logs/s | 8 GB | 8 GB | 24 GB |

CPU Sizing

Base CPU = 2 cores (Vector) + 1 core (Loki) + 1 core (OS)

Scale factor:
- +1 core per 25K logs/s above baseline
- +1 core per 3 metric extractions
- +2 cores if using complex transforms (grok, VRL scripts)
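For example, assuming the 4-core base covers roughly the first 25K logs/s, a location ingesting ~100K logs/s with 3 metric extractions and no complex transforms works out to:
4 cores (base) + 3 cores (75K logs/s above baseline) + 1 core (3 metric extractions) = 8 cores
which lines up with the high-volume hardware recommendation above.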

Tuning Guidelines

Vector Configuration

Optimize Vector for your workload:
# vector.yaml - High throughput configuration
data_dir: /var/lib/vector

# HTTP source tuning
sources:
  http_logs:
    type: http_server
    address: "0.0.0.0:8080"
    # Increase for high concurrency
    keepalive:
      max_connection_age_secs: 300
    # Batch incoming requests
    framing:
      method: newline_delimited

# Batch sink writes
sinks:
  loki:
    type: loki
    endpoint: "http://loki:3100"
    encoding:
      codec: json
    # Buffer to disk so bursts survive restarts and sink slowdowns
    buffer:
      type: disk
      max_size: 5368709120  # 5 GB
    batch:
      max_bytes: 10485760  # 10 MB
      max_events: 100000
      timeout_secs: 5
    # Compression
    compression: snappy
    # Request tuning
    request:
      concurrency: 10
      rate_limit_num: 100
      retry_max_duration_secs: 300

Loki Configuration

Optimize Loki for edge deployments:
# loki-config.yaml - Edge optimized
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  # Increase for high query load
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  # Tune chunk sizing
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_chunk_age: 1h
  chunk_target_size: 1572864  # 1.5 MB
  # Memory optimization
  max_transfer_retries: 0
  wal:
    enabled: true
    dir: /loki/wal

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
  filesystem:
    directory: /loki/chunks

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

limits_config:
  # Ingestion limits
  ingestion_rate_mb: 50
  ingestion_burst_size_mb: 100
  per_stream_rate_limit: 10MB
  per_stream_rate_limit_burst: 30MB
  # Query limits
  max_query_parallelism: 32
  max_query_series: 10000
  max_entries_limit_per_query: 50000
  # Retention
  retention_period: 168h  # 7 days

query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 500

OS-Level Tuning

For high-throughput Linux deployments:
# /etc/sysctl.d/99-logfleet.conf

# Network tuning
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1

# File descriptor limits
fs.file-max = 2097152
fs.nr_open = 2097152

# Memory tuning
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2

# Apply changes
sudo sysctl -p /etc/sysctl.d/99-logfleet.conf
# /etc/security/limits.d/99-logfleet.conf
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 65535
* hard nproc 65535

Monitoring & Alerting

Key Metrics to Monitor

| Metric | Warning | Critical | Action |
| --- | --- | --- | --- |
| CPU usage | >70% | >90% | Scale up or reduce transforms |
| Memory usage | >75% | >90% | Increase RAM or reduce buffers |
| Disk usage | >70% | >85% | Reduce retention or add storage |
| Ingestion rate drop | >20% | >50% | Check sources and network |
| Query latency p99 | >2s | >5s | Optimize queries or add cache |
| Buffer backpressure | >50% | >80% | Scale sink capacity |

Vector Metrics Endpoint

# Enable internal metrics
sources:
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 15

sinks:
  prometheus:
    type: prometheus_exporter
    inputs: ["internal_metrics"]
    address: "0.0.0.0:9598"
Key Vector metrics:
  • vector_component_received_events_total - Ingestion rate
  • vector_buffer_events - Buffer pressure
  • vector_component_sent_events_total - Output rate
  • vector_component_errors_total - Error rate

Loki Metrics

Loki exposes Prometheus metrics at /metrics. Key Loki metrics:
  • loki_ingester_chunks_stored_total - Storage growth
  • loki_request_duration_seconds - Query latency
  • loki_ingester_memory_chunks - Memory pressure
  • loki_distributor_bytes_received_total - Ingestion rate
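To collect both, point Prometheus at the two endpoints. The scrape config below is a minimal sketch assuming default ports on a single host; replace localhost with your node addresses:
# prometheus.yml - scrape Vector and Loki (illustrative)
scrape_configs:
  - job_name: "vector"
    static_configs:
      - targets: ["localhost:9598"]   # prometheus_exporter sink address
  - job_name: "loki"
    metrics_path: "/metrics"          # Prometheus default, shown explicitly
    static_configs:
      - targets: ["localhost:3100"]   # Loki HTTP port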

Sample Prometheus Alerts

groups:
  - name: logfleet
    rules:
      - alert: HighCPUUsage
        expr: avg by (instance) (rate(process_cpu_seconds_total[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 15% on {{ $labels.instance }}"

      - alert: IngestionDrop
        expr: rate(loki_distributor_bytes_received_total[5m]) < 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Log ingestion dropped significantly"

      - alert: HighQueryLatency
        expr: histogram_quantile(0.99, rate(loki_request_duration_seconds_bucket[5m])) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Query latency p99 > 5s"

Scaling Strategies

Vertical Scaling

When to scale up a single node:
| Symptom | Solution |
| --- | --- |
| CPU consistently >80% | Add cores or upgrade CPU |
| Memory pressure / OOM | Add RAM, reduce buffers |
| Disk I/O bottleneck | Upgrade to NVMe, add RAID |
| Query timeouts | Add RAM for cache, faster storage |

Horizontal Scaling (Multi-Node)

When to deploy a cluster:
  • High availability requirement - Deploy 3+ nodes with replication
  • Throughput >200K logs/s - Distribute ingestion load
  • Multi-tenant isolation - Separate workloads
  • Geographic distribution - Regional edge clusters
# Example 3-node cluster topology
Node 1 (Ingester):
  - Vector (primary)
  - Loki Ingester

Node 2 (Ingester):
  - Vector (replica)
  - Loki Ingester

Node 3 (Query):
  - Loki Querier
  - Grafana

Best Practices

  • Start with recommended specs and monitor for 2 weeks before scaling. Over-provisioning wastes resources; under-provisioning causes data loss.
  • Loki’s write patterns require fast random I/O. NVMe SSDs provide 10-100x better performance than spinning disks.
  • Always configure retention limits to prevent disk exhaustion. Ring-buffer semantics ensure the oldest logs are deleted first.
  • Configure Vector sinks to batch writes. Larger batches reduce network overhead and improve throughput.
  • High-cardinality labels (user IDs, request IDs) explode storage. Use log fields for high-cardinality data and labels for low-cardinality dimensions (see the sketch after this list).
  • Buffer backpressure indicates sinks can’t keep up. Investigate sink bottlenecks before increasing buffer sizes.
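To illustrate the label guidance above, a Vector Loki sink can restrict labels to a handful of low-cardinality fields while identifiers stay in the log body. The field names here (service, env) are assumptions about your log schema:
# Sketch: low-cardinality labels only (field names are assumptions)
sinks:
  loki:
    type: loki
    endpoint: "http://loki:3100"
    encoding:
      codec: json
    labels:
      service: "{{ service }}"        # low cardinality: becomes a Loki stream label
      environment: "{{ env }}"        # low cardinality
    # user_id, request_id, trace_id stay in the JSON body and are queried with `| json`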

Next Steps