Performance Overhead and Benchmarks

eCapture uses eBPF uprobes and Traffic Control (TC) filters to intercept plaintext data from userspace libraries and kernel network buffers. This page details the performance model, benchmarking methodology, and tuning strategies for production deployments.

Performance Model

The total overhead of eCapture is the sum of kernel-space interception costs and userspace processing costs.

1. Interception Overhead (Kernel-space)

When a probe is attached to a function (e.g., SSL_read in OpenSSL), the following sequence occurs:

  • Uprobe Entry/Exit: The CPU executes a trap, context-switches into the kernel, and runs the eBPF program. This typically costs 1-2μs per event (docs/performance-benchmarks.md:9).
  • eBPF Program Execution: The logic within the C program (e.g., PID filtering, data copying). This is highly optimized and costs approximately 0.1-0.5μs per event (docs/performance-benchmarks.md:10).
  • Data Copying: eBPF programs use bpf_probe_read or bpf_core_read to copy data from userspace memory into BPF maps or buffers (kern/tc.h:22-28).

2. Data Transport and Userspace Cost

Data Flow and Entity Mapping

The following diagram illustrates the relationship between kernel-space BPF entities and userspace processing components.

Diagram: Performance Data Path

Sources: kern/tc.h:58-63, kern/ecapture.h:127-146, docs/performance-benchmarks.md:151, docs/example-outputs.md:39


Benchmarking Methodology

To evaluate the impact on a production system, eCapture recommends using wrk against a target HTTPS server.

Metrics for Evaluation

| Metric | Description | Tool |
| --- | --- | --- |
| Throughput Impact | Reduction in Requests Per Second (RPS) | wrk |
| Latency Penalty | Increase in p99/p50 response times | wrk --latency |
| CPU Overhead | CPU consumption of the target process vs. eCapture | pidstat -p <pid> 1 |
| Event Loss Rate | Frequency of "lost X events" in eCapture logs | eCapture stdout |

Sources: docs/performance-benchmarks.md:77-85

Benchmark Execution Flow

Sources: docs/performance-benchmarks.md:46-73


Tuning and Optimization

1. Filtering (Scope Reduction)

The most effective way to reduce overhead is to minimize the number of events processed by the kernel.

  • PID/UID Filtering: Use --pid or --uid so that passes_filter() returns false for irrelevant processes, preventing data copying to the perf buffer (kern/ecapture.h:127-146).
  • Cgroup Filtering: Available on kernels >= 4.18 via target_cgroup_id (kern/common.h:73).

2. Perf Buffer Tuning

If the userspace reader cannot keep up with the kernel, events are dropped and reported as "lost X events" in eCapture's logs. Increasing the per-CPU buffer size trades memory for headroom against bursts.

3. Event Reordering

Under high load, events from different CPUs may arrive out of order in userspace. eCapture implements a perfLagReorder mechanism to buffer and sort events by monotonic timestamp before processing (internal/probe/base/perf_reorder_test.go:59-77).


Production Performance Checklist

Before deploying eCapture in a high-traffic production environment, verify the following:

  1. Kernel Support: Ensure the kernel is >= 5.8 so that CAP_BPF and CAP_PERFMON can be used for more granular resource control (docs/minimum-privileges.md:7-16).
  2. Scope: Always use the --pid flag to isolate the capture to the specific application under investigation (docs/performance-benchmarks.md:154).
  3. Output Mode: Prefer keylog mode for long-term monitoring; use text or pcapng only for short-term debugging (docs/performance-benchmarks.md:155).
  4. Monitoring: Watch eCapture's own CPU usage and its log output for "lost events" messages (docs/performance-benchmarks.md:80-84).

Sources: docs/performance-benchmarks.md:141-158, docs/minimum-privileges.md:29-34
