Straight Chaos injects real faults into your stack — latency, packet loss, partitions, kills — straight at the network layer with eBPF. Find the failure before your users do.
Pre-built fault scenarios that map to how production actually breaks — applied at the layer where it hurts.
Add 50 ms or 5 s of delay to any flow between services. Catch the timeouts and retries that cascade into outages.
tc / eBPF
Drop, duplicate, or scramble packets on a target interface. Prove your retransmit logic actually works.
XDP drop
Split a cluster in two and watch the split-brain. The classic distributed-systems nightmare, on demand.
conntrack
Pin CPU, eat memory, or hard-kill a pod mid-request. Test the autoscaler and the on-call runbook at once.
cgroup v2
Drop the agent in, define a blast radius, and run. It watches your steady-state metrics and aborts the second things go past the line you drew.
Define your steady-state SLOs. The moment an experiment pushes a metric past its threshold, the fault is rolled back automatically, with no scramble for a manual kill switch.
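To make the automatic rollback concrete, here is a minimal Python sketch of a steady-state guard. It is an illustration only, not the product's agent or API: `run_with_guard`, its parameters, and the sample latency values are all hypothetical. The idea is simply that the fault is injected, a metric is checked against a threshold, and the rollback always runs, whether the experiment finishes, breaches the threshold, or raises an error.

```python
from typing import Callable, Iterable


def run_with_guard(
    inject: Callable[[], None],
    rollback: Callable[[], None],
    samples: Iterable[float],
    threshold: float,
) -> bool:
    """Inject a fault, watch a metric, and stop on the first breach.

    Returns True if every sample stayed at or below the threshold,
    False if the experiment was aborted early.
    All names here are hypothetical, chosen for illustration.
    """
    inject()
    try:
        for value in samples:
            if value > threshold:
                return False  # breach: stop early, rollback runs in `finally`
        return True
    finally:
        rollback()  # the fault is removed on completion, breach, or error


# Example run with made-up p99 latency samples in milliseconds.
events = []
completed = run_with_guard(
    inject=lambda: events.append("inject"),
    rollback=lambda: events.append("rollback"),
    samples=[120.0, 180.0, 410.0, 150.0],
    threshold=300.0,
)
```

Putting the rollback in a `finally` block is the design choice that matters here: no code path can leave the fault in place.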
Scope every experiment to a percentage of traffic, a single AZ, or one service. Start at 1% in prod and ramp with confidence.
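One common way to scope a fault to a percentage of traffic is deterministic hashing, sketched below in Python. This is an assumption about how such scoping could work, not a description of the product's internals; `in_blast_radius`, the flow ID format, and the experiment name are hypothetical. Each flow is hashed into one of 10,000 buckets, so the same flow always gets the same answer, and raising the percentage only ever adds flows.

```python
import hashlib


def in_blast_radius(flow_id: str, percent: float, experiment: str = "exp-1") -> bool:
    """Deterministically select roughly `percent` percent of flows.

    Hypothetical helper: hashes the experiment name and flow ID into a
    bucket in the range 0..9999 and selects the lowest buckets.
    """
    digest = hashlib.sha256(f"{experiment}:{flow_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100  # percent=1 selects buckets 0..99


# Example: count how many of 10,000 made-up flows fall inside a 1% radius.
flows = [f"10.0.0.1:{port}->10.0.0.2:443" for port in range(10_000)]
selected_at_1 = [f for f in flows if in_blast_radius(f, 1)]
```

Because a flow selected at 1% stays selected at 10%, ramping up widens the experiment without reshuffling which traffic is affected.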
Turn ad-hoc fire drills into recurring, scored experiments. Track resilience as a number that goes up over time.
Fail the pipeline when a new build can't survive a known fault. Resilience regressions caught before they ship.
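A pipeline gate of this kind usually comes down to turning experiment results into a process exit code. The Python sketch below shows one way that could look. The `ExperimentResult` fields, the `gate` function, and the 1% error-rate limit are hypothetical and are not the product's actual result format.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    name: str          # hypothetical experiment name
    aborted: bool      # True if the steady-state guard rolled the fault back
    error_rate: float  # observed error rate during the fault, 0.0 to 1.0


def gate(results: list[ExperimentResult], max_error_rate: float = 0.01) -> int:
    """Return 0 if every experiment passed, 1 if any failed."""
    failed = [r for r in results if r.aborted or r.error_rate > max_error_rate]
    for r in failed:
        print(f"FAIL {r.name}: aborted={r.aborted} error_rate={r.error_rate:.3f}")
    return 1 if failed else 0


# Example with made-up results for two experiments.
results = [
    ExperimentResult("latency-200ms-checkout", aborted=False, error_rate=0.002),
    ExperimentResult("partition-db-replica", aborted=True, error_rate=0.140),
]
exit_code = gate(results)
```

In a CI job, the script would end with `sys.exit(exit_code)` so that a non-zero code fails the build.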
Every fault, who ran it, what it touched, and how the system responded — timestamped and exportable for the post-mortem.
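As a rough picture of what a timestamped, exportable audit entry could contain, here is a Python sketch that writes records as JSON Lines. The field names, the `AuditRecord` type, and the JSON Lines format are assumptions for illustration; the product's real export schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    fault: str           # what was injected
    actor: str           # who ran it
    targets: list[str]   # what it touched
    outcome: str         # how the system responded
    timestamp: str       # ISO 8601, UTC


def new_record(fault: str, actor: str, targets: list[str], outcome: str,
               now: Optional[datetime] = None) -> AuditRecord:
    """Build a record, stamping it with the current UTC time by default."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(fault, actor, list(targets), outcome, now.isoformat())


def export_jsonl(records: list[AuditRecord]) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


# Example with made-up values and a fixed timestamp.
record = new_record(
    fault="latency-200ms",
    actor="alice@example.com",
    targets=["checkout", "payments"],
    outcome="rolled back: p99 latency breached threshold",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
exported = export_jsonl([record])
```

One object per line keeps the export easy to append to, grep, and load into a post-mortem notebook.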
Faults run in the kernel, not a proxy. Microsecond precision, near-zero overhead, and access to layers app-level tools can't reach.
Spin up your first experiment in minutes. Free for solo engineers and small teams.