Today’s advanced distributed software systems must be tested for potential weaknesses and faults. Chaos engineering is the process of testing a distributed computing system to ensure that it can tolerate unexpected disruptions. It relies on concepts underlying chaos theory, which focus on random and unpredictable behavior. If you are interested in learning more about chaos engineering and its history, please refer to this article from Gremlin.
In this article, we will discuss various categories of attacks and some of their use cases.
Resource Attack
Resource attacks generate load across CPU, memory, and storage devices.
They help you prepare for sudden load changes, validate auto-scaling, and test monitoring and alerting configuration. It’s like preparing our system for a Black Friday sale in advance.
CPU Attack
A CPU attack generates heavy CPU load on a host, which helps identify stability and performance issues under stress. We can also validate auto-scaling and alerting mechanisms.
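A minimal sketch of the core of a CPU attack: a busy loop that keeps one core saturated for a fixed duration. The function name `burn_cpu` is hypothetical. Because of CPython's GIL, threads will not load several cores at once; to saturate a whole host you would run one copy per core in separate processes, or use a dedicated tool such as Gremlin.

```python
import time


def burn_cpu(duration_s: float) -> int:
    """Keep one CPU core busy for duration_s seconds; return the loop iterations performed."""
    deadline = time.monotonic() + duration_s
    iterations = 0
    while time.monotonic() < deadline:
        # Pure arithmetic keeps the core busy without touching disk or network.
        _ = sum(i * i for i in range(1000))
        iterations += 1
    return iterations
```

While this runs, check that CPU alerts fire and, if the host sits in an auto-scaling group, that new capacity is added.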
Memory Attack
Memory leaks are a common cause of out-of-memory (OOM) errors in production. A leak happens when an application keeps allocating memory without releasing memory it no longer needs. A memory attack, which consumes a set amount of RAM, helps validate hypotheses about memory-intensive workloads such as in-memory caches and machine learning models. It also helps during cloud migration by exercising the auto-scaling configuration.
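The sketch below shows what a memory attack does under the hood: allocate memory in chunks, hold it for a while, then release it. The function names and the 4096-byte page size are assumptions for illustration. Each page is written to because some operating systems only back an allocation with physical memory once it is touched.

```python
import time

PAGE_SIZE = 4096  # assumption: a common page size; only used to decide which bytes to touch


def allocate_memory(target_mb: int, chunk_mb: int = 1) -> list:
    """Allocate roughly target_mb MiB in chunk_mb chunks and touch every page."""
    chunk_bytes = chunk_mb * 1024 * 1024
    chunks = []
    for _ in range(target_mb // chunk_mb):
        buf = bytearray(chunk_bytes)
        # Writing one byte per page forces the OS to back the buffer with real memory.
        for offset in range(0, chunk_bytes, PAGE_SIZE):
            buf[offset] = 1
        chunks.append(buf)
    return chunks


def memory_attack(target_mb: int, hold_s: float) -> int:
    """Hold about target_mb MiB for hold_s seconds, then release it; return bytes held."""
    chunks = allocate_memory(target_mb)
    held = sum(len(c) for c in chunks)
    time.sleep(hold_s)  # watch dashboards, OOM alerts, and auto-scaling during this window
    chunks.clear()      # drop the references so the memory can be freed
    return held
```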
Disk Attack
Disk attacks consume disk space and are often used to simulate reading or writing a large data set, such as a restored backup or a replicated database. They can also help identify loopholes in automatic disk cleanup processes.
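A minimal sketch of a disk-fill attack, assuming you point it at a directory on the volume under test. It writes a filler file of a given size and returns its path so the attack can be ended by deleting the file. `disk_attack` and the `keep_free_mb` safety margin are hypothetical choices that limit the blast radius, not part of any tool's API.

```python
import os
import shutil
import tempfile

MIB = 1024 * 1024


def disk_attack(directory: str, size_mb: int, keep_free_mb: int = 512) -> str:
    """Write a size_mb MiB filler file into directory and return its path."""
    free_mb = shutil.disk_usage(directory).free // MIB
    # Safety guard: refuse to run if the attack would leave less than keep_free_mb free.
    if free_mb - size_mb < keep_free_mb:
        raise RuntimeError(f"only {free_mb} MiB free; refusing to write {size_mb} MiB")
    block = os.urandom(MIB)  # random bytes so filesystem compression can't shrink the file
    fd, path = tempfile.mkstemp(prefix="chaos-disk-", dir=directory)
    with os.fdopen(fd, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disk
    return path
```

While the file exists, check that disk-usage alerts fire and that your cleanup job behaves as expected. Call `os.remove(path)` to end the attack.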
I/O Attack
An I/O attack can help you prepare for slower storage by simulating its performance. This attack helps validate disk-heavy workloads (batch processes that read from or write to disk) and the effectiveness of in-memory caches.
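One way to generate I/O pressure is to write, flush, and read back a block in a tight loop, which is sketched below. The function name and 64 KiB block size are assumptions. The `fsync` call is what forces each write through to the device. The read-back is often served from the OS page cache, so treat this as write-heavy load.

```python
import os
import tempfile
import time


def io_attack(directory: str, duration_s: float, block_kb: int = 64) -> int:
    """Repeatedly write, fsync, and read back a block for duration_s; return cycles completed."""
    block = os.urandom(block_kb * 1024)
    fd, path = tempfile.mkstemp(prefix="chaos-io-", dir=directory)
    os.close(fd)
    cycles = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            with open(path, "wb") as f:
                f.write(block)
                f.flush()
                os.fsync(f.fileno())  # forces a real write to the storage device
            with open(path, "rb") as f:
                f.read()  # may be served from the OS page cache rather than the device
            cycles += 1
    finally:
        os.remove(path)  # always clean up, even if the attack is interrupted
    return cycles
```

Run your batch job alongside it and compare its runtime, and your cache hit rate, with a baseline run.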
State Attack
State attacks change the state of your environment by terminating processes, shutting down or restarting hosts, and changing the system clock. This lets you prepare your systems for unexpected changes in your environment such as power outages, node failures, clock drift, or application crashes.
Process Killer Attack
Process killer attacks allow teams to terminate a specific process or set of processes. This verifies that watchdogs restart the application or service as expected, and it tests leader re-election in clustered workloads.
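To illustrate the watchdog check, here is a minimal sketch with a toy supervisor that restarts its child when the child exits, plus an attack that kills the child abruptly. `Watchdog`, `process_killer_attack`, and the sleeping Python child are hypothetical stand-ins. In production the watchdog would typically be systemd, Kubernetes, or your own supervisor.

```python
import subprocess
import sys

# Stand-in for the real service: a child process that just sleeps.
CHILD_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


class Watchdog:
    """Toy supervisor: restarts the child process whenever it has exited."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.restarts = 0
        self.proc = subprocess.Popen(cmd)

    def check(self) -> bool:
        """Restart the child if it has exited; return True if a restart happened."""
        if self.proc.poll() is not None:
            self.proc = subprocess.Popen(self.cmd)
            self.restarts += 1
            return True
        return False

    def stop(self):
        self.proc.kill()
        self.proc.wait()


def process_killer_attack(proc: subprocess.Popen):
    """Kill the target abruptly (SIGKILL on POSIX), giving it no chance to clean up."""
    proc.kill()
    proc.wait()  # reap the process so its exit is visible to poll()
```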
Shutdown Attack
This is similar to Chaos Monkey, where an entire host is shut down, which enables teams to build highly resilient systems. It helps validate disaster recovery (DR) scenarios such as automatic workload migration, replication, and high availability of clustered workloads.
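Chaos Monkey-style tools pick their victim at random, and a safe experiment also limits the blast radius. The sketch below shows only that selection step. The `hosts` structure and `min_healthy` guard are assumptions, and the actual shutdown would go through your cloud provider's API or the operating system.

```python
import random


def pick_shutdown_target(hosts, min_healthy, rng=None):
    """Pick one healthy host to shut down, or None if that would breach min_healthy."""
    rng = rng or random.Random()
    healthy = [h["name"] for h in hosts if h["healthy"]]
    # Blast-radius guard: after the shutdown, at least min_healthy hosts must remain.
    if len(healthy) - 1 < min_healthy:
        return None
    return rng.choice(healthy)
```

After shutting down the chosen host, verify that its workloads migrate and that the service stays available.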
Time Travel Attack
Time travel attacks allow you to change the system clock. This lets you prepare for scenarios such as Daylight Savings Time (DST), clock drift between hosts, and expiring SSL/TLS certificates.
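Changing the real system clock usually needs root and affects everything on the host. A complementary, application-level approach is to inject the clock into the code under test and shift it. The sketch below does this for certificate expiry. It uses the standard library's `ssl.cert_time_to_seconds`, which parses the `notAfter` format returned by `SSLSocket.getpeercert()`. `cert_is_valid` and `time_travel` are hypothetical helpers.

```python
import ssl
from datetime import datetime, timedelta, timezone


def cert_is_valid(not_after: str, now: datetime) -> bool:
    """Return True if a certificate with this notAfter string is still valid at `now`."""
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return now < expiry


def time_travel(now: datetime, days: float) -> datetime:
    """Shift the injected clock forward (or backward, with negative days)."""
    return now + timedelta(days=days)
```

Jumping the clock 60 days ahead reveals whether renewal and expiry alerts would trigger before a certificate actually lapses.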
Network Attack
Network attacks let you simulate unhealthy network conditions including dropped connections, high latency, packet loss, and DNS outages. This lets you build applications that are resilient to unreliable network conditions.
Blackhole Attack
Blackhole attacks help you simulate outages by dropping network traffic between services. This lets you uncover hard dependencies, test fallback and failover mechanisms, and prepare your applications for unreliable networks. We can also validate the monitoring and alerting mechanisms for the cluster.
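Real blackhole attacks drop packets at the network layer, for example with firewall rules. To show what the application should do when that happens, here is a minimal in-process sketch. The `Blackhole` class, the `recommendations` service, and the cache fallback are all hypothetical.

```python
class Blackhole:
    """In-process stand-in for a network blackhole: calls to dropped services never succeed."""

    def __init__(self):
        self.dropped = set()

    def call(self, service, fn):
        if service in self.dropped:
            # Dropped packets usually surface as timeouts, not as "connection refused".
            raise TimeoutError(f"{service}: traffic dropped by blackhole")
        return fn()


def get_recommendations(net, cache):
    """Hypothetical client: live data when reachable, cached data when the dependency is blackholed."""
    try:
        return net.call("recommendations", lambda: ["live-1", "live-2"])
    except TimeoutError:
        return cache.get("recommendations", [])
```

If removing a dependency makes a request fail outright instead of degrading, that dependency is a hard dependency.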
Latency Attack
Latency is the amount of time taken for a network request to travel from one network endpoint to another. The Latency attack injects a delay into outbound network traffic, letting you validate your system’s responsiveness under slow network conditions. It also helps you tune circuit breaker configuration, such as retry counts and timeout thresholds.
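The link between injected latency and circuit breaker settings can be sketched as follows: a wrapper adds delay to a call, and a simple breaker opens after a number of consecutive timeouts. `with_latency` and `CircuitBreaker` are hypothetical. Production breakers usually also have a half-open state that periodically retries, which is omitted here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def with_latency(fn, delay_s):
    """Wrap fn so every call is delayed by delay_s, emulating a slow network hop."""
    def delayed(*args, **kwargs):
        time.sleep(delay_s)
        return fn(*args, **kwargs)
    return delayed


class CircuitBreaker:
    """Minimal breaker: opens after failure_threshold consecutive timeouts."""

    def __init__(self, failure_threshold=3, timeout_s=0.1):
        self.failure_threshold = failure_threshold
        self.timeout_s = timeout_s
        self.failures = 0
        self.open = False
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        future = self._pool.submit(fn)
        try:
            result = future.result(timeout=self.timeout_s)
        except FutureTimeout:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.open = True
            raise
        self.failures = 0  # a success resets the consecutive-failure count
        return result

    def close(self):
        self._pool.shutdown(wait=True)
```

Raising the injected delay step by step shows at which point your chosen timeout starts tripping the breaker.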
DNS Attack
Recently, an Akamai DNS failure caused many popular websites to become unreachable. More info here.
The DNS attack simulates a DNS outage by blocking network access to DNS servers. This lets you prepare for DNS outages, test your fallback DNS servers, and validate DNS resolver configurations.
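A minimal sketch of resolver fallback that a DNS attack would exercise. It tries each resolver in order and serves the last known address if all of them fail. `blocked_resolver` imitates a resolver during the attack. `static_resolver` is a hypothetical fallback, and `system_resolver` shows where the real `socket.getaddrinfo` lookup would plug in. The test uses only simulated resolvers so it runs offline.

```python
import socket


def system_resolver(hostname):
    """Resolve through the OS resolver (not exercised in the test below)."""
    return socket.getaddrinfo(hostname, None)[0][4][0]


def blocked_resolver(hostname):
    """What a resolver looks like while the DNS attack blocks access to DNS servers."""
    raise socket.gaierror(f"cannot resolve {hostname}: DNS blocked (simulated)")


def static_resolver(table):
    """Hypothetical fallback resolver backed by a fixed hostname -> address table."""
    def resolve(hostname):
        try:
            return table[hostname]
        except KeyError:
            raise socket.gaierror(f"{hostname} not in static table") from None
    return resolve


def resolve_with_fallback(hostname, resolvers, cache):
    """Try each resolver in order; fall back to the last known address if all fail."""
    for resolve in resolvers:
        try:
            address = resolve(hostname)
        except socket.gaierror:
            continue
        cache[hostname] = address
        return address
    if hostname in cache:
        return cache[hostname]  # serve a stale record rather than fail outright
    raise socket.gaierror(f"all resolvers failed for {hostname}")
```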
Packet Loss Attack
This attack is very helpful for streaming services, such as live video or multiplayer gaming, which rely on high data throughput. When the network is congested, packets queue up, and some may be dropped once the queues on your network hardware reach capacity. Packet loss attacks let you replicate this condition, observe the end-user experience, and tune retry and replay mechanisms for a better user experience.
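The sketch below simulates a lossy link that drops each packet with a fixed probability. A simple stop-and-wait retransmission loop runs on top of it, so you can see how the loss rate drives extra sends. `LossyLink`, `send_reliably`, and the retry limit are hypothetical. Live streaming often prefers skipping or concealing lost frames over retransmitting them, so treat this as one possible replay strategy.

```python
import random


class LossyLink:
    """Simulated link that drops each packet with probability loss_rate."""

    def __init__(self, loss_rate, seed=None):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)
        self.sent = 0
        self.dropped = 0

    def transmit(self, packet):
        self.sent += 1
        if self.rng.random() < self.loss_rate:
            self.dropped += 1
            return None  # packet lost, as if a congested queue discarded it
        return packet


def send_reliably(link, packets, max_retries=10):
    """Stop-and-wait: resend each packet until it gets through or retries run out."""
    received = []
    for seq, payload in enumerate(packets):
        for _ in range(max_retries + 1):
            delivered = link.transmit((seq, payload))
            if delivered is not None:
                received.append(delivered[1])
                break
        else:
            raise ConnectionError(f"packet {seq} lost after {max_retries} retries")
    return received
```

Comparing `link.sent` with the number of payloads shows how much extra traffic, and therefore delay, a given loss rate costs.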
In the next article, we will discuss other chaos engineering concepts.