Apache Kafka Use Cases

- March 14, 2021

Kafka was first created at LinkedIn, the world's largest professional network. Apache Kafka is now one of the most popular open-source stream-processing platforms for collecting, processing, storing, and analyzing data at scale. Best known for its high throughput, low latency, and fault tolerance, it is capable of handling thousands of messages per second, and far more on larger clusters.

Website activity tracking (e-commerce)

Website activity (product views, product searches) is published to customer activity topics, which then feed real-time processing and offline analytics in tools such as Google BigQuery, Azure Cosmos DB, or Amazon Redshift.
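As a minimal sketch of what such an activity event might look like, the snippet below builds a keyed JSON record. The topic name `customer-activity`, the field names, and the `build_activity_event` helper are assumptions made for illustration; none of them come from Kafka itself. Keying by customer ID is a common choice because records with the same key normally land on the same partition, which keeps one customer's events in order.

```python
import json
import time

ACTIVITY_TOPIC = "customer-activity"  # hypothetical topic name


def build_activity_event(customer_id, action, payload, ts_ms=None):
    """Return (topic, key_bytes, value_bytes) for one activity event."""
    event = {
        "customer_id": customer_id,
        "action": action,  # e.g. "product_view" or "product_search"
        "payload": payload,
        "ts_ms": ts_ms if ts_ms is not None else int(time.time() * 1000),
    }
    key = customer_id.encode("utf-8")
    value = json.dumps(event, sort_keys=True).encode("utf-8")
    return ACTIVITY_TOPIC, key, value


topic, key, value = build_activity_event(
    "c-42", "product_view", {"sku": "A100"}, ts_ms=0
)
```

The resulting tuple is what you would hand to whichever Kafka producer client you use; the client call itself is left out here on purpose.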

Real-time fraud detection

Juniper Research estimated that online fraud detection and prevention (FDP) spending would reach $9.3 billion by 2022. Real-time data streaming helps spot anomalies, that is, unusual events that deviate from the normal pattern of operation. These anomalies can then be classified as fraud or as errors. Apache Kafka supports high-volume data ingestion and scales horizontally, which helps it keep up with the data volumes that real-time fraud detection requires.
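To make the idea of spotting anomalies in a stream concrete, here is a small sketch of a rolling z-score check over transaction amounts. It is a deliberately simple stand-in, not how production fraud systems work; the `RollingAnomalyDetector` class, its thresholds, and the sample amounts are all assumptions for illustration. In practice, the values would arrive from a Kafka consumer rather than a list.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags values far from the mean of the most recent `window` values."""

    def __init__(self, window=50, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, amount):
        """Return True if `amount` looks anomalous against recent history."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0 and abs(amount - mu) / sigma > self.threshold:
                is_anomaly = True
        # Learn only from values that look normal, so outliers
        # do not skew the baseline.
        if not is_anomaly:
            self.values.append(amount)
        return is_anomaly


detector = RollingAnomalyDetector()
amounts = [20, 22, 19, 21, 20, 23, 5000, 21]
flags = [detector.observe(a) for a in amounts]
```

With these sample amounts, only the 5000 transaction is flagged; whether a flagged event is fraud or simply an error still has to be decided downstream.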

Online gaming

A key challenge in infrastructure monitoring for online gaming is the elasticity it requires. Kafka allows the number of partitions for a topic to be increased on the fly, and more brokers can be added to host the extra partitions. Additional brokers and partitions let producers and consumers work in parallel, which can increase throughput. Kafka also maintains replicas of each partition's data across brokers, so the data survives the loss of a node.
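One consequence of increasing partitions is worth sketching. Keyed records are assigned to a partition by hashing the key and taking the result modulo the partition count, so changing the count can move a key to a different partition. The snippet below illustrates this with CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2, so the actual partition numbers would differ); `partition_for` and the player IDs are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a record key to a partition index (CRC32 stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


players = ["player-1", "player-2", "player-3", "player-4"]

# Partition assignment before and after growing the topic from 4 to 8 partitions.
before = {p: partition_for(p, 4) for p in players}
after = {p: partition_for(p, 8) for p in players}

# Keys whose partition changed; ordering for these keys is not
# guaranteed across the resize.
moved = [p for p in players if before[p] != after[p]]
```

If strict per-key ordering matters, for example for a single player's event history, it is usually safer to size the partition count generously up front than to rely on resizing later.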

Storage for back pressure and decoupling of services

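Kafka is not just a messaging system: it also stores data for as long as you configure it to. For instance, one Kafka topic might be retained for a few days for log analytics, while another is retained for years to analyze past customer and payment transactions. Because consumers read from this stored log at their own pace, a slow consumer does not hold up producers, which absorbs back pressure and decouples the services involved.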
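Retention is controlled per topic through configuration such as `retention.ms`, where a value of `-1` means no time limit. The sketch below only computes those config values; the `retention_config` helper and the chosen durations are assumptions for illustration, and applying the config to a topic is left to whichever admin tool or client you use.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def retention_config(days=None):
    """Build a topic-level config dict; days=None means keep data indefinitely."""
    retention_ms = -1 if days is None else days * MS_PER_DAY
    return {"cleanup.policy": "delete", "retention.ms": str(retention_ms)}


log_analytics = retention_config(days=3)    # short-lived log data
payments = retention_config(days=5 * 365)   # multi-year transaction history
keep_forever = retention_config()           # no time-based deletion
```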

Log aggregation

Log aggregation systems typically collect physical log files from servers and put them in a central location (a file server or HDFS, perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. Applications can push data to specific Kafka topics, where it can be aggregated and processed later by consumers.
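One way an application might push its logs to a topic is through a custom logging handler. The sketch below uses Python's standard `logging` module and an injected `send` callable so that it runs without a broker; `TopicLogHandler`, the `send(topic, value_bytes)` signature, and the `app-logs` topic name are assumptions for illustration. In a real setup, `send` would wrap a Kafka producer.

```python
import json
import logging


class TopicLogHandler(logging.Handler):
    """Forwards each log record to a topic through an injected `send` callable."""

    def __init__(self, send, topic):
        super().__init__()
        self.send = send  # assumed signature: send(topic, value_bytes)
        self.topic = topic

    def emit(self, record):
        try:
            event = {
                "logger": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
            }
            self.send(self.topic, json.dumps(event).encode("utf-8"))
        except Exception:
            self.handleError(record)


sent = []  # stands in for a real producer
logger = logging.getLogger("checkout-service")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(TopicLogHandler(lambda t, v: sent.append((t, v)), "app-logs"))
logger.info("order %s placed", "o-17")
```

Each service keeps using ordinary logging calls, and consumers on the other side read structured events instead of parsing files.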

Stream processing

Apache Kafka is so well suited to real-time data streaming that it is used by large companies such as Twitter, Uber, Pinterest, Netflix, Tumblr, PayPal, and many others. Many Kafka users process data in pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing.
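The multi-stage shape of such a pipeline can be sketched with plain Python, using lists in place of topics. This is only an illustration of the enrich-then-aggregate flow, not Kafka Streams or any client API; the lookup table, function names, and sample events are hypothetical.

```python
from collections import Counter

PRODUCT_CATEGORIES = {"A100": "shoes", "B200": "books"}  # hypothetical lookup table


def enrich(events):
    """Stage 1: attach a category to each raw event."""
    for event in events:
        category = PRODUCT_CATEGORIES.get(event["sku"], "unknown")
        yield {**event, "category": category}


def count_by_category(events):
    """Stage 2: aggregate enriched events into per-category counts."""
    return Counter(event["category"] for event in events)


# Lists stand in for the raw input topic and the enriched output topic.
raw_topic = [{"sku": "A100"}, {"sku": "B200"}, {"sku": "A100"}, {"sku": "Z999"}]
enriched_topic = list(enrich(raw_topic))
counts = count_by_category(enriched_topic)
```

In a real deployment each stage would be its own consumer and producer, so stages can be scaled and redeployed independently.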

Commit log

Kafka can serve as a kind of external commit log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism that lets failed nodes restore their data. The log compaction feature in Kafka helps support this usage by retaining at least the latest value for each key.
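The effect of compaction on state recovery can be shown with a small simulation. The sketch below keeps only the latest record per key and treats a `None` value as a tombstone (a delete marker). It is a simplification: real Kafka compacts in the background and keeps tombstones for a configurable period before dropping them. The `compact` and `restore_state` functions are hypothetical names.

```python
def compact(log):
    """Keep only the latest record per key; a None value is a tombstone."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    # Drop tombstoned keys and preserve the original offset order.
    survivors = sorted(
        (off, k, v) for k, (off, v) in latest.items() if v is not None
    )
    return [(k, v) for _, k, v in survivors]


def restore_state(log):
    """Rebuild a node's key-value state by replaying the log from the start."""
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
compacted = compact(log)
```

Replaying `compacted` yields the same final state as replaying the full `log`, which is why a failed node can re-sync from a compacted topic without reading every historical update.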

More information and further use cases can be found on the official Apache Kafka website.
