Stephane Maarek

In our era of big data, your IT infrastructure may be taxed by the influx of data from a wide variety of sources. On top of that, customers demand to see their data in “real time,” so your servers need to process and display data quickly. Apache Kafka, a technology created at LinkedIn and open-sourced in 2011, allows you to do just that.

Apache Kafka is a distributed streaming platform that enables companies to create real-time data feeds. It’s used by companies like Uber, Twitter, Airbnb, Yelp, and over 30% of today’s Fortune 500 companies. For example, by integrating diverse kinds of data such as likes, page clicks, searches, orders, shopping carts, and inventory, Apache Kafka can help feed data in real time into a predictive analytics engine to analyze customer behavior.

Now that Apache Kafka has reached a stable 1.0 version, more companies are adopting the technology as the backbone of their IT infrastructure. Increasingly, CTOs are prioritizing real-time architecture and shorter waits for data availability. Apache Kafka-related questions on Google Search and on tech forums like Stack Overflow and GitHub have also skyrocketed in recent years, a sign of how much interest the topic is attracting. Moreover, Apache Kafka is the 4th hottest tech skill trending on Udemy in 2018.

So what are the benefits of Apache Kafka, why should your company adopt it, and what skills will your IT team need to successfully implement it?

4 Key Benefits of Apache Kafka

As companies deliver an increasing amount of data from different sources (e.g., website, user interactions, financial transactions) to a wide range of target systems (e.g., databases, analytics, email systems), developers have to write an integration for each source-target pair. For example, if you have 4 source systems and 6 target systems, your developers could have to write code for as many as 24 integrations (4 × 6). This is a cumbersome process, not to mention a slow and error-prone way to deliver data. Here are the four key benefits of using Apache Kafka.

1. Apache Kafka acts as a buffer so your systems won’t crash

Previously, data transformations from external source systems were done in batches, often at night. Apache Kafka replaces this slow, multi-step process by acting as an intermediary: it receives data from source systems and then makes this data available to target systems in real time. What’s more, because Apache Kafka runs on its own separate set of servers (called an Apache Kafka cluster), a burst of incoming data is absorbed by the cluster rather than hitting your target systems directly, so they are far less likely to crash.
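
To make the buffering idea concrete, here is a minimal sketch in Python. It is a toy, in-memory model and not the Kafka API: the class name ToyLog, its methods, and the consumer names are all made up for illustration. It only shows the principle that source systems append records to a log while each target system reads from its own position, at its own pace.

```python
class ToyLog:
    """A toy, in-memory stand-in for a single stream of records (not the Kafka API)."""

    def __init__(self):
        self._records = []   # append-only list of records
        self._offsets = {}   # consumer name -> index of the next record to read

    def append(self, record):
        """Called by a source system; returns the position of the stored record."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return up to max_records unread records for this consumer and advance its position."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = ToyLog()

# A source system writes five records without waiting for any target system.
for i in range(5):
    log.append(f"order-{i}")

# A fast consumer reads everything; a slow one takes only two records for now.
fast = log.poll("analytics", max_records=5)
slow = log.poll("email", max_records=2)
```

The slow consumer does not hold up the producer or the fast consumer; it simply picks up where it left off on its next poll. A real Apache Kafka cluster adds persistence, replication, and distribution across brokers on top of this basic idea.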

2. Reduces the need for multiple integrations

Essentially, Apache Kafka reduces the need for multiple integrations because all your data goes through Apache Kafka. Rather than coding a separate integration for every pair of systems, your developers only have to create one integration with Apache Kafka for each producing system and one for each consuming system. With 4 source systems and 6 target systems, that is 10 integrations instead of 24.
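
As a quick back-of-the-envelope check, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the worst case where every source system has to feed every target system.

```python
def point_to_point_integrations(sources, targets):
    """Worst case without a hub: one integration per source-target pair."""
    return sources * targets


def hub_integrations(sources, targets):
    """With a central hub: one integration per system, whichever side it is on."""
    return sources + targets


# The example from earlier in this article: 4 source systems, 6 target systems.
direct = point_to_point_integrations(4, 6)
via_kafka = hub_integrations(4, 6)
print(direct, via_kafka)  # prints "24 10"
```

The gap widens as you add systems: the point-to-point count grows with the product of the two numbers, while the hub count grows only with their sum.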

3. Low latency and high throughput

By decoupling your data streams, Apache Kafka lets you consume data when you want it. Without the need for slow integrations, Apache Kafka can decrease latency (how long it takes for each data point to load) to as little as 10 milliseconds, roughly a 10x improvement or more compared to other integrations. This means you can deliver data quickly and in real time. Apache Kafka can also scale horizontally to hundreds of brokers (servers) within a cluster to manage big data.

Some companies have a high load of millions of data points per second going through Kafka. For example, Uber uses Kafka to feed car position data into their surge pricing computation model in real time.

4. Everyone can access data

As all your data is centralized in Apache Kafka, access to data for any team becomes easier. For example, in the past, your fraud team may have had to engage with the web team to get a specific type of user data since they were run on different target systems. Now your fraud team will be able to access the user data directly via Apache Kafka, alongside other feeds such as financial data or website interactions. Simple, right?

Apache Kafka skills your IT team needs

Once you understand the benefits and decide to adopt Apache Kafka, your IT team will need to acquire key skills to set up and manage Apache Kafka at your organization. Here are some of the critical skills your team will need.

How to learn, set up, and configure Apache Kafka. Apache Kafka is already built, open source, and free. So it’s more about first acquiring the skills, then setting up Apache Kafka and configuring it for your systems. My course Apache Kafka Series: Learn Apache Kafka for Beginners is a good place for your team to start learning the technology. I cover the Apache Kafka ecosystem, what some target architectures may look like, as well as fundamental concepts of Kafka like topics, partitions, replication, brokers, producers, consumer groups, ZooKeeper, delivery semantics, and more. My course also offers hands-on practice so your team can gain some practical experience using Apache Kafka.

Once you’re ready, I recommend my more advanced course that teaches Kafka Cluster Setup and Administration. I also offer consulting services to help companies design, set up, and configure Apache Kafka.

Java Programming. While client libraries exist to interact with Apache Kafka using most programming languages (Python, Go, JavaScript, etc.), Apache Kafka works best with a Java Virtual Machine (JVM) language, such as Java or Scala. Therefore, it is important for your team to be able to code in Java when writing their integrations.

Kafka Streams and Kafka Connect. If you want to simplify integrations, your team will also need some Kafka-specific skills like Kafka Streams and Kafka Connect. These are the more advanced Kafka concepts and frameworks your team will need to build reliable, production-ready integrations over time. As a consultant, I usually show how to build one or two integrations, but your team would have to repeat that approach for the rest of the integrations.

Migrating to Apache Kafka: start small

First, don’t migrate your whole system to Apache Kafka at once. Instead, start with a small non-critical project. For example, don’t change the backbone of your financial systems, but change something less important such as your email notification system. Second, one of the biggest mistakes I see is companies spending months trying to build a reliable Apache Kafka cluster. Instead, I would recommend starting with managed services or hiring a consultant to set up a small project on Apache Kafka. This enables you to get started right away on the development side and helps make the case for why Apache Kafka is critical for your company. From there, you will be able to scale, onboard more data and projects, and enable your company to react to events in real time more effectively.
