This 64th edition of the Kafka Monthly Digest covers what happened in the Apache Kafka community in May 2023.
For last month’s digest, see Kafka Monthly Digest: April 2023.
Releases
There are two releases in progress: 3.5.0 and 3.4.1.
3.5.0
The release process for 3.5.0 continued. Due to Kafka Summit London and KAFKA-14980, the stabilization period lasted a bit longer than the originally planned 2 weeks. On May 22, I published 3.5.0 RC0, but a couple of blocker issues (KAFKA-15010 and PR-13748) were identified. I'll publish another RC once they are fixed. You can find the release plan with all the details in the wiki.
3.4.1
Luke Chen published 3.4.1 RC0 on May 17, but KAFKA-14862 introduced a regression in Streams. Luke then published RC1 on May 22, but an issue with licenses was found. RC2 was published on May 24, but the community again found an issue, this time with the backport of KAFKA-14857. Finally, Luke published RC3 on May 26. The vote is still ongoing. You can find all the details in the release plan in the wiki.
Kafka Improvement Proposals
Last month, the community submitted 13 KIPs (KIP-925 to KIP-937; KIP-924 was skipped). I'll highlight a few of them:
- KIP-925: Rack aware task assignment in Kafka Streams: Rack-aware partition assignment was added for consumers as part of KIP-881. This KIP proposes a similar mechanism for Streams tasks. In the case of Streams, this could help reduce cross-rack traffic, which is often costly in terms of latency and money.
- KIP-928: Making Kafka resilient to log directories becoming full: Today, if the storage volume of a log directory fills up on a broker, the broker shuts down. Once that happens, there are no mechanisms to free up space using Kafka APIs or tooling. This KIP proposes a mechanism to recover by keeping brokers running in a limited mode when they hit an `IOException` due to `No space left on device`, so that topic retention policies and topic deletions can still be acted upon, letting administrators free up space.
- KIP-932: Queues for Kafka: Queues are a popular abstraction to efficiently distribute a workload consisting of tasks that can be processed concurrently. While in Kafka you can use partitions to distribute work, this comes with a few significant drawbacks, such as the coupling between the number of partitions and the number of consumers. This KIP aims at bringing queue semantics and APIs to Kafka. It introduces the concept of "share groups", as an alternative to consumer groups, to have multiple consumers share a partition and allow per-message acknowledgement.
- KIP-934: Add DeleteTopicPolicy and KIP-935: Extend AlterConfigPolicy with existing configuration: Broker-side policies allow validating administrative operations such as creating topics and altering configurations. These 2 KIPs intend to extend this feature by supporting topic deletions and providing more details when validating configuration updates.
- KIP-936: Throttle number of active PIDs: Since Kafka 3.0, `KafkaProducer` clients are idempotent by default. For each idempotent producer, brokers keep track of its Producer ID to uniquely identify that instance. For that reason, rapidly creating new producer instances can cause memory pressure on brokers and lead to `OutOfMemory` errors. This KIP proposes a new type of quota to limit the rate at which new Producer IDs can be created in order to protect brokers.
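
To make the partition/consumer coupling mentioned under KIP-932 more concrete, here is a minimal sketch in plain Python. It does not use any Kafka API: `assign_partitions` is a hypothetical helper that mimics a simple round-robin assignment under the consumer group rule that each partition is owned by at most one consumer in the group. The partition and consumer names are made up for illustration.

```python
def assign_partitions(partitions, consumers):
    """Hypothetical round-robin assignment: each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Assumed example: a topic with 3 partitions and a group of 5 consumers.
partitions = [0, 1, 2]
consumers = ["c1", "c2", "c3", "c4", "c5"]

assignment = assign_partitions(partitions, consumers)

# Consumers that received no partition have nothing to process.
idle = [consumer for consumer, owned in assignment.items() if not owned]

print(assignment)
print("idle consumers:", idle)
```

With more consumers than partitions, the extra consumers stay idle, so scaling out processing requires adding partitions. This is the kind of limitation that sharing a partition between several consumers is meant to avoid.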
Community Releases
- strimzi-kafka-operator 0.35: Strimzi is a Kubernetes Operator for running Kafka. It's the last version to support Kubernetes 1.19 and 1.20. StrimziPodSets are now always used and support for StatefulSets has been removed.
Blogs
I selected some interesting blog articles that were published last month:
- A Practical Guide to Build Data Streaming from MySQL to Elasticsearch Using Kafka Connectors
- Is sequential IO dead in the era of the NVMe drive?
- Queues for Kafka
- Tales of Kafka at Cloudflare: Lessons Learnt on the Way to 1 Trillion Messages
To learn more about Kafka, visit Red Hat Developer's Apache Kafka topic page.