This 70th edition of the Kafka Monthly Digest covers what happened in the Apache Kafka community in November 2023.
For last month’s digest, see Kafka Monthly Digest: October 2023.
Releases
There are 3 releases in progress: 3.7.0, 3.6.1 and 3.5.2.
3.7.0
The release process for Kafka 3.7.0 continued, and KIP freeze happened on November 22. The next milestone is feature freeze on December 6. The release date is still targeted for January 2024. You can find the release plan in the wiki.
3.6.1
On November 13, I volunteered to run the 3.6.1 bugfix release. This updates the ZooKeeper and Netty dependencies to address CVEs and also contains over 20 fixes. I published RC0 on November 24, and the vote is currently ongoing. For more details, check the release plan on the wiki.
3.5.2
Luke Chen volunteered to run the 3.5.2 bugfix release. In addition to updating a few dependencies to address CVEs, this will also contain a few fixes. The first release candidate was published on November 21, and the vote is currently ongoing. You can find the release plan on the wiki.
Kafka Improvement Proposals
Last month, the community submitted 13 KIPs (KIP-997 to KIP-1009). Crossing the 1000 KIP mark is a significant milestone. Since KIP-1 in January 2015, that's almost 10 KIPs per month on average for 9 years! I'll highlight a few of the KIPs created in November:
- KIP-1004: Enforce tasks.max property in Kafka Connect: The tasks.max configuration allows users to specify the maximum number of tasks they want to run for a connector. However, Connect does not currently force connectors to respect this value. While pretty much all connectors follow this rule (and it's usually a bug when they don't), this KIP proposes adding a new configuration, tasks.max.enforce, which when set to true will fail connectors that create more tasks than tasks.max.
- KIP-1008: ParKa - the Marriage of Parquet and Kafka: Kafka stores data on disk in its own binary format. Producers and consumers use the exact same format, and this enables brokers to handle data very efficiently. This KIP proposes using Apache Parquet as the storage format. The motivation is that Parquet is a column-oriented data file format, so it could provide a better compression ratio, and Parquet's built-in column encryption could be used to provide field-level encryption.
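To make the rule proposed in KIP-1004 concrete, here is a minimal Python sketch of the check. It is not Kafka Connect's actual code: the check_task_count helper and the TooManyTasksError exception are hypothetical names made up for illustration, and treating enforcement as enabled when tasks.max.enforce is absent is an assumption of this sketch.

```python
class TooManyTasksError(Exception):
    """Raised when a connector proposes more tasks than tasks.max allows."""


def check_task_count(connector_config, task_configs):
    """Hypothetical helper (not Kafka Connect code) mirroring the rule in KIP-1004."""
    # Connector configs are string maps; tasks.max defaults to 1 in Connect.
    tasks_max = int(connector_config.get("tasks.max", "1"))
    # Assumption for this sketch: enforcement is on unless explicitly disabled.
    enforce = connector_config.get("tasks.max.enforce", "true").lower() == "true"
    if enforce and len(task_configs) > tasks_max:
        raise TooManyTasksError(
            f"connector generated {len(task_configs)} task configs, "
            f"but tasks.max is {tasks_max}"
        )
    return task_configs


config = {"tasks.max": "2", "tasks.max.enforce": "true"}
check_task_count(config, [{}, {}])  # within the limit: returned unchanged
try:
    check_task_count(config, [{}, {}, {}])  # one task too many
except TooManyTasksError as err:
    print(f"connector failed: {err}")
```

With tasks.max.enforce set to false, the same call would let the extra task configs through, which matches the current behavior described above.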
There are also quite a few KIPs about Tiered Storage, including:
- KIP-1002: Fetch remote segment indexes at once: This KIP's goal is to improve the RemoteStorageManager API to allow more efficient handling of indexes. As these files tend to be small, being able to upload/retrieve them at once would improve latencies in many scenarios.
- KIP-1003: Signal next segment when remote fetching: When consumers want to read data that has been moved to remote storage, log segments first have to be retrieved by brokers. This KIP aims to make it easier for RemoteStorageManager implementations to determine the next segments to load and to allow prefetching them to improve latency.
Community Releases
- Sarama 1.42: Sarama is a pure Go Kafka client. There are no major new features, but this release fixes a number of bugs across all clients.
- Jikkou 0.32: Jikkou is a tool to deploy and manage Kafka clusters and resources like topics, quotas and consumer groups. This new release adds support for restarting connectors and deleting consumer group offsets.
- spring-kafka 3.1.0: This version improves the handling of unprocessable records and also fixes a handful of bugs.
Blogs
I selected some interesting blog articles that were published last month:
- Defense Against the Dark Art of Rebalancing in Kafka Streams
- Developing Kafka client applications: A simple consumer
- A Deep Dive Into Sending With librdkafka
- Creating a data warehouse with Apache Doris, Kafka, and Debezium
To learn more about Kafka, visit Red Hat Developer's Apache Kafka topic page.