There’s a part of my internal body clock that worries about Kafka messages, especially production Kafka messages, especially LOSING Kafka messages… Even when I know that the retention policies work perfectly well and do as they’re told, I still wake up and worry. If you maintain a Kafka cluster then you’ll understand: when it comes to messages you will do anything to make sure they don’t vanish.
So just to confirm my assumptions and reduce the usage of Nytol, let’s try it out.
Kafka Topic Retention
Message retention is based on time, log size, or both. I don’t know the internals of other companies’ cluster configurations, but time-based retention is widely used. Time-based log retention can be set in hours, minutes or milliseconds.
In terms of which setting the cluster acts on, milliseconds always win. You can set all three, but the one with the smallest unit takes precedence: retention.ms beats retention.minutes, which beats retention.hours.
Where possible I advise you use retention.ms and have proper control.
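To make that concrete, here is a minimal sketch of the broker-level equivalents in server.properties (illustrative values only, not taken from any real cluster); the same rule applies there, so the millisecond setting is the one the broker honours:

# server.properties - illustrative values only
log.retention.hours=168      # ignored: a finer-grained setting is also present
log.retention.minutes=240    # ignored: log.retention.ms is also present
log.retention.ms=1800000     # wins: 30 minutes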
A Prototype Example
- Create a topic with a retention time of 3 minutes.
- Send a message to the topic with an obvious time in the payload.
- Alter the topic configuration to extend the retention time to 30 minutes.
- Have a cup of tea.
- Consume the message after the original three-minute period and see if it’s still there.
- Celebrate with another cup of tea.
Create a Topic
$ bin/kafka-topics --zookeeper localhost:2181 --create --topic rtest2 --partitions 1 --replication-factor 1 --config retention.ms=180000
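If you want to double-check that the topic picked up the setting, kafka-topics can describe it. The exact output format varies a little between Kafka versions, but it should look roughly like this, with the retention showing up under Configs:

$ bin/kafka-topics --zookeeper localhost:2181 --describe --topic rtest2
Topic: rtest2	PartitionCount: 1	ReplicationFactor: 1	Configs: retention.ms=180000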
Send a Message
Once again, standard tools win here. It’s just a plain-text message being sent to the topic; I typed in the JSON, nothing fancy.
$ bin/kafka-console-producer --broker-list localhost:9092 --topic rtest2
>{"name":"This is Jase, this was sent at 16:15"}
The message is now in the topic log and will be deleted just after 16:18. But I’m now going to extend the retention period to preserve that message a little longer.
Alter the Topic Retention
With the kafka-configs command you can inspect any of the topic configs, and alter them too. So I’m going to alter retention.ms and set it to 30 minutes (30 * 60 * 1000 = 1,800,000 ms).
$ bin/kafka-configs --alter --zookeeper localhost:2181 --add-config retention.ms=1800000 --entity-type topics --entity-name rtest2
Completed Updating config for entity: topic 'rtest2'.
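Before the tea, a quick sanity check that the new value has stuck doesn’t hurt. kafka-configs can describe the topic as well as alter it; again the exact wording varies by Kafka version, but the output should be along these lines:

$ bin/kafka-configs --describe --zookeeper localhost:2181 --entity-type topics --entity-name rtest2
Configs for topic 'rtest2' are retention.ms=1800000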
Have a Cup of Tea
If everything were going to go horribly wrong, it would be about now. So a tea is in order.
Check the Topic by Consuming Messages
Running the consumer from the earliest offset should bring back the original message.
$ bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages
Okay, that worked perfectly well (as expected). Let’s try it again, because I’m basically paranoid when it comes to these things; I’ll add the date this time for extra confirmation.
$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri 3 Apr 16:24:18 BST 2020
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages
Looking good. And I’m going to do it again because I want to make sure…
$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri 3 Apr 16:24:50 BST 2020
{"name":"This is Jase, this was sent at 16:15"}
Celebrate Again
The kettle is on. Time for another tea.