For the majority of users the defaults are there and they kind of work: your messages are small and there’s enough disk space on the box to be able to relax. If you are working on local development then there’s a good chance you don’t even consider such things. Once things go live, though, it’s a different matter.
Message Retention
There are two ways to set message retention in Kafka: by how long a message has been in the log, and by the size of the log.
log.retention.hours, log.retention.minutes, log.retention.ms
Yes, there are three, but they all do the same thing: control how long messages are retained in the log. The default is 168 hours (seven days). You can use hours, minutes or milliseconds. If more than one is set then the smallest unit takes precedence, so log.retention.ms overrides log.retention.minutes, which in turn overrides log.retention.hours.
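The precedence rule is easier to see in code. This is a minimal sketch of the rule as described above, not Kafka's own implementation; the function name effective_retention_ms and its parameters are made up for illustration.

```python
from typing import Optional

# Kafka's documented default for log.retention.hours (seven days).
DEFAULT_RETENTION_HOURS = 168


def effective_retention_ms(hours: Optional[int] = None,
                           minutes: Optional[int] = None,
                           ms: Optional[int] = None) -> int:
    """Return the retention period in milliseconds.

    Hypothetical helper: the smallest unit that is set wins,
    mirroring ms > minutes > hours precedence.
    """
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    if hours is not None:
        return hours * 60 * 60 * 1000
    # Nothing set, so fall back to the default of 168 hours.
    return DEFAULT_RETENTION_HOURS * 60 * 60 * 1000


# Hours and minutes both set: the minutes value is the one that counts.
print(effective_retention_ms(hours=24, minutes=30))  # 1800000
```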
log.retention.bytes
You can also cap retention by the total number of bytes of messages in the log. The setting applies per partition, so a topic with three partitions and a log.retention.bytes of 1GB retains 3GB at the very most. If you increase the partition count on the topic by one, the maximum retained rises to 4GB.
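The per-partition arithmetic is simple enough to check in a few lines. This is only a back-of-the-envelope sketch; the helper name max_topic_retention_bytes is hypothetical and 1GB is taken here as 1024^3 bytes.

```python
GB = 1024 ** 3  # assumption: 1GB counted as 1024^3 bytes


def max_topic_retention_bytes(retention_bytes_per_partition: int,
                              partitions: int) -> int:
    """Upper bound on bytes retained for a topic.

    log.retention.bytes applies to each partition, so the topic-wide
    ceiling is the per-partition limit times the partition count.
    """
    return retention_bytes_per_partition * partitions


print(max_topic_retention_bytes(1 * GB, 3) // GB)  # 3
print(max_topic_retention_bytes(1 * GB, 4) // GB)  # 4
```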
The two types of log retention, size and time, can be used together. If both are set then messages are removed as soon as either limit is reached. If you have a retention time of one day and a retention size of 2GB, then messages will be removed once the log goes over 2GB, even if the one day period isn’t up.
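The "whichever comes first" rule can be sketched as a single check. This is a simplification for illustration only: the function name and parameters are hypothetical, and Kafka itself deletes whole log segments rather than individual messages. A negative limit is treated as "no limit", in the same spirit as Kafka's -1.

```python
DAY_MS = 24 * 60 * 60 * 1000
GB = 1024 ** 3  # assumption: 1GB counted as 1024^3 bytes


def should_delete(oldest_age_ms: int, log_size_bytes: int,
                  retention_ms: int, retention_bytes: int) -> bool:
    """Return True if either retention limit has been exceeded."""
    too_old = retention_ms >= 0 and oldest_age_ms > retention_ms
    too_big = retention_bytes >= 0 and log_size_bytes > retention_bytes
    return too_old or too_big


# Only half a day old, but the log is already over 2GB: eligible.
print(should_delete(DAY_MS // 2, 3 * GB, DAY_MS, 2 * GB))  # True
# Under both limits: kept.
print(should_delete(DAY_MS // 2, 1 * GB, DAY_MS, 2 * GB))  # False
```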
message.max.bytes
The broker limits the size of message it will accept from producers. The default is roughly 1MB; if a producer sends a message larger than that, it will not be accepted. The setting refers to the compressed size of the message, so the message itself can be over the set size uncompressed.
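To see why the compressed size matters, here is a rough sketch that compares a payload against a limit before and after gzip. It is only an approximation: the constant and the fits_limit helper are illustrative, the limit is rounded to 1,000,000 bytes rather than read from a broker, and batching and protocol overhead are ignored.

```python
import gzip
import json

# Illustrative limit of roughly 1MB; check the actual value on your broker.
MESSAGE_MAX_BYTES = 1_000_000


def fits_limit(payload: bytes, limit: int = MESSAGE_MAX_BYTES) -> bool:
    """Return True if the gzip-compressed payload is within the limit."""
    return len(gzip.compress(payload)) <= limit


# A highly repetitive payload of about 2MB before compression.
payload = json.dumps({"body": "x" * 2_000_000}).encode("utf-8")

print(len(payload) > MESSAGE_MAX_BYTES)  # True: too big uncompressed
print(fits_limit(payload))               # True: small once compressed
```

Real payloads rarely compress this well, so it is worth measuring with data that looks like your own.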
While you can set Kafka to accept larger messages, this does have a performance impact on network and I/O throughput. So it’s worth sitting down with a pen and paper (or a spreadsheet) to gauge your average message sizes and adjust the settings accordingly.