Kafka system owners: you can halve the storage and network usage on your brokers by turning on built-in compression. Compression and decompression were added to the Kafka client almost a decade ago, in September 2013 (v0.8.1), but remain widely under-used. I frequently encounter Kafka systems running with producers configured with compression.type=none. In almost all cases this is wasting resources: storage, network and compute. If you care about the efficiency of energy and materials, or if you would simply like a lower %-used for your filesystems, then I'm about to do my good deed for the day, so read on.

Here is how it works: the producer client batches up a bunch of messages before compressing the batch. It adds a bit at the start of the batch so that the consumer knows which compression algorithm to use when it receives the batch. Four compression algorithms are built in: snappy, gzip, lz4 and zstd. The default is none, so the consumer knows it doesn't need to decompress the batch.

Compression is transparent to apps: all you have to do is change the configuration property compression.type from none to another value. This is a configuration file change: you do not need to make any code changes. When compression.type is not set to none, compression happens transparently, with no impact on how data flows through your pipelines.

Producer compression also saves you money: if you are using fully-managed Confluent Cloud, it could dramatically reduce the ingress MB/sec for your producers. If the data has more than one consumer, then sending compressed data to each consumer results in further egress savings.

In my opinion there is almost no reason not to use compression. I say "almost" because there are some considerations: it is not entirely free of consequences. Compression adds some pipeline latency, and it is ineffective to recompress data that is already compressed. The compute overhead to compress and decompress must, on balance, be less than the cost to store the data with multiple replicas (including the network traffic for those replicas). In my experience these are edge cases, and the majority of pipelines can benefit from compression.

So ask your Kafka developer: set compression.type=snappy and tell me what happens. Trying it and measuring the benefit is a no-brainer.
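To make the "configuration file change" concrete, here is what the one-line change looks like in a producer properties file. This is a minimal sketch: the file name and broker address are illustrative, not taken from any particular deployment.

```properties
# producer.properties (illustrative file name)
bootstrap.servers=broker1:9092

# The one-line change: none -> snappy (or gzip, lz4, zstd)
compression.type=snappy
```

With kafka-python the equivalent is the compression_type argument on KafkaProducer; either way, no application code that produces or consumes messages needs to change.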
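If you want a feel for how much a batch of similar messages can shrink before you touch a real cluster, here is a small Kafka-free sketch using Python's standard-library gzip (one of Kafka's four built-in codecs). The message shape and batch size are invented for illustration; real ratios depend on your data.

```python
import gzip
import json

# A batch of similar JSON events, like those a Kafka producer accumulates
# before sending. The field names repeat in every record, which is exactly
# the redundancy that batch-level compression exploits.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "url": "/home"}).encode()
    for i in range(500)
]

batch = b"\n".join(messages)       # the uncompressed batch
compressed = gzip.compress(batch)  # gzip is one of Kafka's built-in codecs

ratio = len(batch) / len(compressed)
print(f"uncompressed: {len(batch)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"ratio:        {ratio:.1f}x")
```

Repetitive, text-heavy payloads like JSON typically compress several-fold; already-compressed payloads (images, video, pre-gzipped data) will not, which is one of the edge cases noted above.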