Some notes on how I understand Kafka, and the pitfalls I have run into.
Kafka overview
(To be added)
Problems encountered
Leader election during rolling update.
Observation
When the Kafka cluster receives a request during a rolling update, the client gets the following exception:
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now
Kafka log:
WARN [org.apache.kafka.clients.producer.internals.Sender] [Producer clientId=<client-id>] Got error produce response with correlation id <id> on topic-partition <partition-id>, retrying (9 attempts left). Error: NOT_LEADER_FOR_PARTITION
Question: how can this be avoided, and how can it be fixed?
Analysis
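Triggering an update of the underlying configuration makes the Kafka cluster perform a rolling update. When the roll reaches the broker that currently holds the leader, the cluster enters a state with no leader, and a new leader has to be elected.
During the window in which this election takes place there really is no leader (strictly speaking, for the partitions whose leader lived on the broker being restarted), so requests fail with the error above. The producer is still holding stale metadata that points at the old leader, which is why it reacts by requesting a metadata update.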
Proposal
Kafka provides an internal retry mechanism, built into the producer. It is enabled by configuring ProducerConfig.RETRIES_CONFIG and ProducerConfig.RETRY_BACKOFF_MS_CONFIG. Here is the official JavaDoc:
private static final String RETRIES_DOC = "Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error."
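A minimal sketch of what setting these two options might look like. It uses plain string keys ("retries" and "retry.backoff.ms", which are the values behind ProducerConfig.RETRIES_CONFIG and ProducerConfig.RETRY_BACKOFF_MS_CONFIG) so that it compiles without kafka-clients on the classpath. The class name, the broker address and the numbers are assumptions for illustration; 10 retries was picked only because it would match the "9 attempts left" in the log above.

```java
import java.util.Properties;

class ProducerRetrySketch {
    // Same strings as ProducerConfig.RETRIES_CONFIG / RETRY_BACKOFF_MS_CONFIG.
    static final String RETRIES = "retries";
    static final String RETRY_BACKOFF_MS = "retry.backoff.ms";

    // Builds producer properties with the internal retry enabled.
    // All argument values are chosen by the caller; nothing here is a recommended default.
    static Properties buildProducerProps(String bootstrapServers, int retries, long backoffMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty(RETRIES, Integer.toString(retries));
        props.setProperty(RETRY_BACKOFF_MS, Long.toString(backoffMs));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values: 10 retries, 500 ms pause between retries.
        Properties props = buildProducerProps("localhost:9092", 10, 500L);
        System.out.println(props.getProperty(RETRIES) + " retries, "
                + props.getProperty(RETRY_BACKOFF_MS) + " ms backoff");
        // prints "10 retries, 500 ms backoff"
    }
}
```

To actually create a KafkaProducer from these properties, key.serializer and value.serializer would also have to be set. The idea is simply that retries multiplied by the backoff should cover the time a leader election takes during the roll.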
Besides this internal retry, as the JavaDoc above says, the application can also resend the record itself on the client side.
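A client-side resend could be sketched roughly like this. It is not tied to the Kafka API: the send action is passed in as a Callable, and the class name, method name and parameters are all made up for illustration. The main method simulates a send that fails twice before succeeding.

```java
import java.util.concurrent.Callable;

class ClientResendSketch {
    // Calls `send` up to maxAttempts times, sleeping backoffMs after each failed attempt.
    // Rethrows the last failure if every attempt fails.
    static <T> T sendWithResend(Callable<T> send, int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return send.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs); // wait before resending
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated send: the first two attempts fail, the third succeeds.
        String result = sendWithResend(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("NOT_LEADER_FOR_PARTITION (simulated)");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```

With a real producer, the Callable would wrap something like producer.send(record).get(), so that a failed send surfaces as an exception. Keep in mind that resending can write a duplicate if an earlier attempt actually reached the broker.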
Other issue
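To be honest, this one is not really solved yet. Even with internal retry plus client resend, an error can still occur and leave the request unwritten. The current workaround is... to keep raising the number of client-side resends!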
It is Likely That The Consumer Was Kicked Out Of The Group
Observation
Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group
For now, here are a few articles by other people:
KAFKA Says: It is Likely That The Consumer Was Kicked Out Of The Group | Hacker Noon
kafka 0.10.1一些使用经验 (Some experience using Kafka 0.10.1) - 简书 (jianshu.com)
INVALID_FETCH_SESSION_EPOCH
Observation
A freshly installed machine stops working after sitting for a day or two, with the following error:
[org.apache.kafka.clients.FetchSessionHandler] [Consumer clientId=xxxxxxxxxxxxxxxxxx-a2c15833-af73-4c69-a515-652d42fa6da1-StreamThread-1-consumer, groupId=xxxxxxxxxxxxxxxxxxxx] Node 1 was unable to process the fetch request with (sessionId=866458856, epoch=3657): INVALID_FETCH_SESSION_EPOCH.
Analysis
A pile of blog posts online, all copy-pasted from one another, say to just upgrade the version. Useless information.