Are there any alerting options for when a Kafka Connect connector or a connector task fails or runs into errors?
We have Kafka Connect running, and it runs well, but the errors we do get have to be traced and discovered manually. Often a connector has been in an error state for a week before anyone notices the problem.
Recommended answer
Since this post was originally written and answered, Kafka Connect has started providing its own official metrics. Apache Kafka Connect exposes these metrics in the legacy JMX format.
If you use the Confluent Kafka Connect Helm chart (github/confluentinc/cp-helm-charts/tree/master/charts/cp-kafka-connect), it includes a Prometheus metrics exporter.
I monitor and alert on cp_kafka_connect_connect_connector_metrics{status="running"}, which comes from the Prometheus exporter in the Confluent Helm chart, but there are many variations on that.
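As a rough illustration of alerting on that metric, here is a minimal Python sketch that queries the Prometheus HTTP API and lists connectors whose status label is anything other than "running". The metric name is the one mentioned above; the "connector" label name, the assumption that each series carries a "status" label, and the Prometheus URL in the usage note are guesses about a typical setup, not something this post confirms, so check them against your own exporter output.

```python
import json
import urllib.parse
import urllib.request

# Metric name taken from the answer above.
METRIC = "cp_kafka_connect_connect_connector_metrics"


def query_prometheus(base_url, promql, timeout=10):
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def connectors_not_running(prom_response, name_label="connector"):
    """Return sorted (connector, status) pairs for series not in 'running'.

    Assumption: each series has a 'status' label and a label naming the
    connector ('connector' by default). Adjust to match your exporter.
    """
    if prom_response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    unhealthy = set()
    for series in prom_response["data"]["result"]:
        labels = series.get("metric", {})
        status = labels.get("status", "unknown")
        if status.lower() != "running":
            unhealthy.add((labels.get(name_label, "unknown"), status))
    return sorted(unhealthy)
```

A possible way to use it, with a hypothetical Prometheus address: call connectors_not_running(query_prometheus("http://prometheus.example:9090", METRIC)) from a scheduled job and send a notification when the returned list is not empty. In practice a Prometheus alerting rule on the same metric would usually replace a polling script like this one.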
Using the official Kafka Connect metrics is generally preferable for any automated monitoring and alerting setup. This option wasn't available when this post was first written and answered.
FYI, Kafka still doesn't expose lag metrics, so you still need third-party tools to monitor and alert on lag.