I am using Spark 1.5.2. I need to run a Spark Streaming job with Kafka as the streaming source. I need to read from multiple topics within Kafka and process each topic differently.
Accepted answer
I made the following observations, in case they are helpful for someone:
Creating multiple streams helps in two ways:

1. You don't need to apply a filter operation to process different topics differently.
2. You can read the streams in parallel (as opposed to one by one in the case of a single stream). To enable this, there is an undocumented config parameter, spark.streaming.concurrentJobs.

So, I decided to create multiple streams:

sparkConf.set("spark.streaming.concurrentJobs", "4");