Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/
I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.
Here is the structure of my tables:
CREATE TABLE games (
UserId UInt32,
ActivityType UInt8,
Amount Float32,
CurrencyId UInt8,
Date String
) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');
CREATE TABLE tests.games_transactions (
day Date,
UserId UInt32,
Amount Float32,
CurrencyId UInt8,
timevalue DateTime,
ActivityType UInt8
) ENGINE = MergeTree(day, (day, UserId), 8192);
CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
FROM default.games;
In the Kafka topic I am getting around 150 messages per second.
Everything works, except that the data are updated in the table with a big delay, definitely not in real time.
It seems that data are sent from Kafka to the table only once 65536 new messages are ready to be consumed in Kafka.
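That delay is consistent with block-based flushing: if the consumer only writes a block once `max_block_size` rows are buffered, the worst-case latency is simply the block size divided by the message rate. A quick back-of-the-envelope check (plain arithmetic, not ClickHouse code):

```python
# Rough estimate of the flush delay implied by the observed behaviour:
# the consumer flushes only when max_block_size rows are buffered,
# so worst-case latency = block size / message rate.
max_block_size = 65536   # threshold observed in the question
msgs_per_second = 150    # topic throughput reported above

delay_seconds = max_block_size / msgs_per_second
print(f"~{delay_seconds:.0f} s (~{delay_seconds / 60:.1f} min) before data appears")
```

With the numbers above this comes out to roughly 437 seconds, about 7 minutes, which matches the "definitely not real time" observation.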
Should I set some particular configuration?
I tried to change the configuration from the CLI:
SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750
But there was no improvement.
Should I change any particular configuration?
Should I have changed the above configurations before creating the tables?
There is an issue for this on the ClickHouse GitHub: https://github.com/yandex/ClickHouse/issues/2169
Basically you need to set max_block_size (http://clickhouse-docs.readthedocs.io/en/latest/settings/settings.html#max-block-size) before the table is created, otherwise it will not work.
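To confirm which value the server-wide profile actually provides, you can inspect `system.settings` (a standard ClickHouse system table; this query is a sanity check I'm adding, not part of the original answer):

```sql
-- Shows the effective value of max_block_size for the current profile.
SELECT name, value, changed
FROM system.settings
WHERE name = 'max_block_size';
```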
I used the solution with overriding users.xml:
<yandex>
<profiles>
<default>
<max_block_size>100</max_block_size>
</default>
</profiles>
</yandex>
I deleted my table and db and recreated them. It worked for me. Now my tables get updated every 100 records.
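For reference, newer ClickHouse releases support per-table Kafka settings, so the block size can be set on the Kafka table itself instead of overriding users.xml. A sketch assuming a modern version (broker list, topic, group, and the threshold of 100 mirror the setup above):

```sql
-- Hypothetical rewrite of the games table using the newer SETTINGS syntax;
-- kafka_max_block_size bounds the rows buffered before a flush.
CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'XXXX.eu-west-1.compute.amazonaws.com:9092',
         kafka_topic_list = 'games',
         kafka_group_name = 'click-1',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 3,
         kafka_max_block_size = 100;
```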