We are currently evaluating Cassandra as the data store for an analytical application. The plan was to dump raw data in Cassandra and then run mainly aggregation queries over it. Looking at CQL, it does not seem to support some traditional SQL operators like:
- Typical aggregation functions such as average, sum, count distinct, etc.
- GROUP BY and HAVING operators
I did not find anything in the documentation that helps achieve the above. I also checked whether there are any hooks for providing such functions as extensions, for example something like in-database map-reduce in MongoDB or user-defined functions in relational databases.
People do talk about the paid DataStax Enterprise edition, but even that achieves this not via plain Cassandra but through separate components such as Hadoop, Hive, Pig, etc. Alternatively, there are suggestions to do the needed pre-aggregation before dumping data into the database, since Cassandra writes are fast.
That looks like too much overhead, at least for the basic things we need. Am I missing something fundamental here?
I would highly appreciate any help with this.
Solution: Aggregation is available in Cassandra as part of CASSANDRA-4914, which is included in the 2.2.0-rc1 release.
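To make this concrete, here is a minimal sketch of calling the built-in aggregates from Python. It assumes a hypothetical table `readings(sensor_id, ts, value)` partitioned by `sensor_id`, and a `session` object like the one returned by the DataStax Python driver's `Cluster.connect()`; the table, column, and function names are illustrative and not from the question. The query is restricted to a single partition, since an aggregate without a `WHERE` clause has to read the whole table.

```python
from collections import namedtuple

# Hypothetical schema (an assumption for illustration):
#   CREATE TABLE readings (sensor_id text, ts timestamp, value double,
#                          PRIMARY KEY (sensor_id, ts));
# count, sum and avg are the built-in CQL aggregates; the aliases give the
# result columns predictable names.
AGGREGATE_CQL = (
    "SELECT count(*) AS n, sum(value) AS total, avg(value) AS mean "
    "FROM readings WHERE sensor_id = %s"
)

Summary = namedtuple("Summary", ["n", "total", "mean"])


def summarize_sensor(session, sensor_id):
    """Run the aggregate query for one partition and return a Summary."""
    # An aggregate query yields exactly one row; .one() fetches it.
    row = session.execute(AGGREGATE_CQL, (sensor_id,)).one()
    return Summary(n=row.n, total=row.total, mean=row.mean)


# Possible usage against a live cluster (requires the cassandra-driver
# package; the contact point and keyspace name are assumptions):
#
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("analytics")
#   print(summarize_sensor(session, "sensor-42"))
```

Passing the session in as a parameter keeps the function independent of how the cluster connection is set up, so it can be exercised with a stub session in tests.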