I'm working on a POC to showcase how Cassandra works, using Digg as an example. I want to create a data model that lets me:
1) Add links
2) Add a link to a user's favorites list
3) Attach predetermined tags to links
I came up with two column families:
Links
- url is the key
- id (a generated uuid)
- user (who added it)
- favCount (number of users who favorited the link)
- upCount (number of users who liked it)
- downCount (number of users who disliked it)
UserFavs
- user is the key
- id (as many ids as the user has favorited)
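The two column families above can be sketched in memory to check that they cover requirements #1 and #2. This is only an illustration: plain Python dicts stand in for column families (row key mapped to columns), the helper names `add_link` and `add_favorite` and the example URL are hypothetical, and no real Cassandra client is involved. In particular, the `favCount` update is a plain in-memory increment; keeping such a count consistent on a real cluster is outside this sketch.

```python
import uuid

# Plain dicts stand in for column families: row key -> {column name: value}.
links = {}      # "Links": row key is the url
user_favs = {}  # "UserFavs": row key is the user (assumed)


def add_link(url, user):
    """Requirement #1: store a link row keyed on its url."""
    if url not in links:
        links[url] = {
            "id": str(uuid.uuid4()),  # generated uuid, as in the model
            "user": user,             # who added it
            "favCount": 0,
            "upCount": 0,
            "downCount": 0,
        }
    return links[url]["id"]


def add_favorite(user, url):
    """Requirement #2: one column per favorited link id in the user's row."""
    link = links[url]
    favs = user_favs.setdefault(user, {})
    if link["id"] not in favs:
        favs[link["id"]] = ""  # the column name carries the data, value is empty
        link["favCount"] += 1


link_id = add_link("http://example.com/cassandra-intro", "kumar")
add_favorite("alice", "http://example.com/cassandra-intro")
```

Looking up a user's favorites is then a single-row read on `user_favs`, which is why this layout handles #1 and #2 without any scanning.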
This works fine for requirements #1 and #2 above, but #3 is trickier. I could add tags like 'java', 'languages', and 'architecture' as column names with empty values in the Links column family. But then a query such as finding all the links tagged 'java' would take a long time, since it would have to look at every row.
Can anyone suggest how this could be implemented?
Let me know if I'm not clear.
Thanks, Kumar
Recommended answer
You could create a secondary index, i.e. a column family keyed on tag. Each row contains all the links for that particular tag. Note that this may result in very wide rows (i.e. rows with many columns), each of which is stored on a single Cassandra node. You might want a scheme to split these up if they get very large.
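A minimal in-memory sketch of that idea, including one possible way to split wide rows, might look like the following. Everything here is an assumption for illustration: the `TagIndex` name, the `tag:bucket` row key format, the `NUM_BUCKETS` value, and the helper functions are hypothetical, and a dict stands in for the column family rather than a real Cassandra client.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; choose based on how wide the tag rows get

# "TagIndex" column family as a dict: row key -> {column name: value}.
tag_index = {}


def bucket_for(url):
    """Stable bucket number for a url (crc32 is deterministic across runs)."""
    return zlib.crc32(url.encode("utf-8")) % NUM_BUCKETS


def tag_link(tag, url):
    """Write the url as a column name in the row for (tag, bucket)."""
    row_key = f"{tag}:{bucket_for(url)}"
    tag_index.setdefault(row_key, {})[url] = ""  # empty value, name is the data


def links_for_tag(tag):
    """Read every bucket row for the tag and merge the column names."""
    urls = []
    for bucket in range(NUM_BUCKETS):
        urls.extend(tag_index.get(f"{tag}:{bucket}", {}))
    return sorted(urls)


tag_link("java", "http://example.com/jvm-tuning")
tag_link("java", "http://example.com/generics")
tag_link("architecture", "http://example.com/jvm-tuning")
print(links_for_tag("java"))
```

Finding all links for a tag now costs `NUM_BUCKETS` row reads instead of a scan over every row in Links. With `NUM_BUCKETS = 1` this reduces to the single row per tag described above.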
See www.datastax/docs/0.7/data_model/cfs_as_indexes
or pkghosh.wordpress/2011/03/02/cassandra-secondary-index-patterns/
or google "cassandra secondary index"