Use case
When a user goes to my website, they will be confronted with a search box much like SO's. They can search for results using plain text: " questions", "closed questions", " and java", etc. The search will function a bit differently from SO's, in that it will try to use as much as possible of the database schema rather than doing a straight full-text search. So " questions" will only search for questions as opposed to answers (probably not applicable to the SO case, just an example here), "closed questions" will return questions that are closed, and " and java" will return questions that relate to and java and nothing else.
Problem
I'm not too familiar with the terminology, but I basically want to do a keyword-to-SQL driven search. I know the schema of the database, and I can also datamine the database. I want to know of any existing approaches before I try to implement this. I guess this question is about what a good design for the stated problem would be.
Proposal
My proposed solution so far looks something like this
Thoughts / suggestions / links?
Recommended answer
I run a digital music store with a "single search" that weights keywords based on their occurrences and on the schema in which Products appear, e.g. with different columns like "Artist", "Title" or "Publisher".
Products are also related to albums and playlists, but to keep the explanation simple, I will only elaborate on the indexing and querying of Products' Keywords.
Keywords table - a weighted table for every word that could possibly be searched for (hence, it is referenced somewhere), with the following data for each record:
- Keyword ID (not the word),
- The word itself,
- Weighting
ProductKeywords table - a weighted table for every keyword referenced by any of a product's fields (or columns), with the following data for each record:
- Product ID,
- Keyword ID,
- Weighting
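The two tables above could be laid out roughly as follows. This is only a sketch using Python's built-in sqlite3 module; the column names, types and constraints are assumptions, since the answer only lists which fields each record holds.

```python
import sqlite3

# Hypothetical column names and types; the answer only names the fields.
SCHEMA = """
CREATE TABLE Keywords (
    KeywordID INTEGER PRIMARY KEY,          -- surrogate key, not the word
    Word      TEXT    NOT NULL UNIQUE,      -- the word itself
    Weighting INTEGER NOT NULL DEFAULT 0    -- how often the word occurs overall
);
CREATE TABLE ProductKeywords (
    ProductID INTEGER NOT NULL,
    KeywordID INTEGER NOT NULL REFERENCES Keywords(KeywordID),
    Weighting INTEGER NOT NULL DEFAULT 0,   -- weighting of the word for this product
    PRIMARY KEY (ProductID, KeywordID)
);
"""

def create_schema(conn):
    """Create both keyword tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The composite primary key on ProductKeywords is a design choice of this sketch: it keeps one row per product and keyword, so re-indexing a product can simply delete and re-insert its rows.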
The weighting value is an indication of how often the word occurs. Matching keywords with a lower weight are "more unique" and are more likely to be what is being searched for. In this way, words that occur often are automatically "down-weighted", e.g. "the", "a" or "I". However, it is best to strip out atomic occurrences of those common words before indexing.
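As a rough illustration of that weighting idea, the sketch below counts word occurrences across some field values after stripping common words. The stop-word list, the tokenizer and the function names are assumptions made for the example; the answer does not say exactly how the weights are computed, so a raw occurrence count is used here.

```python
import re
from collections import Counter

# Assumed stop-word list, taken from the examples mentioned in the text.
STOP_WORDS = {"the", "a", "i"}

def tokenize(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def keyword_weights(field_values):
    """Return {word: occurrence count} over all field values."""
    counts = Counter()
    for value in field_values:
        counts.update(tokenize(value))
    return dict(counts)

weights = keyword_weights(["The Beatles", "Abbey Road", "The Road to Abbey"])
# Lower weight means rarer, so these come first when ranking matches.
rarest_first = sorted(weights, key=lambda w: (weights[w], w))
```

Here "beatles" occurs once and "abbey" twice, so a match on "beatles" would count as the more distinctive one.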
I used integers for weighting, but using a decimal value would offer more versatility, possibly at the cost of slightly slower sorting.
Whenever any product field is updated, e.g. Artist or Title (which does not happen that often), a database trigger re-indexes the product's keywords like so inside a transaction:
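One way such a re-index could look is sketched below. This is not the trigger from the music store: it does the work from application code with Python's sqlite3 module instead of a database trigger, and the table layout, the tokenizer and the function name reindex_product are all assumptions for illustration.

```python
import re
import sqlite3

def tokenize(text):
    """Lower-case the text and split it into words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def reindex_product(conn, product_id, fields):
    """Rebuild one product's keyword rows atomically (hypothetical helper)."""
    counts = {}
    for value in fields:
        for word in tokenize(value):
            counts[word] = counts.get(word, 0) + 1
    with conn:  # commits on success, rolls back if anything raises
        # Remove this product's old contribution from the global weights.
        conn.execute(
            "UPDATE Keywords SET Weighting = Weighting - ("
            "  SELECT pk.Weighting FROM ProductKeywords pk"
            "  WHERE pk.KeywordID = Keywords.KeywordID AND pk.ProductID = ?)"
            " WHERE KeywordID IN ("
            "  SELECT KeywordID FROM ProductKeywords WHERE ProductID = ?)",
            (product_id, product_id),
        )
        conn.execute("DELETE FROM ProductKeywords WHERE ProductID = ?", (product_id,))
        for word, n in counts.items():
            # Create the keyword if it is new, then add the new occurrences.
            conn.execute(
                "INSERT OR IGNORE INTO Keywords (Word, Weighting) VALUES (?, 0)", (word,)
            )
            conn.execute(
                "UPDATE Keywords SET Weighting = Weighting + ? WHERE Word = ?", (n, word)
            )
            conn.execute(
                "INSERT INTO ProductKeywords (ProductID, KeywordID, Weighting)"
                " SELECT ?, KeywordID, ? FROM Keywords WHERE Word = ?",
                (product_id, n, word),
            )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Keywords (KeywordID INTEGER PRIMARY KEY, Word TEXT UNIQUE,
                       Weighting INTEGER NOT NULL);
CREATE TABLE ProductKeywords (ProductID INTEGER, KeywordID INTEGER,
                              Weighting INTEGER,
                              PRIMARY KEY (ProductID, KeywordID));
""")
reindex_product(conn, 1, ["Abbey Road", "The Beatles"])
reindex_product(conn, 1, ["Let It Be", "The Beatles"])  # the title was changed
```

In this sketch, words that no longer appear anywhere stay in Keywords with a weighting of 0; whether to delete them instead is left open.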
Querying