Using SQL to determine word count statistics of a text field

Updated: 2024-10-19 22:44:14
Question

I've recently been working on some database search functionality and wanted to get some information like the average words per document (e.g. a text field in the database). The only thing I have found so far (without processing in a language of choice outside the DB) is:

SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1) FROM documents

This seems to work* but do you have other suggestions? I'm currently using MySQL 4 (hope to move to version 5 for this app soon), but am also interested in general solutions.

Thanks!

* I can imagine that this is a pretty rough way to determine this, as it does not account for HTML in the content and the like. That's OK for this particular project, but again, are there better ways?

Update: To define what I mean by "better": either more accurate, performs more efficiently, or is more "correct" (easy to maintain, good practice, etc). For the content I have available, the query above is fast enough and is accurate for this project, but I may need something similar in the future (so I asked).
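To make the "rough" part concrete, here is a minimal Python sketch that compares the space-counting arithmetic from the query with a simple tokenizer. It runs the same query against an in-memory SQLite database rather than MySQL, which is an assumption made only so the example is self-contained; the sample strings and the helper names `space_estimate`, `token_count` and `average_in_sql` are hypothetical.

```python
import re
import sqlite3

# Hypothetical sample rows; the table and column names follow the query above.
SAMPLES = ["one two three", "double  space", ""]


def space_estimate(text: str) -> int:
    """Same arithmetic as the SQL expression: number of spaces plus one."""
    return len(text) - len(text.replace(" ", "")) + 1


def token_count(text: str) -> int:
    """Count runs of word characters instead of counting spaces."""
    return len(re.findall(r"\w+", text))


def average_in_sql(rows):
    """Run the query from the question against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (content TEXT)")
    conn.executemany(
        "INSERT INTO documents (content) VALUES (?)", [(r,) for r in rows]
    )
    (avg,) = conn.execute(
        "SELECT AVG(LENGTH(content) - LENGTH(REPLACE(content, ' ', '')) + 1) "
        "FROM documents"
    ).fetchone()
    conn.close()
    return avg


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), space_estimate(sample), token_count(sample))
    print(average_in_sql(SAMPLES))
```

The two counts agree on single-spaced text but diverge on repeated spaces and on empty content, where the space-based estimate still reports one word.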

Recommended answer

The text handling capabilities of MySQL aren't good enough for what you want. A stored function is an option, but will probably be slow. Your best bet to process the data within MySQL is to add a user defined function. If you're going to build a newer version of MySQL anyway, you could also add a native function.

The "correct" way is to process the data outside the DB since DBs are for storage, not processing, and any heavy processing might put too much of a load on the DBMS. Additionally, calculating the word count outside of MySQL makes it easier to change the definition of what counts as a word. How about storing the word count in the DB and updating it when a document is changed?
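A minimal sketch of that last suggestion, assuming a Python application layer and using an in-memory SQLite database as a stand-in for MySQL. The `word_count` column, the `WORD_RE` pattern and the helper names are hypothetical and not part of the original schema.

```python
import re
import sqlite3

# Assumed definition of a "word"; changing it only requires editing this pattern.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def count_words(text: str) -> int:
    """Count words in the application, outside the database."""
    return len(WORD_RE.findall(text))


def open_db() -> sqlite3.Connection:
    """Create an in-memory table with a stored word_count column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " word_count INTEGER NOT NULL)"
    )
    return conn


def save_document(conn: sqlite3.Connection, doc_id: int, content: str) -> None:
    """Insert or overwrite a document, recomputing its stored word count."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, content, word_count) "
        "VALUES (?, ?, ?)",
        (doc_id, content, count_words(content)),
    )


def average_word_count(conn: sqlite3.Connection):
    """Average over the precomputed column; returns None for an empty table."""
    (avg,) = conn.execute("SELECT AVG(word_count) FROM documents").fetchone()
    return avg
```

With this layout the average becomes a plain aggregate over an integer column, and the count is recomputed only when `save_document` is called for a changed document.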

Example stored function:

DELIMITER $$

CREATE FUNCTION wordcount(str LONGTEXT)
    RETURNS INT
    DETERMINISTIC
    SQL SECURITY INVOKER
    NO SQL
BEGIN
    DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
    DECLARE currChar, prevChar BOOL DEFAULT 0;
    SET maxIdx = char_length(str);
    SET idx = 1;
    WHILE idx <= maxIdx DO
        -- currChar is true when the character at position idx is alphanumeric
        SET currChar = SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
        -- a word starts where an alphanumeric follows a non-alphanumeric
        IF NOT prevChar AND currChar THEN
            SET wordCnt = wordCnt + 1;
        END IF;
        SET prevChar = currChar;
        SET idx = idx + 1;
    END WHILE;
    RETURN wordCnt;
END
$$

DELIMITER ;
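The loop in the stored function is a small state machine: it counts every position where an alphanumeric character follows a non-alphanumeric one. A rough Python port of the same logic is sketched below for experimenting outside the database. It assumes ASCII-only alphanumerics, which may not match what `[[:alnum:]]` matches under a given MySQL character set.

```python
import string

# Assumption: only ASCII letters and digits count as alphanumeric here.
ALNUM = set(string.ascii_letters + string.digits)


def wordcount(text: str) -> int:
    """Count transitions from a non-alphanumeric to an alphanumeric character."""
    word_cnt = 0
    prev_is_alnum = False
    for ch in text:
        curr_is_alnum = ch in ALNUM
        # A new word begins when the previous character was not alphanumeric.
        if curr_is_alnum and not prev_is_alnum:
            word_cnt += 1
        prev_is_alnum = curr_is_alnum
    return word_cnt
```

Under this definition repeated spaces do not inflate the count, but punctuation inside a word splits it (for example, "don't" counts as two words) and HTML tag names are still counted.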

Published: 2023-11-11 07:14:29
Article link: https://www.elefans.com/category/jswz/34/1577741.html