Cassandra: Difference between TEXT (VARCHAR) and ASCII

Updated: 2024-10-27 04:35:58

I understand that text and varchar are aliases, and both store UTF-8 strings. What about ascii, which the documentation describes as a "US-ASCII character string"? What's the difference besides encoding?

Is there any size difference? Is there a preferred choice between these two when I'm storing large strings (~500KB)?

Best answer

Regarding this answer:

If the data is a piece of text, for example a String in Java, it is encoded as UTF-16 at runtime, but when it is serialized in Cassandra with the text type, UTF-8 is used. UTF-16 uses 2 bytes per character for most characters and 4 bytes for characters outside the Basic Multilingual Plane, whereas UTF-8 uses 1, 2, 3 or 4 bytes depending on the character, which makes it more space efficient for text that is mostly ASCII.
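The per-character sizes can be checked with a short Python sketch. This is only an illustration of the two encodings, not of Cassandra's serializer; the function name encoded_sizes and the sample characters are made up for the example.

```python
# Compare how many bytes a single character needs in UTF-8 and UTF-16.
# encoded_sizes and the sample characters are illustrative, not from the answer.

def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = len(ch.encode("utf-8"))
    # "utf-16-le" is used so the 2-byte byte-order mark is not counted.
    utf16 = len(ch.encode("utf-16-le"))
    return utf8, utf16

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8, utf16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

The four samples cover the 1-, 2-, 3- and 4-byte UTF-8 cases; only the last one (outside the Basic Multilingual Plane) needs 4 bytes in UTF-16.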

That means there is CPU work involved in encoding and decoding such data when it is serialized. The size also depends on the content: a value such as 158786464563 stored as text takes 12 bytes, one per digit, whereas a 64-bit numeric type such as bigint would need only 8. Storing it as text therefore uses more space and more IO.
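The 12-byte figure is easy to reproduce. The sketch below assumes the comparison is against a fixed-width 64-bit integer (the width of Cassandra's bigint); it uses Python's struct module as a stand-in and does not go through any Cassandra driver.

```python
import struct

value = 158786464563

# Stored as text: one byte per decimal digit in UTF-8 (and in US-ASCII).
as_text = str(value).encode("utf-8")

# Stored as a fixed-width 64-bit signed integer (assumed stand-in for bigint).
as_int64 = struct.pack(">q", value)

print(len(as_text))   # 12
print(len(as_int64))  # 8
```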

Note that Cassandra also offers the ascii type, which follows the US-ASCII character set and always uses 1 byte per character.
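A practical consequence is that only 7-bit characters fit in an ascii column. A minimal sketch of that check, assuming a hypothetical helper fits_ascii that would run on the client before choosing a column type (it does not show how Cassandra itself reports an invalid value):

```python
def fits_ascii(text):
    """Return True if every character is in the 7-bit US-ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_ascii("hello"))             # True
print(fits_ascii("h\u00e9llo"))        # False: U+00E9 is outside US-ASCII
print(len("hello".encode("ascii")))    # 5, one byte per character
```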


Is there any size difference?

Yes

Is there a preferred choice between these two when I'm storing large strings (~500KB)?

Yes

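Because ascii never uses more than 1 byte per character, it is at least as space efficient as UTF-8, and UTF-8 is usually more space efficient than UTF-16. Keep in mind that UTF-8 also encodes US-ASCII characters in 1 byte each, so a size difference between ascii and text only appears when the data contains non-ASCII characters, which the ascii type cannot store at all. Again, it all depends on how you are serializing, encoding and decoding the data. For more, see "what-is-the-advantage-of-choosing-ascii-encoding-over-utf-8".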
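For the ~500KB case, the encoded sizes can be measured directly. The sketch below is an assumption-laden illustration: payload_sizes is a hypothetical helper, the payloads are synthetic, and the numbers are raw encoded lengths rather than Cassandra's on-disk size (which also depends on compression and storage overhead).

```python
def payload_sizes(text):
    """Return the encoded length of text per encoding; None if not encodable."""
    sizes = {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # no byte-order mark
    }
    try:
        sizes["ascii"] = len(text.encode("ascii"))
    except UnicodeEncodeError:
        sizes["ascii"] = None  # contains characters outside US-ASCII
    return sizes

ascii_only = "x" * 500_000
mixed = "x" * 499_999 + "\u00e9"

print(payload_sizes(ascii_only))
print(payload_sizes(mixed))
```

For the pure US-ASCII payload the ascii and UTF-8 lengths are identical, so the choice between ascii and text is mostly about whether you want non-ASCII input rejected.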

Published: 2023-07-16 00:06:00
Link: https://www.elefans.com/category/jswz/34/1120966.html