How can I truncate a string to have at most N characters?


The expected approach of String::truncate(usize) fails because it doesn't consider Unicode characters (which is baffling considering Rust treats strings as Unicode).

    let mut s = "ボルテックス".to_string();
    s.truncate(4);

    thread '<main>' panicked at 'assertion failed: self.is_char_boundary(new_len)'

Additionally, truncate modifies the original string, which is not always desired.

The best I've come up with is to convert to chars and collect into a String.

    fn truncate(s: String, max_width: usize) -> String {
        s.chars().take(max_width).collect()
    }

e.g.

    fn main() {
        assert_eq!(truncate("ボルテックス".to_string(), 0), "");
        assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
        assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
        assert_eq!(truncate("hello".to_string(), 4), "hell");
    }

However, this feels very heavy-handed.

Best answer

Make sure you read and understand delnan's point:

"Unicode is freaking complicated. Are you sure you want char (which corresponds to code points) as unit and not grapheme clusters?"

The rest of this answer assumes you have a good reason for using char and not graphemes.
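
To make delnan's point concrete, here is a small sketch using only the standard library. The helper name truncate_chars is made up for this illustration; it uses the same chars().take(n) approach as the question. It shows a single on-screen character that consists of two chars, so truncating by char cuts it apart:

```rust
// Hypothetical helper for this illustration: keeps at most `max_chars`
// code points (`char`s), which are not the same as user-perceived characters.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one grapheme cluster on screen, but two `char`s.
    let decomposed = "e\u{301}";
    assert_eq!(decomposed.chars().count(), 2);

    // Truncating to one `char` silently drops the accent.
    assert_eq!(truncate_chars(decomposed, 1), "e");

    // The precomposed form U+00E9 is a single `char`, so it survives intact.
    let precomposed = "\u{e9}";
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(truncate_chars(precomposed, 1), "\u{e9}");
}
```

The standard library does not segment grapheme clusters; if that is the unit you need, a third-party crate such as unicode-segmentation is the usual route.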

"which is baffling considering Rust treats strings as Unicode"

This is not quite correct; Rust strings are Unicode text stored as UTF-8, and String::truncate takes a length in bytes, not in characters. In UTF-8, every code point is encoded as a variable number of bytes (one to four). There's no O(1) algorithm to convert "6 characters" to "N bytes", so the standard library doesn't hide that from you.
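
As a quick illustration of the byte/character mismatch, the sketch below compares the two lengths for the string from the question. The helper byte_and_char_len is a made-up name for this example, not a standard library function:

```rust
// Hypothetical helper: returns (length in bytes, length in code points).
// `len` is O(1); `chars().count()` has to walk the whole string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "ボルテックス";

    // Each of these katakana occupies 3 bytes in UTF-8.
    assert_eq!(byte_and_char_len(s), (18, 6));

    // Byte offset 3 is the start of the second character, but offset 4
    // falls inside it, which is why truncating to 4 bytes panics.
    assert!(s.is_char_boundary(3));
    assert!(!s.is_char_boundary(4));

    // ASCII text uses one byte per character, so both lengths agree.
    assert_eq!(byte_and_char_len("hello"), (5, 5));
}
```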

You can use char_indices to step through the string character by character and get the byte index of that character:

    fn truncate(s: &str, max_chars: usize) -> &str {
        match s.char_indices().nth(max_chars) {
            None => s,
            Some((idx, _)) => &s[..idx],
        }
    }

    fn main() {
        assert_eq!(truncate("ボルテックス", 0), "");
        assert_eq!(truncate("ボルテックス", 4), "ボルテッ");
        assert_eq!(truncate("ボルテックス", 100), "ボルテックス");
        assert_eq!(truncate("hello", 4), "hell");
    }

This also returns a slice that you can choose to move into a new allocation if you need to, or mutate a String in place:

    // May not be as efficient as inlining the code...
    fn truncate_in_place(s: &mut String, max_chars: usize) {
        let bytes = truncate(&s, max_chars).len();
        s.truncate(bytes);
    }

    fn main() {
        let mut s = "ボルテックス".to_string();
        truncate_in_place(&mut s, 0);
        assert_eq!(s, "");
    }
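
For completeness, here is one way the other two options might look: copying the slice into a new allocation, and an inlined in-place variant. The names truncated_copy and truncate_in_place_inlined are invented for this sketch, and truncate is repeated only so the example compiles on its own:

```rust
// The borrowing truncation from the answer, repeated so this sketch is self-contained.
fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

// Hypothetical helper: returns an owned, truncated copy and leaves the input untouched.
fn truncated_copy(s: &str, max_chars: usize) -> String {
    truncate(s, max_chars).to_owned()
}

// Hypothetical inlined in-place variant: find the byte index where the
// character at position `max_chars` starts, then cut the String there.
fn truncate_in_place_inlined(s: &mut String, max_chars: usize) {
    let cut = s.char_indices().nth(max_chars).map(|(idx, _)| idx);
    if let Some(idx) = cut {
        s.truncate(idx);
    }
}

fn main() {
    let original = String::from("ボルテックス");

    // The copy is shortened; the original is unchanged.
    let copy = truncated_copy(&original, 4);
    assert_eq!(copy, "ボルテッ");
    assert_eq!(original, "ボルテックス");

    // The in-place variant shortens the String itself.
    let mut s = original.clone();
    truncate_in_place_inlined(&mut s, 2);
    assert_eq!(s, "ボル");
}
```

Returning a slice by default keeps the allocation decision with the caller, who pays for a copy only when an owned String is actually needed.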
