How can I truncate a string to have at most N characters?

The expected approach, String::truncate(usize), fails because it doesn't consider Unicode characters (which is baffling considering Rust treats strings as Unicode).

let mut s = "ボルテックス".to_string();
s.truncate(4);

thread '' panicked at 'assertion failed: self.is_char_boundary(new_len)'

Additionally, truncate modifies the original string, which is not always desired.

The best I've come up with is to convert to chars and collect into a String.

fn truncate(s: String, max_width: usize) -> String {
    s.chars().take(max_width).collect()
}

e.g.

fn main() {
    assert_eq!(truncate("ボルテックス".to_string(), 0), "");
    assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
    assert_eq!(truncate("hello".to_string(), 4), "hell");
}

However, this feels very heavy-handed.

Best answer

Make sure you read and understand delnan's point:

Unicode is freaking complicated. Are you sure you want char (which corresponds to code points) as unit and not grapheme clusters?

The rest of this answer assumes you have a good reason for using char and not graphemes.
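To make delnan's point concrete, here is a minimal sketch using only the standard library. The helper name take_chars is hypothetical and simply wraps the chars().take(n) approach from the question. It shows one user-perceived character that is made of two code points, so cutting by char can separate a letter from its accent. Grapheme-aware truncation would need an external crate (unicode-segmentation is one option) and is not shown here.

```rust
// Hypothetical helper: keep at most `n` code points (`char`s) of `s`.
fn take_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single
    // accented letter, but it is two `char`s.
    let s = "e\u{301}a";
    assert_eq!(s.chars().count(), 3);

    // Keeping one `char` keeps the base letter and drops its accent.
    assert_eq!(take_chars(s, 1), "e");

    // Keeping two `char`s keeps the whole accented letter.
    assert_eq!(take_chars(s, 2), "e\u{301}");
}
```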

which is baffling considering Rust treats strings as Unicode

This is not quite right; Rust strings are encoded as UTF-8, and String::truncate takes a length in bytes, not characters. In UTF-8, each code point is encoded as one to four bytes. There's no O(1) algorithm to convert "6 characters" to "N bytes", so the standard library doesn't hide that cost from you.
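As a rough illustration of why the original call panicked, the sketch below checks byte lengths and character boundaries for the string from the question. The helper prefix_byte_len is a hypothetical name introduced here; it walks the string to add up the encoded size of the first n characters, which is the linear-time work the paragraph above refers to.

```rust
// Hypothetical helper: number of bytes occupied by the first `n_chars`
// code points of `s`. This has to walk the string, so it is O(n).
fn prefix_byte_len(s: &str, n_chars: usize) -> usize {
    s.chars().take(n_chars).map(char::len_utf8).sum()
}

fn main() {
    let s = "ボルテックス";

    // Six code points, each encoded as three bytes in UTF-8.
    assert_eq!(s.chars().count(), 6);
    assert_eq!(s.len(), 18);

    // Byte offset 4 falls inside the second character, which is why
    // `s.truncate(4)` panics; offsets 3 and 6 are valid boundaries.
    assert!(!s.is_char_boundary(4));
    assert!(s.is_char_boundary(3));
    assert!(s.is_char_boundary(6));

    // "4 characters" corresponds to 12 bytes here, but to 4 bytes for ASCII.
    assert_eq!(prefix_byte_len(s, 4), 12);
    assert_eq!(prefix_byte_len("hello", 4), 4);
}
```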

You can use char_indices to step through the string character by character and get the byte index of that character:

fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

fn main() {
    assert_eq!(truncate("ボルテックス", 0), "");
    assert_eq!(truncate("ボルテックス", 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス", 100), "ボルテックス");
    assert_eq!(truncate("hello", 4), "hell");
}

This also returns a slice that you can choose to move into a new allocation if you need to, or mutate a String in place:

// May not be as efficient as inlining the code...
fn truncate_in_place(s: &mut String, max_chars: usize) {
    let bytes = truncate(&s, max_chars).len();
    s.truncate(bytes);
}

fn main() {
    let mut s = "ボルテックス".to_string();
    truncate_in_place(&mut s, 0);
    assert_eq!(s, "");
}
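For the other option, moving the slice into a new allocation, a sketch could look like the following. The function truncated_copy is a hypothetical name, not part of the standard library; truncate is repeated from above so that the block compiles on its own.

```rust
// Repeated from the answer: borrow at most `max_chars` code points of `s`.
fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

// Hypothetical helper: copy the truncated slice into a new `String`,
// leaving the original untouched.
fn truncated_copy(s: &str, max_chars: usize) -> String {
    truncate(s, max_chars).to_owned()
}

fn main() {
    let original = String::from("ボルテックス");
    let short = truncated_copy(&original, 4);

    assert_eq!(short, "ボルテッ");
    // The source string is still intact.
    assert_eq!(original, "ボルテックス");
}
```

This allocates only when an owned value is actually needed; callers that just want to read the prefix can keep using the borrowed slice.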
