The expected approach of String.truncate(usize) fails because it doesn't consider Unicode characters (which is baffling considering Rust treats strings as Unicode).
    let mut s = "ボルテックス".to_string();
    s.truncate(4);

    thread '' panicked at 'assertion failed: self.is_char_boundary(new_len)'
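The panic comes from the unit of the argument. `String::truncate` takes a new length in bytes, not in `char`s, and panics if that length does not fall on a char boundary. Each katakana in "ボルテックス" takes 3 bytes in UTF-8, so byte 4 lands inside ル. Below is a minimal sketch of a byte-limited truncation that backs off to the previous boundary instead of panicking. `truncate_to_byte_limit` is a hypothetical helper, and it solves a different problem (a byte budget) than the char-count truncation asked about here.

```rust
/// Shortens `s` to at most `max_bytes` bytes of UTF-8, moving the cut
/// backwards to the nearest char boundary so `String::truncate` cannot panic.
fn truncate_to_byte_limit(s: &mut String, max_bytes: usize) {
    if max_bytes >= s.len() {
        return; // already short enough, nothing to cut
    }
    let mut end = max_bytes;
    // Byte 0 is always a char boundary, so this loop always terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s.truncate(end);
}
```

With the question's string (18 bytes, 6 chars), a 4-byte limit keeps only ボ (bytes 0..3), and a 6-byte limit keeps ボル.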
Additionally, truncate modifies the original string, which is not always desired.
The best I've come up with is to convert to chars and collect into a String.
    fn truncate(s: String, max_width: usize) -> String {
        s.chars().take(max_width).collect()
    }

For example:

    fn main() {
        assert_eq!(truncate("ボルテックス".to_string(), 0), "");
        assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
        assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
        assert_eq!(truncate("hello".to_string(), 4), "hell");
    }

However, this feels very heavy-handed.
Best answer
Make sure you read and understand delnan's point:
Unicode is freaking complicated. Are you sure you want char (which corresponds to code points) as unit and not grapheme clusters?
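A small illustration of that point, using the question's `chars().take(n)` approach (wrapped in a hypothetical `first_n_chars`): a `char` is one code point, and some things a reader sees as a single character are several code points. Truncating by `char` can split them, for example dropping a combining accent or an emoji skin-tone modifier. Counting grapheme clusters needs Unicode segmentation rules that the standard library does not ship; the `unicode-segmentation` crate is a common choice, but this sketch stays std-only and the constant names are illustrative.

```rust
/// The question's approach: keep the first `n` code points (`char`s).
fn first_n_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// "Amélie" spelled with a decomposed é: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
/// Six visible letters, seven `char`s.
const AMELIE_DECOMPOSED: &str = "Ame\u{301}lie";

/// A thumbs-up emoji followed by the U+1F3FD skin-tone modifier.
/// One visible emoji, two `char`s.
const THUMBS_UP_TONED: &str = "\u{1F44D}\u{1F3FD}";
```

Taking the first 3 chars of `AMELIE_DECOMPOSED` yields "Ame" with the accent gone, and taking the first char of `THUMBS_UP_TONED` yields the untoned thumbs-up.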
The rest of this answer assumes you have a good reason for using char and not graphemes.
which is baffling considering Rust treats strings as Unicode
This is not correct; Rust treats strings as UTF-8. In UTF-8, every code point is mapped to a variable number of bytes. There's no O(1) algorithm to convert "6 characters" to "N bytes", so the standard library doesn't hide that from you.
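To make "a variable number of bytes" concrete, here is a sketch that lists, for each `char`, the byte offset where it starts (from `str::char_indices`) and its encoded width (from `char::len_utf8`). The helper name `byte_layout` is hypothetical. Because widths range from 1 to 4 bytes, the byte offset of the nth `char` can only be found by walking from the start of the string, which is O(n).

```rust
/// For each `char` in `s`, returns (starting byte offset, the char, UTF-8 width in bytes).
fn byte_layout(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices()
        .map(|(offset, c)| (offset, c, c.len_utf8()))
        .collect()
}
```

For "a\u{e9}\u{20ac}\u{1F600}" (a, é, €, 😀) the widths are 1, 2, 3 and 4 bytes, so the chars start at byte offsets 0, 1, 3 and 6, and `len()` reports 10 bytes for 4 chars.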
You can use char_indices to step through the string character by character and get the byte index of that character:
    fn truncate(s: &str, max_chars: usize) -> &str {
        match s.char_indices().nth(max_chars) {
            None => s,
            Some((idx, _)) => &s[..idx],
        }
    }

    fn main() {
        assert_eq!(truncate("ボルテックス", 0), "");
        assert_eq!(truncate("ボルテックス", 4), "ボルテッ");
        assert_eq!(truncate("ボルテックス", 100), "ボルテックス");
        assert_eq!(truncate("hello", 4), "hell");
    }

This also returns a slice that you can choose to move into a new allocation if you need to, or mutate a String in place:
    // May not be as efficient as inlining the code...
    fn truncate_in_place(s: &mut String, max_chars: usize) {
        let bytes = truncate(&s, max_chars).len();
        s.truncate(bytes);
    }

    fn main() {
        let mut s = "ボルテックス".to_string();
        truncate_in_place(&mut s, 0);
        assert_eq!(s, "");
    }
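Two variations on the code above, sketched under hypothetical names that are not from the answer. `truncated_copy` moves the slice into a new allocation and leaves the input untouched, which addresses the question's concern that `truncate` mutates the original. `truncate_in_place_inlined` folds the `char_indices` lookup directly into the mutating function, along the lines the comment hints at. Both still scan the string once from the front; whether the inlined form is measurably faster has not been checked here.

```rust
/// The answer's slice-returning truncation, repeated so this block stands alone.
fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

/// Owned copy of at most `max_chars` chars; `s` itself is not modified.
fn truncated_copy(s: &str, max_chars: usize) -> String {
    truncate(s, max_chars).to_owned()
}

/// In-place truncation with the byte-offset lookup inlined.
fn truncate_in_place_inlined(s: &mut String, max_chars: usize) {
    // Byte offset where char number `max_chars` starts; `None` if the string is shorter.
    let cut = s.char_indices().nth(max_chars).map(|(idx, _)| idx);
    if let Some(idx) = cut {
        // `idx` came from `char_indices`, so it is a char boundary and `truncate` won't panic.
        s.truncate(idx);
    }
}
```

Passing `&original` where `original` is a `String` works because `&String` coerces to `&str`.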