Getting a u8string_view from a char array without violating strict aliasing

Updated: 2024-10-27 21:25:06
Problem description
  • I have a blob of binary data in memory, represented as a char* (maybe read from a file, or transmitted over the network).
  • I know that it contains a UTF8-encoded text field of a certain length at a certain offset.

How can I (safely and portably) get a u8string_view to represent the contents of this text field?

The motivation for passing the field to down-stream code as a u8string_view is:

  • It very clearly communicates that the text field is UTF8-encoded, unlike string_view.
  • It avoids the cost (likely free-store allocation + copying) of returning it as u8string.

The naive way to do this would be:

    char* data = ...;
    size_t field_offset = ...;
    size_t field_length = ...;

    char8_t* field_ptr = reinterpret_cast<char8_t*>(data + field_offset);
    u8string_view field(field_ptr, field_length);

However, if I understand the C++ strict-aliasing rules correctly, this is undefined behavior because it accesses the contents of the char* buffer via the char8_t* pointer returned by reinterpret_cast, and char8_t is not an aliasing type.

Is that true?

Is there a safe way to do this?

Recommended answer

This same problem occasionally occurs in other contexts too, for example when using shared memory.

A trick to create objects using bits in "raw" memory without allocating memory is to create a local object by memcpy, and then create a dynamic copy of that local object over the "raw" memory. Example:

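    // Uses data, field_offset and field_length from the question.
    // Needs <cstddef>, <cstring>, <new> and <string_view>.
    char* begin_raw = data + field_offset;
    char8_t* begin = nullptr;  // stays null when field_length is 0
    for (std::size_t i = 0; i < field_length; ++i) {
        char* current = begin_raw + i;
        char8_t local {};
        std::memcpy(&local, current, sizeof local);       // copy the byte into a local object
        char8_t* created = new (current) char8_t(local);  // create a char8_t over the same byte
        if (i == 0) begin = created;                      // remember the first element
    }
    std::u8string_view field(begin, field_length);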

Before you object that you don't want to copy, notice that the end result causes no changes to the representation of the "raw" memory. The compiler can notice this too, and can compile the entire loop into zero instructions (in my tests GCC and Clang achieve this with -O2). All that we have done is satisfy the object lifetime rules of the language by creating dynamic objects in that memory.

Published: 2023-10-09 02:13:32
Article link: https://www.elefans.com/category/jswz/34/1474414.html