Will casting around sockaddr_storage and sockaddr_in break strict aliasing?

Updated: 2024-10-27 21:21:28

Problem description

Following my previous question, I'm really curious about this code -

case AF_INET:
{
    struct sockaddr_in * tmp =
        reinterpret_cast<struct sockaddr_in *> (&addrStruct);
    tmp->sin_family = AF_INET;
    tmp->sin_port = htons(port);
    inet_pton(AF_INET, addr, &tmp->sin_addr);  // inet_pton takes a pointer to the destination
}
break;

Before asking this question, I searched SO on the same topic and got mixed responses. For example, see this, this and this post, which say that it is somehow safe to use this kind of code. There's also another post that says to use unions for such a task, but again the comments on the accepted answer beg to differ.

Microsoft's documentation on the same structure says -

Application developers normally use only the ss_family member of the SOCKADDR_STORAGE. The remaining members ensure that the SOCKADDR_STORAGE can contain either an IPv6 or IPv4 address and the structure is padded appropriately to achieve 64-bit alignment. Such alignment enables protocol-specific socket address data structures to access fields within a SOCKADDR_STORAGE structure without alignment problems. With its padding, the SOCKADDR_STORAGE structure is 128 bytes in length.

The Open Group's documentation states -

The header shall define the sockaddr_storage structure. This structure shall be:

Large enough to accommodate all supported protocol-specific address structures

Aligned at an appropriate boundary so that pointers to it can be cast as pointers to protocol-specific address structures and used to access the fields of those structures without alignment problems

The socket man page says the same -

In addition, the sockets API provides the data type struct sockaddr_storage. This type is suitable to accommodate all supported domain-specific socket address structures; it is large enough and is aligned properly. (In particular, it is large enough to hold IPv6 socket addresses.)

I've seen multiple implementations using such casts in both C and C++ in the wild, and now I'm unsure which is right, since some posts contradict the above claims - this and this.

So which is the safe and right way to fill a sockaddr_storage structure? Are these pointer casts safe? Or the union method? I'm also aware of the getaddrinfo() call, but that seems a little complicated for the above task of just filling the structs. There is one other recommended way, with memcpy; is this safe?

Recommended answer

C and C++ compilers have become much more sophisticated in the past decade than they were when the sockaddr interfaces were designed, or even when C99 was written. As part of that, the understood purpose of "undefined behavior" has changed. Back in the day, undefined behavior was usually intended to cover disagreement among hardware implementations as to what the semantics of an operation was. But nowadays, thanks ultimately to a number of organizations who wanted to stop having to write FORTRAN and could afford to pay compiler engineers to make that happen, undefined behavior is a thing that compilers use to make inferences about the code. Left shift is a good example: C99 6.5.7p3,4 (rearranged a little for clarity) reads

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If the value of [E2] is negative or is greater than or equal to the width of the promoted [E1], the behavior is undefined.

So, for instance, 1u << 33 is UB on a platform where unsigned int is 32 bits wide. The committee made this undefined because different CPU architectures' left-shift instructions do different things in this case: some produce zero consistently, some reduce the shift count modulo the width of the type (x86), some reduce the shift count modulo some larger number (ARM), and at least one historically-common architecture would trap (I don't know which one, but that's why it's undefined and not unspecified). But nowadays, if you write

unsigned int left_shift(unsigned int x, unsigned int y)
{
    return x << y;
}

on a platform with 32-bit unsigned int, the compiler, knowing the above UB rule, will infer that y must have a value in the range 0 through 31 when the function is called. It will feed that range into interprocedural analysis, and use it to do things like remove unnecessary range checks in the callers. If the programmer has reason to think they aren't unnecessary, well, now you begin to see why this topic is such a can of worms.
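If the range check matters to you, one option is to put it in front of the shift so the undefined case is never reached. This is only a minimal sketch: checked_left_shift is a hypothetical name, returning 0 for oversized counts is a choice made here rather than anything C mandates, and the width computation assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Hypothetical helper (not from the original answer): a left shift that is
   defined for every value of y. */
unsigned int checked_left_shift(unsigned int x, unsigned int y)
{
    /* Assumes unsigned int has no padding bits. */
    const unsigned int width = sizeof(unsigned int) * CHAR_BIT;

    if (y >= width)
        return 0;      /* chosen result for this sketch; avoids the UB case */
    return x << y;     /* here 0 <= y < width, so the shift is defined */
}
```

Because the check is part of the same function as the shift, the compiler has no undefined path to reason from, so it cannot use the UB rule to discard the check.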

For more on this change in the purpose of undefined behavior, please see the LLVM people's three-part essay on the subject (1 2 3).

Now that you understand that, I can actually answer your question.

These are the definitions of struct sockaddr, struct sockaddr_in, and struct sockaddr_storage, after eliding some irrelevant complications:

struct sockaddr {
    uint16_t sa_family;
};

struct sockaddr_in {
    uint16_t sin_family;
    uint16_t sin_port;
    uint32_t sin_addr;
};

struct sockaddr_storage {
    uint16_t ss_family;
    char __ss_storage[128 - (sizeof(uint16_t) + sizeof(unsigned long))];
    unsigned long int __ss_force_alignment;
};

This is poor man's subclassing. It is a ubiquitous idiom in C. You define a set of structures that all have the same initial field, which is a code number that tells you which structure you've actually been passed. Back in the day, everyone expected that if you allocated and filled in a struct sockaddr_in, upcast it to struct sockaddr, and passed it to e.g. connect, the implementation of connect could dereference the struct sockaddr pointer safely to retrieve the sa_family field, learn that it was looking at a sockaddr_in, cast it back, and proceed. The C standard has always said that dereferencing the struct sockaddr pointer triggers undefined behavior—those rules are unchanged since C89—but everyone expected that it would be safe in this case because it would be the same "load 16 bits" instruction no matter which structure you were really working with. That's why POSIX and the Windows documentation talk about alignment; the people who wrote those specs, back in the 1990s, thought that the primary way this could actually be trouble was if you wound up issuing a misaligned memory access.

But the text of the standard doesn't say anything about load instructions, nor alignment. This is what it says (C99 §6.5p7 + footnote):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 73)

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.

73) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
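To make the list concrete, here is a small sketch of two accesses it does allow. The function names are hypothetical, and float_bits assumes that float and uint32_t have the same size. Reading through a character type is the last item in the list, and memcpy copies the bytes without ever accessing the float through a uint32_t lvalue.

```c
#include <stdint.h>
#include <string.h>

/* Permitted: a character type may be used to access any object. */
unsigned char first_byte(const uint32_t *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[0];
}

/* Permitted: memcpy copies the object representation byte by byte.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Not permitted by the list above: *(uint32_t *)&f, because uint32_t is
   not compatible with float, the effective type of f. */
```

Compilers typically turn a fixed-size memcpy like this into a plain register move, so the defined version usually costs nothing extra.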

struct types are "compatible" only with themselves, and the "effective type" of a declared variable is its declared type. So the code you showed...

struct sockaddr_storage addrStruct;
/* ... */
case AF_INET:
{
    struct sockaddr_in * tmp = (struct sockaddr_in *)&addrStruct;
    tmp->sin_family = AF_INET;
    tmp->sin_port = htons(port);
    inet_pton(AF_INET, addr, &tmp->sin_addr);
}
break;

... has undefined behavior, and compilers can make inferences from that, even though naive code generation would behave as expected. What a modern compiler is likely to infer from this is that the case AF_INET can never be executed. It will delete the entire block as dead code, and hilarity will ensue.

So how do you work with sockaddr safely? The shortest answer is "just use getaddrinfo and getnameinfo." They deal with this problem for you.
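For the "just fill the struct" case in the question, getaddrinfo can be asked to parse numeric text only, which skips DNS and service lookups. The following is a rough sketch, not the only way to do it: numeric_to_sockaddr is a hypothetical name, and copying the result into a caller-supplied sockaddr_storage is a convenience chosen here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo under strict -std= modes */
#endif
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parse a numeric host and port into *out.
   Returns 0 on success, or a getaddrinfo error code (see gai_strerror). */
int numeric_to_sockaddr(const char *host, const char *port,
                        struct sockaddr_storage *out, socklen_t *outlen)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;                       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no name lookups */

    int rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;

    /* Byte copy; sockaddr_storage is large enough for any family. */
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}
```

A caller can then pass (struct sockaddr *)out and *outlen straight to connect or bind without inspecting the contents itself.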

But maybe you need to work with an address family, such as AF_UNIX, that getaddrinfo doesn't handle. In most cases you can just declare a variable of the correct type for the address family, and cast it only when calling functions that take a struct sockaddr *:

#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int connect_to_unix_socket(const char *path, int type)
{
    struct sockaddr_un sun;
    size_t plen = strlen(path);
    if (plen >= sizeof(sun.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    sun.sun_family = AF_UNIX;
    memcpy(sun.sun_path, path, plen+1);

    int sock = socket(AF_UNIX, type, 0);
    if (sock == -1)
        return -1;

    /* The cast happens only at the call site; sun is never accessed
       through the struct sockaddr pointer in this code. */
    if (connect(sock, (struct sockaddr *)&sun,
                offsetof(struct sockaddr_un, sun_path) + plen)) {
        int save_errno = errno;
        close(sock);
        errno = save_errno;
        return -1;
    }
    return sock;
}

The implementation of connect has to jump through some hoops to make this safe, but that is Not Your Problem.
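The same "declare the correct type" approach should carry over to the IPv4 case from the question. The sketch below fills a real struct sockaddr_in and, only if a sockaddr_storage is actually required, copies the bytes across with memcpy, which as far as I can tell avoids the aliasing problem because nothing is accessed through a mismatched pointer type. fill_ipv4 is a hypothetical helper, not part of any API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 0 on success, -1 if addr is not valid
   dotted-quad IPv4 text. */
int fill_ipv4(struct sockaddr_storage *out, const char *addr, uint16_t port)
{
    struct sockaddr_in sin;           /* declared type matches how it is used */

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
        return -1;

    memset(out, 0, sizeof *out);
    memcpy(out, &sin, sizeof sin);    /* byte copy, no cast-and-dereference */
    return 0;
}
```

In many programs the sockaddr_storage step is unnecessary: &sin can be cast to struct sockaddr * at the call to connect or bind, exactly as in the AF_UNIX example.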

Contra the other answer, there is one case where you might want to use sockaddr_storage: in conjunction with getpeername and getnameinfo, in a server that needs to handle both IPv4 and IPv6 addresses. It is a convenient way to know how big a buffer to allocate.

#include <netdb.h>
#include <stdlib.h>
#include <sys/socket.h>

#ifndef NI_IDN
#define NI_IDN 0
#endif

/* MAX_HOSTNAME_LEN is assumed to be defined elsewhere. */
char *get_peer_hostname(int sock)
{
    char addrbuf[sizeof(struct sockaddr_storage)];
    socklen_t addrlen = sizeof addrbuf;

    if (getpeername(sock, (struct sockaddr *)addrbuf, &addrlen))
        return 0;

    char *peer_hostname = malloc(MAX_HOSTNAME_LEN+1);
    if (!peer_hostname)
        return 0;

    if (getnameinfo((struct sockaddr *)addrbuf, addrlen,
                    peer_hostname, MAX_HOSTNAME_LEN+1,
                    0, 0, NI_IDN)) {
        free(peer_hostname);
        return 0;
    }
    return peer_hostname;
}

(I could just as well have written struct sockaddr_storage addrbuf, but I wanted to emphasize that I never actually need to access the contents of addrbuf directly.)

A final note: if the BSD folks had defined the sockaddr structures just a little bit differently ...

struct sockaddr {
    uint16_t sa_family;
};

struct sockaddr_in {
    struct sockaddr sin_base;
    uint16_t sin_port;
    uint32_t sin_addr;
};

struct sockaddr_storage {
    struct sockaddr ss_base;
    char __ss_storage[128 - (sizeof(uint16_t) + sizeof(unsigned long))];
    unsigned long int __ss_force_alignment;
};

... upcasts and downcasts would have been perfectly well-defined, thanks to the "aggregate or union that includes one of the aforementioned types" rule. If you're wondering how you should deal with this problem in new C code, here you go.
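As an illustration of that idiom in new code, here is a small sketch unrelated to sockets. All the names (shape, circle, rect, shape_area) are hypothetical. Each derived struct embeds the base struct as its first member, so a pointer to the base member can be converted back to a pointer to the enclosing struct.

```c
#include <stdint.h>

/* All names here are hypothetical; they only illustrate the idiom. */
enum shape_kind { SHAPE_CIRCLE = 1, SHAPE_RECT = 2 };

struct shape  { uint16_t kind; };                      /* common base */
struct circle { struct shape base; double radius; };
struct rect   { struct shape base; double width, height; };

/* Dispatch on the tag stored in the embedded base. The downcasts are valid
   only when s really points at the base member of the matching struct. */
double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE: {
        const struct circle *c = (const struct circle *)s;
        return 3.141592653589793 * c->radius * c->radius;
    }
    case SHAPE_RECT: {
        const struct rect *r = (const struct rect *)s;
        return r->width * r->height;
    }
    default:
        return -1.0;   /* unknown tag; sentinel chosen for this sketch */
    }
}
```

Callers upcast by taking the address of the embedded member, for example shape_area(&r.base), which needs no cast at all.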

Published: 2023-10-09 02:12:22

Article link: https://www.elefans.com/category/jswz/34/1474410.html