How to support Chinese in the HTTP request body?

Updated: 2024-10-27 18:23:32

Problem description

URL = example,
Header = [],
Type = "application/json",
Content = "我是中文",
Body = lists:concat(["{\"type\":\"0\",\"result\":[{\"url\":\"test\",\"content\":\"", unicode:characters_to_list(Content), "\"}]}"]),
lager:debug("URL:~p, Body:~p~n", [URL, Body]),
HTTPOptions = [],
Options = [],
Response = httpc:request(post, {URL, Header, Type, Body}, HTTPOptions, Options),

How can I solve this problem?

Recommended answer

Luck of the Encoding

You must take special care to ensure input is what you think it is because it may differ from what you expect.

This answer applies to the Erlang release that I'm running which is R16B03-1. I'll try to get all of the details in here so you can test with your own install and verify.

If you don't take specific action to change it, a string will be interpreted as follows:

TerminalContent = "我是中文",
TerminalContent = [25105,26159,20013,25991].

In the terminal the string is interpreted as a list of unicode characters.

BytewiseContent = "我是中文",
BytewiseContent = [230,136,145,230,152,175,228,184,173,230,150,135].

In a module, the default encoding is latin1, and strings containing unicode characters are interpreted as bytewise lists (of UTF-8 bytes).

If you use data encoded like BytewiseContent, unicode:characters_to_list/1 will double-encode the Chinese characters and ææ¯ä will be sent to the server where you expected 我是中文.
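As a minimal sketch of how double encoding can arise (the module and function names are hypothetical, and this is one possible path rather than necessarily the exact one taken in the question), the following takes the UTF-8 bytes of the first character of the example string and encodes them a second time with unicode:characters_to_binary/1. It assumes nothing beyond the stdlib unicode module.

```erlang
-module(double_encode_demo).
-export([utf8_bytes/0, double_encoded/0, decoded_once/0]).

%% UTF-8 bytes of the first character of the example string (U+6211),
%% taken from the BytewiseContent list above.
utf8_bytes() ->
    [230,136,145].

%% Each byte is treated as a code point and encoded again, so every
%% byte >= 128 becomes two bytes: this is the double encoding.
double_encoded() ->
    unicode:characters_to_binary(utf8_bytes()).

%% Packing the bytes into a binary first lets the unicode module decode
%% them as UTF-8, recovering the single code point.
decoded_once() ->
    unicode:characters_to_list(list_to_binary(utf8_bytes())).
```

Here double_encoded() yields <<195,166,194,136,194,145>>, six bytes where three were expected, while decoded_once() yields [25105].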

Specify the encoding for each source file and term file.

If you run an erl command line, ensure it is set up to use unicode.

If you read data from files, translate the bytes from the bytewise encoding to unicode before processing (this goes for binary data acquired using httpc:request/N as well).
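The last point can be sketched as follows. This is an illustrative helper, not part of the original answer: the module name and the error atoms are hypothetical, and it assumes the incoming bytes are meant to be UTF-8.

```erlang
-module(bytes_to_chars_demo).
-export([to_chars/1]).

%% Convert UTF-8 encoded bytes (for example a file's contents or an
%% HTTP response body) into a list of unicode code points.
%% The error atoms below are hypothetical names chosen for this sketch.
to_chars(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        {error, _Decoded, _Rest} ->
            %% the input contains bytes that are not valid UTF-8
            {error, invalid_utf8};
        {incomplete, _Decoded, _Rest} ->
            %% the input ends in the middle of a multi-byte sequence
            {error, truncated_utf8}
    end.
```

For a file, something like {ok, Bin} = file:read_file(Path) followed by to_chars(Bin) would apply the same conversion before any further processing.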

If you embed unicode characters in your module, ensure that you indicate as much by commenting within the first two lines of your module:

%% -*- coding: utf-8 -*-

This will change the way the module interprets the string such that:

UnicodeContent = "我是中文",
UnicodeContent = [25105,26159,20013,25991].
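Put together, a module using the coding comment might look like the sketch below. The module and function names are hypothetical, and the source file itself is assumed to be saved as UTF-8.

```erlang
%% -*- coding: utf-8 -*-
-module(coding_demo).
-export([content/0, is_code_points/0]).

%% With the coding comment above, this literal is read as four
%% unicode code points rather than twelve UTF-8 bytes.
content() ->
    "我是中文".

%% Compare against the code points listed in the answer.
is_code_points() ->
    content() =:= [25105,26159,20013,25991].
```

If is_code_points() returns false, the compiler most likely read the file with a different encoding than the one it was saved in.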

Once you have ensured that you are concatenating characters and not bytes, the concatenation is safe. Don't use unicode:characters_to_list/1 to convert your string/list until the whole thing has been built up.

The following function works as expected when given a Url and Content, a list of unicode characters:

http_post_content(Url, Content) ->
    ContentType = "application/json",
    %% Concat the list of (character) lists
    Body = lists:concat(["{\"content\":\"", Content, "\"}"]),
    %% Explicitly encode to UTF8 before sending
    UnicodeBin = unicode:characters_to_binary(Body),
    httpc:request(post,
        {
            Url,
            [],          % HTTP headers
            ContentType, % content-type
            UnicodeBin   % the body as binary (UTF8)
        },
        [], % HTTP Options
        [{body_format,binary}] % return the response body as a binary
    ).

To verify results I wrote the following HTTP server using node.js and express. The sole purpose of this dead-simple server is to sanity check the problem and solution.

var express = require('express'),
    bodyParser = require('body-parser'),
    util = require('util');

var app = express();

app.use(bodyParser());

app.get('/', function(req, res){
    res.send('You probably want to perform an HTTP POST');
});

app.post('/', function(req, res){
    util.log("body: "+util.inspect(req.body, false, 99));
    res.json(req.body);
});

app.listen(3000);

Again in Erlang, the following function will check to ensure that the HTTP response contains the echoed JSON, and ensures the exact unicode characters were returned.

verify_response({ok, {{_, 200, _}, _, Response}}, SentContent) ->
    %% use jiffy to decode the JSON response
    {Props} = jiffy:decode(Response),
    %% pull out the "content" property value
    ContentBin = proplists:get_value(<<"content">>, Props),
    %% convert the binary value to unicode characters,
    %% it should equal what we sent.
    case unicode:characters_to_list(ContentBin) of
        SentContent -> ok;
        Other ->
            {error, [
                {expected, SentContent},
                {received, Other}
            ]}
    end;
verify_response(Unexpected, _) ->
    {error, {http_request_failed, Unexpected}}.

The complete example.erl module is posted in a Gist.

Once you've got the example module compiled and an echo server running you'll want to run something like this in an Erlang shell:

inets:start().
Url = example:url().
Content = example:content().
Response = example:http_post_content(Url, Content).

If you've got jiffy set up you can also verify the content made the round trip:

example:verify_response(Response, Content).

You should now be able to confirm round-trip encoding of any unicode content.

While I explained the encodings above you will have noticed that TerminalContent, BytewiseContent, and UnicodeContent are all lists of integers. You should endeavor to code in a manner that allows you to be certain what you have in hand.

The oddball encoding is bytewise which may turn up when working with modules that are not "unicode aware". Erlang's guidance on working with unicode mentions this near the bottom under the heading Lists of UTF-8 Bytes. To translate bytewise lists use:

%% from www.erlang/doc/apps/stdlib/unicode_usage.html
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).
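To make the conversion concrete, here is a small sketch that applies the helper to the BytewiseContent list from earlier and checks that it converts back again. The module name and roundtrip/0 are hypothetical additions for illustration only.

```erlang
-module(bytewise_demo).
-export([utf8_list_to_string/1, roundtrip/0]).

%% Same helper as above: pack the bytes, then decode them as UTF-8.
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).

roundtrip() ->
    %% the bytewise (UTF-8 byte) form of the example string
    Bytewise = [230,136,145,230,152,175,228,184,173,230,150,135],
    Chars = utf8_list_to_string(Bytewise),
    %% Going back: encode to UTF-8 and unpack the bytes again.
    %% The match fails loudly if the round trip is not lossless.
    Bytewise = binary_to_list(unicode:characters_to_binary(Chars)),
    Chars.
```

Calling bytewise_demo:roundtrip() returns [25105,26159,20013,25991], the same code points as TerminalContent and UnicodeContent.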

My Setup

As far as I know, I don't have local settings that modify Erlang's behavior. My Erlang is R16B03-1, built and distributed by Erlang Solutions; my machine runs OS X 10.9.2.