Extract specific data from a webpage

Updated: 2024-10-27 02:20:30

Basically, this is my code:

    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <curl/curl.h>
    #include <windows.h>  // Sleep

    // Write callback; its definition is not shown in the question.
    size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata);

    int main()
    {
        CURL *curl;
        FILE *fp;
        CURLcode res;
        std::string readBuffer;  // declared but never used
        curl = curl_easy_init();
        char outfilename[FILENAME_MAX] = "C:\\Users\\admin\\desktop\\test.txt";
        if (curl) {
            fp = fopen(outfilename, "wb");
            curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com");
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "user=123&pass=123");
            curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // option takes a long
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
            res = curl_easy_perform(curl);
            Sleep(1000);
            curl_easy_cleanup(curl);
            fclose(fp);
        }
        return EXIT_SUCCESS;
    }

The output is successfully saved in the text file.

My question is how to extract the content between specific tags.

For example, I want only the content between <bla> ... </bla>.

What is the easiest way? Thank you.

Best answer

In your example, you are dumping the response from the website to a file. libcurl writes the data returned by the page exactly as it was received; it makes no attempt to parse or restructure it.

You can receive the data in memory instead by defining the write_data callback yourself. It only needs to have the following signature:

size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata);

Once the data is in memory, you can parse it and restructure it as required. See the linked example for how to use the write_data function.
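
As a minimal sketch of such a callback, the version below appends each chunk it receives to a std::string. It assumes that the pointer registered through CURLOPT_WRITEDATA is a std::string* (the question passes a FILE* instead), and it shows the registration calls only as comments so that the snippet compiles without the libcurl headers.

```cpp
#include <cstddef>
#include <string>

// Write callback with the signature libcurl expects. libcurl may call it
// several times per response, each time with the next chunk of the body.
// Assumption: userdata is the std::string* registered via CURLOPT_WRITEDATA.
size_t write_data(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    const size_t bytes = size * nmemb;
    std::string *buffer = static_cast<std::string *>(userdata);
    buffer->append(ptr, bytes);
    return bytes;  // returning a different count makes libcurl abort the transfer
}

// Registration, replacing the FILE* used in the question:
//   std::string readBuffer;
//   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
//   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
//   res = curl_easy_perform(curl);  // on CURLE_OK, readBuffer holds the body
```

Because the body can arrive in several pieces, the callback appends rather than assigns; the buffer is complete only after curl_easy_perform returns.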

For XML parsing, you can use the linked sample code.
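
If the response is already held in a std::string and the tag is simple, a plain substring search may be enough for the <bla> ... </bla> case from the question. The sketch below is not an XML or HTML parser: the helper name extract_between is hypothetical, it returns only the first match, and it assumes the tags appear literally, without attributes, nesting, or whitespace variations.

```cpp
#include <cstddef>
#include <string>

// Copies the text between the first occurrence of open_tag and the next
// close_tag into out. Returns false (leaving out untouched) if either tag
// is missing.
bool extract_between(const std::string &text,
                     const std::string &open_tag,
                     const std::string &close_tag,
                     std::string &out)
{
    const std::size_t start = text.find(open_tag);
    if (start == std::string::npos) {
        return false;
    }
    const std::size_t content = start + open_tag.size();
    const std::size_t end = text.find(close_tag, content);
    if (end == std::string::npos) {
        return false;
    }
    out = text.substr(content, end - content);
    return true;
}
```

Calling extract_between(readBuffer, "<bla>", "</bla>", result) on the downloaded page would fill result with the enclosed text. For anything beyond a fixed, well-known tag, a real XML or HTML parser is the more robust choice.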

Published: 2023-07-05 14:30:00
Source: https://www.elefans.com/category/jswz/34/1038445.html