Redirect Facebook Scraper to /?_escaped_fragment_= with HTML5 history URLs (no hashbang) for AJAX content

Updated: 2024-10-26 03:24:49

If you use hashbang URLs, a la /#!/path/to/content, the Facebook scraper (as well as the Googlebot) will automatically forward to /?_escaped_fragment_=/path/to/content, where you can render content server-side for the scraper to use.
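
As a rough illustration of the mapping described above, here is a minimal sketch of how a hashbang URL corresponds to the URL the crawler requests. The helper name toEscapedFragmentUrl is hypothetical, and the sketch assumes the fragment needs no percent-escaping, which real crawlers may apply to special characters.

```javascript
// Hypothetical helper: map a hashbang URL to the URL a crawler would request.
// Assumption: the fragment contains no characters that need percent-escaping.
function toEscapedFragmentUrl(url) {
  var marker = url.indexOf('#!');
  if (marker === -1) return url; // not a hashbang URL, leave unchanged

  var base = url.slice(0, marker);
  var fragment = url.slice(marker + 2);

  // Append to an existing query string if there is one
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + '_escaped_fragment_=' + fragment;
}

console.log(toEscapedFragmentUrl('/#!/path/to/content'));
// prints "/?_escaped_fragment_=/path/to/content"
```

The server only ever sees the second form, since the part after # is never sent in an HTTP request.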

For Google, if you include a fragment meta tag (<meta name="fragment" content="!">), you can use HTML5 history style URLs (e.g., simply /path/to/content) and it will still know to redirect to the escaped fragment URL.

Facebook doesn't seem to support this. It will redirect to whatever you set the og:url meta tag to, but I'm not sure that this is proper usage of the og:url tag.

Best answer

So after talking to you today on Twitter and doing my own research, the only solution I found that suits me is as follows:

I am using Node + Express. I check the query string first for the Google crawler, but if the user agent is Facebook's scraper, I use the request URL for my fragment variable instead. Then I parse the URL and match it against one of the snapshots I created with the grunt-htmlSnapshot plugin.

    app.use(function (req, res, next) {
      // Some clients send no User-Agent header at all
      var userAgent = req.headers['user-agent'] || '';
      var fragment = req.query._escaped_fragment_;

      // The Facebook scraper sends no _escaped_fragment_, so use the
      // requested path (req.path leaves out any query string)
      if (userAgent.indexOf('facebookexternalhit') >= 0) {
        fragment = req.path;
      }

      // If there is no fragment, we're not serving a crawler
      if (!fragment) return next();

      // If the fragment is just '/', serve the index page
      if (fragment === '/') fragment = '/.html';

      // If fragment does not start with '/', prepend it
      if (fragment.charAt(0) !== '/') fragment = '/' + fragment;

      // If fragment does not end with '.html', append it
      if (!/\.html$/.test(fragment)) fragment += '.html';

      fragment = fragment.replace(/\//g, '_');

      // Serve the static HTML snapshot. A missing file is reported through
      // the callback, not thrown, so try/catch would never see it.
      // (Newer Express versions use res.sendFile with an absolute path.)
      var file = './snapshots/snapshot_' + fragment;
      res.sendfile(file, function (err) {
        if (err) res.status(404).end();
      });
    });

All my snapshots are stored in ./snapshots, and an example snapshot for the "/contact/" page is ./snapshots/snapshot__contact.html
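
To make the naming scheme easier to check, here is a minimal sketch that pulls the file-name steps of the middleware into a standalone function. The name snapshotFileName is hypothetical and not part of the original code; it only reproduces the string handling, not the file serving.

```javascript
// Hypothetical helper: turn a crawler fragment into a snapshot file name,
// following the same steps as the middleware above.
function snapshotFileName(fragment) {
  // An empty or root fragment maps to the index snapshot
  if (fragment === '' || fragment === '/') fragment = '/.html';

  // Make sure the fragment starts with '/'
  if (fragment.charAt(0) !== '/') fragment = '/' + fragment;

  // Make sure the fragment ends with '.html'
  if (!/\.html$/.test(fragment)) fragment += '.html';

  // Every '/' becomes '_' so the result is a flat file name
  return 'snapshot_' + fragment.replace(/\//g, '_');
}

console.log(snapshotFileName('/contact'));  // prints "snapshot__contact.html"
console.log(snapshotFileName('/'));         // prints "snapshot__.html"
console.log(snapshotFileName('/contact/')); // prints "snapshot__contact_.html"
```

Note that a trailing slash produces a different file name than the example above, so requests for "/contact/" may need the trailing slash stripped before the lookup. That extra step is an assumption and is not in the original code.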

This is all tested and works great!
