Redirect Facebook Scraper to /?_escaped_fragment_= with HTML5 history URLs (no hashbang) for AJAX content

If you use hashbang URLs, a la /#!/path/to/content, the Facebook scraper (as well as the Googlebot) will automatically forward to /?_escaped_fragment_=/path/to/content, where you can render content server-side for the scraper to use.
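For illustration, the rewrite a crawler applies to a hashbang URL can be sketched roughly as follows. The function name toEscapedFragmentUrl is hypothetical, and this sketch deliberately skips the percent-escaping of special characters that the AJAX crawling scheme also defines.

```javascript
// Hypothetical helper: rewrite "/#!/path" into "/?_escaped_fragment_=/path".
// Assumption: special characters in the fragment are not escaped here.
function toEscapedFragmentUrl(url) {
  var idx = url.indexOf('#!');
  if (idx === -1) return url; // not a hashbang URL, leave it unchanged

  var base = url.slice(0, idx);
  var fragment = url.slice(idx + 2);

  // Append to an existing query string if the base already has one.
  var sep = base.indexOf('?') === -1 ? '?' : '&';
  return base + sep + '_escaped_fragment_=' + fragment;
}

console.log(toEscapedFragmentUrl('/#!/path/to/content'));
// prints "/?_escaped_fragment_=/path/to/content"
```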

For Google, if you include a fragment meta tag (<meta name="fragment" content="!">), you can use HTML5 history style URLs (e.g., simply /path/to/content) and it will still know to redirect to the escaped fragment URL.

Facebook doesn't seem to support this. It will redirect to whatever you set the og:url meta tag to, but I'm not sure that this is proper usage of the og:url tag.

Best answer

So after talking to you today on Twitter and doing my own research, the only solution I found that suits me is as follows:

I am using node + express. I first check the query string for _escaped_fragment_, which is how the Google crawler shows up. If the user agent is Facebook's scraper, I use the request URL as my fragment variable instead. Then I turn the fragment into a file name and serve the matching snapshot, which I created beforehand with the grunt-htmlSnapshot plugin.

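var path = require('path');

app.use(function (req, res, next) {
  // Default to '' so a request without a User-Agent header does not throw
  var userAgent = req.headers['user-agent'] || '';
  var fragment = req.query._escaped_fragment_;

  // Facebook's scraper does not send _escaped_fragment_, so use the
  // requested path (without the query string) as the fragment instead
  if (userAgent.indexOf('facebookexternalhit') >= 0) {
    fragment = req.path;
  }

  // If there is no fragment at all, we're not serving a crawler.
  // An empty string still counts: it is a crawler asking for the index page.
  if (fragment === undefined) return next();

  // If the fragment is empty, serve the index page
  if (fragment === '' || fragment === '/') fragment = '/.html';

  // If fragment does not start with '/', prepend it
  if (fragment.charAt(0) !== '/') fragment = '/' + fragment;

  // If fragment does not contain '.html', append it
  if (fragment.indexOf('.html') === -1) fragment += '.html';

  fragment = fragment.replace(/\//g, '_');

  // Serve the static html snapshot. sendFile reports errors through its
  // callback, so a try/catch around it would not catch a missing file.
  var options = { root: path.resolve('./snapshots') };
  res.sendFile('snapshot_' + fragment, options, function (err) {
    if (err && !res.headersSent) res.status(404).end();
  });
});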

All my snapshots are stored in ./snapshots. For example, the snapshot for the "/contact/" page is ./snapshots/snapshot__contact.html
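To make the naming scheme easier to check, here is a minimal sketch that isolates the fragment-to-file-name rules from the middleware above into a standalone function. The name snapshotFileName is hypothetical; the rules themselves are copied from the answer's code.

```javascript
// Hypothetical helper mirroring the middleware's naming rules.
function snapshotFileName(fragment) {
  // Empty fragment or bare slash maps to the index snapshot
  if (fragment === '' || fragment === '/') fragment = '/.html';
  // Make sure the fragment starts with a slash
  if (fragment.charAt(0) !== '/') fragment = '/' + fragment;
  // Add the extension when the fragment has none
  if (fragment.indexOf('.html') === -1) fragment += '.html';
  // Slashes become underscores so the result is a flat file name
  return 'snapshot_' + fragment.replace(/\//g, '_');
}

console.log(snapshotFileName('/contact'));  // prints "snapshot__contact.html"
console.log(snapshotFileName('/'));         // prints "snapshot__.html"
console.log(snapshotFileName('/contact/')); // prints "snapshot__contact_.html"
```

Note that, under these rules, a trailing slash produces a different file name, so the snapshot__contact.html example presumably corresponds to a request for /contact without the trailing slash.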

This is all tested and works great!
