Where do search engines start crawling?

Updated: 2024-10-10 06:22:43

Question

What do search engine bots use as a starting point? Do they rely on DNS lookups, or do they start from a fixed list of well-known sites? Any guesses or suggestions?

Recommended answer

Your question can be interpreted in two ways:

Are you asking where search engines start their crawl in general, or where they start crawling a particular site?

I don't know how the big players work, but if you were to build your own search engine you would probably seed it with popular portal sites. DMOZ seems to be a popular starting point. Since the big players have far more data than we do, they probably start their crawls from a variety of places.
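To make the idea of seeding concrete, here is a minimal sketch of a breadth-first crawl that starts from a seed list. It is only an illustration of the approach guessed at above, not how any real search engine works. The names `crawl`, `fetch_links`, `TOY_WEB`, and the `example` URLs are all hypothetical, and the link lookup is an in-memory dictionary rather than real HTTP so the example runs offline.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit pages breadth-first from the seeds; return URLs in visit order."""
    frontier: deque[str] = deque()
    seen: set[str] = set()

    # The seed list is the only thing the crawler knows about at the start.
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)

    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        # Every newly discovered link joins the back of the queue once.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy link graph standing in for the real web.
TOY_WEB = {
    "https://portal.example/": ["https://news.example/", "https://blog.example/"],
    "https://news.example/": ["https://blog.example/", "https://shop.example/"],
    "https://blog.example/": ["https://portal.example/"],
    "https://shop.example/": [],
}


def toy_fetch_links(url: str) -> list[str]:
    """Return the outgoing links recorded for url in the toy graph."""
    return TOY_WEB.get(url, [])


if __name__ == "__main__":
    for page in crawl(["https://portal.example/"], toy_fetch_links):
        print(page)
```

A single well-connected seed is enough to reach every page in this toy graph, which is why a portal or directory makes a plausible starting point. A real crawler would swap `toy_fetch_links` for a function that downloads the page and extracts its links, and would also need politeness rules such as honoring robots.txt.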

If you're asking where a search engine (SE) starts to crawl your particular site, it probably has a lot to do with which of your pages are the most popular. I imagine that if you have one super popular page that lots of other sites link to, then that is the page SEs will enter from, because it has so many more entry points from other sites.
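The guess above can be sketched as a simple count of inbound links from other hosts. This is a rough illustration under stated assumptions, not a description of how any search engine picks its entry page. The function `likely_entry_pages`, the `(source, target)` pair format, and the `example` URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def likely_entry_pages(
    links: list[tuple[str, str]], site_host: str
) -> list[tuple[str, int]]:
    """Rank pages on site_host by the number of distinct external links to them."""
    counts: Counter[str] = Counter()
    # dict.fromkeys drops repeated (source, target) pairs while keeping order.
    for source, target in dict.fromkeys(links):
        target_on_site = urlsplit(target).hostname == site_host
        source_external = urlsplit(source).hostname != site_host
        if target_on_site and source_external:
            counts[target] += 1
    # Most-linked pages first: the likeliest places for a crawler to arrive.
    return counts.most_common()


if __name__ == "__main__":
    observed = [
        ("https://a.example/x", "https://mysite.example/popular"),
        ("https://b.example/y", "https://mysite.example/popular"),
        ("https://c.example/z", "https://mysite.example/about"),
        ("https://mysite.example/", "https://mysite.example/about"),  # internal link, ignored
        ("https://a.example/x", "https://mysite.example/popular"),  # duplicate, ignored
    ]
    for page, inbound in likely_entry_pages(observed, "mysite.example"):
        print(inbound, page)
```

The more external pages point at a URL, the more crawl paths lead to it, so a crawler following links from elsewhere is more likely to reach that page first.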

Note that I don't work in SEO or anything like it; I just studied bot and SE traffic for a while for a project I was working on.