2008-05-11

A List of Search Engine Crawlers

Posted in Lab, Apache, FreeBSD/Unix Servers at 18:52 Author: 仲远


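Search engines are tools everyone knows well: they index the information on the Internet so that people can quickly find what they need in a vast sea of data. Search engine companies, represented by Google abroad and Baidu in China, have become heavyweight players on the Internet. Their traffic far exceeds that of traditional portal sites, and they are the online service that users can least do without. Reaching for "Google it" or "Baidu it" whenever a question comes up has become an everyday habit.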

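Webmasters deal with search engines regularly too, in the form of Search Engine Spiders (also called web crawlers, search engine spiders, or web crawling robots). These spiders visit websites frequently, fetch the latest content, and add it to their index. Below is a list of some common search engine spiders: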

High-intensity crawlers

Baiduspider+(+http://www.baidu.com/search/spider.htm)
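Baidu crawler
High-intensity crawler; it sometimes launches several crawler processes from several IP addresses at once!
Because of a flaw in its algorithm, the Baidu crawler requests the same page repeatedly (the home page in particular), which is annoying.
Promotion effect: good.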
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
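Yahoo crawlers, from Yahoo China and the US headquarters respectively
High-intensity crawlers; they sometimes launch several crawler processes from several IP addresses at once!
Fairly well-behaved crawlers: you can consult the URL in the agent string to set a crawl interval. (Bear in mind that several Yahoo crawlers may be active at the same time.)
Promotion effect: fair.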
iaskspider/2.0(+http://iask.com/help/help_index.html)
Mozilla/5.0 (compatible; iaskspider/1.0; MSIE 6.0)
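Sina iAsk crawler
Poor algorithm: it scans large numbers of pages with no real content, which puts a heavy load on sites with dynamic links.
Promotion effect: poor.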
Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou Push Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
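[Earlier form: "sogou spider"]
Sogou crawler
Poor algorithm: it scans large numbers of pages with no real content, which puts a heavy load on sites with dynamic links.
Promotion effect: poor.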
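
The note above about setting a crawl interval for the Yahoo crawlers can be sketched as follows. This is a minimal example, assuming the interval is expressed through a `Crawl-delay` line in robots.txt (a non-standard directive, so check each crawler's help page before relying on it). The robots.txt content and the 10-second value are made up for illustration; the code uses Python's standard `urllib.robotparser` to confirm which delay a given crawler would read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10-second delay and the /cgi-bin/ rule
# are illustrative values, not recommendations from the original post.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: /cgi-bin/
"""


def crawl_delay_for(robots_txt, agent_token):
    """Return the Crawl-delay (in seconds) that applies to agent_token, or None.

    agent_token should be the short crawler name (e.g. "Slurp"), not the
    full User-Agent header, because robotparser matches on the product token.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent_token)


if __name__ == "__main__":
    for token in ("Slurp", "Googlebot"):
        print(token, crawl_delay_for(ROBOTS_TXT, token))
```

Since several Yahoo crawlers may run at once, a per-crawler delay limits each crawler separately; the combined request rate seen by the server can still be higher.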

Medium-intensity crawlers

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Google crawler
Excellent algorithm: it mostly visits pages with real content.
Promotion effect: good.
Mediapartners-Google/2.1
Google AdSense crawler for matching ads to page content; it also helps somewhat with getting pages indexed.
Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
[Earlier agent string: "OutfoxBot/0.5 (for internet experiments; http://; outfoxbot@gmail.com)"]
NetEase crawler
Its search algorithm needs improvement.
Promotion effect: poor.
ia_archiver
Alexa ranking crawler, used to check whether a site is cheating on its Alexa ranking.
Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)
Reportedly a student research project at Stanford University in the United States.
WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
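A crawler from Russia; it has essentially no promotion effect for sites in mainland China.
The page given in its agent string cannot be opened; webalta.net is said to be a very popular search engine in Russia.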
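
The agent strings listed so far can be turned into a small lookup. The following is a rough sketch rather than a definitive detector: it matches case-insensitive substrings taken from the list above, and the display names and the function name `identify_crawler` are choices made for this example. Agent strings are easy to forge, so a match only tells you what a client claims to be.

```python
# (substring to look for, display name) pairs taken from the agent strings
# listed above. More specific tokens come first.
CRAWLER_TOKENS = [
    ("Baiduspider", "Baidu"),
    ("Yahoo! Slurp", "Yahoo"),
    ("iaskspider", "Sina iAsk"),
    ("Sogou", "Sogou"),
    ("Mediapartners-Google", "Google AdSense"),
    ("Googlebot", "Google"),
    ("YodaoBot", "NetEase"),
    ("OutfoxBot", "NetEase"),
    ("ia_archiver", "Alexa"),
    ("Twiceler", "Twiceler"),
    ("WebAlta", "WebAlta"),
]


def identify_crawler(user_agent):
    """Return the display name of the first matching crawler, or None."""
    lowered = user_agent.lower()
    for token, name in CRAWLER_TOKENS:
        if token.lower() in lowered:
            return name
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mediapartners-Google/2.1",
        "sogou spider",
    ]
    for agent in samples:
        print(identify_crawler(agent), "<-", agent)
```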

Crawlers of other search engines

msnbot/1.0 (+http://search.msn.com/msnbot.htm)
MSN crawler
Characteristics unknown.
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)
(Additional information is welcome.)
Characteristics unknown.
Mozilla/4.0(compatible; MSIE 5.0; Windows 98; DigExt)
DigExt is not a standalone crawler; it is the marker IE5 adds in its "allow offline reading" mode.
Mozilla/3.0 (compatible; Indy Library)
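Indy Library was originally an open-source library, but its agent string was later taken over by spam bots.
Crawl intensity: varies from server to server.
Promotion effect: none.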
P.Arthur 1.1
Said to be the crawler of Peking University's Tianwang search engine.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0)
Judging by the name, it belongs to Qihoo.
Characteristics unknown.
Gigabot
Gigabot/2.0 (http://www.gigablast.com/spider.html)
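Gigabot search engine crawler (the URL in its agent string points to gigablast.com). Acquired by Google? (Additional information is welcome.)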
eApolloBot/1.0 (eApollo search engine robot; http://www.eapollo.com; eapollo at global-opto dot com)
lanshanbot/1.0
Said to be the Zhongsou crawler. (Additional information is welcome.)
iearthworm/1.0, iearthworm@yahoo.com.cn
A crawler that fetches only images; its source IPs are said to belong to 3721 or Alibaba.
TMCrawler
Mozilla/5.0 (compatible; heritrix/1.10.2 +http://i.stanford.edu/)
An open-source web crawler, run here by a digital library project.
WebNews http.pl
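
Because crawl intensity varies from server to server, one way to judge it on your own site is to tally requests and distinct source IPs per agent from the access log. The sketch below assumes Apache's combined log format, where the User-Agent is the last quoted field. The sample lines, IP addresses, and the `summarize` helper are made up for illustration, and the token list covers only a few of the agents above.

```python
import re

# Assumption: Apache "combined" log format, User-Agent is the last quoted field.
LOG_PATTERN = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')

# Substrings to look for; "msnbot-media" must precede "msnbot".
TOKENS = ["msnbot-media", "msnbot", "Indy Library", "QihooBot", "Gigabot", "heritrix"]

# Made-up sample data using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [11/May/2008:18:52:01 +0800] "GET / HTTP/1.1" 200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.11 - - [11/May/2008:18:52:03 +0800] "GET /a.html HTTP/1.1" 200 2048 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [11/May/2008:18:52:04 +0800] "GET /b.jpg HTTP/1.1" 200 9000 "-" "msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.7 - - [11/May/2008:18:52:05 +0800] "POST /comment HTTP/1.0" 200 312 "-" "Mozilla/3.0 (compatible; Indy Library)"',
    '203.0.113.5 - - [11/May/2008:18:52:06 +0800] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
]


def summarize(lines, tokens=TOKENS):
    """Map each matched token to (request count, number of distinct IPs)."""
    seen = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        agent = match.group("agent").lower()
        for token in tokens:
            if token.lower() in agent:
                count, ips = seen.get(token, (0, set()))
                ips.add(match.group("ip"))
                seen[token] = (count + 1, ips)
                break  # count each request under the first matching token only
    return {token: (count, len(ips)) for token, (count, ips) in seen.items()}


if __name__ == "__main__":
    for token, (requests, ip_count) in summarize(SAMPLE_LOG).items():
        print(f"{token}: {requests} requests from {ip_count} IPs")
```

A high request count spread over many IPs for an agent such as Indy Library is a hint that the traffic comes from spam bots, not a search engine.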

 

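This article may be freely reproduced; when reproducing it, please keep the full text and credit the source:
Reproduced from 仲子说 [ http://www.wangzhongyuan.com/ ]
Original link: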
