Spiders you block

Discussion in 'Managing Your Online Community' started by Michael, Jan 10, 2010.

  1. Michael

    Michael Regular Member

    Joined:
    Jan 18, 2004
    Messages:
    166
    Likes Received:
    35
    If you block any spiders in robots.txt or by IP, which ones are they?

    I have had quite a lot of Baiduspiders or harvesters getting on my nerves recently, and I am considering blocking them completely. Yandex also seems to be quite a nuisance of late. Is there any benefit to allowing these to crawl your content?

    Thanks
     
  2. Paul M

    Paul M Dr Pepper Addict

    Joined:
    Jun 16, 2009
    Messages:
    449
    Likes Received:
    136
    Location:
    Nottingham, UK
    Nope. Don't block any.

    How does a spider get on your nerves?
     
  3. Noles

    Noles Adept

    Joined:
    Dec 2, 2009
    Messages:
    116
    Likes Received:
    10
    First Name:
    Ryan
    Even if there isn't a benefit, I'm wondering the same thing as Paul.
     
  4. Michael

    Michael Regular Member

    Joined:
    Jan 18, 2004
    Messages:
    166
    Likes Received:
    35
    By constantly crawling the site with at least 100-200 bots online at once, causing high server load for us when it really isn't necessary, and by ignoring robots.txt rules.
     
  5. quantnet

    quantnet Newcomer

    Joined:
    Oct 27, 2009
    Messages:
    13
    Likes Received:
    0
    Lately I have seen my site visited by hundreds of these spiders from this IP: 94.102.63.60

    Does anyone know how I can hide this spider on Who's Online?
     
  6. Brandon

    Brandon Regular Member

    Joined:
    Jun 1, 2009
    Messages:
    6,602
    Likes Received:
    1,706
    Location:
    Topeka, Kansas
    First Name:
    Brandon
    I welcome all bots.
    If my hosting couldn't handle the traffic from spiders, I would get better hosting ;)
     
  7. Michael

    Michael Regular Member

    Joined:
    Jan 18, 2004
    Messages:
    166
    Likes Received:
    35
    Even if they ignore your robots.txt rules? Surely any robot that ignores robots.txt shouldn't be able to crawl your site and isn't behaving as it should. Our hosting is capable of handling the bots, but not when they're on for 6+ hours and are all from Yandex and Baidu.

    @quantnet, you may want to ban them in .htaccess by IP if it's just that one IP.
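
For anyone who would rather do the same per-IP ban inside the application than in .htaccess, here is a minimal sketch in Python. It only shows the matching logic. The names `BLOCKED_NETWORKS` and `is_blocked` are made up for illustration, and the single address is the one quantnet quoted above.

```python
import ipaddress

# Hypothetical block list; the only entry is the address quoted in this thread.
# A /32 network matches exactly one IPv4 address.
BLOCKED_NETWORKS = [ipaddress.ip_network("94.102.63.60/32")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("94.102.63.60"))  # True
    print(is_blocked("94.102.63.61"))  # False
```

If the spider turns out to use a whole range rather than one address, the same list can hold a wider network such as a /24, at the cost of blocking any real visitors in that range.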
     
    2 people like this.
  8. Brandon

    Brandon Regular Member

    Joined:
    Jun 1, 2009
    Messages:
    6,602
    Likes Received:
    1,706
    Location:
    Topeka, Kansas
    First Name:
    Brandon
    I still hold to my reply.
    All bots are welcome on my site. Make your forums private if you don't want them on it ;)
    And maybe you have your robots.txt file set up wrong? It will take a few days/weeks to clear out results if you've recently edited the robots.txt file.
     
  9. Michael

    Michael Regular Member

    Joined:
    Jan 18, 2004
    Messages:
    166
    Likes Received:
    35
    Oh, I had it in my head that they downloaded it every few hours or so, lol. Well, I have decided to completely block Yandex and Baidu.
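
A rough sketch of what a full block of those two crawlers could look like in robots.txt, checked with Python's standard `urllib.robotparser`. The rules and the `is_allowed` helper are assumptions for illustration, not Michael's actual file, and they only help against bots that actually obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse Baiduspider and Yandex, allow everyone else.
# An empty "Disallow:" line means nothing is disallowed.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str = "/index.php") -> bool:
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Baiduspider", "YandexBot", "Googlebot"):
        print(bot, is_allowed(bot))
```

Running it should report the first two bots as refused and Googlebot as allowed. A bot that ignores robots.txt would still need an IP or user-agent block at the server.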
     
  10. Dan Hutter

    Dan Hutter aka Big Dan

    Joined:
    Jul 20, 2006
    Messages:
    1,412
    Likes Received:
    515
    Location:
    New York
    I believe I've blocked BoardTracker. They basically monetize your content on their site, almost like a Digg for forums. I don't get the point other than making the owner money, and at one point a couple of years ago BT outranked my site for my own content. I didn't take very well to that.

    I just jumped on the site; it looks to have changed significantly from when it first came on my radar, but I'm still not sure about it.
     
    2 people like this.
  11. Ryan Ashbrook

    Ryan Ashbrook Regular Member

    Joined:
    Jun 29, 2009
    Messages:
    343
    Likes Received:
    25
    Location:
    Cincinnati, Ohio
    I don't block any spiders. I don't see a reason to.
     
    2 people like this.
  12. vlauria

    vlauria Addict

    Joined:
    Nov 25, 2009
    Messages:
    51
    Likes Received:
    15
    First Name:
    Vincent
    I second that. Bots should be welcome on your site. Usually their main goal is to index your content and show it to other people, so hopefully it becomes another channel to your content.

    Now if they were copying your content and not linking to you, that would be a different story, and you should contact them rather than just blocking outright (I've done that in the past and they just stopped crawling us).

    And if your site can't handle bots crawling, upgrading to another host would be a good idea. I might even suggest upgrading to forum software that is better optimized at turning out pages. If bots are hurting your performance, then each additional person on your site is hurting performance as well, and you should try to avoid that.
     
    2 people like this.
  13. cheat-master30

    cheat-master30 Grand Master

    Joined:
    Jul 30, 2009
    Messages:
    789
    Likes Received:
    59
    I do not block any spiders on my site. Sure, some, like Yahoo's spiders, come in ridiculous numbers sometimes, but I simply limit how fast they view the site to stop that problem, and it's far better for any practical purpose than blocking them. I don't block the Internet Archive bot, unlike some people I know, since I actually like being able to see what a website looked like a few years back, and it infuriates me to see all these sites blocking off their site's history purely for the sake of a tiny bit of bandwidth.
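
The post does not say how the rate limit is applied. One common approach is the non-standard `Crawl-delay` line in robots.txt, which some crawlers honor and others ignore. The sketch below assumes that approach, with a made-up 10-second delay for Yahoo's Slurp and a hypothetical `crawl_delay_for` helper, and reads the value back with Python's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Slurp to wait 10 seconds between requests,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(user_agent: str):
    """Return the Crawl-delay in seconds for user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for("Slurp"))      # 10
    print(crawl_delay_for("Googlebot"))  # None
```

For crawlers that ignore `Crawl-delay`, the limit has to be set somewhere else, such as the search engine's own webmaster tools or rate limiting on the server.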
     
