Robots.txt Disallow /posts/ ???

Discussion in 'SEO, Traffic and Revenue' started by Sylvain, Oct 11, 2013.

  1. Sylvain

    Sylvain Regular Member

    140
    17
    364
    I use XenForo 1.2.2 and I've noticed that many XenForo forums disallow /posts/ in their robots.txt files. Why not let robots crawl /posts/?

    User-agent: *
    Disallow: /posts/
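
For anyone who wants to see what this rule does in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the sample paths are assumptions made up for illustration; only the two rule lines come from the post above.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set quoted in the post above.
RULES = """\
User-agent: *
Disallow: /posts/
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical XenForo-style paths, used only for illustration.
    for path in ("/posts/12345/", "/threads/example-thread.678/", "/forums/"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions only URLs that start with /posts/ are blocked. Thread and forum pages stay crawlable for any bot that honors the file.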
     
  2. Programmers World

    Programmers World Regular Member

    46
    3
    29
    That's strange. I would think that it would be extremely useful to let robots crawl your posts.
     
  3. Sylvain

    Sylvain Regular Member

    140
    17
    364
    There must be a good reason.

    I also found that many forums disallow all pages in their robots.txt file
     
  4. Cerberus

    Cerberus Admin Talk Staff

    1,031
    500
    818
    Copied and pasted from the one here, which is a pretty good one:

    Code:
    User-agent: *
    Disallow: /misc/
    Disallow: /help/
    Disallow: /search/
    Disallow: /register/
    Disallow: /login/
    Disallow: /online/
    Disallow: /lost-password/
    Disallow: /account/
    Disallow: /admin.php
    Disallow: /events/birthdays/
    Disallow: /events/monthly
    Disallow: /events/weekly
    Disallow: /goto/
    Disallow: /media/keyword/
    Disallow: /media/user/
    Disallow: /media/service/
    Disallow: /media/submit/
    Disallow: /misc/style?*
    Disallow: /misc/quick-navigation-menu?*
    Disallow: /forums/7/
    Disallow: /forums/20/
    Disallow: /forums/70/
    Disallow: /forums/49/
    Disallow: /forums/155/
    Disallow: /forums/156/
    Disallow: /forums/184/
    Disallow: /forums/200/
    Disallow: /forums/188/
    Disallow: /forums/186/
    Disallow: /forums/187/
    Disallow: /forums/189/
    Disallow: /forums/191/
    
    Allow: /
    
    Sitemap: http://admin-talk.com/sitemap/sitemap.xml.gz
    Sitemap: http://dir.admin-talk.com/sitemap.xml
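
Long hand-maintained lists like this one tend to pick up repeated entries over time. The following is a rough sketch, not part of any forum software, of a helper that flags rule lines appearing more than once. The function name `repeated_rules` and the `SAMPLE` text are hypothetical, and the sketch assumes a file with a single `User-agent` group, since the same rule can legitimately appear in several groups.

```python
from collections import Counter


def repeated_rules(robots_txt: str) -> list[str]:
    """Return Allow/Disallow lines that occur more than once, in first-seen order."""
    lines = [line.strip() for line in robots_txt.splitlines()]
    rules = [
        line for line in lines
        if line.lower().startswith(("allow:", "disallow:"))
    ]
    counts = Counter(rules)
    return [rule for rule in counts if counts[rule] > 1]


# Hypothetical sample with one repeated entry.
SAMPLE = """\
User-agent: *
Disallow: /help/
Disallow: /login/
Disallow: /goto/
Disallow: /help/
"""

if __name__ == "__main__":
    for rule in repeated_rules(SAMPLE):
        print("repeated:", rule)
```

Repeated lines are harmless to crawlers, so this is only a tidiness check.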
     
  5. Sylvain

    Sylvain Regular Member

    140
    17
    364
    Pretty similar to my file:

    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /goto/
    Disallow: /login/
    Disallow: /admin.php
    Disallow: /search/
    Disallow: /search.php
    Disallow: /help/
    Disallow: /members/
    Disallow: /misc/
    Disallow: /online/
    Allow: /

    User-agent: ia_archiver
    Allow: /

    User-agent: BoardTracker
    Disallow: /

    User-agent: BoardReader
    Disallow: /

    User-agent: Baiduspider
    User-agent: Baiduspider-video
    User-agent: Baiduspider-image
    Disallow: /
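
To illustrate how the per-bot groups in a file like this are resolved, here is a small sketch with Python's standard `urllib.robotparser`. The rules are a shortened copy of the file above. The agent name `SomeBot` and the request paths are assumptions for illustration. Note that Python's parser takes the first matching rule within a group, while some search engines pick the most specific match, so results can differ for files that mix overlapping Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the file above: a catch-all group plus per-bot groups.
RULES = """\
User-agent: *
Disallow: /members/
Disallow: /search/
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: BoardReader
Disallow: /

User-agent: Baiduspider
User-agent: Baiduspider-image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under RULES."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "SomeBot" is a made-up crawler that falls through to the "*" group.
    for agent in ("SomeBot", "ia_archiver", "BoardReader", "Baiduspider-image"):
        for path in ("/threads/1/", "/members/1/"):
            verdict = "allowed" if check(agent, path) else "blocked"
            print(f"{agent:18} {path:14} {verdict}")
```

A bot with its own group follows only that group, which is why `ia_archiver` may fetch /members/ here even though the catch-all group blocks it.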
     
  6. MyDigitalpoint

    MyDigitalpoint Regular Member

    114
    30
    349
    Well, you can always tweak robots.txt to suit your particular needs. The problem is that the robots file may be modified on the fly at every forum settings update.

    I remember having a long, long disallow list covering search engines, crawlers and bots, but sometimes I wonder whether all of them abide by the rules and refrain from crawling what is on the site.

    I read some time ago that the best way to keep bots away from directories or files we don't want crawled is to make no reference to them in robots.txt at all, so that bots never learn they exist... unless those files are linked from other pages that bots are allowed to crawl.
     
  7. BamaStangGuy

    BamaStangGuy Administrator

    769
    549
    518
  8. GTB

    GTB Regular Member

    1,791
    270
    762
    Well, the idea of a robots.txt file is to stop the bots that obey it from wasting time on files that don't need indexing, which speeds things up for them. I wouldn't go daft adding loads of entries, but some files and folders are obvious candidates for blocking: cache folders and files such as install and config.php, etc.

    Blocking those obvious files and folders from being indexed can help a little. You can also use it to limit how much bots crawl, which can save you bandwidth.
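
As a rough sketch of the crawl-limiting idea, the example below uses the `Crawl-delay` directive and reads it back with Python's standard `urllib.robotparser`. The paths /cache/ and /install/, the 10-second value and the bot name `ExampleBot` are all assumptions for illustration. `Crawl-delay` is a non-standard extension and some crawlers, Googlebot among them, ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block two obvious folders and ask bots to slow down.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /cache/
Disallow: /install/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

if __name__ == "__main__":
    # Seconds a polite bot is asked to wait between requests (None if unset).
    print("crawl delay:", parser.crawl_delay("ExampleBot"))
    for path in ("/cache/page.tmp", "/install/", "/threads/1/"):
        verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
        print(path, "->", verdict)
```

Bots that honor the directive wait that many seconds between requests, which is where the bandwidth saving comes from.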
     
    Last edited: Oct 28, 2013
  9. Sylvain

    Sylvain Regular Member

    140
    17
    364
  10. BamaStangGuy

    BamaStangGuy Administrator

    769
    549
    518
    Yes. Though today I did spot an instance where /posts/ URLs were being indexed. Will add it to that thread.
     
  11. deansaliba

    deansaliba Regular Member

    51
    15
    64
    I have noticed that more and more websites and blogs are using the robots.txt file recently. In the last month I have lost count of how many times I have looked for something in Google and found a site listed with a description saying that robots.txt had blocked the search engine from crawling the site. I didn't think they would index those sites if they were restricted by robots.txt. :confused:
     
  12. Sylvain

    Sylvain Regular Member

    140
    17
    364
    Some robots bypass the robots.txt file
     
  13. pixelek

    pixelek Regular Member

    229
    85
    394
    It's not a very wise thing to do. Of course it's none of my business, but I wouldn't do that. Some of your users may post sensitive data and, by not using robots.txt (i.e. letting robots crawl your site), that data gets exposed. It's a strange policy of yours.
     
  14. BamaStangGuy

    BamaStangGuy Administrator

    769
    549
    518
    I have no idea how a robots.txt is going to protect my members from themselves.
     
  15. MyDigitalpoint

    MyDigitalpoint Regular Member

    114
    30
    349
  16. Sylvain

    Sylvain Regular Member

    140
    17
    364
    I checked my competitors... some use robots.txt, others don't.
     
  17. cpvr

    cpvr Regular Member

    3,220
    823
    918
    I don't have any competitors, and when I did, most of them used robots.txt to block spiders from crawling their content.
     
  18. JoeyJ

    JoeyJ Regular Member

    18
    10
    37
    Disallowing directories also makes your site less vulnerable (to an extent). You'd be surprised how much information a robot can disclose about your website. It can index vulnerable files or sensitive directories, or even show parameters that could lead to SQL injection.
     
