Robots.txt Disallow /posts/ ???

Sylvain · Oct 11, 2013

I use XenForo 1.2.2, and I've noticed that many XenForo forums disallow /posts/ in their robots.txt files. Why not let robots crawl /posts/?

User-agent: *
Disallow: /posts/
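A crawler that honors robots.txt reads those two lines as "no bot may fetch any URL whose path starts with /posts/". Python's standard `urllib.robotparser` module offers a quick way to check this. The forum host and URLs below are made-up examples, and the result only describes well-behaved crawlers:

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /posts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

# Hypothetical XenForo-style URLs: a per-post link and a thread link.
for url in ["https://forum.example.com/posts/12345/",
            "https://forum.example.com/threads/some-topic.678/"]:
    print(url, parser.can_fetch("Googlebot", url))
# https://forum.example.com/posts/12345/ False
# https://forum.example.com/threads/some-topic.678/ True
```

The rule is a plain prefix match on the path, so thread pages are unaffected. Only URLs that literally begin with `/posts/` are excluded.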

Programmers World · Oct 12, 2013

That's strange. I would think that it would be extremely useful to let robots crawl your posts.

Sylvain · Oct 12, 2013

There must be a good reason.

I've also found that many forums disallow all pages in their robots.txt files.

Cerberus · Oct 12, 2013

Copied and pasted from here... It's a pretty good one:

Code:

User-agent: *
Disallow: /misc/
Disallow: /help/
Disallow: /search/
Disallow: /register/
Disallow: /login/
Disallow: /online/
Disallow: /lost-password/
Disallow: /account/
Disallow: /admin.php
Disallow: /events/birthdays/
Disallow: /events/monthly
Disallow: /events/weekly
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /media/keyword/
Disallow: /media/user/
Disallow: /media/service/
Disallow: /media/submit/
Disallow: /misc/style?*
Disallow: /misc/quick-navigation-menu?*
Disallow: /online/
Disallow: /forums/7/
Disallow: /forums/20/
Disallow: /forums/70/
Disallow: /forums/49/
Disallow: /forums/155/
Disallow: /forums/156/
Disallow: /forums/184/
Disallow: /forums/200/
Disallow: /forums/188/
Disallow: /forums/186/
Disallow: /forums/187/
Disallow: /forums/189/
Disallow: /forums/191/

Allow: /

Sitemap: http://admin-talk.com/sitemap/sitemap.xml.gz
Sitemap: http://dir.admin-talk.com/sitemap.xml
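The list above mixes plain prefixes with a wildcard pattern (`/misc/style?*`) and ends with a catch-all `Allow: /`. Crawlers that follow RFC 9309 (and Google's documented behavior) do not simply take the first rule listed. Instead, they pick the rule with the longest matching pattern, and a tie goes to Allow. As a result, the trailing `Allow: /` does not undo the Disallow lines. Below is a minimal Python sketch of that matching logic, run against a hand-picked subset of the rules above. The function names are hypothetical, this is not any crawler's actual code, and it skips details such as percent-encoding normalization:

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for is_allow, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" value blocks nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# A subset of the rules posted above, as (is_allow, pattern) pairs.
RULES = [
    (False, "/misc/"),
    (False, "/misc/style?*"),
    (False, "/search/"),
    (False, "/forums/7/"),
    (True, "/"),
]

for path in ["/threads/example.123/", "/misc/style?style_id=2",
             "/forums/7/", "/forums/70/"]:
    print(path, is_allowed(RULES, path))
# /threads/example.123/ True
# /misc/style?style_id=2 False
# /forums/7/ False
# /forums/70/ True
```

Because matching is by prefix, `Disallow: /forums/7/` blocks forum 7 but not forum 70. The trailing slash is what keeps the two apart.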

Sylvain · Oct 13, 2013

Pretty similar to my file:

User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /goto/
Disallow: /login/
Disallow: /admin.php
Disallow: /search/
Disallow: search.php
Disallow: /help/
Disallow: /members/
Disallow: /misc/
Disallow: /online/
Allow: /
User-agent: ia_archiver
Allow: /
User-agent: BoardTracker
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
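Each `User-agent` line starts (or extends) a group. Consecutive `User-agent` lines, like the three Baidu ones at the end, share the rules that follow them. The sketch below feeds the file above to Python's standard `urllib.robotparser` and checks a few hypothetical URLs. This parser has two caveats. First, it applies the first matching rule in file order, whereas Google uses the longest match. Second, it compares paths literally, so a rule without a leading slash such as `Disallow: search.php` never matches `/search.php`:

```python
from urllib.robotparser import RobotFileParser

# Sylvain's robots.txt from the post above, reproduced as posted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /goto/
Disallow: /login/
Disallow: /admin.php
Disallow: /search/
Disallow: search.php
Disallow: /help/
Disallow: /members/
Disallow: /misc/
Disallow: /online/
Allow: /
User-agent: ia_archiver
Allow: /
User-agent: BoardTracker
Disallow: /
User-agent: BoardReader
Disallow: /
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical (agent, path) pairs and what the parser decides for each.
checks = [
    ("Googlebot", "/threads/welcome.1/"),          # True: only "Allow: /" matches
    ("Googlebot", "/members/bob.5/"),              # False: Disallow: /members/
    ("ia_archiver", "/members/bob.5/"),            # True: its own group allows all
    ("Baiduspider-image", "/threads/welcome.1/"),  # False: Baidu group blocks /
    ("Googlebot", "/search.php?q=robots"),         # True: "search.php" lacks the slash
]
for agent, path in checks:
    print(agent, path, rp.can_fetch(agent, path))
```

A bot named in its own group ignores the `User-agent: *` rules entirely. That is why `ia_archiver` can reach `/members/` even though the default group blocks it.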

MyDigitalpoint · Oct 28, 2013

Well, you can always tweak robots.txt to suit your particular needs. The problem could be that every time the forum settings are updated, the robots file may be modified on the fly.

I remember having a long, long disallow list covering search engines, crawlers and bots, but sometimes I wonder whether all of them abide by the rules and refrain from crawling what is on the site.

I read some time ago that the best way to keep bots away from directories or files you don't want crawled is to not mention them in robots.txt at all, so the bots never learn they exist... unless those files are linked from other pages that bots are allowed to crawl.
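The advice about not advertising paths follows from the fact that robots.txt is a public, plain-text file. Anyone, including a bot that ignores the rules, can fetch it and read the Disallow list. Here is a minimal Python sketch of how trivially that list can be extracted. The `disclosed_paths` helper and the sample file are hypothetical:

```python
def disclosed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow rule, in file order, without duplicates."""
    seen: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            path = value.strip()
            if path and path not in seen:  # an empty Disallow value names nothing
                seen.append(path)
    return seen

# Hypothetical robots.txt for an imaginary forum.
SAMPLE = """\
User-agent: *
Disallow: /admin.php
Disallow: /install/   # leftover installer
Disallow: /private-backups/
Allow: /
"""

print(disclosed_paths(SAMPLE))
# ['/admin.php', '/install/', '/private-backups/']
```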

BamaStangGuy · Oct 28, 2013

Read this: http://xenforo.com/community/threads/posts-urls-being-indexed-by-google.48729/

I don't use robots.txt at all with XenForo. I let search engines index whatever they want and let XenForo's built-in SEO do what it is supposed to do.

GTB · Oct 28, 2013

Well, the idea of a robots.txt file is to tell the bots that obey it not to waste time on files that don't need indexing, which speeds things up for the bots. I wouldn't go daft adding loads of entries, but some files and folders are obvious candidates for blocking: cache folders, install files, config.php files, etc.

Blocking those obvious files and folders from being crawled can help a little. You can also use it to limit how often bots crawl, which can save you bandwidth.
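In robots.txt terms, limiting how often bots crawl presumably refers to the non-standard `Crawl-delay` directive, which gives the number of seconds to wait between requests. Some crawlers honor it, but Google has said Googlebot ignores it. The sketch below reads the directive with Python's standard `urllib.robotparser`. The file contents, the `bingbot` group, and the `delay_for` helper with its 1-second fallback are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a default crawl delay and a per-bot override.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search/

User-agent: bingbot
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def delay_for(agent: str, fallback: float = 1.0) -> float:
    """Seconds a polite crawler should wait between requests to this site."""
    delay = rp.crawl_delay(agent)  # None when no Crawl-delay applies to the agent
    return float(delay) if delay is not None else fallback

print(delay_for("bingbot"))     # 5.0
print(delay_for("ExampleBot"))  # 10.0
```

The directive is only a request. A crawler has to read it and sleep on its own, so it saves bandwidth only with bots that choose to cooperate.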

Sylvain · Oct 28, 2013

BamaStangGuy said:

Read this: http://xenforo.com/community/threads/posts-urls-being-indexed-by-google.48729/

I don't use robots.txt at all with XenForo. I let search engines index whatever they want and let XenForo's built-in SEO do what it is supposed to do.

So no robots.txt means you let all robots crawl everything they can?? Does your forum rank well in Google?

BamaStangGuy · Oct 28, 2013

Sylvain said:

So no robots.txt means you let all robots crawl everything they can?? Does your forum rank well in Google?

Yes. Though today I did spot an instance where /posts/ URLs were being indexed. I'll add it to that thread.

deansaliba · Oct 29, 2013

I have noticed that more and more websites and blogs have been using a robots.txt file recently. In the last month I have lost count of how many times I have searched for something in Google and found a site listed with a description saying that robots.txt had blocked the search engine from crawling the site. I didn't think Google would index those sites if they were restricted by robots.txt.

Sylvain · Oct 29, 2013

Some robots ignore the robots.txt file.

pixelek · Oct 30, 2013

BamaStangGuy said:

(...)
I don't use robots.txt at all with XenForo. I let search engines index whatever they want and let XenForo's built-in SEO do what it is supposed to do.

It's not a very wise thing to do. Of course it's none of my business, but I wouldn't do that. Some of your users may post sensitive data, and by not using robots.txt (that is, letting robots crawl your whole site) that data gets revealed. It's a strange policy of yours.

BamaStangGuy · Oct 30, 2013

pixelek said:

It's not a very wise thing to do. Of course it's none of my business, but I wouldn't do that. Some of your users may post sensitive data, and by not using robots.txt (that is, letting robots crawl your whole site) that data gets revealed. It's a strange policy of yours.

I don't see how a robots.txt file is going to protect my members from themselves.

MyDigitalpoint · Oct 30, 2013

Someone said that the golden rule for deciding whether to use a file like robots.txt is to type in the address of a huge company followed by /robots.txt and see what they do. Here you go:

http://www.google.com/robots.txt
http://www.nyc.com/robots.txt
http://www.microsoft.com/robots.txt
http://www.adobe.com/robots.txt

And so on...

Sylvain · Oct 31, 2013

I checked my competitors... some use robots.txt, others don't.

cpvr · Nov 27, 2013

Sylvain said:

I checked my competitors... some use robots.txt, others don't.

I don't have any competitors, and back when I did, most of them used robots.txt to block spiders from crawling their content.

JoeyJ · Nov 29, 2013

Disallowing directories also makes your site less vulnerable (to an extent). You'd be surprised how much information a robot can disclose about your website. It can index vulnerable files or sensitive directories, or even expose URL parameters that could be exploited for SQL injection.

Sylvain Regular Member

Programmers World Regular Member

Sylvain Regular Member

Cerberus Admin Talk Staff

Sylvain Regular Member

MyDigitalpoint Regular Member

BamaStangGuy Administrator

GTB Regular Member

Sylvain Regular Member

BamaStangGuy Administrator

deansaliba Regular Member

Sylvain Regular Member

pixelek Regular Member

BamaStangGuy Administrator

MyDigitalpoint Regular Member

Sylvain Regular Member

cpvr Regular Member

JoeyJ Regular Member
