Since robots.txt rules match URL path prefixes, you just need to specify the literal path to the folder you want to exclude.
Example:
User-agent: *
Disallow: /modules/Gallery/
to stop compliant crawlers from crawling the Gallery module's folder.
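To check what a rule like this actually covers, Python's standard `urllib.robotparser` module can evaluate it. This is a minimal sketch, assuming the rule sits under a `User-agent: *` group; `example.com` and the `AnyBot` user agent are placeholders. It shows the rule is a plain path-prefix match: everything under /modules/Gallery/ is blocked, while /modules/Gallery without the trailing slash is not.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example, wrapped in the "User-agent: *" group that
# every Disallow line has to belong to. example.com is a placeholder host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /modules/Gallery/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for path in (
        "/modules/Gallery/",           # the folder itself
        "/modules/Gallery/album.php",  # anything underneath it
        "/modules/Gallery",            # no trailing slash: not a prefix match
        "/modules/News/",              # a different module
    ):
        allowed = rp.can_fetch("AnyBot", "http://example.com" + path)
        print(f"{path:30} {'allowed' if allowed else 'blocked'}")
```

Keep in mind that `can_fetch()` only tells you what a well-behaved crawler should do; nothing in robots.txt actually enforces it.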
However, this won't stop bad bots that ignore robots.txt altogether, and there are quite a few of them, including many email-harvester bots. The best way to stop these is to use mod_rewrite rules to send them to another site or page altogether, or to add custom environment-variable rules to your .htaccess file to block them, although doing it that way costs some performance.
A tutorial on this has been posted here before; search the forum and see if you can find it.
Also search the web for ways to keep bad spider bots from getting anywhere near your site, let alone certain modules.
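The user-agent blocking that mod_rewrite does in Apache (a `RewriteCond %{HTTP_USER_AGENT} ...` paired with `RewriteRule .* - [F]`) can be sketched in Python as WSGI middleware. This is a swapped-in Python illustration of the idea, not an .htaccess configuration; the function names and the pattern list are hypothetical, and the list is far shorter than any real block list. Note that the check runs on every single request, which is where the performance cost comes from.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical example patterns; real block lists are far longer.
BAD_AGENT_PATTERNS = [r"EmailCollector", r"EmailSiphon", r"WebZIP"]


def block_bad_bots(app: Callable, patterns: Iterable[str]) -> Callable:
    """Wrap a WSGI app so requests from matching user agents get 403 Forbidden."""
    pattern_list: List[str] = list(patterns)
    if not pattern_list:
        return app  # an empty alternation would match every user agent
    # One compiled, case-insensitive regex, so each request costs one search.
    bad_agents = re.compile("|".join(pattern_list), re.IGNORECASE)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if bad_agents.search(agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


if __name__ == "__main__":
    app = block_bad_bots(hello_app, BAD_AGENT_PATTERNS)
    for agent in ("EmailSiphon/1.0", "Mozilla/5.0"):
        status = []
        body = b"".join(app({"HTTP_USER_AGENT": agent}, lambda s, h: status.append(s)))
        print(agent, "->", status[0], body)
```

Combining the patterns into one compiled regex keeps the per-request work down, but it cannot remove it entirely.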
RAF71_Hornet / GibsonXXI (Imago Captain; joined Jan 17, 2003; 629 posts; location: Europe)
Posted: Sat Mar 19, 2005 9:01 am
I have been using robots.txt for two years now and closely monitoring the sites. So far, no problems with bad bots. I'd rather run that risk than overload the CPU with tons of .htaccess rules to evaluate on every request.