| Author |
Message |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Thu Feb 12, 2004 10:36 am |
  |
I GoogleTapped my site today, and at present there are about 10 Googlebots on it. The bots seem to be hitting the forum, but only the odd thread here and there; I think they only reached the forum via the latest forum posts block on the main page. The last number at the end of each line is how many pages the bot hit. As you can see, most hit only 1 page, 3 at most. Is this normal? I was hoping they would hit hundreds of pages, but that does not seem to be happening.
| Code: |
Yes 64.68.82.135 crawler13.googlebot.com 2004-02-12 18:25:43 2
Yes 64.68.82.46 crawler11.googlebot.com 2004-02-12 18:23:30 1
Yes 64.68.82.25 crawler10.googlebot.com 2004-02-12 18:23:19 2
Yes 64.68.82.176 crawler14.googlebot.com 2004-02-12 18:22:11 2
Yes 64.68.82.159 crawler14.googlebot.com 2004-02-12 18:18:54 1
Yes 64.68.82.27 crawler10.googlebot.com 2004-02-12 18:18:48 1
Yes 64.68.82.208 crawler15.googlebot.com 2004-02-12 18:18:34 1
Yes 64.68.82.159 crawler14.googlebot.com 2004-02-12 18:17:37 2
Yes 64.68.82.54 crawler11.googlebot.com 2004-02-12 18:15:23 1
Yes 64.68.82.55 crawler11.googlebot.com 2004-02-12 18:15:15 3
Yes 64.68.82.199 crawler15.googlebot.com 2004-02-12 18:15:03 2
Yes 64.68.82.136 crawler13.googlebot.com 2004-02-12 18:14:17 3
|
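A quick way to see how much each crawler fetched is to total the last column per hostname. This is only a minimal sketch: it assumes the six whitespace-separated columns shown in the log above (a flag, IP, hostname, date, time, page count), and the function name pages_per_host is made up for illustration.

```python
from collections import Counter

# Log lines copied from the post above.
LOG = """\
Yes 64.68.82.135 crawler13.googlebot.com 2004-02-12 18:25:43 2
Yes 64.68.82.46 crawler11.googlebot.com 2004-02-12 18:23:30 1
Yes 64.68.82.25 crawler10.googlebot.com 2004-02-12 18:23:19 2
Yes 64.68.82.176 crawler14.googlebot.com 2004-02-12 18:22:11 2
Yes 64.68.82.159 crawler14.googlebot.com 2004-02-12 18:18:54 1
Yes 64.68.82.27 crawler10.googlebot.com 2004-02-12 18:18:48 1
Yes 64.68.82.208 crawler15.googlebot.com 2004-02-12 18:18:34 1
Yes 64.68.82.159 crawler14.googlebot.com 2004-02-12 18:17:37 2
Yes 64.68.82.54 crawler11.googlebot.com 2004-02-12 18:15:23 1
Yes 64.68.82.55 crawler11.googlebot.com 2004-02-12 18:15:15 3
Yes 64.68.82.199 crawler15.googlebot.com 2004-02-12 18:15:03 2
Yes 64.68.82.136 crawler13.googlebot.com 2004-02-12 18:14:17 3
"""

def pages_per_host(log_text):
    """Sum the trailing page count for each crawler hostname."""
    totals = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _flag, _ip, host, _date, _time, pages = fields
        totals[host] += int(pages)
    return totals

totals = pages_per_host(LOG)
for host, pages in sorted(totals.items()):
    print(host, pages)
print("total pages:", sum(totals.values()))
```

Run over a longer stretch of log, the same tally would show whether the bots keep coming back for more pages over time.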
|
|
|
    |
 |
Daniel-cmw
Site Admin


Joined: Mar 02, 2003
Posts: 1662
Location: The UK!
|
Posted:
Thu Feb 12, 2004 10:39 am |
  |
You GoogleTapped it today and already have bots there! That's good for a start.
The bots will come and go; they index more and more each time but don't scan the whole site in one go. |
_________________ Read Me |
|
   |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Thu Feb 12, 2004 10:44 am |
  |
There are 20 Googlebots on the site now!
My IP tracking module only shows the true URLs they go to (modules=blah blah) and not the new .html ones. Is there an updated IP tracking module, so I know for sure they are indexing the new .html ones? |
|
    |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Thu Feb 12, 2004 7:31 pm |
  |
Another thing: although I can see which pages the Googlebots spidered, when I search for them on Google they don't appear. Does Google reject certain pages for whatever reason, or does it take a long while for pages to show up on Google after they are spidered? |
|
    |
 |
Daniel-cmw
Site Admin


Joined: Mar 02, 2003
Posts: 1662
Location: The UK!
|
Posted:
Fri Feb 13, 2004 12:44 am |
  |
It can take quite a while, from a week to a month. Be patient  |
_________________ Read Me |
|
   |
 |
madd
Private


Joined: Nov 23, 2003
Posts: 39
|
Posted:
Fri Feb 13, 2004 7:13 am |
  |
Will they ever slow down? I just posted a new topic a little while ago because I think I'm getting a DoS attack from Google, lol. My services are so slow that pages time out. I'm on a cable connection with 140k upload, and this only started after I installed GoogleTap. I really want Google to be able to index me, but not at the cost of having no site for others to view. |
|
|
   |
 |
Zhen-Xjell
Nuke Cops Founder


Joined: Nov 14, 2002
Posts: 5939
|
Posted:
Fri Feb 13, 2004 7:18 am |
  |
On this site it takes Google a couple of hours to go through. But this isn't the 'initial' scan. Yours has been going on for more than 24 hours now? |
_________________ Paul Laudanski, Microsoft MVP Windows-Security
CastleCops: [de] [en] [wiki] |
|
     |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Fri Feb 13, 2004 7:23 am |
  |
You are hosting your own site on your 140k cable connection? If so, it's not surprising there's nothing left.
If you wanted to block the Googlebots for a time, you could deny the 64.68.82.* range in your .htaccess (for example, Deny from 64.68.82).
You should consider getting a real webhost. |
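To see which addresses a block on that range would actually catch, here is a small sketch. It assumes "64.68.82.*" means the /24 network 64.68.82.0/24; the helper name is_blocked and the second test address are made up for illustration. Googlebot may also crawl from other ranges that do not appear in the log earlier in this thread.

```python
import ipaddress

# Assumption: "64.68.82.*" is read as the /24 network 64.68.82.0/24.
BLOCKED = ipaddress.ip_network("64.68.82.0/24")

def is_blocked(ip_text):
    """Return True if the address falls inside the blocked range."""
    return ipaddress.ip_address(ip_text) in BLOCKED

print(is_blocked("64.68.82.135"))  # an address from the log earlier in the thread
print(is_blocked("64.68.83.10"))   # hypothetical address just outside the range
```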
|
|
    |
 |
madd
Private


Joined: Nov 23, 2003
Posts: 39
|
Posted:
Fri Feb 13, 2004 10:05 am |
  |
Well, it's not a huge site; it's basically a web log, an online journal type of site. I don't get many hits, but I'd like the site to be available to search engines.
I had no problems before Google started chewing it up.
I don't want to BLOCK Google completely, because then I won't be indexed at all.
Can I allow Google to index ONLY my news feed, say, or some smaller part of the site, so Google doesn't chew it all up? |
|
|
   |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Fri Feb 13, 2004 10:09 am |
  |
I don't know if that's possible, because Google will follow all the URLs from the index page. |
Last edited by GanjaUK on Fri Feb 13, 2004 10:11 am; edited 1 time in total |
|
    |
 |
madd
Private


Joined: Nov 23, 2003
Posts: 39
|
Posted:
Fri Feb 13, 2004 10:11 am |
  |
Does Google need to chew up that much? What about having Google index only the news XML feed? |
|
|
   |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Fri Feb 13, 2004 10:25 am |
  |
Take a look at your robots.txt file.
User-agent: googlebot
or use the wildcard character "*" to specify all robots: User-agent: *
Then list the pages you don't want the bot to follow below that, one Disallow line for each. |
|
|
    |
 |
madd
Private


Joined: Nov 23, 2003
Posts: 39
|
Posted:
Fri Feb 13, 2004 10:36 am |
  |
Thank you, I'll check that as soon as I get home. I can't check it now over Term Services because I have no pipe left to surf >o\
Thanks again, I'll mess with it when I get home from work. |
|
|
   |
 |
GanjaUK
Sergeant


Joined: Jan 12, 2004
Posts: 82
|
Posted:
Fri Feb 13, 2004 10:36 am |
  |
np, hope it works. |
|
    |
 |
madd
Private


Joined: Nov 23, 2003
Posts: 39
|
Posted:
Sat Feb 14, 2004 6:08 am |
  |
Would this work,
| Code: |
User-agent: Googlebot
Disallow: admin.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
Disallow: /images/
User-agent: *
Disallow: /*/
Disallow: /*.*/
|
to allow only Googlebot, and no others? |
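One way to try a rule set like this before uploading it is Python's standard urllib.robotparser. The sketch below does not test the file exactly as posted. It uses an adjusted variant, and every adjustment is an assumption: a leading slash on /admin.php, a blank line between the two records, the duplicate /images/ line dropped, and a plain "Disallow: /" for all other robots, because the original robots.txt convention matches by path prefix and this parser does not treat "*" inside a path as a wildcard. The test paths and the "OtherBot" name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Adjusted variant of the rules posted above (see the assumptions in the lead-in).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/

User-agent: *
Disallow: /
"""

def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, used only to exercise the rules.
checks = [
    ("Googlebot", "/index.php"),
    ("Googlebot", "/admin.php"),
    ("Googlebot", "/modules/News/article.html"),
    ("OtherBot", "/index.php"),
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```

With this variant, Googlebot may fetch anything outside the listed directories, and every other robot that honours robots.txt is asked to stay out entirely. Note that it also keeps Googlebot out of /modules/, so any content served from under that path would not be crawled.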
|
|
   |
 |
|
|