Is there a PHP-Nuke-friendly mod that builds sitemap.xml (and subordinate XML files) for the new index scheme that Google has just started implementing?
_________________ If you shoot for the moon and miss, you'll still be amongst the stars.
sixpack Lieutenant
Joined: Oct 20, 2004
Posts: 165
Posted:
Mon Aug 01, 2005 7:51 am
I went through this just recently and used a few different things. I made a post about what I found that worked, with links to a few free tools to get the job done. Check it out: Free Tools to Create XML Sitemaps for Google Sitemaps Beta. Good luck!
Steptoe Captain
Joined: Oct 10, 2004
Posts: 562
Posted:
Mon Aug 01, 2005 11:09 am
I have been messing with a few sitemap generators over the last month. Not being a coder and not really understanding what I'm doing, I found it rather confusing.
Yesterday I came across this:
http://johannesmueller.com/gs/
It takes a while to crawl the site, even on a LAN... after 3 crawls using the filters it created a sitemap of only what I wanted. I crawled on a P3 with 512 MB of RAM... it works really hard and does an excellent job. I ended up with a 70k sitemap for a site with approx. 1200 posts and just over 200 members. It still took a few hours to crawl through... just doing its job.
With the filters one can take out the reply, new post, account, and many other similar links.
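For anyone curious what such a generator produces in the end, here is a minimal sketch in Python. It is not how any of the tools mentioned in this thread work internally. The `build_sitemap` function, the example.com URLs, and the use of the sitemaps.org 0.9 namespace are all assumptions made for illustration (the Google Sitemaps beta discussed here used an earlier schema URL).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol. The Google Sitemaps
# beta discussed in this thread used an earlier schema URL.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" in the URL as "&amp;" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical page list; a real crawler would collect these by following links.
pages = [
    "http://www.example.com/modules.php?name=News",
    "http://www.example.com/modules.php?name=Downloads&d_op=viewdownload",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The crawling and filtering is the hard part that the tools take care of; writing the file itself is only a few lines.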
Snoboreders Nuke Soldier
Joined: Jun 30, 2005
Posts: 31
Posted:
Tue Aug 02, 2005 7:26 pm
What parameters would you remove? If I dropped "sid", do you think it would recognize that as a session ID? Also, since the random_num pages were aborted, does that mean when Google crawls my site, it won't time out on those pages?
sixpack Lieutenant
Joined: Oct 20, 2004
Posts: 165
Posted:
Sat Aug 06, 2005 6:28 pm
sid would be good to filter, as well as reply, mark, search, etc. As for the other question, I am not sure what you are asking... random_num?
I have installed a sitemapper and I spidered my own site using different bots with submitexpress.com. I have a problem: my PHP-Nuke is installed in a folder called Nuke2 instead of the main directory. Whenever the spider shows me the results, it brings up a list of bad URLs without the Nuke2 folder, for example:
http://studio505.net/modules.php?name=Journal
Should be http://studio505.net/Nuke2/modules.php?name=Journal
Something somewhere in the PHP-Nuke site is sending out these wrong URLs and sending the search engines to dead ends... I thought installing the sitemapper would fix it, but it didn't. What do I do?
Here's the fucked up thing: after I installed sitemapper.php in the Nuke2 folder, which is the PHP-Nuke root folder (I installed the sitemapper to correct this issue)... IT SHOWED UP ON THE CRAWLERS! But its name was fucked up as well, thus sending crawlers to a page-cannot-be-found error...
For example, what showed up is:
http://studio505.net/sitemapper.php
(should have been)
http://studio505.net/Nuke2/sitemapper.php
What is causing this?
Snoboreders Nuke Soldier
Joined: Jun 30, 2005
Posts: 31
Posted:
Thu Aug 18, 2005 8:39 am
Sorry RockDrala, I can't answer that one.
Here's what is under my "Remove Parameters"
gfx
orderby
osCsid
PhpSessId
PhpSessionID
random_num
Session
SessionID
SID
XTCsid
Here's what's under my "Drop parts"
ratenum
ratetype
Oh yeah, I'm running Nuke Platinum. I've noticed that Googlebot and MSNbot are at my site every day. It's too bad Google hasn't updated their PageRank, because I'm still at 0 (the site was indexed about 40 days ago).
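For reference, a "Remove Parameters" list like the one above boils down to stripping those query parameters from every URL before it goes into the sitemap. Below is a rough Python sketch of that idea, not the crawler's actual code. The `strip_params` helper and the example.com URL are made up for illustration, matching names case-insensitively is an assumption, and the "Drop parts" setting is not covered.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The "Remove Parameters" list from the post above, lowercased.
# Assumption: parameter names are matched case-insensitively.
REMOVE_PARAMS = {
    "gfx", "orderby", "oscsid", "phpsessid", "phpsessionid",
    "random_num", "session", "sessionid", "sid", "xtcsid",
}


def strip_params(url):
    """Return url with every query parameter in REMOVE_PARAMS removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in REMOVE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical forum URL carrying a session ID.
cleaned = strip_params(
    "http://www.example.com/modules.php?name=Forums&file=viewtopic&t=12&sid=abc123"
)
print(cleaned)
```

With session IDs gone, the same topic always maps to one URL, so the sitemap does not fill up with duplicates.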
Rockdrala Sergeant
Joined: Aug 09, 2005
Posts: 97
Posted:
Thu Aug 18, 2005 10:02 am
Can anyone tell me why the hell my site is generating bad URLs for crawlers? This is really pissing me off!
Maybe it's some sick joke by what's-his-face who created PHP-Nuke...
Here is a clue... The bot I'm using is the Meta Tag Analyzer from submitexpress.com. You can choose whether you want Googlebot results or any other spider. You do it all online... great tool.
These bad URLs have to be generated from my site, because it's not finding the actual URLs in the crawl results...
The bots are finding actual page names SOMEWHERE, or it wouldn't have shown sitemapper.php after I installed it.
So the question is: what is cutting out the Nuke2 directory?
Steptoe Captain
Joined: Oct 10, 2004
Posts: 562
Posted:
Thu Aug 18, 2005 11:39 am
I think this is the problem, though I do not understand the whys.
1/ Google doesn't like SIDs and long URLs
2/ Therefore Google doesn't like going beyond the links on the front page.
3/ I think Google Tap still needs to be installed to take care of long URLs and make URLs like www.yoursite/forums.html
4/ Even then Google has trouble getting into forums (something to do with SIDs??), though if the latest-forum block is on the front page these are crawled
5/ Google does get to downloads and web links OK
6/ Other engines like MSN, etc. will crawl OK
I don't think Google likes too many URLs on the front page, e.g. links to news posters' details, links to new members' details, and other similar stuff
7/ Google (and most engines) doesn't like the user info that has visitor IPs with xxx replacing the last IP numbers (replace this with zzz)
8/ Google says the sitemap will not necessarily increase rankings, but it will crawl more parts of the site... it does this once or twice, then stops at the index page again.
9/ Somehow I think pages/posts need dynamic meta tags for descriptions of posts/threads??
Our site was in the top 5 for its subject on Google, and out of 19 subject + parameter searches (kakariki + ), 12 were also in the top 10.
So unlike most people trying to get up there, my playing has been from the top down.
I accidentally messed up my meta tags... don't believe that Google doesn't rate these very high. Overnight the main subject parameter stayed up, but ALL the rest dropped below rank 100 and 200! After 2 weeks they are slowly coming up... other engines didn't drop as much and came up faster again.
I don't have the answers, just observations.
Rockdrala Sergeant
Joined: Aug 09, 2005
Posts: 97
Posted:
Thu Aug 18, 2005 1:31 pm
Steptoe, I appreciate the info. I want you to check this out... go to www.submitexpress.com and choose the free site meta tag analyzer...
Select any crawler (Google, other spiders, etc.) and take a look. It is not just Google but any crawler...
It's cutting out the Nuke2 directory in every URL found... so every URL goes to a dead end... Is there perhaps a meta engine hidden somewhere in PHP-Nuke?
Sure, I could just throw a static HTML page up with a sitemap and a redirect, but seriously, we should know this for future PHP-Nuke installations... I bet that anyone else who has installed 7.7 like me, put it in a subdirectory, and ran the same test I did with the online crawlers will see the same results.
I may be a little crazy to spider my own sites just to see the results, but I like to know what the bot is seeing... and right now it's just seeing a bunch of dead ends... Wouldn't you want to see what the hell the bots are seeing?
Where is this URL-generating engine hidden in PHP-Nuke?!
Thanks Steve
Steptoe Captain
Joined: Oct 10, 2004
Posts: 562
Posted:
Thu Aug 18, 2005 2:21 pm
www.studio505.net/nuke2
I don't know much about setting up subdomains...
But somehow, the way yours is set up seems wrong???
Shouldn't it be something like www.nuke2.studio505.net/ and set up like that in your virtual hosts in the httpd.conf file in Apache?
Rockdrala Sergeant
Joined: Aug 09, 2005
Posts: 97
Posted:
Thu Aug 18, 2005 5:02 pm
Apache functions are not available on my server... so I might as well kiss mod_rewrite, NukeSentinel, and stuff like that goodbye. Everything I use is just core scripts... and phpMyAdmin. That's why I'm looking to fix the engine itself, without having to resort to additional programs.
softplus Nuke Cadet
Joined: Aug 25, 2005
Posts: 2
Location: Switzerland
Posted:
Thu Aug 25, 2005 12:37 am
Hi Rockdrala
I think there's a reason for the sub-optimal results:
Your site is probably the single most broken website I have ever seen (and I've seen a lot lately, testing my GSiteCrawler). You have multiple head/body/html sections; you seem to be mixing several sites all into one single giant page.
There is no way any search engine will be able to get good results from your site, it's amazing it even renders in my browser. Sorry, that sounds really mean, but it isn't meant that way - it's just a real mess...
If I were you, I would try to clean it up a bit, or at least make sure that you have a valid structure around it, even if the contents of the body section don't validate (it would be better if they did, but it is not a requirement). Make sure your page starts with either a doctype or an html section, and keep a single head section and a single body section. Any search engine that comes along now will possibly pick a "random" one of your head sections, possibly skipping your meta tags (title, description, keywords) and possibly even just giving up trying to read the page. I know for sure that Google is getting very strict about the quality of the sites that they list.
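As a quick sanity check before running a full validator, you can simply count how often the structural tags are opened. The Python snippet below is just a sketch of that idea and has nothing to do with how GSiteCrawler or any search engine reads a page. The `SectionCounter` class, the `count_sections` helper, and the sample markup are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

STRUCTURAL_TAGS = ("html", "head", "body")


class SectionCounter(HTMLParser):
    """Count how many times html, head and body are opened in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in STRUCTURAL_TAGS:
            self.counts[tag] += 1


def count_sections(html_text):
    """Return a dict mapping each structural tag to its number of opening tags."""
    parser = SectionCounter()
    parser.feed(html_text)
    parser.close()
    return {tag: parser.counts[tag] for tag in STRUCTURAL_TAGS}


# Made-up example of two complete pages glued into one document.
broken_page = (
    "<html><head><title>A</title></head><body>one</body></html>"
    "<html><head><title>B</title></head><body>two</body></html>"
)
print(count_sections(broken_page))
```

Any count other than 1 for a tag means the page has the kind of duplicated structure described above.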
Once you have that done, you can get to work with sitemaps, etc. Before you clean up the code it is a waste of time, as Google will only mark you up as "potentially bad" and put you on the back burner regarding crawling.
Also, regarding the crawlers going from http://www.studio505.net/nuke2 to http://www.studio505.net/[bla bla] - that is an error in the crawler software (I suppose the test link you posted isn't a very smart, standards-conforming crawler; usually that doesn't matter). It thinks that "nuke2" is a file instead of a directory, i.e. it thinks it needs to place all new links below http://www.studio505.net/ instead of in /nuke2/. It's a simple mistake, but the server headers (which you don't see) tell the browser/crawler that it's a directory. But it's no big deal, Google + co. will do it right. You can confirm that using, for example, my GSiteCrawler, which finds the URLs correctly (however, your server is so slow that I didn't crawl the complete site).
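The file-versus-directory mix-up is easy to reproduce with standard relative URL resolution. The Python lines below are only an illustration using the standard library's `urljoin`, not the code of any crawler mentioned here, and the relative link is assumed to be what the page contains.

```python
from urllib.parse import urljoin

# Assumption: the page links to its modules with a relative URL like this one.
link = "modules.php?name=Journal"

# Base URL without a trailing slash: "nuke2" is treated as a file name,
# so the link resolves next to it, in the site root.
as_file = urljoin("http://www.studio505.net/nuke2", link)

# Base URL with a trailing slash: "nuke2/" is a directory,
# so the link resolves inside it.
as_directory = urljoin("http://www.studio505.net/nuke2/", link)

print(as_file)       # http://www.studio505.net/modules.php?name=Journal
print(as_directory)  # http://www.studio505.net/nuke2/modules.php?name=Journal
```

A crawler that keeps the slash-less address as its base will produce exactly the dead-end URLs reported earlier in the thread.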
Hope that helps!
John
PS: There are very many sites out there that are not valid HTML, but that is no reason to do this for your site as well. You want as many good points in your favor as possible for the search engines; get to work.
Edit: The meta-tag checker you posted doesn't respect subdirectories at all. Don't use it.
_________________ Try the GSiteCrawler for Google Sitemap files!