Web Design Forum: Sitemap XML for BIG websites - Web Design Forum


Sitemap XML for BIG websites
Need a solution for auto XML gen on ASP for BIG sites

#1 sussextech

  • Dedicated Member
  • Group: Members
  • Posts: 124
  • Joined: 06-April 08
  • Reputation: 6
  • Gender:Male
  • Location:South Coast England
  • Experience:Advanced
  • Area of Expertise:SEO

Posted 25 January 2012 - 02:32 PM

Hi all! Wow, been a while since I posted on here ....

Does anyone know of any server-side software that will automatically generate a new sitemap.xml file (and automatically remove a page from the XML if it is deleted) for websites with over 10,000 pages?

With over 10,000 pages, sitemap index files pointing to gzipped sitemap.xml files will likely be required, so this functionality is also important.

If anyone has ideas, would love to hear them.

Found a few already, but wanted more ideas!

Thanks in advance!

#2 rallport

  • Web Guru
  • Group: Members
  • Posts: 3,818
  • Joined: 03-January 10
  • Reputation: 266
  • Gender:Male
  • Location:England, UK
  • Experience:Advanced
  • Area of Expertise:Web Developer

Posted 27 January 2012 - 10:22 PM

Not sure why you'd need "software" to do this. Simply output the contents of the XML file based on a database query and set the correct header.

It's best to set GZIP at the .htaccess level for certain file types.

Or are you talking about an XML file compressed using gz, e.g. .xml.gz?

As for size, you must have a huge site, as IIRC Google allows for either 50,000 URLs or ~10MB in size (uncompressed) per sitemap. Not sure what the significance of the 10,000 you mention is.
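
A minimal sketch of the "database query plus correct header" idea, written in Python rather than the ASP mentioned in the topic title. The `pages` table, its `url` and `updated` columns and the example.com URLs are made up for illustration; in a real handler the returned bytes would be sent with a `Content-Type: application/xml` header.

```python
import sqlite3
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(rows):
    """Return sitemap XML (bytes) for an iterable of (url, lastmod) rows."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in rows:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    # xml_declaration needs Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def sitemap_from_db(conn):
    # Assumed schema: pages(url TEXT, updated TEXT). A deleted page no longer
    # appears in the query result, so it drops out of the sitemap as well.
    rows = conn.execute("SELECT url, updated FROM pages ORDER BY url")
    return build_sitemap(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, updated TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [("http://example.com/", "2012-01-25"), ("http://example.com/about", None)],
    )
    print(sitemap_from_db(conn).decode("utf-8"))
```

Because the XML is rebuilt from the query every time, there is no separate step for removing deleted pages.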

#3 Sogo7

  • Advanced Member
  • Group: Members
  • Posts: 421
  • Joined: 02-February 11
  • Reputation: 42
  • Gender:Male
  • Location:Camarthen
  • Experience:Intermediate
  • Area of Expertise:Designer/Coder

Posted 28 January 2012 - 05:42 AM

I'm with 'rallport' on this. Even if it's not a database-driven site, a relatively simple PHP script is capable of drilling down through the folder structure and listing all the pages needed to create an XML sitemap.

Of course, getting Google to actually index all the pages is another problem altogether.
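
The folder-walking idea could look roughly like this. It is a Python sketch rather than the PHP script mentioned above, and the list of page extensions, the function name and the example.com base URL are assumptions for illustration only.

```python
import os
from urllib.parse import quote

PAGE_EXTENSIONS = {".html", ".htm", ".php", ".asp"}  # assumed page types


def list_page_urls(doc_root, base_url):
    """Walk doc_root and return sorted URLs for files that look like pages."""
    urls = []
    for folder, _subdirs, files in os.walk(doc_root):
        for name in files:
            if os.path.splitext(name)[1].lower() not in PAGE_EXTENSIONS:
                continue  # skip images, stylesheets and so on
            rel = os.path.relpath(os.path.join(folder, name), doc_root)
            # Use forward slashes and percent-encode unsafe characters.
            path = quote(rel.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + path)
    return sorted(urls)
```

Calling `list_page_urls("/var/www/site", "http://example.com")` (both values hypothetical) would give the list of URLs to write into the sitemap. Note this only works where every page is a file on disk; pages built from URL parameters would not be found.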

#4 rallport

Posted 28 January 2012 - 04:47 PM

Sogo7, on 28 January 2012 - 05:42 AM, said:

Of course, getting Google to actually index all the pages is another problem altogether.

To be honest, a well-structured website, however big, shouldn't need a sitemap containing every single page.

#5 sussextech

Posted 28 January 2012 - 05:32 PM

Hi Rallport - long time no speak!

Correct about the xml.gz compression. We're talking well over 50,000 URLs here. So I wanted to know if there is software available (or equivalent, anything that solves the issue) that can compress 50k URLs and more automatically. It then needs to be able to recognise new pages and page deletion and adjust the sitemap.xml appropriately.

Any ideas are welcome!

Thanks Sogo7 for the input, yes, getting them indexed is a different matter.

Regarding your other post, Rallport: unfortunately, it's not always that simple. It's good practice to provide an XML sitemap, and especially in the case of much larger sites it can help increase search visibility and provides another way to specify canonical URLs.

Cheers again! Any help on the sitemap front will be useful!
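
For reference, the requirement described in this post (more than 50,000 URLs, gzipped files, an index, and automatic handling of new and deleted pages) can be sketched in Python as below. The file names `sitemap-N.xml.gz` and `sitemap_index.xml`, the function name and its parameters are arbitrary choices for illustration, not an existing tool.

```python
import gzip
import os
from xml.sax.saxutils import escape

MAX_URLS = 50000  # per-file URL limit quoted earlier in the thread
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEAD = '<?xml version="1.0" encoding="UTF-8"?>'


def write_sitemaps(urls, out_dir, base_url, max_urls=MAX_URLS):
    """Write sitemap-N.xml.gz files plus sitemap_index.xml; return the names."""
    os.makedirs(out_dir, exist_ok=True)
    # Remove chunks from the previous run so stale files do not linger.
    for old in os.listdir(out_dir):
        if old.startswith("sitemap-") and old.endswith(".xml.gz"):
            os.remove(os.path.join(out_dir, old))
    urls = sorted(set(urls))
    names = []
    for start in range(0, len(urls), max_urls):
        name = "sitemap-%d.xml.gz" % (len(names) + 1)
        body = [XML_HEAD, '<urlset xmlns="%s">' % NS]
        body += ["<url><loc>%s</loc></url>" % escape(u)
                 for u in urls[start:start + max_urls]]
        body.append("</urlset>")
        with gzip.open(os.path.join(out_dir, name), "wt", encoding="utf-8") as fh:
            fh.write("\n".join(body))
        names.append(name)
    # The index lists every chunk; this is the file submitted to search engines.
    index = [XML_HEAD, '<sitemapindex xmlns="%s">' % NS]
    index += ["<sitemap><loc>%s/%s</loc></sitemap>" % (escape(base_url.rstrip("/")), n)
              for n in names]
    index.append("</sitemapindex>")
    with open(os.path.join(out_dir, "sitemap_index.xml"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(index))
    return names
```

Run from a scheduled job with the current URL list, this rebuilds everything each time, so new pages appear and deleted pages disappear without any change tracking.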

#6 Sogo7

Posted 29 January 2012 - 12:31 PM

This little thread's been tugging on my addled synapses, as I've got a demo project with an unusually large number of pages.

It took a while before the penny dropped and I realised that the needs of an SEO consultant are slightly different. A competent webmaster or web dev would tack in an auto sitemap creation & submission script, but that is a 'fit & forget' operation and as such can only be billed once.

SEO, by contrast, is an ongoing month-by-month process, so your preferred solution would be in the realms of a remote 'Software as a Service', or a standalone package (operated by the client) that can be licensed/sold with a built-in expiry date. Thus some of the SEO 'voodoo' secrets are kept firmly away from the client, and the potential probably goes way beyond merely dynamic sitemapping/auto submission.

In terms of coding, this would be akin to running one's own search bot/spider that fetches remote pages and follows all the links it can find. I dare say many such scripts already exist that you could run on your own server and adapt or add to. The only niggle is the bandwidth required for a large client site (or multiple smaller ones), but this could be offset with the aid of Google's own App Engine, where it chews on the bulk of the page data and spits out the links to your server.
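
To give a rough idea of what such a spider involves, here is a small Python sketch using only the standard library. The names `LinkParser`, `fetch` and `crawl` and the `limit` parameter are invented for this example; it ignores robots.txt, throttling and retries, all of which a real crawler would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def fetch(url):
    # Plain HTTP GET; this is where the bandwidth cost comes from.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, limit=1000):
    """Breadth-first crawl of same-host links; return the pages that loaded."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue:
        page = queue.popleft()
        try:
            html = fetch_page(page)
        except OSError:
            continue  # urllib errors subclass OSError; skip pages that fail
        pages.append(page)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            url = urldefrag(urljoin(page, href)).url  # resolve and drop #fragment
            if urlparse(url).netloc == host and url not in seen and len(seen) < limit:
                seen.add(url)
                queue.append(url)
    return sorted(pages)
```

The list returned by `crawl("http://example.com/")` (a placeholder URL) could then be fed into whatever writes the sitemap. Pages that no longer load are left out, which covers the "removed pages" case, at the cost of re-fetching the whole site on every run.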

As for a standalone package, that's not really my territory, though there are some compilers around that will convert PHP to executable files. Personally I've not had a lot to do with them yet, and a lot of feedback seems to suggest they are a little flaky.

There is one, however, that you may find useful, called 'Djuggler'. It's purpose-built for content retrieval, very easy to use and has the ability to produce an exe file to do your bidding. The free personal edition has some features disabled and a limit on how many lines of instructions one can use; however, once you know what you are doing, grab a thirty-day demo of the full version.

#7 FizixRichard

  • Advanced Member
  • Group: Members
  • Posts: 325
  • Joined: 05-October 07
  • Reputation: 47
  • Gender:Male
  • Location:Market Deeping, England
  • Experience:Advanced
  • Area of Expertise:Web Designer

Posted 31 January 2012 - 01:01 PM

rallport, on 28 January 2012 - 04:47 PM, said:

To be honest, a well-structured website, however big, shouldn't need a sitemap containing every single page.

This. If you have a huge website, let's say a news site, just listing the main pages, categories and latest/featured news articles along with any other "important" content should suffice.

The sitemap should contain the "important" stuff. I use quotes for important because what counts as important will vary from site to site.

If you dump a 10,000-link sitemap, Google isn't going to index them all.


#8 rallport

Posted 31 January 2012 - 05:41 PM

sussextech, on 28 January 2012 - 05:32 PM, said:

Regarding your other post, Rallport: unfortunately, it's not always that simple. It's good practice to provide an XML sitemap, and especially in the case of much larger sites it can help increase search visibility and provides another way to specify canonical URLs.

It really is, and it's exactly what Google is looking for: well-structured websites :)

I have a site with fewer products, admittedly (~20,000). In the sitemap I listed all the category pages and static content pages and a small sample of products (500). All products were indexed, no problem. That site was also well linked internally.

Granted, this site had some authority on Google already (PR 4), but all products were indexed.

I think you, like lots of people on this forum, are still living by some of the "SEO myths" that were popular a few years ago. Google wants easy-to-find, well-structured, quality content - end of :)