Get Listed Faster at Google and Yahoo – Sitemap.xml Files
Sitemap.xml files – the best way to get into the search engines
CAVEAT – This is a VERY long blog article. You may want to go grab a cup of coffee, your favorite chai tea or a venti sugar free vanilla latte before you sit down to read this article!
For those of you who don’t know what these are, they are NOT the “old school” Site Maps – a page on your web site that lists links to all the other pages on the site in one place. Instead, a sitemap.xml file is a plain text file that sits next to your site’s viewable pages on the web server, but it’s only seen and used by the search engines. It tells the search engines what pages on your site you want indexed, and it lets them know their order of importance as well as the general frequency by which to re-index those pages.
The great debate
One of the many debated topics in the SEO world is whether a web site even needs a sitemap.xml file, let alone whether you should use one when submitting your site, or whether you need to submit your site at all. Some people believe that free site submission is all you need to do – without a sitemap.xml file – and that you can leave the work to the search engines and their automated indexing “bots”. Others believe that the web is so prolific (which it is) and search engine indexing “bots” are so good (which they are not) that if you simply create your web site, eventually it will be found and indexed (which it may, if other sites link to you, if those links don’t have a “no-follow” tag, and if you don’t mind waiting upwards of FOREVER to be indexed!).
Directly Submitting Your Site – The Options
Currently Google, Yahoo and MSN (as well as all the other search engines) allow you to submit your site for inclusion by one or more of the following methods:
- Search Submit Basic
- Paid Search Inclusion
- Sitemap Submit
Search Submit Basic
The free search submit systems – where you simply go to a web page at that search engine and enter the URL of your home page – have been around forever (in Internet Search Engine time). Submitting a site this way means that Google’s, Yahoo’s, MSN’s, Ask’s or “Joe’s Generic Search Engine’s” systems will “eventually” get around to going to that web address and automatically poking around – navigating through the various links you have on the site, trying to determine what pages there are and what to index.
It’s a hit and miss method, and can usually take weeks. Once it’s done, if that automated poking misses some links (for a number of reasons) or scans and indexes pages you may not want indexed, you end up with shoddy results.
Sadly, if you have ever read the fine print at the top search engines (I have, as part of my work) you’ll discover that NONE of the top search engines actually guarantee that you’ll be listed, even if they do get around to scanning your site this way.
The following statement can be found at Google’s free search submit page:
“We do not add all submitted URLs to our index, and we cannot make any predictions or guarantees about when or if they will appear.”
Site Submit Tools
At this point I need to mention the plethora of tools, software and services (some free, some for a fee) that claim they’ll do the submit work for you. There are many such offerings – Submit Express, FreeWebSubmission, IBusinessPromoter, BlastEngine, INeedHits, and on and on and on…
Site Submit Tools – Worth the effort?
There are a couple of fundamental reasons that I neither use nor trust such solutions. First, every search engine has its own rules about how much to include in page titles, Meta keyword and Meta description fields – many of these submit services try to force a single set, built to that service’s rules, onto every search engine. Others make you fill out two, three or more sets of fields. Well, what about my client’s 50 page web site – the one where I or my team spent 30 hours coming up with unique titles, keyword sets and descriptions for every page on the site? Do I have to submit all 50 pages separately now through your “we’ll submit it for you” system? Please – save me the agony!
In my experience (yours may vary), I would prefer that the search engines come to my client’s sites and take what they need, and leave the rest.
Site Submit Tools – Submit your site to 5,000 search engines for free!
Next, many of these solution providers claim they’ll submit your site to 100, 1,000 or even 5,000 search engines and directories. WOW – you’ll send my site info to 5,000 places, for free? That’s GREAT! I don’t have to do a thing and my site will become famous overnight! YAY!
Now, I don’t have any problems with “Jacinda Jones Search Engine” or “Petey’s little Web Directory” wanting to exist or even maybe some day overtaking Google as the number one source for web site listings. Really – this is what the Internet should be all about. Except I happen to know for a fact that many web sites that claim to be search engines or web directories are really cloaked money machines – set up really for the purpose of loading their site up with AdSense ads or so they can charge for inclusion. If a web site’s primary purpose is either of these, that’s a BAD thing – Google may even penalize your web site for being listed there!
SO again, personally, I would prefer to avoid the aggravation and the potential harm. You are, of course, free to use such services, at your own peril. I don’t have that luxury because I have a responsibility to my business web clients – legitimate business owners who desperately need to do all the positive things they can to compete in today’s Internet, without the risks that come with such services.
Paid Search Inclusion – Guaranteed
Some search engines, like Yahoo, offer you the ability to pay them for the privilege of being found in their search engine. Yes, that’s right – pay a one-time fee (or a one-time set up fee and an annual inclusion fee) and they might scan your site sooner, or maybe even actually index your site sooner.
HUH? Maybe? Might? Yes that’s correct. Yahoo “guarantees” that if you pay them, you’ll be indexed AND included. Except.
Except if your site doesn’t meet one or more of their twenty three (23) guidelines for non-inclusion. That’s right, they want to ensure that only worthwhile sites get indexed. A lofty goal. Except it’s murky – a gray area issue – and open to interpretation. So what if you are 100% sure you have a valid, legitimate site worthy of inclusion (based on your beliefs)? You don’t have to worry then – right?
I manage hundreds of web sites. Through my efforts and my team’s, many of them come up on the first page of organic rankings at Google. Most of those same sites also come up on the first page at Yahoo and MSN as well. Except one site. That one site comes up on the first page of Google for over three dozen keyword phrases (we’ve done a LOT of work to get those results). So where does it come up in Yahoo? Nowhere. (The site IS indexed, but it does NOT come up in a search for ANY of those phrases, and hasn’t for over a year.) Why? Because before I inherited the site, it was banned from Yahoo. One or more of those guidelines had been violated, apparently. I wrote email after email, made phone call after phone call, and re-worked the entire site top to bottom to try and comply with those “gray” guidelines. (No, it’s not a porn site – no adult content at all actually – no profanity, no defamation of anyone, etc.)
The last paragraph on the Yahoo Search Submit page states:
Treatment of Paid Content
Yahoo! designed its Search Submit programs to improve the quality of its search databases and thereby enhance the search user experience. Therefore, URLs submitted via such programs are subject to these guidelines and any other additional guidelines or policies adopted by Yahoo! from time to time.
Note in that last sentence: “and any other additional guidelines or policies adopted by Yahoo! from time to time.”
That means that even though Yahoo offers a “Guaranteed” result, their own Paid Submission page states that they cannot guarantee they will place your site in the search engine, and that they can make up the rules as they go! So why pay for a service when you can get the work done without paying a fee? That way, if your site is acceptable it will be listed, and you’ve saved $49 per page submitted (yes, they are willing to charge you for every page you want them to list!).
Sitemap Submit – The best approach
So let’s say you have a web site – legitimate, professional, and well designed. Before you built the site or had it built, you read all my articles on SEO and you read (and implemented) all the guidelines from my white hat SEO fundamentals page.
Once the site is complete, I highly recommend you create a sitemap.xml file. Don’t worry if you’ve never created an XML file in your life – it’s not really difficult if you follow the method I show below. Or, there are many programs and web sites that will generate a sitemap.xml file for you for free or a fee.
I personally prefer not to use such services because they can sometimes generate entries to pages you don’t want to include (like links to 3rd party web sites, which should never be included!), or I’ve even seen a situation where one service actually generated a sitemap.xml file using improper coding. As an example of this, I ran my own blog site through one of these generators, and it listed one page five times!
So for me, as important as this step in the process is, if you just follow the sample I provide, you’ll be set.
If you’re still not sure about how to do it, or completely confused, you can hire a professional, but don’t be surprised if that “professional” uses one of those web based services or programs, or charges you an arm and a leg. If you’ve got a 50 page web site that’s three layers deep, it might be worth the fee – a choice you’ll need to make. We generate sitemap.xml files for our clients all the time, and it typically takes anywhere from 10 minutes to half an hour at most.
Creating a sitemap.xml file
First, like any HTML type page, this has to be created in a plain text editor. Don’t try to use MS Word or another word processing program – they stick odd hidden formatting code in files that causes problems on the web! So SimpleText or Notepad will work fine. If you know Dreamweaver, you can create the file there as long as you use Dreamweaver’s Code view.
Here’s the structure:
The first line of code informs the search engines that this is an XML file and how to read it.
The next line is the opening of the “URLSET” – meaning that from this point forward, until this tag is closed (with the </urlset> line), everything in between is information about one group of URLs – or one web site.
Note then how I have four web pages listed. Each one is within its own “url” tag.
The actual web address (url) for each page is identified between the “location” (loc) tags.
The “changefreq” line informs the search engine how often this page is expected to change. (Don’t worry – this is only an approximation – if your pages change more frequently or less, it’s not a bad thing – not all search engines will make use of this information, and those that do (for now) will use it as a general guide).
The “priority” reference is supposed to inform the search engine how important this page is relative to the other pages on your own site (it does not rank you against other sites). Values run from 0.0 to 1.0. So for example, if you want to list your Contact page in the file, but would prefer people find your home page, then the home page would have a priority of 1.0 and the contact page would perhaps be given a priority of 0.5.
Again though, this is all as a general guideline. And you can always come back at a later date and revise the file. But it’s more important to perform thorough SEO work on the pages than it is to worry whether you should rate a page at 0.7 or 0.9 for priority. If you’re not sure, just rate them all at 1.0 (giving every page the same value simply says you have no preference among them) and let the search engines do the rest.
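A file matching the description above might look like the sketch below. The four page addresses on www.mysite.com, along with their changefreq and priority values, are made-up placeholders and not taken from a real site. The file text is held in a Python string only so the script can save it as sitemap.xml and confirm that it parses; the lines between the triple quotes are what you would type into your text editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: the page addresses and values are placeholders.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mysite.com/</loc>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.mysite.com/about.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.mysite.com/services.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.mysite.com/contact.html</loc>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
"""

# Namespace used by the sitemap protocol, needed to look up the tags.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_pages(xml_text):
    """Return the page addresses (the loc values) found in a sitemap."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    # Save the text exactly as written, under the required file name.
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(SITEMAP_XML)
    print(list_pages(SITEMAP_XML))
```

Nothing may come before the opening `<?xml` line, not even a blank line, or the file will fail to parse.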
So the above code image is just an example. Your site might have five entries, or it might have fifty.
How do you know how many it will need? Simple. Go to your web site. When you get to the home page, that’s the first entry. Then click on every link on your site that you want to be sure will be indexed at the search engines. Copy the web address that comes up in the web address field of your browser for each one.
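If clicking through by hand gets tedious, the same collecting can be roughed out in a few lines of Python. This is only a sketch of the idea, not a replacement for looking at the pages yourself: the helper names and the sample home page markup are invented for illustration, and it only reads the links on one page. It does skip links to third party web sites, which should never go in the file.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the address of every same-site <a href="..."> on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Make relative links absolute and drop any "#section" suffix.
        full, _fragment = urldefrag(urljoin(self.page_url, href))
        same_site = urlparse(full).netloc == urlparse(self.page_url).netloc
        if same_site and full not in self.links:
            self.links.append(full)


def same_site_links(page_url, html_text):
    """Return the unique same-site addresses linked from html_text."""
    collector = LinkCollector(page_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical home page markup, for illustration only.
HOME_PAGE = (
    '<a href="/about.html">About</a> '
    '<a href="contact.html">Contact</a> '
    '<a href="http://www.example.com/">A partner site</a> '
    '<a href="/about.html">About us</a>'
)

if __name__ == "__main__":
    for address in same_site_links("http://www.mysite.com/", HOME_PAGE):
        print(address)
```

You would still want to read through the resulting list and strike any page you don’t want indexed before it goes into the file.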
CUSTOM URLs – Shopping Carts, Member Areas, and odd information in the URL line
What if you own a web store – a shopping cart system? Let’s say you click on a category link, then click down into an individual product. The URL might end up looking like this:
The search engines used to choke on this kind of web page reference (and some still do!) The good news is that Google knows how to handle that – so if you really feel that you have to have that page indexed at Google, then go ahead and put that page in the sitemap.xml file.
Of course, if you build (or have the site built) properly, and if all you do is include the top five pages in your sitemap.xml file, when the search engines do go about their automated indexing, those other pages will eventually end up being indexed anyhow. So it’s most vital that at the very least, you include the top pages for your site.
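One detail to watch if you do list a shopping cart page: the sitemap protocol requires special characters in a web address to be escaped, so an “&” between URL parameters has to be written as “&amp;” inside the file. A minimal sketch, using an invented product address (the path and parameter names are placeholders, not from a real cart system):

```python
from xml.sax.saxutils import escape


def loc_entry(page_url):
    """Return a <loc> line with XML special characters escaped."""
    return "<loc>" + escape(page_url) + "</loc>"


if __name__ == "__main__":
    # Hypothetical shopping cart address, for illustration only.
    print(loc_entry("http://www.mysite.com/store/product.php?cat=12&item=345"))
```

If you are typing the file by hand, the same rule applies: replace each “&” in the address with “&amp;” yourself.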
Checking the validity of the file
Okay, so once you think you’ve got the file created, save it as a plain text file and be sure that you name it exactly sitemap.xml
Once you’ve saved it, open the file in Firefox or Internet Explorer on the PC or Firefox on a Mac. If you’ve done your work properly, it will look something like this:
Ignore that gray box – a sitemap.xml file is not supposed to have any style information associated with it!
If you made a mistake – like leaving out that </urlset> tag, you will see something like this:
or that same bad file seen in Firefox
So the goal is to actually be able to see the file from inside your browser and have it look like it does in your text editor.
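The browser check can also be approximated with a short script, sketched here under the assumption that you have Python available. It only confirms that the file is well-formed XML (every tag opened is also closed, and so on); it does not check the file against the sitemap protocol itself. The function name and the two sample files are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_sitemap_text(xml_text):
    """Return (True, "") if the text is well-formed XML, else (False, reason)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# A small, hypothetical one-page sitemap.
GOOD = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/</loc></url>
</urlset>
"""

# The same file with the closing </urlset> tag left out.
BAD = GOOD.replace("</urlset>", "")

if __name__ == "__main__":
    print(check_sitemap_text(GOOD))
    print(check_sitemap_text(BAD))
```

To test your own file, read it in with `open("sitemap.xml", encoding="utf-8").read()` and pass the text to the same function.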
If you need more information, or if you really want to get into the nitty gritty of what can go into a sitemap.xml file, and if you’re really geeky, check out Google’s in depth sitemap.xml instruction page on the sitemap protocol.
OK – NOW WHAT? – Time to post the file to your web site.
Once you’re pretty sure you did it right, you need to get the file up on your web server – the computer where your web site is located. And it has to be placed in the same location (folder) as your site’s home page.
If you have file transfer access (FTP) to your web site, you can upload the file yourself. It should go in the same place as your site index (index.html or index.php or index.cfm or whatever the main site index is).
If you do not have FTP access, contact your web host and explain that you need them to put it there for you. If they tell you your site already HAS a sitemap.xml file – GREAT! – but don’t believe them! I have personally dealt with professional web hosting companies that claimed this and when I checked I found they were lying!
How do you check to be sure the file is there?
Go to http://www.mysite.com/sitemap.xml – in Firefox or Internet Explorer. If it’s there, you will see it exactly as I showed you how you’d see it if you looked at it in your web browser on your local computer!
If you get a “404 page not found” error, or if you simply get “redirected” to your site’s home page, or anywhere but that file, it’s not in the right place.
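The same three outcomes (found, 404, or redirected somewhere else) can be told apart with a small script. This is a rough sketch that assumes the file lives at http://www.mysite.com/sitemap.xml, the placeholder address used above; the function names are invented for illustration.

```python
import urllib.error
import urllib.request


def describe_result(requested_url, status, final_url):
    """Turn a response into one of the outcomes described above."""
    if status == 404:
        return "missing: 404 page not found"
    if final_url != requested_url:
        return "redirected to " + final_url
    if status == 200:
        return "found"
    return "unexpected status " + str(status)


def check_live(sitemap_url):
    """Request the file and describe what came back."""
    try:
        with urllib.request.urlopen(sitemap_url, timeout=10) as resp:
            return describe_result(sitemap_url, resp.status, resp.geturl())
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 404 and other error statuses.
        return describe_result(sitemap_url, err.code, sitemap_url)


# Usage (needs an internet connection and your real site address):
#     print(check_live("http://www.mysite.com/sitemap.xml"))
```

Anything other than “found” means the file is not where the search engines will look for it.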
Okay – so I have the file there, now how do I tell Google it’s there?
I could write sixteen articles on how to use Google Webmaster Tools and Yahoo Site Explorer, so I’ll leave the details of those for future lessons. For the purposes of this article, I’ll provide the links to those sites.
Whatever you do though, if you’ve come this far, you can definitely get through the Google Webmaster Tools environment and the Yahoo Site Explorer environment – just be patient and take your time! (or send me an email asking for help!)
The most important thing here is that if you do this properly, once you submit your sitemap.xml file to Google and Yahoo for inclusion (at no charge!) you can expect to see your site indexed within anywhere from a couple of days to a week at most! That’s weeks or months faster than any other free method, and less expensive than the slew of fee based services that aren’t so great anyhow!
Note – With Google, you should first sign up for their webmaster tools program, then submit your site. You get a lot more services along with it, it’s all free, and it’s great (although lately their site verify service has been having major glitches). Note that you do NOT have to verify your site to submit the sitemap.xml file. Verifying your site lets them show you statistics and report errors on your site to you, but if their own verify system isn’t working it can be very annoying and confusing! So initially just use their system to submit the sitemap.xml file. Then, if you want, you can experiment with and explore the verify service later.
Note – With Yahoo, on the link above you’ll see a box on the right side for submitting your URL or “feed”. In this case, the “feed” is your sitemap.xml file, so put your link in there (http://www.mysite.com/sitemap.xml).
As soon as you click the button to “add my site” you’ll have to sign into Yahoo (or create an account). It can be quite annoying if you’re not paying attention but well worth the few minutes it takes.
Again, if you get lost on any of this, please contact me and I’ll be happy to do what I can to help!
A word about robots.txt files
I will devote another article to the importance of robots.txt files (another plain text file that sits at your web site next to the index page and your sitemap.xml file), but I need to mention here that all web sites should definitely have a robots.txt file as well. If you want more info on them now, check out the entry for them at Wikipedia.