Coding Spider-Friendly HTML
Modern browsers implement CSS and XHTML well enough that we can finally get rid of table tags. With some new coding techniques, you can reduce download time, reduce maintenance time, and make your site more accessible to the disabled. Oh, and search engines will finally understand your pages, too.
Search engine spiders crawl through your site every week, looking for good bits to store in their databases. When people run a search on Google, Yahoo!, or any of the other search sites out there, what they are really doing is sending a query to that search engine’s database. So it is in your best interest to create HTML that not only pleases the users who see your website through their browsers, but is also easy for these spiders to digest correctly.
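There are four basic ways to help your code look better to a spider. Many of these techniques have become reasonable only recently with the widespread adoption of decent CSS support. Here they are, in order from most potent to least:
1. Heightened Textuality: keep your content in text that a spider can read.
2. Crash Diet: trim the code that surrounds the content.
3. Content In Context: use tags that describe what the content is.
4. In Order: put the content first in the code.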
One big thing to note: not all of these techniques need to go hand in hand, although using all of them will help your site more than using just some. CSS and XHTML can be a bit confusing to transition to all at once. Pick the tips here that seem the most likely to help, and then create a hybrid style of coding while you learn the ropes. There’s no reason you can’t use CSS to arrange your table layout, or mix JavaScript rollovers with a:hover highlights. But, however far you get into this, the time to start is now.
Heightened Textuality
Remember that most search robots will see your website much as a text browser would. If you have access to Lynx (or a Lynx substitute), take a quick look at your site: this is similar to what the search engines see. Is it understandable? Is it even there? Features such as frames, Flash, JavaScript, and DHTML may prevent crawlers from accessing the content of your site, so even a beautiful-looking site can be invisible to search engines.
Flash is a big culprit. You could have an amazing, text-rich site that is attracting all sorts of visitors and sure looks good, but look at it as a spider would and there’s no there there. (Compare that to this other text-rich site you know.) The problem is that only Flash understands Flash. Even your browser needs a Flash plugin to make it work. To a search engine without this ability, a Flash site is empty.
Many sites create an HTML site in parallel with their Flash site, but this raises the problem of maintaining two sites for every update—which can get expensive. But what do you do if your audience demands functionality that only Flash provides?
One solution that maintains much of the Flash-iness but leaves the text visible to a spider is to break the page into rectangular sections. Sections that require Flash functionality can use small Flash bits, and sections that are textual content can be HTML. Problems sometimes arise if the design requires colors or lines to cross from one section to another, but otherwise this can be a successful way to get the Flash look while remaining visible to spiders. But never put your navigation in such a Flash bit. If a spider can’t see the links to the rest of your site, then your whole site just became invisible. No spiders, no visitors.
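As a rough sketch of that sectioned approach, the fragment below keeps navigation and text in HTML and confines Flash to one rectangle. The file names (banner.swf, index.html, gallery.html) and the div ids are made up for illustration, and the object markup is one common way to embed Flash, not the only one.

```html
<!-- Hypothetical page sections; all file names and ids are placeholders -->
<div id="navigation">
  <!-- Navigation stays as plain HTML links so a spider can follow them -->
  <a href="index.html">Home</a>
  <a href="gallery.html">Gallery</a>
</div>

<div id="feature">
  <!-- Only this rectangle uses Flash -->
  <object type="application/x-shockwave-flash" data="banner.swf" width="400" height="120">
    <param name="movie" value="banner.swf" />
    <!-- Fallback content, shown when no Flash plugin is available -->
    <p>An animated tour of the gallery.</p>
  </object>
</div>

<div id="content">
  <!-- Text content lives in HTML, where a spider can read it -->
  <h1>Welcome to the gallery</h1>
  <p>Opening times, directions, and exhibit descriptions go here as ordinary text.</p>
</div>
```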
Spiders are allergic to JavaScript, for a similar reason. Without downloading, interpreting, and running the script, the spider has no way of knowing if it writes new HTML dynamically to the page, opens a new window, or simply causes a button to blink. A spider is software, like a browser, so it is perfectly capable of doing that. But most JavaScript reacts to a user’s action, and that is something the spider cannot do. If you have JavaScript or DHTML dynamically adjusting your site, understand that this content is invisible to spiders. If you need it to show up in search engines, put it in HTML.
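One way to apply this, sketched below under assumed names (map.html and the details id are placeholders), is to put the text and the link target in the HTML itself and let the script add behavior on top.

```html
<!-- The text is in the HTML, so it is present whether or not any script runs -->
<div id="details">
  <p>Opening hours: 9am to 5pm, Monday to Friday.</p>
</div>

<!-- The href is a real address a spider can follow.
     The script only changes how the page opens for a user who clicks. -->
<a href="map.html" onclick="window.open(this.href); return false;">See the map</a>
```

A spider that ignores the onclick attribute still finds both the paragraph and the link to map.html.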
Frames. Every spider out there puts forth a diligent effort to index sites behind frames. But frames by their nature break the rule of the web that every location is represented by a single unique address. So even if the search engine figures out what’s in your site, assembling the proper link to get there may be difficult or impossible. Properly used, frames can enhance certain sites. But you should always be aware of how they affect the way people find you.
Your choice of what technologies to use will naturally be tailored to your content and audience—a photographer won’t get very far describing his photos without showing them. But if you have mostly textual content, opting for HTML can increase your site’s visibility to search engines.
Crash Diet
If you’ve been coding HTML at all over the past decade, you are likely very familiar with tables. You may think of them as a blessing or a bane, a useful workaround or an odious hack. But regardless of your opinion, they are a bulky solution to the layout problem. Cascading Style Sheets have promised, if not to perfect web layout, then at least to clean it up.
Eliminate tables as much as possible, because they inflate the code and slow down display. If you’re still new to this CSS game, the best place to start is by breaking your page into header, navigation, content, and footer divs, which you can arrange on a page using someone else’s CSS tricks. Later you can venture to replace the remaining tables one at a time.
You’ll notice something immediately about your code: it’s less complex. That makes it easier for the spider to see what’s code and what’s content. If you want another quick trick, replace all those font tags with CSS styles. To smooth the transition, put a <span style="..."> wherever you had font tags; later, replace the style attribute with a call to a class defined in a central stylesheet. Then, aspire to remove even that by attaching a style to existing tags.
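The font-tag transition might look something like this. The color, the text, and the class name notice are invented for the example, and the stylesheet rules appear only in comments.

```html
<!-- Before: presentation baked into the markup -->
<p><font color="#990000">Sale ends Friday</font></p>

<!-- Step 1: swap the font tag for a span with an inline style -->
<p><span style="color: #990000;">Sale ends Friday</span></p>

<!-- Step 2: move the style into a class in the central stylesheet,
     for example:  .notice { color: #990000; }  -->
<p><span class="notice">Sale ends Friday</span></p>

<!-- Step 3: drop the span and attach the style to a tag that is already there,
     for example:  p.notice { color: #990000; }  -->
<p class="notice">Sale ends Friday</p>
```

Each step renders the same way in the browser, but the markup around the content gets shorter every time.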
Lastly, keep the actual code downloaded per page small. Since search engines are only interested in the content, keep the code needed to render the page to a minimum. Put JavaScript and CSS in external files that are referenced from the HTML code, and the spiders can happily ignore them.
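A minimal head section along these lines could look like the following. The file names styles.css and rollovers.js are placeholders.

```html
<head>
  <title>Coding Spider-Friendly HTML</title>
  <!-- Styles live in one external file shared by every page -->
  <link rel="stylesheet" type="text/css" href="styles.css" />
  <!-- Scripts are referenced, not pasted into the page -->
  <script type="text/javascript" src="rollovers.js"></script>
</head>
```

Browsers can also cache the two external files, so repeat visitors download them only once.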
Content In Context
HTML tags can do much more than lay out or style the content. They can describe it in ways that software like a search engine spider can understand.
Instead of making the headline of the article big by defining a “.big” class in your global stylesheet, put the headline in <h1> tags. A spider doesn’t know whether big text is a headline or a decorative identifier, like “Bivia” on this page. But it knows that <h1> is pretty darn important. Don’t like how the headline looks? Style it to fit. And remember, h1-h6 are importance levels, not sizes. There’s no reason you can’t style h4 to be larger than h3 if that’s what looks right. But h3 is always more important to the page than h4.
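Here is a small illustration of headings that carry importance in the markup while the stylesheet decides their size. The font sizes and the heading text are arbitrary choices for the sketch.

```html
<style type="text/css">
  /* Size is a styling decision; importance comes from the tag itself */
  h1 { font-size: 1.5em; }
  h2 { font-size: 1.3em; }
  h3 { font-size: 1em; }
  h4 { font-size: 1.2em; } /* deliberately larger than h3 */
</style>

<h1>Coding Spider-Friendly HTML</h1>
<h2>Content In Context</h2>
<h3>Headings</h3>
<h4>A note on sizes</h4>
```

The h4 displays larger than the h3 here, yet it still sits below the h3 in the page’s hierarchy.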
If you’re following a complete SEO plan, then you probably have researched some keywords that you already sprinkled through the text. Now, take those same keywords and place them strategically in your HTML code. Of all the HTML tags on your page, the Title tag is the most important to search engines. It supersedes all the h1’s you put in.
Use the alt attribute in all your image tags. Not only is it required under XHTML specs, but it also gives search engines needed guidance as to what the image is. The spiders then try to figure out if there is a caption for the image by relating the alt and filename values to the surrounding text. Notice how they grab all that text to deduce what the image is, instead of just looking at the image? This is a strong clue as to how much spiders like text.
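For instance, an image marked up like this gives a spider three pieces of text to work with: the file name, the alt value, and the caption next to it. The file name and wording are invented for the example.

```html
<!-- Descriptive file name, alt text, and a nearby caption all say the same thing -->
<img src="golden-gate-bridge-sunset.jpg"
     alt="Golden Gate Bridge at sunset"
     width="400" height="300" />
<p>The Golden Gate Bridge at sunset, seen from the Marin Headlands.</p>
```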
Now, the code-in-context bit everyone’s waiting for: Meta tags. Feel free to put your keywords in there, but do this last; it’s not much help. Meta tags used to be the popular means to make your site search engine friendly, and the reasons they no longer are would fill a whole article by themselves. If you think Meta tags are important to the survival of your site, you might want to see why Meta tags are not as useful as you think.
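Putting the title and Meta advice from this section together, a page head might be sketched as follows. The business name and keywords are hypothetical.

```html
<head>
  <!-- The title carries the researched keywords and does most of the work -->
  <title>Handmade Oak Furniture | Example Woodworks</title>
  <!-- Optional, and added last: the same keywords in a Meta tag -->
  <meta name="keywords" content="handmade oak furniture, oak tables, oak chairs" />
</head>
```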
In Order
OK. If you’ve gotten this far, you’re likely to be a CSS guru by now, positioning all sorts of fancy boxes willy-nilly with pixel precision. Now is the time to do something radical:
Put the content first.
In the table-tag days, if you wanted a header across the top, you needed to put that first. Navigation to the left? Second. Content? Nearly last, with all that code, links and branding above it. CSS lets that go.
If you’re absolutely positioning the divs, then it doesn’t matter what order they come in within the code—the stylesheet rearranges it anyway. Put the content div first, and all the important text content first within that. Then, put the navigation, and then branding. The layout on the page is the same, but now all the text is first in the code, and the links second.
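A bare-bones version of that idea is sketched below. The div ids, the pixel sizes, and the page names are assumptions chosen for the example, and a real layout would need more care with widths and overflow.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Content-first layout sketch</title>
  <style type="text/css">
    body { margin: 0; }
    /* Absolute positioning places each box regardless of source order */
    #branding   { position: absolute; top: 0;    left: 0;     height: 80px; }
    #navigation { position: absolute; top: 80px; left: 0;     width: 180px; }
    #content    { position: absolute; top: 80px; left: 180px; }
  </style>
</head>
<body>
  <!-- First in the code: the text a spider cares about -->
  <div id="content">
    <h1>Coding Spider-Friendly HTML</h1>
    <p>The important text comes first.</p>
  </div>
  <!-- Second: the links to the rest of the site -->
  <div id="navigation">
    <a href="index.html">Home</a>
    <a href="articles.html">Articles</a>
  </div>
  <!-- Last: branding, which still displays across the top -->
  <div id="branding">Example Site</div>
</body>
</html>
```

In a browser the branding sits on top with the navigation down the left, exactly as a table layout would show it, but the source now reads content, links, branding.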
I swear, I think I just saw a spider smile.

