Why Do So Many Large Websites Fail Basic HTML 101?

January 01, 2012

I am about to work for a large, recognizable company where I will be doing iOS development as well as helping out on their mobile website. Curious about what I am getting into, I took a peek at the mobile website and their main website.

Not pretty. Both sites fail miserably on validation, including obvious flaws that aren't even valid syntax. Hopefully I can gently nudge them into making things better without immediately pissing anyone off.

The main site uses Vignette, a particularly vile web content management system I remember examining at some point in the past decade. Many of these systems are very expensive and generate particularly crappy HTML. I wish browsers could punish really terrible HTML, but browser developers bend over backwards to accommodate almost every kind of error, which allows these dreadful content management systems to continue to exist.

Both of these sites declare themselves as XHTML but fail to follow the syntax rules, which makes the whole declaration pointless. For a time I built websites with XHTML too, as it seemed the popular thing to do, but now it makes no sense whatsoever, as HTML5 is clearly the correct choice. XHTML was rarely if ever actually served as XML by web servers; they generally served it as plain HTML, so the browser mostly ignored the doctype and parsed it like any other HTML. Being an XML derivative, XHTML is really easy to validate. Why bother declaring it if you aren't going to do that?
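To show how cheap the first step is, here is a minimal sketch in Python that checks whether a chunk of markup is at least well-formed XML, using the standard library's xml.etree.ElementTree. The function name and the two sample strings are made up for illustration. Note that this only checks well-formedness, not validity against the XHTML DTD, and that named entities such as &nbsp; would need extra handling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the only difference is the unterminated <br>.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>ok</title></head>"
    "<body><p>Hello<br /></p></body></html>"
)
BAD = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>ok</title></head>"
    "<body><p>Hello<br></p></body></html>"
)

if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
```

A page that fails even this check has no business calling itself XHTML.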

Mobile browsers for the most part today are HTML5 compatible as far as I know, so there is little reason not to go ahead and be modern. Even if you really still care about people with antique cell phones with ancient browsers, at least make whatever you serve up validate so you won't get odd behavior. The whole point of validation is to give you a reasonable chance of having your website appear correctly for all of your customers.

I imagine websites that are written by hand are far more likely to be valid. Big companies often use fancy web content management systems because they think they get more control over a large website with fewer people, but I wonder if that's really true. Still, why would you not want your website to be at least reasonably valid, even if it is generated by some monster generator?

In this specific case the main website has, for example, a div tag in the head section. Even though it is declared as XHTML, many of the tags are incorrectly terminated. The mobile site is XHTML and has tag attributes in upper case, which XHTML does not allow. Sure, you can argue it doesn't matter, as sites still appear to work, but when nobody pays attention to relatively simple syntax validation you wonder what else is wrong or missing. To me a professional website should be valid because it is part of doing professional work. If you screw up the little things, you wonder what big things are wrong. Security, maybe?
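Both of those flaws are easy to catch mechanically. The following is a rough sketch in Python built on the standard library's html.parser; the class name, function name, and sample markup are invented for illustration, and it is no substitute for a real validator. It flags a div that opens inside head and any attribute name written with upper-case letters.

```python
import re
from html.parser import HTMLParser

# Matches quoted attribute values so they can be blanked out before
# looking for attribute names (avoids false hits inside values).
_QUOTED = re.compile(r"""("[^"]*"|'[^']*')""")
# An attribute name as written in the source, followed by "=".
_ATTR_NAME = re.compile(r"\s([A-Za-z_:][-\w:.]*)\s*=")


class MarkupChecker(HTMLParser):
    """Collects two kinds of problems: div in head, upper-case attributes."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag == "div":
            self.problems.append("div inside head")
        # HTMLParser lower-cases names, so inspect the raw tag text instead.
        raw = _QUOTED.sub('""', self.get_starttag_text() or "")
        for name in _ATTR_NAME.findall(raw):
            if name != name.lower():
                self.problems.append(f"upper-case attribute {name!r} on <{tag}>")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_problems(markup: str) -> list:
    checker = MarkupChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


# Hypothetical sample containing both flaws described above.
SAMPLE = (
    "<html><head><div>oops</div><title>t</title></head>"
    '<body><a HREF="/x" Class="y">link</a></body></html>'
)

if __name__ == "__main__":
    for problem in find_problems(SAMPLE):
        print(problem)
```

Something this small could run as part of a build and catch the embarrassing stuff before a customer ever sees it.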

I remember a few years back my local phone company (at the time SW Bell, now AT&T) had a contact form on their website which I tried to use on my Mac with Safari. It would not submit the form at all (nothing happened), so I looked at the HTML and found multiple html, head and body tags all mixed together, likely the work of some templating or content management system run amok. It was so confusing that Safari couldn't make heads or tails of it. Apparently IE6 (at the time) was the only browser it worked on. I complained (via an email address), documenting how truly awful the HTML was, and the only reply I got was that they didn't support Safari. What a joke. I cancelled my local phone service, so I never found out how long it took for this to be fixed.

This website is HTML5 and validates (though I noticed I need a character encoding). It's not rocket science.

I'm sure if I checked the top 100 websites in the world, most would fail validation, generally in some embarrassing way. Apple's site is one that is very close to validating (it's also HTML5), so I know it can be done by a big company. Microsoft is pretty terrible, yet another failed XHTML site. Google is invalid as well, though not too bad. IBM is impressive, an XHTML site that validates (though I have to admit my sister was in charge of that for a while, family pride). Facebook serves up its mobile site to the validator, which of course then fails as XHTML. Oracle is a complete disaster, again XHTML. Yahoo is HTML5 but has a lot of errors, which makes me think that it might have been XHTML at one point.

I could go on and on, but to me having an invalid website should be a professional embarrassment.