Jan 9, 2013

Enhance your Site's Content with Microformats

It is time to become aware of the place microformats have in your site structure. Microformats are attributes in the markup of your pages that make an explicit declaration (e.g., "this is a Blog Post") about the role of the content in the document. As their use increases, sites without the added accuracy gained through microformatting may find themselves falling behind in search results and clickthroughs. A small investment of time adding microformatting to your website content may yield large returns for your site and its content strategy.

HTML5 already has elements that give some indication of their role in your document. These range from the weak (ul indicates a list of something) to the strong (address represents the contact information for the document or section). In HTML5 we have document formatting present in the markup, but it is essentially macro formatting. These elements are rough tools that describe how information fits structurally into a web document, and every document on the web must be structured to fit this document model. They describe the role of the content in the document structure but provide little insight into the meaning of that content. Microformats are designed to be compatible with this page markup while adding better semantics to the presented data.

The address element makes for a good example. An address is a complicated thing, and the address element in HTML5 is a container for whatever represents the contact information of the document or section:

<address>
Joe Somethingorother
1234 Main Street
Somewhereville, ST 12345-9876
phone: (123) 123-4567
fax: (123) 123-9876
</address>


A human reader can easily pick out the individual pieces of contact information from the text in the address element. This illustrates one of the grand dichotomies between machines and men: we can intuitively understand the context of a grouping of text, while a machine cannot. Microformats, therefore, are in the service of machines; they allow an explicit programmatic understanding of the context of items in the markup. This is done by threading into the existing markup another, semantically narrower and more accurate markup: a schema.

By "machines" we predominantly refer to search engines; by "search engines" we predominantly refer to the big three: Yahoo!, Microsoft (Bing) and Google; and by "big three" we predominantly refer to Google, Google and Google. The fundamental practical aim of these microformats is to give search engines a better understanding of the data present in web pages. Schema.org is a collection of schemas: definitions for specific subject areas that may be interlaced with the HTML markup to declare the type of content explicitly, well beyond what the generic HTML markup indicates. (Strictly speaking, the itemscope, itemtype and itemprop attributes used in this article are HTML microdata, and "microformats" proper is a separate, class-based convention; I use the term loosely for this kind of semantic markup.) Schema.org was started by Yahoo!, Microsoft and Google to make semantic understanding of content easier for their search engines. I will be focusing on the schemas available at Schema.org; there are other vocabularies and syntaxes as well, such as Open Graph and RDFa.

These schemas are already used by Google to display extra information in search results; Google's documentation calls this "rich snippets". The data for these snippets comes from schema use in the respective indexed pages. When search results display properly microformatted products that are listed for sale on a page, the extra product information shown comes from the Product schema markup on that page. As schema use grows, you may expect to see more schemas in use as well as wider understanding of schemas in other services.
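As a rough illustration of the Product case, here is a minimal sketch of what such markup might look like. The product name, description and price are invented for the example, and which of these properties a search engine actually displays is entirely up to that engine. The itemscope attribute marks an element as an item, itemtype names the schema it follows, and itemprop labels each property within it:

```html
<!-- Hypothetical product listing; all values are made up for illustration. -->
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Widget</span>
  <span itemprop="description">A sample widget used only to show the markup.</span>
  <!-- The offer is itself an item, nested as the product's "offers" property. -->
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <!-- meta carries a machine-readable value that is not displayed. -->
    <meta itemprop="priceCurrency" content="USD">
    $<span itemprop="price">19.99</span>
  </div>
</div>
```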

Applying a microformat is a matter of adding more attributes to existing tags (and more tags where necessary). The Schema.org documentation goes into detail on the proper application of these attributes.

Let us microformat that address element to make it semantically clearer.

<address itemscope itemtype="http://schema.org/PostalAddress">
Joe Somethingorother
<span itemprop="streetAddress">1234 Main Street</span>
<span itemprop="addressLocality">Somewhereville</span>, <span itemprop="addressRegion">ST</span> <span itemprop="postalCode">12345-9876</span>
phone: <span itemprop="telephone">(123) 123-4567</span>
fax: <span itemprop="faxNumber">(123) 123-9876</span>
</address>


(Why are the properties named addressCountry, addressLocality and addressRegion while the street is streetAddress rather than addressStreet? Do not spend too much time worrying about this; take it as it comes.)

One drawback to microformatting is the proliferation of markup tags required to apply the formatting to detailed data. When you need a tag where there was none, use a "span" tag, or a "div" tag if it must contain other spans. These are the most generic of HTML tags: the most basic and context-neutral. Schema markup may be applied to any document element, so try to keep things simple.
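Because itemprop can sit on any element, tags that already exist in the page can often carry a property directly, with no extra wrapper. A minimal sketch, assuming a hypothetical author blurb (the heading, job title and URL are invented for illustration):

```html
<!-- Hypothetical author blurb: no span or div added purely for the schema. -->
<div itemscope itemtype="http://schema.org/Person">
  <h2 itemprop="name">Joe Somethingorother</h2>
  <p itemprop="jobTitle">Technical Editor</p>
  <!-- On a link, the property value is taken from the href, not the link text. -->
  <a itemprop="url" href="http://www.example.com/joe">Joe's home page</a>
</div>
```

Here the only element that exists for the sake of the schema is the outer container, and in a real page that would usually be present anyway.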

The microformatted address is good, but limited. The element carries more information than we have marked up, and to microformat it properly we need something more. Remember that the "address" tag is not just an address; it is "contact information". There is no reason (save symmetry) to use the limited PostalAddress schema to mark up the entire address tag. In this case the content of the address tag represents a contact person, so we should use a broader and more correct schema: the Schema.org Person. Because the contact point in this example is an individual, the Person schema lets us add information beyond what the PostalAddress schema alone allows. Fortunately, a great strength of the schemas is that they use each other to define sub-elements. The Person schema allows us to use PostalAddress as part of the person's information:

<address itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Joe Somethingorother</span>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">1234 Main Street</span>
<span itemprop="addressLocality">Somewhereville</span>, <span itemprop="addressRegion">ST</span> <span itemprop="postalCode">12345-9876</span>
phone: <span itemprop="telephone">(123) 123-4567</span>
fax: <span itemprop="faxNumber">(123) 123-9876</span>
</div>
</address>


That is a little more appropriate for our use. The contact information is now more specific while still retaining the address microformat. You may use the Organization schema instead if the contact information is for a business entity.
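For the business case, a sketch along the same lines might look like the following. The company name is invented, and placing the phone and fax numbers directly on the organization rather than inside the postal address is one reasonable choice, not the only valid one:

```html
<!-- Hypothetical business contact block; the company name is made up. -->
<address itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Example Company, Inc.</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">1234 Main Street</span>
    <span itemprop="addressLocality">Somewhereville</span>,
    <span itemprop="addressRegion">ST</span>
    <span itemprop="postalCode">12345-9876</span>
  </div>
  <!-- Here the numbers are properties of the organization itself. -->
  phone: <span itemprop="telephone">(123) 123-4567</span>
  fax: <span itemprop="faxNumber">(123) 123-9876</span>
</address>
```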

Schemas can go a long way toward exposing more information from your web presence. The compelling use case today is supplying Google with better information for search results on your web pages. It is also easy to see how more applications may be developed to take advantage of microformats as they gain wider adoption.