Confluent Forms LLC, located in Easthampton MA, is a boutique branding, graphic design, web design, web development, Blogger development, and PHP/MySQL application development firm providing services to customers from the Fortune 100 to local non-profit organizations and academic institutions. Serving Western Massachusetts and beyond.

Google's microformatting mixed messages

Posted on July 16, 2013
Multiple views of the same microformatted data
We have been proponents of using markup for adding semantic information to your site's content. These microformats are especially useful for embedding machine-readable "hints" that give the many crawlers and other automated processes a better understanding of your site content.

The promise of the Google-sponsored microformats is that both users and machines may find your page useful and content rich, each in their own way. Google's Webmaster Tools and Rich Snippets reinforce the need for webmasters to incorporate microformatting into their sites.

The modern approach to web development is destination agnostic

You develop your documents (web pages) as stand-alone content and use available techniques to make each approachable in a proper manner for each viewer. Microformats are an important part of this, allowing these documents to hold relevant and well-structured data which may be used for multiple purposes. There is an awareness that some viewers are on desktops, some are on mobile, and some are not using any real interface at all but are machines parsing your document in its purest form. As a developer you want your documents to be as complete as possible, providing all needed information, and then to reduce them down as needed based on the viewer: one document, multiple uses. When we design the structural markup of a page we try to make these completely honest documents: a full view of all information in the purest markup form, completely agnostic towards intended types of use. In these documents we must provide all information necessary for all uses, whether by man or machine, whether in device display or as a raw information source.

Depending on the viewing context, some of the information in your document either should not be, or is deliberately chosen not to be, overtly displayed. Often this reduction in displayed content is a function of design. While you, as the developer, may find it necessary to include the complete data in the document, the visual design of the webpage may limit what information is available for the user to view. This is common: there are techniques available to determine the viewing context and disable the display of information when desired, even though that data remains present in the document.
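One such technique is a CSS media query. The sketch below is only an illustration: the class name `secondary-detail`, the 480px breakpoint, and the sample text are assumptions made for this example, not taken from any real site. It shows a detail that stays in the document for every reader but is dropped from the layout on narrow screens.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>One document, multiple views</title>
  <style>
    /* Hypothetical class name and breakpoint, chosen for illustration. */
    /* On narrow screens, remove the secondary detail from the layout. */
    @media screen and (max-width: 480px) {
      .secondary-detail { display: none; }
    }
  </style>
</head>
<body>
  <p>
    Office hours: 9 to 5
    <!-- Still present in the markup on every device; only its display changes. -->
    <span class="secondary-detail">(closed on public holidays)</span>
  </p>
</body>
</html>
```

The markup is identical in every context. Only the stylesheet decides what a given viewer sees.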

Modern web design is built on these techniques. Information deemed unimportant in a specific viewing context can simply be not shown in that context. Take, for example, a modified microformatted address from our previous post:

<address itemscope itemtype="">
  <span itemprop="name">Joe Somethingorother</span>
  <div itemprop="address" itemscope itemtype="">
    <span itemprop="streetAddress">1234 Main Street</span>
    <span itemprop="addressLocality">Somewhereville</span>,
    <span itemprop="addressRegion">ST</span>
    <span itemprop="postalCode">12345-9876</span>
    <span class="phone">phone: <span itemprop="telephone">(123) 123-4567</span></span>
    <span class="fax">fax: <span itemprop="faxNumber">(123) 123-9876</span></span>
  </div>
</address>

A standard scenario would be the designer vehemently arguing that the fax number takes up too much space, and that people hardly fax anymore, so we should not include it at all. As the developers trying to create an honest document, we would try to include as much information as could possibly be useful in some context. The designer is thinking only of the visual context; the developer understands that the document must serve more than one master. A fax number, while not particularly popular, is part of the individual's information and should live in the document if it is to be an honest documentation of the individual's data. This is especially true now that microformatting is involved and the document is expected to do multiple duties: information source for the visual display of some data, as well as useful content for the purely data-driven automaton. The typical best practice for hiding information for design purposes is to hide it with CSS. Everybody wins: the visual design does not show the fax, but the information is still available in other contexts, such as a popular search engine trying to answer the query "What is Joe Somethingorother's fax number?".
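The CSS approach described above can be sketched roughly as follows. This is a minimal illustration, not a recommendation from Google. The `itemtype` URLs are an assumption (the schema.org `Person` and `PostalAddress` types), filled in only so the example is complete, and the page wrapper around the address is hypothetical.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Contact</title>
  <style>
    /* Design decision: the fax line is not part of the visual layout. */
    .fax { display: none; }
  </style>
</head>
<body>
  <!-- Assumed itemtype values; the example in the post leaves them empty. -->
  <address itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Joe Somethingorother</span>
    <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
      <span itemprop="streetAddress">1234 Main Street</span>
      <span itemprop="addressLocality">Somewhereville</span>,
      <span itemprop="addressRegion">ST</span>
      <span itemprop="postalCode">12345-9876</span>
      <span class="phone">phone: <span itemprop="telephone">(123) 123-4567</span></span>
      <!-- Same markup as the phone; hidden from view but still in the document. -->
      <span class="fax">fax: <span itemprop="faxNumber">(123) 123-9876</span></span>
    </div>
  </address>
</body>
</html>
```

Phone and fax use identical markup here. The only difference between them lives in the stylesheet, so any parser reading the markup still finds the `faxNumber` property.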

Where is the conflict with Google?

You could easily see this approach applying to more esoteric data types. Some universal identifier, for example, may be increasingly useful to an automated process using your document data while being confusing to a human user. Here is where there is an uneasy sense of conflict in the Google approach. They want a single data-rich document (our honest document) for indexing, as well as a document capable of display adjustment for any device. They also want us to use microformats to provide extra information that assists data indexing and context. A document rich in machine-useful data is most likely going to contain information less useful for the visual user, so we would wish to hide that information from the user.

Yet there is a common understanding that Google may not use information that is not shown to the human user and, even worse, may consider it deceptive. This is understandable, as techniques for hiding information have been used in the past to create a fraudulent presence in Google search results. In modern web development, however, hiding some information is not only useful, it is a best practice! Single document, multiple interfaces.

To get around the risk of looking fraudulent it has been recommended that content not intended for visual display be interspersed using meta tags. So it may look something like this:

<span class="phone">phone: <span itemprop="telephone">(123) 123-4567</span></span> 
<meta class="fax" itemprop="faxNumber" content="(123) 123-9876" /> 

But this is not in keeping with the strong technical desire for uniformity and simplicity in the creation of an honest document ignorant of end use. If the document purely ignores intended destinations, then all similar content should have equal value. To put one item under div/span and another under meta creates an implicit distinction in our data. Worse, it takes the honest document and adds viewing intention: designing for Google. If we modify the source document because of our expectations for a single search engine crawler, we are not generating an agnostic document. The document is no longer honest; it is instead being developed for the benefit of the dominant search engine.

This is a technically ugly, destination-aware mangling of the source content. From the machine point of view, phone and fax should be equivalent pieces of content, even down to the markup used in their representation. There is no legitimate reason one would be span and one meta in an honest document. In effect, to avoid looking fraudulent we actually need to misrepresent our content, making our documents strangely inconsistent: less honest, more use aware. The argument can be made that Google is not a total knowledge search engine, but exists for human interaction and therefore cares only about what humans will see. Therefore, if your site does not display that information, they don't care to use it. Their support of microformats for machine understanding makes me think this is not the case: their drive is always towards more data, not less, and the more data in a document, the more useful it becomes.

How to neutrally represent content in a Google world is a problem we run into while developing websites. This is a problem Google should explicitly acknowledge and a problem they should fix; it is a problem I believe they have fixed. Of all the services they provide, the indexing and relevant connecting of information is their most important function. I have to believe their actual capability to divine intention, and to distinguish true uses of modern web development techniques from fraud, will always be better than is publicly known. I don't think the SEO police will come down on you if you hide machine-useful data elements from display, but I do think there is some inconsistency in the message. If you want data-rich documents, don't potentially penalize the hiding of extra data elements that are not immediately useful to users of a visual display.

So what is it Google wants exactly?

A richness of available information in a display-neutral document, or a page designed only for human viewers? Really they want neither; as always, what they really seem to want is to keep their process a mystery and developers in the dark. What do we want? To make an honest document with microformats, use modern web development techniques to display content as needed, and not risk being viewed as fraudulent in doing so.
