Adrian Dale - Creatifica Associates
Creatifica Associates Ltd.
Information Management Insight

Google edges towards a more structured universe

Google is taking its next steps towards recognising and using the structure of well-coded web resources (Google rich snippets). Reading this announcement would leave the average information scientist/architect alternately bursting with excitement and fuming with frustration. Here are some quotes:

A lot of previous work on structured data has focused on debates around encoding. Even within Google, we have advocates for microformat encoding, advocates for various RDF encodings, and advocates for our own encodings. But after working on this Rich Snippets project for a while, we realized that structured data on the web can and should accommodate multiple encodings: we hope to emphasize this by accepting both microformat encoding and RDFa encoding. Each encoding has its pluses and minuses, and the debate is a fine intellectual exercise, but it detracts from the real issues.

We do believe that it is important to have a common vocabulary: the language of object types, object properties, and property types that enable structured data to be understood by different applications. We debated how to address this vocabulary problem, and concluded that we needed to make an investment. Google will, working together with others, host a vocabulary that various Google services and other websites can use. We are starting with a small list, which we hope to extend over time.

How many years have we fought the battle for adding structure against the tide of “Google doesn’t need/use it”? And now, straight from the horse’s mouth at Google, comes recognition of the value of metadata and structured vocabulary!

This is of course excellent news, but there has been some negative comment from Ian Davis (“Google’s RDFa a damp squib”), who questions why Google hasn’t just adopted existing standards.

Google has also announced Google Squared and Google New Search Options, probably in an attempt to respond to the WolframAlpha launch. Again, both are exciting developments that have opened up a new front for the information professional. This whole debate will be a major theme at Online 2009.

What a turnaround for Google! Two years ago at Online 2007 a Google salesman announced the death of metadata – “Google pays no attention to the structure of web pages and doesn’t need web masters to do any tagging.” This met with some derision from delegates, but many were resigned to Google’s muscle wiping out their love of structure.

However, this salesman wasn’t being strictly accurate. Anyone who has studied the structure of Google results pages will know that the contents of an HTML page’s <title> tag are displayed, as are the contents of <meta name="description" content="abcd" />.
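As a rough illustration of the two fields mentioned above, the following sketch pulls the <title> text and the description meta tag out of a page using only Python's standard library. The class and function names (SnippetFieldParser, snippet_fields) are invented for this example, and it makes no claim about how Google itself reads or ranks these fields.

```python
from html.parser import HTMLParser


class SnippetFieldParser(HTMLParser):
    """Collects the <title> text and the description meta tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def snippet_fields(html):
    """Return (title, description) for an HTML string (hypothetical helper)."""
    parser = SnippetFieldParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


page = (
    "<html><head><title>Annual Report 2008</title>"
    '<meta name="description" content="abcd" /></head><body></body></html>'
)
print(snippet_fields(page))
```

Running a check like this across a site is one way to spot pages whose title or description has been left empty or at a default value.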

For MS Office and PDF documents, Google can and usually does (but not always!) display the contents of the title property if it appears meaningful. This has caught out many a government department in the UK, as many authors and web masters don’t check the contents of the title field.

So structure is finally returning to the information world – and not before time!

The true power of open-linked data

Tony Hirst has just posted an excellent article demonstrating how open-linked data and mash-up tools can transform the presentation, and hence the value, of information: “Visualising MPs’ Expenses Using Scatter Plots, Charts and Maps”. MPs’ expenses are a hot topic in the UK at the moment and a spreadsheet of them was posted by the Guardian. Tony Hirst shows the challenges and successes of using open-source tools to present this data in ways that make it much more usable. When I think of how much the first Executive Information Systems cost in the 1990s! This would make an excellent case for Online Information 2009 – and we will be tapping Tony on the shoulder!

Tim Berners-Lee on Open Linked Data

RAW-DATA-NOW – This was the plea from Tim Berners-Lee at the TED conference as he exhorted the audience to push for the publication of linked data from every field of life.  For anyone who wonders what the point of the semantic web is, this is worth a look and the comments are worth a read – view video. It is clear from our planning for the Online 2009 conference that the semantic web has now come of age, albeit in the form of the Open Linked Data movement. We will be putting out a call for speakers shortly looking for the best examples of innovative information management using these principles.

How uncool – repository URIs

As part of his efforts to re-promote the concepts of “cool URIs”, Andy Powell has just completed a review of the URI designs for the UK’s university e-repositories.  Given that these repositories are designed to provide persistent access to the output from research programmes, persistent identifiers would seem to be essential.  The results were surprisingly disappointing, with most institutions having committed at least one of the cardinal sins:

  • Building the name of the repository software into their URIs
  • Allowing the use of the underlying technology stack (.aspx, .php, .html) to appear in the URIs
  • Using a non-standard port to access the repository
  • Building the name of an organisational unit into the URI
  • Using a “jazzy” project name as part of the URI (remember the Amazon obidos!)
  • Outsourcing to a third party – losing the institutional focus – and being reliant on third-party URIs
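
Several of the sins listed above can be checked mechanically. The sketch below is only an illustration and is not taken from Andy Powell's review: the function name, the short list of software names and the example.ac.uk addresses are all assumptions made for the example. Organisational unit and project names cannot be guessed, so the caller has to supply them.

```python
from urllib.parse import urlsplit

# Illustrative values only; a real check would need a fuller, maintained list.
SOFTWARE_NAMES = ("eprints", "dspace", "fedora")
TECH_EXTENSIONS = (".aspx", ".php", ".html")
STANDARD_PORTS = {"http": 80, "https": 443}


def uncool_uri_problems(uri, institution_domain, local_names=()):
    """List the 'cardinal sins' found in one repository URI (hypothetical helper).

    local_names holds organisational unit or project names supplied by the caller.
    Matching is by simple substring, so false positives are possible.
    """
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path.lower()
    problems = []

    if any(name in host or name in path for name in SOFTWARE_NAMES):
        problems.append("repository software name in URI")
    if any(ext in path for ext in TECH_EXTENSIONS):
        problems.append("technology stack visible in URI")
    if parts.port is not None and parts.port != STANDARD_PORTS.get(parts.scheme):
        problems.append("non-standard port")
    if any(name.lower() in host or name.lower() in path for name in local_names):
        problems.append("organisational unit or project name in URI")
    if host != institution_domain and not host.endswith("." + institution_domain):
        problems.append("hosted outside the institutional domain")
    return problems


for problem in uncool_uri_problems(
    "http://eprints.library.example.ac.uk:8080/view.php?id=12",
    "example.ac.uk",
    local_names=("library",),
):
    print(problem)
```

A URI such as http://example.ac.uk/id/12 passes all five checks, which is roughly what a “cool” identifier looks like: no software, technology, port or departmental detail that might change later.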

How can this have happened? Why have these organisations not thought through their naming and addressing policy and design rules? The answer is surprisingly simple. Most of the people in charge of web implementations have not been sensitised to the importance of addressing – “cool URIs” as Tim Berners-Lee called them in 1998. I lectured this week in a UK library school and, from what I could see, the question of digital identifiers and the importance of their effective management featured nowhere in the curriculum. And yet what was the ISBN if not the precursor of persistent identifiers?

Your Website is Your API: Quick Wins for Government Data

Knowing my passion for URIs, Peter Winstanley from the Scottish Government brought this article by Jeni Tennison to my attention. She lays out the three key things that public sector web sites need to do, using a URI-based model to achieve them:

  • identify the data that you control
  • represent that data in a way that people can use
  • expose the data to the wider world

She lays out a strong business case and clear examples – so let’s follow them!