Open Calais

A brilliant tool owned by Thomson Reuters helps Gulf News classify and categorise content on the fly.

To develop the tagging section of GulfNews.com we have made use of Open Calais, a tagging service owned by Thomson Reuters.

Before a web story is published online, it flies halfway around the world to one of Calais' servers, which mines it for significant keywords before returning the results to us, all in well under a second.

Naturally, context is highly important in news, and the service uses something called Natural Language Processing (NLP) to understand how words are used in a sentence and so determine what the relevant keywords actually are. “Fed”, for example, could mean being fed, or it could be short for the US Federal Reserve. NLP helps Calais determine which way the word is being used.

Of course, anything machine based, especially when it comes to something as idiosyncratic as language, will be an inexact science, and on some of our tagging pages you will find some unusual results. We are continuing to work on these pages to make the results better, but given the automation involved, surprises will continue to be... unsurprising.

To make Calais work better for us, we do not show all the tags that the system returns, only those that are relevant to Gulf News and its audience. Calais recognises thousands upon thousands of terms, but while altocumulus weather tags may be of interest to the geography departments of universities around the world, they are rarely of interest to the Middle East news reader - especially in a region not known for anything other than its clear blue skies.

We therefore limit tags to the primary tag associated with each of our In Focus pages. This ensures that only tags we consider relevant actually show up.
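As a rough illustration of that filtering step, here is a minimal Python sketch. It is not our production code: the tag names, the `IN_FOCUS_TAGS` set and the `filter_tags` function are all hypothetical, and the list of returned tags stands in for whatever Calais actually sends back.

```python
# Hypothetical allowlist: the primary tag of each In Focus page.
IN_FOCUS_TAGS = {
    "Dubai Financial Market",
    "Southampton Football Club",
    "US Federal Reserve",
}


def filter_tags(returned_tags, allowed=IN_FOCUS_TAGS):
    """Keep only tags that match an In Focus primary tag, preserving order."""
    allowed_lower = {tag.lower() for tag in allowed}
    return [tag for tag in returned_tags if tag.lower() in allowed_lower]


# Illustrative input only: not real Calais output.
returned = ["Altocumulus", "Dubai Financial Market", "Dubai"]
print(filter_tags(returned))
```

In this sketch the comparison is case-insensitive, which is an assumption on our part; a real allowlist might also need to handle alternative spellings.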

Calais itself has a relevancy tool, and this determines the order of keywords in our list. Unfortunately this is often based on frequency. While that may sound like a good idea, consider how often the word UAE or Dubai may be mentioned in a story about the Dubai Financial Market. Ideally we would like the latter to show up first in our list, but Calais often plumps for Dubai instead.
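To see why frequency alone can mislead, consider this toy Python sketch. It is only an illustration of counting mentions, not a description of how Calais scores relevance; the `rank_by_frequency` function and the sample story are made up for the example.

```python
def rank_by_frequency(text, candidates):
    """Order candidate keywords by how often each appears in the text."""
    lowered = text.lower()
    # Simple substring counts; "Dubai" is also counted inside longer names.
    counts = {kw: lowered.count(kw.lower()) for kw in candidates}
    return sorted(candidates, key=lambda kw: counts[kw], reverse=True)


# A made-up story about the Dubai Financial Market.
story = (
    "Shares on the Dubai Financial Market rose on Sunday. "
    "Traders in Dubai said Dubai property stocks led the gains."
)
print(rank_by_frequency(story, ["Dubai Financial Market", "Dubai"]))
```

Here "Dubai" is mentioned three times and "Dubai Financial Market" only once, so the broader term comes out on top even though the story is really about the market.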

This matters for a host of reasons. We use tagging, for example, to automatically surface stories across the site. In an article, you will find related stories associated with Keyword 1 and Keyword 2 (that is, the first and second keywords in a story) in the middle column. If you are reading a story about Southampton Football Club, getting the relevancy right is important to returning more articles on the team you clearly like reading about.
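The idea of surfacing related stories from the first two keywords can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the story dictionaries, the `related_by_keyword` function and the sample headlines are hypothetical and do not reflect how GulfNews.com stores or queries its articles.

```python
def related_by_keyword(current, archive, limit=3):
    """Map each of the story's first two keywords to titles of other stories sharing it."""
    related = {}
    for keyword in current["keywords"][:2]:
        titles = [
            story["title"]
            for story in archive
            if story["title"] != current["title"] and keyword in story["keywords"]
        ]
        related[keyword] = titles[:limit]
    return related


# Made-up archive; keyword order stands in for the relevancy order.
archive = [
    {"title": "Saints win at home", "keywords": ["Southampton Football Club", "Premier League"]},
    {"title": "Saints sign striker", "keywords": ["Southampton Football Club", "Transfers"]},
    {"title": "League fixtures announced", "keywords": ["Premier League", "Fixtures"]},
]
print(related_by_keyword(archive[0], archive))
```

If the keyword order is wrong, the two lists are built from the wrong terms, which is why the ordering problem described above matters to what readers see.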

Despite the current limitations there is no doubt that "semantic tagging" has made possible a whole new set of ways to present content across the site.

GulfNews.com will continue working with the tool, adding more functionality and refining the results in the months and perhaps even years ahead...

 
