Towards Web 3.0

Ten years ago my wife, children and I visited China. Above is my daughter in Tiananmen Square. Behind her is a fellow clearing the snow with a broom. At that time, I saw soldiers clearing snow from local highways with brooms. Human capital. China has lots of it.
________________________________________________
It is hard to find a search term for which Google won’t come back with at least ten thousand results; most come back in the millions. How many results will a person look through before they find what they need?

With thousands of results, the investigator is then left to wade through this vast amount of information, which is presented in inconsistent forms and structures that are hard to compare. He must assimilate, organize, assess, and then come to some conclusion as to what it all means and what action to take.

If I look back ten years, a search was a very manual effort with books, libraries, etc. There was no Google. I think it’s fair to say that the progress that has been made is in the area of access to information. Access has increased to a point where I can find records of my grandfather’s arrival in North America in 1913 or read all the latest news from hundreds of newspapers around the world. I can gather the opinions of a wide number of readers on products through various fora. This has come about as a result of what some might call “the digitization of everything.” And that process continues.

All this information is presented to me in a human-readable form, but not in a computer-processable form. For a computer to process it efficiently and effectively, the semantics must be abstracted or distilled from the media, and those meanings need to be presented in a consistent and comparable form.

Computers are not able to take a word and infer its semantics. People aren’t either. What does anschrift mean? People and computers need to be taught. So how do you teach a computer? A very popular approach these days is to tag things. A tag generally imparts one meaning; many tags impart several meanings.

Everything is being tagged: photographs (flickr); news (CBC, BBC); music (iTunes); stuff (digg, technorati) etc. So let that process proceed and some day everything will be tagged. Then we will have transferred the problem from hits returning millions of documents to hits returning millions of tags. Yet, it will signal another step forward because a tag is easier to compare and process for a machine, a computer, and so part of the process can be automated. Pushing work to machines is always a good thing.

But machines like consistency, even more than we do. So when you tag something, it would be nice to tag everything in the same way. For example, do you use the tag Person, People, Human, Party of Interest, Man, Woman, Child, Homo sapiens, etc.? So all we need to do is consolidate towards a standard set of tags and life is even easier. Easily said, harder to do. Anyone who has ever led a project to define tags for, say, office documents knows how hard it is. It seems like such a simple idea, yet it is so hard to achieve. First, there is the challenge of coming up with the tags themselves; then they have to be defined so they can be assigned consistently; and then they have to be used: documents or things have to be tagged. Three points of failure. A fourth, if you include maintenance (i.e., adding new tags, etc.).
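To make the consolidation idea concrete, here is a minimal sketch in Python of mapping many tag variants onto one standard tag. The synonym table, the choice of "person" as the canonical tag, and the function names are all assumptions made up for illustration; a real vocabulary would have to be agreed on and maintained by people, which is exactly the hard part.

```python
# Hypothetical synonym table: every known variant maps to one canonical tag.
CANONICAL = {
    "person": "person",
    "people": "person",
    "human": "person",
    "party of interest": "person",
    "man": "person",
    "woman": "person",
    "child": "person",
    "homo sapiens": "person",
}


def normalize_tag(raw):
    """Return the canonical tag for a raw tag, or the cleaned raw tag if unknown."""
    # Lowercase, trim, and collapse repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    return CANONICAL.get(key, key)


def normalize_tags(raw_tags):
    """Normalize a list of tags, dropping duplicates and keeping first-seen order."""
    seen = []
    for tag in raw_tags:
        canonical = normalize_tag(tag)
        if canonical not in seen:
            seen.append(canonical)
    return seen


print(normalize_tags(["People", " Human ", "Party  of Interest", "invoice"]))
# prints ['person', 'invoice']
```

Note that a tag the table does not know about ("invoice" here) simply passes through unchanged, which is where the maintenance problem shows up.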

Picking the right tag involves some level of subjectivity and thus is prone to error and/or inconsistency. Yet, on the scale of the web, where there are millions of users, a consensus on which tags to use seems to emerge. In del.icio.us, for example, I can tag my stuff with whatever terms I feel like; whatever makes sense to me. Or, I can use the tags it suggests, which others have used to tag the same or similar content. It is through this latter means that we are offered a path towards common tags. It is by having millions of people tagging the same content that we come to some agreement on what is an appropriate set of tags to use. We arrive at that “bell curve” of tags that provide the meaning.
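The way a consensus emerges from many individual taggers can be sketched as a simple vote count. This is only an illustration in Python, not how del.icio.us actually works; the function name and the sample taggings are invented for the example.

```python
from collections import Counter


def suggest_tags(taggings, top_n=3):
    """Count how many users applied each tag and return the most popular ones."""
    counts = Counter()
    for user_tags in taggings:
        # Count each tag once per user so that nobody votes twice for the same tag.
        counts.update({tag.lower() for tag in user_tags})
    return counts.most_common(top_n)


# Made-up taggings of one article by four hypothetical users.
sample = [
    ["brain", "health"],
    ["Brain", "productivity"],
    ["health", "brain", "article"],
    ["retirement"],
]

print(suggest_tags(sample, top_n=2))
# prints [('brain', 3), ('health', 2)]
```

The outlier tag (the one used by a single person) never reaches the suggestion list, which is the "bell curve" effect at a very small scale.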

I came across an article entitled “22 ways to overclock your brain…” I posted it to del.icio.us and tagged it under retirement. I noted that it had been posted by 2,106 other people and was usually tagged under brain and health.
If you look at the tags more closely they can be categorized into different groups. I arbitrarily categorized the terms into seven groups: message style; format; subject; [emotional] reaction; context; action and domain. These seemed to cover the tags used in this instance. I’m sure a more detailed study of tags across many millions of documents might come up with a common set of categories.

This sample categorization of tags suggests, first, that people draw different meanings from things and, second, it offers some perspective on what those different dimensions might be. Some see the material in terms of the format in which it is available (an article); many see it in terms of the subject (i.e., the brain); some interpret it in a broader context (health, fitness, etc.).

Yet it all remains pretty subjective, Darwinian at best. It could be the subject matter itself that makes tagging difficult; the more abstract a thing the wider the range of interpretations?

If we look at something closer to the tangible end of the spectrum, products for example, they are [usually] physical things and have some straightforward characteristics. Amazon has recently announced Amapedia. “Articles about products are tagged with a term that describes what the product is (“is-a tags”) as well as their most important features (“facts”). [1].”

Two reasonably straightforward, obvious and understandable categories of tags, in my opinion.

“A fact in Amapedia is a piece of information about the subject of an article. Every article can have many facts; each fact consists of a name and optionally a list of values. [2]”
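The structure described in that quote (a fact is a name plus an optional list of values, and an article has an “is-a” tag and many facts) can be sketched as a small data model. This is my own guess at a shape in Python, not Amapedia's actual schema; the class names, field names, and the sample product are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fact:
    """A fact has a name and, optionally, a list of values."""
    name: str
    values: List[str] = field(default_factory=list)


@dataclass
class Article:
    """An article about a product: one "is-a" tag plus any number of facts."""
    title: str
    is_a: str
    facts: List[Fact] = field(default_factory=list)

    def fact_values(self, name: str) -> Optional[List[str]]:
        """Return the values of the named fact, or None if the article lacks it."""
        for fact in self.facts:
            if fact.name == name:
                return fact.values
        return None


def filter_by_is_a(articles: List[Article], is_a: str) -> List[Article]:
    """Keep only the articles whose "is-a" tag matches."""
    return [a for a in articles if a.is_a == is_a]


# Invented sample data for illustration only.
camera = Article(
    title="Example Camera",
    is_a="digital camera",
    facts=[Fact("megapixels", ["10"]), Fact("waterproof")],
)
kettle = Article(title="Example Kettle", is_a="kettle")

print([a.title for a in filter_by_is_a([camera, kettle], "digital camera")])
print(camera.fact_values("megapixels"))
```

Because every article carries the same two kinds of tags, comparing two products reduces to comparing the values of facts with the same name, which is much easier for a machine than comparing free-form tags.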

Amazon has also provided the community to develop and tag the articles, along with the tools for them to do that: Amapedia. Amazon describes Amapedia this way:

Amapedia is a community for sharing information about the products you like the most.

Amapedia introduces an exciting new way of organizing products we call “collaborative structured tagging”. In a nutshell, it makes it easy for you to tag products with what they are and with their most important facts, and for others to search, discover, filter, and compare products by those tags.

Amapedia is the next generation of Amazon.com’s ProductWiki feature; all of your previous ProductWiki contributions were preserved and now live here.

I added an entry [3]

So, if Web 3.0 is the next step in the evolution of the Web, and if that next step includes doing something useful with all the information the html web (web 1.0) has created and the collaborative web (web 2.0) has tagged, then Amapedia is a step in that direction. Amapedia is a little different from the other tagging fora, such as digg, del.icio.us, etc., in that it is more structured and focused. As a result it should provide a better quality of tagging. A better quality tag will enable web 3.0 tools to provide the next step: doing something meaningful and useful with the search results. In this case, although narrowly focused, it should help us in our purchasing decisions, a big part of everyone’s life.
_____________________________________________________
So fellow minions, like the Chinese army of ten years ago, we are the human capital tagging the digital content being published across the world. It will be through our votes of tags that the world’s content will be interpreted and then processed in the Web 3.0 world.
