September 16, 2011

Annotations in BI tools: why are they important?

I believe that the most important part of collaboration in Business Intelligence tools is the ability to annotate data down to the database row level. Why is that important?

Data in transactional, and later in analytical, systems usually reflects real-life events in the business environment. "Primitive" events such as a purchase order, a customer call, or consumed content or service are usually well structured and explicitly represented in data models. However, there are far more important events and influencing factors at the macro level that are less obvious -- e.g. reasons for lost sales, increased customer churn, government acts, competitors' actions or weather cataclysms. Such influencing factors and events usually are not recorded in IT systems and do not exist in data models. Their impact, however, can usually be observed in changes in key performance indicators (actually, this is what KPIs are for). This means that KPI trends contain encoded information about the influencing factors in a specific time period. The role of a good analyst or manager is to decode knowledge about influencing factors from data trends in the current context, and then to use this knowledge (often collaboratively) to make the right decisions. Therefore, KPI data by itself is not of great value -- it's just bits and bytes that mean something. The important knowledge extracted from this data is.

That's why any decent data analysis and data visualization tool should support data annotations -- i.e. explicit explanations or comments attached to a specific subset of data. Here are a few considerations on how it should be done, in my opinion:

Stick to context
All annotations should have a business context -- e.g. specific customers, products, branches, marketing campaigns, etc. Without context, any explanation is meaningless. The context should include at least the time interval in which the note is relevant. Users should be able to clearly see which context a comment relates to. Users should also be able to see only the comments that are relevant in the current context (selections or filters).
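
A minimal sketch of what such a context-bound annotation could look like, in Python. The `Annotation` class, its field names, the `relevant` function and the sample data are all hypothetical illustrations, not taken from any particular BI tool: each note carries a time interval plus a set of dimension values, and it is shown only when it overlaps the user's current selection.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annotation:
    text: str
    start: date          # first day the note is relevant
    end: date            # last day the note is relevant (inclusive)
    # Business context as dimension -> value, e.g. {"branch": "Riga"}.
    context: dict = field(default_factory=dict)


def relevant(note, sel_start, sel_end, selection):
    """Return True if the note applies to the current selection.

    The note must overlap the selected period, and every dimension it is
    bound to must either be unfiltered or filtered to a matching value.
    """
    overlaps = note.start <= sel_end and sel_start <= note.end
    matches = all(
        dim not in selection or value in selection[dim]
        for dim, value in note.context.items()
    )
    return overlaps and matches


# Made-up sample notes, for illustration only.
notes = [
    Annotation("Competitor opened a store nearby", date(2011, 3, 1),
               date(2011, 5, 31), {"branch": "Riga"}),
    Annotation("Snowstorm closed roads", date(2011, 1, 10),
               date(2011, 1, 14), {"branch": "Tallinn"}),
]

# The user has filtered the dashboard to the Riga branch, Q2 2011.
visible = [n.text for n in notes
           if relevant(n, date(2011, 4, 1), date(2011, 6, 30),
                       {"branch": {"Riga"}})]
print(visible)
```

With this selection only the first note is shown; clearing the branch filter and selecting January would surface the second one instead.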

Focus on business entities
Annotations should be tied to business abstractions and entities (like customer, product, etc.) and not to technical abstractions and entities (like charts, tables, databases, files, servers).

Conform to behavioral patterns
News sites, social networks and instant messengers have taught people to read text streams, where posts, notifications and comments appear in one place. It might not be appropriate in some cases, but this is a commonly accepted behavioral pattern and it's better not to go against it. So reading data annotations should be similar to reading Twitter or Facebook streams (in the appropriate context, of course).

Visualize annotations
As I mentioned previously, any comment has to be tied to a context, which in a BI system is usually represented as a combination of metrics and dimensions. In turn, values of metrics and dimensions should indicate whether they have annotations. This means that if a table or a chart contains annotated data, it should be clearly visible to the user. Finding a proper approach for visualizing annotations in charts is not an easy task, as it has to deal effectively with a potentially large number of comments.

Here is an example of how it was done in Explainum -- my experimental social BI hobby project.



Make pointing easy
In emails and IM chats, it should be possible to point to comments, specific data trends, anomalies or specific business entities (a customer, an order, etc.) using ordinary URLs and hyperlinks that can be copy-pasted to and clicked from anywhere.
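
One way to make such links work is to encode the whole context (metric, dimension values, time range and, optionally, a comment id) in the URL query string. The sketch below is only an illustration in Python: the base URL, the parameter names and the `build_link`/`parse_link` helpers are hypothetical and do not describe how Explainum actually does it.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

BASE_URL = "https://bi.example.com/view"  # hypothetical address


def build_link(metric, start, end, dimensions, comment_id=None):
    """Encode a data context as a plain URL that can be pasted anywhere."""
    params = {"metric": metric, "from": start, "to": end}
    # Prefix dimension names so they cannot clash with reserved parameters.
    params.update(
        {f"dim.{name}": value for name, value in sorted(dimensions.items())}
    )
    if comment_id is not None:
        params["comment"] = str(comment_id)
    return f"{BASE_URL}?{urlencode(params)}"


def parse_link(url):
    """Recover the context from a link produced by build_link."""
    params = dict(parse_qsl(urlsplit(url).query))
    dimensions = {
        key[len("dim."):]: value
        for key, value in params.items()
        if key.startswith("dim.")
    }
    return {
        "metric": params["metric"],
        "from": params["from"],
        "to": params["to"],
        "dimensions": dimensions,
        "comment": params.get("comment"),
    }


link = build_link("sales", "2011-04-01", "2011-04-30",
                  {"branch": "Riga"}, comment_id=42)
print(link)
print(parse_link(link))
```

Because the link carries the context rather than a reference to a particular chart, the same URL stays meaningful whichever table or chart is used to display it.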

Again, examples from Explainum -- a link to a dip in a chart and a link to a comment.

Design signal/noise filters
Social systems sooner or later face the problem of noise, that is, a large amount of irrelevant information. While BI systems always have a "natural" filter in the form of a combination of dimensions, there could still be a problem if many comments belong to the same context. The problem could be partially solved by using various tags that represent domains of knowledge, branches of an organization or different languages. But a much better approach is to use proper semantic abstractions.

Use relevant semantic abstractions
This is the most interesting part; I saved it for dessert.

Both traditional BI platforms and social services like Twitter, Facebook or Google Plus have one common flaw that makes them poorly suited to the decision-making process -- they know nothing about decisions. I mean, they don't have appropriate semantic abstractions. Here is what I'm talking about:

BI platforms grew out of databases, not out of management psychology or knowledge management labs. They inherited their core abstractions from the database world: fields, tables (charts are simply another representation of tables), dimensions, expressions, alerts. But these are not the abstractions that business people use in everyday life, such as goals, successes/failures, events, influencing factors and causes, risks, decisions, problems, expectations, etc.

The same is true for popular social services, which also include (besides the ones mentioned above) services like Yammer and Identi.ca. What kind of abstractions do they employ? Friend/follower, tag/hashtag (which could be anything and nothing specific), like/dislike, retweet/repost. Again, these are not core abstractions from the business environment.

I propose using semantic structuring of textual information to specify who, what, where, why, how, etc. right in the text of an annotation. This approach might require either some form of multilingual lexical analysis (which is far from trivial) or some kind of advanced tagging, where each type of tag represents a certain abstraction (maybe there are other ways, but I couldn't think of any). Examples of semantic abstractions may include:
  • Interests, goals, subject areas
  • Influencing factors, reasons
  • Parties, persons, groups
  • Decisions, instructions
  • Anticipations, forecasts, predictions
Imagine how much easier our life would be if we could use these abstractions in BI tools. How much easier and more effective conversations could be with questions like: Why do we have high customer churn? What are the influencing factors? Did we have a problem like this in the past? What caused it that time? Who solved it and what was the solution? What were the results?
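
To make the "advanced tagging" option more concrete, here is a rough Python sketch. The `#type:value` syntax, the set of tag types and the helper names are assumptions made up for this illustration; they only show how typed tags could turn free-text annotations into something that can be queried by abstraction.

```python
import re
from collections import defaultdict

# Hypothetical tag syntax: "#<type>:<value>", e.g. "#factor:price-increase".
TAG = re.compile(r"#(\w+):([\w-]+)")

# Tag types loosely mirror the abstractions listed above; names are made up.
KNOWN_TYPES = {"goal", "factor", "party", "decision", "forecast"}


def extract_tags(text):
    """Group the typed tags found in an annotation by abstraction type."""
    tags = defaultdict(list)
    for tag_type, value in TAG.findall(text):
        if tag_type in KNOWN_TYPES:  # ignore tags of unknown types
            tags[tag_type].append(value)
    return dict(tags)


def find_by_type(notes, tag_type):
    """Answer questions like 'which influencing factors were recorded?'"""
    return [value
            for note in notes
            for value in extract_tags(note).get(tag_type, [])]


# Made-up annotation text, for illustration only.
note = ("Churn is up after the #factor:price-increase in March. "
        "#party:retention-team proposes #decision:loyalty-discount; "
        "#forecast:churn-back-to-normal by June.")

tags = extract_tags(note)
print(tags)
print(find_by_type([note], "factor"))
```

Typed tags are much cruder than real lexical analysis, but they work the same way in any language and are cheap to implement, which is why they may be a reasonable first step.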

In my opinion, this could be the real business intelligence, because it doesn't deal with data only -- it deals with real-life knowledge. What do you think? Let me know if you find this topic interesting.

PS. One more experiment of mine: a QlikView extension for data annotations -- Explainum Feeds.