This is a fantastic start to defining the use cases that need to be addressed. I would add one more metric type that seems to be hinted at but not fully defined. If we go back to Kimball's definition of metrics, we can add semi-additive metrics: metrics that can be aggregated across some dimensions, but not all. For instance, month-end balances can't be summed across months; they represent a running total. They can, however, be averaged across months. Another example is a budget or limit in a parent table that can't be aggregated across its children, but does need to be aggregated across other budget records. In these instances, we can write nested SQL to get the appropriate aggregations -- sum up the line items, join to the parent table, and sum again. Alternatively, we could join and sum after dividing the parent metric by the number of line items so that everything sums up correctly.
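For the budget case, here is a minimal sketch of that nested-SQL workaround, assuming hypothetical budgets and line_items tables:

    -- Hypothetical tables: budgets(budget_id, department, budget_amount)
    -- and line_items(budget_id, item_amount).
    -- Pre-aggregate the children at their own grain first, then join to
    -- the parent so budget_amount is counted once per budget, not once
    -- per line item.
    SELECT
        b.department,
        SUM(li.items_total)  AS total_spend,
        SUM(b.budget_amount) AS total_budget
    FROM budgets b
    JOIN (
        SELECT budget_id, SUM(item_amount) AS items_total
        FROM line_items
        GROUP BY budget_id
    ) li ON li.budget_id = b.budget_id
    GROUP BY b.department;

Pre-aggregating the children before the join is what keeps the parent metric from being double-counted, and it's exactly the kind of plumbing end users shouldn't have to write themselves.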
I would argue that these solutions are inefficient hacks that make self-service analytics by less technical people almost impossible to achieve. It's not that end users shouldn't have to think about data granularity, but they shouldn't have to build technical workarounds for it on their own. A metric store that can address this will be one with the basic capabilities needed for creating analysis-ready data sets in an organization.
Aside from semi-additive metrics, the other question for metric stores is their reliance on relational models. How can a metric store function on top of a knowledge graph? How do we implement it as a graph, or as a virtual graph (e.g., Stardog)? The question has to be asked, given the scalability and robust flexibility of graph modeling.
Excellent points, Dan! Could you elaborate more on the graph modeling?
My apologies; I just saw this. Graph modeling is schema-flexible by default, as opposed to relational modeling, where adding or removing a field results in burdensome maintenance tasks. In contrast to JSON modeling, graph modeling is more easily used for analytics and doesn't require a new schema to be performant in a given set of use cases. Finally (and this requires an adjustment in mindset), there are no inner or outer joins in a graph, which potentially makes self-service reporting and modeling much easier for non-data people. I don't know about you, but anytime a business unit tries to do a join, even in a tool like Power BI, there are often errors. If a new join is needed, users only need to define a new relationship between nodes. One can imagine a core metrics layer with separate layers for RevOps, risk management, product, etc.
My project: https://github.com/Agile-Data/flat-ql
It compiles a flat query language into the SQL dialect of various databases for data analysis.
Great blog post, Amit.
At Humanic, we further believe that the metrics layer needs to be integrated into the context of a tool that the business owner (the Revenue Leader) uses to take some revenue-generating action; otherwise it becomes yet another tool that only Data Scientists and Business Analysts will be utilizing.
That by itself is not a bad idea, except that much is lost in translation when a business user needs to communicate data requirements to a Data Scientist.
Would love to get a coffee to discuss further! saksena@humanic.ai
I’m working on this. :-)
https://github.com/TheSwanFactory/hclang/blob/master/hc/cqml.hc
Let’s do coffee!
Ernest.Prabhakar@gmail.com
Amit, excellent article, and thanks for sharing your thoughts. I agree that there are benefits to unbundling the metrics layer, and solution #3 is promising.
If solution #3 were adopted by a few vendors, the next question is what the bridge between the visualization grammar and the query-generation tool will be.
Generating queries for multiple aggregation levels is non-trivial (see the sketch below). What if a tool vendor wishes to optimize a query created by the separate layer?
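To make that concrete, here is a minimal sketch, assuming a hypothetical orders(order_date, region, revenue) table, of the kind of statement a metrics layer might emit when one visualization needs the same revenue metric at several aggregation levels at once (using PostgreSQL-style GROUPING SETS):

    -- Hypothetical table: orders(order_date, region, revenue).
    -- One statement returns three roll-ups of the same metric; a vendor
    -- wanting to optimize it must preserve the metric's definition.
    SELECT
        DATE_TRUNC('month', order_date) AS order_month,
        region,
        SUM(revenue) AS total_revenue
    FROM orders
    GROUP BY GROUPING SETS (
        (DATE_TRUNC('month', order_date)),          -- by month only
        (region),                                   -- by region only
        (DATE_TRUNC('month', order_date), region)   -- by month and region
    );

Safely rewriting a statement like this is exactly where some contract between the visualization grammar and the query layer would be needed.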
Merits of metrics! 🙌 🎉