"When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
William Thomson, Lord Kelvin, Lecture on "Electrical Units of Measurement" (3 May 1883), published in Popular Lectures Vol. I, p. 73
Estimating Data Value
Introduction
There has been a great deal of interest in data monetization over the last decade or so. This discussion is meant as a way of thinking about estimating the potential value of data, as distinct from realizing that value. Think of it as the analogue of a geological survey that carries no guarantee of finding economically viable mines: an attempt at a quantitative analysis of what value data have, along the lines the late Lord Kelvin suggested.
Column Value Model
We assume a company's value to be a combination of human, physical, and data capital: a fully automated company with no human resources is a pipedream; a company must have physical assets to conduct business (even if rented); and it must make decisions based on some information or other. (For simplicity, goodwill valuation is excluded from this model.)
So we start with the following formula for the company's valuation and then proceed to estimate its data assets in a more defined manner.
Company Valuation = Human Capital + Physical Capital + Data Capital
Assume all three parts have the same value and that the company's valuation is $3 billion:
Data Capital = Company Valuation / 3 = $1 billion
In this model, we would like to estimate the average value of each column of data in all the relational schemas that the company uses, i.e.
Average Column Value = Data Capital / Total Number of Columns
Let us further assume that there are 50 relational schemas in this company and that each has 50 tables of 20 columns. This gives a total of 50,000 columns, which yields an average value of $20,000 per column. In this approach, all columns are considered to be of equal value, be they customer names (say) or such ubiquitous columns as "Last Updated Date".
We can then proceed to estimate an average value for each schema:
Average Schema Value = Data Capital / Number of Schemas = $20 million
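The arithmetic of this model is simple enough to check with a short sketch; the schema, table, and column counts below are the illustrative assumptions from the text, not a real inventory.

```python
# Column Value Model: spread Data Capital evenly over every column.
company_valuation = 3_000_000_000            # assumed $3 billion valuation
data_capital = company_valuation / 3         # equal split across the three capitals

num_schemas = 50
tables_per_schema = 50
columns_per_table = 20

total_columns = num_schemas * tables_per_schema * columns_per_table   # 50,000
avg_column_value = data_capital / total_columns                       # $20,000
avg_schema_value = data_capital / num_schemas                         # $20 million

print(f"Average value per column: ${avg_column_value:,.0f}")
print(f"Average value per schema: ${avg_schema_value:,.0f}")
```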
Alternative Models
Data Volume Model
In this model, the value of each schema is estimated based on its data volume. That is, the total data volume across all schemas is computed (in gigabytes, say); the total Data Capital is divided by this number to extract an average value per gigabyte; and the value of each schema is then computed by multiplying its data volume by that average.
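A minimal sketch of this allocation, with hypothetical gigabyte figures used purely for illustration:

```python
# Data Volume Model: allocate Data Capital in proportion to each schema's volume.
data_capital = 1_000_000_000   # $1 billion, from the equal-split assumption above

# Hypothetical data volumes per schema, in gigabytes.
schema_volumes_gb = {"sales": 400, "customers": 250, "logistics": 150, "hr": 50}

total_volume_gb = sum(schema_volumes_gb.values())   # 850 GB in this example
value_per_gb = data_capital / total_volume_gb       # average value per gigabyte

schema_values = {name: gb * value_per_gb for name, gb in schema_volumes_gb.items()}
for name, value in schema_values.items():
    print(f"{name}: ${value:,.0f}")
```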
Time-Dependent Model
Another model has time-dependent weights for the value of each part of the company's valuation, i.e.:
Valuation = h(t) Human Capital + p(t) Physical Capital + d(t) Data Capital
with the constraints:
h(t) + p(t) + d(t) = 1 and h(t) ≥ 0, p(t) ≥ 0, d(t) ≥ 0
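One way to realize such weights is to pick any non-negative functions of time and normalize them so that they sum to one at every t; the particular functional forms below are placeholders chosen for illustration, not part of the model.

```python
import math

def weights(t: float) -> tuple[float, float, float]:
    """Return (h(t), p(t), d(t)), non-negative and summing to 1."""
    raw_h = math.exp(-0.1 * t)   # placeholder: human-capital share decays over time
    raw_p = 1.0                  # placeholder: physical-capital share held flat
    raw_d = 0.2 * t              # placeholder: data-capital share grows over time
    total = raw_h + raw_p + raw_d
    return raw_h / total, raw_p / total, raw_d / total

def valuation(t, human_capital, physical_capital, data_capital):
    h, p, d = weights(t)
    return h * human_capital + p * physical_capital + d * data_capital

# The normalized weights drift toward data capital as t grows.
for t in (0, 5, 10):
    print(t, tuple(round(w, 3) for w in weights(t)))
```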
Weighted Schema Model
In this model, we capture the importance of each schema via a factor a(n), and the historical data volume available for each schema via a second sum over time periods with a factor b(m). The exp(1-m) term is intended to model the aging and staleness of the data.
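Since the formula itself does not appear in the post, the sketch below assumes the double sum takes the form value(n) = a(n) · Σ_m b(m) · exp(1 − m) · V(n, m), where V(n, m) is the data volume of schema n in historical period m (m = 1 being the most recent). That reconstruction, and all the numbers used, are assumptions based only on the description above.

```python
import math

def schema_value(a_n: float, volumes_by_period: list[float], b: list[float]) -> float:
    """Assumed Weighted Schema Model: a(n) * sum_m b(m) * exp(1 - m) * V(n, m).

    a_n               -- importance weight of the schema, a(n)
    volumes_by_period -- V(n, m); index 0 corresponds to the most recent period m = 1
    b                 -- per-period weights b(m), same length as volumes_by_period
    """
    total = 0.0
    for idx, volume in enumerate(volumes_by_period):
        m = idx + 1
        total += b[idx] * math.exp(1 - m) * volume   # exp(1 - m) discounts stale data
    return a_n * total

# Hypothetical example: one schema with three historical periods of volume.
print(schema_value(a_n=0.4, volumes_by_period=[300.0, 250.0, 200.0], b=[1.0, 0.8, 0.6]))
```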