An introduction to data quality
According to a survey from InformationWeek Analytics*, data quality in 2015 is "still the No. 1 'barrier to success' cited by both Business Intelligence and analytics types and information management professionals".
The data in a company's databases describe real-world facts as they stood when they were entered into the information system. But how long do they stay up to date, given that reality keeps moving on? As time goes by, a "distance" opens up between the data stored in databases and the reality they represent.
This distance between "reality" and the data that describe it constitutes a business risk whose importance depends on how significant the data are to the "business" needs. Take customers' mailing addresses as an example. How does the company manage address changes? What are the consequences of an "out of date" address for the company?
This risk grows when the data feed decision processes as part of Business Intelligence, or are exchanged with the company's external partners (customers, suppliers, administrations…). How do you measure the impact of "wrong" data transmitted to the tax administration, for instance?
For REVER, two essential aspects of "data quality" must be distinguished:
- data timeliness aims at guaranteeing that the data match the real-world facts they represent (e.g. the customers' addresses in a CRM). This requires organizational procedures within the company to guarantee correctness (the role of the data steward, for example).
- data accuracy within information systems aims at guaranteeing that the recorded data respect the rules defined by the "business" and do not contradict one another. This aspect of quality can be controlled by tools. A couple of examples of incoherent or contradictory data: a nonexistent date (31 June, or 29 February in a year that is not a leap year), or a customer's number of children that differs from one database to another (does Mrs. X have 2 or 3 children?).
Data timeliness depends on the work of the company and its employees, while data accuracy can be managed by applications and falls within the IT field.
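As an illustration of the kind of accuracy rule a tool can check automatically, here is a minimal sketch in Python that flags nonexistent calendar dates. The record layout, the field names (`customer_id`, `birth_date`) and the sample values are assumptions made up for this example; they do not describe REVER's tools.

```python
from datetime import date


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Return True if (year, month, day) is a real calendar date."""
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


# Hypothetical customer records; field names are illustrative only.
records = [
    {"customer_id": 1, "birth_date": (1980, 6, 30)},
    {"customer_id": 2, "birth_date": (1975, 6, 31)},  # 31 June does not exist
    {"customer_id": 3, "birth_date": (2023, 2, 29)},  # 2023 is not a leap year
    {"customer_id": 4, "birth_date": (2024, 2, 29)},  # 2024 is a leap year
]

# Collect the identifiers of records whose date breaks the calendar rule.
invalid_ids = [
    r["customer_id"] for r in records if not is_valid_date(*r["birth_date"])
]
print(invalid_ids)  # [2, 3]
```

A check like this only covers accuracy. It cannot tell whether a valid date is still the right one, which remains a matter of timeliness and organizational procedures.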
MATCHING BETWEEN DATABASES AND REALITY OF THE WORLD
A company has to measure data quality from a "risk" angle, in line with its business goals. The effort required to reach an acceptable level of data quality must be proportional to the risks incurred. For example, it is useless, and possibly quite costly, to keep mailing addresses up to date if the company does not use them.
It is up to the "business" to define and enforce a required level of data quality according to the importance of each of its missions, on the understanding that data quality cannot be guaranteed at 100%.
REVER's proposed solutions:
They consist of:
- separating the technical part from the organizational part
- detecting data inconsistencies in the databases of a single application or across several applications
- identifying the data involved in a critical activity of the "business" and focusing all efforts on the constant improvement of data quality for that particular activity
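Detecting inconsistencies across several applications can be pictured with a short Python sketch. It assumes two hypothetical data sets, `crm` and `billing`, held as in-memory dictionaries keyed by customer identifier; a real check would read from the actual databases. The function name `find_conflicts` is also invented for this example.

```python
def find_conflicts(db_a: dict, db_b: dict, field: str) -> dict:
    """Return {customer_id: (value_a, value_b)} for customers present in
    both data sets whose values for `field` differ."""
    conflicts = {}
    # Only customers known to both data sets can contradict each other.
    for customer_id in db_a.keys() & db_b.keys():
        value_a = db_a[customer_id].get(field)
        value_b = db_b[customer_id].get(field)
        if value_a != value_b:
            conflicts[customer_id] = (value_a, value_b)
    return conflicts


# Hypothetical extracts from two applications (illustrative values only).
crm = {"X": {"children": 2}, "Y": {"children": 1}}
billing = {"X": {"children": 3}, "Y": {"children": 1}, "Z": {"children": 0}}

conflicts = find_conflicts(crm, billing, "children")
print(conflicts)  # {'X': (2, 3)}
```

The sketch reports that the two sources disagree about customer X. Deciding which value is correct is still a business question.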
*Source: 2015 Analytics & BI Survey