Managing Data Quality
Data as a raw material
When using data, many individuals assume that they can use all the data ‘as is’. This assumption is flawed, as explained by the analogy of how artisans making furniture vary their approach according to the raw materials they are using:
For a raw material such as newly manufactured metal or plastic, the product will be highly consistent with conformance or test certificates proving the quality of the material. This consistency of product means that all the material could be used, with the main challenge being how to minimise wastage.
For a raw material of seasoned wood, the product is likely to have some pieces with knots in them, be slightly warped and/or include areas of woodworm or rot. In this case, the artisan assesses each piece of wood to determine the best way to utilise it in order to make furniture, perhaps rejecting pieces that are too rotten or warped, or choosing to use the more knotty pieces of wood for the back of the furniture, where they are less visible or critical.
When undertaking data exploitation, data are the raw material – but a raw material that can vary from being like a metal through to being like wood. Before undertaking any analysis using data, it is important to consider whether the data are like wood, in which case you need to understand their quality to know which data should perhaps be ignored, which may need cleansing and which seem reliable. The aggregate assessment of the quality of the input data will help inform how you describe the confidence levels of the data outcomes.
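The assessment described above can be illustrated with a minimal Python sketch. It is an illustration only: the `Reading` record, its fields and the three rules are hypothetical assumptions made for this example, not a method defined in this book, and real quality rules would depend on the data set and its intended use. The sketch sorts each record into 'ignore', 'cleanse' or 'reliable', then reports the share of reliable records as one crude input to a statement of confidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical record used only for this illustration."""
    asset_id: Optional[str]
    value: Optional[float]
    unit: Optional[str]


def triage(reading: Reading) -> str:
    """Sort one record into 'ignore', 'cleanse' or 'reliable'.

    The rules are assumptions made for the example:
    - no asset identifier or no value: too 'rotten' to use, so ignore;
    - unit missing or not in tidy lower-case form: usable after cleansing;
    - otherwise: treat as reliable.
    """
    if reading.asset_id is None or reading.value is None:
        return "ignore"
    if reading.unit is None or reading.unit != reading.unit.strip().lower():
        return "cleanse"
    return "reliable"


def summarise(readings: list) -> tuple:
    """Count records per category and return the share that is reliable."""
    counts = {"ignore": 0, "cleanse": 0, "reliable": 0}
    for reading in readings:
        counts[triage(reading)] += 1
    total = len(readings)
    reliable_share = counts["reliable"] / total if total else 0.0
    return counts, reliable_share


sample = [
    Reading("A1", 3.2, "mm"),   # complete and tidy
    Reading("A2", 4.0, " MM"),  # unit needs cleansing
    Reading(None, 1.0, "mm"),   # no asset identifier
    Reading("A4", None, "mm"),  # no value
]
counts, reliable_share = summarise(sample)
print(counts, reliable_share)
```

A low reliable share would suggest describing any analysis built on these records with correspondingly cautious confidence levels.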
Ideally, organisations will improve the sourcing of their data in the longer term, and look to implement the quality control and certification that ensures the data become like the metals and plastics that enable consistent manufacturing processes in this analogy.
The data machine: expectations vs reality
Acquiring, storing, managing and exploiting data within an organisation involves many activities and processes. These activities could be thought of as a machine powering the organisation. If you had to visualise the ‘data machine’ that represents your organisation, what sort of machine is it?
In a quarry, an excavator is a large (and expensive) piece of equipment that is essential for loading rock blasted from the rock face into dumper trucks for transporting to the processing plant. These are sophisticated and powerful machines that, if maintained correctly, should load rock effectively and efficiently for many years.
Implementation of an enterprise software solution can be likened to this excavator – it can provide a powerful and, arguably, efficient means to run business processes and acquire and analyse data. Similar to a major piece of equipment in a quarry, this will be a large investment for the organisation.
If you were to visualise the data processing activities of your organisation as a machine, would it be similar to this major item and be a single, efficient and effective entity? In reality, perhaps, there could be a few manual work-arounds to overcome deficiencies in certain areas, the use of some spreadsheet-based tools for manipulating data and interfaces with other corporate systems that do not function perfectly. In such cases, the machine that you imagine to represent the way that your organisation exploits data will be more like the types of machine imagined by the artist William Heath Robinson in the early 20th century: complicated, inefficient, tenuously held together and lacking proper documentation.
Whilst you perhaps visualise your data machine as being closer to the latter, how would the senior leaders in your organisation view it? If they have recently invested in enterprise software tools, then they could believe that the ‘data machine’ is a single, efficient entity, when in reality there are a number of data quality issues arising from data migration problems, interfaces that do not function correctly and so on.
This analogy illustrates how organisations can have an unrealistic perception of the way that they process and exploit data, believing these activities to be far more coherent and efficient than they actually are. Later in this book we define approaches to help provide structure and definition to your ‘data machine’.
Do your data trust you?
You could ask (or be asked) the relevant question: Do you trust your data? What if you were to turn this question on its head? This leads us to think about the issues from the perspective of the data, that is: if I am a data set, can I trust the people who handle me to do the right thing with me or to me? Do your data trust you? This can be another way of illustrating the importance of managing data correctly.
The data set’s view
It’s taken me a long time to get to where I am. I started small and now I’m so big that people often refer to me as ‘big data’. I still, however, have my needs and the same aims in life: I want to be reliable, dependable, available and adapt to change; I don’t necessarily want to be perfect, but fit for your use; I don’t want to be too needy and expensive to keep, otherwise you would archive or replace me.
Please remember, if you treat me well I will be faithful and true. Treat me unkindly and I will do whatever I can to make your life frustrating. Just think what I could do if I ever get to drive vehicles for you!
If you remove my competition, I promise I will be stronger; I do not like to share the limelight with inferior spreadsheet copies of me. If you put garbage into me, then I won’t be happy and will only send even more garbage back at you.
If you care for me, then I will never become out of date; I have the ability to change if you allow me to. If I change, however, I want to be sure I’ve changed for the right reasons and not just because you think I would look better if I were different, or that I would fit into your perceived view of me. After all, I am what I am and we will work together better if we fully understand each other.
If you have not specified what I need to look like, then I have no idea how to behave or what to tell you; I’ll try and predict this and tell you what I think you want to know, which perhaps will not be right.
I can be so clever if I am set up correctly; I can autocorrect, provide you with selectable values, undo and even auto-archive. But if you don’t take care, I have the ability to ruin your day. I have a whole catalogue of sayings to throw at you. Here are some of my favourites:
invalid key stroke;
invalid data format;
command not recognised;
no search results found;
data out of range;
can’t undo;
last session did not save.
Sometimes I get abused. The finance department often reshapes me and turns me into something I don’t recognise as myself; I think they call this accountancy. These days I’m used a lot in business intelligence, or ‘BI’ for short; they don’t like what I tell them most of the time.
The most fun I ever had was when someone put me through a data migration project. Someone thought I would be very different and much improved as a result, but no, I’m still the same. At least I have a new shiny coat to wear and I fared better than some of my cousins; one of them became schizophrenic after he was part of a failed data migration. He has never been the same since!
I was so chuffed recently when I was honoured with the title of being the ‘single version of the truth’, until I found out all my brothers and cousins were also receiving this award.
So, please be very careful when handling me. I do not like any of the following:
1. Being changed – unless my physical representation has also changed, and you have been given permission to change me. I hate to be ‘dressed up’ or made to look good because I don’t give you the answer you want to see; let me be me.
2. Being shared – unless