Data scientists and analysts have some of the hottest skills in the IT market right now. They are being paid high salaries and are using advanced analysis techniques to uncover hidden patterns and indicators within an organisation that can be used to drive decision making. Analytics and insights, when visualised with tools like Tableau or Qlik Sense, can be pretty damn sexy. With all this hype and excitement, there is a huge amount of pressure to go out and get your very own expert to start producing these magical insights immediately. But what is often overlooked is the even sexier area of data governance that is at the core of any successful insights program.
Data governance? I’m not convinced it’s as sexy as you think.
Have you ever been in a meeting and been asked how many customers you have, how many Tier 1 clients are in your portfolio, or who your most important customer is? If I ask these questions in my business, I get answers like “around X” or “it depends on what you mean by Tier 1” and so on. They are simple questions that are incredibly difficult to nail down. In many organisations data terms are not clearly defined, so they become subjective, and this drains the value from any analysis that we might undertake. If data is not accurate and trusted, the insights we gain from it are meaningless and certainly should not drive decision making.
What we need is a data governance framework, to provide the organisation with high quality data. This framework helps ensure quality, access and integration of data to maximise the effectiveness of the data analytics solutions. It encompasses the policies, procedures, roles and responsibilities relating to data management.
I’ll show you my framework if you show me yours
There are lots of ways to visualise a data governance framework, but this is one of my favourites, which has been created by the University of Notre Dame in the US.
This framework is effective because it focuses on the need to provide access to data as an outcome, which is facilitated through the use of technology but is reliant on defined business processes and attitudes to ensure that this accurate data is accessed in a controlled and appropriate manner.
Lean mean data machine
If you are a larger organisation, or an organisation that has a large number of data repositories or just simply a very large amount of data, defining and deploying a data governance framework across all of your data is going to be extremely challenging and highly unlikely to succeed. A much lower risk method is to utilise a “lean” approach. That is, to identify one or two areas where there will be a clear business benefit from potential improvement and where you can also have the senior support and resources required to drive a successful outcome. If you are successful with these lead areas, you will be able to deploy your framework iteratively across the entire organisation.
Data governance just got RACI
Across an organisation there are many “keepers” of data, and an effective data governance framework will require their buy-in. One way to achieve this is to establish a Data Council whose members represent each of the data custodians or stewards. These data custodians must first classify the data that will be governed by the framework. In classifying the data, they must agree on a single definition of each term. This effort has been estimated to take up to 10 hours per term, so it is not a simple or quick undertaking. The use of a RACI (Responsible, Accountable, Consulted, Informed) matrix allows stakeholders across the organisation to self-declare an interest in specific data terms, which increases the likelihood that they will buy in to the data governance process.
An example of the RACI matrix is shown below:
Remember that as the business changes over time, the definition of these data items may also change. It is the responsibility of the Data Council to keep the definitions up to date and they should use the RACI matrix to engage stakeholders and guide the process of review and approval.
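If you want to keep a RACI matrix in a form that can be checked automatically, something like the following could work. This is only a minimal sketch: the data terms, the stakeholder groups and the rule of "exactly one Accountable per term" (a common RACI convention, not something prescribed by any particular framework) are all my own assumptions for illustration.

```python
from collections import Counter

# Hypothetical data terms and stakeholder groups, for illustration only.
RACI = {
    "Customer": {"Sales": "A", "Finance": "C", "Marketing": "R", "IT": "I"},
    "Tier 1 client": {"Sales": "R", "Finance": "A", "Marketing": "C", "IT": "I"},
}

VALID_ROLES = {"R", "A", "C", "I"}


def validate_raci(matrix):
    """Return a list of problems; an empty list means the matrix passes."""
    problems = []
    for term, assignments in matrix.items():
        unknown = set(assignments.values()) - VALID_ROLES
        if unknown:
            problems.append(f"{term}: unknown role codes {sorted(unknown)}")
        counts = Counter(assignments.values())
        # Assumed convention: one, and only one, Accountable party per term.
        if counts["A"] != 1:
            problems.append(
                f"{term}: expected exactly one Accountable, found {counts['A']}"
            )
        if counts["R"] < 1:
            problems.append(f"{term}: no Responsible party assigned")
    return problems


def stakeholders_to_engage(matrix, term):
    """Groups to involve when a definition is reviewed (R, A and C roles)."""
    return sorted(
        group for group, role in matrix[term].items() if role in {"R", "A", "C"}
    )
```

When a definition comes up for review, the Data Council could call `stakeholders_to_engage(RACI, "Customer")` to see who needs to be in the room, while the Informed groups are simply notified of the outcome.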
The agreed definitions form a data dictionary, which allows the high-value data fields that are used by many parts of the organisation to be identified. These fields then drive the design of the core data object models used to architect the data warehouse(s).
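As a rough illustration of how high-value fields might be picked out of a data dictionary, the sketch below ranks fields by how many business units use them. The entries, the `used_by` attribute and the threshold of three business units are hypothetical; your own dictionary will have its own structure and its own idea of "high value".

```python
# Hypothetical data dictionary entries, for illustration only.
DATA_DICTIONARY = [
    {
        "field": "customer_id",
        "definition": "Unique identifier for a customer",
        "used_by": ["Sales", "Finance", "Marketing", "Support"],
    },
    {
        "field": "tier",
        "definition": "Agreed client tier",
        "used_by": ["Sales", "Finance", "Risk"],
    },
    {
        "field": "fax_number",
        "definition": "Customer fax number",
        "used_by": ["Sales"],
    },
]


def high_value_fields(dictionary, min_users=3):
    """Return fields used by at least min_users business units, most used first."""
    ranked = []
    for entry in dictionary:
        users = len(set(entry["used_by"]))
        if users >= min_users:
            ranked.append((entry["field"], users))
    # Sort by descending usage, then alphabetically for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return [field for field, _ in ranked]
```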
No time like the present
The process of extracting data from legacy systems, then transforming and loading it into the data warehouse, will highlight data quality issues that may be the result of data entry errors or erroneous business logic. This is an opportune time to address those issues, before the data is inserted into the warehouse, ideally by fixing them in the source system.
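To make that concrete, here is a minimal sketch of a quality gate that sits between the transform and load steps. The field names (`customer_id`, `email`, `tier`) and the rules themselves are assumptions made up for the example; the point is simply that failing records are held back, with reasons, so they can be corrected at the source rather than quietly loaded.

```python
# Hypothetical set of tier values agreed by the Data Council.
AGREED_TIERS = {1, 2, 3}


def check_record(record):
    """Return a list of data quality issues found in one source record."""
    issues = []
    if not str(record.get("customer_id") or "").strip():
        issues.append("missing customer_id")
    email = record.get("email") or ""
    if email and "@" not in email:
        issues.append("malformed email")
    if record.get("tier") not in AGREED_TIERS:
        issues.append("tier outside agreed values")
    return issues


def split_for_load(records):
    """Separate clean records from those to refer back to the source system."""
    clean, rejected = [], []
    for record in records:
        issues = check_record(record)
        if issues:
            rejected.append({"record": record, "issues": issues})
        else:
            clean.append(record)
    return clean, rejected
```

The rejected list, with its reasons, is what you would hand to the relevant data custodian so the fix happens in the legacy system and not just in the warehouse.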
Another consideration at this stage is the reporting required by the business. The development of key reports that are in a format expected by the various user groups, which serve to answer the most common and important queries for the business, will drive consistency of the use of data across the organisation as well as promoting user adoption.
The consideration of reporting will also drive the development of the security model which must sit across the data warehouse and control access to the data. Applying the security model on the data warehouse allows the data custodian to control access to the data at a row and field level. This approach avoids the problematic alternative of working directly with a legacy system application security model.
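Row and field level control can be pictured as a filter that sits between the warehouse and the person asking. The sketch below is illustrative only: the roles, regions and field names are hypothetical, and in practice you would normally rely on the security features of your warehouse platform rather than application code like this.

```python
# Hypothetical policy: the rows (by region) and fields each role may see.
POLICY = {
    "sales_nsw": {
        "regions": {"NSW"},
        "fields": {"customer_id", "name", "region"},
    },
    "finance": {
        "regions": {"NSW", "VIC"},
        "fields": {"customer_id", "region", "balance"},
    },
}


def visible_rows(rows, role):
    """Apply row-level, then field-level, restrictions for the given role."""
    policy = POLICY.get(role)
    if policy is None:
        return []  # Deny by default when a role has no policy.
    return [
        {key: value for key, value in row.items() if key in policy["fields"]}
        for row in rows
        if row["region"] in policy["regions"]
    ]
```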
The next two data governance areas to be considered in the warehousing solution are compliance and archiving. These areas are likely to be driven by external influences such as regulators and legislation, which might include APRA, PPIPA, HRIPA, the State Records Act and many others. These define how data must be stored to maintain compliance and how long it must be retained. These factors can have a large impact on the solution, given the volume of data that may need to be kept.
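A retention rule can be expressed quite simply once the periods are known. In the sketch below the categories and the retention periods are placeholders I have invented for the example; the real values must come from whichever regulations apply to your organisation.

```python
from datetime import date

# Hypothetical retention periods in years. These are NOT regulatory values.
RETENTION_YEARS = {"financial": 7, "marketing": 2}


def retention_action(category, created, today):
    """Return 'retain', 'eligible_for_disposal', or 'review' for one record."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        return "review"  # Unclassified data needs a decision from its custodian.
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the expiry year is not a leap year.
        expiry = created.replace(year=created.year + years, day=28)
    return "eligible_for_disposal" if today >= expiry else "retain"
```

Running a check like this across the warehouse gives an early estimate of how much data you are obliged to hold at any point, which feeds directly into sizing the solution.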
If you build it, they might not come…
The final element of the framework is training and awareness. This is a critical area that is frequently overlooked. I once worked with a client’s analytics team who couldn’t understand why people weren’t using their reports. The team had created around 170 insightful reports, deploying them on the company intranet as they were built. The problem was that they didn’t tell anyone the reports were there, or why they were relevant.
When you are building your reporting solution, it is critical to engage with the stakeholders and gather their input before the reports are developed. Communicating with stakeholders and training them to interpret the reports is just as critical to their understanding and application. Any reporting solution generated by an analytics team in isolation is at risk of being a report for the sake of it.
I told you governance was sexy
While we all love to create data visualisations, without a data governance framework these can easily become interactive pictures without insight. A data governance framework doesn’t have to be the death of the party, but it must address data quality, standardise data definitions, and cover appropriate security measures as well as compliance and retention requirements. As with all IT projects, I recommend a lean approach: work with a smaller subset of high-value business data to design a governance framework that will be practical in your organisation. Once it is validated, you can continue to deploy that framework iteratively across your entire organisation and data sources.