# Back to school for execs, data school that is - Part 2

In __Part 1__ of this series of articles, we covered the baseline of analytics with the discussion of populations and samples, means and medians, errors and data quality as well as standard deviation and z scores. In this article we are going to focus on reporting and specifically how an executive can understand the types of visualisations that will be frequently presented as part of regular operational meetings. The goal of this article is to give you an understanding of each of the chart types and the confidence to ask questions.

## Visualisations

Visualisations are the way in which analysts communicate insights from data, and they will form a large portion of the reporting provided to most executives. When used correctly, visualisations can convey insights from large and complex sets of data. A wide range of visualisations is available, but there are a few that you will come across regularly.

**Bar/Column Chart**

You might be thinking "I know what a bar chart is, what else is there to learn?", but stay with me. The bar/column chart is used to display categories that are independent of each other. For example, nationality or employment status might be displayed on a bar chart.

*Chart 1 - Column Chart*

The bar/column chart is generally the best visualisation when comparing the relative size of the segments of information. This is because humans are very effective at comparing the relative size of rectangles.

The data on the chart may be displayed either horizontally (bar) or vertically (column). If there is a very large number of bars to display, the horizontal bar chart can make the information more readable and is therefore preferred.

*Chart 2 - Bar Chart*

A common enhancement to the standard bar/column chart is the clustered bar/column chart, which displays an additional level of detail on a single visualisation. In the example below, we can see that in addition to the change in total employees over the three-year period, we are also shown how the number of males versus females has changed during that period.

*Chart 3 - Clustered Column/Bar Chart*

This is a very effective approach that allows the reader to easily compare the trend of each category over time, as well as to compare the categories with each other. It does, however, make it more difficult to see the change in the total number of employees. This can be easily overcome by adding the total as a third column. Now we can see that overall there is very little movement in the total number of employees, but a significant change in the gender balance of the workforce.

*Chart 4 - Clustered Column/Bar Chart with Total*

Another common variation of the bar/column chart is the stacked bar/column chart. The chart below shows the same information as the chart above. The overall height of the column shows the total number of employees, with the blue and orange segments representing the male and female employees. You can immediately see that, in this case, the stacked column chart makes it much more difficult to quickly assess the change in the male/female representation of the workforce.

*Chart 5 - Stacked Column Chart*

**Line Chart**

The line chart is used to show how a value has changed over time. For example, we can see how the total number of employees has changed from 2017 to 2019. The line chart is more effective than the bar/column chart when the change in value is very small, as humans are very good at evaluating whether two lines are parallel. In the example below, using the horizontal grid lines as a reference, we can see that the change from 2017 to 2018 is larger than the change from 2018 to 2019. This level of detail was more difficult to detect in Chart 1, for instance.

*Chart 6 - Line Chart*

When using a line chart, it is important that the points joined by the line are actually related, typically as successive values of the same measure. For example, it is not logical to use a line chart to display the gender split in a single year, as shown below.

*Chart 7 - Incorrect use of Line Chart*

**Scatter Plot**

The scatter plot, also sometimes referred to as the X-Y plot, is used to look at the relationship between two values. An example of a scatter plot is shown below, displaying how the speed of a car affects the stopping distance. As we would expect, as the speed increases so does the stopping distance.

*Chart 8 - Scatter Plot*

Using a scatter plot, it is easy for our eyes to see the overall trend of the relationship, but this can be helped by adding a trend line to the chart, as shown below.

*Chart 9 - Scatter Plot with Trendline*

In the above example we say there is a positive correlation (relationship) between the speed and the stopping distance, because as one increases so does the other. We can also have a negative correlation (relationship), as shown below, where unemployment rises as inflation decreases.

*Chart 10 - Scatter Plot with Negative Correlation*

We can also have a scatter plot that shows no relationship between the two variables being displayed, as shown below. In this example the dots are randomly placed. Because there is no correlation, knowing the x value would not help us determine what the y value would be.

*Chart 11 - Scatter Plot with No correlation*

The scatter plot can be very useful when making a decision. For example, if we look at the chart below, we can see that as we increase our advertising spend we see an increase in our sales. The question would be:

"If I need to hit my sales target of $15M, how much do I need to spend on advertising?"

From the chart, we can make a data-driven decision: reading across from $15M on the sales axis to the trend line, and then down to the advertising axis, we determine that we need to spend around $135K on advertising.

*Chart 12 - Using Scatter Plot for Decision Making*
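Reading a value off the trend line, as described above, amounts to fitting a straight line to the dots and then solving it for the advertising spend. The following is a minimal sketch of that idea in Python. The data points are invented for illustration and are not the data behind Chart 12, and the function names `fit_trend_line` and `spend_for_target` are hypothetical. It assumes the relationship is roughly linear and uses an ordinary least-squares fit, which may differ from the method your analyst uses.

```python
# Hypothetical data: advertising spend ($K) and the sales achieved ($M).
# These figures are invented for illustration only.
ad_spend_k = [40, 60, 80, 100, 120, 140]
sales_m = [5.5, 7.4, 9.6, 11.4, 13.6, 15.5]


def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def spend_for_target(target_sales, slope, intercept):
    """Solve the trend line for x: the spend at which the line reaches the target."""
    return (target_sales - intercept) / slope


slope, intercept = fit_trend_line(ad_spend_k, sales_m)
required_spend = spend_for_target(15.0, slope, intercept)

print(f"Trend line: sales = {slope:.3f} * spend + {intercept:.2f}")
print(f"Spend needed for $15M in sales: about ${required_spend:.0f}K")
```

The answer is only as good as the fit. A trend line can always be drawn, even through dots that show no real relationship, so the estimate should be read alongside a measure of how closely the dots follow the line.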

Before we go out and spend $135K, we would want to be fairly confident that the spend will actually bring in those sales. The strength of the relationship is measured by the correlation coefficient, which may also be referred to as r. Your analyst will calculate the correlation coefficient from the data, and should present it in their report in situations like the one displayed in Chart 12. The correlation coefficient ranges from -1 to 1. It reaches exactly 1 or -1 only when all of the dots sit exactly on the trend line, and a value close to 0 indicates little or no linear relationship.

There are no definitive rules on how strong the correlation needs to be before you can rely on it for decision making, as this is very dependent on the specifics of your operating environment.

As a very broad guide, look for a correlation of 0.7 and above for a positive correlation, or -0.7 and below for a negative correlation.
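To make the correlation coefficient less of a black box, here is a minimal sketch of how r (the Pearson correlation coefficient) can be calculated and checked against the broad 0.7 guide. The data is invented for illustration, and the function names `pearson_r` and `describe_strength` are hypothetical. In practice your analyst would use a statistics package rather than hand-written code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers.

    Assumes both lists contain some variation; a constant list would
    cause a division by zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r, threshold=0.7):
    """Label r using the broad 0.7 guide from the text (a rule of thumb, not a rule)."""
    if r >= threshold:
        return "strong positive"
    if r <= -threshold:
        return "strong negative"
    return "weak or no clear relationship"


# Hypothetical data: advertising spend ($K) and sales ($M).
ad_spend_k = [40, 60, 80, 100, 120, 140]
sales_m = [5.5, 7.4, 9.6, 11.4, 13.6, 15.5]

r = pearson_r(ad_spend_k, sales_m)
print(f"r = {r:.3f} ({describe_strength(r)})")

# Dots that sit exactly on a straight line give r of 1 or -1.
r_perfect_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_perfect_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(f"perfect rising line: {r_perfect_up:.1f}, perfect falling line: {r_perfect_down:.1f}")
```

Note that r only measures how closely the dots follow a straight line. It says nothing about whether one variable causes the other.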

**Pie Chart**

The pie chart is used to show the proportion of a number of items relative to each other. It is quite possibly the most overused visualisation of all time, which is a problem because it is very difficult for the human eye to compare the relative size of the circular segments.

__Bernard Marr__ put together the example below, which shows the same set of data displayed on a pie chart and on a bar chart. Using the bar chart, it is immediately obvious which product has the best sales. The same cannot be said for the pie chart.

*Chart 13 - Pie Chart*

*Chart 14 - Bar Chart with Same Information as Pie Chart*

Pie charts should be used only in very limited situations: when the parts combine to make a whole, e.g. the ethnicity of a country's population or hair colour, and only when there are fewer than 5 categories. With more categories than that, the pie chart becomes too cluttered and a bar chart is a better choice.

**Histogram**

The histogram is a type of bar chart that is used to show the distribution of the data. This is the type of chart used to determine whether a variable is normally distributed, i.e. follows a bell curve, as shown below.

*Chart 15 - Normal Distribution*

To create a histogram we split one of the variables into categories, or bins. For example, if we are looking at the age distribution of a population, we would split age into non-overlapping categories, e.g. 0-4, 5-9, 10-14, 15-19 and so on, so that each person falls into exactly one category. Then we count the people who fit into each category. The 2016 census data for Australia is shown below, with the number column showing the count of people who fit into each of the 5-year age categories.

*Table 1 - Australian Population Data (2016 Census)*

And we can see the histogram of the data below.

*Chart 16 - Australia's Population Distribution (Census 2016)*
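The binning and counting steps described above can be sketched in a few lines of Python. This is an illustration only: the list of ages is invented and is not census data, the function name `histogram_counts` is hypothetical, and the sketch assumes non-overlapping five-year bins such as 0-4 and 5-9.

```python
from collections import Counter

# Hypothetical ages of 16 people, invented for illustration (not census data).
ages = [3, 7, 12, 18, 21, 24, 29, 33, 35, 38, 41, 47, 52, 58, 63, 71]


def histogram_counts(values, width=5):
    """Count values into bins of the given width and return (label, count) pairs.

    Each value is assigned to the bin starting at the nearest multiple of
    `width` at or below it, so with width=5 an age of 7 falls into "5-9".
    Empty bins between the smallest and largest value are kept, because
    gaps are part of the shape of the distribution.
    """
    counts = Counter((v // width) * width for v in values)
    first_bin = min(counts)
    last_bin = max(counts)
    return [
        (f"{start}-{start + width - 1}", counts[start])
        for start in range(first_bin, last_bin + width, width)
    ]


# Print a simple text histogram: one '#' per person in each bin.
for label, count in histogram_counts(ages):
    print(f"{label:>6} | {'#' * count}")
```

The choice of bin width matters. Bins that are too wide hide the shape of the distribution, while bins that are too narrow make it look noisy, so it is reasonable to ask your analyst how the bins were chosen.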

The histogram may not be commonly used in executive reports, but it is important that the analyst reviews it when conducting the analysis behind the report, as different distributions can lead to a range of issues, which will be discussed in the next article in this series.

## Summary

Visualisations form a major part of the reporting presented to executives, given their ability to communicate insights quickly from large amounts of data. We have reviewed the most common types of visualisations, when they should be used and some of the common pitfalls of each. In the next article in this series we will look at how to interpret the information displayed in a visualisation, and how poor use of a visualisation can lead to erroneous management decisions.
