Data Warehouse Decisions: The signs you are pushing Excel beyond its limits.
Updated: Aug 10, 2023
Bet you won’t guess what the most widely used database is?
OK, well maybe the title gave it away, it’s Excel. In almost every business in the world, there is someone who is using an Excel Spreadsheet like a database. Often the use of Excel is appropriate. It’s a tool that the majority of business users are comfortable with and it does a pretty good job of managing smaller volumes of data. Officially Excel can handle over 1 million rows of data, but in practice once you are getting around 70,000 rows Excel becomes slow and cumbersome.
Another major difference between Excel and a data warehouse is the way that it manages changes to the data. I’m sure everyone has experienced the pain of accidentally copying over a cell or dragging a formula too far, or not far enough. The little slip of a mouse can have some pretty major implications to the data. Worse still is that these little errors can creep in, without anyone realising, and tracking them down can turn into a major nightmare.
What are the signs that you are using excel as a data warehouse?
Excel is a spreadsheet tool. It’s an electronic ledger that was originally created to replace paper accounting worksheets. It’s also great at storing tabular data.
So, when do you cross the spreadsheet/data warehouse line?
In my experience there are a couple of signs:
You are starting to get tens of thousands of rows in your spreadsheet.
You are downloading CSV files from multiple source systems and importing them into a central spreadsheet.
You know the spreadsheet is really important, so now you are storing it in Sharepoint or Google Drive or a Shared Folder to ensure multiple people have access and it is backed up.
You consider the Excel file to be business critical.
If you are experiencing any of these, then it’s probably getting to the point that you need to re-evaluate, or at least think about your strategy moving forward.
Why don’t businesses take action?
A lot of businesses are using Excel as a data warehouse, and lots of those businesses know that it is a problem, but yet they don’t address the issue. Generally, there are two blockers:
Tech skills
Cost
Tech Skills
The raw source data is often located in a few different systems. The data is extracted by downloading to a CSV and then added to the Excel file. It’s a pretty manual process, but it’s simple and anyone can probably do it.
Switching to a data warehouse means thinking about how to get that source data into the warehouse. That in itself can be quite overwhelming, particularly if you have a small IT team. While having an automated data pipeline might be the goal, it doesn’t have to be the first step. Your data warehouse can load data from a CSV, using a simple user interface. So, as a first step, you could choose to keep your manual load process. There are also a range of new visual low code tools that enable you to automate these sorts of tasks, without needing programming skills. The pipeline setup is also a once-off task, so thinking about how to get that delivered externally, could also avoid the tech skills problem.
Cost
Excel is essentially free, so that is going to be a hard one to beat on price. What you also might want to consider is how much time you are wasting waiting for the spreadsheet to load and calculate, the cost of chasing down those errors and the time you waste manually moving data around. The other potential costs to consider are your business continuity impacts. What will it cost your business if this file is corrupted and becomes unusable?
The cost of your data warehouse is going to include:
Data warehouse costs
Data pipeline costs
Staff costs
Data warehouse costs
If you are thinking about a data warehouse, you should only consider a cloud data warehouse. There are a million reasons why, but if we just focus on the operational costs with a cloud data warehouse, you only pay for the time that you access your data. If your data is sitting there and you are not asking a question then you are not paying (yes, you pay storage costs but that is going to be around $25/month). If you design your warehouse well, then your costs can be minimised.
Data pipeline costs
If you have decided to automate your data transfer from your source system to your data warehouse, then you are going to have some ongoing costs. There are lots of tools available and some of these charge by the row of data transferred, so picking the right one is important. The build costs of the data pipeline can also be minimsed by using low code visual tools.
This is definitely an area where you need to be careful and get some good advice. There are also lots of tools that are open source and do a great job, and can help minimise your costs in this space.
Staff costs
So traditional databases like SQL Server, MySQL, Postgres and similar platforms typically have a Database Administrator (DBA), whose job it is to look after the platform. They tune it, they back it up, they patch it, they make a full-time job out of it. One of the reasons that I would only consider a Cloud Data Warehouse is because they are fully managed platforms. That means that you don’t need to do any of those things anymore, and so you don’t need a full-time DBA.
For a smaller organisation, I would suggest working with a partner to help with the initial setup. Once that is complete, you can always have your partner provide advice and help with any issues or changes that you might want to implement down the track. That way the partner can provide you access to multiple data experts, without you needing to carry the payroll liability.
Conclusion
Excel is undoubtedly a versatile and widely used tool for managing smaller volumes of data, but there comes a point where its limitations become apparent. Recognising the signs that you have crossed the spreadsheet/data warehouse line is crucial. While it might be tempting to continue using Excel, due to its familiarity and ease of use, there are compelling reasons to consider a shift towards a dedicated data warehouse.
The decision to move from Excel to a data warehouse is a strategic one, considering the growth and complexity of your data needs. As you evaluate your current data management practices, keep in mind that the journey to a data warehouse is an investment that pays off in valuable insights and smoother data operations for the future.
Commentaires