The very first step when creating a Tableau dashboard is to set up a connection with a data source. Tableau allows users to connect with several kinds of data source connections such as a text file (csv), Google Sheets, Microsoft Excel file, hyper file, or a direct connection with various databases like Google BigQuery.
Connecting to the data sources that Tableau supports is generally a smooth and straight forward process. However, a lot of times, once the connection is established and dashboards are built, you might have to start all over at the data source stage again because of limitations with the data source connection that you were unaware of.
So, in this article, we will go through a few examples of limitations for some frequently used data sources.
We often store our data in Google Sheets as it is convenient to make changes, overwrite and update files. Tableau also supports Google Sheets connections. However, one of the very first limitations we should be aware of is that Google Sheets can only store data up to 5 million cells or 18,278 columns (up to column ZZZ). This won’t be an issue if adding new rows/cells to the file every day, week or month, etc. does not exceed 5 million cells.
However, what if your data exceeds the size limitations? In this case, we could stack up Google Sheets files using the Union function in Tableau as a possible solution. However, a single union can only have a maximum of 10 inputs. Since we are often dealing with quite large datasets, eventually you’ll likely reach a limitation again.
Google Sheets, as a connection with Tableau, seems like a great choice in terms of maintenance and convenience, but if you know you have a large volume of data, you might want to consider other options.
Microsoft Excel is a powerful tool for doing an analysis. Usually, calculations are tied up in Excel and then the whole Excel file is imported into Tableau as a data source. Like Google Sheets, Microsoft Excel has a drawback in terms of a maximum limit and only allows up to 1,048,576 rows by 16,384 columns.
On top of that, Microsoft Excel is stored locally, unlike Google Sheets which is stored in the cloud. Another point to consider is if you want to keep data fresh, you have to manually replace the current Excel file with an updated Excel file.
It is possible to automate this task to save time, but that would require programming knowledge or the use of additional tools. Here is an article written by my colleague on how to automatically refresh Excel/csv files.
Tableau Published Data Source
Tableau Published Data Source is a good choice in terms of security. Many organizations have a policy to which data must be kept and used in the form of published data sources. This helps protect data from being edited or manipulated by only allowing authorized users to make changes. Even if users do not not have a Tableau Desktop license, they can still connect with published data sources and make changes to the dashboards through web authoring. It is also great in terms of keeping data fresh as there are several options to do so.
However, when we build a dashboard, we need many columns from different data tables. So, we might need to use a built-in join function in Tableau Desktop. In 2021, Tableau released a new function called Relationship. It is quite similar to join, but the relationship function is more automated, for example, you do not need to determine a join type (inner, left, right, outer) like before. It is also flexible and lets you establish one-to-one, one-to-many relationships more easily.
However, there are some circumstances where you have a secure data policy to only use Tableau published data sources, and you have many tables you want to connect together. Relationships have a drawback because you will not be able to define relationships between published data sources. You might have to workaround a bit, by connecting extract data sources in Tableau Desktop, define relationships between tables, or do the join, then publish it as a single data source on to Tableau Server.
Each data source connection has its own benefits and drawbacks. One thing to keep in mind is that you need to know your data before designing the data source. It might take some time to check all the requirements in the beginning, but it could save you a lot of time by not having to come back and redesign it all over again.