Creating Insightful Dashboards with Spark and Tableau Desktop

Large-scale data visualization automation with Tableau Desktop Tool


1. Introduction

Data visualization is a widely adopted method in data analytics for extracting useful business insights (e.g., trends, patterns, outliers, and correlations) from large-scale datasets. Recently, I presented a software development method for using Spark, Plotly, and Dash to develop interactive and insightful data visualization dashboards for Web applications in Python [1].

As in [1], this paper uses the same open-source dataset as [2] to show how Spark and Tableau Desktop [3] can be used to create insightful dashboards from large-scale datasets in a cloud data lake without programming.

Figure 1: High-level overview of the workflow.

Figure 1 shows a high-level overview of the workflow. It consists of the following major steps:

connecting Tableau Desktop, the dashboard authoring tool, to Spark
querying the dataset from a cloud data lake
creating data visualization graphs from the loaded dataset
creating dashboards from the individual graphs
publishing dashboards to Tableau Server for sharing

2. Connecting Tableau Desktop to Spark

As described in [4], the following steps set up Spark SQL as a distributed query engine through its JDBC/ODBC server [5] and connect Tableau Desktop to that distributed Spark SQL engine:

install Hadoop
set up Hive
set up MySQL
set up Spark
set up Tableau Desktop
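The distributed SQL engine in [5] is Spark's Thrift JDBC/ODBC server, which is started with sbin/start-thriftserver.sh and by default listens on port 10000 for HiveServer2-style jdbc:hive2:// connections. Before pointing Tableau Desktop at it, it can help to confirm that the server is reachable. The Python snippet below is a minimal sketch under that assumption: the host name is hypothetical, and the helpers thrift_jdbc_url and is_port_open are illustrative names, not part of Spark or Tableau.

```python
import socket

# Assumption: the Spark Thrift JDBC/ODBC server uses its default port, 10000.
DEFAULT_THRIFT_PORT = 10000


def thrift_jdbc_url(host, port=DEFAULT_THRIFT_PORT, database="default"):
    """Build a HiveServer2-style JDBC URL, the form beeline uses to reach the server."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def is_port_open(host, port=DEFAULT_THRIFT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


if __name__ == "__main__":
    host = "localhost"  # hypothetical: the machine running the Thrift server
    print(thrift_jdbc_url(host))
    print("Thrift server reachable:", is_port_open(host))
```

A successful TCP connection only shows that something is listening on the port; authentication and the rest of the connection are still configured in Tableau Desktop as described in [4].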

3. Querying Dataset from Cloud Data Lake

Once Tableau Desktop is connected to the distributed Spark SQL engine, we should be able to browse the default schema and see the tables of the Hive/Hadoop cluster [4].

From the perspective of creating dashboards with Tableau Desktop, there is no difference between a table loaded from a Hive/Hadoop cluster and a table loaded from a local Microsoft Excel file. For convenience, this paper uses the free Tableau Desktop Public edition with a local Excel file, converted from the dataset's CSV file in [2], for demonstration purposes.

4. Creating Data Visualization Graphs

Individual visualization graphs must be created first, before they can be combined into dashboards.

We can use Tableau Desktop to create many different types of graphs. As described in [1], some of the graphs are suitable for visualizing continuous numeric features, while others are suitable for visualizing discrete categorical features.

As in [1], this paper uses Tableau Desktop to create the following common graphs for demonstration purposes:

Graphs for numeric features: scatter plot, histogram chart, and line chart
Graphs for categorical features: bar chart, line chart, and pie chart

4.1 Graphs for Numeric Features

Tableau Desktop uses the symbol # to indicate numeric features. This subsection shows how to use Tableau Desktop to create the following three common graphs for numeric features:

scatter plot
histogram chart
line chart

4.1.1 Scatter Plot

For a pair of numeric features, a scatter plot uses each pair of feature values as coordinates to draw a point on a 2D plane. As in [1], Figure 2 shows, as an example, a scatter plot of the two numeric features Patientid and Admission Deposit for patients aged 21 to 30. The feature Type of Admission is used for color coding.

The following steps can be followed to create the scatter plot:

Step 1: drag the feature Patientid and drop it into the Columns shelf
Step 2: drag the feature Admission Deposit and drop it into the Rows shelf
Step 3: drag the feature Type of Admission and drop it onto the Color property of the Marks Card
Step 4: click the dropdown on the Marks Card and select Circle
Step 5: right-click the feature Age, choose Show Filter, and select 21–30 only

Figure 2: Sample scatter plot for a pair of numeric features.

This scatter plot reveals the business insight that the majority of emergency and trauma patients between 21–30 years old had a deposit in the range of $3,000 to $6,000.
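What Tableau computes for this scatter plot can be mirrored in a few lines of Python: keep only the rows in the 21–30 age band (the Age filter), then group the (Patientid, Admission Deposit) points by Type of Admission (the Color property). The sketch below runs on a handful of made-up rows and assumes the age band is stored as the string "21-30"; share_in_range is a hypothetical helper for checking the $3,000 to $6,000 observation per admission type.

```python
from collections import defaultdict

# Hypothetical sample rows; the dataset from [2] has many more rows and columns.
ROWS = [
    {"Patientid": 101, "Age": "21-30", "Type of Admission": "Emergency", "Admission Deposit": 4911.0},
    {"Patientid": 102, "Age": "21-30", "Type of Admission": "Trauma", "Admission Deposit": 3554.0},
    {"Patientid": 103, "Age": "51-60", "Type of Admission": "Trauma", "Admission Deposit": 5558.0},
    {"Patientid": 104, "Age": "21-30", "Type of Admission": "Urgent", "Admission Deposit": 7272.0},
    {"Patientid": 105, "Age": "21-30", "Type of Admission": "Emergency", "Admission Deposit": 2100.0},
]


def scatter_series(rows, age_band="21-30"):
    """Group (Patientid, Admission Deposit) points by Type of Admission within one age band."""
    series = defaultdict(list)
    for row in rows:
        if row["Age"] == age_band:  # the Age filter
            point = (row["Patientid"], row["Admission Deposit"])  # (Columns, Rows)
            series[row["Type of Admission"]].append(point)  # Color
    return dict(series)


def share_in_range(points, low=3000.0, high=6000.0):
    """Fraction of points whose deposit (the y value) lies in [low, high]."""
    if not points:
        return 0.0
    return sum(low <= y <= high for _, y in points) / len(points)


if __name__ == "__main__":
    for admission_type, points in scatter_series(ROWS).items():
        print(admission_type, len(points), f"{share_in_range(points):.0%} in $3,000-$6,000")
```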

4.1.2 Histogram Chart

Tableau Desktop treats a histogram as a bar chart. Because of this, the numeric feature on the Columns shelf must be converted into a categorical one by binning, and the numeric feature on the Rows shelf must be aggregated (e.g., summed).

The following steps can be followed to create the histogram in Figure 3:

Step 1: right-click the numeric feature Patientid and select “create bins” to create a new feature Patientid (bin)
Step 2: drag the new feature Patientid (bin) and drop it into the Columns shelf
Step 3: drag the numeric feature Admission Deposit and drop it into the Rows shelf, then click the dropped feature and select the aggregation SUM
Step 4: drag the categorical feature Type of Admission and drop it onto the Color property of the Marks Card
Step 5: click the dropdown on the Marks Card and select Bar
Step 6: right-click the feature Age, choose Show Filter, and select 21–30 only

Figure 3: Sample histogram chart for a pair of numeric features.

This histogram shows us the business insight that urgent patients between 21–30 years old had the smallest total amount of admission deposit, while the emergency patients had the largest total amount of admission deposit.
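The two conversions described above, binning the Columns feature and summing the Rows feature, can be sketched in Python as follows. This is an illustration under stated assumptions rather than Tableau's implementation: the bin size of 1,000 is arbitrary, bins are assumed to start at 0 and to be labeled by their lower edge, and the rows are made up.

```python
from collections import defaultdict

# Hypothetical sample rows: (Patientid, Age, Type of Admission, Admission Deposit)
ROWS = [
    (120, "21-30", "Emergency", 4000.0),
    (450, "21-30", "Emergency", 5000.0),
    (1730, "21-30", "Trauma", 3500.0),
    (1990, "21-30", "Urgent", 6000.0),
    (2050, "41-50", "Urgent", 4500.0),
]


def bin_label(value, bin_size):
    """Lower edge of the fixed-width bin containing value (bins assumed to start at 0)."""
    return (value // bin_size) * bin_size


def histogram(rows, bin_size=1000, age_band="21-30"):
    """SUM(Admission Deposit) per (Patientid bin, Type of Admission) within one age band."""
    totals = defaultdict(float)
    for patient_id, age, admission_type, deposit in rows:
        if age == age_band:  # the Age filter
            totals[(bin_label(patient_id, bin_size), admission_type)] += deposit
    return dict(totals)


def total_by_type(hist):
    """Collapse the bins: total deposit per Type of Admission."""
    totals = defaultdict(float)
    for (_, admission_type), deposit in hist.items():
        totals[admission_type] += deposit
    return dict(totals)


if __name__ == "__main__":
    hist = histogram(ROWS)
    for (lower, admission_type), total in sorted(hist.items()):
        print(f"Patientid bin {lower}: {admission_type} {total:,.0f}")
    print(total_by_type(hist))
```

total_by_type collapses the bins, which corresponds to comparing the total admission deposit per admission type, the comparison behind the insight above.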

4.1.3 Line Chart

As with the scatter plot, the line chart in Figure 4 can be created with the following steps:

Step 1: drag the feature Patientid and drop it into the Columns shelf
Step 2: drag the feature Admission Deposit and drop it into the Rows shelf
Step 3: drag the feature Type of Admission and drop it onto the Color property of the Marks Card
Step 4: click the dropdown on the Marks Card and select Line
Step 5: right-click the feature Age, choose Show Filter, and select 21–30 only

Figure 4: Sample line chart for a pair of numeric features.

Like the scatter plot, this line chart shows the same business insight: the majority of emergency and trauma patients between 21–30 years old had a deposit in the range of $3,000 to $6,000.

4.2 Graphs for Categorical Features

Tableau Desktop uses the symbol Abc to indicate categorical features. This subsection shows how to use Tableau Desktop to create the following three common graphs for the value counts of a categorical feature:

bar chart
line chart
pie chart

4.2.1 Bar Chart

As an example, the following steps can be followed to create the bar chart in Figure 5 for the value counts of the categorical feature Stay:

Step 1: drag the feature Stay and drop it into the Columns shelf
Step 2: drag the same feature Stay and drop it into the Rows shelf, then click the dropped feature and select the aggregation Count
Step 3: click the dropdown on the Marks Card and select Bar
Step 4: right-click the feature Age, choose Show Filter, and select 21–30 only
Step 5: click the Color property of the Marks Card and select a purple color

Figure 5: Sample bar chart for categorical feature value counts.

This bar chart shows the business insight that, among patients between 21–30 years old, more patients stayed in hospital for 21–30 days than for any other hospitalization interval.
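Dropping Stay onto both shelves and aggregating one copy with Count produces one value count per Stay interval. A minimal Python equivalent, assuming the Age and Stay values are strings such as "21-30" and using hypothetical rows, is a filtered collections.Counter:

```python
from collections import Counter

# Hypothetical (Age, Stay) pairs; the real values come from the dataset in [2].
ROWS = [
    ("21-30", "21-30"),
    ("21-30", "11-20"),
    ("21-30", "21-30"),
    ("31-40", "21-30"),
    ("21-30", "0-10"),
]


def stay_counts(rows, age_band="21-30"):
    """CNT(Stay) per Stay value, after applying the Age filter."""
    return Counter(stay for age, stay in rows if age == age_band)


if __name__ == "__main__":
    for stay, n in stay_counts(ROWS).most_common():
        print(f"{stay:>6} {'#' * n}")  # a crude text bar per Stay value
```

Counter.most_common() lists the intervals from most to least frequent, so its first entry is the tallest bar in the chart.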

4.2.2 Line Chart

As with the bar chart, the following steps can be followed to create the line chart in Figure 6 for the value counts of the categorical feature Stay:

Step 1: drag the feature Stay and drop it into the Columns shelf
Step 2: drag the same feature Stay and drop it into the Rows shelf, then click the dropped feature and select the aggregation Count
Step 3: click the dropdown on the Marks Card and select Line
Step 4: right-click the feature Age, choose Show Filter, and select 21–30 only
Step 5: click the Color property of the Marks Card and select a purple color

Figure 6: Sample line chart for categorical feature value counts.

Like the bar chart, this line chart reveals the same business insight: among patients between 21–30 years old, more patients stayed in hospital for 21–30 days than for any other hospitalization interval.

4.2.3 Pie Chart

Creating a pie chart is not as straightforward as creating line and bar charts.

The following steps can be followed to create the pie chart in Figure 8 for the value counts of the categorical feature Stay:

Step 1: drag the feature Stay and drop it into the Columns shelf
Step 2: drag the same feature Stay and drop it into the Rows shelf, then click the dropped feature and select the aggregation Count
Step 3: right-click the feature Age, choose Show Filter, and select 21–30 only
Step 4: click Show Me in the upper-right corner of the worksheet and select the pie icon; a small pie chart appears, as shown in Figure 7
Step 5: enlarge the pie chart by selecting the Size property of the Marks Card and then selecting and dragging the bounding box of the pie chart
Step 6: drag the aggregated feature CNT(Stay) and drop it onto the Label property of the Marks Card

Figure 7: Converting bar chart to pie chart.

Figure 8 shows the final pie chart after following the above steps.

Figure 8: Sample pie chart for categorical feature value counts.

Like the bar chart and line chart, this pie chart confirms the same business insight: among patients between 21–30 years old, more patients (2,197 in total) stayed in hospital for 21–30 days than for any other hospitalization interval.
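In a pie chart, each slice's angle is proportional to its count (angle = 360° × count / total), while the label, CNT(Stay) here, shows the raw count. The sketch below computes both the share and the angle for each category; pie_slices is a hypothetical helper, and the counts in its usage example are made up, not the values in Figure 8.

```python
def pie_slices(counts):
    """Map each category to (share of the total, slice angle in degrees)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a pie chart needs at least one non-zero count")
    return {category: (n / total, 360.0 * n / total) for category, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical Stay counts, not the values shown in Figure 8.
    for stay, (share, angle) in pie_slices({"21-30": 50, "11-20": 30, "0-10": 20}).items():
        print(f"{stay:>6} {share:.0%} {angle:.0f} degrees")
```

Because every slice is a fraction of the same total, the angles always add up to 360°, which is why a pie chart suits value counts of a single categorical feature.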

5. Creating Dashboards

Once individual graphs have been created, we can select and combine individual graphs into a dashboard.

5.1 Dashboard for Visualizing a Pair of Numeric Features

The following procedure can be followed to create a dashboard for the visualization of the pair of numeric features Patientid and Admission Deposit.

Step 1: select the Dashboard menu and then select New Dashboard

Step 2: drag the following graph sheets one by one and drop them onto the new dashboard sheet:

Sample Scatter Plot of Pair of Numeric Features
Sample Histogram of Pair of Numeric Features
Sample Line chart of Pair of Numeric Feature

Figure 9 shows the newly created dashboard.

Figure 9: Sample dashboard for visualizing a pair of numeric features.

5.2 Dashboard for Visualizing Categorical Value Counts

The following procedure can be followed to create a dashboard for the visualization of the value counts of the categorical feature Stay.

Step 1: select the Dashboard menu and then select New Dashboard

Step 2: drag the following graph sheets one by one and drop them onto the new dashboard sheet:

categorical_feature_bar
categorical_feature_line
categorical_feature_pie

Figure 10 shows the newly created dashboard.

Figure 10: Sample dashboard for visualizing categorical feature value counts.

6. Publishing Dashboards

Once a dashboard is created, it can be published to a Tableau server for sharing.

There are three types of servers:

Tableau Public
Tableau Server
Tableau Cloud

The dashboards created in this paper can only be published to Tableau Public because they were created with the free Tableau Desktop Public edition.

The publishing steps in [6] can be followed to publish the dashboards, together with the related dataset, to the Tableau Public server for public viewing.

7. Conclusion

Tableau is a multi-tier visual data analytics platform with a complex architecture. This paper first presented a method for integrating Spark with Tableau to query data from a large-scale cloud data lake (e.g., Hadoop Hive), and then demonstrated how to use Tableau Desktop to create insightful dashboards from the loaded datasets without programming.

As a dashboard authoring and sharing tool, Tableau Desktop supports many different ways of creating visualization graphs, so it can be hard to know where to start and how to create insightful graphs and dashboards. This paper aims to help readers learn Tableau Desktop quickly by focusing on only two simple visualization scenarios: one for visualizing a pair of numeric features and the other for visualizing the value counts of a categorical feature.

References

[1] Yu Huang, Developing Interactive and Insightful Dashboards with Spark and Plotly Dash

[2] Yu Huang, Predicting Hospitalized Time of Covid-19 Patients

[3] Tutorial: Get Started with Tableau Desktop

[4] A Guide to Setting up Tableau with Apache Spark

[5] Distributed SQL Engine

[6] Share your findings

Creating Insightful Dashboards with Spark and Tableau Desktop was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
