Tableau Prep vs Einstein Analytics – Data Cleansing

Data Cleaning in Tableau (Last Updated: 10 May, 2020). To visualise data in Tableau, we need a data source file. Most of the time the data file contains no stray values and can be used directly for the visualisation. When it does not, the Data Interpreter can clean the data. Even though the entire data set has been brought into the Tableau workspace, it can appear that the data is not in the correct format. Data Interpreter is here to help: under the Sheets tab, click on the Data Interpreter option and the data gets formatted nicely.

This is the 2nd part of the blog series, so make sure you check out part 1 here!

At the recent Tableau conference there was a lot of talk around the Salesforce product Einstein and how this is starting to be integrated into some areas within the Tableau product range. Salesforce and Tableau have allowed users to bring all of their data together by combining Einstein Analytics and Tableau to form Tableau CRM.

Tableau + Einstein = Tableau CRM

This blog is going to focus on comparing the similarities and differences between the data preparation aspects of both products – Tableau Prep and Einstein Analytics. The key areas that I will focus on are:

  • Input – How do you connect to data?
  • Investigation – Getting to know the data and if there are any problems
  • Cleansing – Cleaning the data with filters, calculations and string manipulation
  • Combining – Joining and unioning the data
  • Output – How can you get the data after you’ve finished?

This part is all about Cleansing your data so that it is ready for you to find some insights.

Cleansing

One of the most important aspects of a data preparation tool is what abilities you have to clean the data and fix any issues that there may be. There are many different techniques to clean data but I am going to be focusing on Filtering, Calculations, & String manipulations.

Filtering

Tableau Prep

There are multiple ways of filtering within Tableau Prep. The most basic is the ‘Selection’ filter where the user can select which fields or rows to filter from the view. The Profile Pane is great for this, as all the user has to do is select the field or value that they want to filter and click ‘Keep Only’ or ‘Remove/Exclude’:

Other filtering options include using one of the following:

  • Calculation: this must return a Boolean and uses Tableau’s normal calculation syntax
  • Wildcard: this allows you to filter a discrete field where the value doesn’t have to exactly match the full value. This is shown with these easy selection options:
  • Null Values: choose to keep Null or Non-Null values from a field
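
For readers who think in code, the filter types above can be approximated in plain Python. This is only a rough sketch of the logic, not how Tableau Prep works internally; the rows and the field names (`region`, `sales`) are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample rows; field names are illustrative only.
rows = [
    {"region": "North East", "sales": 120.0},
    {"region": "North West", "sales": None},
    {"region": "South", "sales": 45.5},
]

# Selection filter ("Keep Only"): keep rows whose value was picked.
keep_only = [r for r in rows if r["region"] in {"South"}]

# Calculation filter: any expression that evaluates to True or False.
over_100 = [r for r in rows if r["sales"] is not None and r["sales"] > 100]

# Wildcard filter: a partial match rather than the full value.
starts_with_north = [r for r in rows if fnmatchcase(r["region"], "North*")]

# Null values filter: keep only the non-null rows of a field.
non_null = [r for r in rows if r["sales"] is not None]
```

Each filter produces a new list and leaves `rows` untouched, which is loosely how each filter shows up as its own recorded change in a cleaning step.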

Einstein

Again you can filter your values in a couple of ways within Einstein Analytics, so I’m going to cover both of them.

Dataflow

Filtering in a Dataflow is easy as this is built in as a transformation, so all you need to do is press the filter icon and complete the setup within the tool:

The filter transformation gives you the option to use the standard filter expressions or SAQL. I find using the SAQL syntax to be a bit easier, but both have their benefits depending on the use case. Here’s an example using SAQL:

Recipe

Filtering within a Recipe is also easy as there is a dedicated filter button. It has a different setup compared with the Dataflow as it is more visual and easier to understand as you aren’t relying on SAQL.

You can add multiple conditions and these appear as steps on the left pane within the recipe, making it easier to understand what filters have occurred. This is similar to the changes pane within Tableau Prep and is great for auditing/handing over the process to other people.

Calculations

Now we have filtered our data, it’s time to create some new fields via calculations. Types of calculations can be focused on arithmetic, dates, strings and others. Both Tableau Prep and Einstein Analytics allow you to do a lot with calculations so let’s have a look at how they differ.

Tableau Prep

Just like filtering, Tableau Prep makes it really easy to create a calculation with just a click of a button:

Once pressed, the calculation dialog box opens, where you can type field names and Tableau offers prompts to help you create the calculation. It also has a reference pane on the right side with a list of all of the functions and a brief explanation of each of them.

You can also add some more complex calculations including LODs (Level of Detail) and Rank calculations. Tableau Prep has a visual editor for these analytical calculations, making them super simple to create:

All very easy and user friendly!

Einstein

As before, let’s split Einstein Analytics into its two tools to see what is available in each.

Dataflow

The transformation you use depends on the type of calculation that you need to create.

There are a couple of options:

  • computeExpression – these are derived fields generated using a SAQL expression which is based on one or more fields from the workflow. The computeExpression transformation performs calculations based on other fields in the same row. Here’s a simple example:
  • computeRelative – these are similar to computeExpression transformations, however computeRelative performs calculations based on the same field in other rows. It has a similar setup to the computeExpression but this time you select the Partition and Order by values:
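
The difference between the two is easier to see outside the tool. The following Python sketch is an analogy only: it is not SAQL and not the Dataflow definition, and the field names (`account`, `close_date`, `amount`, `discount`) are invented. The first loop derives a value from other fields in the same row, as computeExpression does; the second partitions and orders the rows and then reads the same field from the previous row, which is the kind of calculation computeRelative is for.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical opportunity rows.
rows = [
    {"account": "A", "close_date": "2020-01-10", "amount": 100.0, "discount": 0.1},
    {"account": "B", "close_date": "2020-01-05", "amount": 50.0, "discount": 0.0},
    {"account": "A", "close_date": "2020-02-01", "amount": 300.0, "discount": 0.2},
]

# Row-level derived field (computeExpression-style): uses only this row.
for row in rows:
    row["net_amount"] = row["amount"] * (1 - row["discount"])

# Relative field (computeRelative-style): partition by account,
# order by close_date, then read the same field from the previous row.
rows.sort(key=itemgetter("account", "close_date"))
for _, partition in groupby(rows, key=itemgetter("account")):
    previous = None
    for row in partition:
        row["previous_amount"] = previous  # None for the first row in a partition
        previous = row["amount"]
```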

Recipe

Now let’s turn our focus to the Recipes and see how these differ from the Dataflows. Within a recipe, instead of having the compute buttons, you need to click on the drop-down on the field you want to create a calculation on, then select ‘Formula’:

This then opens the formula dialog box at the bottom of the screen. From here you can type the formula from scratch, however there aren’t as many prompts so this can be a bit harder if you aren’t sure on the syntax or which fields you want to include.

Making calculations is fairly easy and similar in both Tableau Prep and Einstein, with a few differences in how they work and in their syntax. In my experience the computeExpression is similar to normal calculations in Tableau Prep, whilst computeRelative is similar to LODs.

Personally, I prefer the syntax and usability of Tableau Prep as I believe it gives you a lot of guidance on what calculations are available and provides an easier user experience especially if you are new to the tool.

String Manipulation

The final section that I’ll cover in the Cleansing topic is how each tool deals with manipulating string fields. This is really important when cleaning data, from splitting first and last names to parsing particular words from a longer string, so let’s see how each of the tools deals with this!

Tableau Prep

Within Prep there are lots of different ways to manipulate strings. This can be completed within the calculated field by using the different string operators including splits, contains, and RegEx. But there are also lots of native features to help you along the way, and make the harder challenges that little bit easier.

Some of the native features include:

  • Splits – Automatic split predicts the best way to split the string. This is useful when splitting things like names. Custom splits give you more control by letting you set the separator and how many times you want to split the string.
  • Clean Strings – There are many different options when cleaning strings including removing punctuation, changing the case, and trimming spaces:

There are lots of other options when working with strings, and these are covered within a great blog post on the Preppin Data website.
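
To make the split and clean operations concrete, here is a small Python sketch of the same ideas. It only mirrors the behaviour described above; the sample value and the `clean` helper are hypothetical, and Tableau Prep’s own options may treat edge cases differently.

```python
import string

# Hypothetical messy value in "last, first" form.
raw_name = "  smith,  JOHN  "

# Custom split: choose the separator and how many times to split.
last, first = (part.strip() for part in raw_name.split(",", maxsplit=1))

def clean(value: str) -> str:
    """Remove punctuation, collapse extra spaces and normalise the case."""
    no_punct = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split()).title()

full_name = clean(f"{first} {last}")
```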

Einstein

Again let’s see how Einstein handles string manipulation in both the Dataflow and Recipes.

Dataflow

There aren’t any dedicated string manipulation transformations within a Dataflow, so we are reliant on the features within the SAQL expressions in a computeExpression transformation. If you are unfamiliar with these then I would recommend taking a look at the SAQL Reference guide on the Salesforce website. Here you’ll find a section for string operators; however, the only operator available there is string concatenation, so we are quite limited.

Recipe

Within a Recipe there is a bit more functionality to deal with different strings. In each of the string fields there are various different options including:

  • Trim – this removes leading or trailing whitespace
  • Substring – this allows you to extract a given number of characters from the string
  • Split – here you can choose a separator, from a given list or custom delimiter, and split the field into 2 columns.
  • Upper / Lowercase – Changes the case of the string.
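
As a point of reference, these four options map onto very ordinary string operations. The sketch below shows the equivalent steps in Python on a made-up value; it illustrates the behaviour listed above and is not the Recipe’s own formula syntax.

```python
# Hypothetical raw value.
raw = "  Widget-Blue  "

# Trim: drop leading and trailing whitespace.
trimmed = raw.strip()

# Substring: extract a given number of characters (here the first six).
prefix = trimmed[:6]

# Split: one separator, giving exactly two columns.
left, _, right = trimmed.partition("-")

# Upper / Lowercase: change the case of the string.
upper, lower = trimmed.upper(), trimmed.lower()
```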

As you can see, both tools have the ability to manipulate strings in different ways, with Tableau Prep having a few more native features including RegEx functionality. I have only covered a few basic techniques for each, so I would highly recommend taking a deeper look into both products.

That’s all for the second part of this blog series. Within the next part I will focus on how you can Combine and Output data in both of the tools! If you missed it take a look at part one which is focussed on Inputs & Investigation.

Hopefully you have a bit of an understanding about how Tableau Prep and Einstein compare and how they can come together in the future.

If you’re interested in learning more about Tableau and Salesforce then please get in touch via [email protected]!

In late April of this year (2018) Tableau released a new tool called Tableau Prep.

So what is Tableau Prep?

You know all that very unsexy data prep stuff you have to do before you can create anything interesting?

The stuff you do to get the dataset ready for analysis in Excel, or maybe in SPSS, R, SAS, or <insert one of many other tools here>.

Tableau has now given you another option. Not only will Tableau Prep give you a different, more visual way to prepare data, you’ll also end up with a systematic (and easy to alter) change history covering the full preparation process.

But is it actually worth trying out? Let’s give it a test.

We’ll start with some rough data.

Back when I worked with early childhood data in North Carolina, one of the key pieces of data we would examine was child care quality. When I was on the inside I had a data warehouse login, but the data is also available publicly through a search form on the NC DHHS website.

So let’s use the basic search form to pull the data.

Don’t enter anything, and just go down to the bottom of the page and click submit on the blank form. This will give you data for the whole state with the contact information and license ratings.

If you’re a programmer you might see this as an opportunity to scrape some data. But that’s more work than it’s worth right now. Instead I went ahead and just copy/pasted all the data from the page into an empty Excel spreadsheet.

The last time I worked with this data using this technique, I prepped the file within Excel. A series of sorts, a few formulas, and some row deletes brought me to something I could use in Tableau.

It wasn’t exactly systematic and I didn’t write down all the steps, so it probably wouldn’t be the best approach if this data was super important to your job.

Bringing the Data into Tableau Prep

I started by just opening up Tableau Prep and dropping in the Excel spreadsheet.

Then I went ahead and clicked on Sheet1 in Tableau Prep to start cleaning the data.

Here are some of the basic operations I ended up doing during the data cleaning step. I screen capped them after finishing the cleaning.

Data Interpreter

So I let Tableau’s Data Interpreter give my data a quick clean. This basically distributed the merged cells and put in 3 lines of data for each facility. Not where we will want to end up but something perfectly helpful at this early stage.

Recoding Data

There were a couple of variables I really wanted to recode. Instead of a license type field I wanted the star rating for each facility (1, 2, 3, 4, 5, or other). I also wanted to separate the center and home variables. Both of these tasks were a cinch by just duplicating the license type field and then using group and replace.
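
For comparison, the same recode can be written as two small lookup functions. This is a sketch in Python rather than anything Tableau Prep generates, and the license type labels in `examples` are hypothetical; the real values in the NC data may be worded differently.

```python
import re

STAR_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def star_rating(license_type: str):
    """Return 1-5 for star-rated licenses, or "Other" for anything else."""
    match = re.match(r"(One|Two|Three|Four|Five) Star\b", license_type)
    return STAR_WORDS[match.group(1)] if match else "Other"

def facility_kind(license_type: str) -> str:
    """Group each license type into Center, Home or Other."""
    for kind in ("Center", "Home"):
        if kind in license_type:
            return kind
    return "Other"

# Hypothetical license type values, assumed for illustration.
examples = [
    "Five Star Center License",
    "Three Star Family CC Home License",
    "Temporary License",
]
recoded = [(star_rating(x), facility_kind(x)) for x in examples]
```

Keeping the two recodes as separate functions matches the approach of duplicating the field first and then grouping each copy independently.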

Excluding Data

Tripling the data because of the address field doesn’t really make sense. Really I only need the second row (which includes city and zip code information). In order to do that I started by excluding all the phone number rows by searching for any cell that starts with a “(”. Then I excluded any street address rows, which start with a number.
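
The two exclusion rules are simple enough to express as a predicate. Here is a minimal Python sketch, assuming each facility contributes three contact cells in the order street address, city and zip, phone number; the sample values are invented.

```python
# Hypothetical contact cells: three rows per facility.
contact_rows = [
    "123 Main St", "Raleigh, NC 27601", "(919) 555-0100",
    "45 Oak Ave", "Durham, NC 27701-1234", "(919) 555-0101",
]

def is_city_zip_row(cell: str) -> bool:
    """Keep a cell only if it is neither a phone number nor a street address."""
    text = cell.strip()
    if text.startswith("("):   # phone number rows start with "("
        return False
    if text[:1].isdigit():     # street address rows start with a number
        return False
    return bool(text)          # also drop empty cells

city_zip_rows = [cell for cell in contact_rows if is_city_zip_row(cell)]
```

A rule like this would wrongly drop a city and zip row that happens to begin with a digit, so it is worth checking the row count afterwards.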

Next I used automatic split to break up city and zip into their own variables.
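
Automatic split chooses the separator for you. Written by hand, one way to get the same result is to split at the last space, as in this Python sketch; the `City, ST 12345` layout is an assumption about how the cell looks, not something confirmed by the source data.

```python
def split_city_zip(cell: str):
    """Split text such as 'Chapel Hill, NC 27514' at the last space."""
    city, _, zip_code = cell.strip().rpartition(" ")
    return city, zip_code

# Hypothetical cell value with a two-word city name.
city, zip_code = split_city_zip("Chapel Hill, NC 27514")
```

Splitting from the right keeps multi-word city names intact, which a split on the first space would break.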

Outputting the Data

After I was done with all the cleaning steps, I set up the Prep file to export the data. This gave me a Tableau Data Extract to import right into Tableau Desktop.

Playing with the Data

Now let’s give the data a test.

I went ahead and created a nice little map with the average star rating for facilities by zip code. I set the color cutoff at 4 stars (as this is what Smart Start would usually consider the minimum for good quality childcare). This gives a nice map that’s pretty easy to explore.
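
The aggregation behind that map is an average per zip code compared against a cutoff. The Python sketch below shows the arithmetic on invented rows; leaving “Other” ratings out of the average is my assumption here, since the handling of unrated facilities is not spelled out above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (zip code, star rating) pairs.
facilities = [
    ("27601", 5), ("27601", 3),
    ("27701", 4), ("27701", "Other"),
    ("27514", 2),
]

# Collect numeric ratings per zip code (assumption: skip "Other").
ratings_by_zip = defaultdict(list)
for zip_code, stars in facilities:
    if isinstance(stars, int):
        ratings_by_zip[zip_code].append(stars)

CUTOFF = 4  # colour cutoff used on the map

# For each zip code: (average rating, whether it meets the cutoff).
summary = {
    zip_code: (mean(stars), mean(stars) >= CUTOFF)
    for zip_code, stars in ratings_by_zip.items()
}
```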

It also clued me in to an issue with the zip level data (165 rows have the extra 4 digits, which were not automatically picked up by Tableau Desktop). If this were a real project I would go back to Prep and change every zip code to 5 digits.
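
The fix itself is small. As a sketch of the rule in Python (in Prep it would be a calculation or split on the zip field), the function below keeps the leading five digits of a ZIP or ZIP+4 value and passes anything unexpected through unchanged; the name `zip5` is just for illustration.

```python
import re

def zip5(zip_code: str) -> str:
    """Return the five-digit part of a ZIP or ZIP+4 value, else the input as-is."""
    match = re.fullmatch(r"\s*(\d{5})(?:-?\d{4})?\s*", zip_code)
    return match.group(1) if match else zip_code
```

Leaving unrecognised values untouched, rather than truncating blindly, makes bad rows easy to spot in the profile afterwards.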

Using a simple dot plot lets me sort the data by average star rating to get a different view of the data.

Then splitting the data by home licenses and center licenses shows pretty quickly one of the biggest differentiators in level of care. Home based child care, no matter the zip code, is by and large going to show as lower quality.

This makes sense but also shows why counties without a lot of centers (and more home care settings) would show indicators of lower quality care.

The Verdict

I have to say, I was really impressed by Tableau Prep.

I spent a good half of my career in data prep and even though I know how to code and write syntax, the workflow in Tableau Prep was so smooth and useful that I could see myself switching to Prep for most of my work.

The natural audit trail offers a clear advantage over cleaning in Excel. All in all, it’s easy to use, visual, and makes for a replicable data preparation process.

Want to see my finished product in action?
