Market Research Survey Data Analysis in Tableau
We recently had a very interesting project where we had to process a large amount of data from a worldwide market research survey for a global brand. The amount of data was too large for the usual office tools like Excel and with around 1.200 columns of data was a great opportunity to test out Alteryx as the data preparation tool for Tableau.
I won’t go into detail on how to process market research data with Alteryx and present it in Tableau as there are numerous articles on this subject on the internet including the excellent work by Steve Wexler which is sure to get anyone up and running visualizing such data. Rather I wanted to give a general view on how I approached some specific problems that this survey threw up in case anyone else finds themselves up against similar challenges.
As I mentioned the data consisted of around 1.200 columns of data. As we know when it comes to processing and presenting data the general rule is less columns and more rows. Additionally if you have a lot of columns the chances are that you don’t have row level data so you will need to do some unpivoting to get the data into rows. I could see that we had demographic data in multiple columns that we could compress and then the actual questions which would need to be unpivoted to rows. The questions themselves were of the usual types, namely Likert, Yes/No, Multichoice etc.
So getting to work on reducing the column count the first thing I noticed was that although we had a single Country column we also had Region columns for each country giving a total of 201 columns containing the Region for each Country like this:This wasn’t really necessary as just as useful is a single Region column in conjunction with the Country column. This was easily done in Alteryx by simply adding the column values together into a new Region column using the Formula tool. Our 201 columns become 1 single column which in conjunction with Country gives use exactly the same information:This was the same with income, another 201 columns containing the income for each country. I could compress the income data into a single column and use a regular expression if necessary to extract the currency information to an additional column although this wasn’t needed. So again 2 minutes in Alteryx reduces 201 columns to:This was possible for several other demographic Dimensions so I was able to reduce the columncount to a still weighty 820 but at least now it was split into demographic or static data on one side and all the question data that needed to be unpivoted on the other. This is also easy in Alteryx using the Transpose tool. I simply select the columns I want to keep and the columns I want to unpivot:So after my unpivoting or transposition operation I had gone from having 1.200 columns of data to 12 columns! This does of course increase the number or rows…I now had 130.000.000 rows of data! It is possible to split the questions or question types into separate columns which would reduce the row count (see the Wexler website mentioned above) but I was interested in seeing how well it would perform with the least modifications and added additional columns where necessary.
Actually with 130 Million rows Tableau performed brilliantly. The report was fully usable with only around a 3 or 4 second query time for new queries. No one in my tests with this first extract complained about the performance and were astounded when the number of rows was reviled to be 130 Million!
Map and Enhance
Now I had my base data I could go about cleaning and enriching it. Filtering out the NULL results reduced the number of rows to around 30 Million. The next thing to do was to add a Relevant Company column. The respondants were asked the same question for a number of companies, for example, ‘Do you think BMW make quality cars’, ‘Do you think Fiat make quality cars’ and so on. In the results the question would be Q1 for example and then a suffix denoted the company. So the example above would be Q1_001 and Q1_002 for the results for BMW and Fiat respectively. I could simply join an Excel lookup sheet to parse out the companies into a new column and I was good to go.
As I said I didn’t want to write a comprehensive how to, just an overview of what’s possible and how easy it is once you get your head round it. I do however want to give a bit more in depth information on how we tackled a specific question type as it does illustrate the power and ease of Alteryx and Tableau.
These so called semantic differential type questions present two opposing adjectives and ask the respondent to give their response, for example:
This presents a challenge as the results are presented as such:
This means that you can’t get a proportional representation of the results where the positive and negatives are summed. To make matters more complex, in the survey it wasn’t always positive on the left and negative on the right, they were mixed.
I had to split this in to 2 parts in order to get a satisfactory result. Firstly I needed to generate an aggregatable measure from the response, i.e. a number and then I had to figure out which adjective had been chosen. In the example above the ‘Strongly agree with negative statement’ should give the result
Adjective = boring Value = -2
I evaluated the response in a Formula tool in Alteryx like so:
IF [response]== „Strongly agree with negative statement“ THEN „-3“
ELSEIF [response] == „Somewhat agree with negative statement“ THEN „-2“
ELSEIF [response] == „Slightly agree with negative statement“ THEN „-1“
ELSEIF [response] == „Slightly agree with positive statement“ THEN „1“
ELSEIF [response] == „Somewhat agree with positive statement“ THEN „2“
ELSEIF [response] == „Strongly agree with positive statement“ THEN „3“
This gave me my values. I then joined an Excel lookup table to give me the correct adjective based on the question and the calculated value from above:
So my results for these particular questions now looked like this:
There are duplicated adjectives because in this particular survey two brands were presented to be evaluated.
Presentation in Tableau
In Tableau I can use these values directly to give a balanced evaluation of the adjective in question. All the plus and minus values will be summed by Tableau and then we can take the absolute value of the variable to get the total for each adjective with the following calculated field:
This can be represented in a variety of interesting graphical ways:
You can see an anonymized version of the finished workbook Tableau Public.
Alteryx makes the preparation of this type of market research and survey data very straightforward and intuitive. It took a while to figure out exactly what I needed to get from the survey data but once I got it figured out it was pretty easy to not only get the data I needed but also to map it to other data and enrich it.