Wednesday, 8 April 2015

Knowing your data …

Lots of good things come from New Zealand … Bungy jumping, Easiyo® make-your-own yoghurt, the All Blacks®, Bega® and Mainland® cheese, good Sav-blanc wines, Crowded House, Russell Crowe (well, maybe not the last one).

But what also comes from New Zealand, and has done so for a very long time, are creative and competitive ideas pursued by skilled and knowledgeable market researchers.

The latest I have come across is a rather cool online tool developed by Irene Rix and her business partner Josh Bondy, at their aptly named ‘CodeKiwi’ business, albeit based in Melbourne.

Some of you will know that I can go on a bit about the quality of data files that I am given to analyse.  Indeed, some of the leading interviewing packages still don’t seem to have ‘got’ the idea that their great online interview scripting capability can still result in data files that are painful to analyse. 

In most cases, asking your data provider to give you output in SPSS .sav format will minimise my pain (and that of any analyst). But not always. In any case, I still often receive data files in .csv or .xlsx format, and that can raise a whole lot of issues that normally just don’t occur with SPSS data, e.g. (to name just a few):

  • No labelling information for code frames or questions
  • Answers to multi-response questions recorded in just one field, with responses separated by commas
  • ‘Pick any’ responses recorded as 1, 2, 3, etc instead of 0/1.
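To make the second of those issues concrete, here is a minimal Python sketch of one way to unpack a comma-separated multi-response field into one 0/1 column per code. The function name `expand_multi_response`, the column name `q5` and the sample data are all made up for illustration; this is just one possible approach using the standard library.

```python
import csv
import io


def expand_multi_response(rows, field, sep=","):
    """Split a delimited multi-response field into one 0/1 column per code.

    `rows` is a list of dicts (e.g. from csv.DictReader); `field` is the
    hypothetical name of the column holding values such as "1,3".
    """
    # Collect every distinct code that appears anywhere in the field.
    codes = sorted(
        {c.strip() for row in rows for c in row[field].split(sep) if c.strip()}
    )
    expanded = []
    for row in rows:
        chosen = {c.strip() for c in row[field].split(sep) if c.strip()}
        # Keep all other columns and drop the original combined field.
        new_row = {k: v for k, v in row.items() if k != field}
        for code in codes:
            new_row[f"{field}_{code}"] = 1 if code in chosen else 0
        expanded.append(new_row)
    return expanded


# Illustrative data only: three respondents, one multi-response question.
sample = 'id,q5\n1,"1,3"\n2,"2"\n3,""\n'
rows = list(csv.DictReader(io.StringIO(sample)))
result = expand_multi_response(rows, "q5")
```

Note that a blank field comes out as all zeros here, so this sketch cannot tell "picked nothing" apart from "was never asked"; in a real file that distinction needs to be handled separately.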

I could go on.  

In most cases, I therefore prefer to refer people to the excellent guide to SPSS data file preparation, available on the Survey Analysis website.

The CodeKiwi team’s approach, in contrast, stems from their recognition that in the wider data and analytics field, the .csv (and .txt) data file format is pretty much pervasive. Their online tool thus allows anyone (not just market researchers and analysts) to upload these types of data files and have comprehensive, automatic checks and reports run on the data structure, format, missing value patterns, distributional forms and characteristics, values requiring recoding, string lengths and a host of other information.
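For readers who want a feel for what checks of this kind involve, here is a rough Python sketch of a per-column profile of a .csv file. It is not CodeKiwi's implementation, which I have not seen; the function names `profile_columns` and `_is_number`, the choice of checks and the sample data are my own assumptions, and the sketch covers only a few of the items listed above.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(rows):
    """Summarise each column: missing count, distinct values,
    longest string, and whether every non-blank value is numeric."""
    report = {}
    if not rows:
        return report
    for name in rows[0].keys():
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v]  # blank cells count as missing
        report[name] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "max_length": max((len(v) for v in present), default=0),
            "numeric": bool(present) and all(_is_number(v) for v in present),
        }
    return report


# Illustrative data only.
sample = "id,age,city\n1,34,Auckland\n2,,Melbourne\n3,41,\n"
rows = list(csv.DictReader(io.StringIO(sample)))
report = profile_columns(rows)
```

Even a summary this simple will flag a column with unexpected blanks or a supposedly numeric field that contains text, which is usually where the pain starts.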

In addition, the tool provides a unique ‘Data Health Index’, which is based on the pattern of missing values, the mix of variable types, and their skew, variance and concentration of values.

In their own words:

“Knowing your data before you get started can save hours of pain.  Learn what's lurking in that shiny new datafile, and save your time for the clever stuff...”

The “Know thy Data” initiative is a first-rate effort to come to grips with a key issue that is faced by all data analysts, whether in market research or in other fields of data endeavour.

Why not have a look for yourself?  

