Lots of good things come from New Zealand: bungy jumping, Easiyo®
make-your-own yoghurt, the All Blacks®, Mainland® cheese, good Sav Blanc wines, Crowded
House, Russell Crowe (well, maybe not the last one).
But New Zealand has also, for a very long time, been a source of
creative and competitive ideas pursued by skilled and
knowledgeable market researchers.
The latest I have come across is a rather cool online tool
developed by Irene Rix and her business partner Josh Bondy at their aptly
named ‘CodeKiwi’ business (albeit based in Melbourne).
Some of you will know that I can go on a bit about the
quality of the data files I am given to analyse. Indeed, some of the leading interviewing
packages still don’t seem to have ‘got’ the idea that, for all their great online
interview scripting capability, they can still produce data files that are painful
to analyse.
Asking your data provider for output in SPSS .sav format will
usually minimise my pain (and that of any analyst), though not always. Even so, I still often receive
data files in .csv or .xlsx format, and these can raise a whole lot of issues
that just don’t occur with SPSS data. To name just a few:
- No labelling information for code frames or questions
- Answers to multi-response questions recorded in just one field, with responses separated by commas
- ‘Pick any’ responses recorded as 1, 2, 3, etc. instead of 0/1
I could go on.
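To make the multi-response issue concrete, here is a minimal Python sketch (not part of any package mentioned in this post) that expands a comma-separated multi-response field into one 0/1 column per code. The helper `split_multi_response` and the question name `Q5` are hypothetical. It assumes codes are separated by commas and treats a blank field as "nothing picked", which may be wrong if a blank actually means "not asked".

```python
import csv
import io


def split_multi_response(rows, field, sep=","):
    """Expand a delimited multi-response field into 0/1 indicator columns.

    Each distinct code found in `field` becomes a new column named
    f"{field}_{code}": 1 if the respondent picked that code, else 0.
    A blank field is treated as "nothing picked" (an assumption).
    """

    def parse(value):
        # Split on the separator and drop empty pieces and stray spaces.
        return {c.strip() for c in (value or "").split(sep) if c.strip()}

    # First pass: collect every code that appears anywhere in the file.
    codes = set()
    for row in rows:
        codes |= parse(row[field])

    # Numeric codes sort numerically (2 before 10); any text codes go last.
    ordered = sorted(codes, key=lambda c: (0, int(c), "") if c.isdigit() else (1, 0, c))

    # Second pass: copy the other fields and add one 0/1 column per code.
    expanded = []
    for row in rows:
        picked = parse(row[field])
        new_row = {k: v for k, v in row.items() if k != field}
        for code in ordered:
            new_row[f"{field}_{code}"] = 1 if code in picked else 0
        expanded.append(new_row)
    return expanded, ordered


# Hypothetical example: Q5 holds comma-separated codes in a single field.
raw = 'id,Q5\n1,"1,3"\n2,2\n3,\n'
rows = list(csv.DictReader(io.StringIO(raw)))
expanded, codes = split_multi_response(rows, "Q5")
for r in expanded:
    print(r)
```

The two-pass design means every respondent gets the same set of columns, even for codes that only a handful of people chose.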
For this reason, I usually refer people to the
excellent guide to SPSS data file preparation on the Survey Analysis
website: http://surveyanalysis.org/wiki/SPSS_Data_File_Specifications.
The CodeKiwi team’s approach, in contrast, stems from their
recognition that across the wider data and analytics field, the .csv (and .txt)
data file format is pretty much universal.
Their online tool therefore lets anyone (not just market researchers
and analysts) upload these types of data files and automatically run comprehensive
checks and reports on the data structure and format, missing-value patterns,
distributional shape and characteristics, values requiring recoding, string
lengths and a host of other properties.
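For a rough idea of what the simplest of these checks involve, the sketch below profiles each column of a CSV file: missing values, a crude numeric/text type guess, distinct values, maximum string length and, for numeric columns, the mean and standard deviation. It is a hypothetical illustration using only Python's standard library, not CodeKiwi's code (whose internals are not described here), and the `profile_csv` function and sample data are made up for the example.

```python
import csv
import io
import statistics


def profile_csv(text):
    """Return a simple per-column profile of CSV text.

    A rough illustration only: it reports missing counts, a crude
    numeric/text type guess, distinct values and maximum string length.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Short rows give None for absent fields; treat that as blank.
            columns[name].append((row[name] or "").strip())

    profile = {}
    for name, values in columns.items():
        present = [v for v in values if v != ""]
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None  # at least one non-numeric value
        if not present:
            col_type = "empty"
        elif numbers is not None:
            col_type = "numeric"
        else:
            col_type = "text"
        info = {
            "n": len(values),
            "missing": len(values) - len(present),
            "type": col_type,
            "distinct": len(set(present)),
            "max_len": max((len(v) for v in present), default=0),
        }
        if col_type == "numeric" and len(numbers) > 1:
            info["mean"] = statistics.mean(numbers)
            info["stdev"] = statistics.stdev(numbers)
        profile[name] = info
    return profile


# Hypothetical sample file with a few blanks.
raw = "id,age,city\n1,34,Auckland\n2,,Wellington\n3,51,\n4,29,Melbourne\n"
report = profile_csv(raw)
for column, info in report.items():
    print(column, info)
```

A real tool would go much further (date detection, outliers, suspicious codes such as 99 for "don't know"), but even this much flags blank cells and mistyped columns before analysis starts.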
The tool also provides a unique ‘Data Health Index’, based
on the pattern of missing values, the mix of variable types, and the variables’
skew, variance and concentration of values.
In their own words:
“Knowing your data
before you get started can save hours of pain.
Learn what's lurking in that shiny new datafile, and save your time for the clever stuff...”
Their “Know thy Data” initiative is a first-rate effort to
come to grips with a key issue faced by all data analysts,
whether in market research or in other fields of data endeavour.
Why not have a look for yourself? https://knowthydata.io/