Irwin Greenberg (http://clicks.robertgenn.com/irwin-greenberg.php) apparently said:
“Find the artists who are on your wavelength and continuously increase that list. Learn from the masters, learn from artists alive today whether it's someone you may never meet in person or it’s a close artist friend.
Visit museums and galleries, buy books and magazines, take classes.
Embrace the life of a student, no matter your age or ability, and you will become a better artist”.
[Courtesy: Ray Frisken: artnewsflash@gmail.com]
What's it got to do with us? Substitute 'analyst' for 'artist' and substitute 'Go to conferences and workshops' for 'Visit museums and galleries' :-)
Thursday, 28 February 2013
Tuesday, 26 February 2013
Does Excel™ really excel ?
A good friend of mine, when asked exactly what it is I do
for a living, sometimes replies “he does sums”.
There is a fair amount of truth in this; indeed much of the quantitative analysis that
anyone does can involve doing lots of ‘sums’.
And to help us, we use various software tools such as SPSS,
Q, R, Minitab, Stata, Excel, etc.
But occasionally we can notice something slightly weird,
with an unexpected result, or a computation that doesn’t seem to lead to where
it should.
Very often, this is simply a data cleaning issue (or,
rather, a lack of data cleaning issue).
For the heavy quant people, there are well-known mantras such as “Step
1: clean your data; Step 2: clean your data again; Step 3: repeat Steps 1 and 2”.
Or: “95% of advanced analysis is getting the datafile into
shape”.
Or even, as lamented by Sherlock Holmes in the Conan Doyle
story “The Adventure of the Copper Beeches”:
‘“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.”’
Another good quant analysis rubric is: “If it looks unusual, it’s probably wrong.”
In the above vein, I recently came across a rather
disturbing article (a), published only a couple of years ago, that
deals with a claimed plethora of computational errors that are literally built into Excel™.
After conducting a large number of tests (admittedly, some
of them using datasets that might be described as ‘slightly esoteric’), the
authors nonetheless conclude that “…it is not safe to assume that Microsoft
Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical
analyses should use some other package.”
I discussed this with a senior statistical consultant, who
replied:
“This paper criticising Excel freaked me out when I first read it ... However, over time, I have become less concerned. I looked into some of the tests … a few general conclusions:
a) It is disappointing that Microsoft doesn’t fix these things.
b) The errors are at the margins. That is, we are talking about inaccuracies that tend to occur when the techniques are unreliable anyway (e.g., severe multicollinearity).
c) There is more than a degree of unfairness in the critique. For example, in the case of Solver, I have found it repeatedly to do a better job than the various optimisers in R.”
So, given all the above, and all other things being equal,
it is probably best to take the bad news concerning Excel with a grain of
statistical salt. Nonetheless, it may
sometimes be wise to use two alternative computational means when
working with something really critical, just to be sure.
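To make the "two alternative computational means" idea concrete, here is a minimal Python sketch. It is only an illustration: the four data values are made up, the function names are my own, and it does not claim to reproduce any specific error reported in the article. It computes a sample variance three ways (a one-pass "calculator" formula, a two-pass formula, and Python's standard `statistics` module) and checks whether they agree.

```python
import math
import statistics


def naive_variance(xs):
    """One-pass 'calculator' formula: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Numerically fragile when the values are large relative to their spread.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def agree(a, b, rel_tol=1e-9):
    """True if two results match to within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Hypothetical data: large values with a small spread (true variance is 30).
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

v_naive = naive_variance(data)
v_two_pass = two_pass_variance(data)
v_library = statistics.variance(data)

print("one-pass formula :", v_naive)
print("two-pass formula :", v_two_pass)
print("statistics module:", v_library)
print("two-pass vs library agree:", agree(v_two_pass, v_library))
print("one-pass vs library agree:", agree(v_naive, v_library))
```

When two independent routes agree, you can be a little more comfortable; when they do not, as with the one-pass formula here, that is the signal to dig further before trusting either number.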
Tuesday, 19 February 2013
What data can't (and can) do ...
I’m indebted to Julie Houston, from Nitty Gritty Research http://www.nittygritty.net.au/ for
this very recent item (at least, very recent at the time of writing this post):
The subject of the article is “the strengths and limitations
of data analysis”.
Bit of a dry topic, you might think?
But the really interesting bit (are?) is the 250+ detailed comments
posted in response to the article.
Worth a slow read over a glass of red, I think :-)
Thursday, 14 February 2013
Amazing statistical fact #2
In any gathering of people, how many must there be to be 50% sure that at least two people will share the same birthday?
Answer: At least 23 people.
How many people at that gathering must there be to be virtually certain (about 99% sure) that two of them will share the same birthday?
Answer: At least 57 people.
As people enter a room one at a time, which one is most likely to be the first to have the same birthday as someone already in the room?
Answer: The 20th person to arrive.
What is the average number of people (selected at random) required to find two with the same birthday?
Answer: On average, about 25 people are required.
[Example: There have been 27
Prime Ministers of Australia. Paul
Keating, the 24th Prime Minister, and Edmund Barton, the first Prime Minister,
share the same birthday, 18 January.]
Source: All this and
more can be found at: http://en.wikipedia.org/wiki/Birthday_problem
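For anyone who would like to check these answers rather than take them on trust, here is a short Python sketch. It assumes 365 equally likely birthdays (no leap years, no seasonal patterns), and the function names are my own.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group(target, days=365):
    """Smallest group size whose chance of a shared birthday reaches target."""
    n = 1
    while p_shared(n, days) < target:
        n += 1
    return n


def p_first_match_at(n, days=365):
    """Probability that the n-th arrival is the first to match someone present."""
    # The first n-1 people are all distinct, and person n matches one of them.
    return (1.0 - p_shared(n - 1, days)) * (n - 1) / days


# Group size needed for a 50% chance, and for a 99% chance.
n_half = smallest_group(0.50)
n_near_certain = smallest_group(0.99)

# The arrival most likely to produce the first shared birthday.
most_likely_arrival = max(range(2, 367), key=p_first_match_at)

# Expected number of people needed: sum over n of P(first n are all distinct).
expected_people = sum(1.0 - p_shared(n) for n in range(0, 366))

print(n_half, n_near_certain, most_likely_arrival, round(expected_people, 2))
```

The four printed values correspond, in order, to the four questions above.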
Monday, 11 February 2013
A quick word ...
A few years ago, Wordles™ were a popular means of presenting lots of
open-ended text. Here’s an example
developed using the text on my own website.
I still think Wordles can be pretty cool.
But now, along has come the new, even cooler, version known
as Wordyup™.
Developed by Garreth
Chandler and his team at Twist of Lime www.twistoflime.com.au,
Wordyup “ … turns the usual 1000's of open ended responses on a survey into
real insights with dynamic key word analysis quickly, easily and what's more ... it's
fun!”
And more to the point, as Garreth says, he “… can't stop
playing with it...”
Have a look for yourselves: https://www.wordyup.com/ and let me know
what you think. Better still, let
Garreth know what you think.
Thursday, 7 February 2013
Amazing statistical fact ...
2013 is the International Year of Statistics.
So, to recognise that, I thought you might be interested in the following:
Suppose there is a medical test that is designed to detect
whether you have an illness/infection/whatever.
Suppose the chance of anyone actually having that illness/infection/whatever
is 5%.
Suppose that if you do have that illness, then the chance of
that test detecting that you have it is 95%.
That sounds pretty good, doesn’t it?
Suppose the chance that the same test will indicate you have
that illness, if you actually don’t, is just 5%.
That sounds pretty good too.
Fairly straightforward statistical analysis will show,
irrefutably, that if that apparently reliable test indicates you have that
illness, the chance that you actually do have it is only 50%.
Scary. But it’s true.
Suppose the chance of anyone having that illness is actually
much lower, say 1%.
Then if the test indicates you have that illness, the chance
that you actually do have it is only 16% !
I learned about the above from Kerry Mengersen, whose course
“Bayes for Beginners” I undertook back in 2006:
http://www.statsoc.org.au/CPD16
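The "fairly straightforward statistical analysis" is Bayes' theorem. As a minimal sketch (the function and parameter names are my own), the two results above can be reproduced like this:

```python
def chance_ill_given_positive(prevalence, detection_rate, false_positive_rate):
    """Bayes' theorem: P(ill | test positive)."""
    # Positive results from people who really are ill.
    true_positives = prevalence * detection_rate
    # Positive results from people who are not ill.
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# 5% of people have the illness; the test detects 95% of real cases
# and wrongly flags 5% of healthy people.
common_case = chance_ill_given_positive(0.05, 0.95, 0.05)

# Same test, but only 1% of people have the illness.
rare_case = chance_ill_given_positive(0.01, 0.95, 0.05)

print(round(common_case, 2))  # 0.5
print(round(rare_case, 2))    # 0.16
```

The rarer the illness, the more the false positives from the large healthy group swamp the true positives from the small ill group.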
Monday, 4 February 2013
Statistical goldmine !
Some years ago, it was possible to download (for free) a comprehensive statistical text from www.statsoft.com. It was great; it just sat on my desktop until I needed it.
These days, you can get the same thing as an online resource, again for free: http://www.statsoft.com/textbook/
I’ve just been made aware of something that is arguably as good, if not better: http://surveyanalysis.org
Whilst still under development, it already contains a
massive amount of information, of interest to anyone who works in the advanced
analytics area.
Sunday, 3 February 2013
Round the twist
Weirdly, I have found this post http://alandgraf.blogspot.com.au/2012/06/rounding-in-r.html to be fascinating. Maybe I need to get out more?
That blog post deals with how to round up (or down) when you have a number that isn’t a whole number.
For example, if you have an observation or data point that is equal to 4.5 and you want to use only whole numbers in your analysis, should you round down to 4 or up to 5?
It appears that there is no hard and fast rule for doing this; some argue for down and some for up. Similarly, some software rounds down in this instance and some up.
There is actually an international standard that applies: ISO/IEC/IEEE 60559:2011, which is identical in content to the 2008 revision of the IEEE Standard for Floating-Point Arithmetic (IEEE 754, first established in 1985). [ISO/IEC/IEEE 60559:2011 covers a zillion other aspects of numerical computing and took seven years to produce.]
In relation to the above rounding issue, ISO/IEC/IEEE 60559:2011 apparently says, in effect:
“… round numbers ending in "1, 2, 3, and 4" down, and numbers that end in "6, 7, 8, 9" up. Then, specifically regarding "5", if the preceding digit is odd, round up and if the preceding digit is even, round down.”
The advantage of this is that 50% of the numbers will be rounded up, and 50% rounded down, instead of rounding up 5/9ths of the time and so introducing a bias.
As one statistician I asked about this (and a much better one than I am) confirmed: “…the clever thing about rounding to evens is that the average is not biased when this is done.”
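Here is a small Python sketch of that point, using the standard `decimal` module so that the halves are represented exactly. The list of values is made up for illustration and the helper name is my own. It rounds the same set of ".5" values under both rules and compares the averages.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_whole(value, mode):
    """Round a decimal string to a whole number using the given tie rule."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=mode))


# Made-up values that all sit exactly halfway between two whole numbers.
ties = ["0.5", "1.5", "2.5", "3.5", "4.5", "5.5"]

half_even = [round_to_whole(v, ROUND_HALF_EVEN) for v in ties]
half_up = [round_to_whole(v, ROUND_HALF_UP) for v in ties]

true_mean = float(sum(Decimal(v) for v in ties) / len(ties))
mean_half_even = sum(half_even) / len(half_even)
mean_half_up = sum(half_up) / len(half_up)

print("round half to even:", half_even, "mean", mean_half_even)
print("round half up     :", half_up, "mean", mean_half_up)
print("true mean         :", true_mean)

# Python's built-in round() also rounds halves to the nearest even number.
print(round(4.5), round(5.5))  # 4 6
```

Rounding halves to even leaves the average at 3.0, the same as the unrounded data, whereas always rounding halves up pushes it to 3.5.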
Friday, 1 February 2013
I don't want your help, I just need your advice ...
Some of you
will know of the hilarious late 80’s spoken piece by Fred Dagg (aka John Clarke),
in which he ‘translated’ real-estate agent speak. Here’s a short extract:
‘Owner transferred - reluctantly
instructs us to sell’ means that the house is for sale.
‘Genuine reason for selling’ means that the house is for sale.
‘Rarely can we offer …’ means that the house is for sale.
‘Superbly presented delightful
charmer’ doesn’t mean anything really, but it’s probably still for sale.
‘Most attractive immaculate
home of character in prime dress-circle position’ means that the thing that’s
for sale is a house.
There is lots more from Dagg/Clarke, of the same ilk.
My experience is that there are almost direct parallels in our own
businesses, e.g.
‘I just need to pick your
brain’ means that I want you to give/tell me something for free, on the basis of your extensive experience
and knowledge that you have spent many, many years acquiring.
‘I have an interesting challenge for you’ means that I want you to
give/tell me something for free etc.
‘I wonder if you could come in for a meeting and help us sort out
what we need to do’ means that I want you to take 2 or 3 hours out of your day and
give/tell me something for free etc.
‘Quick question’ means that I am going to ask you something complicated
that you will need to think about for a while, and I want you to do it for
free.
‘Exciting new project’ means that I need your input for our proposal, and I
don’t want to pay for it.
And probably the best of
all (and I am not making this up) ….
‘I don't want your help, I just
need your advice’ !!
Now, before you write me off as just another grumpy researcher (which
isn’t actually too far from the truth), please be assured that I almost invariably
do respond positively to requests like the above. I estimate that it takes me up to around an
hour per week. That’s around 50 hours
per year, or a week-and-a-half per year that I could otherwise spend on project
work/going to the beach/walking in the park/<insert your own words here>.
And I am the first to admit that I can myself be guilty of exactly
the same sin, that is, phoning up a contact and asking for some free
advice/input … but maybe that’s the payback: I help others, and someone else
helps me?
It’s a fine line … and I know this general area has also been a
hot topic amongst the IRG’ers (AMSRS’s Independent Research Group) in the
recent past, admittedly more in the context of “We’ve commissioned you to do X,
and we now want to add Y to the project, but we don’t want to pay any more,
because there will be more work for you down the track”.
So where does one draw the line?
Indeed, should one draw the line?
Or is it simply a case of ‘you
scratch my back, and I’ll scratch yours’?