<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.9.2 (http://www.squarespace.com/) on Thu, 11 Mar 2010 10:59:51 GMT--><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><title>data vs dogma</title><subtitle>data vs. dogma</subtitle><id>http://www.symbolix.com.au/blog/</id><link rel="alternate" type="application/xhtml+xml" href="http://www.symbolix.com.au/blog/"/><link rel="self" type="application/atom+xml" href="http://www.symbolix.com.au/blog/atom.xml"/><updated>2010-03-11T01:06:29Z</updated><generator uri="http://www.squarespace.com/" version="Squarespace Site Server v5.9.2 (http://www.squarespace.com/)">Squarespace</generator><entry><title>It might just be worse than that</title><category term="statistics"/><id>http://www.symbolix.com.au/blog/2010/3/10/it-might-just-be-worse-than-that.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2010/3/10/it-might-just-be-worse-than-that.html"/><author><name>Stu</name></author><published>2010-03-09T23:39:07Z</published><updated>2010-03-09T23:39:07Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>I rarely get to blog here, so I am taking the opportunity to point out something above the general decline in mathematics students - the losing of knowledge.</p>
<p>Take a look at these feeds from the Australian:</p>
<p><a class="offsite-link-inline" href="http://www.theaustralian.com.au/news/nation/mathematics-students-in-serious-decline/story-e6frg6nf-1225838901032" target="_blank">Mathematics Students in Serious Decline</a></p>
<p><a class="offsite-link-inline" href="http://www.theaustralian.com.au/higher-education/equation-for-maths-warns-of-disaster/story-e6frgcjx-1225838873328" target="_blank">Equation for maths warns of disaster</a></p>
<p>I was teaching at University when we as a society ripped these students off, replacing core problem solving with vapid histories and philosophies. Such a change had a lot to do with the academia of mathematics becoming disconnected from the application of its theory. And this is where the problem lies.</p>
<p>I&#8217;ll give you an example. I sought to take a Graduate Certificate in Applied Statistics recently. Not because I wanted another piece of paper, but because I am looking for some more knowledge to protect myself and my clients from stupid misapplications of theory. I couldn&#8217;t find one that was more than a simple training course in certain statistical software packages. Needless to say, I haven&#8217;t enrolled. I need, as do all analysts, more than the mindless application of a software package.</p>
<p>The need arose when the tried and proven ANOVA test failed on me. ANOVA is the draught horse of multiple testing applications. I had a test returning p values that experience tells me were way too small. The underlying data was violating a number of assumptions of the ANOVA test, and I couldn&#8217;t find a transform that would fix it. I still haven&#8217;t found a transform, and have had to move on without that test, making my story that much longer as I now have to justify the use of &#8220;unorthodox&#8221; testing procedures.</p>
<p>This is not the first time that I&#8217;ve come across underlying shortcomings in standard procedures. My argument comes from the fact that I often have to go right back to 1930s papers by deities like Fisher, or early works by Tukey, to find underlying mechanics and discussions. Far too many papers simply state the software package they used, and the outcomes, never addressing whether the package should have been applied in the first place.</p>
<p>I wonder how often it is that an assumption has been made regarding validity, and never checked.</p>
]]></content></entry><entry><title>Join us for a cuppa at the Clean Energy Council Conference</title><category term="about us"/><category term="natural resources"/><id>http://www.symbolix.com.au/blog/2010/2/23/join-us-for-a-cuppa-at-the-clean-energy-council-conference.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2010/2/23/join-us-for-a-cuppa-at-the-clean-energy-council-conference.html"/><author><name>admin</name></author><published>2010-02-23T05:58:17Z</published><updated>2010-02-23T05:58:17Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>Symbolix is proud to be a <a class="offsite-link-inline" href="http://www.bcse.org.au/cec/mediaevents/cec_conference_2010/Sponsor-exhibit/Sponsor.html" target="_blank">sponsor of the upcoming Clean Energy Council National Conference</a>.&nbsp; It&#8217;s being held in Adelaide from the 3rd-5th May 2010.&nbsp;</p>
<p>The conference is attended by over 700 delegates from all areas of the clean energy sector, and promises some interesting discussion and insight.</p>
<p>We&#8217;ll be there throughout, and are sponsoring a break time on Tuesday, so drop down to the exhibition hall and have a coffee on us.</p>
]]></content></entry><entry><title>The median stripped bare? Well....</title><category term="reviews"/><category term="statistics"/><id>http://www.symbolix.com.au/blog/2010/2/1/the-median-stripped-bare-well.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2010/2/1/the-median-stripped-bare-well.html"/><author><name>admin</name></author><published>2010-01-31T22:20:54Z</published><updated>2010-01-31T22:20:54Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>The Age newspaper today published <a class="offsite-link-inline" title="The medan stripped bare 31/1/10" href="http://bit.ly/b9jnp0" target="_blank">an article</a> analysing differences in the way different market research companies report the median selling price for different suburbs.&nbsp; This is an important point to discuss, but I was not concerned by the analysis as much as this definition of median:</p>
<blockquote>
<p>&#8220;However, it is worth keeping in mind that the median price is not the same as thing as the average price. It is simply the middle sale price when all property sales are arranged <span style="text-decoration: underline;">chronologically</span>.&#8221;</p>
</blockquote>
<p>Um, no actually.&nbsp; For a given month, the middle sale price when sales are arranged chronologically (in time) would be the price received around the 15th of the month (assuming an even sales rate through the month).</p>
<p>For the record, here are the definitions you need.&nbsp; When we talk &#8220;average&#8221; we may mean one of three measures.</p>
<ul>
<li>The <em>median</em> is the middle value, when prices are arranged in order from lowest to highest. </li>
<li>The <em>mean</em> (most commonly just called the <em>average</em>) is just the sum of all the prices, divided by the number of sales. </li>
<li>The <em>mode</em> is the most common sale price.&nbsp; </li>
</ul>
<p>In many cases, these three measures are very similar, but not always.</p>
<p>It&#8217;s worth noting that the mean is highly susceptible to outliers - a one-off $10 million property sale will inflate the mean price, but leave the median far less affected.&nbsp; This is why the median is a more stable measure of things like house prices, which are likely to have a small number of outliers (very low or very high prices).</p>
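<p>To see the outlier effect in numbers, here is a minimal Python sketch. The sale prices are made up purely for illustration (they are not from the article); only the $10 million outlier comes from the paragraph above.</p>

```python
from statistics import mean, median

# Hypothetical sale prices for one suburb in one month (illustrative only).
typical_sales = [420_000, 455_000, 470_000, 490_000, 510_000, 530_000]

# The same month with a single one-off $10 million sale added.
with_outlier = typical_sales + [10_000_000]

mean_before = mean(typical_sales)
median_before = median(typical_sales)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"Mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"Median: {median_before:,.0f} -> {median_after:,.0f}")
```

<p>With these assumed prices the mean more than triples, while the median moves by only $10,000.</p>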
<p>Now that&#8217;s settled, the rest of the article is worth reading - it discusses why understanding the drivers behind changes in these measures is so important.&nbsp; For example, if a jump in median house prices reflects a drive by investors moving on high end properties, it does not necessarily translate to making a killing on selling a low end property to a young first buyer market.</p>
<p>This is important to think about, but the first step is to understand the basics of what the measures actually mean.</p>
]]></content></entry><entry><title>The hidden danger of averages</title><category term="natural resources"/><category term="risk"/><category term="statistics"/><id>http://www.symbolix.com.au/blog/2009/12/16/the-hidden-danger-of-averages.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/12/16/the-hidden-danger-of-averages.html"/><author><name>reidy</name></author><published>2009-12-15T22:38:20Z</published><updated>2009-12-15T22:38:20Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<h2><strong>&nbsp;</strong></h2>
<p>At 7:51am this morning on ABC Melbourne, as part of the news broadcast, it was mentioned that today&rsquo;s forest fire index rating for the state is about 47 <a class="offsite-link-inline" href="http://bit.ly/6cFYSt" target="_blank">http://bit.ly/6cFYSt</a></p>
<p>Given that this is generated from at least five numbers (representing the five fire forecast regions), this tells us next to nothing about the actual risk.</p>
<p>Let&rsquo;s assume that of the five regions four of them have a fire index of 20 (negligible) whilst the remaining region has an index of 150 (comparable to Ash Wednesday).&nbsp; This gives the STATE a fire index average of 46.&nbsp; However for that one region with a Code Red &ndash; Catastrophic rating, any fire that starts will be devastating, but the state as a whole will be fine.</p>
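<p>The arithmetic is easy to check. In this small Python sketch the region names are placeholders (the real forecast regions are not named here) and the index values are the assumed ones from the paragraph above.</p>

```python
from statistics import mean

# Placeholder region names; index values are the assumed ones from the post.
fire_index = {
    "Region A": 20,
    "Region B": 20,
    "Region C": 20,
    "Region D": 20,
    "Region E": 150,
}

# The single state-wide figure that gets broadcast.
state_average = mean(list(fire_index.values()))

# The figure that matters to anyone living in the worst-affected region.
worst_region, worst_value = max(fire_index.items(), key=lambda item: item[1])

print(f"State average: {state_average}")
print(f"Worst region:  {worst_region} at {worst_value}")
```

<p>Reporting the maximum alongside (or instead of) the average would at least flag that one region is in serious trouble.</p>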
<p>While it is impossible to announce the fire index rating for each and every city, town and locality in the state, giving a state-wide &lsquo;average&rsquo; index is pointless.&nbsp; The state was divided into the five fire forecast regions because they were identified as having very separate fire risks; shouldn&rsquo;t they be reported individually?</p>
<p>In a weather report there are eight capital cities which get mentioned, and to give the current forecast for all eight takes less than a minute when done expeditiously.&nbsp; Surely it would take less time again to broadcast the fire index ratings for each of the five regions individually.</p>
<p>Averages are incredibly useful tools when used properly; I myself use them many times each day.&nbsp; The forest fire index should NOT be given as a state-wide average, because it masks the true risk to the state.&nbsp; There once was a man who drowned crossing a river that had an average depth of 3 feet&hellip;</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.symbolix.com.au/storage/post-images/drowning.png?__SQUARESPACE_CACHEVERSION=1260916953131" alt="" /></span><span class="thumbnail-caption" style="width: 415px;">http://www.flawofaverages.com/</span></span></p>
]]></content></entry><entry><title>24 hour time is NOT decimal</title><category term="Data 2.0"/><category term="data mining"/><category term="humour"/><id>http://www.symbolix.com.au/blog/2009/12/9/24-hour-time-is-not-decimal.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/12/9/24-hour-time-is-not-decimal.html"/><author><name>reidy</name></author><published>2009-12-09T05:12:39Z</published><updated>2009-12-09T05:12:39Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>&nbsp;</p>
<p>If I had to do this only once in my data crunching life I could excuse that.&nbsp; However this is now the third time in a year that I&rsquo;ve had to perform this same miraculous transformation of time.&nbsp;</p>
<p>When recording data, I love it when people think to use a 24hr time format, as this saves a lot of messing around trying to work out if it&rsquo;s meant to be 6AM or 6PM; it&rsquo;s somewhat more difficult to get 6 and 18 confused.&nbsp; However&hellip; there&rsquo;s a worrying trend starting to appear in the data that crosses my desk.&nbsp; If you make a note of something at quarter past two in the afternoon, please, for the love of data crunching monkeys EVERYWHERE, write it as &ldquo;14:15&rdquo;.&nbsp; That magical extra dot which transforms a decimal point into a colon saves many hours of headaches when checking data prior to analysis.&nbsp; &ldquo;14.15&rdquo; and &ldquo;14:15&rdquo; are very different values when you&rsquo;re working with times.&nbsp;</p>
<p>If you insist on using decimal points in time, please use a Julian Date format instead of a normal 12 or 24 hour format.&nbsp; Julian Dates are MEANT to have decimal places, and data crunchers should be just as happy with them as normal times.</p>
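<p>For anyone wondering how different &ldquo;14.15&rdquo; and &ldquo;14:15&rdquo; really are, here is a rough Python sketch. The function names are my own, and the Julian Date conversion assumes the astronomical definition (days since noon UTC on 1 January 4713 BC), which may not be the flavour used in your data.</p>

```python
from datetime import datetime, timezone


def hhmm_dot_to_minutes(text):
    """Read a mistyped 'HH.MM' clock time as minutes after midnight."""
    hours, minutes = text.split(".")
    return int(hours) * 60 + int(minutes)


def decimal_hours_to_minutes(text):
    """Read the same text as a genuine decimal number of hours."""
    return round(float(text) * 60)


def as_clock(total_minutes):
    """Format minutes after midnight as HH:MM."""
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def julian_date(moment):
    """Astronomical Julian Date for a timezone-aware datetime."""
    return moment.timestamp() / 86400 + 2440587.5


print(as_clock(hhmm_dot_to_minutes("14.15")))       # what the recorder meant
print(as_clock(decimal_hours_to_minutes("14.15")))  # what the software sees
print(julian_date(datetime(2009, 12, 9, 14, 15, tzinfo=timezone.utc)))
```

<p>The two readings are six minutes apart.&nbsp; It gets worse once a spreadsheet drops a trailing zero: &ldquo;14.50&rdquo; becomes &ldquo;14.5&rdquo;, and there is then no way of telling ten to three from five past two.</p>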
]]></content></entry><entry><title>Wandering in the (data) Wilderness</title><category term="data mining"/><category term="decision analytics"/><id>http://www.symbolix.com.au/blog/2009/11/18/wandering-in-the-data-wilderness.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/11/18/wandering-in-the-data-wilderness.html"/><author><name>Stu</name></author><published>2009-11-18T04:06:54Z</published><updated>2009-11-18T04:06:54Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>I was working on a proposal for a significant firm to look at employing some pretty hefty business analytics. I was (naturally) laying out a project plan, and noticed I was spending a lot of time in a risk mitigation phase of determining the underlying quality of the data sources.</p>
<p>The roadmap we ended up with looked a lot like that for an old-world exploratory expedition (see below). This was the only way I could ensure, with reasonable confidence, the quality of the potential outcomes for the project. It involved identifying the desired goal (Gulf of Carpentaria), the potential routes and risks (Dry land up the middle, with not much water), and <strong>most</strong> importantly, the current status.</p>
<p><span class="full-image-block ssNonEditable"><span><img src="http://www.symbolix.com.au/storage/post-images/datamining.png?__SQUARESPACE_CACHEVERSION=1258517130509" alt="" /></span></span></p>
<p>It occurred to me that the quickest way to get lost on a (data) mining expedition was to not know where you were initially. That is, to get lost it&#8217;s best to start lost. Given you are going to be traversing uncharted territory in most mining applications, it suddenly becomes really important to make sure you start out from where you thought you were starting out from. Otherwise you can get a very expensive, random walk out into the aether, because all the assumptions behind your planned route map will be nonsense.</p>
<p>It also highlighted for me how crucial it is to treat the technology with great care. To continue the analogy, if you start off your GPS waypoint path at the wrong spot, the little handheld unit will be dead certain you are at the Sydney Harbour Bridge, oblivious to the large rock and red dust in front of you. Just so, a poorly fired analytic will tell you with absolute confidence that you should change the Call Centre settings&#8230; but if the initial data source wasn&#8217;t what you thought it was, you might find yourself standing in a wind blown desert wondering what went wrong.</p>
]]></content></entry><entry><title>Its ok to have feelings, just don't blame science.</title><category term="evidence based management"/><category term="natural resources"/><category term="science communication"/><id>http://www.symbolix.com.au/blog/2009/11/2/its-ok-to-have-feelings-just-dont-blame-science.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/11/2/its-ok-to-have-feelings-just-dont-blame-science.html"/><author><name>admin</name></author><published>2009-11-02T04:35:21Z</published><updated>2009-11-02T04:35:21Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>I&#8217;m all for evidence-based everything, which is no surprise I&#8217;m sure.</p>
<p>I was interested to read about the chief drugs advisor to the UK government being sacked for speaking out against the government&#8217;s drug policy. Professor David Nutt argued that cannabis had been reclassified as a higher risk drug against scientific evidence and was promptly sacked for commenting on policy.&nbsp;</p>
<p>He is quoted as saying: &#8220;So this government has made the law and then it has gone back to the advisory council and said &#8216;could you find some evidence to support our decision?&#8217;&nbsp; Now we&#8217;ve said &#8216;no, we will stick to science, our scientific guns, we will produce the evidence and if you go and legislate inappropriately, we will continue to point out the evidence does not support you&#8217;.&#8221; (<a href="http://bit.ly/VUJpW">http://bit.ly/VUJpW</a>)</p>
<p>This quote echoed a number of conversations that took place at the Environment Institute of Australia and New Zealand Conference last week.&nbsp; In particular, there was a lot of frustration at groups and individuals that don&#8217;t want a development in their locality (for whatever cultural, social or personal reason) and who build (often spurious) scientific arguments to validate their opinion.&nbsp;</p>
<p>Considerable time and effort is often spent in putting together good, unbiased scientific evidence to address the &#8220;scientific&#8221; concerns, only to be ignored, deflected and demeaned, because it does not support the community sentiment.</p>
<p>Too often this situation descends into a &#8220;my expert is better than your expert&#8221; skirmish, and no one wins.&nbsp; And you may be surprised, but as a scientific consultant, my livelihood is dependent on me giving my clients the best evidence-based answer, not the one they want.&nbsp; And it&#8217;s a kick in the guts to go into those situations in good faith and wanting a good scientific debate, only to have my scientific integrity brought into question.&nbsp;</p>
<p>And you know what I reckon?</p>
<p>I think that it is actually valid for a community to stand up and say, &#8220;we have heard the evidence, we accept the value of the proposal but we don&#8217;t want it because it will make my backyard ugly/change the nature of our community/change the road I take to work&#8221;.&nbsp; Politicians should stand up and say &#8220;that is a fair and reasonable scientific argument, but parents in my constituency just don&#8217;t want another thing to worry about, no matter the risk, so we will go against the scientific advice for now.&#8221;</p>
<p>It&#8217;s ok to have social, political, religious or emotional responses, and for these to be taken into account.&nbsp; But they cannot be taken into account, if they are not out on the table.</p>
<p>Just don&#8217;t use science as a cover up.&nbsp; Thanks.</p>
]]></content></entry><entry><title>What exactly are you asking? Thoughts from EIANZ</title><category term="Papers&amp;Articles"/><category term="evidence based management"/><category term="natural resources"/><category term="statistics"/><id>http://www.symbolix.com.au/blog/2009/10/30/what-exactly-are-you-asking-thoughts-from-eianz.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/10/30/what-exactly-are-you-asking-thoughts-from-eianz.html"/><author><name>lib</name></author><published>2009-10-30T00:24:56Z</published><updated>2009-10-30T00:24:56Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>Good policy is ok, but how do we implement policy that we are also able to monitor and evaluate over time?&nbsp; How do we know that the activities we undertake on the ground will have the impact we expect?</p>
<p>This is a complex issue, especially in the environmental sector.&nbsp; Here turning good policy into good practice involves a clever mix of</p>
<ul>
<li>community engagement</li>
<li>science</li>
<li>trust (i.e. avoiding the pitfalls of the &#8216;my expert&#8217;s better than your expert syndrome&#8217;)</li>
</ul>
<p>and ultimately, it requires policy that actually can be translated into action, monitored, evaluated and improved.</p>
<p>This was a topic of much discussion at last week&#8217;s Environment Institute of Australia and New Zealand (EIANZ) conference in Canberra.&nbsp; There was a plethora of examples, workshops and ideas (stay tuned, and I&#8217;ll post a link when the presentations go online).</p>
<p>For our part, we argued that good practice is impossible unless policy and investment is planned with a view to the evidence and monitoring that will be gathered to track the effectiveness of the activity.&nbsp; Incorporating statistical survey design is one such approach.</p>
<p>Traditionally survey design is done just prior to going out in the field.&nbsp; However, the first, most critical aspect of good survey design is having a well defined question to answer.&nbsp; This, we argue, is where policy comes in.&nbsp; During planning (whether planning a compliance or an investment program) we need to consider the long and medium term resource outcomes we wish to achieve, but ALSO ask which metrics can most appropriately be monitored on the ground to track our progress towards those goals.</p>
<p>This involves considering the size of the effect we are looking for and the power and the significance we can achieve through monitoring.&nbsp;</p>
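<p>As a rough illustration of how effect size, power and significance trade off, here is a Python sketch using the textbook normal approximation for comparing the means of two groups (say, treated and untreated sites).&nbsp; It is not the method from our presentation, the function name and default values are assumptions for the example, and a real survey design would need to account for the actual sampling scheme.</p>

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, significance=0.05, power=0.80):
    """Approximate sample size per group to detect a standardised effect.

    Uses the normal approximation for a two-sided comparison of two
    group means; effect_size is the difference in means divided by the
    (assumed common) standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - significance / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need far more monitoring sites to detect reliably.
for effect in (0.2, 0.5, 0.8):
    print(effect, sample_size_per_group(effect))
```

<p>Halving the effect you hope to detect roughly quadruples the monitoring effort, which is exactly why the question needs to be settled at the policy stage and not the week before field work.</p>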
<p>But enough from me - you can download the <a href="http://www.symbolix.com.au/publications/PPT_EIANZSurveyDesign_V2online_091020.pdf">full presentation here</a>.</p>
]]></content></entry><entry><title>A confession</title><category term="musings"/><category term="science communication"/><category term="statistics"/><id>http://www.symbolix.com.au/blog/2009/10/29/a-confession.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/10/29/a-confession.html"/><author><name>Stu</name></author><published>2009-10-29T02:24:38Z</published><updated>2009-10-29T02:24:38Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p>I do confess that I read textbooks. Mostly they are obscure methods and applications, but occasionally they are graduate or college level, and even introductory. I do this because I need to keep up with new tricks, but also because there is something special in trying to teach a newbie analytics.</p>
<p>I mean, it is all about creating the right mental image so they can see the data shape and flow, and so instinctively recognise how to work with it. And introductory texts are full of simplified, yet correct, views of the world of data.</p>
<p>So, what is my confession? I normally gloss over the sections about &#8220;descriptive statistics&#8221; (i.e. using stats to describe the overall &#8220;feel&#8221; of the data - 1 in 5 kids skip breakfast, etc).</p>
<p>They never interested me. I am a more &#8220;inferential modelling&#8221; type of fella (i.e. using hypothesis testing and other hefty tools to infer insight and generate correlations and predictions about a data set).</p>
<p>If it isn&#8217;t predicting an outcome, or telling me how to do things better or faster, why would I care? I should acknowledge that this often gets me into trouble with the Communications section, who are charged with creating the visualisations and stories around the inferences I find.</p>
<p>But I had a moment the other day when I was challenged on why I had made a choice of design. My answer had to do with the underlying nature of the data set&#8230;yep. Descriptive Statistics held the key.</p>
<p>So now, as well as sending my models down to Communications to have them make the pretty visuals for the output, I speak to them before I model too, to see if the descriptive stuff holds any secrets that make the modelling and data mining more efficient, or more appropriate.</p>
<p>This blog was written without duress. No inference should be made about my continuing support for the Communications department and the wonderful work they do.</p>
<p>Mmmm humble pie&#8230;&#8230;</p>
]]></content></entry><entry><title>Don't squeeze all your numbers into the one black box</title><category term="data mining"/><category term="musings"/><category term="software"/><category term="statistics"/><id>http://www.symbolix.com.au/blog/2009/9/29/dont-squeeze-all-your-numbers-into-the-one-black-box.html</id><link rel="alternate" type="text/html" href="http://www.symbolix.com.au/blog/2009/9/29/dont-squeeze-all-your-numbers-into-the-one-black-box.html"/><author><name>reidy</name></author><published>2009-09-29T02:00:47Z</published><updated>2009-09-29T02:00:47Z</updated><content type="html" xml:lang="en-AU"><![CDATA[<p><span class="full-image-float-left ssNonEditable"><span><img style="width: 350px;" src="http://www.symbolix.com.au/storage/post-images/spreadsht.jpg?__SQUARESPACE_CACHEVERSION=1253860192914" alt="" /></span></span></p>
<p>Spreadsheet packages store, visualise and manipulate data, typically in the form of numbers.&nbsp; In my role as a &ldquo;Data Cruncher&rdquo; I spend a lot of time delicately pushing, pulling, twisting and squeezing numbers, and a lot of my work is done right there in those spreadsheet packages.</p>
<p>Once I know what information I need to extract from the data, I set about doing what I do best&hellip; turning data into information.&nbsp; After putting it through a series of checks to ensure that the data actually contains the information I&rsquo;m after, I get to have what I call fun.&nbsp;</p>
<p>There is an almost limitless number of ways that data can be &ldquo;crunched&rdquo;.&nbsp; This also means that there&rsquo;s no typical way to deal with the data; every job is unique.&nbsp; For me this has the added bonus that for many jobs I require customised tools to handle the data.&nbsp; The bonus comes from creating these customised tools: I get to develop tools and methods which ensure the data is handled and analysed correctly and efficiently.&nbsp;</p>
<p>Custom built tools allow for a complete understanding of exactly where every piece of data starts off, where it goes and what happens to it along the way to turning it into information.&nbsp; The greatest disadvantage that a data cruncher, like myself, can have is to work with a &ldquo;black box&rdquo; tool, one in which you have no way of knowing what happens between one end of the system and the other.&nbsp;</p>
<p>If you treat the numbers poorly the information can come out the other end bruised and battered, sometimes without any obvious signs of abuse.&nbsp; The tools I build to handle your numbers have nice, clear perspex cases on them&hellip; meaning I can see what&rsquo;s happening to your data, and stop it getting bruised.&nbsp;</p>
<p>Spreadsheets are a good platform for data crunching.&nbsp; However, with an expert user driving them, coupled to completely customised software, they become transparent, efficient data crunching powerhouses.&nbsp;</p>
]]></content></entry></feed>