For those of you who still have a memory, undaunted by either age or drugs, there is a simple formula for determining the mean (average) value of a sampled set.
Take a measurement of a thing (X). The average value of X is equal to Mu (μ). Say you have ten things, like beans, and you measure the length of each one. One bean is 3 inches long. Another is 4 inches long. The others are 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9 and 3.3 inches long. μ = the average value of all those lengths, or, the combined value of all those lengths divided by the number of observations made.
If you make a single observation, the mean is equal to the observation. That is to say, if you measure one bean, the average length of that bean is equal to its length. While it would be true that the length of that bean is equal to its average length, the question should be asked, "why do we care?"
And, in fact, we don't.
The length of any one single thing is always equal to its mean (or average) value.
The question then becomes, how do we take a look at a class of things, in this case beans, and determine an expectation of value? This is important if we are designers of bean cans. If we intend to can whole beans it only makes sense to manufacture cans that allow us to put entire beans inside those cans. If we only intended to sell partial beans, can size is less important. Cut beans can pretty much be crammed into any size can we produce.
But let's take a look at the value of beans listed above. What are the chances that the measured values of bean length really are the values of bean length (ρ, or rho)? Since we measured the beans, the chance that the lengths are equal to the lengths we measured is 1. This is a mathematical way of saying that the length of the beans is equal to the lengths of the beans. (Do you remember the Identity Property?)
The Sigma (Σ), or sum, of any equation when multiplied by 1 is equal to the sum. (1 is the Rho part. We measured the beans, so the probability that the length we measured is equal to the length we measured is 1. When a thing is a thing, the mathematical expression of this is 1.) So, the sum of the lengths of our n=10 beans is 34.9 inches (Sigma of X), and the mean is that sum divided by the number of beans we counted.
So, a restatement of the mean:
μ = (1/n) × Σ xᵢ, with i running from 1 to n
says that the sum of all the numbers x, from the first number to the nth (last or final) number, when multiplied by the reciprocal, or inverse, of the number of observations n (our n, again, is equal to ten) gives us the mean, or average, of our observations.
So, we did the sum of the first x (shown with a subscript 1) plus the second x (with subscript 2) until we added all ten numbers together. The sum of x is 34.9, n = 10. The inverse of 10 is one over ten.
Or, 34.9 divided by ten.
Average length? 3.49, or, rounding up, 3.5 inches.
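For anyone who'd rather let a computer do the adding, the arithmetic above can be sketched in a few lines of Python. The variable names are mine, not part of any standard; the bean lengths are the ten measurements from this post.

```python
# Bean lengths in inches: the ten measurements listed earlier in the post.
bean_lengths = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]

n = len(bean_lengths)      # number of observations
total = sum(bean_lengths)  # Sigma of X
mean = total / n           # the sum multiplied by 1/n

print(f"n = {n}")
print(f"sum = {total:.1f} inches")
print(f"mean = {mean:.2f} inches")
```

The formatting only trims the tiny floating-point noise a computer adds when summing decimals; the numbers are the same ones we worked out by hand.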
How many of our beans were 3.5 inches long? Well, none. But we're not done yet. We're going to do something with our beans. We're going to can them, and to make sure we order cans that will allow us to fit entire beans by length into these cans, we need to order cans that will serve our needs, in most cases.
One of our beans was pretty long. 4.9 inches! If we made our cans to include this monster, our cans would be bigger than would be necessary for more normal sized beans! Talk about a waste.
So how do we develop an understanding of what "normal sized beans" means in terms of our demand for cans?
Old guys who do math see this as a problem that can be solved with a question. How normal is the size of this sample of beans? Or, if we look at these beans as being demonstrative of the length of beans, how "normal" are these length values?
Didja ever wonder about the word "normal"? I'm either normal or not. Are you "normal"? And what are the attributes of normalcy that you must adhere to--voluntarily or not--in order for you to claim adherence to some outwardly conceived admission of normalcy?
When we examine beans, we have limited descriptive statements that can be used to determine the limits of what is or isn't a bean.
It's green. It's a longer, rather than fatter, vegetable. It is green. And it has a normal length.
But, what is a bean's normal length?
In our example of testing, we found an average length of bean to be 3 and a half inches long. (Which, if you know anything about beans, depending upon the type of bean you're growing, is a pretty average length!)
But, how "normal" is our average length, in terms of our sample?
This is a graph known as the normal Bell Curve. If you are reading this, I know that you've come across this curve at least once in your life. Mebbe it was when you took a standardized test back in high school and learned that you should end your life pumping gas. As a high-light.
There are variations on the "normally" distributed curves. These are known as "skewed" curves. If you have any idea how curves could be skewed beyond or above the normal mean value, you can skip the rest of this test. (Give yourself a B+.)
One of the more interesting questions that can be asked of the graph above is, what is meant by 1SDV, 2SDV...etc.
SDV in this case refers to Standard Deviation. (I usually use the shorthand sd.) For all you deviates out there, this could be good news. We can measure how close to "normal" your "deviation" may be.
And cooler still, we can take your deviation, or the deviation in the length of beans, and determine, statistically, whether or not you, or the beans, fall within 68 percent of all weird deviancies--or bean length--or not.
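Where does the 68 percent come from? It is a property of the normal curve itself. As a rough check, assuming Python 3.8 or later (which ships a NormalDist class in its statistics module), we can ask how much of a standard normal curve lies within one standard deviation of the mean:

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of the curve lying between one sd below and one sd above the mean.
share_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(f"{share_within_one_sd:.4f}")
```

That works out to a little over 68 percent, which is the figure quoted for the first standard deviation.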
How to do this?
First, we compute the mean for the data. We did this. 'Member the number? (3.5)
Then, we compute the deviation by subtracting the mean from each value.
One bean was 3 inches long, another was 4 inches long, and the others were 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9 and 3.3 inches long. So, we get
3 minus 3.5.
4 minus 3.5.
3.2 minus 3.5...etc.
"Standard Deviation" isn't some magic number that only math guys can do. If we have a mean (average) value, then any value that we have in our sample that isn't exactly the same value of that mean (average) value, deviates from that value. Ain't the same, it deviates.
We're going to find out what the deviance is--the difference between each value in our sample and the mean--for each sampled value. And then we're going to "normalize" the difference of these sums. We're not simply going to take the average (mean) of the difference, we're going to take a look at the average value of the difference in terms of the mean.
Some of the differences that we came up with were negative. Our average was 3.5. Some of our beans were only 3.2 or 2.6 inches long. What we're going to do is erase whether a difference is "positive" or "negative", because what we're looking for isn't a value that is described as either negative or positive, but as an absolute: how far from the mean, in either direction.
(For any of you who don't get the idea that negative one times negative one is equal to one, give me a note. It took me two years of asking stupid questions of professors until I found one that took the time to take me past "doing the math" into understanding the math. )
So, we're going to take the differences of each sampled value, less the mean, and come up with a number that we're going to square, in order to remove the negative sign...that is, to come up with an "absolute" measure of the difference.
So you get differences like -0.5, 0.5...-0.2. We're going to remove the positive and negative signs by squaring the differences. We've ten differences. We're going to "square" (multiply each difference by that same difference) each of the individual differences.
So, the first difference is -0.5.
What is -0.5 squared? It's -0.5 x -0.5. Or, 0.25.
The second bean length was 4.00 inches.
The difference is 0.5 inches. 0.5 x 0.5 equals 0.25. (So, the square comes out the same whether the difference was negative or positive!)
We go ahead and finish off the rest of the differences between our mean value and the actual value of our sample and find that the values of the differences are -0.5, 0.5, -0.3, 0.7, -0.9, 0.2, 0.4, -1.4, 1.4 and -0.2.
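The subtraction step can be sketched in Python like so. I'm using the rounded mean of 3.5, as this post does (the unrounded mean is 3.49), and rounding each result to one decimal place to hide floating-point noise. The names are just my own labels.

```python
# Bean lengths in inches, from the post.
bean_lengths = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]

# Assumption: the rounded mean used in the post's hand arithmetic.
rounded_mean = 3.5

# Deviation = value minus mean, so short beans come out negative.
deviations = [round(x - rounded_mean, 1) for x in bean_lengths]

for length, deviation in zip(bean_lengths, deviations):
    print(f"{length} - {rounded_mean} = {deviation}")
```

Notice the deviations nearly cancel each other out if you add them up, which is exactly why we can't just average them and have to square them first.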
So, what are the squares of these values?
We've already done the first two.
0.25 and 0.25.
What are the next eight? I'll do a couple, then you do the rest. Math isn't hard. And statistics is just math.
-0.3 squared? 0.09.
0.7 squared? (0.7 x 0.7) 0.49.
Did you find all the squares of the differences?
Cool. Here's a note.
Most of the kids I TA'd in beginning Stats didn't have the math to do this. College Algebra is a misnomer. It is neither "college" level nor taught to a level that allows children to understand math. But, I digress.
So, what is the sum of our squares? (You're not doing the math. The answer is 6.05.) (Slacker.)
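For the slackers, here is a small Python sketch of the squaring and summing. It again assumes the rounded mean of 3.5 that this post uses for the hand arithmetic, and rounds along the way to keep floating-point noise out of the picture.

```python
# Bean lengths in inches, from the post.
bean_lengths = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]

# Assumption: the rounded mean used in the post's hand arithmetic.
rounded_mean = 3.5

# Square each difference; squaring wipes out the negative signs.
squared_differences = [round((x - rounded_mean) ** 2, 2) for x in bean_lengths]

# Add the squares together.
sum_of_squares = round(sum(squared_differences), 2)

print(squared_differences)
print(sum_of_squares)
```

Every entry in the list of squares is positive, no matter which side of the mean the bean fell on.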
So, now we divide by n-1, or one less than the sample size. (Our sample size was ten.)
Or, 6.05 divided by 9.
Shall we? 6.05 over nine is 0.672.
We have a number! We have a number!
But, what does this number mean?
Hella good question. See the graph?
We have one more step to take. Remember that we took the "squares" of the differences? One last step. We're now going to find the square root of the average of our squared differences.
Yep. We transformed our differences into squares, added them up and divided by one less than n, and now we're going to find the square root of the result.
What is the square root of 0.672? Remember, we're talking about the length of beans.
It comes out to about 0.8198. But, we don't need to deal with that many digits to the right of the decimal. So, our answer is 0.82.
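The last two steps, dividing by n-1 and taking the square root, can be sketched in Python as follows. The first half follows the hand arithmetic in this post (sum of squares of 6.05, based on the rounded mean of 3.5). The second half is a cross-check I've added using the standard library's statistics.stdev, which works from the unrounded mean; the two agree once rounded.

```python
import math
import statistics

# Bean lengths in inches, from the post.
bean_lengths = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]

# Sum of squared differences from the post (computed with the rounded mean 3.5).
sum_of_squares = 6.05

n = len(bean_lengths)
variance = sum_of_squares / (n - 1)  # divide by one less than the sample size
std_dev = math.sqrt(variance)        # undo the squaring

# Cross-check: the library computes the sample sd from the unrounded mean.
library_std_dev = statistics.stdev(bean_lengths)

print(f"variance = {variance:.3f}")
print(f"sd by hand = {std_dev:.2f}")
print(f"sd by library = {library_std_dev:.2f}")
```

Rounding the mean to 3.5 before subtracting nudges the sum of squares a hair, but not enough to change the answer at two decimal places.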
So, within the first standard deviation of our bean, we could add or subtract 0.82 inches to come up with a length that would fall in our first Standard Deviation.
The mean or average length of our beans was 3.49 inches. And we have found that if the lengths of beans are normally distributed, about 68 percent of all beans are within 0.82 inches of 3.49 inches, or at the low end of the range 2.67 inches and at the high end of the range, 4.31 inches.
But "normal" distribution has nothing to do with "normal" beans. All of our beans were within the same field, from the same seeds. Geographic distances were small. Field watering was consistent. But three of our beans, at 2.1 inches, 2.6 inches and 4.9 inches, fell outside of the first standard deviation of "beans" in our normally distributed curve.
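To see which beans land inside and outside that first standard deviation, here is a short Python sketch using the mean (3.49) and sd (0.82) from this post. The variable names are mine.

```python
# Bean lengths in inches, from the post.
bean_lengths = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]

mean = 3.49     # mean from the post
std_dev = 0.82  # standard deviation from the post

# One standard deviation either side of the mean.
low = round(mean - std_dev, 2)
high = round(mean + std_dev, 2)

inside = [x for x in bean_lengths if low <= x <= high]
outside = [x for x in bean_lengths if not (low <= x <= high)]
share_inside = len(inside) / len(bean_lengths)

print(f"range: {low} to {high} inches")
print(f"outside: {outside}")
print(f"share inside: {share_inside:.0%}")
```

Seven of the ten beans fall inside the range, which is about as close to 68 percent as a sample of ten can get.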
Words that mathematicians and statisticians use have very explicit meaning. Otherwise, we wouldn't know what we were talking about.
Normal distribution refers to unbiased sampling results. If we found we had biased sampling results, we have a vocabulary to deal with that bias.
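But normal in a mathematician's vernacular (a statistician is a mathematician) has to do with those things we would expect to be distributed symmetrically around the mean, with most values close to it and fewer the farther out you go. As if the departures from the mean were random. That there isn't a bias.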
And that is, my friend, the purpose for this post.
If you have been following the "Climaquiddick" or "Climategate" posts occurring around the intertubes, you may have noticed that a great deal of the criticism being brought upon the Global Warmers has been based upon criticisms of their mathematical models.
In my simple explanation of statistics in this post, we looked at the values of lengths of beans at one particular time. We determined an average (mean) length and the expected values for bean length within one standard deviation (sd). We found that if bean length was normally distributed around the mean, the prediction that about 68 percent of the sampled bean lengths would fall within one sd of the mean was roughly confirmed: seven of our ten beans did. Three of the bean lengths were found to be outside of the predicted value of the sd.
What happens to a statistical model as the number of beans sampled in a statistical measurement increases? Are there tests to find out if our statistical findings are significant? Can we draw a statistical inference about the reliability of our measurements?
There are a lot of things that occur within the study of statistics. There are schools of statistics. Not schools, like George Washington University versus Texas Tech. There are fundamental assumptions about which tests we use for significance, difference and distribution, and those assumptions can vary widely and give us different results as to whether or not our findings are of any interest. That is, there are closely held beliefs within certain schools of thought as to how some information should, or should not, be interpreted.
What I've shared with you today are the fundamentals of statistical analysis that are shared by most schools of statistical inference. There aren't two schools of what the definition of "mean" is. Nor are there two schools--or more--of what defines standard deviation.
But what does this analysis leave us with?
What would happen to our analysis of bean length if I'd only sampled one bean, and its length was 4.9 inches?
What would happen if I came back to this same field the following year and found another bean of 4.9 inches in length, and I decided to report my findings?
A coupla things would have happened. And this is why the discussions being undertaken by serious mathematicians, statisticians and climatologists are occurring around the globe.
There is significant attention being addressed by the "skeptics" at the methods that were employed by the "consensus" scientists who had gained the main stage on the debate of global climate change. One of these issues centers around the number of sites that were used in reporting temperature data. That is, in some instances, only one bean length was counted.
The "science" of statistics is readily accessible to all of us. If you have a background in calculus, obviously a lot of the equations are simpler. But if you rely upon algebra, there is nothing in this post that will stop you from determining for yourself that limits upon sample size will obviously affect the reported values of any statistical analysis.
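Here is a minimal Python sketch of the one-bean problem, using only the bean data from this post. With a single 4.9 inch bean, the reported "average" is simply that bean, and the sample standard deviation can't be computed at all, since dividing by n-1 means dividing by zero. Python's statistics module raises a StatisticsError in that case; the None placeholder is just my way of recording that.

```python
import statistics

# The full sample from the post, and a "sample" of just the longest bean.
full_sample = [3.0, 4.0, 3.2, 4.2, 2.6, 3.7, 3.9, 2.1, 4.9, 3.3]
single_bean = [4.9]

full_mean = statistics.mean(full_sample)
single_mean = statistics.mean(single_bean)

# With one observation there is no spread to measure.
try:
    single_std_dev = statistics.stdev(single_bean)
except statistics.StatisticsError:
    single_std_dev = None  # sample sd is undefined for n = 1

print(f"mean of ten beans: {full_mean:.2f} inches")
print(f"mean of one bean: {single_mean:.2f} inches")
print(f"sd of one bean: {single_std_dev}")
```

One bean gives you a number to report, but nothing that tells you how far off that number might be.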
Your choice is to do the math. Or, at least have enough conversance with the process that you can read a post about the statistical method and be conversant enough to follow the criticisms of the authors. The folks who are attempting to claim consensus rely upon you to have a certain deficiency in math tools to sway you with their moral suasion.
Please, don't fall for their planned moral suasion. It's easily falsifiable. But, it's up to you to understand the basics, and then ask questions.