MadSci Network: Genetics

Re: In a Chi-Square Table, why do the numbers go up as % certainty increases?

Date: Wed Jun 1 10:34:21 2005
Posted By: Mark Huber, Assistant Professor, Mathematics and Statistics
Area of science: Genetics
ID: 1117204700.Ge
Message:

Question: In a Chi-Square Table, why do the numbers go up as %
certainty increases?
From: Beth
Grade: 10-12

To answer your question, let me start by looking at a typical example of where a chi-squared statistic might be used.

Suppose that I have the following model that I want to test. If people could choose only one of Coke, Diet Coke, or Vanilla Coke to drink, 70% would prefer Coke, 25% would prefer Diet Coke, and 5% would prefer Vanilla Coke. I talk to 100 people selected at random and get back the following results: 75 prefer Coke, 14 prefer Diet Coke, and 11 prefer Vanilla Coke.

OK, so given this data, how much confidence do I have in my model? If my model were correct, on average I would have 70 Coke, 25 Diet, and 5 Vanilla drinkers, but since I chose people at random I expect there to be some variation. The question is: how do I tell if there is too much variation?

The chi-squared statistic is one way to check such things. First, look at the differences between the actual data and the expected data: (75 – 70) = 5 for Coke, (14 – 25) = -11 for Diet, and (11 – 5) = 6 for Vanilla. Next, we are only interested in the size of the difference between the actual data and the expected value, so it shouldn't matter whether the difference is positive or negative. One way to make sure that the numbers are positive is to square them. So (75 – 70)^2 = 25 for Coke, (14 – 25)^2 = 121 for Diet, and (11 – 5)^2 = 36 for Vanilla.

Right now Diet has the largest squared difference with 121, but Diet also had a much larger expected value than Vanilla. Divide the squared difference by the expected value to get the final stat for each type: (75 – 70)^2/70 = .357 for Coke, (14 – 25)^2/25 = 4.84 for Diet, and (11 – 5)^2/5 = 7.2 for Vanilla. Add these all together to get about 12.397.

If this number is too big, that means that the differences between the data and my model predictions are very large, which would make me more confident that my model is wrong. This is why, in your chi-squared table, the numbers increase as the level of confidence rises: the more confident you want to be that the model is wrong, the larger the difference between your model and your data has to be.

In this example, there are two degrees of freedom, because there were 3 choices/categories total, but if I told you any two of the data numbers (say 14 Diet and 11 Vanilla) you could figure out the third number (75 Coke). Looking at my table (actually I used a computer program that figures out such things), I see that for a chi-squared statistic of 12.397, I can be about 99.8% confident that my model is wrong.
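
The calculation can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the two confidence formulas rely on a shortcut that holds only for exactly two degrees of freedom, where the chi-squared distribution has the simple closed form 1 - exp(-x/2). For other degrees of freedom you would need a table or a statistics library.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def confidence_df2(statistic):
    """Confidence that the model is wrong, valid ONLY for 2 degrees of freedom."""
    return 1.0 - math.exp(-statistic / 2.0)

def critical_value_df2(confidence):
    """Table entry for 2 degrees of freedom: the statistic needed to reach
    the given confidence (the inverse of confidence_df2)."""
    return -2.0 * math.log(1.0 - confidence)

observed = [75, 14, 11]   # Coke, Diet Coke, Vanilla Coke in the survey
expected = [70, 25, 5]    # what the 70% / 25% / 5% model predicts for 100 people

stat = chi_square_statistic(observed, expected)
print(f"chi-squared statistic: {stat:.3f}")
print(f"confidence the model is wrong: {confidence_df2(stat):.2%}")

# One row of a chi-squared table (2 degrees of freedom).
for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%} confidence needs a statistic above {critical_value_df2(level):.3f}")
```

The loop at the end prints one row of a chi-squared table. The cutoffs come out at about 4.61 for 90%, 5.99 for 95%, 9.21 for 99%, and 13.82 for 99.9%, so the numbers climb as the confidence level climbs.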

The chi-squared table is really only accurate for samples larger than the 100 people surveyed here, but hopefully this gives you an idea why larger numbers lead to larger confidence in rejecting the model.
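
One way to check the table against a sample of only 100 people is to simulate the survey many times, assuming the model is true, and count how often chance alone produces a statistic at least as large as the one from the real survey. The sketch below does this with Python's standard library. The function names, the number of trials, and the random seed are arbitrary choices for illustration, and the result is an estimate that will shift a little with a different seed.

```python
import random

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_exceedance(probabilities, n, threshold, trials, seed=0):
    """Fraction of simulated surveys (drawn from the model itself) whose
    chi-squared statistic is at least `threshold`."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    categories = range(len(probabilities))
    expected = [n * p for p in probabilities]
    hits = 0
    for _ in range(trials):
        counts = [0] * len(probabilities)
        # Ask n simulated people, each choosing according to the model.
        for choice in rng.choices(categories, weights=probabilities, k=n):
            counts[choice] += 1
        if chi_square_statistic(counts, expected) >= threshold:
            hits += 1
    return hits / trials

model = [0.70, 0.25, 0.05]               # Coke, Diet Coke, Vanilla Coke
survey = [75, 14, 11]                    # the actual survey results
n = sum(survey)
threshold = chi_square_statistic(survey, [n * p for p in model])

fraction = simulate_exceedance(model, n, threshold, trials=20000)
print(f"fraction of simulated surveys at least as extreme: {fraction:.4f}")
```

If the printed fraction is small, random variation alone rarely produces a mismatch as large as the one in the real survey, which is the same conclusion the table gives. Any gap between this fraction and the table's value reflects how approximate the table is for a sample of this size.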

Mark Huber
www.math.duke.edu/~mhuber


MadSci Network, webadmin@madsci.org
© 1995-2005. All rights reserved.