I could not understand the following: we have the whole population of people. We sample N=100 people and calculate the sample mean. When we then compute the variance, do we divide by N or by N-1?
Thanks in advance for your help! I have been wondering about that for quite some time. Radoslav Ste...
You can find a lengthy discussion of the small-sample N/(N-1) multiplier in Correction Factor. Briefly, we use it when estimating the population variance from a sample. If we want just the sample variance, or if we are computing the population variance from the population itself, there is no need for the correction factor. When speaking of the variance of the sample, we compute the average squared deviation from the sample mean; it is not an estimate. (Unless, of course, we are theoreticians proving theorems about estimates of the sample variance! :) In this course, if we speak of estimates they are usually estimates of population parameters. However, sometimes Sebastian will ask you to estimate something about the sample statistics rather than the population statistics. For N>30, it makes little difference whether we apply the correction or not, which is why Sebastian bypasses discussion of it in Unit 24. To be rigorous, use N-1 in computing the variance estimate if you are averaging squared deviations from the sample mean; use N if using the true population mean. Those are unbiased estimators of the population variance. (The sample standard deviation, however, is not an unbiased estimator of the population standard deviation. See Unbiased estimation of standard deviation if you really want to get into it.) Kenneth I. L...

Thanks for the answer! But then it is best to always use the unbiased estimator, right? I mean, if we are interested in the real population's parameter.
(24 Jul '12, 17:13)
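A quick sketch of the two versions discussed above (the function name and layout are my own, just for illustration): dividing by N-1 when averaging squared deviations from the sample mean, or by N when we only want the variance of the sample itself.

```python
# Illustrative sketch: N vs. N-1 in the variance computation.
def variance(data, unbiased=True):
    """Average squared deviation from the sample mean.

    unbiased=True divides by N-1 (estimating the population variance
    from a sample); unbiased=False divides by N (the variance of the
    sample itself).
    """
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    return ss / (n - 1) if unbiased else ss / n

sample = [2, 1, 1, 3, 1]
print(variance(sample, unbiased=False))  # 0.64
print(variance(sample, unbiased=True))   # 0.8
```

For N>30 the two results differ by only a few percent, which is why the correction can often be bypassed.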
I just saw the second answer, and it answered my question of when to use the maximum likelihood estimator and when the unbiased one, so disregard my previous post. Thanks, guys!
(24 Jul '12, 17:16)
@Ken quote: "To be rigorous, use N-1 in computing the variance estimate if you are averaging squared deviations from the sample mean; use N if using the true population mean. Those are maximum likelihood estimators of the population variance." I believe this is not entirely correct. See my earlier post. With N-1 you do have an unbiased estimator, but it is not the maximum likelihood estimator. Using N instead gives you the maximum likelihood estimator, which, however, is (slightly) biased.
(24 Jul '12, 17:22)
Again, in practice the difference is negligible. Theoretical perspective: if you use a maximum likelihood estimator, you have the biggest chance of coming up with the right parameter. But if you use an unbiased estimator and do many estimates, you're right on average. If you draw a graph of the distribution of the estimator, the maximum likelihood estimate gives you the point where the distribution peaks, while an unbiased estimator gives you its mean value.
(24 Jul '12, 17:33)
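The "right on average" point can be checked with a small simulation (my own setup for illustration: a population with values 1, 2, 3 in equal proportion, so the true variance is 2/3, sampled in groups of 5):

```python
# Simulation sketch: the /(N-1) estimator is right on average,
# while the /N (max likelihood) estimator is biased low.
import random

random.seed(0)
population = [1, 2, 3]     # equally likely values; true variance = 2/3

n, trials = 5, 100_000
sum_ml = sum_unbiased = 0.0
for _ in range(trials):
    s = [random.choice(population) for _ in range(n)]
    m = sum(s) / n
    ss = sum((x - m) ** 2 for x in s)
    sum_ml += ss / n              # maximum likelihood version
    sum_unbiased += ss / (n - 1)  # unbiased version

print(sum_ml / trials)        # about (n-1)/n * 2/3, i.e. biased low
print(sum_unbiased / trials)  # about 2/3, the true variance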
Hi, MrBB. You caught me! I posted my answer, saw yours that was posted minutes before, and changed my ML to "unbiased," all within about thirty seconds. I agree that N gives a maximum likelihood but biased estimate if using the sample mean to estimate the population variance. It is biased because the sample clusters more closely around its own mean than it does around the population mean, so we have to "give up one degree of freedom" and broaden our estimate by N/(N-1) to compensate for the bias. If we know and use the true population mean, however (which is what I said in my post), the computed sample variance already includes that extra spread. We don't have to "give up a degree of freedom" and don't have to use a correction factor. I haven't verified that this is the MLE solution, but it seems correct, and the N-1 version seems definitely incorrect for this case.

For non-statisticians trying to follow this: remember that the maximum likelihood parameter estimates are those that [jointly] give the greatest probability of drawing the observed sample values, over the space of any parameter values that we could have chosen. This is similar to a Bayes' Rule computation: we start with observed data, but we are estimating the parameter that best explains how that data might have been generated. Unfortunately, the MLE is not always an unbiased estimator. (It unbiasedly estimates whatever it estimates, of course, but that may differ slightly on average from the mean or variance of a Gaussian or other source population.) Sometimes we prefer the MLE, sometimes an unbiased estimator, sometimes some other kind of estimator. Often it makes little difference; in other cases we choose one that is easy to compute; but for very difficult problems it may be necessary to choose just the right theoretical constructs.
(24 Jul '12, 18:38)
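Ken's claim that no correction is needed when the true population mean is known can also be checked by simulation (again my own illustrative setup: population values 1, 2, 3 with mu = 2 and true variance 2/3):

```python
# Simulation sketch: deviations measured from the known true mean mu,
# divided by N (no N/(N-1) correction), are already unbiased.
import random

random.seed(1)
mu, true_var = 2.0, 2 / 3    # population {1, 2, 3}, equally likely
n, trials = 5, 100_000

total = 0.0
for _ in range(trials):
    s = [random.choice([1, 2, 3]) for _ in range(n)]
    total += sum((x - mu) ** 2 for x in s) / n   # divide by N, not N-1

print(total / trials, "vs true", true_var)   # the two are close
```

Measuring from mu rather than the sample mean restores the "extra spread," so dividing by N no longer underestimates.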
Hi, Radoslav! As you suggest, it's usually best to use the unbiased estimator, except for theoretical work in which you wish to maintain the MLE property throughout a series of computations. However, note that you don't use the N/(N-1) correction if computing the variance of the population itself. (Nor would you use it if reporting the sample variance, independent of estimating the population variance.) Software developers and calculator designers have sometimes built in the small-sample correction and sometimes left it out. Usually they leave it out, unless they can provide both. (The correction causes a division by zero for samples of size 1, which is rather inconvenient.) In any case, it's something we need to be aware of when dealing with other people's work. One more rabbit hole to trip into.
(24 Jul '12, 18:56)
So guys, let me try to get it right: we have a population with size N = 30. In this population there are only 3 values: 1, 2, and 3. Each of these values has equal representation in the population, so of these 30 points, 10 have the value 1, 10 have the value 2, and 10 have the value 3. The true population mean is mu = 2 and the true population variance is 2/3. Assume that we do not know anything about the population and want to estimate the true population variance. We take a sample of 5 data points and it turns out to be 2, 1, 1, 3, 1. The sample mean is 1.6, the biased variance is 3.2/5 = 0.64, and the unbiased variance is 3.2/4 = 0.8. See that in this case the biased variance is closer to the true population variance than the unbiased one.
Now suppose instead that we are told the true population mean mu = 2 and use it in place of the sample mean:

sigma^2 = ((2-2)^2 + (1-2)^2 + (1-2)^2 + (3-2)^2 + (1-2)^2)/5 = 4/5

We see that the estimate of 4/5 is NOT equal to the true population variance of 2/3, but it is close. QUESTION 1: When we know the true population mean mu and we have a sample, is it best to disregard the sample mean and variance and just use the differences between the sample data points and mu?
However, imagine that we can take only ONE sample. If it turns out that the sample variance is already close to the true population variance, multiplying by n/(n-1) will make the estimate EVEN LARGER than the true population variance, as in the example above. If we have only two data points, using n-1 doubles the estimate! QUESTION 2: When we estimate the true population variance using the UNBIASED variance of a SINGLE sample, don't we take a large risk of getting an unrealistically large estimate? QUESTION 3: However, if we use the BIASED variance of a SINGLE sample, I realize that the sample variance is generally going to be lower than the true population variance. For what sample sizes is it better to use the biased estimator, and for what sample sizes the unbiased one? If you have a sample size of 5, which estimator would you prefer to use, the biased or the unbiased? Why?
(25 Jul '12, 08:13)
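Radoslav's worked example above can be verified directly (population of ten 1s, ten 2s, and ten 3s; sample [2, 1, 1, 3, 1]):

```python
# Checking the numbers in the example above.
population = [1, 2, 3] * 10
mu = sum(population) / len(population)                              # 2.0
true_var = sum((x - mu) ** 2 for x in population) / len(population)
print(true_var)                                                     # 2/3

sample = [2, 1, 1, 3, 1]
n = len(sample)
mean = sum(sample) / n                                              # 1.6
ss = sum((x - mean) ** 2 for x in sample)
print(ss / n)        # biased:   0.64 (closest to 2/3 in this case)
print(ss / (n - 1))  # unbiased: 0.8
print(sum((x - mu) ** 2 for x in sample) / n)  # using known mu: 0.8
```

As the thread goes on to discuss, one sample where the biased estimate happens to land closer says nothing about which estimator is better on average.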
I think in your first example you just showed that if you use the true population mean as the pivot to determine your sample variance, you get the unbiased sample variance, or something close to it. (Note that what you are computing in your first example is actually the 'variance' of your sample w.r.t. the true population mean. It's not the true population variance, which remains at 2/3.) When your sample set is smaller than the total number of distinct values in the population (in this case 3), there is no guarantee of what results you'll get! At the very least, your sample size should be large enough to be somewhat capable of being representative. But yes, you do not know this beforehand. Moral being that with restricted samples you can pretty much blow your confidence intervals out of the water. Anything is possible then.
(25 Jul '12, 10:20)
I think everything you do is correct (though I didn't check each calculation in detail). Q1: you can best disregard the sample mean and indeed use mu instead, with N=5 in the denominator. Now you have an estimator that is maximum likelihood and unbiased at the same time: best of both worlds. Last point, though I guess it speaks for itself: from one specific sample outcome (like the one you use in your calculations), you cannot derive general conclusions. There are samples where the ML estimator gives you the best result and there are samples where the unbiased estimator gives the best result. In statistics it is all about averages and expectations...
(25 Jul '12, 10:35)
I'll leave the discussion to younger, clearer heads. We are getting into graduate-level issues, best addressed with formulas and proofs. In discussing single samples, note that "sample" is ambiguous: it can refer either to a single draw from a random population or to a set of such draws collected together. To reduce the ambiguity, we can speak of "a single draw" or "a sample of one." "Single sample" then refers to a collection of draws, although I wouldn't trust the phrase. If you use the sample mean and compute the N/(N-1) unbiased population variance estimate for a sample of one, you get a division by zero. In other words, it can't be done. A statistician might say that you have no degree of freedom with which to compute the variance estimate. If you have a sample of two, you can estimate the population variance, but it will be a lousy estimate. At these extremely small sample sizes, we get into the Hiawatha problem of estimators missing the target completely. Roshan is correct that any observed sample is likely to be misleading if it is too small to represent the variability in the population, though I wouldn't suggest that samples from continuous distributions need to contain an infinite number of draws. For known distributions, it is possible to calculate the sample size needed to achieve specific confidence levels. For general work, statisticians have rules of thumb about sample sizes they would recommend. Typically a sample of 20 is considered acceptable, and typically a statistician, or at least a statistics student, will fudge that down to about 5 if necessary.
(25 Jul '12, 12:47)
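The two edge cases Ken mentions, a sample of one (division by zero) and a sample of two (legal but lousy), can be made concrete with a small sketch (the guard and its message are my own):

```python
# Edge cases of the N-1 estimator: a sample of one has zero degrees
# of freedom; a sample of two gives a legal but very noisy estimate.
def unbiased_variance(data):
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 points for the N-1 estimator")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

print(unbiased_variance([1, 3]))   # 2.0 -- computable, but unreliable
try:
    unbiased_variance([2])
except ValueError as e:
    print(e)                       # a sample of one cannot be done
```

This is also why software that builds in the correction has to decide what to report for a sample of size 1.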
This forum is extremely cool :) Thanks to all for the great input!
(25 Jul '12, 15:29)

There is a famous tale in free verse concerning Hiawatha shooting arrows. The rather overstated lesson is that one may have to choose between a biased process that generally hits to one side of a bullseye and one that is perfectly centered and unbiased but so widely scattered that arrows seldom hit the target board at all. See Hiawatha Designs an Experiment by English statistician Sir Maurice George Kendall (1907-1983). Kenneth I. L...
mrBB