A lot of people are having trouble with the median question. IMHO, the question in the HW is testing your ability to write a procedure, not your understanding of what the median is. Dave explained what the median is but, unfortunately, the values he chose for his examples were ones in which the median was also the average, which has created some confusion. Let's rephrase his explanation.
Repeating Dave's words verbatim: ".. and outputs the median which is the middle value of the three numbers". When Dave says "middle" he is referring to "order". Say that I give you 5 numbers, '5 1 800 3 6'. What is the median? Since you need the middle value, you first put them in order, i.e., sort them: '1 3 5 6 800'. You have 5 numbers, so the middle value is the one in the middle: the 3rd value, i.e., 5. It does not matter whether you sort the values in ascending or descending order; sorting them in descending order gives you exactly the same answer. The median of '800 6 5 3 1' is, again, the 3rd number, the one in the middle, i.e., 5.
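The "sort, then take the one in the middle" recipe can be sketched in a few lines of Python. This is only an illustration of the explanation above, not the homework answer. The name `median_odd` is made up for this sketch, and it assumes the number of values is odd.

```python
def median_odd(values):
    """Return the middle value of an odd-length collection of numbers."""
    ordered = sorted(values)      # ascending order; the input list is not modified
    middle = len(ordered) // 2    # 0-based index of the middle position
    return ordered[middle]


# The five numbers from the explanation above.
print(median_odd([5, 1, 800, 3, 6]))  # prints 5
```

Sorting in descending order instead would put the same value at the same middle position, which is why the direction of the sort does not matter.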
Now, let's repeat Dave's examples and add a few more:
The homework is asking you to do this for a set of 3 numbers. You have to decide which one is, as Dave said, in "the middle".
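For three numbers you do not even need to sort. One possible sketch, which may not be the approach the homework expects, uses only the built-in `min` and `max`. The name `median_of_three` is hypothetical.

```python
def median_of_three(a, b, c):
    """Return the middle value of three numbers using only min and max."""
    smaller = min(a, b)   # the smaller of the first two
    larger = max(a, b)    # the larger of the first two
    # The middle value is either the smaller of (larger, c) or, when c is
    # the smallest of the three, it is `smaller` itself.
    return max(smaller, min(larger, c))


print(median_of_three(1, 2, 3))  # prints 2
print(median_of_three(7, 8, 7))  # prints 7
```

Repeated values are not a problem: whichever of the equal values ends up in the middle position, the result is the same number.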
Additional long-winded explanation that gives you a sense of what is the purpose of the median function and how to find its value when you have an even number of elements:
Do not worry about the case when the number of values in your set is even. You are not being asked about this. Still, FYI: if you have an even number of values n in your set, which one do you choose? In this case you can define the median in different ways: it can be the (n/2)-th value, the (n/2 + 1)-th value, or the average of those two. Thus, depending on your definition you can have
Strictly speaking, the correct definition of the median would have been the third one (i.e. the average of the 2 middle values) but in practice any of the 3 definitions is fine, because the merit of the median is not to pin down the exact value in "the middle" but to discard values on the fringes, also known as outliers. For example, suppose that you want to find a representative value for the annual income of the players of my all-time favorite basketball team: the Chicago Bulls in their golden era (these values are made up; I am probably insanely off):
income(M. Jordan) = 20 million
If you choose the average as the representative income you will get 23.95/6 ≈ 4 million. However, this value does not represent the income of the players at all. It is heavily inflated by the income of a single person: M. Jordan. In other words, you have an outlier. A much better representative is found using the median. In this case you have:
Now, the median will be
The point here is that the exact value of the median, regardless of the definition, is in practice not important. 0.8, 0.95 and 0.875 are all equally good representatives of the income of the players, and any of them is a much better representative than 4 million. The value of the median was to help us get rid of the incomes that were disproportionately high or low. By the same token, the median will help you find that the wealth of a resident of Redmond, WA is more likely to be around the $1M given by the median. The value given by the average of, say, $50 million would have been disproportionately inflated by the wealth of the top executives of MS, who are not representative of the community at all.
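If you want to see the three definitions side by side, Python's standard `statistics` module happens to provide all of them. The sketch below is only an illustration. Apart from the 20 million figure and the two middle values (0.8 and 0.95), the incomes are placeholders I picked so that they add up to the 23.95 total quoted above; they are not the original list.

```python
import statistics

# Incomes in millions. Only 20.0, 0.95 and 0.8 come from the post;
# 1.0, 0.7 and 0.5 are placeholder values chosen to sum to 23.95.
incomes = [20.0, 1.0, 0.95, 0.8, 0.7, 0.5]

lower = statistics.median_low(incomes)    # the (n/2)-th value after sorting
upper = statistics.median_high(incomes)   # the (n/2 + 1)-th value after sorting
middle = statistics.median(incomes)       # the average of those two values
mean = statistics.mean(incomes)           # pulled upward by the one outlier

print("lower middle value:", lower)
print("upper middle value:", upper)
print("average of the two:", round(middle, 3))
print("mean of all values:", round(mean, 2))
```

All three median variants land between 0.8 and 0.95, while the mean comes out at roughly 4, far above what every player except the outlier earns.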
Thank you very much, great post.
answered 04 Mar '12, 21:50
Juan Carlos ...
Ohhh... Until now, I had confused mean with median. Thank you very much for posting.
answered 04 Mar '12, 22:01
I know I'm late to this, but I am still confused about the case where there are two identical numbers, for example (7,8,7). I have found the definition of median as "the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half." So how do we decide if the second 7 is in the lower, or higher half? If the second 7 is the median, it is not true that the first 7 is below it, so 7 is not really between the two halves.
answered 11 Mar '12, 16:43