# Sampling Distribution of the Sample Proportion as an Example of the Sampling Distribution of the Sample Mean

Proportion problems can be likened to a coin toss, where we think of a "heads" as 1, and a "tails" as 0. Proportions are related to percentages and probabilities, but proportions are (like probabilities) always given as numbers between 0 and 1. Examples include the proportion of

• Americans in favor of the death penalty,
• water samples that turn up E. coli,
• paint samples that turn up lead,

etc.

If all samples of paint turn up lead, then the proportion of contaminated samples is 1; if no samples turn up lead, then the proportion is 0; and generally, the truth lies somewhere between!

You might wonder how one could ever know whether a coin is fair (that is, the chance of a head is the same as the chance of a tail). The truth of the matter is, you can't. All conditions might point to a fair coin -- it may be perfectly symmetric, etc. -- but you'll never know for sure. This points to the existence of a parameter, which we call $\left.\pi \right.$ , and which indicates the true underlying proportion of heads (say). Note well: this $\left.\pi \right.$ is not the same as the $\left.\pi \right.$ that plays such an important role in the study of circles.

From a "frequentist's" perspective, the only way to understand $\left.\pi \right.$ is to toss the coin forever and see what happens to the ratio of heads to tosses. That's how we find the underlying parameters of the coin:

$\pi =\lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}x_{i}}{n}}$

Then the variance of that coin is

$\sigma ^{2}=\lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}(x_{i}-\pi )^{2}}{n}}=\lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}(x_{i}^{2}-2x_{i}\pi +\pi ^{2})}{n}}$

or

$\sigma ^{2}=\lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}x_{i}}{n}}-2\pi \lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}x_{i}}{n}}+\lim _{n\rightarrow \infty }{\frac {\Sigma _{i=1}^{n}\pi ^{2}}{n}}$

(because $x_{i}^{2}=x_{i}$ for the coin toss); or

$\sigma ^{2}=\pi -2\pi ^{2}+\pi ^{2}=\pi -\pi ^{2}=\left(1-\pi \right)\pi$

Therefore

$\sigma ={\sqrt {(1-\pi )\pi }}$

Therefore, according to the theory of the sampling distribution of the sample mean, the parameters of the distribution of the sample mean are

$\mu _{\overline {x}}=\pi$ ,

and

$\sigma _{\overline {x}}={\sqrt {\frac {(1-\pi )\pi }{n}}}$ .
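A quick simulation can make these two parameters concrete. The following is a minimal sketch, not part of the original derivation: the function names, the seed, and the values $\pi = 0.3$ and $n = 50$ are illustrative assumptions. It draws many samples of $n$ tosses, records each sample's proportion of heads, and compares the mean and standard deviation of those proportions with the formulas above.

```python
import math
import random
import statistics


def theoretical_parameters(pi, n):
    """Mean and standard deviation of the sample proportion for n tosses."""
    return pi, math.sqrt((1 - pi) * pi / n)


def simulate_sample_proportions(pi, n, num_samples, seed=0):
    """Draw num_samples samples of n tosses; return each sample's proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(1 if rng.random() < pi else 0 for _ in range(n)) / n
        for _ in range(num_samples)
    ]


# Illustrative values only (assumptions, not taken from the text).
pi, n = 0.3, 50
props = simulate_sample_proportions(pi, n, num_samples=20_000)
mu, sigma = theoretical_parameters(pi, n)

print(f"theory:     mean={mu:.4f}  sd={sigma:.4f}")
print(f"simulation: mean={statistics.mean(props):.4f}  "
      f"sd={statistics.pstdev(props):.4f}")
```

With enough samples the simulated values should sit close to the theoretical ones, though they will never match exactly.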

The proportion problem is interesting for (at least) two reasons:

1. once the mean is given, the standard deviation is known (somewhat unusual); and
2. our rule for when the normal distribution assumption is valid casts a shadow on the old lie that $n=30$ is magic. For the proportion problem the rule we use is

   $0<\pi -3\sigma _{\overline {x}}<\pi +3\sigma _{\overline {x}}<1$

   That is, $n=30$ won't save you: it's a rule of thumb, and whether it is good enough depends entirely on the underlying distribution of $x$.
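The rule for the proportion problem is easy to check numerically. The sketch below is one possible way to do so; the function names `normal_rule_holds` and `smallest_n`, the search cap `n_max`, and the listed values of $\pi$ are assumptions made for illustration. It tests the inequality for a given $\pi$ and $n$, and searches directly for the smallest $n$ that satisfies it.

```python
import math


def normal_rule_holds(pi, n):
    """Check 0 < pi - 3*sigma and pi + 3*sigma < 1 for the sample proportion."""
    sigma = math.sqrt((1 - pi) * pi / n)
    return 0 < pi - 3 * sigma and pi + 3 * sigma < 1


def smallest_n(pi, n_max=1_000_000):
    """Smallest n satisfying the rule, by direct search.

    Returns None if pi is 0 or 1 (the rule can never hold) or if no n
    up to the hypothetical cap n_max works.
    """
    if not 0 < pi < 1:
        return None
    for n in range(1, n_max + 1):
        if normal_rule_holds(pi, n):
            return n
    return None


# Illustrative values of pi (assumptions, not taken from the text).
for pi in (0.5, 0.4, 0.05, 0.01):
    print(f"pi={pi}: rule holds at n=30? {normal_rule_holds(pi, 30)}; "
          f"smallest n found: {smallest_n(pi)}")
```

For $\pi \leq 1/2$ the inequality reduces algebraically to $n > 9(1-\pi )/\pi$, so for $\pi = 0.05$ it requires $n > 171$, far more than 30. Values of $\pi$ where this bound is a whole number sit exactly on the boundary, and floating-point rounding may tip the search by one in either direction there.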