One-coin-toss sampling technique
Providing cover for answering embarrassing questions
One way of providing anonymity to respondents in a survey is to provide them "deniability": their answer may or may not reflect their actual state or feelings. Herein we present one way of doing this: the one-coin-toss sampling technique.
We start with a fair coin (that is, a coin that comes up heads half the time and tails half the time, on average). Of course, getting half heads "on average" means that sometimes we won't get exactly half heads. That randomness introduces some noise (error) into our calculations.
Then the question of interest is posed to the respondents: the question has two possible answers (one of which is "embarrassing" or "sensitive").
Respondents consult the coin to see what answer they should give to a question:
- if the coin lands heads, then a fixed answer is given (the "embarrassing" answer);
- if the coin comes up tails, then the honest answer is given (which may be the embarrassing answer or not, depending on the actual state of the respondent).
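The answering rule above can be simulated directly. The following is a minimal sketch, assuming a fair coin modeled by Python's `random` module and coding the "embarrassing" answer as `True`; the function names `respond` and `run_survey` are hypothetical, chosen only for illustration.

```python
import random

def respond(has_trait: bool, rng: random.Random) -> bool:
    """One respondent's answer under the one-coin-toss rule.

    True stands for the "embarrassing" answer (e.g. "Yes, I'm a space alien").
    """
    heads = rng.random() < 0.5   # fair coin: heads with probability 1/2
    if heads:
        return True              # heads: give the fixed (embarrassing) answer
    return has_trait             # tails: answer honestly

def run_survey(true_rate: float, n: int, seed: int = 0) -> int:
    """Simulate n respondents, each having the trait with probability true_rate,
    and return how many gave the embarrassing answer."""
    rng = random.Random(seed)
    yes_count = 0
    for _ in range(n):
        has_trait = rng.random() < true_rate
        yes_count += int(respond(has_trait, rng))
    return yes_count

if __name__ == "__main__":
    # A room of 100 people, none of whom (we assume) are space aliens.
    print(run_survey(true_rate=0.0, n=100, seed=1))
```

Note that a respondent who has the trait always answers `True`, whichever way the coin lands; only those without the trait are ever "covered" by a heads.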
This method provides cover to respondents so that they may answer embarrassing questions, but it does so at a cost: since we expect half of the answers to be junk, we lose half of our information (we lose "power"). For example, if we asked a room full of 100 people to use this method to determine the rate of space aliens in the population, about 50 would claim to be space aliens. If exactly 50 tossed heads, then we would correctly deduce that 0% of people are space aliens. If more heads came up, we would deduce a small positive rate of alienness; if fewer, a small negative one. In any event, we know that this is merely an estimate (I'm assuming a zero rate of alienness, but I may be wrong!).
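To see how much the coin's randomness moves the answer, we can repeat the 100-person space-alien survey many times. This sketch assumes, as the text does, that nobody in the room is a space alien, so every "yes" comes from a heads and the estimate works out to (heads - 50) / 50 (the calculation developed in the examples below). The function name `alien_estimates` is hypothetical.

```python
import random
import statistics

def alien_estimates(trials: int, n: int = 100, seed: int = 0) -> list[float]:
    """Repeat the zero-alien survey `trials` times; return the estimated rate each time.

    With no space aliens present, every "yes" comes from a coin landing heads,
    so the estimate (heads - n/2) / (n/2) is zero only when exactly half are heads.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        estimates.append((heads - n / 2) / (n / 2))
    return estimates

if __name__ == "__main__":
    est = alien_estimates(trials=10_000)
    print(f"mean estimate:              {statistics.mean(est):+.3f}")
    print(f"spread (std dev):           {statistics.stdev(est):.3f}")
    print(f"share of runs exactly at 0: {est.count(0.0) / len(est):.3f}")
```

The mean should land very near zero, but the spread should come out around 0.1 (ten percentage points), and only about 8% of runs hit exactly 50 heads. Individual surveys are noisy even though they are right on average.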
Let's see how the calculation proceeds via an example or two.
Example One
Let's consider the space alien question mentioned above. Suppose that 52 respondents report that they're space aliens. We want to estimate the true rate of space-alien-ness in the population.
We expect 50 junk answers, if there are half heads and half tails -- hence we would expect to see 50 reports of space aliens just because we expect 50 heads to be tossed. Those don't really contain any information -- the respondents were obligated to reply "Yes, I'm a space alien." So we want to throw out those 50 junk answers. So we calculate the true number of yesses as
$$\text{true yesses} = 52 - 50 = 2$$
Now, that's 2 out of how many? Not 100, because we don't have 100 good pieces of information. Half of our information is junk, so we also throw out the expected junk respondents -- 50 -- from the total group of respondents:
$$\text{good data} = 100 - 50 = 50$$
Then we estimate the rate of space-alien-ness as
$$\text{estimated rate} = \frac{\text{true yesses}}{\text{good data}} = \frac{2}{50} = 0.04,$$
or 4% space aliens in the population.
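The arithmetic of Example One can be written out line by line. This is just a sketch of the calculation above, with hypothetical variable names mirroring the text's "true yesses" and "good data".

```python
yes_answers = 52     # respondents who said "Yes, I'm a space alien"
respondents = 100

expected_junk = respondents / 2              # 50 heads expected, each a forced "yes"
true_yesses = yes_answers - expected_junk    # 52 - 50 = 2
good_data = respondents - expected_junk      # 100 - 50 = 50
estimated_rate = true_yesses / good_data     # 2 / 50

print(f"{estimated_rate:.0%}")  # 4%
```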
Now, that might seem a little high, but that's how the coins flipped; that's how the cookie crumbled. If the true rate is zero, the randomness of the coin tosses has given us the wrong answer; the good news is that we're in the ballpark.
The calculation simplified
We can do our calculation a little more easily if we work it out in general: suppose that we have N respondents, and that they toss their coins and we obtain Y yesses. What is the estimated rate?
We expect half of our answers to be junk: that works out to N/2 of them. Not only that, but they're all yesses. Hence
$$\text{True Yesses} = Y - \frac{N}{2}$$
$$\text{Good Data} = N - \frac{N}{2} = \frac{N}{2}$$
and the estimated rate of our embarrassing phenomenon is
$$\text{estimated Rate} = \frac{\text{True Yesses}}{\text{Good Data}} = \frac{Y - \frac{N}{2}}{\frac{N}{2}}$$
So this is our formula:
$$\text{estimated Rate} = \frac{2Y}{N} - 1$$
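As a check on the algebra, the step-by-step calculation (throw out N/2 junk yesses, divide by N/2 good answers) and the simplified formula 2Y/N - 1 can be compared exactly using rational arithmetic. This sketch uses Python's `fractions.Fraction` to avoid floating-point rounding; the two function names are hypothetical.

```python
from fractions import Fraction

def rate_step_by_step(y: int, n: int) -> Fraction:
    """Throw out the N/2 expected junk yesses, then divide by the N/2 good answers."""
    true_yesses = y - Fraction(n, 2)
    good_data = n - Fraction(n, 2)
    return true_yesses / good_data

def rate_formula(y: int, n: int) -> Fraction:
    """The simplified one-coin-toss formula: 2Y/N - 1."""
    return Fraction(2 * y, n) - 1

# Exact check that the two agree for every room size from 1 to 200
# and every possible count of yesses in that room.
for n in range(1, 201):
    for y in range(n + 1):
        assert rate_step_by_step(y, n) == rate_formula(y, n)

print(rate_formula(52, 100))  # 1/25, i.e. 4% (Example One)
```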
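For purposes of comparison with the two-coin-toss sampling technique, note that this can be rewritten in terms of the observed proportion of yesses, Y/N, as
$$\text{estimated Rate} = 2\left(\frac{Y}{N} - \frac{1}{2}\right)$$
(compare this with the corresponding formula for the two-toss method).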
Example Two
It's possible for the estimated Rate to be negative. What do you suppose that you should do in that case? (Zero!)
Let's reconsider the space alien question from above. Suppose that 48 respondents report that they're space aliens. We want to estimate the true rate of space-alien-ness in the population.
According to our formula,
$$\text{estimated rate} = \frac{2 \cdot 48}{100} - 1 = 0.96 - 1 = -0.04$$
We find it very hard to believe that there is -4% alienness in our environment. Since negative answers don't make any sense, we interpret them as zeros.
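A minimal sketch of the formula with this "negatives become zero" rule built in might look like the following; the name `one_coin_rate` and the input check are illustrative assumptions, not part of the method as described.

```python
def one_coin_rate(yes_answers: int, respondents: int) -> float:
    """Estimated rate under the one-coin-toss technique, reporting negatives as zero."""
    if respondents <= 0 or not 0 <= yes_answers <= respondents:
        raise ValueError("need 0 <= yes_answers <= respondents and respondents > 0")
    raw = 2 * yes_answers / respondents - 1   # can dip below zero by chance
    return max(raw, 0.0)                      # a negative rate makes no sense

print(one_coin_rate(48, 100))            # 0.0  (Example Two: raw estimate is about -0.04)
print(round(one_coin_rate(52, 100), 4))  # 0.04 (Example One)
```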
Example Three
One more example: let's suppose that we're looking for the true rate of AIDS in a population, where we expect 5% of the people to have the disease. We sample 500 people using the one-coin-toss technique, and 263 people answer yes, that they have AIDS. We estimate the true rate of AIDS to be
$$\text{estimated rate} = \frac{2 \cdot 263}{500} - 1 = 1.052 - 1 = 0.052$$
We estimate that 5.2% of the population has AIDS.
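Example Three can be reworked in the "true yesses over good data" form, which also shows how many people with the condition are estimated to be among the 263 yesses. This sketch uses exact fractions, and the variable names are hypothetical. It relies on one property of the rule: anyone who has the condition answers yes whichever way the coin lands.

```python
from fractions import Fraction

respondents = 500
yes_answers = 263

expected_junk = Fraction(respondents, 2)   # 250 forced "yes" answers expected from heads
true_yesses = yes_answers - expected_junk  # 263 - 250 = 13 honest yesses from the tails group
good_data = respondents - expected_junk    # 500 - 250 = 250 honest answers
rate = true_yesses / good_data             # 13/250 = 0.052

# Everyone with the condition answers "yes" on heads or tails, so the estimated
# number of such people among all the yesses is rate * respondents.
with_condition = rate * respondents        # 0.052 * 500 = 26

print(f"estimated rate: {float(rate):.1%}")
print(f"estimated people with the condition: {with_condition} of {yes_answers} yesses "
      f"({float(with_condition / yes_answers):.1%})")
```

This prints an estimated rate of 5.2%, and about 26 of the 263 yesses (9.9%) coming from people who actually have the condition.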
Cover costs
With 263 people answering that they have AIDS, do you think that the roughly 26 who truly have AIDS (5.2% of 500, every one of whom answers yes however the coin lands) would feel comfortable "disappearing into that crowd"?
The amount of cover that we provide to those who most require anonymity must correspond to the level of fear that they have of being discovered. But cover in this case means that we're losing data -- half of it. And that reduces our power to correctly estimate the rates that we're after. Can we do better? Yes! Surprisingly, there's a technique which requires two flips of the coin, but it permits us to use all the data. We might call it the two-coin-toss sampling technique.
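The loss of power can be made concrete by simulation. This sketch compares the spread of estimates from the one-coin-toss technique with a purely hypothetical direct survey in which everyone answers honestly, using the Example Three setting (a 5% true rate, 500 respondents). The function name `spread_of_estimates` and the honest-direct-survey baseline are illustrative assumptions.

```python
import random
import statistics

def spread_of_estimates(true_rate: float, n: int, coin: bool,
                        trials: int = 2000, seed: int = 0) -> float:
    """Standard deviation of the estimated rate over many simulated surveys.

    coin=True  -> one-coin-toss technique, estimate = 2*Y/N - 1
    coin=False -> hypothetical direct survey where everyone answers honestly
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        yes = 0
        for _ in range(n):
            has_trait = rng.random() < true_rate
            if coin and rng.random() < 0.5:
                yes += 1               # heads: forced "yes"
            else:
                yes += int(has_trait)  # tails (or no coin at all): honest answer
        estimates.append(2 * yes / n - 1 if coin else yes / n)
    return statistics.stdev(estimates)

if __name__ == "__main__":
    for coin in (False, True):
        label = "one-coin-toss" if coin else "direct (hypothetical)"
        print(f"{label:>22}: {spread_of_estimates(0.05, 500, coin):.4f}")
```

The direct survey's spread should come out near 0.01, while the one-coin-toss spread should be near 0.045, roughly four to five times wider. That widening is the price of the cover.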