Stop talking about “statistical ties”: What are the actual chances your favorite VP will win?

Note: I do some statistics here. If that bores you, skip to the very end of the page for the answer to the question in the title. Then read the whole post.


Rappler reports that according to the latest Pulse Asia survey, Bongbong Marcos, Leni Robredo and Chiz Escudero are in a “statistical tie” with each other. The term “statistical tie” here means that the three candidates’ proportions are within the poll’s margin of error of each other. Marcos has 25% while Robredo and Escudero have 23%. Pulse Asia reports the margin of error for their 1,800-person survey to be 2% at the 95% confidence level. So, accounting for sampling error alone, and if the elections were held on the day of the survey, we can be 95% confident that the interval between 23% and 27% includes Marcos’s true vote share, and that the interval between 21% and 25% includes Robredo’s or Escudero’s true vote share. Since these intervals overlap, the three are said to be “statistically tied.”

This is bullshit.

There is no such thing as the “margin of error of a poll”.

Where does the 2% figure that Pulse Asia is calling the margin of error of the poll come from? It is a shorthand for 1.96 multiplied by the upper bound of the standard error of a proportion from a survey question with only two choices.

I’ll unpack that. Suppose we have only two candidates, Ruby and Sapphire, and we conduct a poll among 1,800 people asking whether they would vote for Ruby or Sapphire if the elections were to be held right now. Also suppose that everyone answers either Ruby or Sapphire; there are no “don’t know” or “abstain” answers, and no one refuses to answer the question.

Suppose our poll yields 1,100 people going for Ruby and 700 people going for Sapphire. This means approximately 61% of people are going for Ruby and 39% of people are going for Sapphire.

This looks good for Ruby. However, we’re not done yet. Knowing that we only polled 1,800 people out of tens of millions of eligible voters, we need some way to estimate how much our 61-39 results could change if we could do the exact same poll on the exact same day over and over again with a different group of 1,800 people. This estimate is known as the standard error. It is calculated as follows:

$$\sqrt{\frac{0.61 \times 0.39}{1800}}$$

This gives us 0.0115. We then multiply this standard error by 1.96 to get our margin of error. Why 1.96? It is the 97.5th percentile of the standard normal distribution, which makes it the magic number if we specifically want a 95% confidence interval around our reported vote share. In this case, 1.96 multiplied by 0.0115 is 0.02254, which we can round to 2.2%, or 2%. We can then say, with 95% confidence, that the interval between 59% and 63% contains the true percentage of voters who will opt for Ruby, and that the interval between 37% and 41% contains the true percentage of voters who will opt for Sapphire. This looks pretty bad for Sapphire.
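For readers who want to check the Ruby and Sapphire arithmetic, here is a minimal sketch in R, the language used later in this post. It uses the rounded shares 0.61 and 0.39, as in the text; the variable names are my own, and `qnorm(0.975)` stands in for the rounded 1.96.

```r
# Hypothetical two-choice poll from the example: 1,800 respondents
n <- 1800
p_ruby <- 0.61                 # 1,100 / 1,800, rounded as in the text
p_sapphire <- 1 - p_ruby       # 0.39

# Standard error of a proportion when there are only two choices
se <- sqrt(p_ruby * p_sapphire / n)

# qnorm(0.975) is the 97.5th percentile of the standard normal (about 1.96)
z <- qnorm(0.975)
moe <- z * se

round(se, 4)                             # standard error
round(moe, 4)                            # margin of error
round(p_ruby + c(-1, 1) * moe, 2)        # 95% interval for Ruby
round(p_sapphire + c(-1, 1) * moe, 2)    # 95% interval for Sapphire
```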

What if we want to know the margin of error for our question before we even ask it? Technically we can’t do this, but we can figure out what the highest possible margin of error would be. All we have to do is to assume an imaginary survey where Ruby and Sapphire both get exactly 50% of the vote. Then

$$\sqrt{\frac{0.5 \times 0.5}{1800}}$$

will be the highest possible standard error, and 1.96 multiplied by that will be the highest possible margin of error. It turns out that in relatively close races, there isn’t much difference between this upper bound and what we actually get. The above equation evaluates to 0.0118, which for all intents and purposes is the same as 0.0115. We aren’t measuring airplane parts here – having precision to the fourth decimal place is an illusion. Multiplied by 1.96, this gives 2.3%, which still rounds to a margin of error of 2%. Voila, that’s where Pulse Asia and other pollsters such as SWS get their margin of error.
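A quick way to see why 50-50 gives the worst case is to evaluate the two-choice standard error over a grid of possible shares and see where it peaks. This is only an illustrative sketch; the grid spacing of 0.01 is an arbitrary choice of mine.

```r
n <- 1800

# Two-choice standard error sqrt(p * (1 - p) / n) for shares from 0 to 1
p_grid <- seq(0, 1, by = 0.01)
se_grid <- sqrt(p_grid * (1 - p_grid) / n)
p_grid[which.max(se_grid)]        # the share where the standard error peaks

# Worst-case standard error and margin of error (the pollsters' figure)
se_max <- sqrt(0.5 * 0.5 / n)
moe_max <- qnorm(0.975) * se_max
round(c(se_max = se_max, moe_max = moe_max), 4)

# Ratio of the Ruby/Sapphire standard error to the worst case
sqrt(0.61 * 0.39 / n) / se_max
```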

There are therefore two major problems with that 2% figure and other similar reported figures:

1.) The above is the margin of error for an estimated proportion, where there are only two choices – not the margin of error of an entire poll, which doesn’t make any conceptual sense unless your entire poll consists of nothing but questions with only two choices.

2.) The vice-presidential race – and the presidential race, for that matter – has more than two choices!

We have no fewer than six people in serious contention for vice-president: Marcos, Robredo, Escudero, Cayetano, Honasan and Trillanes. We also have a seventh choice: “don’t know/refused/none”, which was the answer of 4% of respondents in this survey.

When we’re playing with things like margins of error, what we’re really doing is trying to test the hypothesis that what looks like a difference between two candidates’ vote shares isn’t different enough from 0 to be worth talking about. This means that in the case of a question with seven choices, we need a separate margin of error for every pair of candidates.
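The pairwise formula used below follows from two standard multinomial results. Writing $\hat{p}_i$ for the sample share of choice $i$ and $n$ for the number of respondents, the covariance between two choices is negative, because every respondent who picks one candidate is a respondent who did not pick the other. Subtracting that negative covariance is what adds the $2p_ip_j$ term:

$$\operatorname{Var}(\hat{p}_i) = \frac{p_i(1-p_i)}{n}, \qquad \operatorname{Cov}(\hat{p}_i, \hat{p}_j) = -\frac{p_i p_j}{n} \quad (i \neq j)$$

$$\operatorname{Var}(\hat{p}_i - \hat{p}_j) = \operatorname{Var}(\hat{p}_i) + \operatorname{Var}(\hat{p}_j) - 2\operatorname{Cov}(\hat{p}_i, \hat{p}_j) = \frac{p_i(1-p_i) + p_j(1-p_j) + 2p_ip_j}{n}$$

With only two choices, $p_j = 1 - p_i$ and the numerator becomes $4p_i(1-p_i)$, so the standard error of the gap is exactly twice the two-choice standard error. Even in a two-way race, then, the margin of error for the lead is roughly double the figure pollsters report.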

Let’s work with our real data now. From the Pulse Asia survey, Marcos has 25%, or 0.25, and Robredo and Escudero have 23%, or 0.23. The formula for the standard error of the difference between Marcos and Robredo (or Escudero) if there are more than two choices is:

$$\sqrt{\frac{0.25(0.75) + 0.23(0.77) + 2(0.25)(0.23)}{1800}}$$

which gives us 0.0163. Multiplying that by 1.96 gives us 3.2%, or 3%.

This margin of error applies to the gap between the two candidates, not to each candidate’s share. It means that the 95% confidence interval for Marcos’s 2-point lead over Robredo runs from about -1.2 to +5.2 percentage points, which includes zero. Meanwhile, applying the formula to the difference between Escudero and survey fourth-placer Cayetano gives a margin of error of about 2.8%, far smaller than Escudero’s 9-point lead, so it seems safe to say that Escudero is well ahead of Cayetano.
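The pairwise calculation can be wrapped in a small R function. This is a sketch: `se_diff` is a name I made up, and the intervals it prints are for the lead itself (the difference in shares), which is the quantity a “statistical tie” claim is really about.

```r
# Standard error of the difference between two shares from the same
# multinomial sample; the negative covariance adds the 2 * p_i * p_j term
se_diff <- function(p_i, p_j, n) {
  sqrt((p_i * (1 - p_i) + p_j * (1 - p_j) + 2 * p_i * p_j) / n)
}

n <- 1800
z <- qnorm(0.975)

# Marcos (25%) vs Robredo or Escudero (23%): interval for the lead includes 0
moe_mr <- z * se_diff(0.25, 0.23, n)
lead_mr <- 0.25 - 0.23
round(lead_mr + c(-1, 1) * moe_mr, 3)

# Escudero (23%) vs Cayetano (14%): interval for the lead excludes 0
moe_ec <- z * se_diff(0.23, 0.14, n)
lead_ec <- 0.23 - 0.14
round(lead_ec + c(-1, 1) * moe_ec, 3)
```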

Okay, but your margins of error are larger than Pulse Asia’s. That means Marcos, Robredo and Escudero are in fact even more “statistically tied” than ever, right?

Yes, it does, and that’s meaningless. I haven’t actually gotten to why being “statistically tied” is bullshit yet – I just explained why the way we’ve been thinking about the margin of error has been wrong all along.

Here’s the thing, see – if two candidates are “statistically tied”, it just means that the gap between them is smaller than the margin of error, so we can’t be 95% confident that the leader is really ahead. It does not mean that two statistically tied candidates have an equal chance of winning. Marcos’s and Robredo’s/Escudero’s vote shares in this survey may be very close, but Marcos’s 2-point lead means that he does in fact have, at least according to this survey, a greater probability of winning the vice-presidential race if the election were to be held that day and if the survey is properly representative of the voting population.
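A back-of-the-envelope check of this point: treat Marcos and Robredo as a two-way comparison and use a normal approximation with the standard error of their difference. This approximation is my own illustration, not Pulse Asia’s method. It assumes the true shares equal the reported ones, and because it ignores Escudero it should overstate Marcos’s chance of finishing first; it only shows that a “tie” within the margin of error is far from 50-50.

```r
n <- 1800
p_m <- 0.25   # Marcos
p_r <- 0.23   # Robredo

# Standard error of the Marcos - Robredo difference (multinomial sample)
se <- sqrt((p_m * (1 - p_m) + p_r * (1 - p_r) + 2 * p_m * p_r) / n)

# Approximate probability that Marcos's share exceeds Robredo's
# in a repeat of this survey, under the normal approximation
p_marcos_ahead <- pnorm((p_m - p_r) / se)
round(p_marcos_ahead, 2)
```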

Rather than mess around with complicated formulas, the best way to get a close approximation of what exactly each vice-presidential candidate’s chances of winning are given this Pulse Asia survey is to simulate. Formally, the situation where we have 1,800 people each selecting one of 7 possible categories follows what is called a multinomial distribution. We can simulate 1,000,000 surveys of 1,800 people each using the proportions reported by the survey with the following code in the R programming language:

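```r
set.seed(9999)  # make the random draws reproducible
# 1,000,000 simulated surveys of 1,800 respondents each
simulation <- rmultinom(1000000, 1800, c(0.25, 0.23, 0.23, 0.14, 0.06, 0.05, 0.04))
```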

where 0.25, 0.23, 0.23, 0.14, 0.06, 0.05 and 0.04 are the vote shares for Marcos, Robredo, Escudero, Cayetano, Honasan, Trillanes and “don’t know/refused/none” respectively. I’ve assigned the results to the variable called simulation, a matrix with seven rows (one per choice) and 1,000,000 columns (one per simulated survey). This gives us 1,000,000 possible outcomes for 1,800 people going to the polls, where each individual’s chance of choosing each of the seven options is given by the survey results. The line with set.seed is there to ensure that the outcome of the simulation, which is a random process, can be reproduced if someone else runs this code.
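To see what the simulated matrix looks like before counting winners, here is a smaller sketch with 10,000 draws instead of 1,000,000. The row labels are my own additions, with “None” as a short stand-in for “don’t know/refused/none”.

```r
set.seed(9999)
p <- c(Marcos = 0.25, Robredo = 0.23, Escudero = 0.23, Cayetano = 0.14,
       Honasan = 0.06, Trillanes = 0.05, None = 0.04)

# Each column is one simulated survey of 1,800 respondents
sim <- rmultinom(10000, 1800, p)
rownames(sim) <- names(p)

dim(sim)                      # 7 rows (choices) by 10,000 columns (surveys)
all(colSums(sim) == 1800)     # every simulated survey has 1,800 respondents
sim[, 1:3]                    # the first three simulated surveys
```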

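One possible outcome might look like this, with Marcos winning:

[Figure: one simulated survey in which Marcos has the most votes]

While another possible outcome might look like this, with Robredo winning:

[Figure: one simulated survey in which Robredo has the most votes]

Or this, with Escudero winning:

[Figure: one simulated survey in which Escudero has the most votes]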

The following code goes through each of the 1 million simulations, one by one, and returns a table of how often each candidate won:

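```r
# Winner of each simulated survey (column); which.max() gives exact ties to the earlier row
table(apply(simulation, MARGIN = 2, FUN = which.max))
```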

which results in the following:

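[Table: simulated wins out of 1,000,000 surveys. Marcos (1): 821,550; Robredo (2): 89,411; Escudero (3): 89,039]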

Out of 1 million draws, Marcos won 821,550, Robredo won 89,411 and Escudero won 89,039. Translating these into probabilities, if the election were held that day and if the survey was properly representative of the voting population, Marcos has an 82% chance of winning the vice-presidency, while Robredo and Escudero have a 9% chance each. None of the other candidates won a single simulated survey, so their chances are effectively 0%.
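The same count can be turned directly into probabilities, with candidate names attached and the zero counts kept visible. This sketch runs 100,000 draws rather than 1,000,000 to keep it quick, so its figures will differ slightly from the table above even with the same seed; the labels, including “None” for “don’t know/refused/none”, are my own.

```r
set.seed(9999)
p <- c(Marcos = 0.25, Robredo = 0.23, Escudero = 0.23, Cayetano = 0.14,
       Honasan = 0.06, Trillanes = 0.05, None = 0.04)
n_sims <- 100000   # fewer draws than in the post, to keep the run short
sim <- rmultinom(n_sims, 1800, p)

# Row index of the top vote-getter in each simulated survey.
# which.max() returns the first maximum, so an exact tie is credited to
# whichever choice comes earlier in p (e.g. Marcos over Robredo).
winner <- apply(sim, MARGIN = 2, FUN = which.max)

# Factor levels keep choices that never won, with a count of 0
win_prob <- prop.table(table(factor(winner, levels = seq_along(p),
                                    labels = names(p))))
round(win_prob, 3)
```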

This is good news or bad news depending on your preferences. My aim was to show, however, that talking about “statistical ties” is not a useful or realistic way of gauging candidates’ chances.

EDIT: Some people have been asking for more details on the simulation: if I simulate 1,000,000 draws from a probability distribution with certain parameters, won’t I just get those parameters back?

The answer is yes, I will get those parameters back. That’s why I do 1,000,000 trials, in order to ensure that the empirically simulated results are extremely close to the theoretical distribution.

What I do above is not to determine what the probabilities are from the simulation. The probabilities are the vote shares obtained through the survey. We already know that if I simulate enough times, we will get those probabilities back.

I am not computing vote shares – I am using a distribution derived from vote shares in order to compute the probability that each candidate will have the most votes.
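The distinction can be shown side by side: averaging the simulated shares gives back the survey’s vote shares, while the fraction of surveys each choice wins is a different quantity altogether. A sketch with 100,000 draws; the column and row labels are my own.

```r
set.seed(9999)
p <- c(Marcos = 0.25, Robredo = 0.23, Escudero = 0.23, Cayetano = 0.14,
       Honasan = 0.06, Trillanes = 0.05, None = 0.04)
sim <- rmultinom(100000, 1800, p)

# Average simulated vote share per choice: this just recovers p
mean_share <- rowMeans(sim) / 1800

# Fraction of simulated surveys in which each choice has the most votes
winner <- apply(sim, MARGIN = 2, FUN = which.max)
win_prob <- tabulate(winner, nbins = length(p)) / ncol(sim)

round(data.frame(vote_share = p, mean_sim_share = mean_share,
                 win_prob = win_prob), 3)
```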

In mathematical terms:

Let $p_1, p_2, \dots, p_7$ be the vote shares of the seven choices (0.25, 0.23, 0.23, 0.14, 0.06, 0.05 and 0.04), in the same order as in the code.

Now let $X_1, X_2, \dots, X_7$ be the vote counts from a single draw of a multinomial distribution with $n = 1800$ trials and probabilities $p_1, p_2, \dots, p_7$.

I use the simulation to compute $P(\{X_1 > X_2\} \cap \{X_1 > X_3\} \cap \{X_1 > X_4\} \cap \{X_1 > X_5\} \cap \{X_1 > X_6\} \cap \{X_1 > X_7\})$,

which is the probability that $X_1$, here meaning Marcos, leads all the other choices. For Robredo, it would be

$P(\{X_2 > X_1\} \cap \{X_2 > X_3\} \cap \{X_2 > X_4\} \cap \{X_2 > X_5\} \cap \{X_2 > X_6\} \cap \{X_2 > X_7\})$, and so on.

Theoretically, these probabilities are what we would get if we repeated this survey an infinite number of times. In practice, 1,000,000 is large enough to approximate this very well.
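One small wrinkle: the definitions above use strict inequalities, while which.max() in the main code hands an exact tie to whichever candidate comes first. The sketch below follows the strict definition and reports ties for first place separately. It uses 100,000 draws rather than the post’s 1,000,000, so its figures will differ slightly; the labels are my own.

```r
set.seed(9999)
p <- c(Marcos = 0.25, Robredo = 0.23, Escudero = 0.23, Cayetano = 0.14,
       Honasan = 0.06, Trillanes = 0.05, None = 0.04)
sim <- rmultinom(100000, 1800, p)
rownames(sim) <- names(p)

# Largest count in each simulated survey (column)
top <- apply(sim, MARGIN = 2, FUN = max)

# is_top[k, s] is TRUE when choice k has the top count (alone or tied) in survey s
is_top <- sweep(sim, MARGIN = 2, STATS = top, FUN = "==")
n_top <- colSums(is_top)

# Strict win: X_k greater than every other X_j, i.e. the sole top count
strict_win <- rowSums(is_top[, n_top == 1, drop = FALSE]) / ncol(sim)

# Fraction of surveys with a tie for first place
tie_for_first <- mean(n_top > 1)

round(strict_win, 4)
tie_for_first
```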

If the difference between vote share and probability of winning confuses you, think about it this way: If Trillanes gets 5% of the vote in a survey, it doesn’t mean that Trillanes’s probability of winning is 5%.
