The latest surveys by SWS and Pulse Asia are out. Here are 5 things that social scientists, journalists, policymakers, political strategists, and other people who care about these sorts of things should keep in mind when reading the press releases:
1. Technically, each question has its own margin of error.
The margin of sampling error is calculated using the formula for the standard error of a binomial proportion multiplied by 1.96:

$$\text{MOE} = 1.96\sqrt{\frac{p(1-p)}{n}}$$

where $p$ is the proportion giving some response to a question and $n$ is the sample size. For example, if 80% of people in a survey of size 1,500 said that they were satisfied with President Duterte, then $p$ is 0.8 and $n$ is 1,500. The margin of sampling error would then be $1.96\sqrt{0.8(0.2)/1500} \approx 0.02$, or 2%.
Each question will have a different $p$; therefore, each question has its own margin of sampling error. It is generally considered too bothersome for the purposes of a press release to give every single number its own margin of sampling error. Therefore, press releases err on the side of caution.
The largest possible value for the margin of sampling error is achieved when $p = 0.5$. This will result in a margin of sampling error of 2.5% for a survey of size 1,500, as with SWS, or 3% for a survey of size 1,200, as with Pulse Asia. Since it is better to assume more uncertainty than less, pollsters simply take the maximum margin of sampling error and say that it’s the margin of sampling error for every question.
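To make the arithmetic concrete, here is a minimal Python sketch of the calculation (the `moe` helper is just an illustrative name, not anything the polling firms publish):

```python
import math

def moe(p, n, z=1.96):
    """Margin of sampling error for a single proportion p at sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(moe(0.8, 1500))  # ~0.020 -> the 2% in the example above
print(moe(0.5, 1500))  # ~0.025 -> SWS's reported maximum
print(moe(0.5, 1200))  # ~0.028 -> Pulse Asia's reported maximum (rounds to 3%)
```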
2. The margins of error for “net” questions and for changes over time are *much* larger.
SWS in particular likes to report “net” statistics. For example, they report that President Duterte has a “net satisfaction rating” of 48, which is calculated by taking the % who said they were satisfied minus the % who said they were dissatisfied.
The formula for calculating the margin of error of a “net” statistic – technically, the standard error of the difference between two proportions from a multinomial distribution multiplied by 1.96 – is as follows:

$$\text{MOE}_{\text{net}} = 1.96\sqrt{\frac{p_1(1-p_1) + p_2(1-p_2) + 2p_1p_2}{n}}$$

where, for example, $p_1$ would be % satisfied and $p_2$ would be % dissatisfied. Again, this means that each question would have its own margin of error; and again, for simplicity, we would just assume the maximum margin of error and apply that to all questions. The maximum margin of error is achieved if both $p_1$ and $p_2$ are 0.5. For a survey of size 1,500, then:

$$1.96\sqrt{\frac{0.5(0.5) + 0.5(0.5) + 2(0.5)(0.5)}{1500}} = 1.96\sqrt{\frac{1}{1500}} \approx 0.05,$$

or 5%.
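A quick check in Python, in the same hedged spirit as the sketch above (`moe_net` is an illustrative name):

```python
import math

def moe_net(p1, p2, n, z=1.96):
    """MOE for a net statistic p1 - p2 computed from one multinomial sample."""
    return z * math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)

print(moe_net(0.5, 0.5, 1500))  # ~0.051 -> the 5% maximum for an SWS-sized survey
```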
The margin of error for a net statistic is twice the reported margin of sampling error. Keep this in mind when reading SWS reports. Pulse Asia isn’t a fan of reporting net statistics.
SWS also likes to report the change in the “net” over time. For example, it reports that President Duterte’s net satisfaction rating fell from 66 in June 2017 to 48 in Sept 2017. Guess what? The margin of sampling error for the change in the net statistic is even larger.
Let $p_1$ and $p_2$ be % satisfied and % dissatisfied in June 2017, and let $q_1$ and $q_2$ be % satisfied and % dissatisfied in Sept 2017. Both surveys have the same sample size of $n = 1{,}500$. Then the margin of sampling error for the 18-point change in Duterte’s net satisfaction rating is:

$$1.96\sqrt{\frac{p_1(1-p_1) + p_2(1-p_2) + 2p_1p_2}{n} + \frac{q_1(1-q_1) + q_2(1-q_2) + 2q_1q_2}{n}}$$

The maximum value of this margin of sampling error is achieved when all of those $p$’s and $q$’s are 0.5, which comes out to $1.96\sqrt{2/1500} \approx 7.2\%$.
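And the corresponding sketch, which simply combines the variances of the two independent surveys:

```python
import math

def moe_net_change(p1, p2, q1, q2, n, z=1.96):
    """MOE for the change in a net statistic across two independent surveys of size n."""
    v1 = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    v2 = (q1 * (1 - q1) + q2 * (1 - q2) + 2 * q1 * q2) / n
    return z * math.sqrt(v1 + v2)

print(moe_net_change(0.5, 0.5, 0.5, 0.5, 1500))  # ~0.072 -> the 7.2% maximum
```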
So if you want to look at the % of people in a single survey who were satisfied, then the margin of sampling error is 2.5%. If you want to look at the “net satisfaction”, the margin of sampling error is 5%. If you want to look at the change in the net satisfaction over two periods, the margin of sampling error is 7.2%.
3. The reported margin of error is almost certainly too low.
As stated above, SWS reports a margin of sampling error of 2.5% for their surveys of size 1,500, while Pulse Asia reports a margin of sampling error of 3% for their surveys of size 1,200. These numbers are accurate according to the formulas above. They are also wrong.
The formulas above all assume independent observations from a simple random sample. That is, they imagine a process where we have a list of every single eligible adult in the Philippines, and we randomly pick 1,200 or 1,500 people from that list and interview them.
That list doesn’t exist.
Instead, what polling firms do is they take a list of every single municipality and city in the Philippines, and select some number of them to conduct interviews in. Municipalities and cities with more households in them are more likely to be selected.
Then they take a list of every single barangay in the selected areas, and select some number of them to conduct interviews in. Barangays with more households in them are more likely to be selected.
Then they select five households in each barangay to interview by choosing a random starting point and sampling every 7th house (or 5th, or whatever, depending on the size of the barangay).
Finally, they knock on the door of a household, ask how many people age 18 and above live there, and choose one of them at random to interview.
Now the issue is that people from the same area are more likely to share the same opinions than people who aren’t from the same area. In other words, a sample of 1,500 obtained like this doesn’t actually contain 1,500 independent observations; it contains 300 clusters of 5 people each whose opinions are somewhat more similar to each other.
The formula for calculating the margin of sampling error from a multi-stage cluster sampling design like this is quite complicated. However, the resulting margin will certainly be larger than what is reported. Unfortunately, I cannot recalculate the margin of sampling error myself, because I would need to know the mean response for every single cluster, and that isn’t happening without access to the raw data.
As an example, however, consider the survey conducted by Princeton Survey Research Associates International on behalf of the Pew Research Center in the Philippines. (Click the link and search “Philippines”). The survey had a sample size of 1,000. If SWS or Pulse Asia had done that survey, they would have reported a margin of sampling error of 3.1%; however, taking clustering into account, PSRAI reports the margin of sampling error to be 4.3%.
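A common back-of-the-envelope tool here is the Kish design effect, $\text{deff} = 1 + (m - 1)\rho$, where $m$ is the cluster size and $\rho$ is the intraclass correlation. The sketch below backs out the design effect implied by the PSRAI numbers; the cluster size of 5 is my assumption for illustration, not something PSRAI reports:

```python
# Back out the design effect implied by PSRAI's reported vs. naive MOE.
naive_moe = 0.031     # simple-random-sampling MOE for n = 1,000
reported_moe = 0.043  # PSRAI's clustering-adjusted MOE

deff = (reported_moe / naive_moe) ** 2
print(deff)  # ~1.92: each respondent is "worth" only about half an independent draw

# Assuming clusters of 5 respondents (an assumption, not a PSRAI figure),
# deff = 1 + (m - 1) * rho implies:
m = 5
rho = (deff - 1) / (m - 1)
print(rho)   # ~0.23 intraclass correlation
```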
4. There is no consistent, objective measure of socioeconomic class.
Both SWS and Pulse Asia like to report statistics for “Class ABC”, “Class D”, and “Class E”, but there is no accepted definition of who falls into which class. I emailed Ronald Holmes, Pulse Asia Research President, asking him how Pulse Asia determines whether a respondent is in class ABC, D, or E, and he replied:
A number of factors are used to classify households that are sampled. These factors include total household income; household facilities/furnishings; occupation of household head; educational attainment; home ownership; home maintenance; durability of the home; and, conditions of the neighborhood, among others. The indicators are culled from prior social science/market research and our enumerators document these indicators for subsequent classification into socio-economic classes of the sampled respondents/household.
There are exact criteria but there are also other criteria subject to the judgment of the enumerator. We do regularly ask about occupation of the household head, educational attainment and home ownership but not regularly on household income.
In other words, the field staff have to make some judgments about aspects of a person’s socioeconomic class via observation, and then SWS and Pulse Asia determine socioeconomic class via some index that may differ between the two firms.
The Class ABCDE construct is itself largely an invention of market research. For example, Nielsen divides the population of the Czech Republic into eight classes – A, B, C1, C2, C3, D1, D2 and E – and fits a regression model on variables such as household composition, occupation of the household head, household equipment, household income, education of the household head, and region of the country to assign each person a ‘score’. The top 12.5% are considered class A, the next 12.5% class B, and so on until the bottom 12.5% go to class E, such that an equal number of people are in each class.
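Here is a toy sketch of that kind of scoring scheme, with made-up scores standing in for the regression output (Nielsen’s actual model and cutoffs are not public in this form):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # stand-in for model-predicted household scores

# Cut the score distribution into eight equal octiles, lowest (E) to highest (A).
labels = ["E", "D2", "D1", "C3", "C2", "C1", "B", "A"]
cutoffs = np.quantile(scores, np.arange(1, 8) / 8)
classes = [labels[np.searchsorted(cutoffs, s)] for s in scores]

print({label: classes.count(label) for label in labels})  # 125 households (12.5%) each
```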
However, SWS and Pulse Asia do not do it this way. I do not know exactly how they construct their socioeconomic class measure, but class ABC typically makes up less than 10% of the sample. This implies larger margins of error for statistics calculated over class ABC only, just like how Luzon, Visayas and Mindanao have larger margins of error.
For example, if we assume that only 150 out of 1,500 people in an SWS survey are class ABC, the margin of sampling error for class ABC would be 8%. The margin of sampling error for the change over time between two measures would be 11%.
As previously discussed, the margin of sampling error for a “net satisfaction” rating would be 8% doubled, or 16%. The margin of sampling error for the change over time in the net satisfaction rating would be a whopping 22.6%. This means that all but the most cataclysmic change in the net satisfaction rating of class ABC would be within sampling error.
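These subgroup figures are easy to verify with the same illustrative arithmetic from earlier:

```python
import math

n_abc = 150  # assumed Class ABC subsample out of 1,500

base = 1.96 * math.sqrt(0.25 / n_abc)  # worst-case single-proportion MOE
print(base)                    # ~0.080 -> 8%
print(base * math.sqrt(2))     # ~0.113 -> ~11% for a change over time
print(base * 2)                # ~0.160 -> 16% for a net statistic
print(base * 2 * math.sqrt(2)) # ~0.226 -> 22.6% for the change in a net statistic
```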
Class D typically makes up about 60% of a sample, while class E typically makes up about 30% of a sample.
5. The margin of sampling error is purely statistical; it does not include error that comes from contextual factors, from undercoverage, or from nonresponse.
All sorts of things can affect what response someone gives to a question. Here’s a (picture of a) slide from the University of Michigan summarizing these things:
The survey research literature has found, for example, that:
- Older people and people with less education tend to give more agreeable responses (satisfied, approve, trust, etc.), regardless of what the question is asking about.
- Many respondents will give the “socially desirable” answer rather than their true answer when asked about topics such as how often they smoke, or their views about poor people. This effect is exacerbated when a live interviewer is present, as is the case with SWS and Pulse Asia, and persists, to a lesser degree, over the phone. It pretty much disappears online, where the respondent has more anonymity.
- Many respondents will also give “socially desirable” answers if the interviewer looks like they might prefer it, or if someone else is present in the room. Interviewers generally request that they be allowed to survey someone alone, but they can’t really force it. If you are interviewing someone about their trust in Duterte, and their spouse is in the room wearing a Duterte shirt, then you can expect that they will indicate trust in Duterte regardless of what their true beliefs are. Some LGUs also insist that surveys be conducted with LGU officials present, which makes political measures less trustworthy.
- According to SWS and Pulse Asia, respondents tend to open up more to female interviewers, which is why their field interviewers are all female.
- The longer a survey goes, the more a respondent wants to get it over with. (You don’t really need a lit review to figure this out). This may result in “default” answers.
Undercoverage is also a real problem. For obvious reasons, the list of barangays will not include heavily conflict-affected barangays where the interviewer’s life would be threatened. Filipinos who are abroad also have zero probability of being selected (though balikbayans who happen to be in the country could still get sampled). Time of day may also affect who is available. I do not know exactly what the field protocols are, but if, for example, interviewers are only in the field during working hours, then the sample will largely consist of people who are unemployed or who work from home. And what about exclusive gated communities, where you can’t even wander around without getting past a security guard? The government might be able to pull it off with official census workers, but private firms will have less luck.
Nonresponse is another problem that we have almost no information on. According to Pulse Asia,
Respondents sampled who were not available during first attempt were visited again with a maximum of two valid call backs. If the respondent remained unavailable after two valid call backs, a substitute who possessed the same qualities (in terms of gender, age bracket, working status and socio-economic class) as the original respondent was interviewed. The substitute respondent was taken from another household beyond the covered intervals in the sample barangay by continuing the interval sampling.
The primary concern here is that people who are willing to respond to surveys may have different opinions from people who are not. To my knowledge, this has never been studied in the Philippines; I’m not even sure how you would do so. The Pew Research Center study of telephone nonresponse in the United States was able to do things like compare telephone surveys, which have very low response rates, to much more expensive face-to-face surveys with high response rates, or to publicly available voter records, in order to check whether the responding sample looked systematically different. We can’t do anything comparable in the Philippines. On the other hand, response rates in Philippine face-to-face surveys reportedly run above 50%, so to my mind nonresponse isn’t as big a problem as undercoverage and contextual effects.
Here’s the summary: numbers from a survey have much greater uncertainty than the polling firms claim. Surveys are useful indicators of public opinion, but we should not assign too much weight to, say, a 7-point decline in ‘net satisfaction’ between two surveys. (By contrast, the 18-point decline discussed above is worth taking seriously.)