Explaining our model
Attempting to predict an election is an inherently uncertain undertaking. Most election projections, including our own, make use only of opinion polls and previous election results. This means that most election projections attempt to answer the question of what would happen if the election were held today. Some projections will attempt to model the evolution of public opinion over time, trying to make a proper prediction for an election further into the future, but in most cases, the factors considered in such a model are quite close to subjective speculation.
Some projections, most notably in the US, will factor in approval ratings for current government leaders, bias from individual polling firms based on past performance, and correlation with electoral results in similar elections elsewhere, whether predicted or observed, for example correlating the nearby and relatively similar US states of Florida and Georgia. Such a model is simpler to build in a stable two-party system like the US, where federal elections are run independently by the states but the national political climate crosses state lines.
In Europe the electoral systems are numerous, the very existence of many major parties between elections is in constant flux, and the political climate of each member state is, at least in what can be reliably modelled, in many ways independent of its neighbours.
How do wEE do it?
Europe Elects makes use of a probabilistic projection of opinion polls in individual countries to a total European election result. To achieve this, an initial projection is made independently for each constituency, be that a nation or a region (such as the divisions in Belgium or Ireland). The independently calculated constituency results are then combined probabilistically to achieve a total result for the European Parliament that is not the simple sum of the calculated national results, but a function of them.
In each constituency, Europe Elects considers polls published with a sample size in the previous 30 days, using only the latest poll published by each firm. Should there be fewer than five such polls, the model will reach back up to 90 days to collect five reliable polls. Lacking a published sample size, the lowest sample size in our dataset published by the pollster is used; lacking even that, the poll is considered too unreliable and is not included in the model. Polls asking directly about the European Election are preferred, but since those are rare and national polls are plentiful, the latter are often used. Should no polls be available, as in the case of the German-speaking community in Belgium, the 2014 election results are used.
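The poll selection rules above can be sketched roughly as follows. This is an illustration only, not the actual Europe Elects code: the function name `select_polls`, the dictionary keys, and the exact way the window is extended (take the most recent polls within 90 days until five are collected) are assumptions.

```python
from datetime import date

def select_polls(polls, today, min_polls=5):
    """Pick the polls to use for one constituency (hypothetical sketch).

    Each poll is a dict with keys "firm", "date" (a datetime.date) and
    "sample_size" (an int, or None if the pollster did not publish one).
    """
    # Lowest published sample size per firm, used to fill in missing ones.
    lowest_known = {}
    for p in polls:
        if p["sample_size"] is not None:
            previous = lowest_known.get(p["firm"], p["sample_size"])
            lowest_known[p["firm"]] = min(previous, p["sample_size"])

    # Drop polls whose sample size cannot be recovered at all.
    usable = []
    for p in polls:
        size = p["sample_size"]
        if size is None:
            size = lowest_known.get(p["firm"])
        if size is not None:
            usable.append({**p, "sample_size": size})

    # Keep only the latest poll from each firm.
    latest = {}
    for p in usable:
        if p["date"] > today:
            continue
        if p["firm"] not in latest or p["date"] > latest[p["firm"]]["date"]:
            latest[p["firm"]] = p
    by_recency = sorted(latest.values(), key=lambda p: p["date"], reverse=True)

    # First try the 30-day window; if too few polls, reach back up to 90 days.
    recent = [p for p in by_recency if (today - p["date"]).days <= 30]
    if len(recent) >= min_polls:
        return recent
    extended = [p for p in by_recency if (today - p["date"]).days <= 90]
    return extended[:min_polls]
```

If no poll survives the filter, the function returns an empty list, which is where the fallback to the 2014 results would kick in.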
Bear with me through some math
Opinion polls usually only give information on a national level. In the case of a single national constituency, this works just fine on its own, but in some cases, European elections are decided in subnational constituencies. To account for this, other election projections often take each party's national gain or loss since the last election and add it to the party's previous result in the constituency: a party that won 22% of the national vote and 15% in a certain constituency in 2014 and is now polling at 24% nationally will be modelled as receiving 15 + 2 = 17% in the same constituency. This works fine for large parties, but for smaller parties, of which there are oh so many in Europe, this method would yield very large swings and potentially calculate negative results: a party that received 6% nationally and 2% in one constituency in 2014 and is now polling at 3% would be calculated as receiving 2 - 3 = -1% of the vote in that constituency.
Europe Elects opts for a different method: multiplicative uniform regional swing. Instead of adding the difference, the proportional swing is calculated and multiplied by the last constituency result.
In the first example above, the 15% constituency result from 2014 would be multiplied by 24/22 ≈ 1.09, modelling a 2019 result in the hypothetical constituency of about 16.4%. This method is better for calculating the results for smaller parties but may be a little overzealous for larger parties: it makes a rising, already large party swing considerably in districts where it has already performed well, when its gains may instead be coming from districts where it performed poorly in the past.
For new parties, poll numbers are assumed to be uniform in all constituencies unless a better distribution can be assumed; for example, a split-off of another party will start out with the same profile as its parent.
In the cases of Ireland, Northern Ireland, and Malta, which use the Single Transferable Vote system, the constituencies are modelled as D’Hondt, making them the only exceptions to our use of the correct local electoral law in the next step.
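To make the difference between the two swing methods concrete, here is a minimal sketch that reruns the small-party example from above. The function names are made up for illustration; all numbers are percentages.

```python
def additive_swing(national_prev, national_now, constituency_prev):
    """Add the national change in percentage points to the old local result."""
    return constituency_prev + (national_now - national_prev)

def multiplicative_swing(national_prev, national_now, constituency_prev):
    """Scale the old local result by the ratio of new to old national share."""
    return constituency_prev * (national_now / national_prev)

# Small party: 6% nationally and 2% locally in 2014, now polling at 3%.
print(additive_swing(6, 3, 2))        # -1, an impossible vote share
print(multiplicative_swing(6, 3, 2))  # 1.0, halved along with the national share
```

The multiplicative version can never go below zero, since it only ever rescales a non-negative number by a non-negative ratio.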
The constituency results are then calculated probabilistically for each poll included in the projection, applying the correct local electoral law, a feature that is unique to the Europe Elects model. This is done with weighted Monte Carlo simulations: essentially, the election is simulated many times over on a poor suffering laptop that runs for 2 days to obtain the probability distribution of the results.
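For the curious, a heavily simplified sketch of such a simulation for a single poll in a single D’Hondt constituency might look like this. It rests on several assumptions that are not taken from the actual model: polling error is treated as pure sampling noise (each simulated election redraws a poll-sized sample of voters), the weighting is left out, and electoral thresholds are ignored. The function names are hypothetical.

```python
import random
from collections import Counter

def dhondt(votes, seats):
    """Allocate seats with the D'Hondt highest-averages method."""
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # The next seat goes to the party with the highest quotient.
        winner = max(votes, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation

def simulate_seat_pmf(shares, sample_size, seats, n_sims=2000, seed=0):
    """Estimate, per party, the probability of winning 0..seats seats.

    Assumption: uncertainty is modelled only as sampling noise, by drawing
    `sample_size` voters from the polled shares in every simulated election.
    """
    rng = random.Random(seed)
    parties = list(shares)
    weights = [shares[p] for p in parties]
    tallies = {p: Counter() for p in parties}
    for _ in range(n_sims):
        sample = Counter(rng.choices(parties, weights=weights, k=sample_size))
        result = dhondt({p: sample.get(p, 0) for p in parties}, seats)
        for p in parties:
            tallies[p][result[p]] += 1
    # pmf[p][k] is the share of simulations in which party p won k seats.
    return {p: [tallies[p][k] / n_sims for k in range(seats + 1)] for p in parties}
```

Swapping `dhondt` for another allocation function is what applying the correct local electoral law would amount to in this sketch.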
The results for each constituency are combined by averaging the probability mass functions (those graphs that look like a bell) calculated for each poll. For example, what has been calculated so far is essentially the probability that the Italian PD-S&D gets x seats and, independently, the probability that British Lab-S&D gets y seats, and so on.
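Averaging probability mass functions is simple enough to show in a few lines. This sketch assumes a plain unweighted average and represents each function as a list where index k holds the probability of winning k seats; the name `average_pmfs` is made up.

```python
def average_pmfs(pmfs):
    """Average several seat distributions for the same party, entry by entry."""
    longest = max(len(pmf) for pmf in pmfs)
    # Pad shorter lists with zeros so every list covers the same seat counts.
    padded = [list(pmf) + [0.0] * (longest - len(pmf)) for pmf in pmfs]
    return [sum(column) / len(pmfs) for column in zip(*padded)]

# One poll says a coin flip between 0 and 1 seat, another says 1 seat for sure.
print(average_pmfs([[0.5, 0.5], [0.0, 1.0]]))  # [0.25, 0.75]
```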
These results are then combined into a single European result by convolving the probability mass functions of the parties in each European Parliament group, for example calculating for S&D the probability that the PD will get x seats plus Labour will get y seats plus Spanish PSOE will get z seats, and so on for every party in S&D; the same is done for each group independently.
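Convolution sounds scarier than it is: the distribution of a sum of independent seat counts is obtained by going through every combination of outcomes and adding up the probabilities that lead to the same total. A minimal sketch, with hypothetical function names and lists indexed by number of seats:

```python
def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent seat counts."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, prob_a in enumerate(pmf_a):
        for j, prob_b in enumerate(pmf_b):
            # i seats for one party and j for the other give i + j in total.
            out[i + j] += prob_a * prob_b
    return out

def group_pmf(party_pmfs):
    """Fold the member parties of one EP group into a single distribution."""
    total = [1.0]  # zero seats with certainty before any party is added
    for pmf in party_pmfs:
        total = convolve(total, pmf)
    return total

# Two parties, each with a 50/50 chance of winning 0 or 1 seat.
print(group_pmf([[0.5, 0.5], [0.5, 0.5]]))  # [0.25, 0.5, 0.25]
```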
In short, elections in EU member states are independent of each other by any practical measurement. To accurately represent the local political climate and factor in the correct local electoral law, the election is simulated independently for each country, and the results are then combined probabilistically for each EP group independently. A total EU-wide simulation would be neither practical nor useful, and often factually incorrect, as it would only very approximately account for local electoral systems.
This method leaves us with a separate bell curve for each EP group, and we take the median of each one as the predicted number of seats for that group. The sum of the medians will not add up to 751; one more step is required for that, but it will be close.
The reason for the medians not adding up to the total seats can be understood by imagining 3 parties competing for a single seat in one constituency, each one having an equal probability of winning. The probability mass function for each party will show it with a 1/3 probability of winning one seat and a 2/3 probability of netting 0 seats. The median result will, therefore, be 0 for all three, with a confidence interval of 0-1. The sum of the medians is 0+0+0 = 0, not 1, yet the result is probabilistically correct.
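The three-party example can be checked in a few lines. The helper `median_seats` is a made-up name; it returns the smallest seat count at which the cumulative probability reaches one half.

```python
def median_seats(pmf):
    """Smallest number of seats whose cumulative probability is at least 0.5."""
    cumulative = 0.0
    for seats, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= 0.5:
            return seats
    return len(pmf) - 1  # only reached if rounding keeps the total below 0.5

# Three parties, each with a 2/3 chance of 0 seats and a 1/3 chance of 1 seat.
three_way_race = [[2 / 3, 1 / 3]] * 3
medians = [median_seats(pmf) for pmf in three_way_race]
print(medians, sum(medians))  # [0, 0, 0] 0
```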
The median result for each EP group will have to be normalized to a total of 751, making up for all the cases like the one described above by distributing the missing seats proportionally.
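One way this normalization could be done is sketched below. The proportional scaling follows the description above, but the rounding rule (largest remainder, so that the result is made of whole seats) is an assumption, as are the function name and the invented group medians in the example.

```python
def normalise(medians, total=751):
    """Scale group medians proportionally so they sum to `total` whole seats."""
    raw_sum = sum(medians.values())
    scaled = {group: m * total / raw_sum for group, m in medians.items()}
    seats = {group: int(value) for group, value in scaled.items()}  # round down
    leftover = total - sum(seats.values())
    # Assumption: seats lost to rounding go to the largest fractional parts.
    by_remainder = sorted(scaled, key=lambda g: scaled[g] - seats[g], reverse=True)
    for group in by_remainder[:leftover]:
        seats[group] += 1
    return seats

# Invented medians for four placeholder groups, summing to 740 rather than 751.
print(normalise({"A": 180, "B": 150, "C": 100, "D": 310}))
# {'A': 183, 'B': 152, 'C': 101, 'D': 315}
```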
Once a week, a poor suffering laptop goes through all the above steps: projecting national polls onto constituencies, simulating the national election many times to achieve probabilistic results per party, averaging said results, calculating an EU-wide probabilistic result, and normalizing to 751 seats. That laptop won’t live long.
Alex runs the Europe Elects website, hopefully from a different laptop.