Quick Summary

Vote splitting leads to the less preferred side of the political spectrum winning 12%-18% of all MLA elections in BC. Right-of-center parties have benefited disproportionately: in 345 general election and by-election contests since 2005, the right has been elected with a smaller share of the vote 49 times (14% of all contests). In contrast, the left has won with a minority share just twice.

Introduction

This brief analysis explores the impacts of vote splitting on elections in British Columbia. Central to this conception of vote splitting is the left/right political spectrum. I examine the binary case in which most parties can be characterized as being “right of center” (RoC) or “left of center” (LoC). The assumption is that citizens, whether due to philosophical convictions, personal circumstances, or some other reason, have reasonably strong and stable preferences for one side of the political spectrum.1

Vote splitting occurs when (a) more than two candidates/parties are vying for a seat and (b) the candidates/parties are distributed unevenly along the political spectrum. Consider the following example:

In a standard first-past-the-post (FPTP) election, the LoC party wins with 40% even though 60% of voters express a preference for an RoC platform. A real example of this (with polarities reversed) is the Vancouver-False Creek riding in the 2017 General Election in BC. Sam Sullivan, representing the RoC Liberal party, was elected to the Legislative Assembly with 42.6% of the vote even though two LoC parties (New Democratic Party and the Green Party) won more than 55% of the vote.
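
The arithmetic behind this kind of split can be sketched in a few lines of base R. The party names and vote shares below are hypothetical, chosen only so that two RoC parties divide a 60% bloc while a single LoC party holds 40%; they are not taken from any real riding.

```r
# Hypothetical riding: one LoC party, two RoC parties (made-up shares)
shares <- c(LoC_Party = 0.40, RoC_Party_A = 0.35, RoC_Party_B = 0.25)
side   <- c(LoC_Party = "Left", RoC_Party_A = "Right", RoC_Party_B = "Right")

# FPTP: the single party with the most votes wins the seat
fptp_winner  <- names(shares)[which.max(shares)]
winning_side <- side[[fptp_winner]]

# Combined support for each side of the spectrum
by_side        <- tapply(shares, side, sum)
preferred_side <- names(by_side)[which.max(by_side)]

print(by_side)
cat("FPTP winner:", fptp_winner, "on the", winning_side, "\n")
cat("Preferred side:", preferred_side, "\n")
```

The seat goes to the largest single party, while the "preferred side" is whichever side has the larger combined share. Vote splitting, as used here, is the case where those two disagree.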

By the logic of FPTP and the plurality rule, Sam Sullivan is the legitimate representative of the riding because he (and/or the Liberal Party) was preferred over any other candidate/party. However, if one accepts for a moment that the left/right political distinction is meaningful, and takes at face value the assertion that we govern ourselves through majoritarian democracy, then such an outcome is troubling. More people voted against Sam Sullivan than voted for him. And more critically, the majority voted not only for different individuals and parties, but for individuals and parties on the other side of the political spectrum. I assume that the Subaru crowd in False Creek is a bit miffed.

The question that interests me is whether non-majoritarian vote splitting is rare or common. Is Sam Sullivan’s victory a weird fluke or just one of many similar outcomes?2 I explore this question by examining historical election data and imposing the left/right dichotomy on the results. What I find, as noted in the summary above, is that non-majoritarian vote splitting is a fairly common outcome in British Columbia and is likely a problem worth solving. The natural follow-on question is how to solve it. The “how?” is not part of this analysis. It is sufficient at this point to acknowledge the lessons of Social Choice Theory: no procedure for aggregating the preferences of individuals is clearly dominant. Each has some downside, trade-off, or lurking pathology.3

The purpose of this analysis is simply to restate something that is already pretty well-known (our current method of running elections can lead to sub-optimal outcomes) and use the power of open data and open source software to make the analysis transparent and mutable.4

Setup

I have used R, the open source data language, and RMarkdown (this document). This allows me to provide a blow-by-blow account of the analysis. Readers with little interest in data or programming languages can skip the code blocks and focus on the results. Of course, anyone can download the free version of RStudio and copy and paste the code blocks to replicate and improve the analysis. Tips and suggestions from hardcore R nerds are especially welcome.

My basic methodology was to (a) download the data, (b) do some basic exploratory analysis, and (c) try to create The Chart that brings it all together and enlightens all. As is often the case in analytics, The Chart has proved to be elusive. So I have had to add many words to explain where the graphics came from and what they might mean.

Libraries

First, we need some libraries (functionality that is not part of base R). I decided to use this analysis as an opportunity to increase my familiarity with the tidyverse and ggplot2 libraries. Other libraries were added as required. For example, the gmodels library provides a handy contingency table.

library(tidyverse)
library(readr)
library(scales)
library(gmodels)

Data

The data for this analysis comes from the Elections BC website. I show the CSV file being read directly in the code below, but I actually downloaded the file and worked with a local copy. Two reasons for this: (a) the file is large and R just seems to hang while loading it and (b) I wanted to play with the data in Excel first. Having said that, it was the limitations and drudgery of doing this in Excel which led me to fire up RStudio instead.

#direct link to source data (warning: Government of BC URL WILL change)
res <- read_csv("https://catalogue.data.gov.bc.ca/dataset/44914a35-de9a-4830-ac48-870001ef8935/resource/fb40239e-b718-4a79-b18f-7a62139d9792/download/provincial_voting_results.csv")
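
For anyone working from a downloaded copy, a minimal sketch of the read step follows. It is an illustration rather than the exact call I used: the temporary file and its two rows are made up so that the block runs on its own, and only the column names match the real data. Pinning VOTES_CONSIDERED to integer is an assumption on my part about what is convenient, since the later if_else() calls pair that column with an integer zero.

```r
library(readr)

# Stand-in for a downloaded copy of the Elections BC file.
# The rows are made up; only the column names match those used in this analysis.
local_copy <- tempfile(fileext = ".csv")
writeLines(c(
  "EVENT_NAME,ED_NAME,AFFILIATION,VOTES_CONSIDERED,ELECTED",
  "Hypothetical Election,Riding A,BC NDP,1000,Y",
  "Hypothetical Election,Riding A,BC Liberal Party,900,N"
), local_copy)

# Pin the vote count to integer and let readr guess the remaining columns
demo <- read_csv(local_copy,
                 col_types = cols(VOTES_CONSIDERED = col_integer(),
                                  .default = col_guess()))
str(demo$VOTES_CONSIDERED)
```

With the real file, local_copy would simply be the path to wherever the CSV was saved.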

Parties and Election Events

Here I take a superficial look at the Elections BC data. Specifically:

  • Which election events are covered in the data set?
  • How many parties participate in BC elections?
  • How important/popular is each party?

I use tidyverse pipes (the ugly and confusing but useful %>%) to chain together some commands without having to create a bunch of new data structures.5 I could have graphed the result, but the raw numbers tell the basic story pretty well.

#check largest affiliations and events
group_by(res, EVENT_NAME) %>% summarise(votes=sum(VOTES_CONSIDERED)) %>% arrange(desc(votes)) 
group_by(res, AFFILIATION) %>% summarise(votes=sum(VOTES_CONSIDERED)) %>% arrange(desc(votes))

So here is my reading: The data set covers general elections and by-elections from 2005-2017. And in these elections, the major party affiliations were the Liberals and NDP. The Green party is in the second tier, and then popularity/importance of other parties drops off pretty quickly.
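
Raw vote totals are easier to compare as shares. The sketch below shows one way to do that with dplyr's count() and a weight column. The toy data frame and its party names are hypothetical stand-ins for res, so the numbers mean nothing beyond illustrating the mechanics.

```r
library(dplyr)

# Toy stand-in for `res` (made-up parties and vote counts)
toy_res <- tibble(
  AFFILIATION      = c("Party A", "Party B", "Party A", "Party C"),
  VOTES_CONSIDERED = c(500L, 300L, 400L, 100L)
)

# Total votes per affiliation, largest first, plus each party's share of all votes
party_share <- toy_res %>%
  count(AFFILIATION, wt = VOTES_CONSIDERED, name = "votes", sort = TRUE) %>%
  mutate(share = votes / sum(votes))

print(party_share)
```

Swapping toy_res for res should give the same ranking as the group_by/summarise version above, with a share column added.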

Some data cleansing

The data includes reject counts, which I can’t use. I also toyed with using general election events only and filtering out by-elections. But I commented this filter out in my final analysis. By-elections are elections too.

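#keep valid votes only (drop rejected ballots)
res <- res %>% filter(VOTE_CATEGORY != "Rejected")

#general elections, used later for per-election summaries and charts
events <- c("General Election 2017", "General Election 2013", "General Election 2009", "General Election 2005")

#optional: restrict the analysis to general elections (left commented out so by-elections are kept)
#res <- res %>% filter(EVENT_NAME %in% events)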

Knowing Left from Right

Here I define two vectors called roc and loc. These are lists of parties on the right-of-center and the left-of-center respectively. You might quibble about how the parties are assigned (if so, you can change them in your own code). But the most important parties (Libs, NDP, Greens, and Conservatives) are fairly easy to situate.

#map parties to spectrum
roc <- c("BC Liberal Party",
        "Conservative",
        "BC Social Credit Party",
        "BC Reform",
        "Libertarian",
        "Christian Heritage Party of B.C.",
        "British Columbia Party")
loc <- c("BC Green Party",
        "BC NDP",
        "BC Marijuana Party",
        "Communist Party of BC")

I then use the vectors and the %in% membership checker to create a new column called (spectral) position: Left, Right, or Other.

#recode affiliation
res <- res %>% mutate(position = case_when(
  AFFILIATION %in% roc ~ "Right",
  AFFILIATION %in% loc ~ "Left",
  TRUE                 ~ "Other")
)
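
Because anything not listed in roc or loc silently becomes "Other", it can be worth checking how many votes fall through. The sketch below uses a toy data frame and deliberately separate vector names (toy_roc, toy_loc) so that it does not overwrite the real ones. "Hypothetical Party" and the missing affiliation are made up for illustration, and I am only assuming that unlisted or blank affiliations might occur in the real data.

```r
library(dplyr)

# Toy stand-in for `res`; "Hypothetical Party" is not a real affiliation
toy_res <- tibble(
  AFFILIATION      = c("BC NDP", "BC Liberal Party", "Hypothetical Party", NA),
  VOTES_CONSIDERED = c(1000L, 900L, 50L, 20L)
)
toy_roc <- c("BC Liberal Party", "Conservative")
toy_loc <- c("BC NDP", "BC Green Party")

toy_res <- toy_res %>% mutate(position = case_when(
  AFFILIATION %in% toy_roc ~ "Right",
  AFFILIATION %in% toy_loc ~ "Left",
  TRUE                     ~ "Other"))

# Votes that fall through to "Other", by affiliation (NA %in% x is FALSE, so NA lands here too)
unmapped <- toy_res %>%
  filter(position == "Other") %>%
  count(AFFILIATION, wt = VOTES_CONSIDERED, name = "votes", sort = TRUE)

print(unmapped)
```

If a sizeable party shows up in this list when run against res, that is a hint that roc or loc needs another entry.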

Summing by Spectral Position

This is perhaps not the most efficient way to do it (some kind of ninja tidyr::gather might be more elegant), but it works. I add additional columns to show:

  1. The vote count for all the parties on the left, right, and other.
  2. The actual winning spectral position (based on election results).
  3. The preferred spectral position (based on the sum of vote counts by position rather than parties).
  4. The match column (an indicator variable to show whether the actual winning spectral position is preferred).
  5. The delta column to measure the difference in percent between the preferred and winning spectral position.

#vote counts by position on spectrum
res <- res %>%
  mutate(roc.count   = if_else(AFFILIATION %in% roc, VOTES_CONSIDERED, 0L),
         loc.count   = if_else(AFFILIATION %in% loc, VOTES_CONSIDERED, 0L),
         other.count = if_else(!(AFFILIATION %in% roc) & !(AFFILIATION %in% loc), VOTES_CONSIDERED, 0L))

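#set winner (blank for losing candidates)
res <- res %>% mutate(winner = if_else(ELECTED=='Y', position, ""))

#summarize results by event + riding
#max() over strings picks the non-blank label, i.e. the elected candidate's position
#the result stays grouped by EVENT_NAME, which the per-election summary below relies on
tots <- res %>% group_by(EVENT_NAME, ED_NAME) %>%
  summarize(right = sum(roc.count),
            left = sum(loc.count),
            other = sum(other.count),
            total = sum(VOTES_CONSIDERED),
            win = max(winner))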

#calculate percent totals and other variables
tots <- tots %>% mutate(roc.pct = right/total,
                        loc.pct = left/total,
                        other.pct = other/total,
                        pref = case_when(right > left & right > other ~ "Right",
                                         left > right & left > other ~ "Left",
                                         other > right & other > left ~ "Other",
                                         TRUE ~ "Tie"),
                        match = if_else(pref == win, 1, 0),
                        delta = case_when(
                          pref == "Right" & win == "Left" ~ roc.pct - loc.pct,
                          pref == "Left" & win == "Right" ~ loc.pct - roc.pct,
                          TRUE ~ as.double(0))
                        )
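
To see how pref, win, match, and delta come out for a single contest, here is a small worked sketch on two made-up ridings. It is a simplification of the code above, not a copy of it: it ignores "Other" and ties, and it picks the winner with position[ELECTED == "Y"] rather than the max() trick. All riding names and vote counts are hypothetical.

```r
library(dplyr)

# Two made-up ridings; each row is one candidate
toy_res <- tibble(
  ED_NAME          = c("Riding A", "Riding A", "Riding A", "Riding B", "Riding B"),
  position         = c("Right", "Left", "Left", "Right", "Left"),
  VOTES_CONSIDERED = c(4300L, 3500L, 2200L, 6000L, 4000L),
  ELECTED          = c("Y", "N", "N", "Y", "N")
)

toy_tots <- toy_res %>%
  group_by(ED_NAME) %>%
  summarize(right = sum(VOTES_CONSIDERED[position == "Right"]),
            left  = sum(VOTES_CONSIDERED[position == "Left"]),
            total = sum(VOTES_CONSIDERED),
            win   = position[ELECTED == "Y"][1]) %>%
  mutate(roc.pct = right / total,
         loc.pct = left / total,
         pref    = if_else(right > left, "Right", "Left"),   # simplification: no "Other", no ties
         match   = if_else(pref == win, 1, 0),
         delta   = if_else(match == 0, abs(roc.pct - loc.pct), 0))

print(toy_tots)
```

In Riding A the two LoC candidates together outpoll the single RoC candidate, who nonetheless wins the seat, so match is 0 and delta is the gap between the two sides' combined shares. In Riding B the preferred side wins, so match is 1 and delta is 0.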

Results

Here is the code I used to decide whether to continue the analysis. It provides the proportion of ridings in each general election in which the preferred spectral position did not win.

#quick look at general elections only
tots %>% filter(EVENT_NAME %in% events) %>% summarize(sum(match), n(), (n()-sum(match))/n())

This is where the 12%-18% number in the summary comes from. For example, in the 2017 General Election, 18.4% of ridings were won by a candidate on the non-preferred side of the spectrum. The task now is to investigate further and generate The Chart.

Crosstab of Preferred vs. Actual

Crosstabs are good for summarizing two-dimensional data (the same result can be achieved in Excel using a pivot table). The crosstab below shows the preferred spectral position in the rows and the winning spectral position in the columns. So, in the top-left cell, the preferred spectral position (tots$pref using R notation) is “Left” and the actual winning spectral position (tots$win) is also “Left”. The frequency count at this intersection is 151 contests. That means, of a grand total of 345 contests in the data set, the LoC was preferred and won in 151 instances. Staying in the same row but moving to the win=“Right” column, however, we see 49 instances in which an RoC party won even though LoC parties had more votes. The row percentage indicates that this happens about a quarter of the time. That is, the probability of the RoC winning given the LoC has the larger number of combined votes is 24.5%. Ouch.

#summary for all years
CrossTable(tots$pref, tots$win, expected=FALSE, prop.t=FALSE, prop.c=FALSE, prop.chisq=FALSE)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  345 
## 
##  
##              | tots$win 
##    tots$pref |      Left |     Other |     Right | Row Total | 
## -------------|-----------|-----------|-----------|-----------|
##         Left |       151 |         0 |        49 |       200 | 
##              |     0.755 |     0.000 |     0.245 |     0.580 | 
## -------------|-----------|-----------|-----------|-----------|
##        Other |         0 |         2 |         0 |         2 | 
##              |     0.000 |     1.000 |     0.000 |     0.006 | 
## -------------|-----------|-----------|-----------|-----------|
##        Right |         2 |         0 |       141 |       143 | 
##              |     0.014 |     0.000 |     0.986 |     0.414 | 
## -------------|-----------|-----------|-----------|-----------|
## Column Total |       153 |         2 |       190 |       345 | 
## -------------|-----------|-----------|-----------|-----------|
## 
## 

On the other hand, when the right is preferred, the left wins only 1.4% of the time. We can drill down on these two LoC wins by applying a filter and tidying up the results:

tots %>% filter(match == 0 & pref == "Right") %>%
  select(c("EVENT_NAME", "ED_NAME", "roc.pct", "loc.pct", "pref", "win")) %>%
  mutate(roc.pct = percent(roc.pct),
         loc.pct = percent(loc.pct))

The issue in both the 2012 Chilliwack-Hope by-election and the 2013 Skeena general election appears to be an RoC split between the Liberals and Conservatives.

A subsidiary takeaway from the crosstab (apart from the existence of a non-trivial number of off-diagonal instances) is in the marginal totals: the RoC has won more contests than the LoC whereas the LoC has been preferred in a majority of cases.
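
The row percentages and marginal totals quoted above can be re-derived from the nine cell counts alone. The base R sketch below copies those counts from the crosstab output by hand (it does not touch the underlying data) and recomputes the conditional shares and the off-diagonal total.

```r
# Counts copied from the crosstab above (rows = preferred side, columns = winning side)
xt <- matrix(c(151, 0,  49,
                 0, 2,   0,
                 2, 0, 141),
             nrow = 3, byrow = TRUE,
             dimnames = list(pref = c("Left", "Other", "Right"),
                             win  = c("Left", "Other", "Right")))

row_share <- prop.table(xt, margin = 1)   # share of each winner, given the preferred side
mismatch  <- sum(xt) - sum(diag(xt))      # contests won by a non-preferred side

print(round(row_share, 3))
print(mismatch / sum(xt))   # share of all contests that are off-diagonal
print(colSums(xt))          # contests won by each side
print(rowSums(xt))          # contests in which each side was preferred
```

The off-diagonal total is 49 + 2 = 51 of 345 contests, a little under 15%, and almost all of it sits in the single cell where the left was preferred and the right won.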

The Chart

The Holy Grail in analytics is the visualization that brings it all together. For my attempt, I used the ggplot2 library and did much fiddling and Google searching. I looped through elections to create a separate graph for each major event. Looping is perhaps not the most elegant way to do this in R, but I am old school.

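#simplify and filter the by-riding totals for plotting
#(columns selected by name rather than position; the three share columns go to long format)
grph <- tots %>% filter(match == 0) %>%
    select(EVENT_NAME, ED_NAME, win, roc.pct, loc.pct, other.pct, delta) %>%
    gather(key="spec", value="pct_vote", roc.pct, loc.pct, other.pct) %>%
    mutate(Riding = paste(ED_NAME, " (winner=", win, ")", sep=""))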

#loop through major election events
for (evt in events) {
  print(
    grph %>% filter(EVENT_NAME == evt) %>%
    ggplot( mapping = aes(x=reorder(Riding, delta, sum),
                          fill=reorder(spec, pct_vote, sum),
                          y=pct_vote)) +
    coord_flip() +
    geom_col(position="fill") +
    geom_text(mapping=aes(label=scales::percent(pct_vote)),
              position=position_stack(vjust=0.5), color="black") +
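    # NOTE: colours and labels are matched by position to the levels of spec as ordered
    # by reorder() above (smallest total first), so this assumes Other < Right < Left
    # in each event's subset of mismatched ridings
    scale_fill_manual(values = alpha(c("grey", "blue", "red"), .3),
                      labels=c("Other", "Right", "Left"),
                      name="Spectral Position") +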
    scale_y_continuous(labels = scales::percent, name = "Percent of Vote") +
    scale_x_discrete(name = NULL) +
    ggtitle(evt) +
    theme(panel.background = element_blank(),
          legend.position="top",
          axis.ticks = element_blank())
  )
}
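
For readers who would rather avoid the loop, faceting is one alternative. The sketch below is not the code that produced my charts: it uses a made-up long-format data frame with the same column names as grph, drops the percentage labels and the delta ordering to stay short, and assumes ggplot2 3.3.0 or later so that horizontal bars can be drawn without coord_flip(). The colours are named by spec value, so they do not depend on factor ordering.

```r
library(ggplot2)

# Made-up long-format data with the same columns as `grph`
toy_grph <- data.frame(
  EVENT_NAME = rep(c("Hypothetical Election 1", "Hypothetical Election 2"), each = 6),
  Riding     = rep(c("Riding A (winner=Right)", "Riding B (winner=Right)"),
                   each = 3, times = 2),
  spec       = rep(c("other.pct", "roc.pct", "loc.pct"), times = 4),
  pct_vote   = c(0.05, 0.43, 0.52,  0.02, 0.47, 0.51,
                 0.10, 0.40, 0.50,  0.04, 0.45, 0.51)
)

# One panel per election event instead of one plot per loop iteration
p <- ggplot(toy_grph, aes(x = pct_vote, y = Riding, fill = spec)) +
  geom_col(position = "fill") +
  facet_wrap(~ EVENT_NAME, ncol = 1, scales = "free_y") +
  scale_fill_manual(values = c(other.pct = "grey", roc.pct = "blue", loc.pct = "red"),
                    labels = c(other.pct = "Other", roc.pct = "Right", loc.pct = "Left"),
                    name = "Spectral Position") +
  scale_x_continuous(labels = scales::percent, name = "Percent of Vote") +
  labs(y = NULL) +
  theme(legend.position = "top")

print(p)
```

The trade-off is that all panels share one figure, which is compact but can get cramped when one election has many more mismatched ridings than another.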

Only once I had this result did I realize that I had merely replicated the chart created by Tara Carman of the CBC. The primary difference is that I use the right/left vote proportions rather than party vote counts. I also use the delta measure to sort by the egregiousness of difference between the preferred and winning spectral position. Below is the raw egregiousness ranking across all election events:

tots %>% filter(match == 0) %>%
  select(EVENT_NAME, ED_NAME, win, roc.pct, loc.pct, delta) %>%
  arrange(desc(delta)) %>%
  mutate(roc.pct = percent(roc.pct),
         loc.pct = percent(loc.pct),
         delta = percent(delta))

Conclusions

It is relatively easy to find commentary insisting that vote splitting is not a problem in BC. These denials seem to rest on the assertion that left/right preferences are not meaningful or stable. Thus, although both the NDP and Green Party are seen as being LoC (and are grouped as such in this analysis), the “second choice” of an NDP member might be an RoC party. Unfortunately, there is no way to confirm this assertion without actually asking voters about their second and third choices.

As much as I dismiss the left/right spectrum in my footnote below, the government’s position on the spectrum clearly has significant policy implications for citizens. This is especially true for core economic issues, like taxation, regulation, and provision of public goods.6 So I have a hard time believing that the graphs above do not indicate a pretty serious problem for British Columbians.7


  1. Okay, here is my personal political disclaimer: I don’t care that much. The whole notion that two political parties can have, circa 2018, mutually incompatible theories of how economies work strikes me as shocking and a bit pathetic. I assume some vestigial tribalism is at the root of our attachment to political parties. Having said that, in most cases, my votes in provincial elections have fallen on the RoC. It is not that I do not support—at least in theory—much of the progressive agenda. But I live in rural BC and it seems that the right-of-center candidates in this part of the world tend to be a bit more qualified, a bit better at execution, and a bit less likely to support something (I regard as) dumb. So, in that sense, the conclusions of this analysis are contrary to my (revealed) preferences.

  2. I have nothing against Sam Sullivan and use this example simply because his case sorts near the top of my list. I do not know Sam (although I did meet him once in an elevator). I certainly respect and admire what he has accomplished as an individual.

  3. The inability of British Columbia to make headway in electoral reform is not surprising in light of Social Choice Theory. Complexity and lack of a single best solution for all situations work against simple, universally-supported solutions. In this context, a referendum on the nitty-gritty of electoral processes makes about as much sense as a referendum on the minutiae of VAT mechanisms. We have a representative democracy for a reason.

  4. I have used this as an opportunity to brush up on my R. I doubt I would have done it if Apex Mountain had more snow over the 2018-19 Christmas holidays.

  5. I would have preferred to use SQL on the data frames (tibbles, in tidyverse-speak). But alas, it does not appear that this functionality yet exists. I am sure it will someday.

  6. Preferences may be less stable for non-core issues like the environment. The classic example in BC is the dissonance between the green LoC and the trade union LoC. Card-carrying 30-something construction workers may support the NDP, but this support may not transfer to any anti-big-project platform that threatens their truck payments.

  7. No, I do not think Proportional Representation is a viable solution. Both STV and the party-centric hairballs offered up in the 2018 referendum are, in my view, too complex to be transparent. And transparency has to be one of the desiderata in social choice.