Malaysian Elections

Ethnicity and Election Predictions

Author

Sean Ng

Modified

May 5, 2026

Acronyms

BN Barisan Nasional
DAP Democratic Action Party
GPS Gabungan Parti Sarawak
GRS Gabungan Rakyat Sabah
GS Gagasan Sejahtera
MCA Malaysian Chinese Association
MIC Malaysian Indian Congress
PAS Parti Islam Se-Malaysia
PH Pakatan Harapan
PKR Parti Keadilan Rakyat
PN Perikatan Nasional
SUPP Sarawak United People's Party
UMNO United Malays National Organisation
USA United Sabah Alliance


Read the addendum on malapportionment here.

Introduction: racialisation of elections

According to an Architects of Diversity survey, 64% of Malaysians reported experiencing some form of discrimination in the past 12 months, with 32% saying it was related to ethnicity.

Pusat KOMAS states that election years tend to see “sharp increases in racial discrimination narratives […] Political parties have consistently used race and religion as mobilization tools”.

They add, “politics accounted for just over 43% of all recorded racial discrimination incidents between 2015 and 2024” and that “racial and religious politics remain the primary driver of racial discrimination in Malaysia”. The next general election is likely to witness another surge in racial politics, particularly with PAS and Bersatu continuing to consolidate their Malay-Muslim voter bases.

Only 97 of the 222 constituencies (about 44%) have ever had multiethnic electoral races (where the candidates were from different races). These constituencies align largely with the areas with the highest concentrations of non-Bumiputeras. The latest redelineation was done in 2006 and the plots in this section will focus on the years since then.



Given the racialised nature of Malaysian politics, Bumiputera candidates are far more likely to win in areas that have high proportions of Bumiputera residents. Non-Bumiputera candidates almost never win seats in constituencies with more than 75% Bumiputera residents (126 of the 222 federal constituencies have populations that are at least 25% racial minorities) and Bumiputera candidates tend to win only when a constituency's population is more than 50% Bumiputera.
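The banding used in this argument (constituencies above 50% or 75% Bumiputera) can be illustrated with a minimal sketch. The data frame below is entirely hypothetical, as are the column names `pc_bumi` and `winner_bumi`; it only shows one way of grouping constituencies into bands and computing the share of seats won by Bumiputera candidates in each band.

```r
# Hypothetical constituencies: Bumiputera share of residents and whether
# the winning candidate was Bumiputera (illustrative values only)
constituencies <- data.frame(
  pc_bumi     = c(0.20, 0.45, 0.62, 0.80, 0.97),
  winner_bumi = c(FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Group constituencies into quarter-wide bands of Bumiputera share
constituencies$bumi_band <- cut(
  constituencies$pc_bumi,
  breaks = c(0, 0.25, 0.5, 0.75, 1),
  labels = c("0-25%", "25-50%", "50-75%", "75-100%"),
  include.lowest = TRUE
)

# Share of seats in each band won by a Bumiputera candidate
win_rate <- tapply(constituencies$winner_bumi, constituencies$bumi_band, mean)
print(win_rate)
```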



Barisan Nasional and, to a lesser extent, Perikatan Nasional have been the primary beneficiaries of this racialised dimension to elections.

Almost all of BN’s electoral wins come from constituencies which are more than 75% Bumiputera. This is the same for Perikatan Nasional. Similarly, constituencies which are more than 75% Bumiputera almost never vote for parties under the Pakatan Harapan coalition (or its predecessor Pakatan Rakyat).



A significant minority of non-Bumiputera MPs were from BN. Additionally, most of those wins came from constituencies with higher Bumiputera populations, indicating that party loyalty (which itself has racial connotations) was, at one point, a stronger factor than the ethnicity of individual candidates.




Ethnic makeup of parties and coalitions

Amongst the various coalitions, the East Malaysian coalitions and Gagasan Sejahtera were the most racially homogeneous. Interestingly, Pakatan Harapan and Pakatan Rakyat both fielded proportionally more Malay candidates than Barisan Nasional.

However, as noted by Azzubair, Kartini and Nazri, whilst BN might be multiethnic in nature, the coalition was not consociational, but was “sustained more by UMNO’s control and patronage than by genuine power-sharing”.



The plot below shows the breakdown of candidates by ethnicity and party. Amanah, Bersatu, PAS and UMNO are Malay parties; DAP, Gerakan and the MCA are Chinese parties. The MIC is an Indian party, PBB is a Sarawakian party and WARISAN is a Sabahan party. BEBAS and PKR are the only true multiracial parties.

However, despite all coalitions having representation from all major ethnicities, the idea of a multiracial Malaysian nation (Bangsa Malaysia) has failed at a cultural-cognitive level, with Law and Zaini arguing that this ideology is neither internalised nor taken for granted. One academic expert in their study states, “[Malaysians] fall back on primordial identity… no one identifies himself or herself as Bangsa Malaysia”. They ultimately conclude that “moments of public disillusionment with identity politics are insufficient to overturn deep-seated institutional [inequality]”.





The collapse of UMNO and Islamic politics

According to Deivasagayam, the two main cleavages in Malaysian society – race and religion – have converged, contributing to UMNO’s downfall. He explains, “UMNO consistently positioned itself as the exclusive advocate and guardian of Malay rights and privileges”. However, by engaging in an “Islamisation Race” and promoting state-engineered political Islam to stave off PAS, UMNO inadvertently legitimised PAS in the eyes of Muslim voters. When it advocated for Malay Islamic Supremacy, it set the stage for the convergence of race and religion, which would ultimately benefit PAS and Bersatu, both of which demonstrated the political viability of a Malay-Muslim governance structure that excluded minorities.

We see, from the plot below, the extent of UMNO’s electoral collapse. From its peak of 109 seats under Badawi, it now holds fewer seats than any of PAS, DAP, PKR or Bersatu.



However, when we examine UMNO’s share of votes, we see that its parliamentary dominance was always more fragile than was perceived. At no time after 1960 did UMNO ever win more than 40% of the votes. Additionally, the Badawi peak of 109 seats (49% of seats) was accomplished with only 35% of the national vote, highlighting the unfairness of a first-past-the-post system. Now, though, UMNO (and PKR) are themselves penalised by the first-past-the-post system.
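As a rough illustration of the seat-vote gap described here, the two figures quoted in the text (49% of seats on 35% of the national vote) can be compared directly. This is only a back-of-the-envelope sketch; the variable names are illustrative and not taken from the report's code.

```r
# Figures quoted in the text for UMNO's peak under Badawi
seat_share <- 0.49
vote_share <- 0.35

# Gap between seat share and vote share, in proportion points
seat_bonus <- seat_share - vote_share

# Seats won per unit of vote share (1 would be perfectly proportional)
seat_vote_ratio <- seat_share / vote_share

print(round(c(seat_bonus = seat_bonus, seat_vote_ratio = seat_vote_ratio), 2))
```

On these figures the seat share exceeded the vote share by about 14 percentage points, or a ratio of 1.4 seats per unit of vote share.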

Finally, though their share of votes is lower than it has ever been, UMNO still, marginally, received more votes than any other party in the most recent general election.



Echoing Deivasagayam’s arguments, Syaza concludes that “the sense of Islamic identity and belonging often supersedes the allure of economic development for Malay youths when choosing how to cast their vote”.

This speaks not only to the Islamisation race that UMNO, PAS and Bersatu have engaged in (Syaza reports that 65% of respondents in a Merdeka Center Muslim Youth Survey said that Muslims should only vote for Muslim leaders) but also to an unfair economic system in which whatever economic boons are offered are too meagre and piecemeal to overcome racial and religious identity politics, despite economic concerns being the top issue amongst voters.

Pakatan Harapan, furthermore, seems unwilling to engage in the Islamisation Race or put forward a competing vision of a more moderate Islam, possibly – I hypothesise – out of fears that doing so would simultaneously alienate its core non-Muslim base and anger conservative Muslims.




Fragile coalitions

Whilst public support for BN has faded considerably, there was still majority support for a Malay-supremacist (Ketuanan Melayu) government: between them, BN and PN picked up more than 50% of votes in GE-15, an increase on the vote share garnered by Barisan and GS (PAS’s previous coalition) in GE-14.

The current unity (Madani, PH-BN-GPS-GRS) government is heavily reliant on BN for both its parliamentary majority and its majority of the vote share.



The two largest coalitions in parliament are now PH and PN. But since neither was able to win a commanding mandate in GE-15 (or to compromise with the other), BN was left to play kingmaker.



The plots below show vote share amongst the major coalitions in state elections. We see that PN has managed to garner a lead over PH in state polls.

Recalling our criticism of Anwar in the addendum on malapportionment, it is possible that his bid with GRS to increase the number of seats in Sabah is meant to engineer a more lasting parliamentary majority without needing to win the popular vote. We would emphasise that any further malapportionment would be, ultimately, an anti-democratic exercise.



Syaza argues that there is a strong dichotomy between Malay and non-Malay voters’ perceptions of the government. 85% of Malay voters felt that the Perikatan Nasional government protects the interest of the Malays, while 77% of non-Malay voters believed the government does not treat all races equally.

Given the intense racialisation of Malaysian politics, we will next use demographic data to predict the performances of Pakatan Harapan and Perikatan Nasional in the upcoming GE-16.




Predictions for the GE-16

Below, the performance (the share of votes they received) of PH in GE-15 is plotted against their predicted performance in GE-16 using the combination of a stepwise and a glmnet model, with each point being a federal constituency. It is predicted that PH’s vote share will shrink in GE-16.

We predict that Pakatan will win only 72 out of the 222 total seats in parliament, down from 82 in GE-15.



These predictions also underscore the instability of the Madani government’s coalition. The results have been re-plotted below to show the GE-15 winner of each constituency. Marginal seats (roughly in the middle of the graph below) where PH may seek to expand its reach will largely only come at the expense of BN and vice versa.

There are a fair number of marginal seats (largely three-cornered fights) that PH won in GE-15 with less than 40% of the votes. These are all vulnerable to pick-up by BN or PN.
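The idea of a seat won on a plurality below 40% can be sketched as follows. The vote shares and seat labels below are hypothetical, and the 40% cut-off simply mirrors the threshold mentioned in the text; this is not the report's own code.

```r
# Hypothetical vote shares in three three-cornered fights
results <- data.frame(
  seat = c("A", "B", "C"),
  PH   = c(0.38, 0.55, 0.30),
  BN   = c(0.33, 0.25, 0.31),
  PN   = c(0.29, 0.20, 0.39)
)

shares <- as.matrix(results[, c("PH", "BN", "PN")])

# First-past-the-post: the coalition with the largest share wins the seat
results$winner        <- colnames(shares)[max.col(shares, ties.method = "first")]
results$winning_share <- apply(shares, 1, max)

# Flag seats that PH won with less than 40% of the vote
results$ph_vulnerable <- results$winner == "PH" & results$winning_share < 0.40

print(results)
```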



Based on our predictions of PH’s performance, below is the projected breakdown of seats within the coalition. The DAP will remain the largest party in the coalition, whilst PKR will face the largest number of difficult races.





Perikatan Nasional

Let’s use the same combination of stepwise linear modelling and glmnet to predict the performance of Barisan Nasional against Perikatan Nasional.

According to the model, BN is expected to lose even more ground to PN in the upcoming 16th general election, with PN making further inroads into Johor, Pahang and Perak. BN will be largely dependent on its competitiveness in East Malaysia to ensure that it has any representation in parliament.

In the plot below, negative values on the y-axis indicate Perikatan winning more votes than Barisan. We note that the model predicts PN winning more votes than BN in almost all the constituencies that BN won in GE-15.

Clustered at x = 0 are the constituencies where BN and PN were closest to each other in number of votes. These include both the constituencies where the two coalitions are most competitive with each other and those where neither is competitive at all (likely losing by a large margin to PH or another coalition).
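The sign convention for the BN-versus-PN comparison can be made concrete with a small sketch. It assumes that the modelled quantity (`bn_pn_difference_pc` in the appendix) is BN's vote share minus PN's, which is consistent with negative values indicating that Perikatan won more votes; the input columns and values are hypothetical.

```r
# Hypothetical vote shares for BN and PN in four constituencies
shares <- data.frame(
  seat        = c("A", "B", "C", "D"),
  pc_bn_votes = c(0.42, 0.18, 0.05, 0.30),
  pc_pn_votes = c(0.25, 0.47, 0.06, 0.30)
)

# Assumed definition: positive means BN ahead, negative means PN ahead
shares$bn_pn_difference_pc <- shares$pc_bn_votes - shares$pc_pn_votes

# A difference near zero does not by itself mean a close contest for the seat:
# seat C has both coalitions on very low shares, so neither is competitive
shares$both_weak <- pmax(shares$pc_bn_votes, shares$pc_pn_votes) < 0.10

print(shares)
```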



Perikatan Nasional is predicted to be the largest coalition in parliament after GE-16. Using the same combination of stepwise and glmnet models as before, we predict that PN will win 95 out of the 222 seats in parliament, up from 74 in GE-15, making further inroads into Selangor, Johor and Melaka. Perikatan remains uncompetitive in East Malaysia.



The model does not predict, however, an outright win. And Perikatan Nasional will still need to partner with another coalition to form the government.

PAS is anticipated to be the largest party within the coalition, and it will also contest fewer marginal seats than Bersatu.





Conclusions

Perikatan Nasional are anticipated to win 95 seats, whilst Pakatan Harapan are anticipated to only win 72 seats. As a small consolation to PH, there is projected to only be a very small difference between PH’s and PN’s shares of the national vote. But to have a chance at being the largest coalition, Pakatan must win all of Selangor and extend its reach in Johor and Pahang – an unlikely prospect.



It might not be possible to re-form the Madani government. This is dependent on the extent of BN’s losses and the scale of PN’s victory. We were not able to develop a model that accurately predicted BN’s district-level vote share using demographic data, probably because BN’s performance is largely determined by non-demographic factors (e.g. running a particularly strong candidate in Titiwangsa, or the strengthening of unfavourable perceptions of the coalition). It is possible that BN will not even be in a position to play kingmaker.



The models (the details of which are in the appendices below) we developed also highlight the electoral challenges facing Barisan: whilst it may be more competitive than Perikatan in areas that are more ethnically diverse, it will still likely lose those seats to Pakatan Harapan. It may be more competitive in East Malaysia, but support there has swung towards native and indigenous coalitions. And whilst it can still win largely homogeneous Malay communities, these constituencies tend to be extremely small in terms of population.



If we take a look at the most marginal seats from GE-15 in the plot above, BN’s largest competitors are PH and PN. If support for BN were to erode further and Malay voters were to consolidate behind PN, as we predict, PH would no longer be sure to win in three-cornered fights (where it has picked up most of its post-2018 wins). The primary beneficiary of three-cornered fights after the collapse of UMNO has been Pakatan Harapan.



However, the margin by which PH has won has been trending downwards, especially in contrast to its predecessor, Pakatan Rakyat. Its footprint is also narrowing, especially with one of its key component parties, PKR, focusing on safe seats. It seems that they too are predicting losses for PH and are seeking to minimise the extent of their defeat.



James Chin at the Commonwealth Roundtable argues that there is no longer any middle ground left in Malaysian politics, with the DAP (Pakatan) on one end and PAS (Perikatan) on the other. This erosion of the middle ground will have a profound negative impact on Pakatan Harapan’s election chances, especially as it refuses to engage in the Islamisation race or put forward an alternative vision that can win over Malay youth.

The convergence of the two social cleavages of race and religion, together with UMNO’s downfall, has made it most probable that Perikatan Nasional will be the largest coalition in parliament.




Appendices

Read the addendum on malapportionment here.

Data

The data for this report comes primarily from Thevesh, who compiled the Malaysian Election Corpus, containing federal and state-level election results since 1955.

Additional data, including the digitised census datasets, come from JJean95 and Jane Loh.




Reference table

Note that the models sometimes predict negative vote shares. We have left these in as the models are, overall, fairly accurate and explainable, and these instances of negative vote shares are quite few in number.
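A quick way to check how often a linear model produces out-of-range vote shares is sketched below. The prediction vector is hypothetical. The clamping step is shown only as an alternative a reader might consider; it is not applied in this report, where negative predictions are left as they are.

```r
# Hypothetical predicted vote shares, including two negative values
pred <- c(-0.03, 0.12, 0.48, 0.91, -0.01)

# How many predictions fall below zero, and what proportion of the total
n_negative     <- sum(pred < 0)
share_negative <- mean(pred < 0)

# Optional alternative (not used in the report): clamp predictions to [0, 1]
pred_clamped <- pmin(pmax(pred, 0), 1)

print(n_negative)
print(pred_clamped)
```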





Models

Pakatan Harapan models

Below is a summary of the stepwise model used to predict Pakatan Harapan’s performance in GE-16. The population density and the percentage of Indians in a constituency (pc_indian) both have a positive correlation with PH electoral performance. The percentages of Bumiputeras (both peninsular and East Malaysian) have a negative correlation with electoral performance.



Call:
lm(formula = pc_ph_votes ~ population_density + pc_bumi_peninsula + 
    pc_bumi_east_malaysia + pc_indian, data = training, direction = "backward")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.55472 -0.04182  0.00348  0.05732  0.30681 

Coefficients:
                       Estimate Std. Error t value             Pr(>|t|)    
(Intercept)           -0.177547   0.201678  -0.880             0.379645    
population_density     0.030490   0.004816   6.331        0.00000000138 ***
pc_bumi_peninsula     -0.681463   0.054287 -12.553 < 0.0000000000000002 ***
pc_bumi_east_malaysia -0.702340   0.060945 -11.524 < 0.0000000000000002 ***
pc_indian              0.634739   0.183275   3.463             0.000643 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1053 on 217 degrees of freedom
Multiple R-squared:  0.8145,    Adjusted R-squared:  0.8111 
F-statistic: 238.2 on 4 and 217 DF,  p-value: < 0.00000000000000022


These are the coefficients for the final glmnet model used to predict Pakatan Harapan’s performance in GE-16. Somewhat similarly to the stepwise model, the percentages of Indians and Chinese in a constituency both have strong positive correlations with PH electoral performance, as does the population density.

The predicted values of these two models were averaged to obtain the final predictions.


13 x 1 sparse Matrix of class "dgCMatrix"
                                          s1
(Intercept)                      0.262403348
population_density               0.052445954
pc_bumi_peninsula                .          
pc_bumi_east_malaysia           -0.002501242
pc_chinese                       0.110418109
pc_indian                        0.070609987
gini                             .          
income_avg                       0.009194990
poverty_incidence                .          
labour_force_participation_rate  .          
sex_ratio                        .          
total_dependency_ratio           .          
average_household_size          -0.004289741
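
The averaging of the stepwise and glmnet predictions described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the report's actual pipeline: the data frame, the reduced set of predictors, the use of `cv.glmnet` to choose the penalty and the value `alpha = 0.5` are all assumptions made for the example.

```r
library(glmnet)

set.seed(1)
n <- 222

# Synthetic stand-in for constituency-level data (not the report's data)
training <- data.frame(
  population_density = rnorm(n),
  pc_bumi            = runif(n),
  pc_indian          = runif(n, 0, 0.3),
  sex_ratio          = rnorm(n)
)
training$pc_votes <- 0.6 - 0.5 * training$pc_bumi + 0.3 * training$pc_indian +
  0.03 * training$population_density + rnorm(n, sd = 0.05)

# Model 1: backward stepwise selection starting from the full linear model
full_fit  <- lm(pc_votes ~ ., data = training)
step_fit  <- step(full_fit, direction = "backward", trace = 0)
pred_step <- unname(predict(step_fit, newdata = training))

# Model 2: penalised regression, with lambda chosen by cross-validation
x           <- as.matrix(training[, setdiff(names(training), "pc_votes")])
cv_fit      <- cv.glmnet(x, training$pc_votes, alpha = 0.5)
pred_glmnet <- as.numeric(predict(cv_fit, newx = x, s = "lambda.min"))

# Final prediction: simple average of the two models' predictions
pred_final <- (pred_step + pred_glmnet) / 2
```

Averaging two differently regularised models is one simple form of ensembling; whether equal weights are the best choice depends on how each model performs on held-out data.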




Perikatan Nasional models

The stepwise model used to predict Perikatan’s election performance is more complex than what was needed to predict Pakatan Harapan’s performance.

The percentage of Chinese in a constituency has the strongest negative correlation with PN performance, followed by the percentage of East Malaysia Bumis. The negative correlation with the percentage of peninsula Bumis might indicate that homogeneous Malay constituencies still have a tendency to vote for UMNO (though its t-value is comparatively lower).

There are also negative correlations with average income (indicating more disadvantaged areas are more likely to vote for PN) as well as with the sex ratio. Additionally, whilst the effect is not as strong as it is with Pakatan Harapan, population density does have a positive correlation with PN’s vote share: PN does have quite widespread popularity in urban areas in northern and eastern peninsular Malaysia.



Call:
lm(formula = pc_pn_votes ~ population_density + pc_bumi_peninsula + 
    pc_bumi_east_malaysia + pc_chinese + pc_indian + income_avg + 
    labour_force_participation_rate + sex_ratio + average_household_size, 
    data = pn_bn_train, direction = "backward")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.37103 -0.05884 -0.00045  0.05233  0.21923 

Coefficients:
                                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)                     -9.5311680  2.3121258  -4.122 0.0000538 ***
population_density               0.0225824  0.0066899   3.376  0.000876 ***
pc_bumi_peninsula               -2.2023859  0.7390763  -2.980  0.003220 ** 
pc_bumi_east_malaysia           -2.7599301  0.7380656  -3.739  0.000237 ***
pc_chinese                      -2.8779996  0.7365207  -3.908  0.000125 ***
pc_indian                       -2.6080959  0.7435358  -3.508  0.000552 ***
income_avg                      -0.1439593  0.0388154  -3.709  0.000266 ***
labour_force_participation_rate -0.0024594  0.0016470  -1.493  0.136868    
sex_ratio                       -0.0028026  0.0008931  -3.138  0.001942 ** 
average_household_size           0.0293539  0.0175218   1.675  0.095354 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09997 on 212 degrees of freedom
Multiple R-squared:  0.8403,    Adjusted R-squared:  0.8335 
F-statistic: 123.9 on 9 and 212 DF,  p-value: < 0.00000000000000022


Similar to the stepwise model, Perikatan’s glmnet model is also more complex than Pakatan’s. But the most notable coefficients are once again the negative correlations with the Chinese and East Malaysian Bumiputera populations (highlighting the racialised nature of elections as well as PN’s uncompetitiveness in East Malaysia).


13 x 1 sparse Matrix of class "dgCMatrix"
                                            s1
(Intercept)                      0.29930301289
population_density               0.04826454524
pc_bumi_peninsula               -0.00009654532
pc_bumi_east_malaysia           -0.21573230792
pc_chinese                      -0.12156289541
pc_indian                       -0.02510089756
gini                            -0.00751455022
income_avg                      -0.05429391965
poverty_incidence               -0.00406165244
labour_force_participation_rate -0.01110480385
sex_ratio                       -0.02476660938
total_dependency_ratio          -0.00954639173
average_household_size           0.02215130048


As with the Pakatan Harapan models, the predictions from the two Perikatan models were averaged to provide the final predictions.




Comparing Barisan Nasional against Perikatan Nasional

Finally, below is the stepwise model used to predict the difference between Barisan and Perikatan vote shares. We see that Barisan is more likely to win less dense areas (BN is the primary beneficiary of malapportionment, as we note in our addendum on malapportionment), as well as areas that are more ethnically diverse. Additionally, BN is more likely than PN to win in areas that have higher incomes and more men than women.

However, we should note that the R-squared is quite low (0.40) when compared to the earlier two stepwise models (0.81 and 0.84). BN’s performance is actually quite difficult to model from demographic data, probably due to the several non-demographic factors influencing its popularity (corruption, dissatisfaction over its collaboration with PH).



Call:
lm(formula = bn_pn_difference_pc ~ population_density + pc_bumi_peninsula + 
    pc_bumi_east_malaysia + pc_chinese + pc_indian + income_avg + 
    poverty_incidence + sex_ratio, data = pn_bn_train, direction = "backward")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.46638 -0.15331 -0.00777  0.13209  0.57626 

Coefficients:
                       Estimate Std. Error t value    Pr(>|t|)    
(Intercept)           18.738879   4.965230   3.774    0.000208 ***
population_density    -0.073957   0.013813  -5.354 0.000000222 ***
pc_bumi_peninsula      3.863510   1.536743   2.514    0.012674 *  
pc_bumi_east_malaysia  4.055272   1.535345   2.641    0.008872 ** 
pc_chinese             4.406952   1.531523   2.877    0.004416 ** 
pc_indian              3.860262   1.544246   2.500    0.013182 *  
income_avg             0.380519   0.082110   4.634 0.000006238 ***
poverty_incidence      0.481238   0.278603   1.727    0.085558 .  
sex_ratio              0.006456   0.001841   3.507    0.000553 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2092 on 213 degrees of freedom
Multiple R-squared:  0.4011,    Adjusted R-squared:  0.3786 
F-statistic: 17.83 on 8 and 213 DF,  p-value: < 0.00000000000000022


Similarly, the glmnet model indicates that BN is more likely to get more votes than PN in areas that are more ethnically diverse and those which have higher incomes, as well as in East Malaysia.


13 x 1 sparse Matrix of class "dgCMatrix"
                                          s1
(Intercept)                     -0.090301807
population_density              -0.139529828
pc_bumi_peninsula                1.090110633
pc_bumi_east_malaysia            1.241925612
pc_chinese                       0.625629235
pc_indian                        0.172821898
gini                             0.009154618
income_avg                       0.113724353
poverty_incidence                0.031944592
labour_force_participation_rate  0.020891175
sex_ratio                        0.056442928
total_dependency_ratio           0.017825940
average_household_size          -0.012826333


Ultimately, these models highlight the electoral challenges facing Barisan: whilst it may be more competitive than Perikatan in areas that are more ethnically diverse, it will still likely lose those seats to Pakatan Harapan. It may be more competitive in East Malaysia, but support there has swung towards native and indigenous coalitions. And whilst it can still win largely homogeneous Malay communities, these constituencies tend to be extremely small in terms of population.

We could very conceivably improve these predictions if we had access to additional data, such as education levels and religious attitudes. Additionally, opinion polling in Malaysia is far less developed than in the US and UK, and polls are not conducted at a sufficient frequency or sample size to be included in the models we have developed.




References

Architects of Diversity (2023). State of Discrimination Survey 2023. Retrieved from https://www.aodmalaysia.org/sods.

Azzubair, Kartini & Nazri Muslim (2026). A Systematic Literature Review of Malaysia’s Coalition Politics, 2021-2025. Frontiers in Political Science, Elections and Representation section, 8-2026. https://doi.org/10.3389/fpos.2026.1721966

Chin, J. (2023) Anwar’s long walk to power: the 2022 Malaysian general elections, The Round Table, 112:1, 1-13, DOI: 10.1080/00358533.2023.2165303

JJean95 (2022). Malaysia General Election Results (GE12-GE15). https://www.kaggle.com/datasets/jjean95/malaysia-general-election-datasets

Law Y. F. & Mohamad Zaini Abu Bakar (2025). The Dynamics of New Multi-Ethnic Political Parties in Malaysia with Special Reference to Parti Bangsa Malaysia (PBM). Asian Journal of Research in Education and Social Sciences, 7(8) 537-545. https://doi.org/10.55057/ajress.2025.7.8.44. Retrieved from https://mysitasi.mohe.gov.my/uploads/get-media-file?refId=f7b5ff9d-7a8a-416a-baad-b5c84b43e4a2

Loh, J. (2023). Malaysia Census Demographic 2020. https://www.kaggle.com/datasets/janeloh/malaysia-census-demographic-2020

Deivasagayam, A.D. (2025). Explaining UMNO’s Downfall Post GE14 and GE15: The Strengthened Convergence Between Two Cleavage Systems. Malaysian Journal of Social Sciences and Humanities, 10(2). https://doi.org/10.47405/mjssh.v10i2.3248

Merdeka Center for Opinion Research (2025, 23 June). Mid-Term Survey May 2025 Report Final. Retrieved from https://merdeka.org/mid-term-survey-may-2025-report-final/

Nora M. (2026, 16 April). Avoiding Weak Seats a Realistic Strategy, Says Analyst. Free Malaysia Today. https://www.freemalaysiatoday.com/category/nation/2026/04/16/avoiding-weak-seats-a-realistic-strategy-says-analyst

Pusat KOMAS (2024). Malaysia Racial Discrimination Report: a Decade in review (2015-2024). Retrieved from https://komas.org/download/Malaysia-Racism-Report_A-Decade-In-Review-2015-2024.pdf

Syaza S. (2024). Why Young Malay Voters in Malaysia are “Turning Green”. Trends in Southeast Asia, ISEAS – Yusof Ishak Institute, 12. Retrieved from https://www.iseas.edu.sg/wp-content/uploads/2024/04/TRS12_24.pdf

Thevesh T. (2026). Malaysian Election Corpus: Federal and State-Level Election Results since 1955. https://github.com/Thevesh/paper-meco-results/tree/main

Source Code
---
title: "Malaysian Elections"
subtitle: "Ethnicity and Election Predictions"
author: "Sean Ng"
organization: "AIMdata"
date-modified: "5 May 2026"
execute: 
  echo: false
---

```{r setup, include = FALSE}

knitr::opts_chunk$set(echo = FALSE, 
                      warning = FALSE, 
                      message = FALSE, 
                      fig.width = 9)


library(tidyverse)
library(here)
library(janitor)
library(scales)
library(tidytext)
library(widyr)
library(ggraph)
library(patchwork)
library(kableExtra)
library(viridis)
library(stringr)
library(sf)
library(ggmagnify)
library(ggthemes)
library(glmnet)
library(caret)
library(plotly)
library(ggguides)
library(DT)

`%out%` <- Negate(`%in%`)
options(scipen = 100)
theme_set(theme_light())
range_wna <- function(x){(x-min(x, na.rm = TRUE))/(max(x, na.rm = TRUE)-min(x, na.rm = TRUE))}

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

last_comma <- function(x, sep = ","){
  x <- sapply(strsplit(x, sep),"[")
  trimws(x[[length(x)]])
}
```




```{r data}

census_dun <- read_csv("./data/census_dun.csv") |> 
  mutate(population_bumi = ethnicity_proportion_bumi * population_total / 100, 
         population_chinese = ethnicity_proportion_chinese * population_total/ 100, 
         population_indian = ethnicity_proportion_indian * population_total / 100, 
         population_other = ethnicity_proportion_other * population_total / 100)


gini <- read_csv("./data/hh_inequality_parlimen.csv") |> 
  filter(date == "2024-01-01")


sf::sf_use_s2(FALSE)

parlimen_geo <- read_sf("./data/parlimen.geojson") |> 
  st_make_valid() |> 
  mutate(area_km2 = st_area(geometry) / 1000000, 
         area_km2 = as.numeric(area_km2))

federal_territories <- tribble(
  ~parlimen, ~population_total, ~pc_bumi, ~pc_chinese, ~pc_indian, ~pc_other, ~sex_male, ~sex_female, ~population_density, ~income_avg, ~expenditure_avg, ~poverty_incidence, ~area_km2,
  "P.114 Kepong", 106199, .202, .729, .063, .006, .588, .412, 8850, 13087, 6500, .003, 12,
  "P.115 Batu", 219132, .566, .301, .129, .004, .601, .399, 10957, 11019, 5817, 0, 20,
  "P.116 Wangsa Maju", 215600, .617, .306, .07, .008, .537, .463, 13475, 11153, 5768, 0, 16,
  "P.117 Segambut", 253715, .324, .558, .107, .012, .512, .479, 4975, 20521, 9845, 0, 51,
  "P.118 Setiawangsa", 147095, .697, .214, .085, .004, .519, .481, 9193, 12281, 6534, .003, 16, 
  "P.119 Titiwangsa", 122096, .677, .233, .076, .014, .519, .481, 8140, 11921, 6823, .005, 15,
  "P.120 Bukit Bintang", 120529, .271, .565, .155, .009, .565, .435, 5739, 13677, 7401, 0, 21, 
  "P.121 Lembah Pantai", 148094, .591, .262, .138, .009, .509, .491, 7405, 13081, 6186, 0, 20,
  "P.122 Seputeh", 322511, .311, .573, .112, .004, .510, .490, 10404, 13489, 7764, .005, 31,
  "P.123 Cheras", 135823, .322, .596, .076, .007, .527, .473, 8489, 12031, 5979, .002, 16,
  "P.124 Bandar Tun Razak", 191318, .679, .234, .083, .004, .509, .491, 7653, 11266, 6002, 0, 25,
  "P.166 Labuan", 95120, .862, .117, .011, .01, .514, .486, 1034, 8319, 4097, .031, 96,
  "P.125 Putrajaya", 109202, .979, .012, .006, .003, .509, .491, 2229, 12840, 7980, .004, 49
  
)

census_fed <- census_dun |> 
  mutate(population_poor = poverty_incidence * population_total / 100) |> 
  group_by(parlimen) |> 
  summarise_at(
    c("population_total", 
      "population_bumi", 
      "population_chinese", 
      "population_indian", 
      "population_other", 
      "population_poor", 
      "sex_male", 
      "sex_female"), 
    ~ sum(.x, na.rm = TRUE)
  ) |> 
  mutate_at(
    c("population_bumi", 
      "population_chinese", 
      "population_indian", 
      "population_other", 
      "population_poor", 
      "sex_male", 
      "sex_female"), 
    ~ .x / population_total
  ) |>  
  rename(
    pc_bumi = population_bumi, 
    pc_chinese = population_chinese, 
    pc_indian = population_indian, 
    pc_other = population_other, 
    poverty_incidence = population_poor) |> 
  left_join(
    census_dun |>
      mutate(
        income = income_avg * household_total,
        expenditure = expenditure_avg * household_total) |>
      group_by(parlimen) |>
      summarise(
        household_total = sum(household_total),
        income = sum(income),
        expenditure = sum(expenditure)) |>
      mutate(
        income_avg = income / household_total,
        expenditure_avg = expenditure / household_total) |>
      select(parlimen,
             income_avg, 
             expenditure_avg), 
    by = "parlimen") |> 
  left_join(
    parlimen_geo |> 
      st_drop_geometry() |> 
      select(parlimen, area_km2), 
    by = "parlimen"
  ) |> 
  mutate(population_density = population_total / area_km2) |> 
  rbind(federal_territories) |> 
  left_join(
    census_dun |> 
      distinct(parlimen, state), 
    by = "parlimen"
  ) |>
  left_join(
    gini |> 
      select(parlimen, gini), 
    by = "parlimen"
  ) |> 
  mutate(state = case_when(
    is.na(state) & str_detect(parlimen, "Labuan") ~ "Labuan", 
    is.na(state) & str_detect(parlimen, "Putrajaya") ~ "Putrajaya", 
    is.na(state) ~ "Kuala Lumpur", 
    TRUE ~ state
  )) |> 
  mutate(east_malaysia = ifelse(str_detect(state, "Sabah|Sarawak|Labuan"), 
                                1, 0))

ballots <- read_csv("./data/consol_ballots.csv") |> 
  mutate(
    age = ifelse(age == -1, NA_real_, age)
  ) |> 
  mutate(federal = ifelse(
    # Federal seats are coded "P.xxx"; escape the dot and anchor at the start
    str_detect(seat, "^P\\.[0-2]"), 
    "Federal", 
    "State"))

stats <- read_csv("./data/consol_stats.csv")

parlimen_geo <- read_sf("./data/parlimen.geojson") |> 
  st_make_valid() |> 
  mutate(area_km2 = st_area(geometry) / 1000000, 
         area_km2 = as.numeric(area_km2))

multiracial_races <- ballots |> 
  mutate(ethnicity = ifelse(ethnicity == "Orang Asli", "Other", ethnicity)) |> 
  mutate(count = 1) |> 
  # Keep only the top three candidates in each race
  filter(rank <= 3) |> 
  group_by(state, seat, date, federal, election) |> 
  summarise(malay_candidates = sum(count[ethnicity == "Malay"]), 
            chinese_candidates = sum(count[ethnicity == "Chinese"]), 
            indian_candidates = sum(count[ethnicity == "Indian"]), 
            sabah_candidates = sum(count[ethnicity == "Bumi Sabah"]), 
            sarawak_candidates = sum(count[ethnicity == "Bumi Sarawak"]), 
            other_candidates = sum(count[ethnicity == "Other"]), 
            .groups = "drop") |> 
  mutate(total_candidates = malay_candidates + chinese_candidates + indian_candidates + 
           sabah_candidates + sarawak_candidates + other_candidates) |> 
  pivot_longer(cols = malay_candidates:other_candidates, 
               names_to = "candidate_ethnicity", 
               values_to = "value") |> 
  mutate(candidate_ethnicity = str_remove_all(candidate_ethnicity, "_candidates")) |> 
  filter(value != 0) |>
  mutate(multiracial = ifelse(value != total_candidates, 1, 0)) |> 
  mutate(combination = ifelse(
    multiracial == 0, paste0(candidate_ethnicity, "_only"), "multiethnic"
  )) |> 
  distinct(state, seat, date, federal, election, multiracial, combination)

census_alt <- read_csv("./data/MALAYSIA_CENSUS_DEMOGRAPHIC_2020_2.csv") |> 
  janitor::clean_names() |> 
  mutate(state = str_replace_all(state, "W.P. ", ""), 
         district = str_replace_all(district, "W.P. ", ""), 
         state = ifelse(state == "KualaLumpur", "Kuala Lumpur", state), 
         district = case_when(
           district == "Tasik Gelugor" ~ "Tasek Gelugor", 
           district == "Kulim - Bandar Bahang" ~ "Kulim-Bandar Baharu", 
           district == "PetraJaya" ~ "Petra Jaya", 
           district == "KotaRaja" ~ "Kota Raja",
           TRUE ~ district)
         ) |> 
  left_join(
    census_fed |> 
      mutate(district = str_sub(parlimen, start = 7L)) |> 
      distinct(state, district, parlimen), 
    by = c("state", "district")
  ) 


```


**Acronyms**
```{r table-acronyms, echo = FALSE, message=FALSE, warning=FALSE}
tribble(
  ~acronym, ~name, 
  "BN", "Barisan Nasional",  
  "DAP", "Democratic Action Party",
  "GPS", "Gabungan Parti Sarawak",
  "GRS", "Gabungan Rakyat Sabah", 
  "GS", "Gagasan Sejahtera", 
  "MCA", "Malaysian Chinese Association", 
  "MIC", "Malaysian Indian Congress", 
  "PAS", "Parti Islam Se-Malaysia", 
  "PH", "Pakatan Harapan", 
  "PKR", "Parti Keadilan Rakyat", 
  "PN", "Perikatan Nasional", 
  "SUPP", "Sarawak United People's Party", 
  "UMNO", "United Malays National Organisation", 
  "USA", "United Sabah Alliance"
) |>  
  kable(col.names = NULL, format = "html") |> 
  kable_classic(bootstrap_options = c("condensed"), 
                full_width = FALSE, 
                position = "float_left")
```

<br>


Read the addendum on malapportionment [here](https://aimdata-labs.github.io/malaysian_elections/part_two.html). 

# Introduction: racialisation of elections

According to an [Architects of Diversity survey](https://www.aodmalaysia.org/sods), 64% of Malaysians reported experiencing some form of discrimination in the past 12 months, with 32% saying it was related to ethnicity. 

[Pusat KOMAS](https://komas.org/download/Malaysia-Racism-Report_A-Decade-In-Review-2015-2024.pdf) states that election years tend to see "sharp increases in racial discrimination narratives [...] Political parties have consistently used race and religion as mobilization tools". 

They add, "politics accounted for just over 43% of all recorded racial discrimination incidents between 2015 and 2024" and that "racial and religious politics remain the primary driver of racial discrimination in Malaysia". The next general election is likely to witness another surge in racial politics, particularly with PAS and Bersatu continuing to consolidate their Malay-Muslim voter bases. 

43% of federal constituencies (97 out of 222) have at some point had a multiethnic electoral race (one where the candidates were of different ethnicities). These constituencies align largely with the areas with the highest concentrations of non-Bumiputeras. The latest redelineation was done in 2006 and the plots in this section will focus on the years since then. 
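
As a rough illustration of how a race can be flagged as multiethnic, the sketch below uses a small made-up table of candidates (the seat names and ethnicities are hypothetical, not taken from the ballot data). It marks a race as multiethnic when its candidates span more than one ethnicity. This is a simplification of the `multiracial_races` logic used for this page, which also restricts each race to its top three candidates.

```r
# Hypothetical toy data: one row per candidate in a race
toy_ballots <- data.frame(
  seat = c("Seat A", "Seat A", "Seat B", "Seat B", "Seat C", "Seat C", "Seat C"),
  ethnicity = c("Malay", "Chinese", "Malay", "Malay", "Indian", "Malay", "Chinese")
)

# Number of distinct candidate ethnicities in each race
n_ethnicities <- tapply(
  toy_ballots$ethnicity,
  toy_ballots$seat,
  function(x) length(unique(x))
)

# A race is multiethnic when its candidates come from more than one ethnicity
multiethnic <- n_ethnicities > 1

# Share of toy seats with a multiethnic race
share_multiethnic <- mean(multiethnic)
```

In this toy table, Seat B is the only race contested by candidates of a single ethnicity.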


<br>

```{r map}

parlimen_geo |> 
  left_join(
    multiracial_races |>
      filter(federal == "Federal") |>
      select(seat, multiracial, combination) |>
      group_by(seat) |>
      slice_max(order_by = multiracial) |>
      ungroup() |>
      distinct(seat, multiracial, combination), 
    by = c("parlimen" = "seat") 
  ) |> 
  mutate(multiracial = ifelse(multiracial == 1, "Multiethnic", "Single ethnicity")) |> 
  ggplot() + 
  geom_sf(aes(fill = multiracial), 
          linewidth = .01, 
          colour = "grey70") + 
  scale_fill_viridis_d(option = "magma", 
                       begin = .2, 
                       end = .9) + 
  # Only one inset can work at one time
  # Solution proposed here https://github.com/hughjonesd/ggmagnify/issues/30 
  # Does not work
  geom_magnify(
    from = c(101.223422,  101.953542, 2.805661, 3.383667), 
    to = c(105, 108, 3.5, 5.9), 
    shadow = FALSE,
    proj = "single"
  ) + 
    #theme_map() + 
  labs(
    fill = "",
    title = "Federal constituencies which have ever had multiethnic electoral races"
  ) + 
  theme_void() +

parlimen_geo |> 
  left_join(
    census_fed |>
      select(parlimen, pc_bumi), 
    by = "parlimen"
  ) |> 
  ggplot() + 
  geom_sf(aes(fill = pc_bumi), 
          linewidth = .01, 
          colour = "grey70") + 
  scale_fill_viridis(option = "magma", 
                       begin = .2, 
                       end = .9, 
                     labels = percent, 
                     breaks = seq(.1, .9, .1)) + 
  # Only one inset can work at one time
  # Solution proposed here https://github.com/hughjonesd/ggmagnify/issues/30 
  # Does not work
  geom_magnify(
    from = c(101.223422,  101.953542, 2.805661, 3.383667), 
    to = c(105, 108, 3.5, 5.9), 
    shadow = FALSE,
    proj = "single"
  ) + 
    #theme_map() + 
  labs(
    fill = "",
    title = "Federal constituencies by the proportion of their population who are Bumiputera"
  ) + 
  theme_void() +
  
  plot_layout(ncol = 1)
  
```

<br>

Given the racialised nature of Malaysian politics, Bumiputera candidates are far more likely to win in areas that have high proportions of Bumiputera residents. Non-Bumiputera candidates almost never win seats in constituencies with more than 75% Bumiputera residents (`r census_fed |> filter(pc_bumi > .75) |> nrow()` out of the `r census_fed |> nrow()` federal constituencies are more than 75% Bumiputera) and Bumiputera candidates tend only to win when a constituency has a population that is more than 50% Bumiputera. 

<br>


```{r winner-bumi}
ballots |>
  filter(date > "2006-01-01") |> 
  left_join(
    multiracial_races, 
    by = c("state", "seat", "date", "federal", "election")
  ) |> 
  left_join(
    census_fed |> 
      select(parlimen, pc_bumi), 
    by = c("seat" = "parlimen")
  ) |> 
  filter(result %in% c("won", "won_uncontested")) |> 
  mutate(
    bumiputera = ifelse(ethnicity %in% c("Malay", "Bumi Sabah", "Bumi Sarawak"),
                        "Bumiputera winner",
                        "non-Bumi winner"),
    east_malaysia = ifelse(str_detect(state, "Sabah|Sarawak|Labuan"), 
                           "East Malaysia", "Peninsula")) |> 
  ggplot(aes(x = pc_bumi, fill = east_malaysia)) + 
  geom_histogram() + 
  scale_fill_viridis_d(begin = .1, end = .9) +
  scale_x_continuous(labels = percent) +
  facet_wrap(~ bumiputera) + 
  labs(
    title = "Electoral races by % of the constituency who are Bumiputera", 
    subtitle = "Shows races since the 2006 redelineation", 
    fill = "", 
    y = "Number of electoral races", 
    x = "% of constituency who are Bumiputera"
  ) + 
  theme(strip.background = element_rect(fill = "grey20"))

```

<br>

Barisan Nasional and, to a lesser extent, Perikatan Nasional have been the primary beneficiaries of this racialised dimension to elections. 

Almost all of BN's electoral wins come from constituencies which are more than 75% Bumiputera. This is the same for Perikatan Nasional. Similarly, constituencies which are more than 75% Bumiputera almost never vote for parties under the Pakatan Harapan coalition (or its predecessor Pakatan Rakyat). 

<br>

```{r coalition-bumi}
ballots |>
  filter(date > "2006-01-01") |> 
  left_join(
    multiracial_races, 
    by = c("state", "seat", "date", "federal", "election")
  ) |> 
  left_join(
    census_fed |> 
      select(parlimen, pc_bumi), 
    by = c("seat" = "parlimen")
  ) |> 
  filter(result %in% c("won", "won_uncontested")) |> 
  mutate(
    bumiputera = ifelse(ethnicity %in% c("Malay", "Bumi Sabah", "Bumi Sarawak"),
                        "Bumiputera winner",
                        "non-Bumi winner"),
    east_malaysia = ifelse(str_detect(state, "Sabah|Sarawak|Labuan"), 
                           "East Malaysia", "Peninsula")) |> 
  ggplot(aes(x = pc_bumi, fill = coalition)) + 
  geom_histogram() + 
  scale_fill_manual(values = c(
    "ALONE" = "grey50", 
    "BN" = "#1D1147FF", 
    "GPS" = "#51127CFF", 
    "GRS" = "#822681FF", 
    "GS" = "#B63679FF", 
    "PH" = "#E65164FF", 
    "PN" = "#FB8861FF", 
    "PR" = "#FEC287FF", 
    "USA" = "#FCFDBFFF"
  )) +
  scale_x_continuous(labels = percent) +
  facet_wrap(~ bumiputera) + 
  labs(
    title = "Electoral races by % of the constituency who are Bumiputera", 
    subtitle = "Shows races since the 2006 redelineation", 
    fill = "Coalition", 
    y = "Number of electoral races", 
    x = "% of constituency who are Bumiputera"
  ) + 
  theme(strip.background = element_rect(fill = "grey20"))
```

<br> 

A significant minority of non-Bumiputera MPs were from BN. Additionally, most of those wins came from constituencies with higher Bumiputera populations, indicating that party loyalty (which itself has racial connotations) was, at one point, a stronger factor than the ethnicity of individual candidates. 

<br><br><br>

# Ethnic makeup of parties and coalitions

Amongst the various coalitions, the East Malaysian coalitions and Gagasan Sejahtera were the most racially homogeneous. Interestingly, Pakatan Harapan and Pakatan Rakyat both fielded proportionally more Malay candidates than Barisan Nasional. 

However, as noted by [Azzubair, Kartini and Nazri](https://www.frontiersin.org/journals/political-science/articles/10.3389/fpos.2026.1721966/full), whilst BN might be multiethnic in nature, the coalition was not consociational, but was "sustained more by UMNO’s control and patronage than by genuine power-sharing". 


<br>

```{r coalition-ethncity}
ballots |> 
  filter(date > "2006-01-01") |> 
  mutate(count = 1) |> 
  filter(rank <= 3) |>
  group_by(coalition, ethnicity) |> 
  summarise(count = sum(count), .groups = "drop") |> 
  group_by(coalition) |> 
  mutate(total_candidates = sum(count)) |> 
  ungroup() |> 
  mutate(pc = count / total_candidates) |> 
  ggplot(aes(x = pc, y = fct_rev(coalition))) + 
  geom_col(aes(fill = ethnicity)) + 
  geom_vline(xintercept = .25, linetype = "longdash", colour = "white", alpha = .4) +
  geom_vline(xintercept = .5, linetype = "longdash", colour = "white", alpha = .4) +
  geom_vline(xintercept = .75, linetype = "longdash", colour = "white", alpha = .4) +
  scale_fill_viridis_d() + 
  scale_x_continuous(labels = percent) + 
  labs(x = "% of candidates", 
       y = "Coalition", 
       title = "Candidates fielded by ethnicity", 
       subtitle = "Only showing politicians who ran since the redelineation in 2006", 
       fill = "Ethnicity")
  
  
```

<br>

The plot below shows the breakdown of candidates by ethnicity and party. Amanah, Bersatu, PAS and UMNO are Malay parties; DAP, Gerakan and the MCA are Chinese parties. The MIC is an Indian party, PBB is a Sarawakian party and WARISAN is a Sabahan party. BEBAS and PKR are the only true multiracial parties. 

However, despite all coalitions having representation from all major ethnicities, the idea of a multiracial Malaysian nation (*Bangsa Malaysia*) has failed at a cultural-cognitive level, with [Law and Zaini](https://mysitasi.mohe.gov.my/uploads/get-media-file?refId=f7b5ff9d-7a8a-416a-baad-b5c84b43e4a2) arguing that this ideology is neither internalised nor taken for granted. One academic expert in their study states, "[Malaysians] fall back on primordial identity... no one identifies himself or herself as Bangsa Malaysia". They ultimately conclude that "moments of public disillusionment with identity politics are insufficient to overturn deep-seated institutional [inequality]". 


<br> 


```{r ethnicity-party-candidates}
ballots |> 
  filter(date > "2006-01-01") |> 
  mutate(count = 1) |>
  mutate(ethnicity = ifelse(ethnicity == "Orang Asli", "Other", ethnicity)) |> 
  mutate(party = fct_lump(party, n = 12)) |> 
  group_by(party, ethnicity) |> 
  summarise(count = sum(count), .groups = "drop") |> 
  group_by(party) |> 
  mutate(total_candidates = sum(count)) |> 
  ungroup() |> 
  mutate(pc = count / total_candidates) |> 
  ggplot(aes(x = pc, y = fct_rev(fct_relevel(party, 
                                             c("Other", "UMNO", "PAS", "PKR", 
                                               "BEBAS", "DAP", "MCA", "BERSATU", 
                                               "AMANAH", "WARISAN", "PBB","GERAKAN", "STAR"
                                             ))))) + 
  geom_col(aes(fill = ethnicity)) + 
  geom_vline(xintercept = .25, linetype = "longdash", colour = "white", alpha = .4) +
  geom_vline(xintercept = .5, linetype = "longdash", colour = "white", alpha = .4) +
  geom_vline(xintercept = .75, linetype = "longdash", colour = "white", alpha = .4) +
  scale_fill_viridis_d() + 
  scale_x_continuous(labels = percent) + 
  labs(x = "% of candidates", 
       y = "Party", 
       title = "The only true multi-ethnic parties are BEBAS and PKR", 
       subtitle = "Only candidates since 2006; parties ordered by number of candidates", 
       fill = "Candidate\nethnicity")
  
  
```


<br><br><br>

# The collapse of UMNO and Islamic politics

According to [Deivasagayam](https://msocialsciences.com/index.php/mjssh/article/view/3248/2134), the two main cleavages in Malaysian society -- race and religion -- have converged, contributing to UMNO's downfall. He explains, "UMNO consistently positioned itself as the exclusive advocate and guardian of Malay rights and privileges". However, by engaging in an "Islamisation Race" and promoting state-engineered political Islam to stave off PAS, UMNO inadvertently legitimised PAS in the eyes of Muslim voters. When it advocated for Malay Islamic Supremacy, it set the stage for the convergence of race and religion, which would ultimately benefit PAS and BERSATU; both demonstrated the political viability of a Malay-Muslim governance structure which excluded minorities. 

We see, from the plot below, the extent of UMNO's electoral collapse. From its peak of `r filter(ballots, election == "GE-11" & result %in% c("won", "won_uncontested") & party == "UMNO") |> nrow()` seats under Badawi, it now holds fewer seats than any of PAS, DAP, PKR or Bersatu. 

<br>


```{r parties-mp-share}
ballots |> 
  mutate(count = 1) |> 
  filter(result %in% c("won", "won_uncontested")) |> 
  filter(str_detect(election, "GE")) |> 
  mutate(year = year(date)) |> 
  group_by(election) |> 
  mutate(year = min(year)) |> 
  ungroup() |> 
  mutate(party = fct_lump(party, n = 12, w = votes)) |> 
  group_by(year, party) |> 
  summarise(mps = sum(count), 
            .groups = "drop") |> 
  ggplot(aes(x = year, y = mps, group = party)) + 
  geom_line(aes(colour = party), 
            linewidth = 1.05, 
            alpha = .8) + 
  # scale_colour_manual(
  #   values = c(
  #     "UMNO" = "#30123BFF",
  #     "MCA" = "#4454C4FF",
  #     "MIC" = "#4490FEFF",
  #     "BERSATU" = "#1FC8DEFF",
  #     "PAS" = "#29EFA2FF",
  #     "GERAKAN" = "#7DFF56FF",
  #     "PBB" = "#C1F334FF",
  #     "PBS" = "#F1CA3AFF",
  #     "USNO" = "#FE922AFF",
  #     "SUPP" = "#EA4F0DFF",
  #     "PKR" = "#BE2102FF",
  #     "DAP" = "#7A0403FF",
  #     "Other" = "grey50"
  #     
  #   )
  # ) +
  scale_colour_viridis_d(option = "turbo") +
  scale_x_continuous(breaks = seq(1955, 2026, 5)) +
  scale_y_continuous(breaks = seq(0, 120, 10)) +
  guides(colour = guide_legend(override.aes = list(linewidth = 2, 
                                                   alpha = 1))) + 
  labs(x = "", 
       y = "Number of MPs", 
       title = "The collapse of UMNO has left PAS and DAP as the largest parties", 
       subtitle = "Results from federal elections", 
       colour = "") + 
  theme(axis.text.x = element_text(size = 7))
```
<br>

However, when we examine UMNO's share of votes, we see that its parliamentary dominance was always more fragile than was perceived. At no time after 1960 did UMNO ever win more than 40% of the votes. Additionally, the Badawi peak of `r filter(ballots, election == "GE-11" & result %in% c("won", "won_uncontested") & party == "UMNO") |> nrow()` seats (49% of seats) was accomplished with only 35% of the national vote, highlighting the unfairness of a first-past-the-post system. Now, though, it is UMNO (and PKR) that are penalised by the first-past-the-post system. 
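
One simple way to express this distortion is the ratio of seat share to vote share: for the Badawi peak it is roughly 0.49 / 0.35 = 1.4. The sketch below reproduces the mechanism on a made-up five-seat election (the parties and vote counts are hypothetical and chosen only to make the effect visible).

```r
# Hypothetical toy election: votes per party in five single-member seats
votes <- matrix(
  c(40, 35, 25,
    38, 36, 26,
    45, 30, 25,
    20, 60, 20,
    25, 30, 45),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste("Seat", 1:5), c("A", "B", "C"))
)

# Under first-past-the-post, the candidate with the most votes takes the seat
winners <- colnames(votes)[apply(votes, 1, which.max)]

# Use a factor with all party levels so parties without seats would still appear
seat_share <- prop.table(table(factor(winners, levels = colnames(votes))))
vote_share <- colSums(votes) / sum(votes)

# A ratio above 1 means a party is over-represented relative to its votes
seat_bonus <- as.numeric(seat_share) / vote_share
```

Here party A takes three of the five seats on narrow pluralities, while party B piles up votes in a single seat and ends up under-represented despite having the most votes overall.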

Finally, though their share of votes is lower than it has ever been, UMNO still, marginally, received more votes than any other party in the most recent general election. 

<br> 

```{r parties-vote-share}
ballots |>  
  filter(str_detect(election, "GE")) |>
  mutate(year = year(date)) |> 
  group_by(election) |> 
  mutate(year = min(year)) |>
  ungroup() |> 
  mutate(party = fct_lump(party, n = 12, w = votes)) |>
  group_by(year, party) |> 
  summarise(votes = sum(votes), .groups = "drop") |>
  group_by(year) |> 
  mutate(total_votes = sum(votes)) |> 
  ungroup() |> 
  mutate(votes_pc = votes / total_votes) |> 
  ggplot(aes(x = year, y = votes_pc, group = party)) + 
  geom_line(aes(colour = party), 
            linewidth = 1.05, 
            alpha = .8) + 
  scale_colour_viridis_d(option = "turbo") +
  scale_x_continuous(breaks = seq(1955, 2026, 5)) +
  guides(colour = guide_legend(override.aes = list(linewidth = 2, 
                                                   alpha = 1))) + 
  scale_y_continuous(labels = percent, breaks = seq(0, .6, .1)) +
  labs(x = "", 
       y = "Share of popular vote", 
       title = "UMNO still has, very marginally, the largest share of votes",
       subtitle = "But it has never won more than 40% of votes since the formation of the Federation", 
       colour = "") + 
  theme(axis.text.x = element_text(size = 7))
  
  
```

<br> 

Echoing Deivasagayam's arguments, [Syaza](https://www.iseas.edu.sg/wp-content/uploads/2024/04/TRS12_24.pdf) concludes that "the sense of Islamic identity and belonging often supersedes the allure of economic development for Malay youths when choosing how to cast their vote". 

This speaks not only to the Islamisation race that UMNO, PAS and Bersatu have engaged in -- [Syaza](https://www.iseas.edu.sg/wp-content/uploads/2024/04/TRS12_24.pdf) reports that 65% of respondents in a Merdeka Center Muslim Youth Survey said that Muslims should only vote for Muslim leaders -- but also to an unfair economic system in which whatever economic boons are offered are too meagre and piecemeal to overcome racial and religious identity politics, despite economic concerns being the [top issue](https://merdeka.org/mid-term-survey-may-2025-report-final/) amongst voters. 

Pakatan Harapan, furthermore, seems unwilling to engage in the Islamisation Race or put forward a competing vision of a more moderate Islam, possibly -- I hypothesise -- out of fears that doing so would simultaneously alienate its core non-Muslim base and anger conservative Muslims.



<br><br><br>

# Fragile coalitions

Whilst public support for BN has faded considerably, BN and PN together still commanded majority support for a Malay-supremacist (*Ketuanan Melayu*) government, picking up more than 50% of votes in GE-15, an increase on the combined vote share garnered by Barisan and GS (PAS's previous coalition) in GE-14. 

The current unity (*Madani*, PH-BN-GPS-GRS) government is heavily reliant on BN for both its parliamentary majority and its majority of the vote share. 

<br>

```{r general-elections-vote-share}
ballots |> 
  mutate(count = 1) |> 
  mutate(coalition = ifelse(
    coalition %in% c("USA", "ALONE", "GTA"), "Other", coalition
  )) |> 
  filter(election %in% c("GE-15", "GE-14", "GE-13", "GE-12")) |> 
  group_by(election, coalition) |>
  summarise(votes = sum(votes), 
            candidates = sum(count), 
            .groups = "drop") |> 
  group_by(election) |> 
  mutate(total_votes = sum(votes)) |> 
  ungroup() |> 
  mutate(votes_pc = votes / total_votes) |> 
  ggplot(aes(x = votes_pc, y = fct_rev(coalition), fill = coalition)) + 
  geom_col() + 
  facet_wrap(~ election, scales = "free_y") + 
  scale_fill_viridis_d(option = "magma", begin = .1, end = .9) + 
  scale_x_continuous(labels = percent) + 
  labs(x = "% of votes", 
       y = "", 
       fill = "Coalition", 
       title = "Vote share by coalition") + 
  theme(strip.background = element_rect(fill = "grey30"))
```

<br>

The two largest coalitions in parliament are now PH and PN. But since neither was able to win a commanding mandate in GE-15 (or to compromise with the other), BN was left to play kingmaker. 

<br>

```{r general-elections-seats}
ballots |> 
  mutate(count = 1) |> 
  mutate(coalition = ifelse(
    coalition %in% c("USA", "ALONE", "GTA"), "Other", coalition
  )) |> 
  filter(election %in% c("GE-15", "GE-14", "GE-13", "GE-12") & 
           result %in% c("won", "won_uncontested")) |> 
  group_by(election, coalition) |>
  summarise(votes = sum(votes), 
            seats = sum(count), 
            .groups = "drop") |> 
  group_by(election) |> 
  mutate(total_votes = sum(votes)) |> 
  ungroup() |> 
  mutate(votes_pc = votes / total_votes) |> 
  ggplot(aes(x = seats, y = fct_rev(coalition), fill = coalition)) + 
  geom_col() + 
  facet_wrap(~ election, scales = "free_y") + 
  scale_fill_viridis_d(option = "magma", begin = .1, end = .9) + 
  scale_x_continuous(breaks = seq(0, 150, 20)) +
  labs(x = "Parliament seats", 
       y = "", 
       fill = "Coalition", 
       title = "Parliament seats by coalition") + 
  theme(strip.background = element_rect(fill = "grey30"))
```

<br>

The plots below show vote share amongst the major coalitions in state elections. We see that PN has managed to garner a lead over PH in state polls. 

Recalling our criticism of Anwar in the [addendum on malapportionment](https://aimdata-labs.github.io/malaysian_elections/part_two.html), it is possible that his bid with GRS to increase the number of seats in Sabah is meant to engineer a more lasting parliamentary majority without needing to win the popular vote. We would emphasise that any further malapportionment would be, ultimately, an anti-democratic exercise. 

<br>

```{r state-elections-coalitions}

ballots |> 
  mutate(count = 1) |> 
  mutate(coalition = ifelse(
    coalition %in% c("USA", "ALONE", "GTA", "WARISAN-PLUS", "GASAK"), "Other", coalition
  )) |> 
  # Due to it being in 2021, SE-14 for Sarawak was included in SE-15
  mutate(election = ifelse(date == "2021-12-18" & state == "Sarawak", 
                "SE-15", election)) |> 
  filter(election %in% c("SE-15", "SE-14", "SE-13", "SE-12")) |> 
  group_by(election, coalition) |>
  summarise(votes = sum(votes), 
            candidates = sum(count), 
            .groups = "drop") |> 
  group_by(election) |> 
  mutate(total_votes = sum(votes)) |> 
  ungroup() |> 
  mutate(votes_pc = votes / total_votes) |> 
  ggplot(aes(x = votes_pc, y = fct_rev(coalition), fill = coalition)) + 
  geom_col() + 
  facet_wrap(~ election, scales = "free_y") + 
  scale_fill_viridis_d(option = "magma", begin = .1, end = .9) + 
  scale_x_continuous(labels = percent) + 
  labs(x = "% of votes", 
       y = "", 
       fill = "Coalition", 
       title = "State election vote share by coalition") + 
  theme(strip.background = element_rect(fill = "grey30"))
```

<br> 

[Syaza](https://www.iseas.edu.sg/wp-content/uploads/2024/04/TRS12_24.pdf) argues that there is a strong dichotomy between Malay and non-Malay voters' perceptions of the government. 85% of Malay voters felt that the Perikatan Nasional government protects the interest of the Malays, while 77% of non-Malay voters believed the government does not treat all races equally. 

Given the intense racialisation of Malaysian politics, we will next use demographic data to predict the performances of Pakatan Harapan and Perikatan Nasional in the upcoming GE-16. 

<br><br><br>


# Predictions for the GE-16



```{r training-and-testing}

# Training dataset is kind of a frankenstein dataset
# Mostly from State Election 15, 
# but also 14 for Sarawak (which took place in 18 Dec 2021)
# GE-14 for the federal territories is included 
# in the hope that ft voting patterns are fairly static
# The model is significantly worse without the federal territories data

training <- census_fed |> 
  mutate(pc_bumi_peninsula = ifelse(
    east_malaysia == 0, pc_bumi, 0), 
    pc_bumi_east_malaysia = ifelse(
      east_malaysia == 1, pc_bumi, 0
    )) |> 
  select(
    parlimen, population_density, pc_bumi_peninsula, pc_bumi_east_malaysia, 
    pc_chinese, pc_indian, gini, income_avg, poverty_incidence
  ) |> 
  left_join(
    census_alt |>
      mutate(non_citizen_pc = non_citizens / total_population) |>
      select(
        parlimen,
        labour_force_participation_rate,
        sex_ratio,
        total_dependency_ratio,
        average_household_size
      ),
    by = "parlimen"
  ) |> 
  left_join(
    ballots |>
      filter(election == "SE-15") |>
      # Using SE-14 for Sarawak in the training data 
      # It definitely falls into the correct date range though 
      rbind(ballots |>
              filter(date == "2021-12-18" & state == "Sarawak")) |>
      left_join(census_dun |>
                  distinct(dun, parlimen), by = c("seat" = "dun")) |>
      group_by(parlimen) |>
      summarise(
        ph_votes = sum(votes[coalition == "PH"], na.rm = TRUE),
        bn_votes = sum(votes[coalition == "BN"], na.rm = TRUE), 
        pn_votes = sum(votes[coalition == "PN"], na.rm = TRUE),
        total_votes = sum(votes, na.rm = TRUE),
        .groups = "drop"
      ) |>
      # Using GE-14 as the training data for the federal territories
      # This data does fall outside of a reasonable date range, 
      # But I am hoping that federal territories voting results
      # are fairly static
      rbind(
        ballots |>
          filter(election == "GE-14" & str_detect(state, "Kuala Lumpur|Putrajaya|Labuan")) |>
          group_by(parlimen = seat) |>
          summarise(
            ph_votes = sum(votes[coalition %in% c("PH", "PR")], na.rm = TRUE),
            bn_votes = sum(votes[coalition == "BN"], na.rm = TRUE),
            pn_votes = sum(votes[coalition %in% c("PN", "GS")], na.rm = TRUE),
            total_votes = sum(votes, na.rm = TRUE),
            .groups = "drop"
          )
      ) |> 
      mutate(pc_ph_votes = ph_votes / total_votes), 
    by = "parlimen"
  ) |> 
  filter(!is.na(pc_ph_votes)) |>
  # Both population density and income must be logged
  mutate(population_density = log(population_density), 
         income_avg = log(income_avg)) |> 
  # Normalising the variables
  mutate_at(
    c("population_density", "pc_bumi_peninsula", "pc_bumi_east_malaysia", 
      "pc_chinese", "pc_indian", "gini", "income_avg", 
      "labour_force_participation_rate", "poverty_incidence", 
      "sex_ratio", "total_dependency_ratio", "average_household_size"), 
    function(x){
      # Parenthesise the centring so the whole difference is divided by sd(x)
      (x - mean(x)) / sd(x)
    }
  ) |> 
  na.omit()

# Testing data are the results from GE-15
testing <- census_fed |> 
  mutate(pc_bumi_peninsula = ifelse(
    east_malaysia == 0, pc_bumi, 0), 
    pc_bumi_east_malaysia = ifelse(
      east_malaysia == 1, pc_bumi, 0
    )) |> 
  select(
    parlimen, population_density, pc_bumi_peninsula, pc_bumi_east_malaysia, 
    pc_chinese, pc_indian, gini, income_avg, poverty_incidence
  ) |> 
  left_join(
    census_alt |>
      mutate(non_citizen_pc = non_citizens / total_population) |>
      select(
        parlimen,
        labour_force_participation_rate,
        sex_ratio,
        total_dependency_ratio,
        average_household_size
      ),
    by = "parlimen"
  ) |>  
  left_join(
    ballots |> 
      filter(election == "GE-15") |> 
      group_by(seat) |> 
      summarise(ph_votes = sum(votes[coalition == "PH"], na.rm = TRUE), 
                bn_votes = sum(votes[coalition == "BN"], na.rm = TRUE), 
                pn_votes = sum(votes[coalition == "PN"], na.rm = TRUE),
                total_votes = sum(votes, na.rm = TRUE), 
                .groups = "drop") |>
      mutate(pc_ph_votes = ph_votes / total_votes), 
    by = c("parlimen" = "seat")
  ) |> 
   # Both population density and income must be logged
  mutate(population_density = log(population_density), 
         income_avg = log(income_avg)) |> 
  # Normalising the variables
  mutate_at(
    c("population_density", "pc_bumi_peninsula", "pc_bumi_east_malaysia", 
      "pc_chinese", "pc_indian", "gini", "income_avg", 
      "labour_force_participation_rate", "poverty_incidence", 
      "sex_ratio", "total_dependency_ratio", "average_household_size"), 
    function(x){
      # Parenthesise the centring so the whole difference is divided by sd(x)
      (x - mean(x)) / sd(x)
    }
  ) |> 
  na.omit()
```
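
The normalisation step in the chunk above is intended as a z-score: each predictor is centred on its mean and divided by its standard deviation. A minimal sketch of that transformation follows, using a hypothetical helper `standardise()` and made-up values. It also shows why the parentheses matter in R.

```r
# Z-score standardisation: subtract the mean, then divide by the standard deviation
standardise <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical values standing in for a predictor such as log(income)
x <- c(7.9, 8.4, 8.8, 9.1, 9.6)
z <- standardise(x)

# Without the inner parentheses, R divides mean(x) by sd(x) first,
# so the values are only shifted by a constant and never rescaled
shifted <- x - mean(x) / sd(x)

c(sd_z = sd(z), sd_shifted = sd(shifted), sd_x = sd(x))
```

After standardising, `z` has mean zero and unit standard deviation, whereas `shifted` keeps the spread of the original variable.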



```{r stepwise-model-ph}
#| echo: false
#| output: false

stepwise_model <- step(
  lm(
    pc_ph_votes ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
      pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
      labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
      average_household_size,  
    data = training
  ), 
  # direction is an argument of step(), not lm()
  direction = "backward"
)

# The stepwise model indicates that the most relevant variables to predicting 
# PH performance are log(population density), % of population who are peninsula bumis, 
# % of the population who are east malaysia bumis and % of the population who are indian

```
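
For readers unfamiliar with backward stepwise selection, the sketch below runs the same kind of AIC-based elimination on R's built-in `mtcars` data. The dataset and the choice of predictors are purely illustrative and have nothing to do with the election models.

```r
# Built-in mtcars data, used purely for illustration
full_model <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)

# Backward elimination drops one term at a time while AIC keeps improving;
# trace = 0 suppresses the step-by-step log
reduced_model <- step(full_model, direction = "backward", trace = 0)

# Predictors retained after elimination
kept_terms <- attr(terms(reduced_model), "term.labels")

# The reduced model's AIC is never higher than the starting model's
c(full = AIC(full_model), reduced = AIC(reduced_model))
```

Because the procedure only removes a term when doing so lowers AIC, the result is a subset of the starting predictors. It is a greedy search, so it is not guaranteed to find the best subset overall.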



```{r glmnet-model-ph}

set.seed(3)

glmnet_model <- train(
  pc_ph_votes ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
  pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
    labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
    average_household_size,  
  data = training, 
  method = "glmnet", 
  metric = "RMSE", 
  trControl = trainControl(method = "LOOCV"), 
  preProcess = c("center", "scale"), 
  tuneLength = 10
)

```

```{r preds}
preds <- testing %>%
  mutate(
    aic_pred = predict(object = stepwise_model, newdata = .),
    glmnet_pred = predict(object = glmnet_model, newdata = testing)
  ) %>%
  mutate(pred = (aic_pred + glmnet_pred) / 2) |> 
  select(proj_ph_vote = pred, 
         step_pred = aic_pred, 
         glmnet_pred = glmnet_pred)

testing_preds <- testing |> 
  cbind(preds)

```
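
The projection used in this report is the simple average of the two models' predictions. The sketch below illustrates that averaging step only. It uses `mtcars` and two ordinary linear models, with the second `lm()` standing in for the glmnet fit so that the example needs no extra packages; all object names here are hypothetical.

```{r sketch-ensemble, eval=FALSE}
# Split the built-in mtcars data into alternating train and test rows
train_rows <- seq(1, nrow(mtcars), by = 2)
train <- mtcars[train_rows, ]
test <- mtcars[-train_rows, ]

# Two different models fitted on the same training data
model_a <- lm(mpg ~ wt + hp, data = train)
model_b <- lm(mpg ~ wt + qsec, data = train) # stand-in for the glmnet model

# Predict on the held-out rows with each model
pred_a <- predict(model_a, newdata = test)
pred_b <- predict(model_b, newdata = test)

# Equal-weight ensemble: the mean of the two predictions
ensemble <- (pred_a + pred_b) / 2

head(data.frame(pred_a, pred_b, ensemble))
```

An equal-weight average always lies between the two individual predictions, which dampens the more extreme estimates of either model.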

```{r RMSE, eval=FALSE}

sqrt(mean((testing_preds$pc_ph_votes - testing_preds$step_pred)^2))

sqrt(mean((testing_preds$pc_ph_votes - testing_preds$glmnet_pred)^2))

stepwise_model |> summary()

rss_step <- sum((testing_preds$pc_ph_votes - testing_preds$step_pred)^2)
tss <- sum((testing_preds$pc_ph_votes - mean(testing_preds$pc_ph_votes))^2)
1 - rss_step / tss

rss_glm <- sum((testing_preds$pc_ph_votes - testing_preds$glmnet_pred)^2)
1 - rss_glm / tss
```
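
As a sketch of the two accuracy measures computed above, the helper functions below (`rmse()` and `r_squared()` are hypothetical names, and the vote shares are made up) show the order of operations. For RMSE, the errors are squared before they are averaged; squaring the mean error instead lets positive and negative errors cancel out.

```{r sketch-rmse, eval=FALSE}
# Root mean squared error: square each error, average, then take the root
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Out-of-sample R-squared: 1 - residual sum of squares / total sum of squares
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Hypothetical actual and predicted vote shares
actual <- c(0.10, 0.35, 0.52, 0.61, 0.80)
predicted <- c(0.15, 0.30, 0.50, 0.66, 0.74)

rmse(actual, predicted)
r_squared(actual, predicted)

# For contrast: the absolute mean error, which understates the typical error
sqrt(mean(actual - predicted)^2)
```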



```{r ph-safe-seats}

modifier1 <- ballots |> 
  filter(coalition == "PH") |>
  # For the chance of PH winning a seat if it has less than 50% of the votes
  filter(votes_perc < 50) |> 
  mutate(ph_win = ifelse(result %in% c("won", "won_uncontested"), 1, 0), 
         votes_perc = votes_perc / 100) %>% 
  lm(ph_win ~ votes_perc, data = .) |> 
  summary() |> 
  tidy() |> 
  filter(term == "votes_perc") |> 
  pull(estimate)

modifier2 <- ballots |> 
  filter(coalition == "PH") |>
  # Modifier for the full dataset
  # Basically just the chance of PH winning
    mutate(ph_win = ifelse(result %in% c("won", "won_uncontested"), 1, 0), 
         votes_perc = votes_perc / 100) %>% 
  lm(ph_win ~ votes_perc, data = .) |> 
  summary() |> 
  tidy() |> 
  filter(term == "votes_perc") |> 
  pull(estimate)

safe_seats <- testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(ph_win_chance = ifelse(
    proj_ph_vote < .5, proj_ph_vote * modifier1, proj_ph_vote * modifier2), 
    ph_win_chance = ifelse(ph_win_chance > 1, 1, ph_win_chance)) |>
  filter(ph_win_chance > .5) |> 
  nrow()

ph_ge_15 <- ballots |> 
  mutate(count = 1) |> 
  filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
  group_by(coalition) |> 
  summarise(count = sum(count), .groups = "drop") |> 
  mutate(pc = round(count / 222 * 100, 2)) |> 
  filter(coalition == "PH") |> 
  pull(count)
```


The plot below compares PH's vote share in GE-15 with its predicted vote share in GE-16, using the average of a stepwise and a glmnet model. Each point is a federal constituency. The models predict that PH's vote share will shrink in GE-16. 

We predict that Pakatan will win only **`r safe_seats`** of the 222 seats in parliament, down from `r ph_ge_15` in GE-15. 
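
The seat count relies on converting a projected vote share into a rough chance of winning. The sketch below mirrors the logic of the `ph-safe-seats` chunk on made-up data: it takes the slope of a linear probability model of wins on vote share (fitted once on results below 50% and once on all results), multiplies the projected share by the relevant slope, and caps the result at 1. The data frame `past` and the function `win_chance()` are hypothetical.

```{r sketch-win-chance, eval=FALSE}
# Hypothetical past results: vote share and whether the seat was won (1) or lost (0)
past <- data.frame(
  share = c(0.20, 0.30, 0.38, 0.42, 0.45, 0.48, 0.55, 0.62, 0.70),
  win   = c(0,    0,    0,    1,    0,    1,    1,    1,    1)
)

# Slope of a linear probability model, for sub-50% results and for all results
slope_below <- unname(coef(lm(win ~ share, data = subset(past, share < 0.5)))["share"])
slope_all <- unname(coef(lm(win ~ share, data = past))["share"])

# Scale the projected share by the relevant slope and cap the result at 1
win_chance <- function(proj_share) {
  raw <- ifelse(proj_share < 0.5,
                proj_share * slope_below,
                proj_share * slope_all)
  pmin(raw, 1)
}

# Hypothetical projected vote shares for three constituencies
proj <- c(0.25, 0.45, 0.60)
win_chance(proj)

# Seats counted as probable wins
sum(win_chance(proj) > 0.5)
```

This is a heuristic rather than a calibrated probability: a logistic regression would be the more conventional choice, but the sketch follows the approach taken in this report.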


<br>


```{r plotting-results-ph}
#| out-width: 100%
#| out-height: 100%
pred_plot <- testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(ph_win_chance = ifelse(
    proj_ph_vote < .5, proj_ph_vote * modifier1, proj_ph_vote * modifier2), 
    ph_win_chance = ifelse(ph_win_chance > 1, 1, ph_win_chance)) |> 
  ggplot(aes(x = pc_ph_votes, y = proj_ph_vote)) + 
  geom_point(aes(colour = ph_win_chance,
                 text = paste0(parlimen, ",", "\n", 
                               state, ",", "\n", 
                               "PH win chance: ", round(ph_win_chance * 100, 2), "%", "\n",
                               "Projected PH%: ", round(proj_ph_vote * 100, 2), "%", "\n", 
                               "GE-15 PH%: ", round(pc_ph_votes * 100, 2), "%", "\n", 
                               "GE-15 winner: ", coalition))) +
  geom_smooth(method = "lm") + 
  scale_colour_viridis(option = "plasma", 
                       labels = percent) + 
  scale_x_continuous(labels = percent, breaks = seq(0, 1, .1)) + 
  scale_y_continuous(labels = percent, breaks = seq(0, 1, .1)) +
  labs(colour = "Chance of\nPH win", 
       x = "% Pakatan Harapan votes in GE-15", 
       y = "Projected % of PH votes in GE-16", 
       title = "Interactive plot of Pakatan Harapan GE-15 results and predictions for GE-16", 
       subtitle = "Mouse over for details; drag and click to select and zoom; double-click legend select/deselect")

ggplotly(pred_plot, tooltip = c("text")) |> 
  layout(height = 500, width = 800, 
         margin = list(t = 50)) |> 
  config(displayModeBar = FALSE)
```


<br>


These predictions also underscore the instability of the Madani government's coalition. The results have been re-plotted below to show the GE-15 winner of each constituency. In the marginal seats (roughly in the middle of the graph below), any gains PH makes will come largely at the expense of BN, and vice versa. 

There are a fair number of marginal seats (largely three-cornered fights) that PH won in GE-15 with less than 40% of the votes. These are all vulnerable to pick-up by BN or PN. 

<br>



```{r results-coalition}
#| out-width: 100%
#| out-height: 100%
pred_plot_coalition <- testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  ggplot(aes(x = pc_ph_votes, y = proj_ph_vote)) + 
  geom_point(aes(colour = coalition,
                 text = paste0(parlimen, ",", "\n", 
                               state, ",", "\n", 
                               "Projected PH%: ", round(proj_ph_vote * 100, 2), "%", "\n", 
                               "GE-15 PH%: ", round(pc_ph_votes * 100, 2), "%", "\n", 
                               "GE-15 winner: ", coalition))) +
  geom_smooth(method = "lm") + 
  scale_colour_viridis_d(option = "turbo") + 
  scale_x_continuous(labels = percent, breaks = seq(0, 1, .1)) + 
  scale_y_continuous(labels = percent, breaks = seq(0, 1, .1)) +
  labs(colour = "Winning\ncoalition\nin GE15", 
       x = "% Pakatan Harapan votes in GE-15", 
       y = "Projected % of PH votes in GE-16", 
       title = "Interactive plot of Pakatan Harapan GE-15 results and predictions for GE-16", 
       subtitle = "Mouse over for details; drag and click to select and zoom; double-click legend select/deselect")

ggplotly(pred_plot_coalition, tooltip = c("text")) |> 
  layout(height = 500, width = 800, 
         margin = list(t = 50)) |> 
  config(displayModeBar = FALSE)
```

<br>

Based on our predictions of PH's performance, below is the projected breakdown of seats within the coalition. The DAP will remain the largest party in the coalition, whilst PKR will face the largest number of difficult races. 

<br>

```{r party-ph-preds, fig.height=3.5}
testing_preds |> 
  mutate(ph_win_chance = ifelse(
    proj_ph_vote < .5, proj_ph_vote * modifier1, proj_ph_vote * modifier2), 
    ph_win_chance = ifelse(ph_win_chance > 1, 1, ph_win_chance)) |> 
  filter(ph_win_chance >= .5) |> 
  mutate(marginal = ifelse(ph_win_chance <= .7, "Marginal", "Safe")) |> 
  left_join(
    ballots |>
      filter(election == "GE-15" & rank %in% c(1, 2)) |>
      select(state, seat, rank, party) |>
      mutate(rank = ifelse(rank == 1, "first", "second")) |>
      pivot_wider(names_from = rank, values_from = party), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(projected_winner = ifelse(
    first %out% c("AMANAH", "DAP", "MUDA", "PKR", "UPKO"), 
    second, 
    first), 
    count = 1) |>
  group_by(projected_winner, marginal) |> 
  summarise(count = sum(count)) |>
  ggplot(aes(x = count, y = reorder_within(projected_winner, count, marginal))) + 
  geom_col(aes(fill = projected_winner)) + 
  geom_text(aes(label = count),
            hjust = "inward") +
  scale_fill_viridis_d(option = "cividis", begin = .8, end =.2) + 
  facet_wrap(~ marginal, scales = "free_y") +
  scale_y_reordered() +
  guides(fill = guide_legend(reverse = TRUE)) + 
  labs(title = "Projected breakdown of probable PH seats in GE-16", 
       subtitle = "Marginal seats are those where PH's estimated chance of winning is between 50% and 70%", 
       y = "", 
       x = "Projected number of seats") + 
  theme(legend.position = "none", 
        strip.background = element_rect(fill = "grey20"))


```
<br><br><br>

## Perikatan Nasional


```{r pn-bn-train-test}
pn_bn_train <- training |> 
  mutate(pc_pn_votes = pn_votes / total_votes, 
         pc_bn_votes = bn_votes / total_votes, 
         bn_pn_difference_pc = pc_bn_votes - pc_pn_votes)

pn_bn_test <- testing |> 
  mutate(pc_pn_votes = pn_votes / total_votes, 
         pc_bn_votes = bn_votes / total_votes, 
         bn_pn_difference_pc = pc_bn_votes - pc_pn_votes)
```

```{r stepwise-bn-pn}
#| echo: false
#| output: false

stepwise_model_bp <- step(lm(
  bn_pn_difference_pc ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
  pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
    labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
    average_household_size, 
  data = pn_bn_train
), direction = "backward")

```


```{r glmnet-bn-pn}
set.seed(3)

glmnet_model_bp <- train(
  bn_pn_difference_pc ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
  pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
    labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
    average_household_size, 
  data = pn_bn_train, 
  method = "glmnet", 
  metric = "RMSE", 
  trControl = trainControl(method = "LOOCV"), 
  preProcess = c("center", "scale"), 
  tuneLength = 10
)
```



```{r preds-bn-pn}
preds_pn_bn <- pn_bn_test %>%
  mutate(
    aic_pred = predict(object = stepwise_model_bp, newdata = .),
    glmnet_pred = predict(object = glmnet_model_bp, newdata = pn_bn_test)
  ) %>%
  mutate(pred = (aic_pred + glmnet_pred) / 2) |> 
  select(bn_pn_difference_pc_pred = pred, 
         step_pred = aic_pred, 
         glmnet_pred = glmnet_pred)

pn_bn_results <- pn_bn_test |> 
  cbind(preds_pn_bn)

```

Let's use the same combination of stepwise linear modelling and glmnet to predict the performance of Barisan Nasional against Perikatan Nasional. 

According to the model, BN is expected to lose even more ground to PN in the upcoming 16th general election, with PN making further inroads into Johor, Pahang and Perak. BN will depend largely on its competitiveness in East Malaysia to retain any representation in parliament. 

In the plot below, negative values on the y-axis indicate Perikatan winning more votes than Barisan. We note that the model predicts PN winning more votes than BN in almost all the constituencies that BN won in GE-15. 

Clustered at `x = 0` are the constituencies where BN and PN received the most similar numbers of votes. These include the seats where the two are most competitive with each other, as well as seats where neither is competitive at all (both likely losing by a large margin to PH or another coalition). 

<br>



```{r results-bn-pn}
#| out-width: 100%
#| out-height: 100%
pred_pn_bn_coalition <- pn_bn_results |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  left_join(census_fed |>
              select(parlimen,
                     pc_chinese_actual = pc_chinese, 
                     pc_indian_actual = pc_indian, 
                     pc_bumi_actual = pc_bumi,
                     pop_den_actual = population_density), 
            by = "parlimen") |> 
  ggplot(aes(x = bn_pn_difference_pc, y = bn_pn_difference_pc_pred)) + 
  geom_vline(xintercept = 0, linetype = 2, linewidth = .7, colour = "grey30") + 
  geom_hline(yintercept = 0, linetype = 2, linewidth = .7, colour = "grey30") +
  geom_point(aes(colour = coalition,
                 text = paste0(parlimen, ",", "\n", 
                               state, ",", "\n", 
                               "Projected BN-PN diff.%: ", 
                               round(bn_pn_difference_pc_pred * 100, 2), "%", "\n", 
                               "Actual BN-PN diff.%: ", round(bn_pn_difference_pc * 100, 2), "%", "\n", 
                               "GE-15 BN%: ", round(pc_bn_votes * 100, 2), "%", "\n",
                               "GE-15 PN%: ", round(pc_pn_votes * 100, 2), "%", "\n", 
                               "% Bumi: " , round(pc_bumi_actual * 100, 2), "%", "\n",
                               "% Chinese: " , round(pc_chinese_actual * 100, 2), "%", "\n",
                               "Population density: " , round(pop_den_actual, 2), "\n",
                               "GE-15 winner: ", coalition)),
             stroke = NA, 
             size = 2) +
  geom_smooth(method = "lm") + 
  scale_colour_manual(values = c("ALONE" = "#30123B40", 
                                 "BN" = "#3E9BFEFF",
                                 "GPS" = "#46F88440", 
                                 "GRS" = "#E1DD3740",
                                 "PH" = "#F05B1240", 
                                 "PN" = "#7A0403FF")) +
  scale_x_continuous(labels = percent, breaks = seq(-1, 1, .1)) + 
  scale_y_continuous(labels = percent, breaks = seq(-1, 1, .1)) +
  labs(colour = "Winning\ncoalition\nin GE15", 
       x = "% Difference between BN and PN votes in GE-15", 
       y = "Projected % difference between BN and PN votes in GE-16", 
       title = "Projected difference in BN and PN votes vs. actual difference", 
       subtitle = "Mouse over for details; drag and click to select and zoom; double-click legend select/deselect")

ggplotly(pred_pn_bn_coalition, tooltip = c("text")) |> 
  layout(height = 500, width = 800, 
         margin = list(t = 50)) |> 
  config(displayModeBar = FALSE)
```

<br>

```{r stepwise-pn}
#| echo: false
#| output: false

stepwise_model_pn <- step(lm(
  pc_pn_votes ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
  pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
    labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
    average_household_size, 
  data = pn_bn_train
), direction = "backward")

```


```{r glmnet-pn}
set.seed(3)

glmnet_model_pn <- train(
  pc_pn_votes ~ population_density + pc_bumi_peninsula + pc_bumi_east_malaysia + 
  pc_chinese + pc_indian + gini + income_avg + poverty_incidence + 
    labour_force_participation_rate + sex_ratio + total_dependency_ratio + 
    average_household_size, 
  data = pn_bn_train, 
  method = "glmnet", 
  metric = "RMSE", 
  trControl = trainControl(method = "LOOCV"), 
  preProcess = c("center", "scale"), 
  tuneLength = 10
)
```


```{r preds-pn}
preds_pn <- pn_bn_test %>%
  mutate(
    aic_pred = predict(object = stepwise_model_pn, newdata = .),
    glmnet_pred = predict(object = glmnet_model_pn, newdata = pn_bn_test)
  ) %>%
  mutate(pred = (aic_pred + glmnet_pred) / 2) |> 
  select(proj_pn_vote = pred, 
         step_pred = aic_pred, 
         glmnet_pred = glmnet_pred)

pn_test <- pn_bn_test |> 
  cbind(preds_pn)

```

```{r pn-safe-seats}

modifier_pn1 <- ballots |> 
  filter(coalition == "PN") |>
  # For the chance of PN winning a seat if it has less than 50% of the votes
  filter(votes_perc < 50) |> 
  mutate(pn_win = ifelse(result %in% c("won", "won_uncontested"), 1, 0), 
         votes_perc = votes_perc / 100) %>% 
  lm(pn_win ~ votes_perc, data = .) |> 
  summary() |> 
  tidy() |> 
  filter(term == "votes_perc") |> 
  pull(estimate)

modifier_pn2 <- ballots |> 
  filter(coalition == "PN") |>
  # Modifier for the full dataset 
    mutate(pn_win = ifelse(result %in% c("won", "won_uncontested"), 1, 0), 
         votes_perc = votes_perc / 100) %>% 
  lm(pn_win ~ votes_perc, data = .) |> 
  summary() |> 
  tidy() |> 
  filter(term == "votes_perc") |> 
  pull(estimate)

pn_safe_seats <- pn_test |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(pn_win_chance = ifelse(
    proj_pn_vote < .5, proj_pn_vote * modifier_pn1, proj_pn_vote * modifier_pn2), 
    pn_win_chance = ifelse(pn_win_chance > 1, 1, pn_win_chance)) |>
  filter(pn_win_chance >= .5) |> 
  nrow()

pn_ge_15 <- ballots |> 
  mutate(count = 1) |> 
  filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
  group_by(coalition) |> 
  summarise(count = sum(count), .groups = "drop") |> 
  mutate(pc = round(count / 222 * 100, 2)) |> 
  filter(coalition == "PN") |> 
  pull(count)
```

Perikatan Nasional is predicted to be the largest coalition in parliament after GE-16. Using the same combination of stepwise and glmnet models as before, we predict that PN will win **`r pn_safe_seats`** of the 222 seats in parliament, up from `r pn_ge_15` in GE-15, making further inroads into Selangor, Johor and Melaka. Perikatan remains uncompetitive in East Malaysia. 

<br>



```{r plotting-pn-results}
#| out-width: 100%
#| out-height: 100%
pn_pred_plot <- pn_test |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
   mutate(pn_win_chance = ifelse(
    proj_pn_vote < .5, proj_pn_vote * modifier_pn1, proj_pn_vote * modifier_pn2), 
    pn_win_chance = ifelse(pn_win_chance > 1, 1, pn_win_chance)) |> 
  ggplot(aes(x = pc_pn_votes, y = proj_pn_vote)) + 
  geom_point(aes(colour = pn_win_chance,
                 text = paste0(parlimen, ",", "\n", 
                               state, ",", "\n", 
                               "PN win chance: ", round(pn_win_chance * 100, 2), "%", "\n",
                               "Projected PN%: ", round(proj_pn_vote * 100, 2), "%", "\n", 
                               "GE-15 PN%: ", round(pc_pn_votes * 100, 2), "%", "\n", 
                               "GE-15 winner: ", coalition))) +
  geom_smooth(method = "lm") + 
  scale_colour_viridis(option = "plasma", 
                       labels = percent) + 
  scale_x_continuous(labels = percent, breaks = seq(0, 1, .1)) + 
  scale_y_continuous(labels = percent, breaks = seq(0, 1, .1)) +
  labs(colour = "Chance of\nPN win", 
       x = "% Perikatan Nasional votes in GE-15", 
       y = "Projected % of PN votes in GE-16", 
       title = "Interactive plot of Perikatan Nasional GE-15 results and predictions for GE-16", 
       subtitle = "Mouse over for details; drag and click to select and zoom; double-click legend select/deselect")

ggplotly(pn_pred_plot, tooltip = c("text")) |> 
  layout(height = 500, width = 800, 
         margin = list(t = 50)) |> 
  config(displayModeBar = FALSE)
```


<br> 

The model does not, however, predict an outright win: Perikatan Nasional will still need to partner with another coalition to form the government. 

PAS is anticipated to be the largest party within the coalition, and it will also contest fewer marginal seats than Bersatu. 

<br>



```{r party-pn-preds, fig.height=3}

pn_test |> 
  mutate(pn_win_chance = ifelse(
    proj_pn_vote < .5, proj_pn_vote * modifier_pn1, proj_pn_vote * modifier_pn2), 
    pn_win_chance = ifelse(pn_win_chance > 1, 1, pn_win_chance)) |>
  filter(pn_win_chance >= .5) |>
  mutate(marginal = ifelse(pn_win_chance <= .7, "Marginal", "Safe")) |> 
  left_join(
    ballots |>
      filter(election == "GE-15" & rank %in% c(1, 2, 3)) |>
      select(state, seat, rank, party) |>
      mutate(rank = case_when(rank == 1 ~ "first", 
                              rank == 2 ~ "second", 
                              rank == 3 ~ "third")) |> 
      pivot_wider(names_from = rank, values_from = party), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(projected_winner = case_when(
    first %in% c("BERSATU", "GERAKAN", "PAS") ~ first, 
    second %in% c("BERSATU", "GERAKAN", "PAS") ~ second, 
    third %in% c("BERSATU", "GERAKAN", "PAS") ~ third), 
    count = 1) |> 
  group_by(projected_winner, marginal) |> 
  summarise(count = sum(count)) |>
  ggplot(aes(x = count, y = reorder_within(projected_winner, count, marginal))) + 
  geom_col(aes(fill = projected_winner)) + 
  geom_text(aes(label = count),
            hjust = "inward", 
            colour = "grey40") +
  scale_fill_viridis_d(option = "cividis", begin = .8, end =.2) + 
  facet_wrap(~ marginal, scales = "free_y") +
  scale_y_reordered() +
  guides(fill = guide_legend(reverse = TRUE)) + 
  labs(title = "Projected breakdown of probable PN seats in GE-16", 
       subtitle = "Marginal seats are those where PN's estimated chance of winning is between 50% and 70%", 
       y = "", 
       x = "Projected number of seats") + 
  theme(legend.position = "none", 
        strip.background = element_rect(fill = "grey20"))


```



<br><br><br>


# Conclusions

Perikatan Nasional is anticipated to win **`r pn_safe_seats`** seats, whilst Pakatan Harapan is anticipated to win only **`r safe_seats`** seats. As a small consolation to PH, only a very small difference is projected between PH's and PN's shares of the national vote. But to have a chance at being the largest coalition, Pakatan must win all of Selangor and extend its reach in Johor and Pahang -- an unlikely prospect. 

<br>


```{r projected-national-vote-share-seats, fig.height=4}

testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  select(proj_ph_vote, parlimen) |> 
  left_join(
    pn_test |>
      left_join(
        ballots |>
          filter(election == "GE-15" &
                   result %in% c("won", "won_uncontested")) |>
          select(seat, coalition),
        by = c("parlimen" = "seat")), 
    by = "parlimen"
  ) |> 
  mutate(proj_ph_vote_number = proj_ph_vote * total_votes, 
         proj_pn_vote_number = proj_pn_vote * total_votes) |> 
  summarise_at(vars(proj_ph_vote_number, 
                    proj_pn_vote_number, 
                    total_votes), 
               ~ sum(.x)) |> 
  mutate(Other = total_votes - proj_ph_vote_number - proj_pn_vote_number) |> 
  pivot_longer(everything()) |> 
  mutate(total_votes = max(value)) |> 
  mutate(pc = value / total_votes, 
         name = case_when(
           name == "proj_ph_vote_number" ~ "Pakatan Harapan", 
           name == "proj_pn_vote_number" ~ "Perikatan Nasional", 
           TRUE ~ name), 
         pc = round(pc * 100, 2)) |> 
  filter(name != "total_votes") |> 
  select(name, pc) |>
  mutate(name = fct_relevel(name, 
                            c("Pakatan Harapan",
                              "Perikatan Nasional",
                              "Other"))) |> 
  ggplot(aes(x = pc, y = fct_rev(name))) + 
  geom_col(aes(fill = name)) + 
  geom_text(aes(label = paste0(pc, "%")), 
            hjust = "inward", 
            colour = "white") +
  scale_fill_viridis_d(option = "magma", begin = .1, end = .9, direction = -1) +
  theme(legend.position = "none", 
        strip.background = element_rect(fill = "grey20")) +
  labs(x = "Vote share (%)", 
       y = "", 
       fill = "", 
       title = "Predicted vote shares") +
  
tribble(
      ~name, ~seats, 
      "Pakatan Harapan", 72, 
      "Perikatan Nasional", 95,
      "Other", 55
    ) |> 
  mutate(name = fct_relevel(name, 
                            c("Pakatan Harapan",
                              "Perikatan Nasional",
                              "Other"))) |> 
  ggplot(aes(x = seats, y = fct_rev(name))) + 
  geom_col(aes(fill = name)) + 
  geom_text(aes(label = seats), 
            hjust = "inward", 
            colour = "white") +
  scale_fill_viridis_d(option = "magma", begin = .1, end = .9, direction = -1) +
  theme(legend.position = "none", 
        strip.background = element_rect(fill = "grey20"), 
        axis.text.y = element_blank()) +
  labs(x = "Number of seats", 
       y = "", 
       fill = "", 
       title = "Predicted number of seats") + 
  
  plot_annotation(title = "Predictions for GE-16")

```
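
The national vote shares in the left-hand panel are turnout-weighted: each constituency's projected share is multiplied by its GE-15 vote total before summing. A minimal sketch with three made-up constituencies (the `seats` data frame is hypothetical, though its column names follow those used above) shows the calculation and how it differs from a plain average of constituency shares.

```{r sketch-national-share, eval=FALSE}
# Three hypothetical constituencies of very different sizes
seats <- data.frame(
  parlimen = c("A", "B", "C"),
  total_votes = c(20000, 50000, 80000),
  proj_ph_vote = c(0.20, 0.45, 0.60),
  proj_pn_vote = c(0.55, 0.35, 0.15)
)

# Convert projected shares to projected vote counts, then sum nationally
ph_votes <- sum(seats$proj_ph_vote * seats$total_votes)
pn_votes <- sum(seats$proj_pn_vote * seats$total_votes)
all_votes <- sum(seats$total_votes)

# National shares, with everything else attributed to "Other"
national <- c(PH = ph_votes,
              PN = pn_votes,
              Other = all_votes - ph_votes - pn_votes) / all_votes
round(national * 100, 2)

# For contrast: the unweighted mean of constituency shares
colMeans(seats[, c("proj_ph_vote", "proj_pn_vote")])
```

Because constituencies differ greatly in size, the weighted and unweighted figures can diverge noticeably, which is also why national vote share and seat count need not move together.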

<br>

It might not be possible to re-form the Madani government. This depends on the extent of BN's losses and the scale of PN's victory. We were not able to develop a model that accurately predicted BN's district-level vote share using demographic data, probably because BN's performance is largely determined by non-demographic factors (e.g. running a particularly strong candidate in Titiwangsa, or the strengthening of unfavourable perceptions of the coalition). It is possible that BN will not even be in a position to play kingmaker. 


<br>


```{r ph-vs-pn}
#| out-width: 100%
#| out-height: 100%
ph_pn_plot <- testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(ph_win_chance = ifelse(
    proj_ph_vote < .5, proj_ph_vote * modifier1, proj_ph_vote * modifier2), 
    ph_win_chance = ifelse(ph_win_chance > 1, 1, ph_win_chance)) |> 
  left_join(
    pn_test |>
      left_join(
        ballots |>
          filter(election == "GE-15" &
                   result %in% c("won", "won_uncontested")) |>
          select(state, seat, coalition),
        by = c("parlimen" = "seat")
      ) |>
      mutate(
        pn_win_chance = ifelse(
          proj_pn_vote < .5,
          proj_pn_vote * modifier_pn1,
          proj_pn_vote * modifier_pn2
        ),
        pn_win_chance = ifelse(pn_win_chance > 1, 1, pn_win_chance),
        pn_win = ifelse(pn_win_chance >= .5, "pn_win", "pn_loss")
      ) |>
      select(parlimen, proj_pn_vote, pn_win_chance), 
    by = "parlimen"
  ) |> 
  mutate(winner = case_when(
    proj_pn_vote > .33 & proj_pn_vote > proj_ph_vote ~ "PN win", 
    proj_ph_vote > .33 & proj_ph_vote > proj_pn_vote ~ "PH win", 
    pn_win_chance >= .5 ~ "PN win",
    ph_win_chance >= .5 ~ "PH win", 
    TRUE ~ "Other"), 
    winner = fct_relevel(winner, c("PH win", "PN win", "Other"))) |> 
  ggplot(aes(x = proj_ph_vote, y = proj_pn_vote, group = coalition)) + 
  geom_point(aes(colour = winner,
                 text = paste0(parlimen, ",", "\n", 
                               state, ",", "\n", 
                               "Projected PH%: ", round(proj_ph_vote * 100, 2), "%", "\n",
                               "Projected PN%: ", round(proj_pn_vote * 100, 2), "%", "\n", 
                               "GE-15 winner: ", coalition))) + 
  scale_colour_viridis_d(option = "magma", begin = .1, end = .9, direction = -1) + 
  scale_x_continuous(labels = percent, breaks = seq(0, 1, .1)) + 
  scale_y_continuous(labels = percent, breaks = seq(0, 1, .1)) +
  labs(title = "Interactive plot of Pakatan vs. Perikatan projected vote % by constituency", 
       x = "Projected Pakatan %", 
       y = "Projected Perikatan %", 
       colour = "Projected\nwinner", 
       group = "GE-15 winner") + 
  guides(colour = guide_legend(order = 1), 
         group = guide_legend(order = 2))
  
ggplotly(ph_pn_plot, tooltip = c("text")) |> 
  layout(height = 500, width = 800, 
         margin = list(t = 50)) |>  
  config(displayModeBar = FALSE) 


```

<br>

The models we developed (detailed in the appendices below) also highlight the electoral challenges facing Barisan: whilst it may be more competitive than Perikatan in areas that are more ethnically diverse, it will still likely lose those seats to Pakatan Harapan. It may be more competitive in East Malaysia, but support there has swung towards native and indigenous coalitions. And whilst it can still win largely homogeneous Malay communities, these constituencies tend to be extremely small in terms of population. 

<br>

```{r marginal-seats}
ballots |> 
  filter(election == "GE-15" & rank %in% c(1, 2)) |>
  group_by(seat) |> 
  mutate(second_place = min(votes_perc)) |> 
  ungroup() |> 
  filter(rank == 1) |> 
  mutate(margin = (votes_perc - second_place)) |>
  filter(margin < 10) |>
  select(state, seat, coalition, margin) |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & rank == 2) |> 
      select(seat, second_place = coalition),
    by = "seat"
  ) |> 
  mutate(count = 1) |> 
  group_by(coalition, second_place) |> 
  summarise(count = sum(count)) |> 
  ggplot(aes(x = coalition, y = count)) + 
  geom_col(aes(fill = second_place)) + 
  scale_fill_viridis_d(option = "magma") + 
  labs(title = "Competitive seats in GE-15", 
       subtitle = "Where the margin was less than 10%", 
       x = "Winning coalition", 
       y = "Number of marginal seats", 
       fill = "Second\nplace")



```

<br>

If we take a look at the most marginal seats from GE-15 in the plot above, BN's largest competitors are PH and PN. If support for BN were to erode further and Malay voters were to consolidate behind PN, as we predict, PH would no longer be sure to win in three-cornered fights (where it has picked up most of its post-2018 wins). The primary beneficiary of three-cornered fights after the collapse of UMNO has been Pakatan Harapan.
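
As a sketch of how the margins behind that plot are derived, the example below computes the gap between first and second place in each seat and keeps the seats decided by fewer than 10 percentage points. It uses base R and a made-up `ballots_toy` table rather than the `ballots` data, so every value here is hypothetical.

```{r sketch-margins, eval=FALSE}
# Hypothetical results for three seats, each contested by three coalitions
ballots_toy <- data.frame(
  seat = rep(c("A", "B", "C"), each = 3),
  coalition = rep(c("PH", "PN", "BN"), times = 3),
  votes_perc = c(41, 38, 21,  30, 55, 15,  33, 31, 36)
)

# For each seat: rank candidates, then take winner, runner-up and the gap between them
margins <- do.call(rbind, lapply(split(ballots_toy, ballots_toy$seat), function(d) {
  d <- d[order(-d$votes_perc), ]
  data.frame(seat = d$seat[1],
             winner = d$coalition[1],
             runner_up = d$coalition[2],
             margin = d$votes_perc[1] - d$votes_perc[2])
}))

# Seats decided by fewer than 10 percentage points
marginal <- margins[margins$margin < 10, ]
marginal

# Cross-tabulate winners against runners-up in those marginal seats
table(winner = marginal$winner, runner_up = marginal$runner_up)
```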

<br>


```{r three-cornered}
ballots |> 
  # Beginning of the collapse of BN and the opening up of 
  # the political space
  filter(date > "2013-01-01" & votes_perc >= 5) |> 
  mutate(count = 1) |> 
  group_by(seat, date, federal) |> 
  summarise(candidates = sum(count), .groups = "drop") |> 
  mutate(multicandidate = ifelse(
    candidates > 2, 1, 0
  )) |> 
  left_join(
    ballots |> 
      filter(date > "2013-01-01" & 
               result %in% c("won", "won_uncontested")) |> 
      select(date, seat, winner = coalition), 
    by = c("date", "seat")
  ) |> 
  filter(candidates != 1) |> 
  mutate(count = 1) |> 
  ggplot(aes(x = candidates, y = count)) + 
  geom_col(width = 1, 
                 aes(fill = winner)) +
  facet_wrap(~ federal, scales = "free") + 
  scale_fill_viridis_d(option = "magma") + 
  scale_x_continuous(breaks = seq(0, 7, 1)) + 
  theme(strip.background = element_rect(fill = "grey20")) + 
  labs(title = "Three-cornered fights are usually won by PH or BN", 
       subtitle = "Only electoral races post UMNO collapse (2018); only counts candidates which won more than 5% of the vote share", 
       y = "Number of electoral races", 
       x = "Number of candidates", 
       fill = "Winner")
  
  

```

<br>

However, the margin by which PH has won has been trending downwards, especially in contrast to its predecessor, Pakatan Rakyat. Its footprint is also narrowing, especially with one of its key component parties, PKR, focusing on [safe seats](https://www.freemalaysiatoday.com/category/nation/2026/04/16/avoiding-weak-seats-a-realistic-strategy-says-analyst). It seems that they too are predicting losses for PH and are seeking to minimise the extent of their defeat.  

<br>


```{r ph-performance-year}
ballots |> 
  filter(coalition %in% c("PH", "PR") & result != "won_uncontested") |> 
  mutate(result = str_to_title(str_replace(result, "_", " "))) |>
  ggplot(aes(x = date, y = votes_perc / 100)) + 
  geom_smooth(se = FALSE, method = "lm", colour = "red",
              linewidth = .8) +
  geom_jitter(aes(colour = result), 
              alpha = .6) + 
  scale_colour_viridis_d(option = "plasma", end = .75) + 
  scale_x_date(breaks = "2 years", date_labels = "%y") +
  scale_y_continuous(labels = percent, 
                     breaks = seq(0, 1, .1)) +
  facet_wrap(~ federal) + 
  theme(strip.background = element_rect(fill = "grey20")) + 
  labs(x = "Year", 
       y = "PH/PR vote share", 
       title = "Electoral performance of PR/PH by vote share", 
       colour = "Result") + 
  guides(colour = guide_legend(override.aes = list(size = 3,
                                                   alpha = 1), 
                               reverse = TRUE))

# ggsave("./plots/performance_year.png", width = 7, height = 4, units = "in")
```

<br>

[James Chin](https://www.commonwealthroundtable.co.uk/commonwealth/eurasia/malaysia/research-article-anwars-long-walk-to-power-the-2022-malaysian-general-elections/) at the Commonwealth Roundtable argues that there is no longer any middle ground left in Malaysian politics, with the DAP (Pakatan) on one end and PAS (Perikatan) on the other. This erosion of the middle ground will have a profound negative impact on Pakatan Harapan's election chances, especially as it refuses to engage in the Islamisation race or put forward an alternative vision that can win over Malay youth. 

The convergence of the two social cleavages of race and religion, together with UMNO's downfall, has made it most probable that Perikatan Nasional will be the largest coalition in parliament.


<br><br><br>



# Appendices

Read the addendum on malapportionment [here](https://aimdata-labs.github.io/malaysian_elections/part_two.html).

## Data 

The data for this report comes primarily from [Thevesh](https://github.com/Thevesh/paper-meco-results/tree/main), who compiled the *Malaysian Election Corpus*, containing federal and state-level election results since 1955.

Additional data, including the digitised census datasets, come from [JJean95](https://www.kaggle.com/datasets/jjean95/malaysia-general-election-datasets?select=census_dun.csv) and [Jane Loh](https://www.kaggle.com/datasets/janeloh/malaysia-census-demographic-2020).


<br><br><br>

## Reference table

Note that the models sometimes predict negative vote shares. We have left these in because the models are, overall, fairly accurate and explainable, and instances of negative vote shares are few in number.
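
Negative predictions arise because a linear model is unbounded, whereas a vote share must lie between 0 and 1. As a minimal sketch, assuming a numeric vector of predicted shares (the vector `pred` below is made up and is not output from the models), the out-of-range cases could be counted and, if desired, clamped for display. The table in this report leaves them as they are.

```r
# Hypothetical predicted vote shares (proportions); not output from the models
pred <- c(0.62, 0.08, -0.03, 0.41, 1.04)

# How many predictions fall outside the valid range?
n_negative <- sum(pred < 0)
n_above_one <- sum(pred > 1)

# Clamp to [0, 1] for presentation only; the underlying fit is unchanged
pred_clamped <- pmin(pmax(pred, 0), 1)
pred_clamped
```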

<br>

```{r}
testing_preds |> 
  left_join(
    ballots |> 
      filter(election == "GE-15" & result %in% c("won", "won_uncontested")) |> 
      select(state, seat, coalition), 
    by = c("parlimen" = "seat")
  ) |> 
  mutate(ph_win_chance = ifelse(
    proj_ph_vote < .5, proj_ph_vote * modifier1, proj_ph_vote * modifier2), 
    ph_win_chance = ifelse(ph_win_chance > 1, 1, ph_win_chance)) |> 
  left_join(
    pn_test |>
      left_join(
        ballots |>
          filter(election == "GE-15" &
                   result %in% c("won", "won_uncontested")) |>
          select(state, seat, coalition),
        by = c("parlimen" = "seat")
      ) |>
      mutate(
        pn_win_chance = ifelse(
          proj_pn_vote < .5,
          proj_pn_vote * modifier_pn1,
          proj_pn_vote * modifier_pn2
        ),
        pn_win_chance = ifelse(pn_win_chance > 1, 1, pn_win_chance),
        pn_win = ifelse(pn_win_chance >= .5, "pn_win", "pn_loss")
      ) |>
      select(parlimen, proj_pn_vote, pn_win_chance), 
    by = "parlimen"
  ) |> 
  mutate(winner = case_when(
    proj_pn_vote > .33 & proj_pn_vote > proj_ph_vote ~ "PN win", 
    proj_ph_vote > .33 & proj_ph_vote > proj_pn_vote ~ "PH win", 
    pn_win_chance >= .5 ~ "PN win",
    ph_win_chance >= .5 ~ "PH win", 
    TRUE ~ "Other")) |> 
  mutate(proj_ph_vote = round(proj_ph_vote * 100, 2), 
         proj_pn_vote = round(proj_pn_vote * 100, 2), 
         winner = str_replace_all(winner, " win", "")) |> 
  select(
    State = state, Seat = parlimen, 
    `pred. PH vote` = proj_ph_vote, 
    `pred. PN vote` = proj_pn_vote, 
    `pred. winner` = winner, 
    `GE15 winner` = coalition
  ) |>  
  datatable(filter = list(position = "top", clear = FALSE),
            options = list(pageLength = 10, scrollX = TRUE),
            caption = htmltools::tags$caption(style = "caption-side: top;
                                              text-align: center; font-size: 140%;",
                                              "Predicted vote shares for GE-16"))
  
```
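
The rule that assigns the predicted winner in the chunk above is evaluated in order, so the first matching condition decides the outcome. The sketch below restates that rule in base R on three invented seats. The helper `classify_winner()` and the data frame `toy` are hypothetical, and the win chances are supplied directly rather than derived from the report's modifiers.

```r
# Classify a seat from projected vote shares and adjusted win chances.
# Conditions are checked in order, mirroring case_when(): the first match wins.
classify_winner <- function(proj_ph_vote, proj_pn_vote,
                            ph_win_chance, pn_win_chance) {
  ifelse(proj_pn_vote > .33 & proj_pn_vote > proj_ph_vote, "PN",
  ifelse(proj_ph_vote > .33 & proj_ph_vote > proj_pn_vote, "PH",
  ifelse(pn_win_chance >= .5, "PN",
  ifelse(ph_win_chance >= .5, "PH", "Other"))))
}

# Hypothetical seats, for illustration only
toy <- data.frame(
  proj_ph_vote  = c(0.55, 0.20, 0.25),
  proj_pn_vote  = c(0.30, 0.48, 0.28),
  ph_win_chance = c(0.80, 0.10, 0.20),
  pn_win_chance = c(0.20, 0.70, 0.30)
)
toy$winner <- with(toy, classify_winner(proj_ph_vote, proj_pn_vote,
                                        ph_win_chance, pn_win_chance))
toy$winner
```

A coalition is called the winner when it is projected to take more than a third of the vote and to lead the other coalition. Failing that, the rule falls back on the adjusted win chance, and seats that meet neither test are labelled "Other".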

<br><br><br>

## Models

### Pakatan Harapan models

Below is a summary of the stepwise model used to predict Pakatan Harapan's performance in GE-16. The population density and the percentage of Indians in a constituency (`pc_indian`) both correlate positively with PH electoral performance. The percentage of Bumiputeras (either peninsular or East Malaysian) correlates negatively with electoral performance.

<br>

```{r stepwise-summary-ph}
stepwise_model |> summary()
```
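
For readers unfamiliar with stepwise selection, here is a minimal sketch on simulated data. It assumes AIC-based selection with base R's `step()`, which may differ from the procedure used to fit `stepwise_model`. All values are synthetic, and every column name other than `pc_indian` is hypothetical.

```r
set.seed(1)

# Simulated constituencies; every value here is synthetic
n <- 200
toy <- data.frame(
  pc_indian   = runif(n, 0, 0.3),
  pc_bumi     = runif(n, 0.2, 0.9),
  pop_density = runif(n, 0, 1),
  noise       = rnorm(n)  # unrelated to the outcome by construction
)
toy$ph_vote <- 0.4 + 0.6 * toy$pc_indian - 0.4 * toy$pc_bumi +
  0.1 * toy$pop_density + rnorm(n, sd = 0.03)

# Start from the full model and let AIC decide which terms to keep or drop
full_model <- lm(ph_vote ~ ., data = toy)
stepwise_sketch <- step(full_model, direction = "both", trace = 0)

summary(stepwise_sketch)
```

The signs of the retained coefficients are read in the same way as in the summary above: a positive estimate means the predicted vote share rises with that variable, holding the others fixed.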

<br>

These are the coefficients for the final glmnet model used to predict Pakatan Harapan's performance in GE-16. As in the stepwise model, the percentages of Indians and Chinese in a constituency both have strong positive correlations with PH electoral performance, as does the population density.

The predicted values of these two models were averaged to obtain the final predictions. 

<br>

```{r glmnet-coef-ph}
coef(glmnet_model$finalModel, s = glmnet_model$bestTune$lambda)
```
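
The averaging step mentioned above can be sketched as follows. This assumes a simple unweighted mean of the two models' predictions for each seat; the vectors `pred_stepwise` and `pred_glmnet` are hypothetical stand-ins for the models' outputs.

```r
# Hypothetical predictions from the two models for the same three seats
pred_stepwise <- c(0.52, 0.31, 0.18)
pred_glmnet   <- c(0.48, 0.35, 0.22)

# Equal-weight ensemble: the mean of the two predictions, seat by seat
proj_ph_vote <- unname(rowMeans(cbind(pred_stepwise, pred_glmnet)))
proj_ph_vote
```

Averaging two models in this way tends to dampen the idiosyncratic errors of each, at the cost of a final prediction that is less directly interpretable than either set of coefficients.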

<br><br><br>

### Perikatan Nasional models

The stepwise model used to predict Perikatan's election performance is more complex than what was needed to predict Pakatan Harapan's performance. 

The percentage of Chinese in a constituency has the strongest negative correlation with PN performance, followed by the percentage of East Malaysian Bumiputeras. The negative correlation with the percentage of peninsular Bumiputeras might indicate that homogeneous Malay constituencies still have a tendency to vote for UMNO (though its t-value is comparatively lower).

There are also negative correlations with average income (indicating that more disadvantaged areas are more likely to vote for PN) as well as with the sex ratio. Additionally, whilst the effect is not as strong as it is with Pakatan Harapan, population density does have a positive correlation with PN's vote share. PN does have quite widespread popularity in urban areas in northern and eastern Peninsular Malaysia.

<br>

```{r summary-stepwise-pn}
stepwise_model_pn |>  summary()
```
<br>

Similar to the stepwise model, Perikatan's glmnet model is also more complex than Pakatan's. But the most notable coefficients are once again the negative correlations with the Chinese and East Malaysian Bumiputera populations (highlighting the racialised nature of elections as well as PN's uncompetitiveness in East Malaysia). 

<br>

```{r coef-glmnet-pn}
coef(glmnet_model_pn$finalModel, s = glmnet_model_pn$bestTune$lambda)
```
<br>

As with the Pakatan Harapan model, the predictions for both the Perikatan models were averaged to provide the final predictions. 

<br><br><br>

### Comparing Barisan Nasional against Perikatan Nasional

Finally, below is the stepwise model used to predict the difference between Barisan and Perikatan vote shares. We see that Barisan is more likely to win less dense areas (BN is the primary beneficiary of malapportionment, as we note in our [addendum on malapportionment](https://aimdata-labs.github.io/malaysian_elections/)), as well as areas that are more ethnically diverse. Additionally, BN is more likely than PN to win in areas that have higher incomes and more men than women.

However, we should note that the R-squared is quite low when compared to the earlier two models. BN's performance is actually quite difficult to model from demographic data, probably due to the several non-demographic factors influencing its popularity (corruption, dissatisfaction over its collaboration with PH).
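
As a reminder of what the R-squared measures, the sketch below computes it by hand on synthetic data and checks it against the value reported by `summary()`. The variables `x` and `y` are simulated and have no connection to the election data.

```r
set.seed(42)

# Synthetic data: a weak signal buried in noise, so R-squared tends to be low
x <- rnorm(100)
y <- 0.2 * x + rnorm(100)
fit <- lm(y ~ x)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
r_squared <- 1 - ss_res / ss_tot

# Agrees with the value reported by summary()
all.equal(r_squared, summary(fit)$r.squared)
```

A low value means that most of the variation in the outcome is left unexplained by the predictors, which is consistent with the non-demographic factors described above.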

<br>

```{r summary-stepwise-bp}
stepwise_model_bp |>  summary()
```
<br>

Similarly, the glmnet model indicates that BN is more likely to get more votes than PN in areas that are more ethnically diverse and those which have higher incomes, as well as in East Malaysia. 

<br>

```{r coef-glmnet-bp}
coef(glmnet_model_bp$finalModel, s = glmnet_model_bp$bestTune$lambda)
```

<br>

Ultimately, these models highlight the electoral challenges facing Barisan: whilst it may be more competitive than Perikatan in areas that are more ethnically diverse, it will still likely lose those seats to Pakatan Harapan. It may be more competitive in East Malaysia, but support there has swung towards native and indigenous coalitions. And whilst it can still win largely homogeneous Malay communities, these constituencies tend to be extremely small in terms of population.

We could very conceivably improve these predictions if we had access to additional data, such as education levels and religious attitudes. Additionally, opinion polling in Malaysia is far less developed than in the US and UK, and polls are not conducted at a sufficient frequency or sample size to be included in the models we have developed.

<br><br><br>



# References

::: {#refs}

Architects of Diversity (2023). *State of Discrimination Survey 2023*. Retrieved from https://www.aodmalaysia.org/sods.

Azzubair, Kartini & Nazri Muslim (2026). A Systematic Literature Review of Malaysia's Coalition Politics, 2021-2025. *Frontiers in Political Science, Elections and Representation section*, 8-2026. https://doi.org/10.3389/fpos.2026.1721966

Chin, J. (2023). Anwar’s long walk to power: the 2022 Malaysian general elections. *The Round Table*, 112(1), 1-13. https://doi.org/10.1080/00358533.2023.2165303

JJean95 (2022). *Malaysia General Election Results (GE12-GE15)*. https://www.kaggle.com/datasets/jjean95/malaysia-general-election-datasets

Law Y. F. & Mohamad Zaini Abu Bakar (2025). The Dynamics of New Multi-Ethnic Political Parties in Malaysia with Special Reference to Parti Bangsa Malaysia (PBM). *Asian Journal of Research in Education and Social Sciences*, 7(8), 537-545. https://doi.org/10.55057/ajress.2025.7.8.44. Retrieved from https://mysitasi.mohe.gov.my/uploads/get-media-file?refId=f7b5ff9d-7a8a-416a-baad-b5c84b43e4a2

Loh, J. (2023). *Malaysia Census Demographic 2020*. https://www.kaggle.com/datasets/janeloh/malaysia-census-demographic-2020

Deivasagayam, A.D. (2025). Explaining UMNO's Downfall Post GE14 and GE15: The Strengthened Convergence Between Two Cleavage Systems. *Malaysian Journal of Social Sciences and Humanities*, 10(2). https://doi.org/10.47405/mjssh.v10i2.3248

Merdeka Center for Opinion Research (2025, 23 June). Mid-Term Survey May 2025 Report Final. Retrieved from https://merdeka.org/mid-term-survey-may-2025-report-final/ 

Nora M. (2026, 16 April). Avoiding Weak Seats a Realistic Strategy, Says Analyst. *Free Malaysia Today*. https://www.freemalaysiatoday.com/category/nation/2026/04/16/avoiding-weak-seats-a-realistic-strategy-says-analyst

Pusat KOMAS (2024). *Malaysia Racial Discrimination Report: A Decade in Review (2015-2024)*. Retrieved from https://komas.org/download/Malaysia-Racism-Report_A-Decade-In-Review-2015-2024.pdf

Syaza S. (2024). Why Young Malay Voters in Malaysia are "Turning Green". *Trends in Southeast Asia, ISEAS -- Yusof Ishak Institute*, 12. Retrieved from https://www.iseas.edu.sg/wp-content/uploads/2024/04/TRS12_24.pdf

Thevesh T. (2026). *Malaysian Election Corpus: Federal and State-Level Election Results since 1955*. https://github.com/Thevesh/paper-meco-results/tree/main


:::