# Paris 2024: Olympic debrief

Who had the best performance?
R
Data viz
Author

Liam D. Bailey

Published

August 16, 2024

## Introduction: What have we learnt so far?

In my last post, we saw how medal inflation and economic advantage can make it tricky to properly assess a country’s Olympic performance. Now that we have the final results from Paris 2024, we can use our new found knowledge to find the countries that over-(and under) performed at this year’s event.

## The data

We start out where we left off in the last blog with data on medal percentages between 1996 and 2016. We will add to this with new results from Paris 2024. To include information on GDP, we will use data from the World Bank.

## The basics: Clean the data and add some context

Packages this time are: dplyr, tidyr, readr, and lay to work with data and ggplot2 and shadowtext to create plots and scales and gt when building tables. This time we’ll also include spaMM, which we’ll use for some basic statistical modelling. Again, countrycode is needed to convert NOC to ISO3c.

## Data wrangling
library(dplyr)
library(tidyr)
library(readr)
library(lay)
library(countrycode)
## Plots and tables
library(ggplot2)
library(shadowtext)
library(geomtextpath)
library(scales)
library(gt)
## Stats
library(spaMM)

First, let’s read in the data we finished with last time and the new data from Paris.

## Load data
(historical_data <- readr::read_csv("./data/training_data.csv", show_col_type = FALSE))
# A tibble: 2,463 × 6
Year ISO3c country       perc_Gold perc_Silver perc_Bronze
<dbl> <chr> <chr>             <dbl>       <dbl>       <dbl>
1  2016 USA   United States     15.0       12.1         10.6
2  2016 GBR   Great Britain      8.79       7.52         4.72
3  2016 CHN   China              8.47       5.88         7.22
4  2016 RUS   Russia             6.19       5.56         5.56
5  2016 DEU   Germany            5.54       3.27         4.17
6  2016 JPN   Japan              3.91       2.61         5.83
7  2016 FRA   France             3.26       5.88         3.89
8  2016 KOR   South Korea        2.93       0.980        2.5
9  2016 ITA   Italy              2.61       3.92         2.22
10  2016 AUS   Australia          2.61       3.59         2.78
# ℹ 2,453 more rows
paris <- readr::read_csv("./data/paris_results.csv", show_col_types = FALSE) |>
mutate(across(Gold:Total, $$x) tidyr::replace_na(x, 0)), ISO3c_countrycode = countrycode::countrycode(NOC, origin = "ioc", destination = "iso3c"), ISO3c = case_when(!is.na(ISO3c_countrycode) ~ ISO3c_countrycode, NOC == "KOS" ~ "XKX", NOC == "SIN" ~ "SGP", is.na(ISO3c_countrycode) ~ NOC), country = countrycode::countrycode(ISO3c, origin = "iso3c", destination = "country.name")) |> select(country, ISO3c, Gold:Total) paris # A tibble: 92 × 6 country ISO3c Gold Silver Bronze Total <chr> <chr> <dbl> <dbl> <dbl> <dbl> 1 United States USA 40 44 42 126 2 China CHN 40 27 24 91 3 Japan JPN 20 12 13 45 4 Australia AUS 18 19 16 53 5 France FRA 16 26 22 64 6 Netherlands NLD 15 7 12 34 7 United Kingdom GBR 14 22 29 65 8 South Korea KOR 13 9 10 32 9 Italy ITA 12 13 15 40 10 Germany DEU 12 13 8 33 # ℹ 82 more rows We know from last time that accounting for the number of available medals is important. There were 71 more medals available at Paris 2024 than Rio 2016 and we need to reflect that in the data! Show the code total_gold <- sum(parisGold) total_silver <- sum(parisSilver) total_bronze <- sum(parisBronze) total <- sum(parisTotal) paris_perc <- paris |> mutate(perc_Gold = (Gold/total_gold)*100, perc_Silver = (Silver/total_silver)*100, perc_Bronze = (Bronze/total_bronze)*100, perc_Total = (Total/total)*100) |> select(country, ISO3c, perc_Gold:perc_Total) paris_perc # A tibble: 92 × 6 country ISO3c perc_Gold perc_Silver perc_Bronze perc_Total <chr> <chr> <dbl> <dbl> <dbl> <dbl> 1 United States USA 12.2 13.3 10.9 12.1 2 China CHN 12.2 8.18 6.23 8.72 3 Japan JPN 6.08 3.64 3.38 4.31 4 Australia AUS 5.47 5.76 4.16 5.08 5 France FRA 4.86 7.88 5.71 6.13 6 Netherlands NLD 4.56 2.12 3.12 3.26 7 United Kingdom GBR 4.26 6.67 7.53 6.23 8 South Korea KOR 3.95 2.73 2.60 3.07 9 Italy ITA 3.65 3.94 3.90 3.83 10 Germany DEU 3.65 3.94 2.08 3.16 # ℹ 82 more rows ## Predicted performance We also know that medal performance is strongly correlated to a countries GDP. To assess current performance, we can model the relationship between GDP and Olympic results using data from previous years (1996 - 2016) and use it to predict outcomes for 2024. The difference between predicted and actual scores allows us to identify those countries that performed better (or worse) than expected. The first sticking point we encounter is that Olympic performance is actually three variables: Gold, Silver, and Bronze medals. To model things more easily, we want to convert these into a single score. To make sure this single score reflects the difference in importance between different medal types we can use a weighted mean. Here we’ll consider gold medals to be 3x more valuable than silver, and silver as 3x more valuable than bronze. \((Gold*\frac{9}{13} + Silver*\frac{3}{13} + Bronze*\frac{1}{13})$$

historical_data_score <- historical_data |>
filter(Year >= 1996) |>
mutate(score = lay::lay(pick(c(perc_Gold, perc_Silver, perc_Bronze)),