Dr. Liam D. Bailey

👴 Old post

This post was created before the release of R v4.1.0. Some code might be outdated.

The dreaded table

I dread encountering a data table in an academic paper. A jumble of numbers, column sub-headers, and confusing footnotes, seemingly bereft of purpose. Why, I cry, is this not in the supplementary material?! Surely, I think, they could’ve created a graph! Despite my prejudice, the fact is that we often do need to use tables to present data. While a plot is a beautiful way to show trends or patterns, it falls short if your goal is to compare individual values. This year saw the release of the gt package, a new addition to the growing number of packages that can produce publication-quality tables in R. It seemed like as good a time as any to overcome my dread of tables. Here is my attempt.

The data

As a test case, I decided to work with data on greenhouse gas emissions from air travel, recently published in Our World in Data. Hannah Ritchie does a great job explaining the complexity of calculating per capita aviation emissions, and I would highly recommend giving her article a read. For now, though, we’ll side-step this complexity and focus on the simplest case: emissions from domestic travel. The map from Hannah’s post clearly shows the range and disparity in emissions across the globe, but if we want to compare individual values between countries, or maybe within continents, such a map isn’t the best option. How about a table? While the original post includes a basic table to compare emissions, I wanted to see whether I could create a table of my own using only R. A table that I might be happy to encounter in a publication.

The basics: Clean the data and add some context

We’ll need dplyr to tidy some of the data and gt to build our tables. scales will be used for creating colour palettes.

library(dplyr)
library(scales)
library(gt)
packageVersion("gt")

[1] '0.8.0'

gt integrates well into the existing tidyverse, and creating a gt table is as simple as two lines of code.

Note

For this example we’re just showing the worst emitters.

#Load data and arrange in descending order of emissions
emissions_data <- read.csv(here::here("posts/2020-11-27-making-beautiful-tables-with-gt/assets/per-capita-co2-domestic-aviation.csv")) %>% 
  arrange(desc(Per.capita.domestic.aviation.CO2))

#Generate a gt table from head of data
head(emissions_data) %>% 
  gt()

Entity	Code	Per.capita.domestic.aviation.CO2
United States	USA	0.3855204
Australia	AUS	0.2671672
Norway	NOR	0.2092302
New Zealand	NZL	0.1741885
Canada	CAN	0.1682674
Japan	JPN	0.0739585

A table, yes, but not one you’re likely to publish! To take our first steps towards table perfection we need to tidy up the data and add a title and data source. The reader needs to understand what’s going on. While much of the tidying could be completed using dplyr before converting to a table, I’ll demonstrate how this can be achieved exclusively inside gt.

Coding tip

Notice how we select columns using the generic c(). For larger, more complex data, tidy-select functions like starts_with() or contains() may come in handy.
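To make the tip above a little more concrete, here is a minimal sketch of tidy-select helpers inside gt. The data frame and its column names (emissions_2018, meta_source, meta_checked) are made up for illustration and are not part of the emissions data.

```r
library(dplyr)
library(gt)

#Toy data with hypothetical column names, purely for illustration
toy_data <- data.frame(
  Entity         = c("United States", "Australia"),
  Code           = c("USA", "AUS"),
  emissions_2018 = c(0.3855204, 0.2671672),
  meta_source    = c("a", "b"),
  meta_checked   = c(TRUE, TRUE)
)

toy_table <- toy_data %>% 
  gt() %>% 
  #Hide Code plus every column whose name starts with "meta_"
  cols_hide(columns = c(Code, starts_with("meta_"))) %>% 
  #Format every column whose name contains "emissions"
  fmt_number(columns = contains("emissions"), decimals = 3)

toy_table
```

With only a handful of columns c() is perfectly fine, but the helpers save a lot of typing once column names follow a pattern.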

(emissions_table <- head(emissions_data) %>% 
   gt() %>% 
   #Hide unwanted columns
   cols_hide(columns = c(Code)) %>% 
   #Rename columns
   cols_label(Entity = "Country",
              Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>% 
   #Add a table title
   #Notice the `md` function allows us to write the title using markdown syntax (which allows HTML)
   tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>% 
   #Add a data source footnote
   tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]"))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
Country	Per capita emissions (tonnes)
United States	0.3855204
Australia	0.2671672
Norway	0.2092302
New Zealand	0.1741885
Canada	0.1682674
Japan	0.0739585
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]

A few lines of code and the table is already much better. To me, there is still one issue that needs to be fixed before we’ve finished the basics. While it’s technically correct to report emissions in tonnes, I feel the data would be much more readable in kilograms. For this we’ll use fmt_number().

Coding tip

There are fmt_xxx() functions for many different data types, including fmt_currency() and fmt_date().
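As a quick illustration of the other formatters mentioned in the tip, here is a small sketch using fmt_date() and fmt_currency(). The toy_sales data frame is invented for this example, and the choices of date_style = "iso" and currency = "USD" are just assumptions to show the arguments.

```r
library(dplyr)
library(gt)

#Hypothetical data, not related to the emissions example
toy_sales <- data.frame(
  day     = as.Date(c("2020-11-27", "2020-11-28")),
  revenue = c(1234.5, 987.25)
)

toy_sales_table <- toy_sales %>% 
  gt() %>% 
  #Show dates in ISO 8601 style
  fmt_date(columns = c(day), date_style = "iso") %>% 
  #Show revenue as US dollars
  fmt_currency(columns = c(revenue), currency = "USD")

toy_sales_table
```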

(emissions_table <- emissions_table %>% 
   #Format numeric column. Use `scale_by` to multiply by 1,000 and convert tonnes to kg. (Note: we'll need to rename the column again)
   fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
              scale_by = 1000) %>%
   #Our second call to cols_label overwrites our first
   cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)"))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
Country	Per capita emissions (kg)
United States	385.52
Australia	267.17
Norway	209.23
New Zealand	174.19
Canada	168.27
Japan	73.96
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]
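If you want to sanity check the unit conversion, the same arithmetic can be sketched in base R. This is only a check of the numbers shown above (one tonne is 1,000 kg, two decimal places), not how gt formats values internally.

```r
#Per capita emissions in tonnes, copied from the first table
tonnes <- c(0.3855204, 0.2671672, 0.2092302, 0.1741885, 0.1682674, 0.0739585)

#Convert tonnes to kilograms
kg <- tonnes * 1000

#Round to two decimal places for display
kg_formatted <- formatC(kg, format = "f", digits = 2)
kg_formatted
```

The result should match the kilogram column in the table above.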

The touchup: Adding some style and colour

We’ve got a working table, but it won’t be winning any points for style. Our next step is to change the style of different cells to help the reader more clearly find the information they’re after. For this we’ll use the tab_style() function. The style choices here follow some of the ‘Ten Guidelines for Better Tables’ from Jon Schwabish.

Firstly, we need to more clearly distinguish between the column headers and the body of the table (and while we’re at it, also the title!).

(emissions_table <- emissions_table %>% 
   #Apply new style to all column headers
   tab_style(
     locations = cells_column_labels(columns = everything()),
     style     = list(
       #Give a thick border below
       cell_borders(sides = "bottom", weight = px(3)),
       #Make text bold
       cell_text(weight = "bold")
     )
   ) %>% 
   #Apply different style to the title
   tab_style(
     locations = cells_title(groups = "title"),
     style     = list(
       cell_text(weight = "bold", size = 24)
     )
   ))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
Country	Per capita emissions (kg)
United States	385.52
Australia	267.17
Norway	209.23
New Zealand	174.19
Canada	168.27
Japan	73.96
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]

As our reader is interested in comparing emissions values between countries, we can add a heatmap to our cells to more clearly show the differences. This will require us to set a colour palette and apply a conditional colouring using data_color(). We’ll use the same palette employed in Hannah Ritchie’s map above.

#Apply our palette explicitly across the full range of values so that the top countries are coloured correctly
min_CO2 <- min(emissions_data$Per.capita.domestic.aviation.CO2)
max_CO2 <- max(emissions_data$Per.capita.domestic.aviation.CO2)
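#Note: `alpha` in scales::col_numeric() is documented as a logical (respect or ignore alpha channels), so 0.75 is treated as TRUE here rather than setting 75% opacity
emissions_palette <- col_numeric(c("#FEF0D9", "#990000"), domain = c(min_CO2, max_CO2), alpha = 0.75)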

(emissions_table <- emissions_table %>% 
    data_color(columns = c(Per.capita.domestic.aviation.CO2),
               colors = emissions_palette))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
Country	Per capita emissions (kg)
United States	385.52
Australia	267.17
Norway	209.23
New Zealand	174.19
Canada	168.27
Japan	73.96
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]
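To see why the palette domain matters, here is a small sketch comparing a palette fitted to the full range of the data with one fitted only to the rows we display. The true minimum of the full data set is not shown in this post, so a minimum of 0 is assumed purely for illustration.

```r
library(scales)

#Highest and lowest values in the six rows shown above (tonnes per capita)
shown_values <- c(0.3855204, 0.0739585)

#Palette across the full data range (minimum of 0 is an assumption)
full_palette <- col_numeric(c("#FEF0D9", "#990000"), domain = c(0, 0.3855204))

#Palette fitted only to the rows shown in the table
subset_palette <- col_numeric(c("#FEF0D9", "#990000"), domain = range(shown_values))

#Colour assigned to Japan under each palette
japan_full   <- full_palette(0.0739585)
japan_subset <- subset_palette(0.0739585)
c(full = japan_full, subset = japan_subset)
```

With the subset palette, Japan sits at the very bottom of the scale and gets the lightest colour, which would wrongly suggest it is among the lowest emitters overall.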

The finer details: Table options

We’ve added some colour and style, but if we really want to customise our table and tweak the fine details we need to start using the opt_xxx() and tab_options() functions. The opt_xxx() functions adjust specific elements of the table, while tab_options() is similar to the theme() function used with ggplot2. Below, we’ll adjust the options to resemble tables from fivethirtyeight (working from Thomas Mock’s great blog).

(emissions_table <- emissions_table %>% 
   #All column headers are capitalised
   opt_all_caps() %>% 
   #Use the Chivo font
   #Note the great 'google_font' function in 'gt' that removes the need to pre-load fonts
   opt_table_font(
     font = list(
       google_font("Chivo"),
       default_fonts()
     )
   ) %>%
   #Change the width of columns
   cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
              c(Entity) ~ px(400)) %>% 
   tab_options(
     #Remove border between column headers and title
     column_labels.border.top.width = px(3),
     column_labels.border.top.color = "transparent",
     #Remove border around table
     table.border.top.color = "transparent",
     table.border.bottom.color = "transparent",
     #Reduce the height of rows
     data_row.padding = px(3),
     #Adjust font sizes and alignment
     source_notes.font.size = 12,
     heading.align = "left"
   ))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
Country	Per capita emissions (kg)
United States	385.52
Australia	267.17
Norway	209.23
New Zealand	174.19
Canada	168.27
Japan	73.96
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]

Bonus round 1: Adding images

I’m already really happy with this table. It clearly allows a comparison of exact emissions between countries, and the colour provides additional assistance to the reader. We could leave it here, but for a little bonus let’s add country flags to our table. We can use flags sourced from Wikimedia Commons (a database with URL locations is available through data.world).

To link this flag database to our emissions data we need to translate country names into 3-letter country codes. We can do this using the countrycode package. The code for this is not really relevant for using gt, but you can see it below if you’re interested. The end goal is to create a column containing the URL of a flag image for each country.

#To convert country codes
library(countrycode)

flag_db <- read.csv("assets/Country_Flags.csv") %>% 
  #Convert country names into 3-letter country codes
  mutate(Code = countrycode(sourcevar = Country, origin = "country.name", destination = "iso3c", warn = FALSE)) %>% 
  select(Code, flag_URL = ImageURL)

flag_data <- emissions_data %>% 
  left_join(flag_db, by = "Code") %>% 
  select(flag_URL, Entity, everything())

#We'll need to refit our table using this new data
#Code below with comments removed.
emissions_table <- head(flag_data) %>% 
  gt() %>% 
  cols_hide(columns = c(Code)) %>% 
  cols_label(Entity = "Country",
             Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>% 
  tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>% 
  tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]") %>% 
  fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
             scale_by = 1000) %>%
  cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)") %>% 
  tab_style(
     locations = cells_column_labels(columns = everything()),
     style     = list(
       cell_borders(sides = "bottom", weight = px(3)),
       cell_text(weight = "bold")
     )
   ) %>% 
   tab_style(
     locations = cells_title(groups = "title"),
     style     = list(
       cell_text(weight = "bold", size = 24)
     )
   ) %>% 
  data_color(columns = c(Per.capita.domestic.aviation.CO2),
             colors = emissions_palette) %>% 
  opt_all_caps() %>% 
  opt_table_font(
    font = list(
      google_font("Chivo"),
      default_fonts()
    )
  ) %>%
  cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
             c(Entity) ~ px(400)) %>% 
  tab_options(
    column_labels.border.top.width = px(3),
    column_labels.border.top.color = "transparent",
    table.border.top.color = "transparent",
    table.border.bottom.color = "transparent",
    data_row.padding = px(3),
    source_notes.font.size = 12,
    heading.align = "left")

head(flag_data)

                                                                                   flag_URL
1              https://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
2 https://upload.wikimedia.org/wikipedia/commons/8/88/Flag_of_Australia_%28converted%29.svg
3                    https://upload.wikimedia.org/wikipedia/commons/d/d9/Flag_of_Norway.svg
4               https://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg
5                         https://upload.wikimedia.org/wikipedia/en/c/cf/Flag_of_Canada.svg
6                          https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg
         Entity Code Per.capita.domestic.aviation.CO2
1 United States  USA                        0.3855204
2     Australia  AUS                        0.2671672
3        Norway  NOR                        0.2092302
4   New Zealand  NZL                        0.1741885
5        Canada  CAN                        0.1682674
6         Japan  JPN                        0.0739585

We can now use the text_transform() function to add our country flags. text_transform() allows us to apply any custom function to the cells of a column. We can use the web_image() function in gt to convert a URL to an embedded image.

emissions_table %>% 
  gt::text_transform(
    #Apply a function to a column
    locations = cells_body(c(flag_URL)),
    fn = function(x) {
      #Return an image of set dimensions
      web_image(
        url = x,
        height = 12
      )
    }
  ) %>% 
  #Hide column header flag_URL and reduce width
  cols_width(c(flag_URL) ~ px(30)) %>% 
  cols_label(flag_URL = "")

Comparison of per capita CO₂ emissions from domestic aviation (2018)
	Country	Per capita emissions (kg)
	United States	385.52
	Australia	267.17
	Norway	209.23
	New Zealand	174.19
	Canada	168.27
	Japan	73.96
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]
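It can help to look at what web_image() returns on its own. As far as I understand it, the function does not download anything; it just builds an HTML image tag as a string, which text_transform() then places in each cell. A minimal sketch, using one of the flag URLs from the data above:

```r
library(gt)

#One of the flag URLs shown in head(flag_data)
flag_url <- "https://upload.wikimedia.org/wikipedia/commons/d/d9/Flag_of_Norway.svg"

#Build the HTML for an image with a height of 12 pixels
img_tag <- web_image(url = flag_url, height = 12)
print(img_tag)
```

Because the result is plain HTML, the flags will only show up in HTML output and need the image URLs to be reachable when the table is viewed.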

Bonus round 2: Within-group comparison

It’s pretty clear by now that the US and Australia are the worst for domestic aviation emissions, but what if we wanted to compare within continents? Which country has the highest emissions within Africa or the Americas? For this, we can use the row grouping functionality in gt.

Before we can do this, we’ll need to assign each country to its corresponding continent. As above, this isn’t really gt relevant, but the code is included below if you’re interested.

continent_data <- flag_data %>% 
  #Convert ISO3 country codes to continent names
  mutate(continent = countrycode(sourcevar = Code, origin = "iso3c", destination = "continent", warn = FALSE)) %>% 
  select(continent, flag_URL, Entity, Per.capita.domestic.aviation.CO2)

head(continent_data)

  continent
1  Americas
2   Oceania
3    Europe
4   Oceania
5  Americas
6      Asia
                                                                                   flag_URL
1              https://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
2 https://upload.wikimedia.org/wikipedia/commons/8/88/Flag_of_Australia_%28converted%29.svg
3                    https://upload.wikimedia.org/wikipedia/commons/d/d9/Flag_of_Norway.svg
4               https://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg
5                         https://upload.wikimedia.org/wikipedia/en/c/cf/Flag_of_Canada.svg
6                          https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg
         Entity Per.capita.domestic.aviation.CO2
1 United States                        0.3855204
2     Australia                        0.2671672
3        Norway                        0.2092302
4   New Zealand                        0.1741885
5        Canada                        0.1682674
6         Japan                        0.0739585
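For anyone curious about the conversion itself, here is a small standalone sketch of the countrycode() call using the six codes shown above. The "ZZZ" code is a deliberately invalid, made-up value to show what warn = FALSE hides.

```r
library(countrycode)

#ISO3 codes for the six countries in the table
codes <- c("USA", "AUS", "NOR", "NZL", "CAN", "JPN")

#Look up the continent for each code
continents <- countrycode(sourcevar = codes, origin = "iso3c", destination = "continent")
continents

#An unmatched code silently becomes NA when warn = FALSE
unmatched <- countrycode(sourcevar = "ZZZ", origin = "iso3c", destination = "continent", warn = FALSE)
is.na(unmatched)
```

It is worth checking for NA values after a conversion like this, since any unmatched countries would otherwise end up in their own NA group.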

In this case, we need to start from the beginning and apply the grouping when we first create the table.

(emissions_table_continent <- continent_data %>%
  #Just take the top 5 from each continent for our example
  group_by(continent) %>% 
  slice(1:5) %>% 
  #Just show Africa and Americas for our example
  filter(continent %in% c("Africa", "Americas")) %>%
  #Group data by continent
  gt(groupname_col = "continent") %>% 
  #Add flag images as before
  gt::text_transform(
    locations = cells_body(c(flag_URL)),
    fn = function(x) {
      web_image(
        url = x,
        height = 12
      )
    }
  ) %>% 
  cols_width(c(flag_URL) ~ px(30)) %>% 
  cols_label(flag_URL = "") %>% 
  #Original changes as above.
  cols_label(Entity = "Country",
             Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>% 
  tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>% 
  tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]") %>% 
  fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
             scale_by = 1000) %>%
  cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)") %>% 
  tab_style(
     locations = cells_column_labels(columns = everything()),
     style     = list(
       cell_borders(sides = "bottom", weight = px(3)),
       cell_text(weight = "bold")
     )
   ) %>% 
   tab_style(
     locations = cells_title(groups = "title"),
     style     = list(
       cell_text(weight = "bold", size = 24)
     )
   ) %>% 
  data_color(columns = c(Per.capita.domestic.aviation.CO2),
             colors = emissions_palette) %>% 
  opt_all_caps() %>% 
  opt_table_font(
    font = list(
      google_font("Chivo"),
      default_fonts()
    )
  ) %>%
  cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
             c(Entity) ~ px(400)) %>% 
  tab_options(
    column_labels.border.top.width = px(3),
    column_labels.border.top.color = "transparent",
    table.border.top.color = "transparent",
    table.border.bottom.color = "transparent",
    data_row.padding = px(3),
    source_notes.font.size = 12,
    heading.align = "left",
    #Adjust grouped rows to make them stand out
    row_group.background.color = "grey"))

Comparison of per capita CO₂ emissions from domestic aviation (2018)
	Country	Per capita emissions (kg)
Africa
	South Africa	25.09
	Namibia	11.49
	Mauritius	8.75
	Kenya	7.10
	Algeria	4.43
Americas
	United States	385.52
	Canada	168.27
	Chile	70.46
	Brazil	42.86
	Mexico	39.93
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]
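The group_by() and slice(1:5) step used above only returns the top emitters because the data were already sorted in descending order. A small sketch of that logic on toy data (values in kg, copied from the table above, with a made-up column name emissions_kg), alongside slice_max() as an alternative that does not rely on the row order:

```r
library(dplyr)

#Toy data using values from the grouped table (kg per capita)
toy <- data.frame(
  continent    = c("Africa", "Americas", "Africa", "Americas", "Africa", "Americas"),
  Entity       = c("Mauritius", "Chile", "South Africa", "Canada", "Namibia", "United States"),
  emissions_kg = c(8.75, 70.46, 25.09, 168.27, 11.49, 385.52),
  stringsAsFactors = FALSE
)

#Approach used in this post: sort first, then take the first rows per group
top2_slice <- toy %>% 
  arrange(desc(emissions_kg)) %>% 
  group_by(continent) %>% 
  slice(1:2) %>% 
  ungroup()

#Alternative: let slice_max() pick the largest values per group
top2_max <- toy %>% 
  group_by(continent) %>% 
  slice_max(emissions_kg, n = 2) %>% 
  ungroup()

top2_slice$Entity
top2_max$Entity
```

Both approaches should return South Africa and Namibia for Africa, followed by the United States and Canada for the Americas.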

The Conclusion: Tables can be nice!

A package like gt really unlocks the power of a table to display data. Using a combination of style and colour, it’s easy to create a clear output that allows a reader to compare individual values. Moreover, we have the potential to easily include neat additions such as embedded images or graphs. The one drawback of gt is that it only generates static tables; to build interactive tables with filters or tabs we will need to move to other packages like reactable. Still, it’s a fun tool to add to the data visualisation toolbox of anybody working in R.