library(dplyr)
library(scales)
library(gt)
packageVersion("gt")
[1] '0.8.0'
gt
packageLiam D. Bailey
November 27, 2020
This post was created before the release of R v4.1.0. Some code might be outdated.
I dread encountering a data table in an academic paper. A jumble of numbers, column sub-headers, and confusing footnotes, seemingly bereft of purpose. Why, I cry, is this not in the supplementary material?! Surely, I think, they could’ve created a graph! Despite my prejudice, the fact is that we often do need to use tables to present data. While a plot is a beautiful way to show trends or patterns, it falls short if your goal is to compare individual values. This year saw the release of the gt
package, a new addition to the growing number of packages that can produce publication quality tables in R. It seemed like as good a time as any to overcome my dread of tables. Here is my attempt.
As a test case I decided to work with data on greenhouse gas emissions from aviation travel, recently published in Our World in Data. Hannah Ritchie does a great job explaining the complexity of calculating per capita aviation emissions and I would highly recommend giving her article a read. For now, though, we’ll side-step this complexity and focus on the simplest case, emissions from domestic travel. The map from Hannah’s post clearly shows the range and disparity in emissions across the globe, but if we want to compare individual values between countries, or maybe within continents, such a map isn’t the best option. How about a table? While the original post includes a basic table to compare emissions, I wanted to see whether I could create a table of my own using only R. A table that I might be happy to encounter in a publication.
We’ll need dplyr
to tidy some of the data and gt
to build our tables. scales
will be used for creating colour palettes.
gt
integrates well into the existing tidyverse
, and creating a gt
table is as simple as two lines of code.
For this example we’re just showing the worst emitters.
#Load data and arrange in descending order of emissions
emissions_data <- read.csv(here::here("posts/2020-11-27-making-beautiful-tables-with-gt/assets/per-capita-co2-domestic-aviation.csv")) %>%
arrange(desc(Per.capita.domestic.aviation.CO2))
#Generate a gt table from head of data
head(emissions_data) %>%
gt()
Entity | Code | Per.capita.domestic.aviation.CO2 |
---|---|---|
United States | USA | 0.3855204 |
Australia | AUS | 0.2671672 |
Norway | NOR | 0.2092302 |
New Zealand | NZL | 0.1741885 |
Canada | CAN | 0.1682674 |
Japan | JPN | 0.0739585 |
A table, yes, but not one you’re likely to publish! To take our first steps towards table perfection we need to tidy up the data and add a title and data source. The reader needs to understand what’s going on. While much of the tidying could be completed using dplyr
before converting to a table I’ll demonstrate how this could be achieved exclusively inside gt
.
Notice how we select columns using the generic c()
. For larger, more complex data tidy-select functions like starts_with()
or contains()
may come in handy.
(emissions_table <- head(emissions_data) %>%
gt() %>%
#Hide unwanted columns
cols_hide(columns = c(Code)) %>%
#Rename columns
cols_label(Entity = "Country",
Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>%
#Add a table title
#Notice the `md` function allows us to write the title using markdown syntax (which allows HTML)
tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>%
#Add a data source footnote
tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]"))
Comparison of per capita CO2 emissions from domestic aviation (2018) | |
Country | Per capita emissions (tonnes) |
---|---|
United States | 0.3855204 |
Australia | 0.2671672 |
Norway | 0.2092302 |
New Zealand | 0.1741885 |
Canada | 0.1682674 |
Japan | 0.0739585 |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
A few lines of code and the table is already much better. To me there is still one issues that needs to be fixed before we’ve finished the basics. While it’s technically correct to report emissions in tonnes I feel the data would be much more suitable in kilograms. For this we’ll use fmt_number()
.
There are fmt_xxx()
functions for many different data types, including fmt_currency()
and fmt_date()
.
(emissions_table <- emissions_table %>%
#Format numeric column. Use `scale_by` to divide by 1,000. (Note: we'll need to rename the column again)
fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
scale_by = 1000) %>%
#Our second call to cols_label overwrites our first
cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)"))
Comparison of per capita CO2 emissions from domestic aviation (2018) | |
Country | Per capita emissions (kg) |
---|---|
United States | 385.52 |
Australia | 267.17 |
Norway | 209.23 |
New Zealand | 174.19 |
Canada | 168.27 |
Japan | 73.96 |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
We’ve got a working table, but it won’t be winning any points for style. Our next step is to change the style of different cells to help the reader more clearly find the information they’re after. For this we’ll use the tab_style()
function. The style choices here follow some of the ‘Ten Guidelines for Better Tables’ from Jon Schwabish.
Firstly, we need to more clearly distinguish between the column headers and the body of the table (and while we’re at it, also the title!)
(emissions_table <- emissions_table %>%
#Apply new style to all column headers
tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
#Give a thick border below
cell_borders(sides = "bottom", weight = px(3)),
#Make text bold
cell_text(weight = "bold")
)
) %>%
#Apply different style to the title
tab_style(
locations = cells_title(groups = "title"),
style = list(
cell_text(weight = "bold", size = 24)
)
))
Comparison of per capita CO2 emissions from domestic aviation (2018) | |
Country | Per capita emissions (kg) |
---|---|
United States | 385.52 |
Australia | 267.17 |
Norway | 209.23 |
New Zealand | 174.19 |
Canada | 168.27 |
Japan | 73.96 |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
As our reader is interested in comparing emissions values between countries, we can add a heatmap to our cells to more clearly show the differences. This will require us to set a colour palette and apply a conditional colouring using data_color()
. We’ll use the same palette employed in Hannah Ritchie’s map above.
#Apply our palette explicitly across the full range of values so that the top countries are coloured correctly
min_CO2 <- min(emissions_data$Per.capita.domestic.aviation.CO2)
max_CO2 <- max(emissions_data$Per.capita.domestic.aviation.CO2)
emissions_palette <- col_numeric(c("#FEF0D9", "#990000"), domain = c(min_CO2, max_CO2), alpha = 0.75)
(emissions_table <- emissions_table %>%
data_color(columns = c(Per.capita.domestic.aviation.CO2),
colors = emissions_palette))
Comparison of per capita CO2 emissions from domestic aviation (2018) | |
Country | Per capita emissions (kg) |
---|---|
United States | 385.52 |
Australia | 267.17 |
Norway | 209.23 |
New Zealand | 174.19 |
Canada | 168.27 |
Japan | 73.96 |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
We’ve added some colour and style, but if we really want to customise our table and tweak the fine details we need to start using the opt_xxx()
and tab_options()
functions. The opt_xxx()
functions adjust specific elements of the table, while tab_options()
is similar to the theme()
function used with ggplot2
. Below, we’ll adjust the options to resemble tables from fivethirtyeight (working from Thomas Mock’s great blog).
(emissions_table <- emissions_table %>%
#All column headers are capitalised
opt_all_caps() %>%
#Use the Chivo font
#Note the great 'google_font' function in 'gt' that removes the need to pre-load fonts
opt_table_font(
font = list(
google_font("Chivo"),
default_fonts()
)
) %>%
#Change the width of columns
cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
c(Entity) ~ px(400)) %>%
tab_options(
#Remove border between column headers and title
column_labels.border.top.width = px(3),
column_labels.border.top.color = "transparent",
#Remove border around table
table.border.top.color = "transparent",
table.border.bottom.color = "transparent",
#Reduce the height of rows
data_row.padding = px(3),
#Adjust font sizes and alignment
source_notes.font.size = 12,
heading.align = "left"
))
Comparison of per capita CO2 emissions from domestic aviation (2018) | |
Country | Per capita emissions (kg) |
---|---|
United States | 385.52 |
Australia | 267.17 |
Norway | 209.23 |
New Zealand | 174.19 |
Canada | 168.27 |
Japan | 73.96 |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
I’m already really happy with this table. It clearly allows a comparison of exact emissions between countries, and the colour provides additional assistance to the reader. We could leave it here, but for a little bonus let’s add country flags to our plot. We can use flags sources from wikicommons (a database for with URL locations is available through data.world).
To link this flag data base to our emissions data we need to translate country names into 3-letter country codes. We can do this using the countrycode
package. The code for this is not really relevant for using gt
but you can see it below if you’re interested. The end goal is to create a column containing the URL of a flag image for each country.
#To convert country codes
library(countrycode)
flag_db <- read.csv("assets/Country_Flags.csv") %>%
#Convert country names into 3-letter country codes
mutate(Code = countrycode(sourcevar = Country, origin = "country.name", destination = "iso3c", warn = FALSE)) %>%
select(Code, flag_URL = ImageURL)
flag_data <- emissions_data %>%
left_join(flag_db, by = "Code") %>%
select(flag_URL, Entity, everything())
#We'll need to refit our table using this new data
#Code below with comments removed.
emissions_table <- head(flag_data) %>%
gt() %>%
cols_hide(columns = c(Code)) %>%
cols_label(Entity = "Country",
Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>%
tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>%
tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]") %>%
fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
scale_by = 1000) %>%
cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)") %>%
tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
cell_borders(sides = "bottom", weight = px(3)),
cell_text(weight = "bold")
)
) %>%
tab_style(
locations = cells_title(groups = "title"),
style = list(
cell_text(weight = "bold", size = 24)
)
) %>%
data_color(columns = c(Per.capita.domestic.aviation.CO2),
colors = emissions_palette) %>%
opt_all_caps() %>%
opt_table_font(
font = list(
google_font("Chivo"),
default_fonts()
)
) %>%
cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
c(Entity) ~ px(400)) %>%
tab_options(
column_labels.border.top.width = px(3),
column_labels.border.top.color = "transparent",
table.border.top.color = "transparent",
table.border.bottom.color = "transparent",
data_row.padding = px(3),
source_notes.font.size = 12,
heading.align = "left")
flag_URL
1 https://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
2 https://upload.wikimedia.org/wikipedia/commons/8/88/Flag_of_Australia_%28converted%29.svg
3 https://upload.wikimedia.org/wikipedia/commons/d/d9/Flag_of_Norway.svg
4 https://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg
5 https://upload.wikimedia.org/wikipedia/en/c/cf/Flag_of_Canada.svg
6 https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg
Entity Code Per.capita.domestic.aviation.CO2
1 United States USA 0.3855204
2 Australia AUS 0.2671672
3 Norway NOR 0.2092302
4 New Zealand NZL 0.1741885
5 Canada CAN 0.1682674
6 Japan JPN 0.0739585
We can now use the text_transform()
function to add our country flags. text_transform()
allows us to apply any custom function to a column. We can use the web_image
function in gt
to convert a URL to an embedded image.
emissions_table %>%
gt::text_transform(
#Apply a function to a column
locations = cells_body(c(flag_URL)),
fn = function(x) {
#Return an image of set dimensions
web_image(
url = x,
height = 12
)
}
) %>%
#Hide column header flag_URL and reduce width
cols_width(c(flag_URL) ~ px(30)) %>%
cols_label(flag_URL = "")
Comparison of per capita CO2 emissions from domestic aviation (2018) | ||
Country | Per capita emissions (kg) | |
---|---|---|
United States | 385.52 | |
Australia | 267.17 | |
Norway | 209.23 | |
New Zealand | 174.19 | |
Canada | 168.27 | |
Japan | 73.96 | |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
It’s pretty clear by now that the US and Australia are the worst for domestic aviation emissions, but what if we wanted to compare within continents. Which country has the highest emissions within Africa or S. America? For this, we can use the row grouping functionality in gt
.
Before we can do this, we’ll need to assign each country to its corresponding continent. As above, this isn’t really gt
relevant, but the code is included below if you’re interested.
continent
1 Americas
2 Oceania
3 Europe
4 Oceania
5 Americas
6 Asia
flag_URL
1 https://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
2 https://upload.wikimedia.org/wikipedia/commons/8/88/Flag_of_Australia_%28converted%29.svg
3 https://upload.wikimedia.org/wikipedia/commons/d/d9/Flag_of_Norway.svg
4 https://upload.wikimedia.org/wikipedia/commons/3/3e/Flag_of_New_Zealand.svg
5 https://upload.wikimedia.org/wikipedia/en/c/cf/Flag_of_Canada.svg
6 https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg
Entity Per.capita.domestic.aviation.CO2
1 United States 0.3855204
2 Australia 0.2671672
3 Norway 0.2092302
4 New Zealand 0.1741885
5 Canada 0.1682674
6 Japan 0.0739585
In this case, we need to start from the beginning apply grouping to our table before we begin.
(emissions_table_continent <- continent_data %>%
#Just take the top 5 from each continent for our example
group_by(continent) %>%
slice(1:5) %>%
#Just show Africa and Americas for our example
filter(continent %in% c("Africa", "Americas")) %>%
#Group data by continent
gt(groupname_col = "continent") %>%
#Add flag images as before
gt::text_transform(
locations = cells_body(c(flag_URL)),
fn = function(x) {
web_image(
url = x,
height = 12
)
}
) %>%
cols_width(c(flag_URL) ~ px(30)) %>%
cols_label(flag_URL = "") %>%
#Original changes as above.
cols_label(Entity = "Country",
Per.capita.domestic.aviation.CO2 = "Per capita emissions (tonnes)") %>%
tab_header(title = md("Comparison of per capita CO<sub>2</sub> emissions from domestic aviation (2018)")) %>%
tab_source_note(source_note = "Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data]") %>%
fmt_number(columns = c(Per.capita.domestic.aviation.CO2),
scale_by = 1000) %>%
cols_label(Per.capita.domestic.aviation.CO2 = "Per capita emissions (kg)") %>%
tab_style(
locations = cells_column_labels(columns = everything()),
style = list(
cell_borders(sides = "bottom", weight = px(3)),
cell_text(weight = "bold")
)
) %>%
tab_style(
locations = cells_title(groups = "title"),
style = list(
cell_text(weight = "bold", size = 24)
)
) %>%
data_color(columns = c(Per.capita.domestic.aviation.CO2),
colors = emissions_palette) %>%
opt_all_caps() %>%
opt_table_font(
font = list(
google_font("Chivo"),
default_fonts()
)
) %>%
cols_width(c(Per.capita.domestic.aviation.CO2) ~ px(150),
c(Entity) ~ px(400)) %>%
tab_options(
column_labels.border.top.width = px(3),
column_labels.border.top.color = "transparent",
table.border.top.color = "transparent",
table.border.bottom.color = "transparent",
data_row.padding = px(3),
source_notes.font.size = 12,
heading.align = "left",
#Adjust grouped rows to make them stand out
row_group.background.color = "grey"))
Comparison of per capita CO2 emissions from domestic aviation (2018) | ||
Country | Per capita emissions (kg) | |
---|---|---|
Africa | ||
South Africa | 25.09 | |
Namibia | 11.49 | |
Mauritius | 8.75 | |
Kenya | 7.10 | |
Algeria | 4.43 | |
Americas | ||
United States | 385.52 | |
Canada | 168.27 | |
Chile | 70.46 | |
Brazil | 42.86 | |
Mexico | 39.93 | |
Data: Graver, Zhang, & Rutherford (2019) [via Our World in Data] |
A package like gt
really unlocks the power of a table to display data. Using a combination of style and colour it’s easy to create a clear output that allows a reader to compare individual values. More over, we have the potential to easily include neat additions such as embedded images or graphs. The one draw back of gt
is that it only generates static tables, to build interactive tables with filters or tabs we will need to move into other packages like reactable
. Still, it’s a fun tool to add to the data visualisation toolbox of anybody working in R.