It’s been talked about for a while, but the next big changes to R, v4.1.0, are here! For many R users, new versions don’t make a huge difference to their day-to-day work. I often find people still happily running R v3.x! This time around, however, v4.1.0 includes some major (and interesting) changes to the base R syntax, which might affect (improve?) people’s workflow. We’ll cover the most talked-about changes below. Put on your party hat and let’s get into it.
The native pipe
Ever since the rise of the tidyverse and magrittr I’ve noticed a worrying divergence in how people write R code. New users, or those that jumped on the tidyverse bandwagon, tended to write long piped code using dplyr functions. Older users, who learned R before the rise of the tidyverse, tended to use nested base functions to achieve the same result. Having a variety of tools to approach the same problem isn’t inherently bad, but it seemed like we were heading towards a world where the two groups were almost using different, incomprehensible, programming languages. Enter the native pipe: |>. With v4.1.0 we now have an inbuilt pipe included in base R syntax. With |> you can build piped code without ever needing to load a tidyverse library!
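As a rough illustration of the two styles, here is the same small calculation written as nested base calls and with the native pipe. The vector x is made up for this example, and the code assumes R 4.1.0 or later.

```r
# Hypothetical data, purely for illustration
x <- c(1, 4, 9, 16)

# Nested base R style: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Native pipe style (R >= 4.1.0): read from left to right
piped <- x |> sqrt() |> mean() |> round(1)

# Both approaches give the same value
identical(nested, piped)
```

No package is loaded at any point, which is the main attraction of |>.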
This is an awesome change for the base R syntax. I guess we can all forget about magrittr and move to using the native pipe, right? Well, not so fast. While the native pipe is a huge improvement, there are still a few features that might keep you using the more familiar %>% (at least for now).
Flexible placement
The magrittr pipe allows the left-hand side to be used at any point in the following function call by placing a dot (.) where the value should go. In comparison, the native pipe requires the left-hand side to always fill the first argument on the right.
We can achieve the same thing in base R by writing a nested function, but it’s definitely not as effective and defeats some of the benefits of writing piped code (though see the changes to anonymous functions below).
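A minimal sketch of the difference, using base R only. The names vector is an assumption, inferred from the output printed later in this section. The magrittr form is shown as a comment so the block runs without any packages.

```r
# Assumed input, inferred from the printed results in this post
names <- c("John_Doe", "Jeff_Jones", "Rachel_Black")

# magrittr version (needs library(magrittr)); the dot marks where the
# left-hand side goes:
#   names %>% gsub(pattern = "_", x = ., replacement = " ")

# Nested base R call: the data is the third argument of gsub()
nested <- gsub("_", " ", names)

# Native pipe in R 4.1.0: wrap the call in an anonymous function so the
# left-hand side can be passed to an argument other than the first
piped <- names |>
  (function(x) gsub(pattern = "_", x = x, replacement = " "))()

identical(nested, piped)
```

The extra parentheses around the function and the trailing () are needed because the right-hand side of |> must be a function call.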
A less well known, but sometimes useful, feature of the magrittr pipe is the ability to quickly turn a piece of piped code into a function. This functionality isn’t available with the native pipe…although that might not be a deal breaker for many people.
#Create a function to run same pipe
replace_underscore <- . %>% gsub(pattern = "_", x = ., replacement = " ")
replace_underscore(names)
[1] "John Doe" "Jeff Jones" "Rachel Black"
#The slightly messier base approach
replace_underscore <- function(x) {
  x |> {function(x) gsub(pattern = "_", x = x, replacement = " ")}()
}
replace_underscore(names)
[1] "John Doe" "Jeff Jones" "Rachel Black"
The rest of the magrittr package
Until now, we’ve been comparing %>% and |>, but if you’re actually loading the magrittr package there are a few other special pipes that are also powerful. One particularly useful one is the tee-pipe, %T>%, which allows you to create an output in the middle of your pipe (e.g. plot or view your data) without needing to stop the pipe.
#Return the sum of two normally distributed variables
col_sums <- data.frame(x = rnorm(10), y = rnorm(10)) %T>%
  #View the data to make sure it all looks fine
  print() %>%
  #Return the column sums
  colSums()
Let’s not pretend that %>% has things all its own way. One major benefit of the native pipe is that you can write piped code with no dependencies. Working with the tidyverse can often mean working with quite a few dependencies, and there’s no guarantee that updates will be backward compatible. If you’re building an R package, even just for personal use, having the option of writing piped code without needing additional package dependencies might be exactly what you’re looking for.
Unlike our pipe comparison, the new base anonymous function shorthand, \(x), which is equivalent to writing function(x), has a number of benefits over the tidyverse equivalents.
Use anywhere in R
The ~ syntax is useful within tidyverse but can’t easily be used outside of this context. \(x), on the other hand, can be used to write functions anywhere you want.
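A short sketch of the shorthand in a few base R contexts, assuming R 4.1.0 or later. The function name add_one and the values used are made up for illustration.

```r
# Shorthand anonymous function inside a base apply call
squares <- sapply(1:5, \(x) x^2)

# The equivalent long form gives the same result
squares_long <- sapply(1:5, function(x) x^2)

# The shorthand can be assigned a name like any other function
add_one <- \(x) x + 1

# It also pairs with the native pipe when wrapped in parentheses
doubled <- 1:5 |> (\(x) x * 2)()

identical(squares, squares_long)
```

Because \(x) is part of the language itself, it works wherever function(x) does, with no package needed.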
So, while I might be holding off on using native pipe, I think the new base R anonymous functions will quickly become part of my coding workflow!
Concatenating factors
One smaller change that might be useful to regular R users is the new ability to concatenate factors together. Previously, trying to concatenate different factors would coerce all levels into their underlying integers (how the data are actually stored within R). With v4.1.0, factors can be concatenated and stay as factors, combining together all the levels from the two original factors.
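A minimal example, assuming R 4.1.0 or later. The contents of factor1 and factor2 are an assumption, inferred from the output printed in this section.

```r
# Assumed inputs, inferred from the printed results in this post
factor1 <- factor(c("Apple", "Orange", "Apple"))
factor2 <- factor(c("Banana", "Strawberry"))

# From R 4.1.0, c() on factors returns a factor with the combined levels
combined <- c(factor1, factor2)
combined
```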
[1] Apple Orange Apple Banana Strawberry
Levels: Apple Orange Banana Strawberry
Note
The order of the factor levels in the new factor depends on the order in which the two factors are concatenated.
#Levels start with Apple
c(factor1, factor2)
[1] Apple Orange Apple Banana Strawberry
Levels: Apple Orange Banana Strawberry
#Levels start with Banana
c(factor2, factor1)
[1] Banana Strawberry Apple Orange Apple
Levels: Banana Strawberry Apple Orange
Wrap up
R v4.1.0 marks an exciting step in R development. The introduction of the native pipe and easy anonymous functions will allow users to take advantage of these useful tools even if they’re more comfortable in base R than the tidyverse. While I might not be switching over to the native pipe just yet, I’m really excited to see where this leads in future updates!