`read.delim("./data/Day5.txt", sep = " -> ")`

`Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE, : invalid 'sep' value: must be one byte`

Day 5

R

Author

Liam D. Bailey

Published

December 8, 2021

It’s Day 5 and we’re 20% of the way through the advent challenge! See the explanation for today’s challenge here. Today’s challenges will again involve working with indexing numbers in ~~The Matrix~~ a matrix.

We’re given two sets of 2-dimensional coordinates (X,Y), which represent the start and end of a line. We then need to count the number of points at which at least two lines overlap. The data includes an unusual separator (`->`

), so we’ll use `read_delim()`

from the `readr`

package to read in the data. I find the `readr`

functions more powerful than base functions for reading data because they allow for more complex separators. See the example below with base function `read.delim`

that cannot use a separator larger than 1 byte.

`Error in scan(file, what = "", sep = sep, quote = quote, nlines = 1, quiet = TRUE, : invalid 'sep' value: must be one byte`

```
library(readr)
#Read in data where each set of coordinates is a character string
raw_data <- readr::read_delim("./data/Day5.txt",
delim = " -> ", col_names = c("start", "end"),
show_col_types = FALSE,
col_types = list(start = readr::col_character(),
end = readr::col_character()))
head(raw_data)
```

```
# A tibble: 6 × 2
start end
<chr> <chr>
1 503,977 843,637
2 437,518 437,225
3 269,250 625,250
4 846,751 646,751
5 18,731 402,731
6 749,923 749,986
```

We need to be able to access each of the X and Y values separately, so we’ll separate this data out into 4 columns.

```
library(stringr)
#Convert the characters into a 4col numeric matrix
start_point <- str_split(raw_data$start, pattern = ",", simplify = TRUE)
end_point <- str_split(raw_data$end, pattern = ",", simplify = TRUE)
all_points <- cbind(start_point, end_point)
all_points <- as.numeric(all_points)
dim(all_points) <- c(nrow(raw_data), 4)
head(all_points)
```

```
[,1] [,2] [,3] [,4]
[1,] 503 977 843 637
[2,] 437 518 437 225
[3,] 269 250 625 250
[4,] 846 751 646 751
[5,] 18 731 402 731
[6,] 749 923 749 986
```

For challenge 1 we just focus on horizontal or vertical lines. Just like in our bingo challenge on day 4, we will create an empty matrix (all 0s) on which to map our results. Our matrix is 1000 x 1000 to accomodate all the possible coordinates in our data.

```
zero_mat <- matrix(0, nrow = 1000, ncol = 1000)
#Look at a section of the matrix
zero_mat[1:10, 1:10]
```

```
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 0
```

As we’re just focussing on the horizontal or diagonal lines, we filter only those cases where x or y are constant.

Now, adding lines to our matrix is fairly straight forward but we need to deal with two small issues before we do. First, the coordinates we’re given start at 0,0; however, in R indexing begins at 1 (a shocking fact for people familiar with other programming languages!). This means we’ll have to add 1 to all our coordinates so they can be properly used for indexing.

The second hurdle is that our coordinates are provided as X,Y, but to index a matrix in R we need to provide coordinates as Y (i.e. rows), X (i.e. columns). In the for loop below I create a sequence of coordinate to map the horizontal and vertical lines and then reverse their order to be used as our index values.

```
for (i in 1:nrow(nondiag)) {
line <- nondiag[i, ]
#Make sequence of first coordinates (X)
#Add one to use for R indexing
xs <- (line[1]:line[3]) + 1
#Make sequence of second coordinates (Y)
ys <- (line[2]:line[4]) + 1
#Create a matrix of index values, BUT we need to write this as Y,X instead of X,Y
index_values <- cbind(ys, xs)
zero_mat[index_values] <- zero_mat[index_values] + 1
}
```

Now we can get our first answer, the number of points at which at least two lines overlap.

Challenge 2 requires us to include the diagonal lines too. This might seem more complex at first, but we are given the helpful caveat that *all diagonal lines are 45 degrees*, or, in other words, the slope of all lines will be 1. Because of this, our previous approach that creates a sequence of X and Y values increasing by 1 will still be appropriate. Only this time both the X and Y values will change.

```
#Use the same approach to add all diagonal lines to our original matrix
for (i in 1:nrow(diag)) {
line <- diag[i, ]
#Make sequence of first coordinates (X)
#Add one to use for R indexing
xs <- (line[1]:line[3]) + 1
#Make sequence of second coordinates (Y)
ys <- (line[2]:line[4]) + 1
#Create a matrix of index values, BUT we need to write this as Y,X instead of X,Y
index_values <- cbind(ys, xs)
zero_mat[index_values] <- zero_mat[index_values] + 1
}
```

Once we have also added diagonal lines, we can get our second result.

See previous solutions here: