# Code: Simplicity or Speed?

While I was delving into the details of this StackOverflow question

```
I am trying to generate a vector containing an increasing, reverse series such as
1,2,1,3,2,1,4,3,2,1,5,4,3,2,1.
```

various solutions arose. The simplest (in terms of keystrokes) was provided by user Henrik:

`rev(sequence(5:1))`

which is indeed an elegant and simple solution. However, it wasn't the **fastest** solution, as we will soon see.

As with many programming problems, there is often a trade-off between code simplicity and speed. One of the first lessons in R (especially if you are moving from other languages) is that it's better to forgo the apparent simplicity of constructs like `for` loops in favour of optimised functions like the `apply` family. On the other hand, there are whole libraries (think `dplyr` and the `tidyverse` in general) whose primary aims include improving code readability.

With that in mind, let's get back to the StackOverflow example. In addition to Henrik's solution, the ever-present user akrun (with input from others) suggested

`unlist(lapply(1:5, ":", 1))`

which is also a nice solution; it requires a few more keystrokes, but in practice runs faster.

## The need for speed...

In trying to provide an alternative answer, I went back to basics looking for a faster implementation. Coupled with what I've learnt while integrating `C++` into my googleway package, I came up with a simple `for` loop written in `Rcpp`.

(And hopefully, as the loop is written in `C++`, all the `for-loop-in-R` haters will be appeased.)

```
library(Rcpp)

cppFunction('NumericVector reverseSequence(int maxValue, int vectorLength){
  NumericVector out(vectorLength);
  int counter = 0;
  for (int i = 1; i <= maxValue; i++) {
    for (int j = i; j > 0; j--) {
      out[counter] = j;
      counter++;
    }
  }
  return out;
}')

maxValue <- 5
reverseSequence(maxValue, sum(1:maxValue))
# [1] 1 2 1 3 2 1 4 3 2 1 5 4 3 2 1
```

`Rcpp` provides methods that allow you to easily integrate `R` and `C++`, and its speed benefit became clear when I benchmarked it against the two `R` solutions. The looping `C++` implementation is fastest most of the time (a median of roughly 1038 microseconds, compared with 1901 (akrun) and 4994 (Henrik)).

```
library(microbenchmark)

maxValue <- 1000

microbenchmark(
  henrik = {
    rev(sequence(maxValue:1))
  },
  akrun = {
    unlist(lapply(1:maxValue, ":", 1))
  },
  symbolix = {
    reverseSequence(maxValue, sum(1:maxValue))
  }
)
# Unit: microseconds
#     expr      min       lq     mean   median       uq      max neval
#   henrik 3788.987 4567.422 7085.908 4993.793 5689.287 35355.34   100
#    akrun 1533.615 1723.819 3302.222 1900.983 2688.463 35944.15   100
# symbolix  502.540  663.786 2818.100 1037.945 1545.540 33808.83   100
```

## Righto, so which one?

Back to the title of this blog: simplicity or speed? Well, I can't answer that for you; you'll have to decide whether the extra speed is worth the time spent designing a longer piece of code. In this case, for a sequence built from `maxValue = 1000`, the difference is a few milliseconds. But for a sequence of one million, the impact is much larger.

Because we deal with big data, we often favour speed over code simplicity. I don't mind watching my code tick away for a while, but it wears thin if every test takes an hour (or a day) to complete.

If you are still not sure, you can always refer to the repository of all programming wisdom, xkcd: