# Getting started with keras in R

The RStudio-produced package for keras was the subject of the keynote at this year's rstudio-conf.

Keras is a Python API designed to provide a higher-level interface to the neural network backends TensorFlow, CNTK and Theano. The most common of these is TensorFlow, and that is the one we will use as well. In this post I'll outline the steps I took to get some hello world models up and running in R, and also some of the nice additional things you can do with the new R packages.

I won't go into too much detail about the background of keras, mainly because the existing documentation is very good:

- RStudio / keras reference: https://tensorflow.rstudio.com
- Keras documentation (for more details on options and explanations): https://keras.io/

I also won't go into too much detail about the nuts and bolts of deep learning because there are some fantastic resources that are only a google away. If you want a great introduction, go and watch Andrew Ng's youtube course (don't worry, we'll wait for you).

# The set up

You need a machine with the TensorFlow backend, and you will also need Python because the R package accesses the underlying Python libraries through a great little intermediate package called `reticulate` (https://github.com/rstudio/reticulate).

You can use a CPU for computations, but if you have a CUDA enabled GPU it's worth installing the GPU variant of tensorflow because it is much faster.

## Get the R libraries

The first thing you need to do is install the tensorflow R package. You can then use this to manage the install of the tensorflow back-end.

```
install.packages("tensorflow")
library(tensorflow)
```

I'd recommend the development version of `keras` for now because there are some fixes to line up the R package with the latest version of the keras python packages that it relies on.

`devtools::install_github("rstudio/keras")`

The `tfruns` package is also very handy for recording and comparing model runs (we'll get to that in the next post).

## CPU setup

The CPU setup is straightforward and can be done in R:

`install_tensorflow()`

## GPU requirements

My current set up is an NVIDIA GTX 1080 machine running Ubuntu 16.04. It takes some care to ensure all your library and driver versions match. At time of writing I would recommend CUDA toolkit v9.0, cuDNN (the deep neural network library) 7.0 and TensorFlow 1.5.

Then you have a one-line install in R

`install_tensorflow(version = "gpu")`

Why am I so sure about that recommendation? Because I installed CUDA toolkit version **9.1** which is not supported by the prebuilt tensorflow binaries, so I lost an hour or so compiling tensorflow from source. Everything worked fine after that though.

If you do run into trouble with configurations (or python version conflicts) you can easily check your configuration with

```
reticulate::py_discover_config("tensorflow")
reticulate::py_discover_config("keras")
```

# The fun stuff

We will start with the MNIST handwriting dataset, because what the internet needs right now is another MNIST example, amiright?

First, get the MNIST data (it's included with the package)

```
library(keras)
library(tensorflow)
library(imager) # very helpful image package
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
dim(x_train)
dim(y_train)
```

The training set is an R array containing 60,000 images, each stored as a 28 x 28 array of pixel values. Here's a sample:

# Step one - deep neural net

TensorFlow requires the data to have dimensions [`No_images` rows x 784 cols] (i.e. each row contains one image's pixels as a vector).

We also scale the data so each pixel value lies between 0 and 1.

```
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255
x_test <- x_test / 255
dim(x_train)
```
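To see what the reshape is doing, here is a minimal base R sketch on a toy array (two made-up "images" of 3 x 3 pixels, not MNIST). It assumes `array_reshape` fills in row-major (C-style) order, which is how I read the `reticulate` documentation; the helper name `flatten_images` is hypothetical and only there for illustration.

```r
# Toy stand-in for x_train: 2 "images" of 3 x 3 pixels (values are made up).
toy <- array(0, dim = c(2, 3, 3))
toy[1, , ] <- matrix(1:9, nrow = 3, byrow = TRUE)
toy[2, , ] <- matrix(seq(10, 90, by = 10), nrow = 3, byrow = TRUE)

# Hypothetical helper: flatten each image row by row into one row of a matrix.
flatten_images <- function(x) {
  # apply() over the first dimension returns one column per image, so transpose.
  t(apply(x, 1, function(img) as.vector(t(img))))
}

flat <- flatten_images(toy)
dim(flat)     # 2 rows (images) by 9 columns (pixels)
flat[1, ]     # pixels of image 1, read row by row

# Scale to [0, 1]; the toy maximum is 90, whereas MNIST pixels go up to 255.
scaled <- flat / 90
range(scaled)
```

Note that simply assigning `dim(x) <- c(n, 784)` would fill in column-major order instead, which scrambles the pixels across images.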

We need to make the response data categorical. The function `keras::to_categorical` converts a vector of integers to binary data in a matrix. In this case we convert `y_train` from integer classes between 0 and 9 to a binary matrix (0's and 1's in an [NIMG x 10] dimensional array).

Note: this step is necessary because we are using the `categorical_crossentropy` loss function in the model below. This note explains it: https://keras.io/losses/#usage-of-loss-functions

```
head(y_train)
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
head(y_train)
```
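If it helps to see the encoding spelled out, here is a rough base R equivalent on a few made-up labels. The helper `one_hot` is hypothetical and is only meant to mirror what `to_categorical` does for 0-based integer labels; it is not how the package implements it.

```r
# Hypothetical helper: one-hot encode 0-based integer class labels.
one_hot <- function(y, num_classes) {
  out <- matrix(0, nrow = length(y), ncol = num_classes)
  # Labels start at 0 but R columns start at 1, hence the + 1.
  out[cbind(seq_along(y), y + 1)] <- 1
  out
}

labels  <- c(5, 0, 4, 9)       # made-up labels, not taken from MNIST
encoded <- one_hot(labels, 10)
encoded[1, ]                   # a single 1 in column 6, i.e. digit 5
rowSums(encoded)               # every row contains exactly one 1

# Recover the original labels: position of the 1 in each row, minus 1.
max.col(encoded) - 1
```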

## About the models

Keras/TF use symbolic computation so we need to set up the model graph first. The `keras` library has a bunch of different layer types you can use.

The model itself will probably be initialised using `keras::keras_model_sequential`, which is just a linear stack of layers. The `multi_gpu_model` function allows you to initialise the model over multiple GPUs (but you probably already guessed that).

Useful layer types:

- `layer_dense()` - a fully/densely connected layer. Useful settings are `activation`, `use_bias`, `kernel_initializer`, `weights` (allows you to initialise the kernel weights and layer weights), `kernel_regularizer` and `units` (dimensions of the output space).
- `layer_activation(activation = "name")` - adds an activation layer (or if it's one of the standard ones you can do it in the `layer_dense` or `layer_conv_X` calls). See https://keras.io/activations/ for a list of the basics.
- `layer_activation_leaky_relu` - adds a leaky relu activation specifically.
- `layer_dropout()` - adds a dropout. `rate` is a number between 0 and 1 that specifies the fraction to drop out.
- `layer_flatten()` - flattens the layer (e.g. going from convolution cycle to fully connected layers).

There are also a bunch of options for convolutions and for pooling layers. `layer_conv_2d` and `layer_max_pooling_2d` are two of the most common ones.
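For intuition about what a dense layer and its activation compute, here is a small base R sketch of a forward pass through two dense layers. This is only an illustration under my own assumptions: the weights are random, the shapes are tiny, and the helper names (`relu`, `softmax`, `dense`) are hypothetical rather than anything exported by `keras`.

```r
# Activations: relu zeroes out negatives, softmax turns each row into probabilities.
relu <- function(z) pmax(z, 0)
softmax <- function(z) {
  e <- exp(z - apply(z, 1, max))   # subtract each row's max for numerical stability
  e / rowSums(e)
}

# Hypothetical dense layer: inputs times weights, plus bias, then activation.
dense <- function(x, W, b, activation) {
  activation(sweep(x %*% W, 2, b, "+"))
}

set.seed(1)
x  <- matrix(runif(2 * 4), nrow = 2)             # 2 samples, 4 input features
W1 <- matrix(rnorm(4 * 3), nrow = 4); b1 <- rep(0, 3)   # 4 inputs -> 3 units
W2 <- matrix(rnorm(3 * 2), nrow = 3); b2 <- rep(0, 2)   # 3 units  -> 2 classes

hidden <- dense(x, W1, b1, relu)
out    <- dense(hidden, W2, b2, softmax)
dim(out)       # one row per sample, one column per class
rowSums(out)   # each row of softmax output sums to 1
```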

## First define a model

Let's define a really simple model, with two dense (fully connected) hidden layers, a dropout layer after the first one, and a final softmax output layer:

```
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')
summary(model)
```

One nice feature of the keras package is the model detail you get when you run `summary`, which is very handy for quick and dirty calculations to see if your next model run is going to blow smoke out of the GPU...
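As an example of that kind of quick calculation, the parameter count for the model above can be worked out by hand. This sketch assumes each dense layer has one weight per input-output pair plus one bias per unit, and that dropout adds no parameters; the helper `dense_params` and the layer labels are my own names.

```r
# Parameters of a dense layer: n_in * n_out weights plus n_out biases.
dense_params <- function(n_in, n_out) n_in * n_out + n_out

params <- c(
  dense_1 = dense_params(784, 256),
  dense_2 = dense_params(256, 128),  # the dropout layer in between adds nothing
  dense_3 = dense_params(128, 10)
)
params
sum(params)   # 235146 trainable parameters in total
```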

## Compile the model

The compilation stage is where you define the hyperparameters, e.g.

- Loss function - you can use any from the keras library, see https://keras.io/losses/#available-loss-functions
- Optimizer - again, see https://keras.io/optimizers/
- Metrics - track ALL THE THINGS: https://keras.io/metrics/

We are using the `categorical_crossentropy` loss function.

```
# compile (define loss and optimizer)
model %>% keras::compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
```
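To make the loss a bit less abstract, here is a minimal base R sketch of categorical cross-entropy on made-up predictions. The helper `cce_loss` is hypothetical, and the small clipping constant `eps` is my own assumption to avoid `log(0)`; the keras implementation may differ in its details.

```r
# Hypothetical helper: mean categorical cross-entropy over a batch.
# y_true is one-hot (rows = samples), y_pred holds predicted class probabilities.
cce_loss <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)   # clip to avoid log(0)
  mean(-rowSums(y_true * log(y_pred)))
}

y_true    <- rbind(c(1, 0, 0), c(0, 1, 0))
confident <- rbind(c(0.9, 0.05, 0.05), c(0.1, 0.8, 0.1))
unsure    <- rbind(c(0.4, 0.3, 0.3), c(0.3, 0.4, 0.3))

cce_loss(y_true, confident)   # small: probability mass sits on the true class
cce_loss(y_true, unsure)      # larger: predictions are close to a coin toss
```

Only the probability assigned to the true class contributes to the loss, which is why the labels have to be one-hot encoded first.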

## Fit the model and test

Finally, at runtime we set the number of epochs, the batch size and the validation split (the fraction of the training data that will be kept aside for validation).

```
# train (fit)
model %>% fit(
  x_train,
  y_train,
  epochs = 15,
  batch_size = 128,
  validation_split = 0.2,
  verbose = 1
)
```
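A quick back-of-the-envelope check on what those settings mean in practice, assuming the validation fraction is simply carved off the 60,000 training images and each batch triggers one weight update:

```r
n_train          <- 60000   # images in the MNIST training set
validation_split <- 0.2
batch_size       <- 128
epochs           <- 15

n_val <- round(n_train * validation_split)   # 12000 images held out for validation
n_fit <- n_train - n_val                     # 48000 images actually used for fitting

steps_per_epoch <- ceiling(n_fit / batch_size)   # 375 batches per epoch
total_updates   <- steps_per_epoch * epochs      # 5625 weight updates over the run
c(n_fit = n_fit, n_val = n_val,
  steps_per_epoch = steps_per_epoch, total_updates = total_updates)
```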

Another nice feature of the `keras` library (as opposed to, say, the `mxnet` library) is tracking the loss function and accuracy as the model is training (see the clip above).

By the end of the training run our model accuracy is around 98.51% on the training set and 97.95% on the validation set. It's possible to get a higher accuracy than this, but I'll leave that exploration up to you readers (feel free to share your experiments in the comments).

With 98% accuracy our model is fairly well optimised (for a first go). But having an **optimised** model (i.e. a good performance on the training set) is not the same as having a **generalised** model. We need to ensure we have not overfit the model by testing the prediction accuracy against our test set.

This is also straightforward:

```
model %>%
  evaluate(x_test, y_test)
```

We get a prediction accuracy of 98.06% on the test set. Again, this is pretty good for a hello world.
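For reference, the accuracy figure is, as I understand it, just the share of samples whose most probable predicted class matches the true class. Here is a base R sketch of that calculation on made-up probabilities (three classes, four samples), not on the real model output:

```r
# Made-up predicted probabilities: 4 samples (rows), classes 0, 1, 2 (columns).
probs <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.8, 0.1),
  c(0.3, 0.3, 0.4),
  c(0.6, 0.3, 0.1)
)
truth <- c(0, 1, 2, 1)   # made-up true labels

# Predicted class = column with the highest probability, shifted to 0-based labels.
predicted <- apply(probs, 1, which.max) - 1
accuracy  <- mean(predicted == truth)
accuracy   # 0.75: three of the four samples are classified correctly

# Cross-tabulate to see which classes get confused with which.
table(truth = truth, predicted = predicted)
```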

# What's next?

At Symbolix we will be using more of R/keras for our machine and deep learning analysis (we are quite impressed). I am hoping to write more of these walkthroughs - what would you like to see next?