Getting started with keras in R

# HELLO KERAS WORLD_ library( tensorflow )_ library( keras )(2).png

The RStudio produced package for keras was the subject of the keynote at this year's rstudio-conf.

Keras is a python API designed to provide a higher level interface to the neural network backends Tensorflow, CNTK and Theano. The most common of these is Tensorflow and that is the one we will use as well. In this post I'll outline the steps I took to get some hello world models up and running in R and also some of the nice additional things you can do with the new R packages.

I won't go into too much detail about the background of keras, mainly because the existing documentation is very good:

I also won't go into too much detail about the nuts and bolts of deep learning because there are some fantastic resources that are only a google away. If you want a great introduction, go and watch Andrew Ng's youtube course (don't worry, we'll wait for you).

The set up

You need a machine with the tensorflow backend and you will also need python because the R package accesses the underlying python libraries (through a great little intermediate package called reticulate(

You can use a CPU for computations, but if you have a CUDA enabled GPU it's worth installing the GPU variant of tensorflow because it is much faster.

Get the R libraries

The first thing you need to do is install the tensorflow R package. You can then use this to manage the install of the tensorflow back-end.


I'd recommend the development version of keras for now because there are some fixes to line up the R package with the latest version of the keras python packages that it relies on.


The tfruns package is also very handy for recording and comparing model runs (we'll get to that in the next post).

CPU setup

The CPU setup is straightforward and can be done in R:


GPU requirements

My current set up is a NVIDA GTX1080 machine powered by ubuntu 16.04. It takes some care to ensure all your library and driver versions match. At time of writing I would recommend CUDA toolkit v9.0, cuDNN (the deep neural network library) 7.0 and tensorflow 1.5.

Then you have a one-line install in R

install_tensorflow(tensorflow = "gpu")

Why am I so sure about that recommendation? Because I installed CUDA toolkit version 9.1 which is not supported by the prebuilt tensorflow binaries, so I lost an hour or so compiling tensorflow from source. Everything worked fine after that though.

If you do run into trouble with configurations (or python version conflicts) you can easily check your configuration with


The fun stuff

We will start with the MNIST handwriting dataset, because what the internet needs right now is another MNIST example, amiright?

First, get the MNIST data (it's included with the package)

library(imager)  # very helpful image package

mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y 

x_test <- mnist$test$x
y_test <- mnist$test$y


The training set is an array class and contains 60000 28 x 28 image arrays. Here's a sample:

By Josef Steppan via Wikimedia Commons

By Josef Steppan via Wikimedia Commons

Step one - deep neural net

Tensorflow requires the data with dimensions [ No_images rows X 784 cols] (i.e. each row contains a vector of image pixels).

We also scale the data so each pixel value lies between 0 and 1.

x_train <- array_reshape(x_train, c(nrow(x_train), 784)) 
x_test <- array_reshape(x_test, c(nrow(x_test), 784)) 
x_train <- x_train / 255
x_test <- x_test / 255


We need to make the response data categorical. The function keras::to_categorical Converts a vector of integers to binary data in a matrix. In this case we convert y_train from integer classes between 0 and 9 to a binary matrix (0's and 1's in an [NIMG x 10] dimensional array).

Note: this step is necessary because we are using the categorical_crossentropy loss function in the model below. This note explains it ( :

y_train <- to_categorical(y_train, 10) 
y_test <- to_categorical(y_test, 10)

About the models

Keras/TF use symbolic computation so we need to set up the model graph first. The keras libary has a bunch of different layer types you can use.

The model itself will probably be initialised using keras::keras_model_sequential which is just a linear stack of layers. The multi_gpu_model function allows you to initalise the model over multiple gpus (but you probably already guessed that).

Useful layer types:

  • layer_dense() A fully/densely connected layer. Useful settings are
    • activation,
    • use_bias,
    • kernel_initializer,
    • weights (allows you to initialise the kernel weights and layer weights),
    • kernel_regularizer,
    • units (dimensions of the output space).
  • layer_activation(activation = "name") adds an activition layer (or if it's one of the standard ones you can do it in the layer_dense or layer_conv_X calls). See for list of basics.
  • layer_activation_leaky_relu adds a leaky relu activation specifically.
  • layer_dropout() adds a dropout. rate is number between 0 and 1 that specifies the fraction to drop out.
  • layer_flatten() flattens the layer (e.g. going from convolution cycle to fully connected layers).

There are also a bunch of options for convolutions and for pooling layers. layer_conv_2d and layer_max_pooling_2d are two of the most common ones.

First define a model

Let's define a really simple model, with one dense (fully connected) layer, a dropout layer and a final softmax

model <- keras_model_sequential() 
model %>%
  layer_dense(units = 256, activation = 'relu',  input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>% layer_dense(units = 128, activation = 'relu') %>% 
  layer_dense(units = 10, activation = 'softmax')

One nice feature of the keras package is the model detail you get when you run summary - very handy to enable quick and dirty calculations to see if your next model run is going to blow smoke out of the GPU...

Screen Shot 2018-03-09 at 4.03.50 pm.png

Compile the model

The compilation stage is where you define the hyperparameters, .e.g.

We are using categorical_crossentropy loss function (link).

# compile (define loss and optimizer)
model %>% keras::compile(
  loss = 'categorical_crossentropy', 
  optimizer = optimizer_rmsprop(), 
  metrics = c('accuracy')

Fit the model and test

Finally, at runtime we set the number of epochs, the batch size and validation split (what percentage of the training data will be kept aside for validation).

# train (fit)
model %>% fit(
  epochs = 15, 
  batch_size = 128, 
  validation_split = 0.2,
  verbose = 1

Another nice feature of the keras library (as opposed to, say the mxnet library) is tracking the loss function and accuracy as the model is training (see the clip above)

By the end of the training run our model accuracy on the training set is around 98.51% on the training set and 97.95% on the validation set. It's possible to get a higher accuracy than this, but I'll leave that exploration up to you readers (feel free to share your experiments in the comments).

With 98% accuracy our model is fairly well optimised (for a first go). But having an optimised model (i.e. a good performance on the training set) is not the same as having a generalised model. We need to ensure we have not overfit the model by testing the prediction accuracy against our test set.

This is also straightforward:

model %>% 
  evaluate(x_test, y_test)

We get a prediction accuracy of 98.06% on the test set. Again, this is pretty good for a hello world.

What's next?

At Symbolix we will be using more of R/keras for our machine and deep learning analysis (we are quite impressed). I am hoping to write more of these walkthroughs - what would you like to see next?