Debugging in R
Debugging in R
Introduction
In this practical we will introduce and practise basic debugging in R and RStudio. We will use the cost-effectiveness Markov model code to demonstrate this.
Open up the script debugging_script.R
and run it by pressing the
source button.
We want to use the ‘broken’ functions for this practical so we will
first source the files ce_markov_debug1.R
and p_matrix_cycle.R
by
running
source("ce_markov_debug1.R")
source("p_matrix_cycle.R")
Make sure this points to the correct folder for you.
debugging_script.R
is the main script from which to run the analysis
which calls our function ce_markov
. You should get the following error
Error in cycle_state_costs[i, ] <- (state_c_matrix[treatment = i, ] %*% :
number of items to replace is not a multiple of replacement length
Lets investigate…
Using debug()
The functions browser()
and debug()
are simple to use and very
powerful at identifying the source of an error.
debug()
works by telling R to enter the debugger at the first line
whenever a given function is called.
debug(ce_markov)
This breakpoint on the first line will remain until you remove it with
undebug(ce_markov)
Sometimes you just want to run the debugger a single time and so instead use
debugonce(ce_markov)
Once you’ve set this either re-soruce debugging_script.R
or just run
the relevant bit
ce_res <- ce_markov(start_pop = c(n_pop, 0, 0),
p_matrix,
state_c_matrix,
trans_c_matrix,
state_q_matrix)
The GUI will show you the ce_markov()
script with the body of the
function highlighted. A green arrow on the left hand side will point at
the first line of the code. This indicated where the debugger has
stopped at and hasn’t executed any of the lines below it.
Execute the next line of code by either pressing n
or Enter
or F10
or clicking on the Next button.
Notice in the Environment pane that the values are for those passed to the function.
If you ‘step through’ the code by pressing Next a few more times you will see the Environment variable being up dated as the highlighted code moved down line by line.
When you get to the for (j in 2:n_cycles)
loop you’ll notice that
pressing Next takes you around every iteration. If you want to fast
forward and leave the loop having run all iterations in one step then
you can “step out” of the loop with Shift+F7.
Which line does the error occur at?
The debug()
function is pretty cool because it can also show you the
insides of other people’s functions.
For example, try
debugonce(mean)
mean(c(1,2,"three"))
This will send you to the mean.default
method
function (x, trim = 0, na.rm = FALSE, ...)
{
if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
warning("argument is not numeric or logical: returning NA")
return(NA_real_)
}
if (isTRUE(na.rm))
x <- x[!is.na(x)]
if (!is.numeric(trim) || length(trim) != 1L)
stop("'trim' must be numeric of length one")
n <- length(x)
if (trim > 0 && n) {
if (is.complex(x))
stop("trimmed means are not defined for complex data")
if (anyNA(x))
return(NA_real_)
if (trim >= 0.5)
return(stats::median(x, na.rm = FALSE))
lo <- floor(n * trim) + 1
hi <- n + 1 - lo
x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
}
.Internal(mean(x))
}
Have a look inside some other functions in the same way.
Using browser()
Every time you use debug()
it starts at the first line of the
function. Now that we have identified where the error is thrown, or if
you had an idea beforehand anyway, you don’t want to step through all
the code to get to the interesting bit. In this case you can use the
browser()` function.
Simply write browser()
on the line where you want the break point to
be. Include browser()
in a sensible place in ce_markov()
and rerun
the function. Once the break point has been activated you can step
through the code in the same way as before. Don’t forget to re-source
the function to tell R that you have included new code.
Editor break points
A further way that you can include break points is interactively by left
clicking the margin of the Editor pane to the left of the row numbers.
You’ll see that a red dot appears and this is equivalent to the
browser()
function at this line.
You can have multiple break points through out the code and when you
continue running the code after one break point (pressing c
or
Shift+F5) then the code execution stops at the next one.
Try using multiple break points, setting them via the Editor
Stepping in to functions
Run ce_markov()
again with debugonce()
. We may have a suspicion that
p_matrix_cycle
is responsible for something going wrong. We could add
multiple break point as above with one inside of p_matrix_cycle
.
Alternatively, when we get to this line of code in ce_markov()
we can
“step in to” the function pressing s
or Shift+F4. This is like
taking us one layer down into the code.
Try this stepping in to p_matrix_cycle
from ce_markov
. You can
also step in to base R functions.
Conditional break points
As was the case against using debug()
we don’t want to be stepping
through loads of code to get to the point where the error occurs or
where we think that the problem originated in the code.
We can further refine our use of break points by only entering the debugger when a certain condition is met. This is commonly when a loop is at a certain (large number) of iterations.
It will look something like this
if <CONDITIONAL STATEMENT> browser()
Include a conditional breakpoint in the ce_markov()
function for
when the number of cycles is over 20
But where is the bug?
If we go back to the original error message
Error in cycle_state_costs[i, ] <- (state_c_matrix[treatment = i, ] %*% :
number of items to replace is not a multiple of replacement length
This is telling us that the dimensions of cycle_state_costs
and the
result of the right hand side don’t match. To see why let us first see
what these dimensions are. Add a break point just before this line and
then type the following in the console when you enter the debugging
mode.
dim(state_c_matrix[treatment = i, ] %*% pop[, , treatment = i])
dim(cycle_state_costs[i, ])
cycle_state_costs[i, ]
It appears that the object of length 2 is being assigned 46 entries. If we look at the full matrix
cycle_state_costs
we see that this has 46 rows which is a remarkable coincidence. We can hypothesise that the rows and columns have gotten mixed up somewhere along the way earlier in the code and the error is thrown later on when we try to use these objects.
If we search for cycle_state_costs
we see that it is initiated with
cycle_empty_array
. This is the culprit. We can see that the dim
argument has the order cycle, treatment but the dimnames
argument has
the order treatment, cycle. Swapping dim = c(n_treat, n_cycles)
should
fix the error. This is a relatively easy bug to fix because we can spot
the inconsistency in the code. Some bugs can be particularly difficult
to catch but using the debugger to hone in and probe inside of the code
can be invaluable.