Saturday, August 20, 2011

Monkey Testing: a Technique for Detecting Nasty Bugs

I'm going to tell about a technique that could help you find those bugs that no one else can find or are classified as "non-reproducible".

Some real life bugs

First let me show you that the technique also works with a real and recent 0.37 version of RIDE by demonstrating some hard to detect bugs that I detected with the tool I made. NOTE: These bugs have been in RIDE for very long time so they should be repeatable also in older versions.

First bug

  1. Open RIDE

  2. Open a keyword or a suite editor

  3. Insert some text to the first row

  4. Select the first row and select move row up from row context menu -- Nothing happens

  5. Undo the last command (Ctrl+Z) -- produces a stack trace either to RIDE log or to command prompt

Second bug

This is a bit trickier but it has an issue in RIDE issue tracker -- unlike the previous one. Most likely because the underlying problem could have more ways to reveal it self than just this one.

  1. Open RIDE

  2. Open a keyword editor

  3. Insert some text to the fifth line

  4. Delete first row

  5. Delete sixth row

  6. Save file

  7. Delete third row -- produces a stack trace either to RIDE log or to command prompt

Monkey testing

So how did I detect those bugs and how did I find a way to reproduce them?

I used a technique called monkey testing. It's name comes from the saying: "a thousand monkeys at a thousand typewriters will eventually type out the entire works of Shakespeare".

The basic idea is to make an automated system that randomly executes actions in the system under test. If the actions are selected in a way that will keep the system under test in a running state the test can just go on forever (or until it finds a bug).

To make the detected bugs reproducible the randomness in the test system must be controlled. Basic method is to control and log the seed value of the used (pseudo) random number generator.

Usually a failing test run will result in a very long trace (a list of executed actions) that needs to be pruned to find out the actions that resulted in the detected error. This pruning can of course be automatic.

You can find the related code from

How do I know (well almost) that there is no more of these bugs around?

Because I have now let those monkeys run for several hours without catching anything. This gives me confidence to say that it is very unlikely that there are bugs that the monkey testing tool could produce and detect. In RIDE I mean that there most likely are no more non-GUI code bugs that will throw exceptions while editing test cases or keywords. And I can still put the monkeys to work to make me even more confident that the bugs that I have detected so far are all there is (I have corrected the bugs so the monkeys will not stumble on them again).

It is very important that a monkey testing tool can produce enough different kinds of actions to make it possible for it to express different types of defecting traces. The tool should also have good enough detection mechanisms so it will catch the defects -- but remember that more complexity means more complexity (the error could be in the tool if the tool is too complex).

In my case with the RIDE the detection mechanism has so far been only to catch exceptions but I've been thinking of taking some basic action related assertions in to use.

If you find this technique useful you could also check out model based testing to make monkeys that handle more complex action sequences and situations.


Bulkan-Savun Evcimen said...

Isn't this like Fuzz Testing ?

Mikko Korpela said...


Fuzz Testing in my opinion is more about random data. Monkey Testing (or State Machine based Model Based Testing in more general) is more about randomized action sequences.

Although in my opinion there sometimes is no difference between actions and data..