Friday, February 11, 2011

Too much sleep is better than not enough

Timing is an important aspect of automated testing. Tests should run in minimal time so that they give rapid feedback and do not become bottlenecks. But when testing complex systems, some operations unfortunately just take time, and tests have to handle this. This post is about handling those situations in tests.

Usually the first solution is to use sleeps. A sleep (in Robot Framework, BuiltIn.Sleep) does nothing for a specified time period and then continues the test. A sleep usually isn't the optimal solution, as it guarantees nothing other than that the given time has passed. In other words: it guarantees that the test will take longer, and it doesn't guarantee that the awaited event has happened.

Sleeps don't always work: when the awaited event takes longer than the sleep, the test fails. This results in flickering tests.

To make a sleep always work, it has to be long enough for every environment where the system under test should work. This is most likely longer than the operation takes on average.

Another solution for waiting is polling (or busy waiting). In polling (in Robot Framework, BuiltIn.Wait Until Keyword Succeeds), the awaited condition is checked repeatedly until the check passes or a timeout period expires. When the polling passes, it guarantees that the awaited event has happened.
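
As a rough illustration of polling outside Robot Framework, here is a minimal Python sketch. The function name wait_until_succeeds and its parameters are hypothetical, and this is not how BuiltIn.Wait Until Keyword Succeeds is implemented; it only shows the check, retry and timeout loop described above.

```python
import time


def wait_until_succeeds(check, timeout, retry_interval,
                        clock=time.monotonic, sleep=time.sleep):
    """Call check() until it stops raising or `timeout` seconds have passed.

    `check` is any callable that raises an exception (for example an
    AssertionError) while the awaited condition is not yet met.
    """
    deadline = clock() + timeout
    while True:
        try:
            # Success: hand back whatever the check returned.
            return check()
        except Exception as error:
            # Give up once the timeout period has expired.
            if clock() >= deadline:
                raise TimeoutError(
                    f"condition not met within {timeout} s") from error
        # Wait a little before the next round of polling.
        sleep(retry_interval)
```

A test would pass in its own check, for example a function that asserts that a file exists or that a service answers.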

Polling will usually result in a shorter total test execution time than using sleeps. Sometimes this doesn't happen. For example, the polling check could exhaust the system under test. Repeating the keyword will also produce more test logs.

If the repeated keyword itself takes a long time, Wait Until Keyword Succeeds can take more time than needed.
For example, take a keyword that runs for 9 seconds, in a situation where at least 10 seconds have to pass before the keyword can succeed. Wait Until Keyword Succeeds will take 27 seconds (two failed runs and then a passing one, 3 * 9 seconds), whereas sleeping 10 seconds and then executing the keyword takes 19 seconds.
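
The numbers in this example can be reproduced with a small simulation. It is only a sketch under two assumptions that the example implies but does not state: the retry interval between attempts is zero, and an attempt passes only if it starts at or after the 10 second mark. The function names are made up for this illustration.

```python
def polling_duration(keyword_duration, ready_after, retry_interval=0.0):
    """Simulated seconds until a polled keyword passes.

    Assumption: an attempt passes only if it STARTS at or after
    `ready_after`; an attempt that starts earlier runs to the end and fails.
    """
    if keyword_duration + retry_interval <= 0:
        raise ValueError("time must advance between attempts")
    now = 0.0
    while True:
        started = now
        now += keyword_duration      # the keyword runs to completion
        if started >= ready_after:
            return now               # this attempt passed
        now += retry_interval        # pause before retrying


def sleep_then_run_duration(keyword_duration, sleep_time):
    """Seconds needed to sleep first and then run the keyword once."""
    return sleep_time + keyword_duration


print(polling_duration(9, 10))         # 27.0
print(sleep_then_run_duration(9, 10))  # 19
```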

Other methods to handle time:
* If possible, simulate the passing of time
* Hollywood principle (the test is notified when the awaited event has happened)
* First sleep and then use Wait Until Keyword Succeeds to minimize polling
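
The last item on the list, sleeping first and polling afterwards, could look roughly like this in plain Python. This is a sketch, not Robot Framework code: sleep_then_poll and its parameters are hypothetical names, and the check is assumed to return True once the awaited condition is met.

```python
import time


def sleep_then_poll(check, initial_sleep, timeout, retry_interval,
                    clock=time.monotonic, sleep=time.sleep):
    """Sleep through the known minimum wait, then poll `check`.

    Returns the number of checks that were needed. Raises TimeoutError
    if `check` still returns False `timeout` seconds after the initial
    sleep has ended.
    """
    # Nothing can have happened before this point, so do not poll yet.
    sleep(initial_sleep)
    deadline = clock() + timeout
    checks = 0
    while True:
        checks += 1
        if check():
            return checks
        if clock() >= deadline:
            raise TimeoutError(
                f"condition not met within {timeout} s after the initial sleep")
        sleep(retry_interval)
```

The initial sleep covers the time during which the event cannot have happened, so only the uncertain tail of the wait is polled and logged.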

4 comments:

tarun k said...

I prefer polling over any other technique. I also feel that hard-coded wait periods are the worst option that could be adopted in test automation.

Mikko Korpela said...

I think that the best solution is to use the Hollywood principle (the test is notified at the appropriate time), but this isn't always possible in test cases.

Polling is also very good in many cases and can be easier to implement than a notification system.

Sometimes I've seen cases where polling isn't the perfect solution.

For example, when the operation that we are waiting for takes a very long time (let's say at least 30 minutes, as a hard limit) and one round of polling takes very little time (let's say 0.01 seconds). In this kind of situation polling will produce huge logs (30*60/0.01 == 180 000 rounds).

I would most likely first sleep 29.9 minutes and then start polling. I would also abstract this waiting functionality from the test case to a separate function.

Does anyone have other ideas for handling this kind of long wait in tests that can't use notification systems?

Or other timing-related war stories?

catherine beloin said...

I think that "BuiltIn.Wait Until Keyword Succeeds" could be adapted to handle cases where we have to wait a very long time (more than 1 hour in endurance tests). In the same way that Gmail increases its connection retry delay when you are disconnected from the network, "BuiltIn.Wait Until Keyword Succeeds" could wait longer at each iteration (let's say delay + original delay each time). This is a compromise that prevents too many logs without waiting too long in the last iteration. What is your opinion?

Mikko Korpela said...

Hi Catherine,

I think it is a good idea. I even thought that I had implemented that, but it has been 3 years since I left the Robot Framework project, so I really can't remember.

I think it would be fairly easy to make an implementation that would always increase the wait time by, let's say, 10 %. This would generate a logarithmic number of checks, and at the same time the possible increase in waited time would be minimal.
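
The trade-off discussed in these comments can be checked with a small calculation. The sketch below counts polling rounds for the 30 minute wait with a first delay of 0.01 seconds, comparing a fixed delay, the "delay + original delay" scheme and the 10 % increase. It is a simplified model, not Robot Framework code: the function name rounds_needed is made up, each check is assumed to take no time apart from the delay, and times are kept in milliseconds.

```python
def rounds_needed(total_ms, first_delay_ms, grow):
    """Count the waits needed until the cumulative wait reaches total_ms.

    `grow` maps the current delay to the next one. Returns a tuple of
    (number of rounds, total time waited in milliseconds).
    """
    elapsed = 0
    delay = first_delay_ms
    rounds = 0
    while elapsed < total_ms:
        elapsed += delay
        rounds += 1
        delay = grow(delay)
    return rounds, elapsed


TOTAL_MS = 30 * 60 * 1000   # the 30 minute operation
FIRST_MS = 10               # 0.01 seconds

fixed = rounds_needed(TOTAL_MS, FIRST_MS, lambda d: d)
linear = rounds_needed(TOTAL_MS, FIRST_MS, lambda d: d + FIRST_MS)
geometric = rounds_needed(TOTAL_MS, FIRST_MS, lambda d: d * 1.1)

print(fixed[0], linear[0], geometric[0])  # 180000 600 103
```

In this model the linear scheme ends 3 seconds past the 30 minute mark and the 10 % scheme roughly half a minute past it. In general the overshoot can be as large as the last delay.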