Why we’re using random data for our Selenium tests
François and I have been working on the Selenium tests for some time now. Several tests are on Reviewboard or already committed. During the review process, we were asked to justify why we chose random data for the Selenium tests instead of predefined data.
Of course, the Basie instance on which we run the tests needs to have data. But where do we obtain it? The two main options are:
- Predefined data created by the tester
- Randomly generated data that fills the database
Let me begin by explaining how the tests differ under each approach.
Predefined data
We create a set of data that will be used for the tests. This data must be the same for everyone who runs the tests; otherwise, the tests would fail. This is the approach typically used in unit testing: we know what input the function is given and we know what output to expect.
Here’s an example:
I want to test whether the version number of a Wiki page is incremented when the page is modified. I generate a Wiki page and some old versions of that page, so that the page is now at version 7. The Selenium test modifies the Wiki page and verifies that the version number is 8. If the version number is not 8, the test fails.
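The predefined approach can be sketched roughly as follows. This is only an illustration: `FakeWiki` is a hypothetical in-memory stand-in for the Basie wiki, not Basie or Selenium code, and the page title is made up. The point is that the expected value (8) is hard-coded because the fixture is known in advance.

```python
class FakeWiki:
    """Hypothetical in-memory stand-in for a wiki (not Basie's real API)."""

    def __init__(self):
        self._versions = {}

    def create_page(self, title, version=1):
        # Pretend the page already has `version` saved revisions.
        self._versions[title] = version

    def edit_page(self, title, text):
        # Every edit creates one new revision; the text itself is ignored here.
        self._versions[title] += 1

    def current_version(self, title):
        return self._versions[title]


def test_version_increment_predefined():
    wiki = FakeWiki()
    # Predefined fixture: everyone who runs the test gets a page at version 7.
    wiki.create_page("FixturePage", version=7)
    wiki.edit_page("FixturePage", "some new text")
    # The expected value is hard-coded, so the test only works on this data.
    assert wiki.current_version("FixturePage") == 8
```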
Random data
We start by filling the database with randomly generated data. Users, tickets, projects, mail: everything is random and different for everyone who runs the tests. Since we don’t know in advance what data is on the website, we use Selenium to figure out what output we should expect.
Using the same example as before:
I generate random Wiki pages. Using Selenium, I select a random Wiki page and dynamically check its current version. I then modify the page and make sure that the version number has been incremented by one.
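The same check with random data might look like the sketch below. Again, this is a simplified illustration under assumptions: a plain dictionary stands in for the pages Selenium would read from the site, and the function names are hypothetical. The expected value is no longer hard-coded; it is derived from whatever version the randomly chosen page has before the edit.

```python
import random


def generate_random_pages(rng, count=10):
    """Hypothetical data generator: page title -> current version number."""
    return {"Page%d" % i: rng.randint(1, 20) for i in range(count)}


def check_version_increment(pages, rng):
    # Pick any page; we do not know in advance which one or its version.
    title = rng.choice(sorted(pages))
    before = pages[title]   # read the current version dynamically
    pages[title] += 1       # stand-in for editing the page through the browser
    after = pages[title]    # read the version again after the edit
    # The expectation is relative to what we observed, not a fixed number.
    assert after == before + 1
    return title, before, after
```

Calling `check_version_increment(generate_random_pages(random.Random()), random.Random())` exercises a different page and starting version on each run.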
So, which one do we use?
The advantage of the random method is that it is dynamic. The predefined data tests are static: they won’t work on any other set of data, whereas the random data tests will. This might seem like a flimsy “advantage”… Why would I want to run my tests on various data?
You want to do this in order to avoid the Pesticide Paradox. The overall idea can be summed up with this quote:
“Highly repeatable testing can actually minimize the chance of discovering all the important problems, for the same reason that stepping in someone else’s footprints minimizes the chance of being blown up by a land mine.” - James Bach
How does this apply to our Selenium tests?
By running the tests against randomly generated data, we make sure that we don’t step in the same footprints. Different sets of data can expose different bugs.
There is also a second benefit.
Keep in mind that we are testing functionality with Selenium tests. We are testing the whole application and the interactions between the various parts of Basie. This is not the case with unit testing.
This means that we want every piece of Basie functionality to behave a certain way, no matter the data. This can be verified with little effort by using random data. It is also the best way to simulate a user’s experience and make sure that things look and act the way we expect them to.
For those two main reasons, we have chosen to stick with randomly generated data. We invite you to regenerate your data from time to time when running Selenium tests.