Thursday, 7 October 2010

Automation benefit measured by EMTE - good or bad?

Being able to run tests that we would not have had time to run manually is one of the benefits of automated testing; we should be able to measure this benefit in some way.

What is EMTE?
EMTE stands for "Equivalent Manual Test Effort" and is a way of measuring the benefit of running automated tests.

If an automated test (A) would have taken 4 hours to run manually, then its EMTE is 4 hours; another test (B) that would have taken 7.5 hours to run manually has an EMTE of 7.5 hrs.

In a test cycle, if Test A is run five times, and Test B is run twice, then the EMTE for that cycle is 5*4 hrs + 2*7.5 hrs = 20 + 15 = 35 hours EMTE.
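The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures from the example; the names `emte_hours`, `runs_in_cycle` and `cycle_emte` are made up for this sketch and do not come from any tool.

```python
# Hypothetical per-test EMTE values: hours each test would take to run manually.
emte_hours = {"A": 4.0, "B": 7.5}

# Hypothetical number of automated runs of each test in one test cycle.
runs_in_cycle = {"A": 5, "B": 2}


def cycle_emte(emte_hours, runs):
    """Sum the EMTE of every automated run in the cycle."""
    return sum(emte_hours[test] * count for test, count in runs.items())


print(cycle_emte(emte_hours, runs_in_cycle))  # 35.0
```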

What is EMTE used for?
EMTE can be used as a way to measure a benefit of test automation (automated running of functional tests).

When tests are automated, they can be run in much less time than they could be run manually. Our tests A and B may be able to run in 5 and 10 minutes respectively, for example. So we can achieve "4 hours' worth" of manual testing in 5 minutes of automated testing. Whenever we run Test A, we can "clock up" 4 hours of EMTE.

Is EMTE a good thing?
Yes, because it is a way to show the benefit of automation.

Costs (of automation as well as other things) tend to become visible by themselves - managers see that people are spending time on the automation. But what is the benefit of this automation? If you don't make benefits visible to managers, there is a risk that they will not see the benefits, and may eventually conclude that there are no benefits. EMTE is one way to make an automation benefit visible.

So how could it be a bad thing?
I have had discussions with a couple of people recently (thanks Julian and Wade) about abusing EMTE, and yes, it can be abused (as any metric can). Here is how it could be misused:

"Test A takes 5 minutes, so let's run it 12 times every hour for 2 hours. This gives 24*4 hours of EMTE = 96 hours. This will make us look really great!"

The problem is that after the first run, the other 23 runs are being done just for the sake of looking good, not for a valuable benefit in terms of running that test. This is an abuse of EMTE, and is a bad thing.

What to do about it?
Use EMTE (and other measures of the benefit of test automation) sensibly.

Perhaps only "count" EMTE once a day, however many times a test is run? (e.g. in continuous integration testing)
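As a rough sketch of the "count once a day" idea, the Python below compares raw EMTE with EMTE capped at one counted run per test per day. The run log format (a list of test name and date pairs) and the function names are assumptions made for illustration, not part of any real tool; the figures are those of the Test A example.

```python
from datetime import date

# Test A's EMTE from the example: 4 hours of manual effort per run.
EMTE_HOURS = {"A": 4.0}


def raw_emte(run_log, emte_hours):
    """Count EMTE for every run, however often the test is repeated."""
    return sum(emte_hours[test] for test, _day in run_log)


def daily_capped_emte(run_log, emte_hours):
    """Count EMTE at most once per test per day (assumed capping rule)."""
    counted = set(run_log)  # keeps one (test, day) pair per test per day
    return sum(emte_hours[test] for test, _day in counted)


# Hypothetical log: Test A run 24 times on the same day.
day = date(2010, 10, 7)
run_log = [("A", day)] * 24

print(raw_emte(run_log, EMTE_HOURS))           # 96.0
print(daily_capped_emte(run_log, EMTE_HOURS))  # 4.0
```

Capping by day is only one possible rule; a team might equally cap per build or per code change, depending on when a repeat run is judged to add real value.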

In what other ways can the benefit of automation be shown? (e.g. more coverage, freeing testers to find more bugs, number of times tests are run, more variety of data used?)

Have you encountered abuse of this measure (or of other automation measures)? How did you solve the problem? (Please comment, thanks!)

Dot Graham

Thursday, 27 May 2010

Automated tests should find bugs? No!

I have recently been having what seems like the same discussion with a number of different people.

"Automated tests should find bugs" or "find more bugs" is a very common misconception. Basically this says that finding bugs is a valid objective for automation. I don't agree - I think this is generally a very poor objective for test automation. The reasons are to do with the nature of testing and of automation.

Testing is an indirect activity, not a direct one. We don't just "do testing", we "test something". (Testing is like a transitive verb which requires an object to be grammatically correct.) This is why the quality of the software we test has a large impact on testing: if a project is delayed because testing finds lots of bugs, we shouldn't blame the testing! (I hope that most people realize this by now, but do have my doubts at times!) Testing is not responsible for the bugs inserted into software any more than the sun is responsible for creating dust in the air. Testing is a way of assessing the software, whatever the quality of that software is.

Test automation is doubly indirect. We don't "do automation", we "automate tests that test something".

Automation is a mechanism for executing tests, whatever the quality of those tests is (tests that assess the software, whatever the quality of that software is).

Bugs are found by tests, not by automation.

It is just as unfair to hold automation responsible for the quality of the test, as it is to hold the testing responsible for the quality of the software.

This is why "finding bugs" is not a good objective for test automation. But there are a couple more points to make.

Most people automate regression tests. Regression tests by their nature are tests that have been run before and are run many times. The most likely time a test will find a bug is the first time it is run, so regression tests are less likely to find bugs than, say, exploratory tests. In addition, the same test run for a second time (and more) is even less likely to find a bug. Hence the main purpose of regression tests (whether automated or not) is to give confidence that what worked before is still working (to the extent that the tests cover the application).

Of course, this is complicated by the fact that because automated tests can be run more often, they do sometimes find bugs that wouldn't have been found otherwise. But even this is not because those tests are automated, it is because they were run. If the tests that are automated had been run manually, then those manual tests would have found the bugs. So even this bug-finding is a characteristic of the tests, not of the automation.

So should your goal for automation be to find bugs? No! At least not if you are planning to automate your existing regression tests.

I have been wondering if there may be two exceptions: Model-Based Testing (where tests are generated from a model), and mature Keyword-driven automation, i.e. using a Domain Specific Test Language. In both cases, the first time a test is run is in its automated form.

But hang on, this means that again it is the tests that are finding the bugs, not the fact that those tests are automated!

"Finding bugs" is a great objective for testing - but it is not a good objective for automation.

Thursday, 14 January 2010

Finding 40% of your own mistakes

I used to be fond of quoting a statistic that says you can only find around 40% of your own mistakes.

Michael Stahl emailed me to ask where this number came from. Interesting question - first thought - I don't remember! I'm sure I must have read it somewhere at some time, but where, by whom, and was it based on a study?

I checked with Mark Fewster, one of my former colleagues, and he thinks it might have come from a study done by the Open University in the UK.

I checked with Tom Gilb, as he uses an estimate of around a third (33%) for the effectiveness of an initial inspection - which is probably more effective than an individual anyway! Tom has demonstrated an effectiveness of 33% repeatedly by experimentation with early Inspections; he said it also agrees with Capers Jones' data.

I think we used the figure of 40% only because people found it more believable than 33%.

The frightening consequence is that if you don't have anyone else review your work, you can expect to leave in around two thirds of your own mistakes!

Saturday, 2 January 2010

DDP Discussions and challenges

Several people have asked about benchmarks for DDP. I have actually blogged about this, but my comments are "buried" in the comments to the post about "Starting with DDP". Please have a look at the comments for that post, which include:

- benchmarking DDP with other organisations (raised by Bernhard Burger)

- Paul Herzlich's challenges about the seriousness of defects, getting data, DDP being hard to use and code-based metrics (all of which I have replied to in my following comment)

- using DDP to improve development (raised by Ann-Charlotte)

- Michael Bolton's challenges:
  - 5 examples to show when it doesn't work (which I reply to in my first comment following his) (including some Dilbert Elbonian testers ;-)
  - 7 "problems" - some of which I agree with, some I don't understand, and some I think illustrate the benefit of DDP rather than being problems with it (replied to in my second comment after his)

Thanks to Michael B's comments, I also formulated 3 Rules for when to use DDP:
DDP is appropriate to use when:
1) you keep track of defects found during testing
2) you keep track of defects found afterwards
3) there are a reasonable number of defects – for both 1) and 2)

These are not the whole story (as illustrated by Michael's examples) but I think they are a prerequisite to sensible use of DDP.
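The three rules can be sketched as a guard around the DDP calculation. This post does not spell out the formula, so the sketch assumes the usual definition: defects found by testing as a percentage of all defects known so far (those found by testing plus those found afterwards). The `min_defects` threshold of 10 is purely a hypothetical stand-in for "a reasonable number"; choose whatever makes sense for your own data.

```python
def ddp(found_in_testing, found_afterwards, min_defects=10):
    """Return DDP as a percentage, or None when there is too little data.

    Assumes DDP = found_in_testing / (found_in_testing + found_afterwards).
    min_defects is a hypothetical threshold for "a reasonable number",
    applied to both counts as in rule 3.
    """
    if found_in_testing < min_defects or found_afterwards < min_defects:
        return None  # rule 3 not met: too few defects to be meaningful
    total = found_in_testing + found_afterwards
    return 100.0 * found_in_testing / total


print(ddp(80, 20))  # 80.0
print(ddp(3, 1))    # None
```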