Friday, November 9, 2012

Of TDD and Unit Testing

Inspired by an amazing talk by Greg Wilson, titled What We Actually Know About Software Development, and Why We Believe It's True, I will no longer accept truths about software development without a citation!

It's amazing to realize, after nearly 13 years of doing software professionally, that the things we take as truths in this profession are often not much more than opinions.

So the first thing I decided to ask Google was: is there proof that unit testing and TDD really result in better software? The most promising result was a research paper published by Boby George and Laurie Williams: An Initial Investigation of Test Driven Development in Industry.

The results seem to back up the claims of TDD (and, by extension, also unit testing, I'd assume): an 18% increase in software quality with only a 16% increase in development time compared to the control group (and the control group mostly didn't write any unit tests, even afterwards, so the increase in development time is probably skewed).

So based on this research I'd have to admit to being wrong about TDD. But... yes, of course I have a but. The researchers admit themselves that the software developed was trivial (200 LOC, a bowling game) with static, completely known requirements. To me this is a very serious problem with the study: all that can be concluded is that TDD works in optimal situations.
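For reference, the bowling game is the well-known scoring kata, and the opening moves of a TDD session for it look roughly like the sketch below (Java with JUnit; the class and test names are mine, not the study's, and the minimal implementation is included only so the sketch stands on its own):

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // First couple of steps of the bowling kata. In a TDD session the tests
    // below are written first and this minimal BowlingGame is then grown to
    // make them pass (spares and strikes would follow in later steps).
    public class BowlingGameTest {

        static class BowlingGame {
            private int score = 0;

            void roll(int pins) { score += pins; }

            int score() { return score; }
        }

        @Test
        public void gutterGameScoresZero() {
            BowlingGame game = new BowlingGame();
            for (int i = 0; i < 20; i++) {
                game.roll(0);              // every roll knocks down no pins
            }
            assertEquals(0, game.score());
        }

        @Test
        public void allOnesScoresTwenty() {
            BowlingGame game = new BowlingGame();
            for (int i = 0; i < 20; i++) {
                game.roll(1);              // one pin per roll, no spares or strikes
            }
            assertEquals(20, game.score());
        }
    }

Two short tests and a handful of lines of production code per step: requirements don't get much more static or more fully known than this.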

Things that this study ignores are:

  1. Granularity of unit tests. In large software there are multiple layers at which tests could be written. Are we always to assume the lowest possible level (i.e. the class)? If not, what is the optimal granularity? (See the sketch after this list.)
  2. Complexity and volatility of requirements. How well would the TDD approach work in highly complex work with shifting requirements, where fast feedback cycles and the ability to change are important (i.e. the software will never make it into production if the correct requirements cannot be harvested via fast feedback loops)?
  3. Software size. Unit test code is often overlooked when considering the size of the overall codebase. In large refactorings, test refactoring often (in my personal experience) takes most of the time. Is there a limit after which the amount of test code starts to hamper development speed enough to matter?
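To make the granularity question (point 1) concrete, here is a toy sketch of the same behaviour tested at two different levels; the classes are invented for illustration and kept inline so the example compiles on its own:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Toy example of two test granularities. PriceCalculator is a single class;
    // OrderService composes it with other logic. Both tests are "unit tests",
    // but they exercise very different surface areas.
    public class GranularityExampleTest {

        static class PriceCalculator {
            double applyDiscount(double price, double rate) {
                return price * (1.0 - rate);
            }
        }

        static class OrderService {
            private final PriceCalculator calculator = new PriceCalculator();

            double totalFor(int quantity, double unitPrice, double discountRate) {
                return calculator.applyDiscount(quantity * unitPrice, discountRate);
            }
        }

        @Test
        public void lowestGranularity_singleClass() {
            assertEquals(90.0, new PriceCalculator().applyDiscount(100.0, 0.10), 0.001);
        }

        @Test
        public void higherGranularity_serviceThroughItsPublicApi() {
            // The service is exercised with its real collaborator, not a mock.
            assertEquals(180.0, new OrderService().totalFor(2, 100.0, 0.10), 0.001);
        }
    }

Both are "unit tests", yet they pin down very different amounts of internal structure.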
Since I believe in functional, automated black box testing via interfaces or the user interface, to me it would be much more interesting to compare software developed with rigorous TDD and unit tests to software developed with tests similar to those the researchers in the TDD study themselves used to determine the quality of the solutions. More specifically, it would be interesting to find answers to these questions:
  1. Which approach yields faster development cycles with a mature code base and shifting requirements?
  2. Which approach survives large and small refactorings better, and how big is the difference in development effort when refactoring?
  3. Is there a difference in overall software quality? 
Also, what struck me as surprisingly dogmatic was this quote from the paper: "The industry standard for coverage is in the range 80% to 90%, although ideally the coverage should be 100%" (the paper quotes this from Steve Cornett).

If you've ever worked in a large real-life project of any significance, you'll realize how absurd this is. Getting test coverage to 100% with white box unit tests requires an enormous amount of very fragile unit tests that basically cast your software in cement. Refactoring is of course always possible, but I would imagine the psychological barrier to doing a large refactoring in a system like this is very high indeed.
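To illustrate what I mean by fragile, this is the kind of interaction-verifying test that chasing 100% white box coverage tends to produce (a toy sketch using Mockito; the types are invented and defined inline so it stands on its own):

    import org.junit.Test;
    import static org.mockito.Mockito.*;

    // Toy illustration of a white box, interaction-verifying test. The point
    // is how tightly the test is coupled to the current internal wiring.
    public class OrderProcessorWhiteBoxTest {

        interface Inventory { boolean reserve(String item, int quantity); }
        interface Billing   { void charge(String item, int quantity); }

        static class OrderProcessor {
            private final Inventory inventory;
            private final Billing billing;

            OrderProcessor(Inventory inventory, Billing billing) {
                this.inventory = inventory;
                this.billing = billing;
            }

            void process(String item, int quantity) {
                if (inventory.reserve(item, quantity)) {
                    billing.charge(item, quantity);
                }
            }
        }

        @Test
        public void processCallsCollaboratorsExactlyAsImplementedToday() {
            Inventory inventory = mock(Inventory.class);
            Billing billing = mock(Billing.class);
            when(inventory.reserve("widget", 1)).thenReturn(true);

            new OrderProcessor(inventory, billing).process("widget", 1);

            // Asserts the implementation, not the outcome: reorder the calls
            // or split a collaborator during a refactoring and the test breaks
            // even though the observable behaviour is unchanged.
            verify(inventory).reserve("widget", 1);
            verify(billing).charge("widget", 1);
            verifyNoMoreInteractions(inventory, billing);
        }
    }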

I know what you're saying now: but isn't that just the point? That the unit tests give enough security to do refactorings? Sure, but don't black box functional tests do just the same? In fact they do much more: they make sure your software works identically from the point of view of the outside world, which is what matters.
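For contrast, a black box functional test only talks to the system through its public interface, for example over HTTP. A minimal sketch, assuming a locally running instance; the URL, endpoint and response body are made up:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Black box test: nothing here knows how the application is implemented,
    // only that a deployed instance answers on this (made-up) endpoint.
    public class OrderApiFunctionalTest {

        @Test
        public void placedOrderCanBeReadBack() throws Exception {
            URL url = new URL("http://localhost:8080/orders/42");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");

            assertEquals(200, connection.getResponseCode());

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()));
            try {
                // The assertion is against externally visible behaviour only.
                assertEquals("{\"id\":42,\"status\":\"PLACED\"}", reader.readLine());
            } finally {
                reader.close();
            }
        }
    }

Refactor the internals however you like; as long as the interface behaves the same, this test keeps passing.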

I have zero research to back my hypothesis that a much better overall result could be achieved with these ingredients:
  • Automated black box functional tests for interfaces and the user interface, where the test-code-to-production-code ratio is optimised (another thing I'm surprised never to see mentioned).
  • Unit tests for obvious hot spots in the codebase: difficult algorithms that lend themselves to being easily tested with unit tests (a sketch follows this list).
  • Minimising accidental complexity and CDD (Curriculum Driven Development) - this is slightly off topic, but I like to keep reminding people of it.
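For the second ingredient, a hot spot to me is something like a tricky, pure algorithm, where a unit test is cheap to write and rarely needs refactoring. A sketch, using the Luhn checksum purely as a stand-in example:

    import org.junit.Test;
    import static org.junit.Assert.*;

    // A pure algorithmic hot spot: easy to unit test exhaustively, no mocks,
    // no coupling to the structure of the rest of the system.
    public class LuhnCheckTest {

        // Stand-in implementation of the Luhn checksum used for card numbers.
        static boolean isValidLuhn(String digits) {
            int sum = 0;
            boolean doubleIt = false;
            for (int i = digits.length() - 1; i >= 0; i--) {
                int d = digits.charAt(i) - '0';
                if (doubleIt) {
                    d *= 2;
                    if (d > 9) d -= 9;
                }
                sum += d;
                doubleIt = !doubleIt;
            }
            return sum % 10 == 0;
        }

        @Test
        public void knownValidAndInvalidNumbers() {
            assertTrue(isValidLuhn("79927398713"));    // classic valid example
            assertFalse(isValidLuhn("79927398710"));   // last digit changed
        }
    }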
In fact I find it very surprising that functional black box tests aren't among the first things done in a new project. Still, I keep finding projects where people write 1000-line unit tests for single Wicket components that themselves have half that amount of code. Yes, you'll find out if a change in the component breaks something, but you're still utterly clueless as to whether your actual user interface works as expected as a whole.
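By contrast, a whole-UI check can stay fairly small. A sketch using Selenium WebDriver; the URL and element ids are made up:

    import org.junit.Test;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import static org.junit.Assert.assertEquals;

    // Drives the rendered user interface end to end instead of a single component.
    public class OrderPageUiTest {

        @Test
        public void submittingTheFormShowsAConfirmation() {
            WebDriver driver = new FirefoxDriver();
            try {
                driver.get("http://localhost:8080/order");              // made-up URL
                driver.findElement(By.id("product")).sendKeys("widget"); // made-up element ids
                driver.findElement(By.id("submit")).click();
                assertEquals("Order placed",
                        driver.findElement(By.id("confirmation")).getText());
            } finally {
                driver.quit();
            }
        }
    }

A handful of these tells you whether the user interface works as a whole, which is exactly what the component-level tests leave open.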

I'm starting to think there is a psychological background here as well. It's about responsibility and limiting it: "My job is to make this single component. I did it and wrote a unit test to prove it. Now I'm done; I don't need to think about the software as a whole."

That may sound cruel, but I have observed it to be true much more often than I'd like to admit.

I keep coming back to psychology so often these days that I'm starting to think about taking up the study of Software Development Psychology :)