Monday, July 30, 2012

Testing Generally

I sometimes consciously split the functionality I'm testing into two parts: general, which is behaviour that is the same, or similar, regardless of where it appears, how it is invoked and so on; and specific, which differs according to function, context, time, data types etc.

I'll tend to do this more on larger projects when the areas are new to me, or to the product, or if they're complex, or I think the test framework will be complex, or the specific is heavily dependent for its delivery on the general, or perhaps when the specific details are certain to change but the general will be stable.  

I'll be looking to implement automation that concentrates first on general functionality and self-consistency and that will serve as a backstop when I move on to the more specific material. 

To speed things up, to get wider coverage easily, and to avoid dependencies, I'll try to avoid crafting new test data by looking for data already in the company that can be reused. Static dumps from live servers can be good, but dynamically changing internal landfill instances are gold dust because they'll be running the latest Dev build and generating new data all the time.

Take the example of a server which exposes an API using HTTP. The API gives clients access to resources (by URLs) and actions on those resources (e.g. searching across back-end data sources). My functionality breakdown might include the following:
  • general: each resource is exposed to a client as a custom data structure, but some properties will be shared across resources, e.g. "children" always represents sub-resources whose URLs can be derived from the resource itself.
  • An interesting subset of general functionality is that which is based on standards. In this case, the HTTP standard for client-server communications is well-defined and independent of your product (although your product may only implement parts of it and there are areas in which there is leeway for client and server to choose an action).
  • specific: any functions on the resources that are outside of HTTP are specific. For example, the query parameters on URLs will have a specific meaning to this server.

So how might I set up general testing, using pre-existing data, here?

There's a huge space of potential tests to do with conformance to HTTP RFCs. As these tests are, for the most part, independent of the data in your system, you can implement them without worrying about what data you have (for example, if you request a resource that's not there, the system should respond with a 404).
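As a rough sketch of data-independent checks like these, the code below tests two HTTP behaviours: a missing resource returns 404, and a HEAD request gets the same status as a GET but no body. The names `Sender`, `http_sender` and `conformance_failures`, and the choice of these two checks, are my own illustrations rather than anything from a real suite. The checks take a `send` callable so they can be pointed at a live server or at a stub.

```python
import http.client
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit

# A sender takes (method, path) and returns (status, headers, body).
Sender = Callable[[str, str], Tuple[int, Dict[str, str], bytes]]


def http_sender(base_url: str) -> Sender:
    """Build a sender that talks plain HTTP to base_url (a hypothetical server)."""
    parts = urlsplit(base_url)

    def send(method: str, path: str) -> Tuple[int, Dict[str, str], bytes]:
        conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
        try:
            conn.request(method, path)
            resp = conn.getresponse()
            body = resp.read()
            headers = {name.lower(): value for name, value in resp.getheaders()}
            return resp.status, headers, body
        finally:
            conn.close()

    return send


def conformance_failures(send: Sender, known_path: str, missing_path: str) -> List[str]:
    """Run checks that do not depend on what data the server holds."""
    failures = []

    # A resource that is not there should give a 404.
    status, _, _ = send("GET", missing_path)
    if status != 404:
        failures.append(f"GET {missing_path}: expected 404, got {status}")

    # HEAD should mirror GET's status but carry no body.
    get_status, _, _ = send("GET", known_path)
    head_status, _, head_body = send("HEAD", known_path)
    if head_status != get_status:
        failures.append(f"HEAD {known_path}: status {head_status}, GET gave {get_status}")
    if head_body:
        failures.append(f"HEAD {known_path}: unexpected body of {len(head_body)} bytes")

    return failures
```

Against a real server this might be called as `conformance_failures(http_sender("http://myserver"), "/collection", "/collection/thiswasnotachild")`, where both paths are assumptions about what exists on that server.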

A particular general test might request the children of a collection resource (effectively a folder) and then request each of them in turn. If they all exist, it confirms a degree of consistency between the back-end data, its presentation in the API, and the client-side view of it. Conversely, requesting a resource that you know should not exist (e.g. http://myserver/collection/thiswasnotachild) can confirm error behaviour. Note that you cannot confirm this way that all of the children that should be there are present, without extra knowledge of whatever backs the server.
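The children check could be sketched along these lines. I'm assuming, purely for illustration, that a collection's body is a dictionary whose "children" entry lists child names, and that a child's URL is the parent URL plus "/" plus the name; the `fetch` callable and the in-memory `FAKE_SERVER` are hypothetical stand-ins for a real HTTP client and server.

```python
from typing import Callable, Dict, List, Optional, Tuple

# fetch(url) returns (status, parsed body or None); a stand-in for an HTTP client.
Fetch = Callable[[str], Tuple[int, Optional[dict]]]


def child_url(parent_url: str, name: str) -> str:
    """Assumed derivation rule: child URL = parent URL + '/' + child name."""
    return parent_url.rstrip("/") + "/" + name


def check_collection(fetch: Fetch, collection_url: str) -> List[str]:
    """Request a collection, then each listed child, then a child that should not exist."""
    status, body = fetch(collection_url)
    if status != 200 or body is None:
        return [f"{collection_url}: status {status}"]

    problems = []
    for name in body.get("children", []):
        url = child_url(collection_url, name)
        child_status, _ = fetch(url)
        if child_status != 200:
            problems.append(f"{url}: listed as a child but returned {child_status}")

    # Error behaviour: a name that was never listed should not resolve.
    bogus = child_url(collection_url, "thiswasnotachild")
    bogus_status, _ = fetch(bogus)
    if bogus_status != 404:
        problems.append(f"{bogus}: expected 404, got {bogus_status}")
    return problems


# In-memory stand-in for a server, with one deliberately dangling child ("b").
FAKE_SERVER: Dict[str, dict] = {
    "http://myserver/collection": {"children": ["a", "b"]},
    "http://myserver/collection/a": {"children": []},
}


def fake_fetch(url: str) -> Tuple[int, Optional[dict]]:
    if url in FAKE_SERVER:
        return 200, FAKE_SERVER[url]
    return 404, None


for problem in check_collection(fake_fetch, "http://myserver/collection"):
    print(problem)
```

The check only reports children that are listed but missing; it has no way to notice a child that should have been listed and was not.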

A subclass of specific tests is close to general: system metadata. That is, a set of attributes of the product that are true regardless of the data that's in the system. In the server example, perhaps there is a finite set of resource types that the server will enumerate. You can cheaply check that the server's list agrees with a list in your test suite without knowing what data is stored for any of those types.
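A metadata check of this kind can be as small as a set comparison. In this sketch the type names in `EXPECTED_TYPES` are invented, and how the server's list is obtained is left out; `compare_resource_types` just reports differences in both directions.

```python
from typing import Dict, Iterable, List

# Hypothetical list kept in the test suite; the names are illustrative only.
EXPECTED_TYPES = ["collection", "document", "search"]


def compare_resource_types(expected: Iterable[str], reported: Iterable[str]) -> Dict[str, List[str]]:
    """Report types the server failed to list and types the suite does not know about."""
    expected_set, reported_set = set(expected), set(reported)
    return {
        "missing": sorted(expected_set - reported_set),
        "unexpected": sorted(reported_set - expected_set),
    }


# Example with a made-up server response.
print(compare_resource_types(EXPECTED_TYPES, ["document", "collection", "index"]))
```

Reporting unexpected types as well as missing ones means the suite also flags a new type that has appeared on the server before the test list was updated.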

If there is a lot of data in your test systems, randomising access to it lets you trade run-time of a given invocation of the suite against cumulative coverage over time, because different sets of data will be visited on each run. You can implement a cache of what's been touched in recent runs and avoid it later, although I have found this not worth the hassle. On a landfill server, data can change under your feet, which adds another dimension to the testing. And note that it can be productive to run the suite against servers without any data in their back-end stores at all.
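Randomised selection might look something like the sketch below. Returning the seed so that a failing run can be repeated is my own assumption about what would be useful, not something described above, and `sample_items` is a hypothetical helper name.

```python
import random
from typing import Iterable, List, Optional, Tuple


def sample_items(items: Iterable[str], k: int, seed: Optional[int] = None) -> Tuple[int, List[str]]:
    """Pick up to k items at random; return the seed used so the run can be repeated."""
    if seed is None:
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    # Sort first so the same seed picks the same items whatever order they arrived in.
    pool = sorted(items)
    return seed, rng.sample(pool, min(k, len(pool)))


seed, chosen = sample_items(["doc1", "doc2", "doc3", "doc4", "doc5"], 2)
print(f"seed={seed} visiting {chosen}")

# An empty back-end store simply yields nothing to visit.
print(sample_items([], 2, seed=1))
```

On a landfill server the pool itself changes between runs, so a repeated seed is only reproducible for as long as the data stays put.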

These kinds of suites can also be parameterised. For example, we could ask the randomisation to run tests for a certain period of time, to a certain depth or breadth, for a certain number of data items or some other limit or search strategy. In an automated GUI test suite we're building at the moment, we're playing with parameters representing user behaviours such as "fast" vs "slow", "keyboard" vs "mouse" and so on for different invocations of the suite, running the same tests in different ways.
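Limits like these could be passed to a randomised walk over the resources. This is a minimal sketch under my own assumptions: `Limits`, `random_walk`, `children_of` and `check` are hypothetical names, the resources are assumed to form a tree (there is no cycle detection), and the walk picks the next resource at random from those discovered so far.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Limits:
    max_seconds: float = 60.0  # stop after this much wall-clock time
    max_depth: int = 3         # do not descend below this many levels
    max_items: int = 100       # stop after checking this many resources


def random_walk(
    root: str,
    children_of: Callable[[str], Iterable[str]],
    check: Callable[[str], None],
    limits: Limits,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Visit resources in random order until any one of the limits is reached."""
    rng = rng or random.Random()
    deadline = time.monotonic() + limits.max_seconds
    visited: List[str] = []
    pending = [(root, 0)]  # (resource, depth) pairs discovered but not yet checked

    while pending and len(visited) < limits.max_items and time.monotonic() < deadline:
        node, depth = pending.pop(rng.randrange(len(pending)))
        check(node)
        visited.append(node)
        if depth < limits.max_depth:
            pending.extend((child, depth + 1) for child in children_of(node))
    return visited


# Made-up resource tree standing in for a server.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}
print(random_walk("root", lambda n: TREE.get(n, []), lambda n: None, Limits(max_items=3)))
```

The same walk can then be invoked as a quick smoke run with a small `max_items`, or left to run for a longer `max_seconds` overnight, without changing the checks themselves.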

So why might this kind of testing be interesting?
  • It puts you in the product (or in the technologies on which the product depends) immediately, learning about both, getting background for the specific testing and testing to a level that is practical and sensible at any given time.
  • You quickly flesh out the basic structure of your test harness, learn what kinds of utility functions you'll need and the like. This can be invaluable when you're ready to extend to specific tests because you've got the infrastructure in place already.  I try to partition the two sets of tests so that I can run them separately.
  • You end up testing against all sorts of malformed data (intermediate formats, buggy data, crafted data from the dev team, antique data from previous releases...) and learn a lot about how the application copes with them.
  • Consistency is a testing watchword (see e.g. FEW HICCUPPS) and time spent understanding the baseline level of consistency of a feature or product is seldom wasted. General testing is, to a large extent, about consistency.
  • When you're ready to, and if it makes sense to, you can extend to creating data as well. If I do this, I make a point of cleaning up test suite data at close.

It's clear that this approach has limitations. In particular, although it's data-driven, it's driven by the data that is present and by a one-sided view of that data. If it passes, it will tell you that no inconsistencies were found in the data and functionality touched by a particular run, but no more.

Despite this, it can be very productive and can later become a regression test that extends as the data you point it at evolves. There's usually suitable data lying in your dev and test environments that belongs to you, was otherwise redundant, and that you can get extra value from.


  1. So would it be fair to say that for you, "specific" means cases which verify specific data (or a known set of data) whereas "general" means cases which don't rely on any specific data, for which any data set could be used? (Or which are good for verifying large and varied data sets.)

  2. @Jenny: Pretty much. I don't think I'd phrase it as "verifying data", though. I think of it more as executing sets of checks against the software that may or may not have specific dependencies, although they are primarily data dependencies in this example. Other dependencies might include configuration options, user accounts, server-side plug-ins, additional licensed features etc.