I've been working on an application that will orchestrate data from multiple services. As the developers add clients for those services, they have been writing integration tests and, naturally, many of them use mocked data.
Mostly the data consists of non-trivial JSON response bodies full of domain-specific terminology captured during interactions with the other services. Consequently, many of our tests reflect this complexity and domain specificity, asserting on those data structures and their particular terms.
This is functionally fine, but problematic for readability, because test intent can be hidden in a mass of incomprehensible word salad. This is usually fine for the author when writing the test, because the intent is front of mind, but it's problematic for other readers, including the author later on.
I have been vocal about this drawback and today one of my colleagues asked me to summarise my prescription for it. Without thinking I said this, and I liked it a lot, so you get to hear it too:
Test data should be as real as it needs to be and as clear as it can be.
What do I mean by this?
- Real: sufficient structure and content to enable the test, and not inconsistent with a valid payload from the remote service.
- Clear: where specific details don't matter, remove or simplify them. Where you need to keep data, make it self-describing if you can.
Essentially, the test data only needs to be a model of the actual data. What you want to do determines the kind of model you need.
Here's a contrived example. Imagine a service that is returning encoded versions of documents in this sketch "schema":
{
  "source": "the raw source",
  "hash": "the source hashed as a 128-character string",
  "transaction_id": "a unique identifier",
  "metadata": {
    "user": "the caller's username",
    "timestamp": "the date time of the request",
    "cost": "the price of this operation"
  }
}
We have a test case for comparing the hashes of two responses to see whether they are the same, and we're going to implement it with mocked data. So, unless, say, our application wants to validate the hash length, I might craft something like this:
{
  "hash": "duplicate_hash_string",
  "source": "a",
  "transaction_id": "b",
  "metadata": {}
}
{
  "hash": "duplicate_hash_string",
  "source": "y",
  "transaction_id": "z",
  "metadata": {}
}
What did I do?
- moved the data the test needs to the top ...
- ... and made it self-descriptive
- de-emphasised data only needed for the application to accept the payload
- removed data that the application doesn't care about
- kept the payloads consistent with legal responses from the service
Now, when I look at the data, it's clear to me what is relevant to the test, and when I look at the test itself I see a string that helps me interpret its intention: duplicate_hash_string.
You might ask what I'd do if our application did want to validate the hash length. Good question! In that case I'd try to make it something like duplicate_hash_string_...xxx..._128_chars.
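If the length did matter, a small helper could pad a self-describing label out to the required 128 characters so the value passes the check while still reading as intent. This is a hypothetical sketch, not code from our application:

```python
def descriptive_hash(label: str, length: int = 128) -> str:
    """Pad a self-describing label with 'x' characters so it satisfies a
    length check while still announcing its own length and purpose,
    e.g. 'duplicate_hash_string_xxx..._128_chars'."""
    suffix = f"_{length}_chars"
    # Reserve room for the label, one separator underscore, and the suffix.
    padding = "x" * (length - len(label) - len(suffix) - 1)
    return f"{label}_{padding}{suffix}"
```

Calling descriptive_hash("duplicate_hash_string") yields a 128-character string that still tells the reader exactly why it exists.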
Make your test data models real enough and as clear as possible.
Image: Wikipedia
Syntax highlighting: pinetools