My team has been building a new service over the last few months. Until recently all the data it needs has been ingested at startup and our focus has been on the logic that processes the data, architecture, and infrastructure.
This week we introduced a couple of new endpoints that enable the creation (through an HTTP POST) and update (PUT) of the fundamental data type (we call it a definition) that the service operates on. I picked up the task of smoke testing the first implementations.
I started out by asking the system under test to show me what it can do, using Postman to submit requests and inspecting the results. I did the kinds of things you'd imagine, including:
- submit some definitions (of various structure, size, intent, name, identifiers, etc)
- resubmit the same definitions (identical, sharing keys, with variations, etc)
- retrieve the submitted definitions (using whatever endpoints exist to show some view of them)
- compare the definitions I submitted with the ones I retrieved (exact match, semantic match, etc)
- invite the system to perform actions on the definitions (in whatever ways I can)
- submit some broken definitions (deliberately crafted and by random edits to existing definitions)
- compare ingested definitions with submitted ones (in whatever ways I can find)
- ...
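The "random edits" item lends itself to a small script. What follows is only a sketch under assumptions, not a record of how the broken definitions were actually made: the helper name `mutate_definition` is hypothetical, and deleting a single character is just one of many possible mutations.

```bash
#!/bin/bash
# mutate_definition FILE: print FILE's contents with one randomly chosen
# character deleted. Hypothetical helper for illustration only.
mutate_definition() {
  local content len pos
  content=$(cat "$1")            # command substitution strips trailing newlines
  len=${#content}
  [ "$len" -gt 0 ] || return 1   # nothing to delete from an empty file
  pos=$(( RANDOM % len ))        # RANDOM is at most 32767, so only the first 32 KiB can be hit
  printf '%s\n' "${content:0:pos}${content:pos+1}"
}
```

Something like `mutate_definition definitions/example.json > broken/example.json` (both paths are placeholders) would then produce a definition that is usually, though not always, invalid JSON.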
But I didn't stop there. With that groundwork I was in a position to quickly run a few experiments looking for evidence that there might be some other deeper problems.
To help me do this, I took one of the requests as a curl snippet from Postman and pasted it into a bash script:
#!/bin/bash
curl --location 'http://localhost:4444/post' --header 'Content-Type: application/json' --data '{ "some": "json" }'
I ran the script to check that it returned the same results as the request in Postman. It did, and so now I had the beginnings of an ad hoc test rig.
The first experiment was to simply loop the request. I was interested to know what happened if I repeated a request at a fast rate:
#!/bin/bash
for i in {1..3}
do
curl --location 'http://localhost:4444/post' --header 'Content-Type: application/json' --data '{ "some": "json" }'
done
Depending on the request I was using, this might legitimately result in the same or different responses. Note that my rig was not checking results, just generating data for me to review.
As it happened, the responses were sometimes large and difficult to interpret, and the data that curl reports was also being printed to the console. So I told curl to be quiet, filtered to just a significant field in the payload using jq, and made the loop run more than three times:
#!/bin/bash
for i in {1..1000}
do
curl --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' --data '{ "some": "json" }' | jq .json
done
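With a thousand values scrolling past, a count of each distinct value can be easier to review than the raw stream. Here is a minimal sketch, assuming one line of jq output per response; the helper name `tally` is hypothetical.

```bash
#!/bin/bash
# tally: read lines on stdin and print "count value" for each distinct
# line, most frequent first. Hypothetical helper built on sort and uniq.
tally() {
  sort | uniq -c | sort -rn |
    awk '{ count = $1; sub(/^[ \t]*[0-9]+[ \t]/, ""); print count, $0 }'
}
```

Piping the loop into it (ending the loop with `done | tally`) would collapse identical results into a single line each, so that an unexpected value stands out.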
With an easier way to review, I could not see an obvious issue so I thought I'd retry with some of the definitions I prepared earlier by changing the script to loop over a directory of JSON files:
#!/bin/bash
for data in definitions/*.json
do
curl --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' -d "@$data" | jq .json
done
Now it became more important to see the full response again, so I stopped filtering to a single field and instead archived the whole response to a file which I could inspect at my leisure. You can see here that I simply dumped the response body to a file with the same name as the source data, suffixed with .response.
#!/bin/bash
for data in definitions/*.json
do
curl --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' -d "@$data" -o "$data.response"
done
That's cheap and cheerful, but convenient and sufficient for this kind of exploration. I didn't know where I was going next: each of these steps, along with what I found and what I'd learned up to that point, informed the next experiment.
I was interested to see whether there might be any timing concerns. The definitions had various properties that might cause the server to do different amounts of work, so I asked for the total time for each request to be reported:
#!/bin/bash
for data in definitions/*.json
do
curl -w "%{time_total}" --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' -d "@$data" -o "$data.response"
echo ""
done
That was fine, but I couldn't tell which file had which timings, so I tweaked the script a little for presentation. This version dumps a comma-separated pair of file and timing:
#!/bin/bash
for data in definitions/*.json
do
time=$(curl -w "%{time_total}" --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' -d "@$data" -o "$data.response")
echo "$data,$time"
done
And then it was only a small step to see if the timings were consistent over time, by putting the whole thing in an outer loop:
#!/bin/bash
for i in {1..100}
do
for data in definitions/*.json
do
time=$(curl -w "%{time_total}" --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' -d "@$data" -o "$data.response")
echo "$data,$time"
done
done
I took the data that was printed to the console and plotted it as a chart using a Google spreadsheet.
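The same console output can also be summarised per file before, or instead of, charting it. This is a minimal sketch, assuming the input is `file,seconds` lines like those printed by the script and that file names contain no commas; the helper name `summarise_timings` is hypothetical.

```bash
#!/bin/bash
# summarise_timings: read "file,seconds" lines on stdin and print
# file,count,min,mean,max for each file, sorted by file name.
summarise_timings() {
  awk -F, '
    {
      t = $2 + 0                                   # tolerates a leading space
      if (!($1 in n) || t < min[$1]) min[$1] = t   # first sample, or new minimum
      if (!($1 in n) || t > max[$1]) max[$1] = t   # first sample, or new maximum
      n[$1]++; sum[$1] += t
    }
    END {
      for (f in n)
        printf "%s,%d,%.6f,%.6f,%.6f\n", f, n[f], min[f], sum[f] / n[f], max[f]
    }
  ' | sort
}
```

If the script's output were saved to a file, say `timings.csv` (a placeholder name), then `summarise_timings < timings.csv` would give one row per definition. A large gap between mean and max is a hint that a definition's timing is not consistent from run to run.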
Next, I tried a quick concurrency test. It simply submitted two requests at the same time, using the ampersand character to make the calls non-blocking, and then fetched the data to see what the service had stored.
#!/bin/bash
curl --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' --data '{ "some": "1" }' &
curl --silent --location 'http://localhost:4444/post' --header 'Content-Type: application/json' --data '{ "some": "2" }' &
sleep 1
curl --location 'http://localhost:4444/get'
Notice that I made the submitted data slightly different so that I could tell which request, if either, was accepted by the service.
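A possible variation, sketched under assumptions: bash's `wait` blocks until a background job has finished and returns its exit status, which removes the need to guess a sleep duration and scales to more than two requests. The `race` helper below is hypothetical, as is the `submit` wrapper mentioned afterwards.

```bash
#!/bin/bash
# race CMD PAYLOAD...: start CMD once per payload in the background, then
# wait for every job and print "payload exit_status" in submission order.
race() {
  local cmd=$1; shift
  local payloads=("$@") pids=() payload i status
  for payload in "${payloads[@]}"; do
    "$cmd" "$payload" > /dev/null &   # ampersand: do not block on this call
    pids+=("$!")                      # remember the background job's PID
  done
  for i in "${!pids[@]}"; do
    status=0
    wait "${pids[$i]}" || status=$?   # collect each job's exit status
    printf '%s %s\n' "${payloads[$i]}" "$status"
  done
}
```

Given a wrapper function `submit` that passes its first argument to the curl POST above as `--data "$1"`, a call such as `race submit '{ "some": "1" }' '{ "some": "2" }'` followed by the GET request would repeat the experiment. The exit status here is curl's own, so it only shows whether the transfer completed, not which request the service accepted.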
As you can see, there's not so much going on here in terms of HTTP or programming expertise (Postman and bash and several headfuls of curl are taking care of this) and I could cover a reasonable amount of ground very cheaply, amplifying the effectiveness of the work I'd already done.
Oh yes, and I found a few more issues that we'll fix shortly.
This is the kind of exploration with automation that I do a lot. I have a question and I wonder what is available, in terms of data or tools, that could help me to get the answer or some idea of whether the question is worth pursuing further.
I'll often take a single example from somewhere, make variants of it, repeat them in different permutations or orders, inspect the result, identify interesting behaviours, and then ask another question.
I've found it really powerful over the years.
Image: Larissa Megale