Friday, July 10, 2020

Stuck in the Middle

This week I wanted to monitor several pieces of software that talk to one another via HTTP and HTTPS. All are running on the same machine, three are Linux services, and one is a standalone script. I was interested in being able to see all of the communications between them in one place, in time order.

I know a couple of ways of capturing this kind of data: proxying and network sniffing.

My default approach would be to have the applications configured to proxy via Fiddler running on my laptop inside the work network. Easy, right? Err, no, because I had forgotten that the machine in question is on a network that isn't considered secure and firewalls prevent that connection. In my proof of concept experiment, the standalone script just hung, failing to find the proxy I had specified. Interesting behaviour, and I later reported it, but not what I needed right then. Next!

As all of the software is on the same machine, capturing network traffic into a pcap file using tcpdump should have been relatively straightforward and I could import it into Fiddler for viewing. Result! Err, no, because HTTPS traffic is not decrypted with this approach so I only got some of the comms. Next!
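For the record, the sort of capture I had in mind is a one-liner; the interface and ports here are assumptions based on my setup and would need adjusting for another:

```shell
# Capture loopback traffic on the assumed application ports into a pcap
# for later import into Fiddler or Wireshark. Needs root.
sudo tcpdump -i lo -w capture.pcap 'tcp port 8000 or tcp port 443'
```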

What if there was something like tcpdump for HTTPS? A bit of googling and I found ssldump. Result! Err, no, because although it was simple to install, I couldn't make it work quickly and the data I was trying to gather was not important enough to invest enormous amounts of time in learning a new tool. (Although knowing that this tool exists might be very useful to me in future.) Next!
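For future me: a typical ssldump invocation looks something like the below (the key path is hypothetical). Note that it can only decrypt traffic when given the server's private key, and not at all for forward-secrecy cipher suites:

```shell
# Decode and display application data (-d) on loopback, decrypting with
# the server's private key (-k); the key path is hypothetical
sudo ssldump -i lo -d -k /path/to/server.key port 8000
```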

Back to proxying. What about if I run a proxy on the machine itself? I remembered playing with mitmproxy a long time ago and its web page says it deals with HTTPS so I installed it. Result! Err, no, because the latest version won't run due to a C library incompatibility on this machine. A quick search on the developer forums suggests that this is a known and accepted issue:
We build our releases using PyInstaller on GitHub CI runners, and that combo doesn't allow us to support earlier glibc versions. Please go bother RedHat.
I have been burned before by trying to upgrade Linux C libraries and, again, today is not the day for deep diving into infrastructure that only facilitates a potentially interesting but not crucial experiment. Next!

Hold on, I'm not ready to give up on mitmproxy yet. Might there be an older version that depends on an earlier C library? Is there a page I can download historical versions from? There is. Result! And this time, after stepping back a major version at a time, I got 2.0 running on the box. Double result! Next!

Next is to repeat the proof of concept test with the standalone script. The script has no proxy configuration options but I know it's running Python's requests library and an earlier search told me that it should respect the Linux HTTP proxy environment variables.

So, in one shell I started mitmdump, a flavour of mitmproxy:
$ ./mitmdump
Proxy server listening at
In another shell, I set the environment variables to point at the proxy's URL and ran the script:
$ export http_proxy=
$ export https_proxy=
$ ./myscript
At this stage, I didn't care which of the proxy variables requests would respect, so I simply set them both.
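As a sanity check, it's possible to see what requests will pick up without running the script at all, because requests gets its environment proxies via Python's standard library. Here I'm assuming mitmdump is on its default port, 8080, on localhost:

```shell
# requests reads proxy settings from the environment via urllib;
# this prints the mapping it will use
export http_proxy=http://localhost:8080
export https_proxy=http://localhost:8080
python3 -c 'import urllib.request; print(urllib.request.getproxies())'
```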

Result! HTTPS traffic appears in mitmdump's console and it's the kind of traffic I expect:
POST https://myserver:8000/endpoint
              << 201 Created 91b
GET https://myserver:8000/endpoint/result_data
              << 200 OK 20b

Next was to get the various services configured to proxy through the same mitm instance too. Unfortunately, I found that they do not have proxy configuration options. I wondered whether they would respect the Linux environment variables but didn't know how to set them in the environments that the services ran in. I pinged the testers for those services in chat and, in parallel, did some more googling.

It seems that it's possible to set environment variables in an override file per service. Result! So I invoked the service editor and entered the runes required to set the same variables for one of the services:
$ systemctl edit myservice
$ systemctl restart myservice
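For completeness: systemctl edit opens an editor on an override file (something like /etc/systemd/system/myservice.service.d/override.conf) and the runes themselves are just a couple of lines; the proxy URL here is an assumption:

```ini
[Service]
Environment="http_proxy=http://localhost:8080"
Environment="https_proxy=http://localhost:8080"
```

Afterwards, systemctl show myservice -p Environment confirms that the variables have taken effect.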

I ran the script again and this time saw traffic both from it and outbound from the service it was speaking to. Result! I quickly configured the other services in the same way and had the monitoring that I needed: all the traffic from all pieces of the infrastructure I cared about, aggregated in one time-ordered location.

In total, this took about an hour, and I spent another few minutes writing the steps up on our wiki for future reference. (Years ago I set up a wiki page called Log More From ... where we've documented the various tricks we've used over the years to get access to data around our products and tooling.)

A moment of reflection, then: I had a mission here. I didn't state it explicitly, but it was something like this: explore setting up HTTP/HTTPS monitoring using whatever tools work to get inter-application monitoring data for a specific analysis. The experiment I was engaged in was a nice-to-have. I was already reasonably confident that the right functional things were happening, and I had access to HTTP logs for some of the pieces of the infrastructure, so I didn't want this to be a time sink.

This framed the way I approached the problem. I have some background here, so I tried approaches that I was familiar with first. I used something like the plunge-in-and-quit heuristic, which I first read about in Lessons Learned in Software Testing, and which James Bach describes succinctly as: 
...pick something that looks hard to do and just attempt to do it right away. It doesn’t matter what you pick. The point is to try to do something now. You can always quit if it’s not getting done.
This mindset helps to stop me from disappearing down intellectually interesting or technically challenging rabbit holes: if it's not working, step back and try another way. 

Another thing that helps is having been in these kinds of situations before. My experience helps me to judge when to quit and when to continue plunging. Unfortunately, there's no substitute for experience. And the truth is that the judgements made even with experience can still be wrong: if I'd spent another couple of minutes working out what I was doing wrong with ssldump, perhaps I'd have satisfied my need ten minutes in? Yes, perhaps, or perhaps I'd have burned the whole hour fiddling.

On the whole I'm happy with how this went. I got what I needed in a proportionate time, I learned a little more about mitmproxy, I learned a new trick for configuring the environment of Linux services, and I learned of the existence of ssldump which could be just the tool I need in some future situation. Result! 
