Teaching is the best teacher
I just finished a fun-and-fact-filled 2-day workshop on Riding the Architect Elevator. At the beginning of the workshop I highlight to attendees that I do these workshops because they're a great learning opportunity for me as well. That's because having to explain things is the best way to really understand them. I also take away a lot from the many exercises and discussions because they allow me to harvest examples from the attendees' diverse contexts. It also sharpens my arguments and storylines – nothing tests your logic better than a dozen smart architects asking questions.
Teaching done right is therefore a definite two-way street. If you attend a training course, you'll notice quickly whether your instructor is there only to teach or also to learn. I expect you'll find the latter much more valuable.
Show the pirate ship
One of the 37 chapters in my book recommends “showing kids the pirate ship”. This advice is a reference to a typical Lego box: the cover doesn’t show all the tiny pieces inside the box, but it shows the pirate ship (or whichever thing you can build from the pieces). Seeing the final product excites the kids and shows the actual purpose of all the pieces on the inside.
Sadly, IT tends to do the exact opposite: we love to show all the little pieces in excruciating detail but forget to show the pirate ship that comes out of it. This is one of the many reasons IT remains a mysterious “black box” to much of the business where a lot of money goes in and little comes out.
A somewhat related exercise we do in our workshops is to have architects draw out a system structure. We do this in small teams so we can compare, observe, and critique the results. Because the purpose of the exercise is to see different ways of illustrating an architecture, I wanted to pick a system that's well understood by almost all attendees. I therefore used an abstract monitoring system, which checks the health of applications and alerts if something is amiss. The nice thing about a monitoring system is that almost every engineer will have interacted with one.
As part of the exercise I hand a stack of little cards to the teams of architects. Their task is to make a “good” architecture diagram that incorporates all those pieces.
The cards contain components like Black-box monitoring, White-box monitoring, Log aggregator, Time-series database, Triggers, Alerts - all well-understood pieces of a monitoring system.
Teams tend to have quite a bit of fun with the exercise. After about 10 minutes of discussing, sorting cards, and drawing on flip charts, they generally come up with drawings like these:
Considering the purpose
The drawings generally have a clean data and/or control flow from the application through the sensors, logs, log aggregation, the time-series database, alerts, down to the operator. The diagrams are generally well structured and have a visual language that expresses the semantics of the underlying system.
After presenting and discussing the sketches, I generally ask an innocent-sounding question: "what's the purpose of this system?" The first answer is usually "to monitor the applications". As the architects "zoom out", the purpose shifts to detecting outages, and finally to the real purpose: maximizing system availability - if you don't care about system availability, you don't need any monitoring.
Closing the loop
Upon realizing that the purpose of monitoring is to maximize uptime, we augment the picture to “show the pirate ship”. We close the loop by drawing a connection from the operator to resolving the issue that caused an alert to trigger. Once we have the loop from System Under Test to Alert and Resolution in the picture, we can highlight the purpose of the system visually: minimize the time from the issue taking place to resolving it. This is the system’s Mean Time To Recovery (MTTR). We draw this as a bold statement in the middle of the loop.
Making better decisions
Once the purpose and the complete system are clear, we can see that having the "full picture", so to speak, helps us make better decisions. It's now apparent that the MTTR is made up of two halves: how long it takes to detect an outage and how long it takes to resolve it. Once this aspect is clear, one can reason about whether the company should invest in a better monitoring system. For example, investing in a monitoring system that reduces the time to detect outages from half an hour to a few minutes thanks to better sensors and smarter analytics may seem like a good idea. Once you consider, though, that resolving an outage takes several hours, the picture changes. Investing, let's say, half a million dollars to reduce the MTTR from 4.5 hours to 4.1 hours doesn't look that great anymore. Instead, you'd be looking to reduce the time spent resolving outages, e.g. through better transparency across systems or higher levels of automation, such as rolling back the deployed software version. Drawing a better picture has helped us make better decisions.
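The back-of-envelope math above can be sketched in a few lines of code. The numbers mirror the example in the text, except for the "faster resolution" option, where the one-hour resolve time is my own illustrative assumption; the function name is mine as well:

```python
def mttr(detect_hours: float, resolve_hours: float) -> float:
    """Mean Time To Recovery = time to detect + time to resolve."""
    return detect_hours + resolve_hours

# Current state: half an hour to detect, four hours to resolve
current = mttr(0.5, 4.0)                  # 4.5 hours

# Option A: better monitoring detects outages in ~5 minutes,
# but resolution time is unchanged
better_monitoring = mttr(5 / 60, 4.0)     # ~4.1 hours

# Option B (hypothetical): invest in faster resolution instead,
# e.g. automated rollback cutting resolve time to one hour
faster_resolution = mttr(0.5, 1.0)        # 1.5 hours

print(f"current:           {current:.1f} h")
print(f"better monitoring: {better_monitoring:.1f} h")
print(f"faster resolution: {faster_resolution:.1f} h")
```

Seeing both halves of the loop side by side makes it obvious where the half-million dollars would buy the most uptime.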
Teaching the teacher
Admittedly, I trick the participants a little bit by handing them only cards describing the "monitoring" side of the system. At the same time, detecting missing pieces and "zooming out" to see the bigger picture are essential capabilities of an architect.
The most valuable part for me is that the exercise didn’t start out this way. Originally it was just a way to draw a few architectures and compare them. Through the dialog with attendees it evolved into combining it with the Pirate Ship and decision making, drawing multiple elements of the class into a single exercise. The best way to learn is to teach.