The O11y space is buzzing with activity and I like that. Things are moving and it looks as if we are finally getting a seat at the big table. One thing I have read several times in recent weeks is that O11y is complex and that context matters. Those two are interwoven, and understanding what that means is at the core of a good strategy when you want to implement O11y.
Let me give you an example.
Sunshine ahead on the weekend ☀️
The weather forecast for the weekend is awesome – just like last weekend in Europe – sunny, warm, spring-like. That is great, and it is all based on one datapoint predicted for that weekend, well, maybe two datapoints (aka metrics): the hours of sunshine and the expected temperature.
Those datapoints have meaning, but the meaning is different for everyone. Just like with data in O11y – context matters.
Oh, you ask for examples? OK, here are some:
- For a parent with kids that means: Buy sunscreen when you plan to go to a lake, the beach or just the park – BTW, sunscreen is generally a good idea, not only for kids
- Barbecue sounds fantastic: you need to get charcoal, stuff to grill, stuff to drink and invite some friends
- For a merchant selling barbecue items: Stock up on goods to sell before the weekend, good business ahead
- For the grocery store or supermarket it means: Stock up on grillable items and refreshments, maybe add more personnel to open more tills at peak times
- For doctors in hospitals it means: expect patients with skin issues like burns – either heavy sunburn or sometimes grilling incidents, heat strokes, abrasions or worse from accidents at the lake or the beach
- For pharmacies and drugstores that means: Stock up your sunscreen and possibly mosquito repellents
- And the list goes on
Based on past experience, everybody draws their own conclusions from the same publicly announced datapoints. Walmart using the weather forecast for a real sales boost is a good example.
It is the context that defines the meaning.
What does this have to do with O11y?
We have data, usually lots of it, and it needs to be processed to cater for everyone’s needs.
When my company sells barbecue utensils in its stores, we, the ops team, have to make sure that the whole chain works so that there is enough, but not too much, charcoal in the stores. When they run out, there is no revenue, just empty pallets. And the cash registers need to work flawlessly on Friday afternoon and Saturday morning, when the shoppers come.
As it all runs on our highly distributed, cloud-based, µServiced system, we can react quickly and prepare for higher than usual demand, but we need the data. The right data! Which services are used in a typical sale? Which ones need more power when the traffic increases? Can we determine that from historic data or, even better, have we done that already?
Context matters
This is where context comes into play. Can I tie the cash register process, including payment options, to the specific services that are running? How can I find this out? Do I know what happens when two sacks of coal are bought at the POS (point of sale)? Can I provide business metrics to management to show the difference in sales that the good weather brought? Which of my cloud instances are needed the most? Do we need more CPU, more memory, or both?
Ideally I get this from my O11y data, where tracing gives me the basis for a dependency map. That in turn allows me to map infrastructure metrics to a business case (e.g. a cash register process to the CPU and memory used to run it) – but only if I can bundle the data according to context. That can be a business use case, hardware location, service location, etc. – all provided via tags in the spans of the collected traces, the naming of the metrics and a naming convention that everyone follows.
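To make the bundling idea a bit more concrete, here is a minimal sketch in Python. It is not tied to any particular tracing product: the Span class, the tag key "business.use_case" and the service names are all made up for illustration. It only assumes that every span carries a set of tags and some resource figure, and it sums CPU per business context.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    # Simplified, hypothetical stand-in for a span collected from a trace.
    service: str
    cpu_seconds: float
    tags: dict = field(default_factory=dict)


def usage_by_context(spans, tag_key):
    """Group spans by the value of one context tag and sum CPU per group."""
    summary = defaultdict(lambda: {"cpu_seconds": 0.0, "services": set()})
    for span in spans:
        # Spans that ignore the naming convention end up in "untagged".
        context = span.tags.get(tag_key, "untagged")
        summary[context]["cpu_seconds"] += span.cpu_seconds
        summary[context]["services"].add(span.service)
    return dict(summary)


# Made-up sample data: service names and tag values are assumptions.
spans = [
    Span("pos-frontend", 0.5, {"business.use_case": "checkout"}),
    Span("payment", 1.5, {"business.use_case": "checkout"}),
    Span("inventory", 0.25, {"business.use_case": "restock"}),
    Span("payment", 0.75),  # no tag at all
]

report = usage_by_context(spans, "business.use_case")
for context, totals in sorted(report.items()):
    print(context, totals["cpu_seconds"], sorted(totals["services"]))
```

The "untagged" bucket is the interesting part: everything that lands there is data I cannot assign to a business case, which is exactly why the naming convention has to be followed by everyone.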
Experience counts
And it is tremendously helpful when you have the experience that allows you to select the correct data. Dashboards that turn green, amber and red need calibration, otherwise they are useless. This calibration comes from experience, data and/or some nifty algorithms. But selecting the right bits of data for meaningful information is key. A datalake is exactly that: a big hole in the ground filled with data that you either have to give meaning to or drop. O11y gives you the data and the ways to work with it, but you decide what you need for the developers, the operators, the DevOps team, the business managers and all the others who rely on the IT systems to do their jobs. Ideally you only need a fraction of the data that is provided, and the context allows you to pick exactly the right pieces so that I can enjoy a proper barbecue with friends and without sunburn.
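One simple way to calibrate from data, sketched here under assumptions: take the historic values of a metric and put the amber and red limits at high percentiles. The 90th and 99th percentile, the function names and the sample numbers are my own picks for illustration, not a recommendation; experience still has to tell you whether they fit.

```python
import statistics


def calibrate(history, amber_pct=90, red_pct=99):
    """Derive amber/red limits from historic values using percentiles."""
    # quantiles(n=100) returns the 99 cut points between the percentiles,
    # so index p - 1 holds the p-th percentile.
    cuts = statistics.quantiles(history, n=100)
    return cuts[amber_pct - 1], cuts[red_pct - 1]


def status(value, amber, red):
    """Translate a current value into a dashboard colour."""
    if value >= red:
        return "red"
    if value >= amber:
        return "amber"
    return "green"


# Made-up history, e.g. checkout latency in milliseconds per minute.
history = [float(v) for v in range(1, 101)]
amber, red = calibrate(history)

for current in (50.0, 95.0, 100.0):
    print(current, status(current, amber, red))
```

Feeding this with the history of a normal weekend versus a sunny barbecue weekend gives different limits, which is the point: the same number can be green in one context and red in another.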
Can context help me get budget?
Yes, it can, and it is absolutely essential for any budget discussion. O11y is an overlay function, which means that it is not connected to any revenue-creating activity and is treated as a cost center. So we need to justify each cent that we want to spend and each hour we invest in building the system. Providing business context is the best and easiest way to get funding: it shows the impact of good performance on revenue, and that the good performance was only possible because O11y told us how things were behaving.
Conclusion
Context is absolutely essential in O11y because it enables you not only to measure things but also to create the connections and the meaning that make the measurements useful. Measuring CPU consumption has no value in itself – using it to sell more items in your store, keep the system reliable and the costs low, and prove that to management is only possible when you have context. And it helps to cut through the complexity as well by guiding you to the important bits.
Book a call with me if you would like to talk about your O11y strategy.





