A post on this topic feels incomplete without a shout-out to Charity Majors - she has been preaching this for a decade, branded the term "wide events" and "observability", and built honeycomb.io around this concept.
Also worth pointing out that you can implement this method with a lot of tools these days. Both structured logs and traces lend themselves to capturing wide events. Just make sure to use a tool that supports general query patterns and has rich visualizations (time series, histograms).
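For anyone wondering what that looks like with plain structured logs, here is a minimal sketch, not any particular vendor's API. It assumes one JSON line per request is shipped to whatever tool you query with. The `WideEvent` class, the `handle_request` function, and every field name (`service`, `plan`, `cart_items`, and so on) are made up for illustration.

```python
import json
import time
import uuid


class WideEvent:
    """Collects context for one unit of work and emits it as a single JSON line."""

    def __init__(self, **fields):
        # Start with an id plus whatever is known up front.
        self.fields = {"event_id": str(uuid.uuid4()), **fields}
        self._start = time.monotonic()

    def add(self, **fields):
        # Keep attaching context as the request moves through the code.
        self.fields.update(fields)

    def finish(self):
        # Record the duration and serialize everything as one record.
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        return json.dumps(self.fields, sort_keys=True)


def handle_request(user_id, path):
    # Hypothetical handler: all field names and values are illustrative.
    event = WideEvent(service="checkout", path=path)
    event.add(user_id=user_id, plan="pro")
    try:
        # ... real work would happen here ...
        event.add(cart_items=3, status=200)
    except Exception as exc:
        event.add(status=500, error=type(exc).__name__)
        raise
    finally:
        # Exactly one line per request, whether it succeeded or failed.
        line = event.finish()
        print(line)
    return line
```

The point is that you emit one record per request with every field attached, rather than many small log lines, so the backend can group and filter on any of the fields later.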
> she has been preaching this for a decade, branded the term "wide events" and "observability",
With all due respect to her other work, she most certainly did not coin the term “observability”. Observability has been a topic in multiple fields for a very long time and has had widespread usage in computing for decades.
I’m sure you meant well by your comment, but I doubt this is a claim she even makes for herself.
She has been an influential writer on the topic and founded a company in this space, but she didn’t actually create the concept or terminology of observability.
> A post on this topic feels incomplete without a shout-out to Charity Majors
I concur. In fact, I strongly recommend that anyone who has been working with observability tools or in the industry read her blog, and the back story that led to Honeycomb. They were the first to recognize the value of this type of observability and have been a huge inspiration for many that came after.
Could you drop a few specific posts here that you think are good for someone (me) who hasn't read her stuff before? Looks like there's a decade of stuff on her blog and I'm not sure I want to start at the very beginning...
- Software Sprawl, The Golden Path, and Scaling Teams With Agency: https://charity.wtf/2018/12/02/software-sprawl-the-golden-pa... - introduces the idea of the "golden path", where you tell engineers at your company that if they use the approved stack of e.g. PostgreSQL + Django + Redis then the ops team will support that for them, but if they want to go off path and use something like MongoDB they can do that but they'll be on the hook for ops themselves.
- I test in prod: https://increment.com/testing/i-test-in-production/ - on how modern distributed systems WILL have errors that only show up in production, hence why you need to have great instrumentation in place. "No pull request should ever be accepted unless the engineer can answer the question, “How will I know if this breaks?”"
Most of that one still rings very true to me. I particularly liked this section:
> Let’s start here: hiring engineers is not a process of “picking the best person for the job”. Hiring engineers is about composing teams. The smallest unit of software ownership is not the individual, it’s the team. Only teams can own, build, and maintain a corpus of software. It is inherently a collaborative, cooperative activity.
Right now, we are in a transitioning phase, where parts of a team might reject the notion of using AI, while others might be using it wisely, and still others might be auto-creating PRs without checking the output. These misalignments are a big problem in my view, and it’s hard to know (for anybody involved) during hiring what the stance really is because the latter group is often not honest about it.
Honeycomb is inspired by Facebook's Scuba (https://research.facebook.com/publications/scuba-diving-into...). The paper is from 2013, predating honeycomb. Charity worked there as well, but presumably was not part of the initial implementation given the timing.
I've learned more from Charity about telemetry than from anyone else. Her book is great, as are her talks and blog posts. And Honeycomb, as a tool, is frankly pretty amazing.
> They were the first to recognize the value of this type of observability
With all due respect to her great writing, I think there’s a mix of revisionist history blended with PR claims going on in this thread. The blog has some good reading, but let’s not get ahead of ourselves in rewriting history around this one person/company.
> I think there’s a mix of revisionist history blended with PR claims going on in this thread.
I can only speak for myself. I worked for a company that is somewhere in the observability space (Sentry) and Charity was a person I looked up to my entire time working on Sentry. Both for how she ran the company, for the design they picked and for the approaches they took. There might be others who have worked on wide events (after all, Honeycomb is famously inspired by Facebook's Scuba), but she is for sure the voice that made it popular.
This post was so in line with her writing that I was really expecting it to turn into an ad for Honeycomb at the end. I was pretty surprised when it turned out the author was unaffiliated!