“The conversion rate on that landing page is diving. Do you have any way to help us figure out why?” We’ve probably all been in this situation: all hands on deck for a short-term emergency in the business. Now imagine that the dive is happening because a particular model or recommender was optimizing for something not well understood by your partners. On top of that, imagine that the person who built that data product has moved on to a different company.
If you’re terrified by this scenario, and of the looks you’ll receive giving this explanation, you’re homing in on why technical data products shouldn’t be so different from other products: we must be able to show how they work and why they are doing what they are doing. You might be nodding your head, saying “well, I know that, everyone knows that.” This is easy to say and to think about conceptually, but executing on it is much more challenging.
There are many reasons why saying a data product shouldn’t be a black box is much easier than making it so in practice:
“Unboxing” something from black box to interpretable doesn’t mean the same thing for all audiences. Making a data product interpretable to an audience who is familiar with ML or statistics is probably quite different from making it interpretable for a broader product and business audience. This requires knowing your audience and your customer and creating visibility that helps each of them.
It is hard to get dedicated time to work on these issues. There is always something higher priority in motion or “on fire” in a business. Making something understandable across a company too often gets conflated with “technical debt” that can be pushed down the road. And as with technical debt, the cost to the business of not addressing it in the moment is often far greater than anyone realizes.
The knowledge base for this type of work is much smaller than we’d like to think. On one hand, the likelihood that the person who built a system is still with the company by the time you realize interpretability is a problem is quite low. On the other hand, anyone who does have the background is probably in such high demand that carving out their time for this work won’t be prioritized properly.
There are many other reasons beyond the three above (this article could go on forever with stories of why organizations are bad at creating visibility into complex decisions). But I want to use this space to be more practical: how can we make both existing and future data products more interpretable and less like black boxes for the organization? I have some rough ideas to get started.
As everyone in product says, start with the customer. In this case, for both existing and future products, define who is using them, who needs to understand them, and who needs visibility into how they create impact. Don’t start with “this shouldn’t be a black box.” Start with “this customer needs to be able to understand how this product did this particular thing.” For example, if you rely heavily on a recommender system or experimentation platform, how do you expose the relevant information to the end user so they can trace how something happened? What does this look like for a product manager? What does this look like for a data scientist? This is a detailed, lengthy exercise, but one that is critically important.
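To make that concrete, here is a minimal sketch of what audience-specific traceability could look like for a recommender. Everything here is hypothetical: the `Recommendation` structure and `explain_for` helper are illustrative names, not an existing API. The point is that the same underlying trace gets rendered one way for a business audience and another way for a technical one.

```python
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    """One recommendation plus the trace a customer needs to ask 'why this?'"""
    item_id: str
    score: float
    # Plain-language summary aimed at a product/business audience.
    business_rationale: str
    # Model-level detail aimed at a data science audience.
    feature_contributions: dict[str, float] = field(default_factory=dict)

def explain_for(rec: Recommendation, audience: str) -> str:
    """Render the same underlying trace differently for each audience."""
    if audience == "business":
        return f"{rec.item_id}: {rec.business_rationale}"
    # Technical view: top contributing features by absolute weight.
    top = sorted(rec.feature_contributions.items(), key=lambda kv: -abs(kv[1]))[:3]
    detail = ", ".join(f"{name}={weight:+.2f}" for name, weight in top)
    return f"{rec.item_id} (score={rec.score:.2f}): {detail}"

rec = Recommendation(
    item_id="sku-123",
    score=0.87,
    business_rationale="frequently bought with items already in this user's cart",
    feature_contributions={"co_purchase_rate": 0.52, "recency": 0.21, "price_match": -0.04},
)
print(explain_for(rec, "business"))
print(explain_for(rec, "data_science"))
```

The design choice worth noticing is that the explanation is attached to the decision itself, rather than reconstructed after the fact, so any audience can trace how a specific thing happened.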
Once you’ve identified exactly what the “unboxing” process looks like, prioritizing the potential ways to do it is key. Let’s be real: it is pretty unlikely that you’ll get universal buy-in on doing everything you’d like to do. So you need to be prepared with your top priorities if you want to get investment and approval to focus on them. Prioritization should result in a statement like this: “If we make this particular change or choice with this data product, users will be able to understand [blank] about how something happened, which does [blank] for the business. If we do not do [blank], we risk [insert bad things that could happen].” This doesn’t apply universally, but I am confident it will help you filter for what really matters.
Create a plan to coach users through how interpretability matters to them. Whether it is a complex dashboard, a statistical model, an experimentation platform, a recommendation engine, or any other type of technical data product, customers are unlikely to read an email that says “things are now explainable and easy to see!” and suddenly change their behavior patterns and ways of engaging with the product. Instead, they need in-the-moment guidance and explanation of what something new does for them. This isn’t just an educational module. It means having people ready to help customers leverage these new capabilities in the moments where those capabilities can address problems or needs.
Create a regular report-out on the impact of the data product that leverages this interpretability and openness of the system. If customers feel like systems that were previously black boxes can now be cracked open and understood, you’ll be surprised at the engagement you can create with a quarterly or monthly summary of what the product did and how it affected different efforts at the company. Visibility and communication around impact are critically important aspects of a data product. If you don’t shout it from the rooftops, you’re unlikely to change others’ mental narratives about how the product works.
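As a rough illustration, a report-out like this can be generated straight from the product’s decision log. The log schema below is entirely made up for the example; the idea is just that the same records that make the product traceable can also feed the periodic summary.

```python
from collections import Counter
from datetime import date

# Hypothetical decision log -- in practice this would come from the product's
# event store. The schema and values here are purely illustrative.
decision_log = [
    {"date": date(2024, 1, 5), "surface": "landing_page", "action": "ranked_variant_b", "converted": True},
    {"date": date(2024, 1, 9), "surface": "landing_page", "action": "ranked_variant_a", "converted": False},
    {"date": date(2024, 1, 12), "surface": "email", "action": "recommended_sku_123", "converted": True},
]

def monthly_summary(log):
    """Roll logged decisions up into the counts a monthly report-out needs."""
    actions = Counter(entry["action"] for entry in log)
    conversions = sum(1 for entry in log if entry["converted"])
    return {
        "decisions_made": len(log),
        "conversions_attributed": conversions,
        "most_common_actions": actions.most_common(3),
    }

print(monthly_summary(decision_log))
```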
As data products take on an increasingly important role in both human-in-the-loop decisions (e.g., experimentation, inference) and automated decisions (e.g., recommender systems), how we expose the way those products work to key audiences and customers within the company becomes ever more critical. The challenge in front of us is pretty huge, but it is one I think is worth investing in deeply.
Thanks for reading! I would greatly appreciate you subscribing and sharing if you found this helpful!
Love this sentiment, and I've got some "down with black boxes" content of my own I'm working through. But as I was revisiting this concept, I had this thought -- "I don't understand the internals of half the systems I work with -- so sometimes black boxes must be okay, right?"
If a product is reliable, unsurprising, and useful -- is a black box okay for end users? For example, I don't understand deeply how my database system works, but I rely heavily on it. (Although I also trust that adequate documentation and SLAs are in place.)
Do you have any thoughts on when it's okay for the customer to not understand what is happening in the system?