“I had this really awesome user experience product idea that would make the checkout process really personalized and relevant. We created the business case and everyone was interested, but it fell flat. It turns out the data we needed wasn’t available and what we had, we couldn’t really rely on.” The land of great data product ideas is vast. The part of that land with viable ideas is quite tiny. Why? Because discussing using data in the abstract is one thing. Actually digging into it and knowing it well enough to produce a viable product is another entirely.
As data becomes more available, in higher volume and at a more rapid pace, there seems to be an assumption that this will enable more and stronger data product ideas. But ask anyone who works with data regularly, and you’re sure to hear the phrase “garbage in, garbage out.” This phrase doesn’t just apply to analysis and modeling. A data product built on shaky foundations is a house of cards that will eventually tumble down. That is why I’m surprised by discussions around data products that don’t explicitly dive into knowing the data deeply. Put simply, I don’t believe we can build data products without fairly deep expertise in the data we rely on and want to use.
Here’s what I worry about most: a gigantic cohort of passionate data product managers who love product management but don’t love data just as much (or more). This is why I care so much about this field and domain - because we cannot just be one or the other, we must care deeply about both spaces. Yet I understand that this is easy to say, and much harder to do. What does it look like for someone passionate about product management to spend more time on data? What does it look like for a data aficionado to spend more time on product management? I have a few ideas, but also want to remind you - I don’t have all the answers. Hopefully, I have good enough questions to spur discussion and thinking. Here are 5 ideas.
1. There is no replacement for understanding the mess that data usually is under the surface. None. I fundamentally believe that the most important aspect of working with data is understanding just how challenging and messy it is. Logging, storing, transforming, and making data available to use in a structured way is an immense challenge. This is also why I think data scientists with a product interest have an advantage: they understand the challenge of actually using data in a product and the investment it requires. So, what does this mean? Work with the data you want to use. Do summaries. Graph distributions. Ask your data partners to give you feedback on the datasets you’re working with.
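To make “do summaries” concrete, here is a minimal sketch of what that first pass might look like. The column values are hypothetical checkout data, and the specific stats chosen are just one reasonable starting point - the goal is simply to touch the data yourself before pitching a product on top of it.

```python
from collections import Counter
import statistics

def profile_column(values):
    """Summarize one column: row count, missing rate, distinct values,
    and basic numeric stats when the column parses as numbers."""
    missing = sum(1 for v in values if v in ("", None))
    present = [v for v in values if v not in ("", None)]
    summary = {
        "rows": len(values),
        "missing_pct": round(100 * missing / len(values), 1),
        "distinct": len(set(present)),
    }
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            break  # non-numeric column: skip numeric stats
    else:
        if numeric:
            summary["mean"] = round(statistics.mean(numeric), 2)
            summary["median"] = statistics.median(numeric)
    return summary

# Hypothetical order-value column, with a missing entry hiding in it:
order_values = ["19.99", "5.00", "", "120.00", "5.00"]
print(profile_column(order_values))
```

Even a toy profile like this surfaces the questions that matter: why is 20% of the column missing, and can the product tolerate that?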
2. Define the minimum viable data for your product. We talk a lot about MVPs for product. What about the MVD, or minimum viable data? I notice that when we discuss data products, the discussion tends to center on the best case scenario, where everything is in reasonable form and is useful to the customer. I like to ask a different question: what’s the worst we can do and still deliver value for the customer? That doesn’t mean we are trying to deliver a bad product. It means we need to understand the difference between the minimum needed to make a product and the best version of that product.
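One way to make an MVD tangible is to write it down as an explicit bar the data must clear. This is a minimal sketch, and the field names and missing-value thresholds below are hypothetical examples, not numbers from any real product:

```python
# Hypothetical MVD for a personalized checkout feature: which fields we
# need, and how much missing data we can tolerate in each before the
# product stops being viable.
MVD_REQUIREMENTS = {
    "user_id": 0.0,    # no missing values tolerated
    "item_id": 0.0,
    "price": 0.05,     # up to 5% missing is acceptable
    "category": 0.30,  # nice to have; tolerate 30% missing
}

def meets_mvd(rows, requirements=MVD_REQUIREMENTS):
    """Return (ok, failures): does this sample clear the minimum bar?"""
    failures = []
    for field, max_missing in requirements.items():
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        rate = missing / len(rows)
        if rate > max_missing:
            failures.append(
                f"{field}: {rate:.0%} missing > {max_missing:.0%} allowed"
            )
    return (not failures), failures

sample = [
    {"user_id": "u1", "item_id": "i9", "price": 19.99, "category": ""},
    {"user_id": "u2", "item_id": "i3", "price": None, "category": "books"},
]
ok, failures = meets_mvd(sample)
```

The point isn’t the code - it’s that writing thresholds down forces the “worst we can do and still deliver value” conversation before launch, not after.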
3. Map out the investment required to either get to MVD (from point 2) or to move from MVD to the best case scenario. I’ll admit, writing this point made me think of the immense number of meetings and inputs I would need to create this mapping of investment. But then my mind wandered to all the meetings where I’m asked “what would it take to get there?” with no one able to provide a clear or reasonable answer. The most likely scenario is that you will not have the data in shape to support the product idea that you want. This means you need to make an investment case. Do you need data scientists, data engineers, ML engineers? For how long, doing what? Hard questions. But incredibly important answers.
4. Think not just about the immediate data, but what it takes to sustain the investment in your data product over time. Imagine this: you ship a great personalized recommendation product and the underlying dataset is managed by a data engineering team with expertise in this particular dataset. Then the engineering team reorgs and you are left without anyone to support the product. What do you do? In a best case scenario, you’ve already created an understanding of support for your product, so even if teams move, you retain the subject matter experts needed to keep your product running!
5. Seek out disconfirming evidence about your data. When we look at data, we often take the approach of “what makes this work?” We contort ourselves into a bunch of assumptions and reasons that we can rely on the data, ultimately making a potentially shaky decision to ship a product. Instead, ask yourself, “how could this data go wrong?” While it is scary, I promise it is powerful to look at a dataset and think about what could go off the rails. Because as we all know, when it comes to data, going off the rails is probably more likely than staying on them.
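Hunting for disconfirming evidence can be as simple as writing checks that actively try to break your assumptions. A minimal sketch, where the rules and field names (duplicate IDs, non-positive amounts, shipping before ordering) are hypothetical examples of “how could this data go wrong?”:

```python
def find_problems(orders):
    """Actively hunt for ways an orders dataset could be wrong,
    rather than reasons it might be fine."""
    problems = []
    seen_ids = set()
    for i, o in enumerate(orders):
        # Assumption to disprove: order IDs are unique.
        if o["order_id"] in seen_ids:
            problems.append(f"row {i}: duplicate order_id {o['order_id']}")
        seen_ids.add(o["order_id"])
        # Assumption to disprove: amounts are positive.
        if o["amount"] is not None and o["amount"] <= 0:
            problems.append(f"row {i}: non-positive amount {o['amount']}")
        # Assumption to disprove: events happen in order.
        if o["shipped_at"] is not None and o["shipped_at"] < o["ordered_at"]:
            problems.append(f"row {i}: shipped before ordered")
    return problems

orders = [
    {"order_id": 1, "amount": 25.0,
     "ordered_at": "2024-01-03", "shipped_at": "2024-01-04"},
    {"order_id": 1, "amount": -5.0,
     "ordered_at": "2024-01-05", "shipped_at": "2024-01-01"},
]
for p in find_problems(orders):
    print(p)
```

Every problem a check like this surfaces before launch is one fewer shaky assumption baked into the product.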
With these ideas in mind, imagine how this scenario could go differently:
Previous: “I had this really awesome user experience product idea that would make the checkout process really personalized and relevant. We created the business case and everyone was interested, but it fell flat. It turns out the data we needed wasn’t available and what we had, we couldn’t really rely on.”
New: “I had this really awesome user experience product idea that would make the checkout process really personalized and relevant. We spent a ton of time looking at the data and decided that there’s a bunch of risk here. We could mitigate the risk with investment in these datasets, but here’s what it would take. We created the business case to help us talk through this. Let’s dig in.”
Quite different, right?
Have a wonderful Thanksgiving holiday for those in the US, and a happy rest of your week to those elsewhere!