I got a fascinating email a few weeks ago, the essence of which was: "the reason all these data products don't end up working is that most don't take engineering design into account." I'll be honest, my first instinct was to say "nah, you just don't understand", but over the past two weeks I've asked staff engineers at various companies for their opinions on this issue. I heard comments about architecture, system design, scalability, latency, and a lot of other terms ending in "y". I came away with a few new ideas that pushed me to think about the design of data products.
Great post. It might be because the data products I've managed have always been in their first stages (and in orgs that didn't have a pre-existing data product culture), but another one I'd add to your list is what I've come to call "productisation": often the data product wasn't just unscalable in the sense that 10x more users might break it (sometimes true, sometimes not), but also not generalised enough to suit new users' use cases.
I'm not talking about new features here - just that the (in my mind) standard practice of parameterising inputs wasn't something the data scientists who'd built the first or second iteration of the data product were accustomed to, so a lot of logic was hardcoded rather than flexible/extensible/generalised. Sometimes this was an easy fix (replace "2021" with "param_year"), other times it required a big refactor. What I think is noteworthy here is that building it the "productised" way wasn't much more work than the other (so it's not your typical case of tech debt in the name of getting an MVP out ASAP). Instead, the challenge was just that the devs didn't have the software background that would've made that decision obvious (and, for one reason or another, those who did weren't listened to by DS or leadership).
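To make the "2021" vs "param_year" point concrete, here's a minimal sketch of what that kind of fix might look like. The function and field names (`summarise_revenue`, `year`, `revenue`) are illustrative assumptions, not from any real data product:

```python
# Hardcoded version: the year is baked into the logic, so the
# function only ever answers one question.
def summarise_revenue_2021(rows):
    return sum(r["revenue"] for r in rows if r["year"] == 2021)

# Parameterised version: identical logic, but the year is an input.
# Barely more work to write, and it generalises to new use cases.
def summarise_revenue(rows, year):
    return sum(r["revenue"] for r in rows if r["year"] == year)

rows = [
    {"year": 2021, "revenue": 100},
    {"year": 2021, "revenue": 50},
    {"year": 2022, "revenue": 250},
]

print(summarise_revenue(rows, year=2021))  # 150
print(summarise_revenue(rows, year=2022))  # 250
```

The easy case is exactly this: lift the literal into a parameter. The big-refactor case is when the hardcoded assumption has leaked into many places (table names, column lists, joins), so there's no single literal to replace.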