Welcome to the very first episode of Product to Product, Roadmunk’s brand new podcast! 🎉
Listen to the first episode below:
For context: As a company that builds roadmapping software for product people, we’re immersed in the product community. We’ve seen how helpful product people are when it comes to helping their peers. They’re open to sharing their own experiences of working in product (both the good and the bad) in very honest ways—and they’re very generous with advice.
The candid conversations that are happening in product communities online make for excellent podcast material. So, we decided to start one. Product to Product will feature two product people talking about one product-specific topic—like applying machine learning to the right problems and building a healthy PM culture (and what “PM culture” is). We have really cool episodes coming up with product managers, designers, marketers and engineers.
This episode features Inga Chen, a product manager at Squarespace. Inga manages the team responsible for Squarespace’s analytics platform—which their clients rely on to get data-driven insights about their websites and e-commerce stores.
Machine learning is a passion of Inga’s, and she’s specifically interested in helping other product teams build machine learning models that solve real problems for end users. Inga spoke to my colleague Eleni Deacon about how product teams can properly prepare for machine learning, how to gather user feedback and what to do when a model fails.
The episode can be listened to above, and we’ve also included a transcript below. You can subscribe to Product to Product on iTunes and Google Play, or get the latest episodes delivered to your inbox by subscribing.
Eleni: Can you clarify the difference between AI, machine learning, and deep learning?
Inga: AI is the umbrella over machine learning. AI, to me, is our hope for human intelligence exhibited by machines. There are multiple goals of AI, which are all facets of human intelligence. We can reason, accumulate knowledge, plan for things, manipulate objects, and communicate with each other with our language. Learning, of course, is a part of what makes us human.
Machine learning is the ability to learn without being explicitly programmed. The reason why machine learning has become so popular in recent years is that teaching a machine to learn—or building a machine that can learn on its own—can achieve all the goals of AI. It can learn how to reason, and understand our language, and perceive and move objects. You give a machine learning algorithm lots and lots of data and that algorithm learns the concepts around this data. It’s then able to make a determination or prediction about something in the world.
Deep learning is the newest field of machine learning, and it has really catapulted us into the renaissance that we’re in today. The “deep” in deep learning comes from multiple hidden layers of transformation in data. Examples of things that are enabled by deep learning are self-driving cars or the Google Translate app in which the camera on your phone can immediately translate text that it sees into a different language.
Eleni: You’ve talked about how there are two schools of thought in the PM world about machine learning. Can you define how you see PMs’ views of machine learning?
Inga: There’s the one camp that says, “I’m not ready for machine learning.” That’s what I hear very often from PMs—“Machine learning is really exciting, I see the potential for it to transform my problem area or my product, but we’re not ready.” The other camp is, “I’m so excited about machine learning—I want to use it for every one of my problems.”
Eleni: What would you say to those PMs saying they’re not ready? Is it something that’s more accessible than people actually think it is?
Inga: Yes. I think you can say, “I don’t have the data for it yet,” but if you ever want to be ready then you should start thinking about the data that you’ll need and start collecting it now. It takes time to collect data and to make sure that it is formatted properly. When you do get that data, and you do have a machine learning engineer take a look at it—you might need to make further product changes to collect a different type of data.
Think about the relationship between the inputs and outputs of your data, and how predictive it should be. An example is the Gmail spam filter. Try to manually outline what gives an email spam characteristics versus not. Maybe a flag is the word “hacker” or that it was sent from Nigeria. Getting the intuition around your inputs and outputs is one thing you can do before building a full machine learning team.
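To make the manual exercise concrete, here is a minimal Python sketch of hand-written spam flags checked against a few hand-labeled emails. It is an illustration only, not how Gmail's filter works: the `Email` fields, the flag names, and the `agreement_rate` helper are all hypothetical, and the two flags are simply the examples mentioned in the conversation.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical input fields chosen for this illustration.
    subject: str
    body: str
    sender_country: str  # two-letter country code


# Hand-written flags taken from the examples in the conversation.
SUSPICIOUS_WORDS = {"hacker"}
SUSPICIOUS_COUNTRIES = {"NG"}  # country code for Nigeria


def spam_flags(email):
    """Return the names of the manual flags that this email trips."""
    flags = []
    words = set(re.findall(r"[a-z]+", (email.subject + " " + email.body).lower()))
    if words & SUSPICIOUS_WORDS:
        flags.append("suspicious_word")
    if email.sender_country in SUSPICIOUS_COUNTRIES:
        flags.append("suspicious_country")
    return flags


def looks_like_spam(email):
    """Output: a yes/no guess based only on the manual flags."""
    return len(spam_flags(email)) > 0


def agreement_rate(labeled_emails):
    """Share of hand-labeled (email, is_spam) pairs where the rules agree with the label."""
    hits = sum(looks_like_spam(email) == is_spam for email, is_spam in labeled_emails)
    return hits / len(labeled_emails)
```

Running `agreement_rate` on a small labeled sample gives a rough sense of how predictive the chosen inputs are before any model exists.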
Eleni: I really like this idea that you have about assessing whether machine learning is the right solution to apply to a problem. How did you come to such a practical view of machine learning?
Inga: I think it’s a combination of talking to machine learning engineers and asking them, “What do you think makes a good machine learning problem? What excites you about machine learning problems?” It’s also talking to other product managers, reading blog posts by companies that have used machine learning in their products and seeing why and how they arrived at that decision.
Then it’s just kind of trial and error, honestly. The heuristic I use is to determine whether a solution is going to be 10x better if massive amounts of data are utilized to make a prediction. It’s important to consider the manual option. Can you get 80% of the way there with a non-machine learning solution?
The things that machine learning can do, like predicting stock prices or recommending products, are things we can all do as humans. If you manually do that on a small scale for a few users (like recommending products after studying their purchase behavior), you can gauge if it’s actually helpful. This is before you’ve done any machine learning, maybe before you’ve collected data to see if it’s important and if it solves a real pain point. If it does, then great. Ask yourself again, will lots and lots of data make this prediction or recommendation even better? And if it does, then that’s probably a good machine learning problem and you can start collecting data for it.
Eleni: Are there any other questions that PMs should be asking themselves when they’re evaluating whether or not to apply machine learning to a particular problem?
Inga: It’s really important to think about when the prediction of the model fails, what you’re going to do about that, and how bad that’s going to be. Predicting stock prices or predicting health outcomes has very different ramifications if you fail, compared to recommending a product.
I think even for problems that are less serious, user trust is really important. You’re going to be wrong. You’re making predictions with some level of probability. In the 30% of the time that you’re wrong, can you design an experience that doesn’t make your user say, “This recommendation doesn’t really know me well so I’m not going to use it or trust it again”?
Eleni: What would you recommend to avoid this problem—where users are put off by the machine so clearly being a machine that’s getting it wrong?
Inga: It’s setting the right expectation and gathering user feedback. This depends on what the feature and the use case is—but it’s important to educate the user that your model will improve over time. The more they invest into it, the better predictions or recommendations the product can provide.
An example here is Netflix or Spotify Discover Weekly. My friends are in two huge camps of “Spotify Discover Weekly knows me extremely, extremely well and they always suggest songs I’ve never heard of but that I love” and then the other camp is like, “Spotify Discover Weekly is completely off. I will never look at it again because it always gives me terrible songs.”
I think you can do a better job of setting the expectation by being upfront. Present the idea that, “These are the top five songs we think are strong matches for you, but then we added another 10 that are kind of shots in the dark.” The user will understand that a risk is being taken and it may not be a perfect match.
It’s about transparency.
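The "strong matches plus shots in the dark" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not how Spotify builds Discover Weekly: it assumes some recommendation model has already produced a confidence score per song, and it treats the next-ranked songs as the exploratory picks.

```python
def split_playlist(scored_songs, strong_count=5, exploratory_count=10):
    """Split (song, score) pairs into a confident group and an exploratory group.

    Scores are assumed to come from a recommendation model, where a higher
    score means the model is more confident in the match.
    """
    ranked = sorted(scored_songs, key=lambda pair: pair[1], reverse=True)
    strong = [song for song, _ in ranked[:strong_count]]
    exploratory = [
        song for song, _ in ranked[strong_count:strong_count + exploratory_count]
    ]
    # The labels are what the user would see, which sets the expectation up front.
    return {
        "Strong matches for you": strong,
        "Shots in the dark": exploratory,
    }
```

Labeling the two groups separately is the product decision here; the ranking itself stays the same.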
Eleni: How does machine learning come into play in your day-to-day work?
Inga: We’ve grown significantly over the past few years and we have millions of Squarespace websites on the platform, and our user base is as diverse as the internet is. We’re always finding out about new interesting Squarespace websites, but doing this in a manual way is not scalable.
We’re constantly thinking about website templates that best serve our users’ needs. We have a few different verticals like food/drink and building an online store—but there are verticals we’re missing when we identify those manual categories. So we built a tool, called visual search, to get to know our users more intimately and make sure that we’re not missing any pockets.
Visual search is an internal tool that our machine learning team built. It allows us to take one website and then look for websites that are similar to it. We can search for any term and get websites that match the search term. If you searched “safari” within our visual search engine, you would get a bunch of different customer websites that have either “safari” within the content of their site or pictures of, say, lions. The cool thing about visual search is that we didn’t have to tell the machine learning model, “Watch out for ‘safari’ and make sure that pictures with lions make it into that search result.” That’s kind of the magic, and also the black box of machine learning. We’re able to do this on millions of websites and in a way that is infeasible for humans to do.
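One common way to build "find websites similar to this one" is to represent each site as a numeric vector and rank by cosine similarity. The conversation does not say how Squarespace's visual search works, so the Python sketch below is only an assumption about the general technique; the site names and three-dimensional vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vector, site_vectors, top_k=3):
    """Return the names of the top_k sites ranked by similarity to the query."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in site_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_k]]


# Made-up vectors; a real model would learn much longer ones from
# each site's text and images.
site_vectors = {
    "lion-tours.example": [0.9, 0.1, 0.0],
    "bakery.example": [0.0, 0.2, 0.9],
    "wildlife-photos.example": [0.8, 0.3, 0.1],
}
```

In a setup like this, a text query such as "safari" would also be turned into a vector in the same space, which is why nobody has to write a rule connecting the word to pictures of lions.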
Eleni: How did you guys decide that this particular issue was a good one to tackle with machine learning?
Inga: I like to think about three different types of problems that are great for machine learning. One is a decision with many inputs. One is a decision with many outputs. And another is making decisions at scale. Visual search at Squarespace is the last two.
We now have millions of websites. When we had ten thousand, we could have potentially hired a team to go through them all and tag each of them. But there are too many now and we have everyone in the fringes of the different verticals. We thought, “How can we do this at scale, in a way that we can continue to do as we grow even bigger?” This indicated that this is a great machine learning problem and that we could not do this the manual way.
Eleni: What goes into planning a machine learning initiative? Walk us through your process as you start to undertake something new.
Inga: We start with learning about the problem and talking to users. Then we prioritize that problem and put it on the roadmap. We build the solution, launch it, and measure it. Rinse and repeat. That’s software product development at a very high level.
During the learning stage, if it is a machine learning problem, that’s when we would classify it as such. It’s also at that stage where we ask, “Do we have the right data for this problem and do we need to make product changes in order to gather that data?” We also consider the non-machine learning alternative, the manual option and whether it can get us 80% of the way there. We make all of those determinations and then if we have to gather and prep data, we do that. The build phase happens once we’re confident that the data and its format are what we need.
During the build stage, the machine learning team tests different algorithms and builds a bunch of different models. We then establish KPIs that we’re going to evaluate those models on. I like to think of this stage as a controlled aiming test where it’s very clear which model is most predictive.
It’s also at this stage where you might have to throw away a few models, or all the models if you find out that none of them are predictive enough for your use case. I think accuracy in this stage will vary widely based on your use case. For example, product recommendations: you can probably get away with being less accurate there. But with something like predicting health outcomes, the consequences of being wrong are so much higher that your accuracy is going to have to be so much higher.
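The build-stage comparison can be sketched as follows in Python. It assumes plain accuracy on a held-out set of labeled examples as the KPI, which is only one possible choice, and the function names and the `minimum_accuracy` bar are hypothetical. The bar is where the use case comes in: lower for product recommendations, much higher for something like health outcomes.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def pick_model(candidate_predictions, labels, minimum_accuracy):
    """Return (name, accuracy) for the best candidate, or None if none is good enough.

    candidate_predictions maps a model name to its predictions on the same
    held-out examples that `labels` describes.
    """
    scores = {
        name: accuracy(predictions, labels)
        for name, predictions in candidate_predictions.items()
    }
    best_name = max(scores, key=scores.get)
    if scores[best_name] < minimum_accuracy:
        return None  # throw all the models away for this use case
    return best_name, scores[best_name]
```

Returning `None` makes the "none of them are predictive enough" outcome an explicit result rather than an afterthought.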
Once you launch a production model, I think it’s really important to get user feedback on the predictions the model makes. Then you can identify weaknesses in the model and you can start gathering more data in that specific area to feed back into the model to make it better.
Eleni: One thing that is not super known when people talk about machine learning in broad strokes is the sheer amount of data needed. Can you speak to that a little bit?
Inga: I think that’s why it’s so important that if you say you’re not ready for machine learning yet, you should still start collecting the data that you think you’ll need. Again, it takes time to gather that much data and get it in the right format. But then also look at public data sources. It varies widely based on what industry you’re in and what problem you’re solving. If you’re in the legal or political realm, there’s so much data out there for use—even in the healthcare realm. Data is really distributed and messy in healthcare systems and hospitals, but it’s still possible and companies are doing it.
Eleni: Where do user interviews and user testing come into play when you’re building a machine learning model?
Inga: I think it’s so important to do a lot of it up front. When I was talking about validating the manual solution with users, you do that up front before you’ve even done machine learning.
I think it’s also important to get user feedback once you launch in order to monitor if your model has weaknesses. You’ll see a lot of products do this. For example: Slack Highlights, which predicts the messages it thinks will be important to you, always includes a little feedback mechanism that asks if it was helpful or not. Google does that with a lot of their search results, too.
There’s definitely a balance at the launch phase. You don’t want to wait too long to push a model live because you want the user feedback as soon as possible. And I’ve always been surprised about user feedback that I’ve gotten.
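A "was this helpful?" prompt only pays off if the answers are broken down by area, so weak spots in the model stand out. Here is a minimal Python sketch under assumed inputs: each feedback record is a hypothetical `(segment, was_helpful)` pair, and the 0.5 threshold is arbitrary.

```python
from collections import defaultdict


def helpful_rate_by_segment(feedback):
    """Compute the share of helpful votes per segment.

    feedback is an iterable of (segment, was_helpful) pairs, for example
    one pair per click on a helpful/not helpful prompt.
    """
    totals = defaultdict(int)
    helpful = defaultdict(int)
    for segment, was_helpful in feedback:
        totals[segment] += 1
        if was_helpful:
            helpful[segment] += 1
    return {segment: helpful[segment] / totals[segment] for segment in totals}


def weakest_segments(feedback, threshold=0.5):
    """Segments whose helpful rate falls below the threshold, sorted by name."""
    rates = helpful_rate_by_segment(feedback)
    return sorted(segment for segment, rate in rates.items() if rate < threshold)
```

The segments this returns are candidates for gathering more data to feed back into the model.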
Eleni: So how do you know when it’s that sweet spot—when it’s time to launch?
Inga: To the extent possible, you should dog food internally. You’ll know you’ve hit the jackpot if people internally use it, really engage with it and say, “Wow, this feels like magic.”
Eleni: It’s interesting that you use a word like “magic” to describe something so technical.
Inga: I think to the end user, it should feel like magic. It has to feel like something they weren’t expecting you to know about them. But internally, I think it’s dangerous to think of it as magic. If we think about it as magic, then we expect that it can do anything. We should know the limitations.
I think the only part that is a little bit magic, particularly in deep learning, is that you might not always know how a machine learning model arrived at a specific prediction. However, this makes it really hard when you’re trying to debug things and find model weaknesses.
Eleni: You spoke earlier about making sure that the output from the machine learning product reflects what the user actually wants. How do you think about any biases that the learning process could be surfacing? And how do you deal with that?
Inga: I think a good rule of thumb is garbage in, garbage out. How representative your data set is when you train the model is how representative it’s going to be when it starts predicting. This is a case for having a diverse set of people working on machine learning projects and problems. You might not consider edge cases if you haven’t thought about it before, experienced it or seen it. You also see this in the user feedback, and that’s a case for shipping your model sooner under a beta flag.
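One simple way to check the "garbage in, garbage out" rule is to compare how often each group appears in the training data with how often it appears among real users. The Python sketch below is illustrative only: the group labels, the function names, and the 5% tolerance are assumptions, and a real check would need a meaningful way to assign groups.

```python
from collections import Counter


def share_by_group(items):
    """Map each group label to its share of the list."""
    counts = Counter(items)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented_groups(training_groups, user_groups, tolerance=0.05):
    """Find groups whose training share trails their user share by more than tolerance.

    Returns a dict of group -> gap, where gap is user share minus training share.
    """
    train_share = share_by_group(training_groups)
    user_share = share_by_group(user_groups)
    gaps = {}
    for group, share in user_share.items():
        gap = share - train_share.get(group, 0.0)
        if gap > tolerance:
            gaps[group] = round(gap, 3)
    return gaps
```

A group that shows up among users but not at all in the training data gets the largest gap, which is exactly the kind of edge case that is easy to miss.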
Sometimes your model will make really bad mistakes. Something that just happened with Facebook was that their ad algorithm allowed advertisers to target people based on hate terms that they put in their profile. The machine learning model presumably isn’t able to distinguish a racist thing that someone put in their profile under “interested in” or “works at.” The machine learning algorithm might allow you to target a bunch of people based on where they work but then someone trolls the “works at” part of their profile and says, “works at I hate Jews”, for example. The machine learning model doesn’t know this is a hate term but we know this as humans. You have to be really careful in monitoring that, and I think Facebook just came out with an apology about that.
Eleni: Is there anything else PMs need to be aware of to make sure that sort of thing doesn’t happen?
Inga: I think it just goes back to designing for when the model fails—and making sure that the data that you feed into the model is really representative.
Eleni: How do you guys make sure of that at Squarespace?
Inga: We try to keep in mind that our userbase is extremely diverse and that we probably don’t know all of their different use cases and nuances. We try to not focus too heavily on the big verticals that we know and are familiar with. For example, we have falconry websites on Squarespace. And then we have Chance the Rapper and Pixar. We’re always thinking about how we can design solutions for the really small businesses and the large ones.
Eleni: Do you have any examples of models failing? Has that happened to you, and how did your team correct in that instance?
Inga: I think the model will always fail. It’s just about the amount of user patience and user trust you have, and what format you’re delivering your prediction or determination in. I think it’s very different if it’s a mobile push notification or an email versus a thing that a user can easily dismiss or ignore. With push notifications, you need to be a lot more careful. If you fail a bunch of times and you send a bunch of push notifications that are useless, the user will never use it again.
There’s also a learning curve. Sometimes you might feel like a feature is really important, and you want to take the risk of being too intrusive and being too loud to get feedback. If you are less intrusive or less loud, you might not get the right feedback. It might be ignored.
Eleni: What are some of your favorite machine learning applications that you use in the real world?
Inga: I love the Google Translate app whenever I’m traveling because it can translate things in real-time—like menus at a restaurant or signs. I think that’s an awesome application of deep learning.
With Spotify Discover Weekly, I’m of the camp that it is really predictive and great for me. I think they really try to make a cohesive playlist for you. It’s a great example of all of the additional product decisions that they made on top of the machine learning algorithm. The fact that Discover Weekly updates every Monday was an intentional decision. You have to save the songs that you get every week in order to keep them. It seems obvious in retrospect. It makes sense to get a new feed every Monday to look forward to, but that’s not super obvious when you’re designing something like that.
Eleni: When you encounter a machine learning process in the wild, do you try to dissect it? What’s your reaction when you see something like that?
Inga: I use a lot of different products and I seek out ones that I think use machine learning. It’s actually not super obvious all the time. Sometimes the company will publish blog posts about their models. I’ll also talk to my machine learning team and be like, “Spotify Discover Weekly, how do you think they built that? What do you think the models they’re using are?” That’s how I continue to build intuition as a PM.
Eleni: Let’s say there was a product team who was really overzealous with machine learning. What would be your words of advice to that team to kind of rein it in and self-correct?
Inga: Machine learning is quite expensive and time-consuming. So if you don’t need to use it, don’t. I think of machine learning as one tool of many that I could be using. You should apply it to the right problem. There’s a list of questions that I go through where I’m like, “Do we have the data? Is the solution going to be 10x better if you used massive amounts of data versus making a rules-based solution?” Asking yourself all those questions will probably weed out a lot of ideas.
Eleni: On the flip side, if there’s a product team who’s reticent and they haven’t yet dived into machine learning, what would you tell them?
Inga: If you think machine learning could transform your product or your problem area (which I think it will in most industries), figure out what you can do today to get yourself ready for machine learning down the line—in six months, in one year. Do you have that data yourself? Are you collecting it? Is it in the right format?
If you ultimately determine machine learning isn’t going to transform your product area or product problem, then at least you considered it and you’re being intentional about it.
Eleni: What do you think is the most surprising thing you’ve learned about machine learning since you started working on it as a product manager?
Inga: In some ways, machine learning is still super, super early. It is actually really hard to determine how a machine learning model got to a certain decision—and sometimes when you’re trying to get to model weaknesses, you’re not actually really sure how it got to that decision. It is a little bit of a black box.
Eleni: What makes you really excited about this technology?
Inga: The fact that we’re now in a world where there’s so much data everywhere. The Internet of Things has given us so much data for physical objects that we can now harness all of it to help make our lives better and easier.
Machine learning is able to harvest and process a lot more data than we can as humans. We can use the power of machine learning to help us improve our lives and solve our problems so we can hopefully focus on even bigger problems that only we can solve. I think there’s all this talk about, “Will AI eventually overtake us and become our overlord?” I think we’re still far away from that.
Eleni: What is your reaction when someone says something like that? Is AI going to become our overlord?
Inga: It’s funny. When you dig deeper into the state of the art of machine learning, you’ll realize how early it is. We are very, very far away from a sentient AI that is able to perform all the complex human tasks that we do over our lifetime.
I think a good example or heuristic of that is from Andrew Ng, one of the major thought leaders and contributors on machine learning. He says that tasks that take less than one second of human thought are great for machine learning. These tasks are not that complex. They’re a lot more complex than maybe rule-based solutions, but they’re not very complex yet. That’s kind of where we are so far.
Eleni: Inga, thanks so much for doing this chat. It was really awesome talking to you.
Inga: Definitely. I really enjoyed it.