Analytics in E Major
The Echo Nest marries technology and analytics to connect music lovers with an ever-changing database of sound.
The music business has been radically changed by the move to digital media. Not only can music be downloaded with just a few clicks to any number of devices, but there is also more music being made, and more sources from which music lovers can get it.
In fact, the sheer mass of sound out there in the digital universe can be overwhelming.
The Echo Nest, a self-described “music intelligence” company recently acquired by Spotify, uses machine-learning technology to connect people with music they love. The company’s goal, says CEO Jim Lucchese, is to do “what a great deejay does, or the friend that you rely on musically: to better understand who you are as a fan, understand all the music that’s out there and make that connection.”
In this interview with MIT Sloan Management Review guest editor Sam Ransbotham, Lucchese describes how the company merged two perspectives — machine learning and cultural analytics — to describe music in a way that made it analytics-friendly, with the goal of using analytics to help users find new music they’d enjoy.
Let’s start with the basics: How do you describe what your company does?
We describe ourselves as a “music intelligence” company. And we really view our job as understanding all of the music that’s out there in the world, and understanding each individual music fan, to do a better job connecting the two — do a better job connecting people with the music that they love. And we apply a lot of technology and machine-learning approaches to do that, but really, at our core, what we’re trying to do is what a great deejay does, or the friend that you rely on musically: to better understand who you are as a fan, understand all the music that’s out there and make that connection.
We do that, particularly prior to the acquisition [by Spotify, in March 2014], as a B2B company, so we’re in the business of providing this data platform to other consumer-facing services. Now, most of our work, or much of it, is focused on improving the Spotify user experience… but that would be my quick description.
The Echo Nest is in a unique position — I think 2007 was the start — of being truly digital native. So, my assumption is that your data, working from an evidence base, is pretty core and central to what you’re doing. Is that fair?
Yes. Whether we’re talking about external product strategy or internal operational strategy, it’s true in both cases. But I think one of the advantages that we had (you hit on it) was that, among approaches to trying to understand music, we came along at a very opportune time, in that music had just transitioned from physical retail, going to a Tower Records, to digital.
And what that left anyone in the consumer-facing music space with was a huge challenge, where there were 20 million songs now available on a hard drive in the sky, but there had been very little work in trying to understand such an enormous amount of music in any meaningful way.
And the previous approaches were largely manual. They were editorial. So, for example, one of the larger players in this space hired editors to write information down about music, which is a great way to get a certain level of understanding, but, obviously, has a scale problem.
Oh, certainly.
These guys actually bought a music retailer so they could kind of get earlier access to the physical CDs, so that they’re literally shipping physical CDs around and paying editors to write about them. Whereas, our approach is all software and machine-learning based.
So, if you come into this with no legacy processes or approaches that are based on the kind of physical or manual editorial world, I think it is a lot easier to move more quickly and to move at a level of scale that just wasn’t possible even a few years before we really got going.
That actually speaks to how you operationalize things internally. So, how do you get that machine-learning piece plugged into an industry that maybe doesn’t come from that point of view?
That’s where I think another huge opportunity that we exploited, one that was so central to our success, was the open developer API [Application Programming Interface]. So, our first product was our developer API, and it is still our primary product today. Not only did we enter the market with a data analysis and synthesis approach that was novel, we did it at a time when the idea of a developer API was young but generally accepted. Basically, we could walk into that office that you described and say things like, “We can generate a great radio listening experience on any artist you guys can come up with.” We could show that, in real time, through the API.
And prospective customers, literally developers in that room, could build on it right then, I mean right there in the meeting. Compare that to the previous approaches to enterprise software, where a shrinkwrapped box shows up, and they need a bunch of guys to drop a server into a room. And it could be literally months and hundreds of thousands of dollars of risk that someone needs to absorb to even get an understanding as to whether or not a technology provider’s solution could work.
We leave you with an open developer key. Your developers can build on our stuff. We don’t even need to be there, and you can test us all you want. And not only can you test us, you can start prototyping. So, that really dramatically lowers the time to evaluation and ultimately to implementation. And then you go back to our entrenched competitors, who came more from the enterprise software world, and what you get is, “We’ll send a couple of engineers to your site for a couple days.”
I mean, we could just move at such a level of speed. That was quite simple, and if you had a small front-end development team, that’s really all you need to know. You don’t even need to believe us on the underlying science or technology when you can see the actual results in the developer API. So, I think that the distribution approach through an open developer ecosystem was equally important in getting us out there.
So as I understand it, your only asset is really the data and algorithm? Is that fair before I go much further?
No, I don’t think so. It depends on the nature of the data. And so, to back up a little bit, I can give you a quick baseline around how we do what we do, and from there, I think that will probably help inform the “what we’ve got” question that I think you’re asking me.
So, how do we do it? Basically, the vision of understanding the world of music is a combination of Brian [Whitman] and Tristan [Jehan]’s research, which came out of their work at the [MIT] Media Lab and before.
Yeah, so Tristan’s background is in machine listening, so we’ve developed a piece of software called Analyze that can actually listen to music. It can analyze a full-length song in a few seconds and render psychoacoustic attributes, like the pitch of the song, the key, the tempo, whether there are vocals in it or if it’s instrumental, whether it’s live or studio, the overall energy of the song, things that indicate mood, all kinds of things. It basically captures in software how people describe songs based on what they hear. That’s half of this musical brain that Brian and Tristan built.
The other half is Brian’s side, which we call cultural analysis, which is large-scale web crawling and language processing, where we index and read about music everywhere on the web. We probably index about 10 million documents every day, analyzing the text on those documents to understand how the online world is describing every artist, album and track, and then doing analysis around who’s doing that describing.
So, you combine the two, and you get an understanding of how the music sounds, and then you’ve got all of this social or cultural signal. We put all that together through a wide series of web services, APIs, to generate playlists, make recommendations, do all kinds of things.
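As a rough illustration of "combining the two," the sketch below blends an acoustic similarity score with a cultural one to rank candidate tracks against a seed track. Everything in it is an assumption made for illustration: the feature names, the sample values, the use of cosine similarity, the `acoustic_weight` parameter and the function names are hypothetical and are not drawn from The Echo Nest's actual software or API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def blended_similarity(seed, other, acoustic_weight=0.5):
    """Weighted mix of 'how it sounds' and 'how people describe it'."""
    acoustic = cosine(seed["acoustic"], other["acoustic"])
    cultural = cosine(seed["cultural"], other["cultural"])
    return acoustic_weight * acoustic + (1 - acoustic_weight) * cultural


def rank_candidates(seed, candidates, acoustic_weight=0.5):
    """Return candidate titles ordered from most to least similar."""
    scored = [
        (blended_similarity(seed, c, acoustic_weight), c["title"])
        for c in candidates
    ]
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical data: acoustic values are made-up normalized attributes,
# cultural values are made-up weights for terms found in web text.
seed = {
    "title": "Seed",
    "acoustic": {"tempo": 0.6, "energy": 0.8, "vocals": 1.0},
    "cultural": {"indie": 0.9, "dreamy": 0.5},
}
candidates = [
    {"title": "A",
     "acoustic": {"tempo": 0.6, "energy": 0.7, "vocals": 1.0},
     "cultural": {"indie": 0.8, "dreamy": 0.6}},
    {"title": "B",
     "acoustic": {"tempo": 0.6, "energy": 0.8, "vocals": 1.0},
     "cultural": {"metal": 1.0}},
    {"title": "C",
     "acoustic": {"tempo": 0.1, "energy": 0.1, "vocals": 0.0},
     "cultural": {"classical": 1.0}},
]

ranking = rank_candidates(seed, candidates)
```

In this toy data, track B sounds identical to the seed but is described in entirely different terms, so it only ranks first when the cultural signal is ignored (`acoustic_weight=1.0`). That is the intuition behind using both halves rather than either one alone.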
So, when you ask what are the assets that we’ve got, there’s this software, Analyze. There’s the actual analysis. There’s the enormous web crawling infrastructure and natural language processing infrastructure to understand what’s happening on the web all the time. And then there’s all the data that we derive from that analysis, and the algorithms that mix and match that stuff together to solve specific use cases.
And so when we license, we license access to those APIs or web services, and we license any of the incidental Echo Nest data for a term that would be available to a customer.
So, I think the data is not just “out there”; most of the data that we are applying is Echo Nest data that is licensed incidentally to someone for a term. And it’s really in the kind of — I agree with your second point, in that the algorithms and technology and web services that make use of those data [are] really what our customers are most focused on, but they’re really licensing everything kind of in a package. If you ask them what’s most important, they’d probably say the solution. They know that—
The end result is what matters, as long as the data sets you use to get to the result are accurate, then?
Yeah, exactly. They know the data has got to be good or they’re going to get bad results, so they probably say, “Yeah, the data needs to be there.” But ultimately, they want to generate the right recommendation, or playlist, or provide the right context for a listener. And so, it’s a combination of the quality, depth and accuracy of the data, the infrastructure to kind of keep that moving all the time and growing all the time because, as you know, music is changing every day. The trends are changing, and you need to be on top of that, and then the services make all that usable in a specific application.
So, when we position it to an application developer or our customer, it’s really just about just the problem they’re trying to solve. But if you break down the component parts, I would say it’s the mix of all of those things that really kind of are the collective asset.
It seems like you’re on the far edge of an ecosystem that’s developing around the idea of algorithms and data doing what maybe 100 years ago the machines were doing in production. I’m kind of curious where you see that going — what your projection or thinking is for the future of information-based products?
Data alone in massive quantities is a business challenge, but I don’t think it gets you a ton of value. I think really the key is where you can not only bring in massive amounts of data, but then make sense of it in a way that also solves a specific problem. In our case, it’s help understand this individual consumer end-user, help understand this music fan, and then help understand this massive catalog of music to connect those two things, and then we do that as a service layer.
And I agree with you that there are enormous opportunities there that go back, I think, to those two differentiating factors that we started with. Number one is the scale of data processing that we can enable and then number two is delivering that in a really simple developer platform to make it easy for a specific customer base to get at it. That applies to, I think, all kinds of industries.
Music is a great problem set because there is such an enormity of music out there, and it’s constantly changing, as are music fans, and there’s a diversity of music applications out there that want to connect the two. But I think you could look at all kinds of other markets out there that check a lot of those same boxes. And where those boxes are checked, I think that’s where there’s still a ton of growth, not only for data-oriented companies, but for ecosystem players, if they can figure out ways to make the data and understanding available in an open-type platform.
I’d say that when we talk about it internally or at a board level, we used to refer to the model as data as a service, which takes the idea of data as our core asset, meaning our ability to generate and make sense of it, and then positions it more like a SaaS [software as a service] business. I mean, our priorities have changed a bit post-acquisition.
But I think our pre-acquisition B2B business model was a recurring license fee for a term where there’s a lot of lock-in, because once you start relying on a SaaS service, whether it’s a Salesforce, or The Echo Nest, or whatever, the switching costs are very, very high. I think they’re even higher when you’re talking about a core underlying data service that powers your application. I mean, that becomes central or foundational. But it’s also very capital-efficient, in that there’s this one version of The Echo Nest product, and it’s sitting as a web service that is supporting lots and lots of customers who are doing lots and lots of different things, where all the service and support lives right here.
So, just stepping back, as a business model, the idea of solving specific data-intensive problems but doing so through a web service allows for that type of really high lock-in, capital-efficient model, where you can serve a relatively large market space, and a diverse one, with a common set of tools.