MIT SMR Connections

The content on this page was commissioned by our sponsor, AWS.

MIT SMR Connections is an independent content creation unit within MIT Sloan Management Review. We develop high-quality content commissioned and funded by sponsors. We welcome sponsor input during the development process but retain control over the final product. MIT SMR Connections operates independently of the MIT Sloan Management Review editorial group.

In this Q&A with MIT SMR Connections, Rahul Pathak, general manager for analytics at Amazon Web Services (AWS), describes why data-driven enterprises matter, explains how companies can build strong data foundations, discusses technical and organizational challenges, and offers forward-looking advice.

This conversation has been condensed and edited for clarity, length, and editorial style.

MIT SMR Connections: How do you define a data-driven enterprise, and why is it important?

Pathak: Data-driven enterprises base their strategic decisions on data. This requires thinking about data as an asset, figuring out how to gather it from across the enterprise, bringing it together, and analyzing and using it to inform decision-making. These decisions can be about driving efficiencies or reducing costs, but also about finding new opportunities and new areas to expand. So it’s not just a technical challenge; it requires companies to change how they operate and think.

Companies are looking for ways to get early signals about how to respond and react. Being data-driven helps them make decisions based on what they’re actually able to measure and see, rather than acting on gut feelings.

For instance: How can you maximize your ability to interact with your customers? How can you retain them? How do you reduce churn? How do you think about what else to offer them? How can you find more customers like them? That all comes back to how efficiently you use data about customers and about how the organization is operating.

MIT SMR Connections: Flexibility and agility seem to be more important than ever now. Why is that the case?

Pathak: It’s impossible to plan for every decision that companies face. In just the last few months, we’ve seen so much volatility in the environment, and companies need to be able to adapt quickly.

Again, being data-driven enables you to start getting all the signals about what’s happening, and then you can put yourself in a position to respond. Once you’ve got the ability to understand what’s happening in your business and the ability to act on it, you can start to make good decisions. The faster you can adapt to the changing environment, the more successful you’ll be at navigating it.

You need to trust that your data foundation — your combined cloud-based technology infrastructure, your culture, and your processes — will support any manner of future change. Data can inspire confidence during unexpected times, although it requires meaningful data and the right type of data foundation.

About AWS

For 14 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. AWS offers more than 175 fully featured services for compute, storage, databases, networking, analytics, robotics, machine learning and artificial intelligence, internet of things, mobile, security, hybrid, virtual and augmented reality, media, and application development, deployment, and management. Services are offered from 77 Availability Zones (AZs) within 24 geographic regions, with announced plans for nine more AZs and three more AWS Regions in Indonesia, Japan, and Spain. Millions of customers — including the fastest-growing startups, largest enterprises, and leading government agencies — trust AWS to power their infrastructure, become more agile, and lower costs.

Learn more at aws.com/big-data and aws.com/databases.

MIT SMR Connections: What does a data foundation require in terms of both technology and policy?

Pathak: A strong data foundation starts with low-cost, reliable, highly durable data storage that’s scalable. It’s not about hoarding data in silos; the exponential value comes from combining and using data over and over at scale, quickly. To gain the agility needed to respond to the unexpected, you need a data foundation that allows you to capture, clean, and curate data from across the business to be shared and reused for making everyday decisions.

Then you need to think about security and governance. We’re finding that our customers want to give data access broadly to analysts and users in their organizations — but in a controlled way. So you have to define your governance policies, your security, who’s allowed to see what. Once you’ve got those guardrails in place, then you’re able to let people experiment and innovate using that data, secure in the knowledge that you’ve set the foundation for ensuring that data is used in ways consistent with what you’re trying to do from a data policy perspective.
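As a rough illustration of "who's allowed to see what," the guardrail can be thought of as an explicit allow-list that is checked before any data is handed over. The sketch below is only a minimal example under stated assumptions: the role names, dataset names, and the `can_read` helper are all hypothetical and do not correspond to any particular AWS service or API.

```python
# Hypothetical governance policy: each role maps to the datasets it may read.
# Role and dataset names are illustrative assumptions, not from the interview.
POLICY = {
    "analyst": {"sales", "web_traffic"},
    "finance": {"sales", "payroll"},
}


def can_read(role: str, dataset: str) -> bool:
    """Return True only if the policy explicitly grants the role access."""
    # Unknown roles get an empty set, so access is denied by default.
    return dataset in POLICY.get(role, set())


if __name__ == "__main__":
    for role, dataset in [("analyst", "sales"), ("analyst", "payroll"), ("intern", "sales")]:
        print(role, dataset, can_read(role, dataset))
```

Denying by default is the design choice that matters here: broad access comes from adding entries to the policy, not from the absence of a rule.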

MIT SMR Connections: How should organizations interested in becoming data-driven enterprises be thinking about their infrastructure?

Pathak: The first step is to move away from antiquated, monolithic apps that run on one-size-fits-all relational databases toward highly distributed microservice-based systems running on multiple purpose-built databases. We’ve evolved to a world where it’s no longer one-size-fits-all. You want to be able to use a service that’s mapped to the workload in question.

The other part is thinking about moving to managed services, where you allow someone like AWS to take care of the undifferentiated heavy lifting — the provisioning, patching, scaling, and securing of the systems that you’re using — and you focus instead on working with [application programming interfaces] and what differentiates your business. [For a real-life example, see “From Struggling to Seamless: Ensuring a ‘Flawless’ Global Education Event.”]

From Struggling to Seamless: Ensuring a ‘Flawless’ Global Education Event

Code.org’s signature initiative is known as the Hour of Code, but that’s a deceptively simple name. After all, the annual event involves millions of students and teachers worldwide engaging with hundreds of computer science tutorials offered in more than 45 languages.

For many students, the Hour of Code provides first-time access to computer science — a potentially life-changing experience. But for Code.org, a Seattle-based nonprofit dedicated to expanding access to computer science education, the weeklong event also creates an enormous technology challenge.

“During the school year, our online platform is already a very high-usage learning management system, executing about 1,000 database write queries per second,” says Will Jordan, Code.org’s lead infrastructure engineer. Usage surges by two to three times that amount during the Hour of Code. As Jordan notes: “That’s an extraordinary workload.”

And as more people participated in the Hour of Code each year, that workload grew as well. By 2018, the database was clearly reaching its performance limits, and event organizers struggled to provide consistently reliable service for that year’s event, disabling some features to reduce database workload. Ultimately, Code.org migrated its MySQL databases to Amazon Aurora, a fully managed database service.

The result? “In December 2019, we had an absolutely flawless Hour of Code,” Jordan recalls. “We offered the full range of functionality on our learning platform to all students around the world for the entire week.” The 2019 event saw 100% uptime, 50% fewer support tickets than were created in 2018, and 10 times better database write latency than Code.org’s previous solution.

Today, about 30% of all U.S. students have Code.org accounts, and the organization is working to expand its reach internationally. “Thanks to AWS, the performance of our systems for students around the world is very high,” Jordan notes. “They get a seamless experience, no matter where they are.”

You need to think about similar activities for analytics and how you get your data into one place using managed systems and a mix of open-source or specific purpose-built systems.

We also see a lot of modernization of data warehouses. Customers typically have been running expensive, constrained data-warehouse appliances on premises. Now they’re moving to more decoupled models in the cloud, where you’ve got a data lake integrated with a data warehouse, which we call a lakehouse architecture. Customers are modernizing and moving to services like Amazon Redshift, which gives you performance at scale at a radically different cost point than customers had before.

This also involves building modern applications using those purpose-built databases. Once you’ve got all your data operating in the right set of services, you want to derive insight from it. So it’s also about breaking down silos within the organization so that you can build a complete picture of your data, then providing democratized access to users who can do something with it. By providing controlled access to a broader set of people in the company, you’re able to innovate and experiment faster, figure out what’s working and what isn’t, and double down on the areas that are successful.

MIT SMR Connections: What role does organizational culture play here?

Pathak: When companies want to become more data-driven, they’ve also got to teach people in the company to become more data-driven. That requires reskilling and retraining. But it also requires the ability to put data in everybody’s hands so that the whole organization can experiment and interact with data.

So this is about helping to accelerate learning and building data literacy into the organization. That starts with finding and eliminating silos, but also with establishing a culture where folks think about data as a strategic asset and think about what they want to measure when they put new initiatives in place, where that data needs to end up, and how it needs to be acted upon. Accomplishing that requires thinking about questions like “What’s the data to back this up?” and “How can we use this data to make a better decision?” and making that part of everyday conversations within the company. Essentially, leaders need to set the tone that data is a tool, an asset they can use to make better decisions, and then reinforce that message at all levels of the organization. That’s how culture can help transform companies in terms of how they use data.

MIT SMR Connections: What challenges do companies face in transitioning to data-driven enterprises?

Pathak: Typically, the biggest challenge is inertia and processes that have been ingrained in the company over time. It starts with moving away from the mindset of “This is how we’ve always done it.”

A simple example: In the legacy world, when we had really expensive systems, we would make decisions about what data to keep and what to throw away. We found that the vast majority of data that companies had was discarded because it was too expensive to put it into a system and think about how to analyze it when it wasn’t precisely clear what it would do for you.

But when you transition to a modern cloud-based architecture, we’ve really tried to remove that cost constraint. And it turns out that much of that discarded data might actually be valuable. So now the default is “Let’s just store everything because we might not know what we want to do with the data or what insights we might get from it down the road.” But going that route involves a mindset shift. It requires a company to set up those guardrails we talked about and then let people loose with the data, train them to do something useful with it, give them the tools that they need to iterate at their own pace, and have the ability to adapt.

As companies move from legacy systems to more modern ones, they’re able to take advantage of the best tooling that’s in place at any time without having to do expensive migrations or re-platforming. That allows them to do new things with data that they couldn’t do before.

A simple example: You could use machine learning services to transcribe audio and then start to run sentiment analysis from that audio. So now what used to be a recording is actually data that you can start to run queries on. For instance, “How many of the calls that happened in the last week were positive versus negative, and what does the trend look like?” That’s enabled by new technology that any customer on AWS can leverage, even without prior machine learning expertise.
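To make the call-sentiment question concrete, here is a minimal sketch of the query step only. It assumes that transcription and sentiment analysis have already happened upstream and that each call has been reduced to a date and a label; the sample records, the label names, and both helper functions are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical records: (call date, sentiment label) pairs, assumed to be
# produced by an upstream transcription and sentiment-analysis step.
calls = [
    (date(2020, 6, 1), "POSITIVE"),
    (date(2020, 6, 1), "NEGATIVE"),
    (date(2020, 6, 2), "POSITIVE"),
    (date(2020, 6, 3), "NEGATIVE"),
    (date(2020, 6, 3), "NEGATIVE"),
    (date(2020, 6, 5), "POSITIVE"),
    (date(2020, 6, 7), "POSITIVE"),
    (date(2020, 5, 20), "NEGATIVE"),  # outside the one-week window below
]


def sentiment_counts(records, start, end):
    """Count sentiment labels for calls dated within [start, end]."""
    return Counter(label for day, label in records if start <= day <= end)


def daily_positive_share(records, start, end):
    """Return {day: fraction of positive calls} for each day that had calls."""
    per_day = defaultdict(Counter)
    for day, label in records:
        if start <= day <= end:
            per_day[day][label] += 1
    return {
        day: counts["POSITIVE"] / sum(counts.values())
        for day, counts in sorted(per_day.items())
    }


# "Last week" is taken here as the seven days ending on a fixed example date.
end = date(2020, 6, 7)
start = end - timedelta(days=6)
week_counts = sentiment_counts(calls, start, end)
trend = daily_positive_share(calls, start, end)

if __name__ == "__main__":
    print("Positive vs. negative:", week_counts["POSITIVE"], week_counts["NEGATIVE"])
    for day, share in trend.items():
        print(day.isoformat(), f"{share:.0%} positive")
```

Once recordings are reduced to rows like these, the weekly count and the day-by-day trend are ordinary aggregations, which is the sense in which a recording becomes data that can be queried.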

MIT SMR Connections: What else should companies know about becoming more data-driven?

Pathak: We recommend that companies think about modernizing, that they think about data strategically, and that they set up to future-proof themselves. That means thinking about embracing open data formats and data lakes, thinking about lakehouse architectures so they get the best of data warehousing and data lakes, and thinking about how to focus on what differentiates them rather than focusing on the classic undifferentiated IT. These things together put you in a position to iterate rapidly, to learn rapidly, as an organization. The faster you’re able to learn, the more successful you’re going to be, especially as the environment continues to be dynamic.

EXECUTIVE BIO

Rahul Pathak, general manager for analytics at AWS, is responsible for Amazon Athena, Amazon Elasticsearch, Amazon EMR, AWS Glue, AWS Lake Formation, and Amazon Redshift. Previously, he was AWS general manager for emerging databases and blockchain, AWS general manager for several managed database and analytics services, and the product manager who oversaw the launch of the Redshift cloud data warehouse service. He has cofounded companies focused on digital media analytics and IP geolocation. He received a bachelor’s degree in electrical engineering and computer science from MIT and an executive MBA from the University of Washington.

Becoming a Data-Driven Enterprise: Meeting the Challenges, Changing the Culture
