Me, Myself, and AI Episode 908

Building Connections Through Open Research: Meta’s Joelle Pineau

Topics

Artificial Intelligence and Business Strategy

The Artificial Intelligence and Business Strategy initiative explores the growing use of artificial intelligence in the business landscape. The exploration looks specifically at how AI is affecting the development and execution of strategy in organizations.

In collaboration with

BCG

Joelle Pineau’s curiosity led her to pursue a doctorate in engineering with a focus on robotics, which she describes as her “gateway into AI.” As vice president of AI research at Meta, Joelle leads a team committed to openness in the service of high-quality research, responsible AI development, and community contribution.

In this episode of the Me, Myself, and AI podcast, Joelle, who is also a professor at McGill University, weighs the advantages industry and academia each have for conducting artificial intelligence research. She also describes specific AI research projects Meta is working on, including scientific discovery initiatives focused on addressing societal problems like carbon capture.

Subscribe to Me, Myself, and AI on Apple Podcasts or Spotify.

Transcript

Shervin Khodabandeh: Why is being open about your company’s AI research a benefit more than a risk? Find out on today’s episode.

Joelle Pineau: I’m Joelle Pineau from Meta, and you’re listening to Me, Myself, and AI.

Sam Ransbotham: Welcome to Me, Myself, and AI, a podcast on artificial intelligence in business. Each episode, we introduce you to someone innovating with AI. I’m Sam Ransbotham, professor of analytics at Boston College. I’m also the AI and business strategy guest editor at MIT Sloan Management Review.

Shervin Khodabandeh: And I’m Shervin Khodabandeh, senior partner with BCG and one of the leaders of our AI business. Together, MIT SMR and BCG have been researching and publishing on AI since 2017, interviewing hundreds of practitioners and surveying thousands of companies on what it takes to build and to deploy and scale AI capabilities and really transform the way organizations operate.

Hi, everyone. Today, Sam and I are happy to be joined by Joelle Pineau, vice president of AI research at Meta. Joelle, thanks for speaking with us today.

Joelle Pineau: Hello.

Shervin Khodabandeh: OK, let’s jump in. A good place to start might be for you to tell us about Meta. A lot of people know what Meta is, but maybe you could describe it in your own words, and also your role in the company.

Joelle Pineau: Well, as many people know, Meta is in the business of offering various ways for people to connect and build community, build connections, whether it’s through Facebook, WhatsApp, Instagram, [or] Messenger. We have billions of people around the world using our products. I have been at Meta for about seven years now leading AI research teams. I’m based in Montreal, Canada, and now I lead FAIR, our fundamental AI research team, across our labs in the U.S. and Europe. The role of our group is actually to build next-generation AI systems and models [and] discover the new technology that will eventually make the products better, more engaging, and safer as well.

Sam Ransbotham: That’s a great overview. Can you give us a sense of what some of those projects are that you’re excited about or you’re working on? You don’t have to give us any secrets, of course, but what are some fun things you’re excited about?

Joelle Pineau: Well, I hope we have a chance to talk about it, but there’s not a ton of secrets because, in fact, most of the work that we do is all out in the open. We adhere strongly to open science principles. We publish our work; we share models, code libraries, and so on and so forth.

Our teams cover the full spectrum of AI open problems, so I have some teams who are working on understanding images and videos, building foundation models — the core models that represent visual information. I have some teams that are working on language models, so understanding text — written and spoken language as well. I have some teams doing robotics — so understanding how AI systems move in the physical world, how they understand objects, people, interactions — and a big team of people who are working on core principles of intelligence: How do we form memories? How do we actually build relationships between different concepts and an ontology of knowledge? — and so on and so forth.

Sam Ransbotham: It seems like there’s almost nothing within artificial intelligence that you’re not working on there. Tell us a bit about why you think open is important.

Joelle Pineau: So FAIR has been committed to open research for 10 years now, since day one. We’ve really pushed on this because whenever you start a project from the point of view of making it open, it really puts a very high bar in terms of the quality of the work, as well as in terms of the responsibility of the work. And so when we decide what algorithms to build, what data sets to use, how to evaluate our data, and how to evaluate the performance of our model through benchmarks — when we know that all of that work’s going to be open for the world to scrutinize, it really pushes us to put a very, very high bar on the quality of that work, on the rigor of that work, and also on the aspects of responsibility: safety, privacy, and so on.

The other reason I think that open is really helpful is, a lot of researchers come from a tradition of science, where you’re always building on the work of others. Science does not happen in a silo, and so if you’re building on the work of others, there’s also a desire to contribute back to that community, so our researchers are incredibly interested in having that kind of a culture. So it helps us recruit the best researchers [and] keep them. It is quite different from how other industry labs operate, and so from that point of view, I think it’s definitely a big advantage.

What’s interesting is, from the point of view of Meta, there’s no concern with making that research open rather than keeping it internal, in the sense that openness doesn’t in any way stop us from using this research in our products. [It’s not as though], because we’ve published the results, we can’t use it in our product. Really, the power of the product comes from all the people using it; it doesn’t come from having a secret sauce about AI. And we know how fast AI is moving. A lot of that knowledge is disseminated across the community very, very quickly, and we are happy to contribute to that.

Sam Ransbotham: That makes a lot of sense. A lot of my background is in computer security, and security is a great segue to openness because of both of those points. First, in security, anybody can design something that they themselves can’t break. But the question is, can someone else break it?

Joelle Pineau: Yes.

Sam Ransbotham: And I think that’s always a more interesting and more difficult problem. But then there’s also the idea of building on others’ work. I think that’s huge. If you think about the history of research across all of mankind, historically, research happened in academia, and then, eventually, more basic research became more applied within industry. But it seems like with artificial intelligence, a lot of this has shifted to industry first. In fact, what you described to me sounds very much like an academic lab. So is it a problem that we’re moving basic science from academia … ? Or are we? Maybe I’m begging the question. Is this a move that’s happening? Is this a problem? What do you think?

Joelle Pineau: Well, I think both academia and industry have advantages when it comes to AI research, and I’ll maybe not speak broadly across all areas of research, but for AI, in today’s context, I do think both have significant advantages.

On the one hand, on the industry side, we do have access to vast resources, in particular with respect to compute, and so when it comes to scaling some of the large language models, you know, you need access to thousands of GPUs, which is very expensive; it takes a strong engineering team to keep these running. And so it’s a lot more feasible to do this with a large industry lab. On the academic side, it’s harder to have the resources and the personnel to be successful in terms of scaling large models.

Now on the academic side, there are advantages, and I do have a position in academia. I have many colleagues; I have some grad students. So I think you have to be very smart about what research questions you ask, depending on your setting. On the academic side, we have the privilege of often working with highly multidisciplinary teams. I work with people who come from philosophy, cognitive science, linguistics, and so on and so forth. We ask much broader questions, and, as a result, we come up with different answers.

One of the places where I sort of track the different contributions is looking at some of the very top conferences in the field and seeing, where do the outstanding paper awards go? Do they go to academia? Do they go to industry? And in many cases, we see a mix of both. There’s some really seminal work coming out of both industry and academia that is completely changing the field, that is bringing forth some breakthroughs in AI, so I’m quite optimistic about the ability for researchers across different types of organizations to contribute. And beyond that, we haven’t talked about startups, but there’s a number of small startups that are doing some really phenomenal work in this space as well. And so overall, having a thriving ecosystem is [to] everyone’s advantage.

I think I’m more interested in a lot of our work in looking at ways that we can work together because, in general, I strongly believe that having more diverse teams helps you ask different questions. So a lot of the intent behind our work on open-sourcing is actually to make it easier for a more diverse set of people to contribute. You made the analogy with the security community really relying on open protocols. I think there’s a lot of that in how we tackle this work from the sense of, I have amazing researchers who are putting their best every day into building models, but I do believe, by exposing these models to a broader community, we will learn a lot. So when I make the models available, researchers in academia and startups take these models, in some cases find flaws with them, [and] give some quick feedback.

In many cases, we see derivatives of the model that have incredible value. One of the big launches we had last year is our Llama model: Llama 1, Llama 2, Llama 3. Thousands of people have built derivative models from these, many of them in academic labs, fine-tuning models — for example, to new languages to open the technology to different groups. And to me, that’s where a lot of the value of having different players really comes from.

Shervin Khodabandeh: I think we certainly see the value in, let’s say, collaborating and iterating and keeping things open, but that’s not always guaranteed to happen. What kind of incentives are there for us all to work together like this?

Joelle Pineau: It’s always hard to predict the future, and in particular with AI and how fast things are moving, and so I completely agree with you. What I will say is, as you mention, there’s a strong culture toward open protocols at Meta that predates the AI team. The basic stack — the basic software stack — is also based on many open protocols. And so that culture is there to this day; that culture continues. It goes all the way to the top of the leadership, and that commitment to open-sourcing the models is strongly supported by Mark Zuckerberg and his leadership team. So I don’t see that this is going to stop very soon.

What is going to be important is that we continue to release models in a way that is safe, and that’s a broader conversation than just one company. Governments have several points of view on how we should think about mitigating the risks of these models. There’s also a lot of discussion about how to deal, in particular, with frontier models — the largest, most capable models. And so we’re going to have to have these conversations as a society, beyond just the labs themselves.

Sam Ransbotham: You raise the specter of risks. The worry out there is that “Oh my gosh, these models are going to take over everything, and our world is going to collapse. … This is an existential threat.” I’m kind of setting you up with that straw man, but do you buy that?

Joelle Pineau: I don’t really spend a lot of time planning for the existential threat, in the sense that many of these scenarios are very abstract. They’re excellent stories in terms of science fiction, but in terms of actually taking a scientific and rigorous approach to that, it’s not necessarily the existential risk that takes most of my attention. I will say, with the current generation of models, there are several potential harms to different populations. You know, algorithms have been known to have biases toward underrepresented groups — for example, in facial detection systems — as well as being, on the language side, very Anglocentric. And so I do look quite carefully at the current set of risks and try to measure them as much as possible in a rigorous way.

We build mitigations whenever we can. We’ve invented new techniques for watermarking to help keep false information from circulating. We’ve done a lot of work on bias assessment so that we can actually measure the fairness performance of our algorithms. So I look a lot more at current risks rather than these really long-term ones, just because I feel we can get a handle on them in a way that is based on a rigorous approach, based on metrics, based on really analyzing what the harms are and how to mitigate them. With the very far-fetched scenarios, it’s all hypothetical. It’s hard to build good systems; it’s hard to do good science. It’s also hard to do good policy.

Sam Ransbotham: I think your point’s well taken about bias and metrics. You mentioned, for example, these models that have biases built in, but, I mean, my gosh, they’re built off training data that has massive bias built in. I find it hard to attribute that to the model itself and more to the training data, and your point there is that you can build in bias mitigation there. What kinds of things have you done toward that?

Joelle Pineau: Yeah. In fact, on the question of bias, it’s a little bit of both. There’s no doubt that many of our data sets are biased. The data sets are a reflection of our society, and unfortunately, a large amount of unfairness remains — discrimination, as well as having underrepresented groups in our society. So there’s no question that the data sets themselves don’t start us off on a very good foot.

However, the models themselves also tend to enhance these biases, in that most of the machine learning techniques we have today, they’re very good at interpolating the data — so you sort of take data, distribute it in a certain way, and the models will really push toward the norm of that data. The models tend to be very poor at extrapolating, so making predictions outside of the data set; they tend to have a larger error. So, if anything, when we train the models and we try to sort of minimize the error, you do well by predicting more toward the norm versus toward the sides of that distribution.
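A minimal sketch of that interpolation-versus-extrapolation point, using a toy polynomial regression in Python (the setup and numbers here are purely illustrative, not from Meta’s work):

```python
# Toy illustration: a flexible model fit on data from one region predicts
# well near the training distribution (interpolation) and much worse
# outside it (extrapolation), pulling predictions toward the norm it saw.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)  # assumed ground-truth signal

# Training data concentrated in [0, 1] -- the "norm" of the distribution.
x_train = rng.uniform(0.0, 1.0, 200)
y_train = true_fn(x_train) + rng.normal(0.0, 0.1, x_train.shape)

# A degree-5 polynomial stands in for any flexible learned model.
model = np.poly1d(np.polyfit(x_train, y_train, deg=5))

def mse(x):
    return float(np.mean((model(x) - true_fn(x)) ** 2))

x_inside = np.linspace(0.0, 1.0, 100)    # within the training range
x_outside = np.linspace(1.0, 1.5, 100)   # beyond it

print(f"interpolation error: {mse(x_inside):.4f}")   # small
print(f"extrapolation error: {mse(x_outside):.4f}")  # typically far larger
```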

And so the data’s responsible, [but] the models are also responsible for doing that. And then there’s the way in which we deploy the models. We tend to often look at aggregate statistics, so we look at the overall performance of the model, and, based on the overall performance, we’ll say, “Great. We’ve got 95% performance on this model. It’s ready to be deployed.” But we don’t take the time to look at a more stratified analysis of results: “What is the performance with respect to different groups, and how are these groups differently impacted with respect to how the system is deployed in a bigger system?” I think there are different points where we can be much more rigorous and thoughtful to make sure that we don’t enhance biases and, ideally, that we actually use AI toward a more fair and equitable society.
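As a hedged sketch of that aggregate-versus-stratified point, with entirely synthetic group sizes and accuracy rates chosen to echo the 95% example above:

```python
# Illustrative only: a strong aggregate score can mask poor performance
# on an underrepresented group; stratifying the same results reveals it.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical evaluation set: 950 examples from a majority group,
# 50 from a minority group (all labels and rates here are invented).
groups = np.array(["majority"] * 950 + ["minority"] * 50)
correct = np.concatenate([
    rng.random(950) < 0.97,  # ~97% accuracy on the majority group
    rng.random(50) < 0.60,   # ~60% accuracy on the minority group
])

print(f"aggregate accuracy: {correct.mean():.1%}")  # looks ready to deploy
for g in ("majority", "minority"):
    mask = groups == g
    print(f"{g:>9} accuracy: {correct[mask].mean():.1%}")
```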

Sam Ransbotham: Yeah. I think that point of averaging is huge. … Models feel right when they give us the answer we’re expecting. The image generation feels right when it gives us the image that fits our stereotypes.

Joelle Pineau: Yeah.

Sam Ransbotham: And fighting that seems like quite a difficult problem. But on the other hand, I feel like these models can help solve it in a way that we otherwise can’t: We’re not going to convince everyone in the world to suddenly stop being biased tomorrow or suddenly not have a stereotype tomorrow. But we could convince an algorithm not to have a stereotype tomorrow by tweaking some weights and changing things, and that gives me a little more hope about managing the risks. Perhaps we’re not getting at the existential threat yet, but it seems more plausible to me that way.

Joelle Pineau: I think one of the challenges is determining what we want out of these models, right? We’ve seen some pretty egregious examples recently of groups … from [what] I assume is well-meaning intent to rebalance data sets, especially with representation of, for example, different racial groups in images. Of course, if someone asks for an image of an engineer, you don’t want only men to show up. You would hope to have a few women show up. And there are ways to rebalance the data, and there are ways to compensate at the algorithmic level. But sometimes you end up with very unusual results. And so it’s also a question of, what [is] the distribution of results that we expect and that we tolerate as a society? And in some cases, that’s not very well defined, especially when the representation is biased within the real world as well.

Sam Ransbotham: That seems incredibly hard because the problem switches from being an engineering problem — and engineering problems, you can typically solve with enough pizza and caffeine. And when you get to these more difficult problems, then they tend to be trade-offs and they tend to be choices, and these choices are very difficult. They’re not improving an algorithm, which is the kind of thing that we can get into, but knowing what it should do seems like a much harder problem. And again, that seems much worse, too, as these technologies become so pervasive. If, for example, Meta does make these algorithms available to people as part of the open-source process, by definition more people have access to them, and then more people have to make these difficult decisions. That seems much harder to scale than algorithms.

Joelle Pineau: I agree. I think, in many ways, deciding as a society what we want these models to optimize for and how we want to use them is a very complicated question. That’s also the reason why, at Meta, we often open-source the research models. We don’t necessarily open-source the models that are running in production. That would open us up, I think, to undue attacks, and it’s something we have to be careful about, but we often open our research models. And so that means, very early on, if there are opportunities to improve them, we learn much faster, and so that gives us a way to essentially make sure that by the time a model makes it into a product, it’s actually much better than the very first version. And we will release multiple versions as the research evolves, as we’ve seen, for example, with the Llama language models I mentioned earlier.

You know, we released Llama 1, Llama 2, Llama 3, and so on, and every generation gets significantly better. Some of that is, of course, the work of our own fabulous research teams, but some of that is also the contributions from the broader community, and these contributions come in different forms. There are people who have better ways of mitigating, for example, safety risks; there are people who bring new data sets that allow us to evaluate new capabilities; and there are actually some very nice optimization tricks that allow us to train the models faster. All of that sort of converges to help make the models better over time.

Sam Ransbotham: Yeah. I think the analogy that, I don’t know, that sticks with me is how image processing improved from the 2012 ImageNet competition, you know, again, that came out of academia originally — Toronto — but then exploded as everyone could see what everyone else was doing. Everyone brought something better: a faster implementation, a smaller implementation, a bigger [implementation]. And the accuracy, just over that very short time, got really, truly phenomenal.

Shervin Khodabandeh: Let’s shift gears a little bit. Joelle, you are an AI researcher and a professor. How did you find yourself in this line of work?

Joelle Pineau: I’m very driven by curiosity, I have to say. I first got into robotics. That was sort of my gateway into AI. I was doing an undergrad degree in engineering at the University of Waterloo, and near the end of that, I had the chance to work on a robotics project building a six-legged walking robot — in particular, the sensor system for that robot. So we had some sonars and had to process the information, and from that decide sort of where the obstacles were in the environment. And so that led me to doing graduate studies — master’s, Ph.D. — at Carnegie Mellon University in Pittsburgh, which is a phenomenal place to study robotics. And from there, I really got into machine learning.

I found that for the robot to have relevant, timely information and to be able to make decisions, you needed to have a strong model. So my thesis work was in planning under uncertainty — the ability to take decisions when there’s some uncertainty about the information, and developing algorithms for doing that. And from then on, I took on an academic career at McGill University in Montreal, where I’m still based, and pursuing work across areas of machine learning — a lot of applications of machine learning in health care. We have a fabulous faculty of medicine here at McGill, and so I had many very exciting partnerships there. And also, a lot of work on building dialogue systems, which today we recognize as language models and chatbots, but I was building some of the very preliminary versions of this work in the early 2000s. And so, because I do use curiosity as my main motor, it has allowed me to work across several subfields of AI: robotics, language, perception, and applications. And so that gave me a pretty good set of knowledge and experience to then come into a place like Meta, where the teams that I work with do fundamental research, but we work closely with product teams and try to both push the frontier in terms of the science but also push the frontier in terms of new products, new experiences.

Sam Ransbotham: So clearly, there’s lots that Meta’s doing around the core Meta products, but there’s the general scientific discovery that Meta research is working on. What are some examples of projects that are in progress there?

Joelle Pineau: This is such an interesting area. I think there’s enormous potential to use AI to accelerate the scientific discovery process. When we think about how it works, often, you know … Let’s say you’re trying to discover a new molecule or discover a new material. There’s a very large space of solutions, often combinatorially large, and the traditional methods have us looking through the space of molecules one by one, and we take them into the wet lab, and we test them out for the property that we want, whether it’s to develop a new medication or develop a new material. And so we’ve had a few projects over the years that look at this problem.
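As a rough sketch of that screening pattern — with a hypothetical surrogate scoring function standing in for both the learned models and the wet lab:

```python
# Illustrative sketch (not Meta's pipeline): rank a combinatorially large
# candidate space with a cheap learned "surrogate" score, then send only
# the top handful of candidates to slow, expensive lab validation.
from itertools import product

# Hypothetical building blocks; real molecule/material spaces are vastly
# larger and use domain-specific encodings.
blocks = ["A", "B", "C", "D", "E"]
candidates = ["".join(c) for c in product(blocks, repeat=4)]  # 5**4 = 625

def surrogate_score(candidate: str) -> float:
    """Stand-in for a model predicting a target property (e.g., CO2 affinity)."""
    return 0.5 * candidate.count("A") + 0.3 * candidate.count("C")

# Only the top-ranked candidates would ever reach the wet lab, replacing
# a one-by-one walk through the full space.
shortlist = sorted(candidates, key=surrogate_score, reverse=True)[:5]
print("candidates to validate experimentally:", shortlist)
```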

More recently, we have a project that’s looking specifically at direct air carbon capture — really, the desire to build new materials that could capture carbon in a way, of course, to address our environmental crisis.

Now, when you do this work, there are many steps. One of them is even just building up the data set for doing that. So we’ve built up a data set synthesizing many different possibilities for this problem, and out of that, we often partner with external teams to try to validate which of these solutions may bring the most value.

We’ve done previous work also in the area of protein synthesis that had a similar flavor; the specifications of the protein were a little bit different, but in a core, fundamental way, the problem looks very similar. So I’m really excited to see what comes of this. I’ve had some cases where partner teams came to me and said, in the space of about a year of working with AI, they were able to cut down the scientific process in terms of experiments that would’ve taken them, like, 25 years if they were going through the search space with more traditional methods.

Sam Ransbotham: And I think that’s something that we’re seeing from other people we’ve talked to. We talked with Moderna, for example, about their vaccine development and how AI helped explore that space, and with Pirelli about how they use it for tire components. I think this idea of exploring a combinatorially large space is really pretty fascinating. It’s not something that I would’ve expected Meta to be involved with, at first blush. I can see, for example, the “carbon dioxide from the air” problem; that’s probably just something you’re facing in data centers. But I wouldn’t have expected that.

Joelle Pineau: Yeah. I mean, you bring up the case of data centers. I would say that’s a prime application for this. We are building several data centers, and it’s in everyone’s interest for those to be very energy efficient. We also have some strong commitments in terms of using renewable energy, and so there’s a strong motivation in that space. And, not to be forgotten, there’s also all of the work happening toward the metaverse on the Reality Labs side of Meta, which is really the longer-term vision of building AR [augmented reality] and VR [virtual reality] devices. And when it comes to that type of hardware design, there are a lot of really hard problems, whether in the space of optics or other components, where AI-guided design can actually be very useful to accelerate that work.

Sam Ransbotham: Yeah. That’s pretty interesting. We actually just talked with Tye Sheridan, who is the star of the Ready Player One movie, and so that’s a perfect kind of segue from the metaverse to there.

We have a segment where we ask you rapid-fire questions, so just the first thing that comes to your mind. What’s the biggest opportunity for artificial intelligence right now?

Joelle Pineau: I do think that the ability to open up, to connect people across languages, is huge. We’ve had systems where we’re building up machine translation to cover up to 200 different languages, but there are many more languages that are spoken only, and so there’s a real opportunity to build technology for anyone to understand anyone else across the planet. I think that’s going to be really crucial for us to figure out how to all live together on this Earth.

Sam Ransbotham: So, what’s the biggest misconception that people have about AI?

Joelle Pineau: I don’t know if it’s the biggest, but one that really gets to me is thinking of AI as a black box. People think information goes in, something happens, and then something comes out. I think, in many ways, from where we stand today, the human brain is a lot more of a black box than AI. When I have an AI system, I can trace down, with a lot of precision, how information circulates, how it’s calculated, and how we got to the output. I cannot do that with the human brain in the same way. So, yeah: Whenever someone says, “AI is a black box,” I sort of frown a little bit and feel like “No, it’s a complicated box, but we have a lot of understanding of what goes on inside there.”
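A minimal illustration of that traceability point, with an invented toy network: every intermediate value of the computation can be inspected exactly, which is not something a biological brain offers:

```python
# Toy example: tracing how information flows through a tiny network.
# All weights are invented; the point is that each step is observable.
import numpy as np

x = np.array([0.2, -0.5, 1.0])             # input
W1 = np.array([[0.1, 0.4, -0.2],
               [0.7, -0.3, 0.5]])           # first-layer weights
h = np.maximum(W1 @ x, 0.0)                 # hidden activations (ReLU)
w2 = np.array([0.6, -0.1])                  # output weights
y = w2 @ h                                  # final output

print("hidden activations:", h)             # fully inspectable...
print("output:", float(y))                  # ...and exactly reproducible
```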

Sam Ransbotham: Yeah. Other people’s brains make no sense to me. Mine makes perfect sense, but everyone else’s doesn’t.

What was the first career that you wanted?

Joelle Pineau: Oh, early on, I wanted to be a librarian. I loved reading books — I still do; I still read quite a bit — and I thought, you know, having a job where you can just sit in a space filled with books and read all day sounded delightful.

Sam Ransbotham: When do we have too much artificial intelligence? When are we trying to put that square peg in a round hole?

Joelle Pineau: I don’t think of it as, like, one day we have enough and one day we have too much. I think it’s really about being smart about where you bring AI into a system. So already, there are places where AI shouldn’t go — or at least where the versions of the models we have today shouldn’t go — and there are places where we could bring in AI much more aggressively, I think. So I think what’s really important is figuring out how to bring it in, in a way that it brings real value — academic value, of course, but real social value as well — and being thoughtful about that.

Sam Ransbotham: Yeah. That ties to your previous answer about the difficult part being how we use the technology rather than the technology itself.

So, what’s one thing that you wish that artificial intelligence could do now that it can’t do currently?

Joelle Pineau: I wish that AI systems could understand each other. Right now, we’re building a lot of AI systems that are individual; they’re all fine-tuned for an individual performance. But once we start deploying many of these AI agents together in the same place, our methods for understanding the dynamics between several agents are very primitive, and I think there’s a ton of work to do. You know, if we look to humans as the society of agents that is most evolved today, we derive a lot of our longevity, our robustness, our success through our social ties, and AI systems today have no idea how to build social ties.

Sam Ransbotham: That’s interesting because I think we spend so much time thinking about the human-computer interface and computer-human interface and then not as much about the computer-computer interface.

This has been a fascinating discussion. It’s really kind of opened my eyes to all the things that Meta’s doing that are beyond just that sort of surface research that’s more obvious in the newspapers and media reports. Thanks for taking the time to talk with us today.

Shervin Khodabandeh: Yes, a very inspiring conversation. Thank you.

Joelle Pineau: My pleasure. Happy to be with you.

Shervin Khodabandeh: Thanks for listening to Season 9 of Me, Myself, and AI. Our show will be back in September with more episodes. In the meantime, if you missed any of the earlier episodes on responsible AI, please go back and have a listen. We talked with Amnesty International, Airbnb, and Salesforce. Thank you for listening, and also for reviewing our show. Talk to you soon.

