Improving Analytics Capabilities Through Crowdsourcing

Syngenta developed an award-winning suite of analytics tools by tapping into expertise outside the organization — including talent available through open-innovation platforms.




How does a company operating outside the major technology talent centers gain access to the most innovative data scientists that money can buy? Assuming you can’t recruit the right data analysts to join your team full time, how do you tap into contractors with the knowledge and creativity you need outside your technical core? In a nutshell, this was the predicament Syngenta AG faced in 2008.

Syngenta, an agrochemical and seed company based in Basel, Switzerland, was formed in 2000 by the merger of the agribusiness units of Novartis and AstraZeneca. Among its more than 28,000 employees are more than 5,000 highly trained experts in biology, genetics, and organic chemistry, many of whom hold doctorates in their field. As a company, Syngenta’s mission is to develop innovative crop solutions that enable farmers to grow basic food staples such as soybeans, corn, and wheat to feed the world’s growing population as efficiently as possible. That means pushing the envelope on genetics.

For centuries, plant breeding has been a labor-intensive process that depended largely on trial and error. Farmers tested different seeds and cultivation techniques in an effort to find plants with the best yields and most desirable characteristics. Luck played a decisive role, as breeders relied heavily on intuition and guesswork to decide which varieties to cross-pollinate. To find the most successful variety of corn, for example, a breeder might have pollinated hundreds or even thousands of plants by hand to see what happened.

Syngenta had been involved in a large-scale version of trial-and-error research and development (R&D), conducting field tests on hundreds of thousands of plants each year in more than 150 locations around the world. But given that the results of experiments are often shaped by quirks and idiosyncrasies, it was sometimes difficult to draw meaningful conclusions. Did one plant grow taller than the others because of a genetic trait, or was it because it received more water and more sunlight? With traditional research methods, the only way to find out was to invest significant time and money in large numbers of additional tests. Indeed, it takes seven years, on average, to move a new plant variety from the early testing stage to a full commercial product. When you are spending hundreds of millions of dollars each year on R&D for seeds and crop protection (Syngenta’s R&D budget in 2015 was about $1.4 billion), even small savings have a big payoff.

Our idea1 was to use data analytics to study a wide range of plant and seed varieties so we could identify the most desirable plants early and make optimal use of resources (everything from capital to labor, land, and time). What if we could make smarter choices at every stage of the breeding process? That process runs from selecting promising parent plants to crossbreeding them, evaluating the offspring, and commercializing the variety that demonstrates a proven ability to outperform existing products. Constant testing and retesting is central to plant breeding, but what if we could eliminate the cost of testing and retesting varieties that weren’t good enough and select the likely winners earlier? Rather than investing time and resources in ever more testing, our aim was to make decisions about our plant portfolios using hard data and science.

Syngenta’s product research and development site in Slater, Iowa, just outside Des Moines, is well off the beaten track for many of the people whose analytics skills we hoped to tap. We knew that we had limited capacity to compete with employers such as Google Inc. or the U.S. National Security Agency for people trained in analytics. So we figured that our best bet was to be creative — to augment our in-house resources by partnering with consultants and academics in fields unrelated to biology and agriculture.

Learning to Use Crowdsourcing Platforms

Open innovation can help companies tackle complex business problems that they can’t solve on their own. In some cases, the barrier is a lack of expertise; in others, it’s cost. However, leveraging the potential of outside experts requires close cooperation from in-house employees, who need to feel that it’s good for the business and doesn’t threaten their jobs. Cooperation from staff is also essential for framing problems and evaluating options. At Syngenta, we turned to several online crowdsourcing platforms to find talent that could help us increase our R&D efficiency.

But before we looked outside, we looked inside. Every problem, challenge, or contest was posted internally so that staff had the first opportunity to offer solutions. Even when in-house talent lacked the particular skill set needed to address complex mathematical issues, their practical experience in plant breeding helped refine the questions we were asking. Rather than seeing the shift as threatening, employees saw that their input was being used to advance an ambitious project.

We were trying to make the most of crowdsourcing platforms and also learning how we could leverage advanced mathematics to develop better varieties of plants. Although crowdsourcing was not new, we wanted to learn how to apply it to our circumstances. We realized early on in reviewing the available crowdsourcing websites that there were different types of platforms for different purposes. We tried several. Some of them were static: When you posted a problem online, individuals stepped forward with a solution. This worked well in identifying, for example, statistical approaches to plant breeding issues that could be solved by pure mathematics. However, static platforms were not well suited to solving more complex problems that cut across multiple disciplines. For these, more curated platforms proved useful by gathering experts from several fields into teams — where people could address problems iteratively. For instance, a plant’s ability to adapt to different locations is driven by biological nuances that don’t lend themselves to solutions that an individual operating on his or her own could develop using only mathematics.

At Syngenta, we set out to use open innovation to harness the power of data analytics so we could identify genetic combinations that unlock desirable characteristics in soybean plants, such as the highest yield. There is no one perfect soybean plant; rather, there are different varieties that are particularly well suited for different climates and growing conditions. Given that a soybean has 46,000 genes that determine its potential, and the number of possible combinations is practically infinite, identifying the best plants was a huge challenge. To find the best-performing soybean varieties, you needed to put them to the test, comparing how one seed performed against others grown in different conditions across multiple locations around the world.

Our vision was to create a suite of software tools that would replace intuition in plant breeding with data-backed science.2 For the initial tool, we wanted a data monitoring system that enabled breeders to glance at data from a given field and know immediately what had happened. We established a contest on the platform of Waltham, Massachusetts-based InnoCentive, which hosts a diverse community of users, including mathematicians, physicists, and computer scientists, who are eager to put their problem-solving skills to the test. The contest was open to all of the platform’s 375,000 users. We wanted contestants to create a tool to represent field test results visually, taking the raw data from field trials and highlighting the anomalies for further investigation.

The tool we envisioned would conduct what’s known as a “residual analysis” — the calculated difference between the observed value of a genetic trait and the predicted value of that trait based on a statistical model across many locations. Since we were looking for a methodology, we intentionally targeted the broadest possible audience. We wanted to test and obtain as many creative options as possible, which meant sending the challenge to the widest possible network. We were pleased by the amount of attention our contest generated within the InnoCentive community. Over the course of about three months, more than 200 problem solvers downloaded the detailed problem description and data so that they could begin developing new ideas for tackling the problem. In order to participate, individuals agreed to sign an online nondisclosure agreement and to follow the contest’s ground rules. In the first round, we were essentially asking them to submit a white paper outlining their approach to solving the problem. Our in-house staff members reviewed each online submission, and the reviewers found two different entries that solved the data quality problem. We picked the approach that we believed could be converted into a practical tool most easily. Under the agreed-upon rules, this was the participant who was paid.
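The residual analysis described above can be sketched in a few lines of Python. Note that the "prediction" here is simply each variety's mean across locations, and the two-standard-deviation cutoff is an illustrative assumption; the article does not disclose the winning entry's actual statistical model.

```python
from statistics import mean, stdev

def flag_anomalies(trials, threshold=2.0):
    """Flag observations whose residual (observed minus predicted value)
    is more than `threshold` standard deviations from zero. The predicted
    value is each variety's mean across locations -- an illustrative
    stand-in for a real statistical model."""
    residuals = []
    for variety, by_location in trials.items():
        predicted = mean(by_location.values())
        for location, observed in by_location.items():
            residuals.append((variety, location, observed - predicted))
    spread = stdev(r for _, _, r in residuals)
    return [(v, loc, round(r, 2)) for v, loc, r in residuals
            if abs(r) > threshold * spread]

# Hypothetical yields (bushels/acre) for two varieties at four sites;
# variety V1's result at Adel is the kind of outlier a breeder would
# want surfaced for further investigation.
trials = {
    "V1": {"Slater": 52.0, "Ames": 51.5, "Boone": 52.3, "Adel": 40.0},
    "V2": {"Slater": 48.1, "Ames": 47.9, "Boone": 48.4, "Adel": 48.0},
}
print(flag_anomalies(trials))  # flags ("V1", "Adel", ...)
```

The point of a tool like this is triage: rather than scanning every plot, staff look only at the handful of observations the model cannot explain.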

From our perspective, we were building impressive analytics capabilities for our organization at a bargain — running several challenges cost us less than it would have to hire a mathematics professor to work in-house, for example. Each platform had template agreements that effectively lowered the cost of solving problems; we only paid for correct answers, usable methodologies, and the production of viable tools. We stated up front exactly what we were looking for, and participants understood that they would only be paid for giving us what we wanted.

With the first challenge’s winning methodology in hand, we posted a second challenge on the same platform that shifted the focus toward improving the tool. We wanted to automate as many steps as possible so that data entry would be fast, easy, and intuitive. We also wanted the output to be as easy to understand as possible for plant breeders using the tool in the field. This time, contest participants came up with a system that visually represented the yield data from various field trials based on results; areas of low yields were shown in one color and areas of high yields in another. Based on this technique, scientists would be able to identify the problem areas more easily than they could from scanning columns of numbers.
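A crude approximation of that two-color output can be rendered as a character grid: plots at or above the field median get one symbol, plots below get another. The grid size, data, and median threshold are all invented here; the article does not describe the real tool's internals.

```python
from statistics import median

def yield_map(grid):
    """Render a field of yield measurements as a character grid:
    '#' for plots at or above the field median, '.' for plots below.
    A crude stand-in for the two-color visual the contest produced."""
    m = median(v for row in grid for v in row)
    return "\n".join(
        "".join("#" if v >= m else "." for v in row) for row in grid
    )

# Hypothetical 4x6 field in which one side underperforms -- the kind of
# positional pattern that is invisible in columns of raw numbers.
field = [
    [51, 52, 50, 38, 37, 39],
    [53, 51, 52, 40, 38, 37],
    [50, 52, 51, 39, 40, 38],
    [52, 50, 53, 37, 39, 40],
]
print(yield_map(field))
```

Even in this toy form, the underperforming half of the field jumps out immediately, which is exactly the effect the visualization was meant to achieve.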

The following example highlights how the data visualization tool worked. In one field, the results didn’t align with the expected growth patterns of the seed varieties we were testing. Instead of performance being determined by variety, it appeared to depend on position in the field: Some rows underperformed on one side but performed as expected on the other. We wondered what was going on. Thanks to the tool, our testing field’s manager knew where to look. He discovered the reason for the anomaly: Due to a factory defect, the combine that had been used to measure performance in that field was not properly calibrated, so measurements were skewed on one side but not the other. Because the anomaly was caught, the equipment manufacturer was able to correct a defect that, left unresolved, would have skewed testing results for anyone using this machine for years, wasting time and money.

Defining Problems for Problem Solvers

Encouraged by our initial efforts to use open innovation and analytics to improve our plant breeding efforts, we set out to build on our early success by defining a third challenge for problem solvers: identifying a mathematical approach to designing the most efficient experiments for evaluating plant yields. In plant breeding, scientists managing yield trials have three key decisions: (1) how many varieties to test; (2) how many locations to test them in; and (3) how many times to repeat the test. Each of the three decisions recurs in three distinct test stages. Because there are so many possible combinations of these design variables (for our products, there are more than a trillion), breeders often sidestep complexity by supersizing their trials — that is, performing more and bigger tests on the assumption that better results will naturally follow. However, we knew from prior experience that “more” didn’t always yield “better” results. In many cases, it just meant spending more time and money.
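A back-of-the-envelope count shows how quickly this design space explodes. The option ranges below are invented for illustration (the article says only that there are more than a trillion combinations), but the structure of three decisions, each recurring in three stages, comes straight from the text.

```python
from math import prod

# Hypothetical counts of allowed options for each design decision.
OPTIONS = {
    "varieties_to_test": 500,   # choose anywhere from 1 to 500 varieties
    "locations": 150,           # test in up to 150 locations
    "repetitions": 10,          # repeat each test up to 10 times
}
STAGES = 3                      # each decision recurs in three test stages

per_stage = prod(OPTIONS.values())
total_designs = per_stage ** STAGES
print(f"{total_designs:,} possible trial designs")
```

With these invented ranges the count already reaches hundreds of quadrillions, which is why brute-force simulation of every design was never a realistic option.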

We knew that this problem was many times more complicated than the data quality issue. Data quality could be treated, for the most part, as a pure mathematics problem; the problem solvers didn’t need a deep understanding of biology to compete. The yield trial design challenge, on the other hand, could only be completed by those with both advanced quantitative skills and knowledge of biology. Thus, Syngenta turned to a different open-innovation platform, one focused on teamwork.

Naturally, the team approach narrows the number of participants in the challenge. Only a handful of people had the subject-matter backgrounds to attempt a solution. The initial responses we received regarding the mathematical solution for managing trials were not what we had envisioned. We realized that the problem we wanted to address was more complicated than we originally thought. Despite our best efforts to frame the challenge clearly and unambiguously, participants inevitably interpreted it from their own perspectives, solving questions that we hadn’t asked and focusing on elements of the problem description that, from our perspective, were less important.

It became a source of frustration on both sides: The problem solvers struggled to understand what we were looking for, and we had difficulty framing the question in a way that engaged people.

In our experience, one of the biggest challenges of open innovation is learning how to define problems you want to solve in ways that engage potential problem solvers. As a result, we have learned that rather than presenting problems broadly, it’s often better to divide them into smaller chunks. (See “Solving Problems Through Open Innovation.”) In the yield trial design challenge, we realized that an approach that tackled the big picture (which would have required simulating over a trillion possible combinations for each yield trial) could take weeks and put too big a strain on computing resources. So we decided to redefine the problem, asking problem solvers to come up with a solution that would demand fewer resources. As much as we could, we removed biology from the question to focus on the mathematical component.

Validation is the critical step. In agriculture, we attempt to sort out whether the effects being modeled are the result of plant genetics or something in the environment; it’s very easy to be led astray by false correlation. Validation is the only way we can know whether one part of a problem has been solved and that we are now ready to proceed to the next step.

One problem-solving team came up with a two-stage statistical approach that seemed promising. We tested it against the real-world, historical results we had. The mathematics held up when it was validated against the results, so we had a winner.
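The validation step can be sketched as a simple error check against historical results: accept a candidate methodology only if its prediction error stays within a tolerance. The data, tolerance, and root-mean-square-error criterion are illustrative assumptions; the article does not specify Syngenta's acceptance test.

```python
from math import sqrt

def validate(predicted, observed, tolerance):
    """Accept a candidate methodology only if its root-mean-square error
    against historical results stays within the stated tolerance."""
    rmse = sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                / len(observed))
    return rmse <= tolerance, round(rmse, 3)

# Hypothetical historical yields vs. what a candidate model predicted.
historical = [48.2, 51.0, 47.5, 53.1, 49.8]
model_out  = [48.0, 51.4, 47.9, 52.6, 50.1]
print(validate(model_out, historical, tolerance=1.0))
```

Whatever the exact criterion, the discipline is the same: a candidate methodology graduates only after its outputs match what actually happened in the field.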

It took more than two years of testing and development to build the yield trial design optimizer. Along the way, we went through four major iterations and four minor revisions of the optimizer challenge on the open innovation platform until we arrived at the first version of our tool, fully validated and ready for use.

At this point, we knew that we were onto something big. We were raising our game in ways that wouldn’t have been possible relying solely on in-house talent. The open-innovation platform gave us access to individuals able to develop the sorts of solutions that we never would have imagined possible. We have often found that the best answers come from unlikely places. For example, we worked with mathematicians and statisticians, as well as businessmen and engineers. Some of them were based many time zones away from Iowa — in Europe and even Australia.

Building New Capabilities for the Organization

As we became more familiar with how to manage input from outside experts, we found ways to engage outside problem solvers on a variety of issues. We have harnessed outside talent to come up with a tool that manages the genetic component of the breeding process — figuring out which soybean varieties to cross with one another and which breeding technique will most likely lead to success. We also used outside talent to create a tool that simulates the outcomes of all the different logistical choices our breeders might make — where to grow, which particular soybean traits are most important, and so on.

Over the past eight years, we have used open-innovation platforms to develop more than a dozen tools in our data analytics suite, which have cumulatively revolutionized the way we breed plants. By replacing guesswork with science, we are able to grow more with less, and that’s exactly what needs to happen as the global population continues to increase.

We have found that there are several advantages to open innovation versus expanding our analytics capability in-house. For one thing, since we look for people with mathematical and analytic expertise who can solve problems — as opposed to individuals who would also need to be the right fit with the organization — we haven’t gotten bogged down in the traditional hiring process. When we found people who performed consistently, we signed contractual agreements to work with them directly. They became our “regulars.”

So what’s in it for the problem solvers? Amazingly, a lot of people participate in open-innovation contests because they enjoy the game. They might spend nights and weekends working to come up with a solution that ultimately doesn’t work — and if they don’t win, they don’t get paid. One of our winning entries came from a man who owned a successful concrete manufacturing plant. He wasn’t doing it for the money; his satisfaction came from solving a complex, real-world puzzle.

Of course, achieving results requires a great deal of effort. Managers who expect that working with online innovation platforms will be as easy as using online shopping websites will be frustrated and disappointed. Running a challenge requires more than answering a few questions on a form. For each contest, we formed a team of in-house experts to review submissions and communicate with participants. The team members shared a commitment to the ultimate goal of optimization because they had an idea of what success would look like. The tools we were building would make them much more effective at their jobs, but getting to that point wasn’t easy.

Open innovation is a hands-on process. It involves frequent interaction with contest participants. When proposed solutions turned out to be more complicated than expected, we regrouped. Our in-house team would set up meetings with the problem solvers, either on the phone or in person, in an effort to better understand why something that seemed to our biologists to have a simple solution was more complicated from the perspective of a data scientist. We found in every instance that challenge participants who invested the effort to provide a submission were eager to work with us to redefine the problems we wanted to solve.

As a company, Syngenta has made a commitment to addressing the issue of global food security: A rapidly growing world population needs to eat, and that requires growing more food. Given the importance of this commitment, management recognized the company needed to upgrade its capabilities. Among other things, this meant learning to leverage data analytics in ways that had never before been attempted in our industry.

Before developing our new suite of tools, the average rate of improvement of our portfolio was 0.8 bushels per acre each year. Eight years later, we’re looking at more than three times that — an annual improvement of 2.5 bushels per acre. Based on our analysis of our soybean crop portfolio, we estimate that we would have had to spend another $278 million on traditional field testing to achieve the equivalent level of genetic gains that we are realizing with the new tools. We are in the process of expanding what we have learned with soybeans to our entire range of crops, including sweet corn, field corn, sunflowers, and watermelon.

In April of 2015, an independent panel of academic and business experts in operations research validated our analytics efforts and their applicability beyond agriculture, awarding Syngenta the prestigious 2015 Franz Edelman Award for Achievement in Operations Research and the Management Sciences.3 What turned the judges in our favor? In the end, it was the combination of our ability to make better breeding decisions and the increased accuracy of measurements through all phases of the seed development and breeding process that paid off. The increase in yields was measurable.

In every industry, there’s room for innovation — even if it means searching outside the company for new capabilities. Throughout the process, we have gone out of our way to explain open innovation to employees and also to show that our goal wasn’t to undermine internal jobs. Employee input is essential in framing the challenges and evaluating the options. Our goal has always been to focus on the outcomes and to access the very best talent, wherever it happens to be.


References

1. Although Alpheus Bingham, one of the authors of this article, does not work for Syngenta, he served as an early-stage advisor to Syngenta’s project, so the authors opted to use the first person plural when describing Syngenta’s experience.

2. More details about Syngenta’s use of analytics can be found in J. Byrum, C. Davis, G. Doonan, T. Doubler, D. Foster, B. Luzzi, R. Mowers, C. Zinselmeier, J. Kloeber, D. Culhane, and S. Mack, “Advanced Analytics for Agricultural Product Development,” Interfaces 46, no. 1 (January-February 2016): 5-17.

3. For more information about the Edelman Award, see www.informs.org/Recognize-Excellence/Franz-Edelman-Award.

Reprint #:

57411
