What Should We Do to Prevent Software From Failing?

Business and society run on increasingly complex software, so it’s time we require a license to write critical code.


It takes an army of trained, licensed, and accredited professionals to build a skyscraper in most cities around the world. But what about the software platforms and machine learning tools that have become crucial components of the world’s financial, military, medical, and communications ecosystems?

The critical software and technical systems we rely on daily are like invisible skyscrapers all around us — yet we often don’t know who designed them, how they were constructed, or whether they hide defects that could lead to massive inconvenience, financial chaos, or catastrophic failures.

Design flaws are introduced into software systems around the world every day, but the most serious errors can have widespread and costly effects. Even industries subject to higher levels of regulation and certification can face catastrophic consequences from software design flaws. In October 2018, for example, the U.S. Government Accountability Office warned that many sophisticated weapons systems are vulnerable to cyberattack after testers playing the role of adversaries hacked numerous weapons’ control systems. That same month, the Food and Drug Administration issued an alert on two medical devices because of software vulnerabilities that could allow a hacker to hijack a device and change its function, potentially with lethal consequences for the device user.

Such consequences have been in the spotlight in recent months, as Boeing has faced sharp scrutiny over whether software design flaws were a factor in two back-to-back fatal crashes of 737 Max jets, and whether those errors were preventable.

To help prevent other catastrophes, industries that provide critical products and services built with rapidly evolving hardware and software need to consider how to ensure their businesses have a level of digital resiliency that justifies the trust society has placed in them. This will require technical architects, software developers, and hardware designers to create a commonly accepted set of requirements that software, hardware, and network professionals must satisfy in order to practice their craft. If industry doesn’t, the government just might.

Consider the construction industry, which has had formal standards in place for decades. A licensed architect must create the design for a building, and an army of professional engineers must approve the structure as well as the electrical and mechanical systems, to ensure the project meets or exceeds all building codes and safety standards. All of these professionals have years of schooling and relevant work experience and have passed rigorous certification exams.

Tall buildings rarely collapse, and when they do fall down — or even display structural weaknesses — extensive resources are deployed to figure out what mistakes were made so procedures can be modified. In some cases, the professionals who made mistakes lose their licenses.

When it comes to software and hardware design and development, the requirements are far less formalized. While many of the billions of lines of software code that run big parts of society’s infrastructure are written by highly skilled engineers and computer scientists, there is no requirement to ensure this is the case. There are industry standards for some elements of technical infrastructure development, but because there is no enforcement mechanism, the standards are rarely followed.

Except in rare cases, such as the platforms used in the airline industry and the space program, no professional engineer or architect signs off on the plans for critical computer programs and hardware platforms, and no government inspector certifies them for use. Not every software application or coding project carries the same level of potential risk, however, so any such focus on quality and resiliency will likely require a tiered approach.

To mitigate the massive risks of critical system failure, the private sector should join together to further professionalize the design and implementation of software. To start, coders who work on critical infrastructure should have a professional accreditation framework that issues licenses. One approach might be something like the Financial Industry Regulatory Authority, a nongovernmental organization that tests, certifies, and monitors those who work in the U.S. brokerage industry to ensure they have the skills to perform their jobs.

Of course, licensing and registration wouldn’t solve all problems, as anyone who has ever dealt with a bad financial adviser, architect, or doctor can attest. But it would be a step in the right direction. One potential positive outcome of such an approach would be further leveling of the playing field from a diversity perspective: It would be difficult to argue that one programmer or designer was less qualified than another if both held the same level of industry certification.

If industry fails to self-regulate, governments might seize the opportunity. Already, U.S. regulators are venturing down this path: Several federal financial agencies have jointly proposed the Enhanced Cyber Risk Management Standards framework, focused on cyber resiliency, and California has enacted the California Consumer Privacy Act, focused on enhanced data privacy. Issues in both areas often result from poorly written code or badly designed hardware.

The future clearly will run on increasingly complex software. Yet it is only a matter of time before another mistake in a critical piece of software or hardware results in a sensitive data breach, financial instability, or further loss of life. It is time to recognize the invisible skyscrapers all around us and take the steps needed to keep them from falling down.


Comments (3)
daz _
None of the regulations or licenses or government bodies helped at all for Grenfell Tower.

Maybe the answer is AI or ML + formal verification.
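
For readers who haven’t seen it, formal verification means machine-checking a mathematical proof that code meets its specification. Here is a minimal sketch in Lean 4; the function and theorem are toy examples invented purely for illustration, not anything from the systems discussed above:

    -- A toy function: pick the larger of two natural numbers.
    def max' (a b : Nat) : Nat := if a ≤ b then b else a

    -- A machine-checked guarantee: the result is never smaller than
    -- the first input. `split` cases on the `if`; `omega` closes the
    -- resulting linear-arithmetic goals.
    theorem max'_ge_left (a b : Nat) : a ≤ max' a b := by
      unfold max'
      split <;> omega

The proof assistant rejects the program until the proof goes through, a far stronger guarantee than testing, though writing such specifications for real systems remains expensive.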

I fear your only real requirement is an extra TAX or barrier to operate, and the sorts of projects/products you mention already have these taxes and barriers in place.
Chad Juliano
Compared with other types of engineering, the software development process is a Wild West, and I have thought about this exact question over the years. The nature of software development is different enough that the regulations applied to other industries would not be effective.

During the construction of a skyscraper, a design will specify exactly what needs to be built. A peer can see enough information in the design to make sure regulations have been followed. Likewise, when engineers inspect a construction site, they can walk around, evaluate what people are doing, and validate that work is being done according to specification.

That is how some people naively think a software project should work, but in reality it rarely does: What actually gets released has features and functionality that were never considered when the requirements were written. Software development has always been an iterative process, and people who try to force it to be something else end up iterating on their failure.

There is also the problem of validating that the software works as designed. Designs are inherently ambiguous and less detailed than code, and in the process of writing the code, developers make important decisions to fill the gaps. Code reviews by peers catch some problems but can only supplement good unit testing by developers. Writing integration test cases that provide good coverage of the functionality takes a level of understanding that a newcomer to the project could not have.
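
To make that gap-filling concrete, here is a minimal sketch using Python’s standard unittest framework; the discount function and its rounding rule are hypothetical, invented for this example:

    import unittest

    def apply_discount(price_cents: int, percent: int) -> int:
        # The design said only "apply a percentage discount"; rounding
        # down is a decision the developer made to fill the gap.
        return price_cents * (100 - percent) // 100

    class ApplyDiscountTest(unittest.TestCase):
        def test_rounds_down(self):
            # 10% off 999 cents is 899.1; this implementation truncates.
            self.assertEqual(apply_discount(999, 10), 899)

        def test_zero_discount_is_identity(self):
            self.assertEqual(apply_discount(1250, 0), 1250)

    if __name__ == "__main__":
        unittest.main()

A test like this documents the decision and catches regressions, but it only checks the cases the developer thought to write down.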

Finally, development is never really finished on a software product as long as it has users. There is always another bug or another feature requiring updates, which lead to more bugs and more requests for functionality. A skyscraper is a static entity, and if something is wrong with its construction, the fixes are difficult and costly. Software is easily updated, and its nature is to change and evolve throughout its life cycle.

This is just a comment, and I need to cut myself off. The point I am trying to make is that the article assumes regulation of software development is possible without addressing the subtleties of how it could be done.
Hu Nu
Thoughtful article, yet it is important to note that the 737 Max V1 software was working as designed. Boeing’s supposedly mature internal processes and FAA oversight agreed that the MCAS code correctly compensated for the inherent instability of the aircraft. It’s going to take more than certifications to overcome systemic complacency that literally kills.