Some predict that in less than 10 years, Artificial Intelligence (AI) may be the #1 driver of global GDP growth.¹ This is a staggering prediction. Over the next decade we will see incredible adoption and innovation; in fact, applications that aren’t AI-enabled may come to feel broken.
Despite this excitement, I have seen first-hand that trust in AI is a growing barrier to adoption for enterprises. In a survey of global business executives, over 90% reported encountering ethical issues in connection with adopting an AI system, and of these, 40% abandoned the project completely.² Without robust evaluation of ethical concerns, and without responsible building and deployment of AI, we risk the benefits of this technology going unrealized as more projects are abandoned.
At Google, we believe that rigorous evaluations of how to build AI responsibly are not only the right thing to do, they are a critical component of creating successful AI.
We began developing our AI Principles in mid-2017 and published them a year later, in June 2018. They are a living constitution we use to guide our approach to building advanced technologies, conducting research, and drafting our policies. Our AI Principles keep us motivated by a common purpose, guide us to use advanced technologies in the best interest of societies around the world, and help us make decisions that are aligned with Google’s mission and core values. They are also inseparable from the long-term success of deployed AI. More than two years in, what remains true is that our AI Principles rarely give us direct answers to our questions on how to build our products. They don’t – and shouldn’t – allow us to sidestep hard conversations. They are a foundation that establishes what we stand for, what we build and why we build it, and they are core to the success of our enterprise AI offerings.
How Google Cloud puts our AI Principles into practice
Our governance processes are designed to implement our AI Principles in a systematic, repeatable way. These processes include: product and deal reviews, best practices for machine learning development, internal and external education, tools and products such as Cloud’s Explainable AI, as well as guidance for how we consult and work with our customers and partners.
In Cloud, we created two separate review processes. One focuses on the products we build with advanced technologies, and the other focuses on early-stage deals involving custom work above and beyond our generally available products.
Aligning our product development
When we set out to develop new products and services that involve advanced technologies – including similar products and services that were already generally available when this process began – we undertake rigorous, in-depth ethical analyses: we assess risks and opportunities against each principle and conduct robust live reviews that can involve difficult but often inspiring conversations. We freely discuss critical but challenging topics such as machine learning fairness, implicit bias, how our own experiences may differ greatly from those potentially impacted by AI, and a range of other considerations that may affect whether we move forward with a given product. We spend considerable time creating and maintaining psychological safety within our review teams to ensure all voices and topics are heard and valued. Over time, these reviews have made it easier to have difficult discussions about potentially adverse outcomes that we can then work to prevent.
Aligning our customer deals using custom AI
We also review our early-stage commercial deals that will involve building custom AI for a client – well before the deals are signed. We work to determine early on whether the project would use an advanced technology in ways that might contravene our AI Principles.
As we’ve built expertise in our evaluations, developed “case law” over time, and thought deeply about where we might draw the lines of our responsibility, we have iterated on what we consider to be in or out of scope for our engagement reviews. Because we have done the in-depth reviews of our generally available products and created alignment plans for each of them, we have been able to focus our deal reviews on unique and custom use cases in how our generally available products might be applied.
Our product reviews inform our roadmap and our best practices
Out of these reviews, the paths forward can take many forms. We might narrowly scope the product, perhaps focusing on a specific use case versus a general-purpose API. For example, in approaching facial recognition, we designed our Celebrity Recognition API as a narrow, carefully researched implementation. Other mitigations might be educational materials or best practices tied to launch, such as a Model Card or more specific implementation guides like our Inclusive ML guide. In some cases we might implement policy or terms of service, and in others we might decide not to move forward with a product at all.
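To make the Model Card mitigation concrete, the sketch below is a minimal, hypothetical representation of the kind of structured documentation a Model Card captures – intended use, out-of-scope uses, and evaluation coverage. The field names and example values are illustrative assumptions, not Google's internal template or the published Model Card Toolkit schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal sketch of a Model Card: documentation that ships
    alongside a model so users understand its scope and limits."""
    model_name: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    evaluation_groups: list = field(default_factory=list)  # subgroups evaluated
    known_limitations: list = field(default_factory=list)

# Hypothetical card for a narrowly scoped recognition model.
card = ModelCard(
    model_name="celebrity-recognition-demo",
    intended_use="Identify a vetted set of public figures in licensed media.",
    out_of_scope_uses=["surveillance", "identifying private individuals"],
    evaluation_groups=["skin tone", "gender presentation", "age range"],
    known_limitations=["accuracy varies with image resolution and lighting"],
)
print(card.out_of_scope_uses)
```

Writing the out-of-scope uses down explicitly, rather than leaving them implied, is what turns documentation into a launch mitigation: it gives reviewers and customers a shared artifact to check deployments against.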
Over the last few years we’ve developed a number of best practices I often share when working with organizations seeking to put their own principles in place. These are:
- No ethics checklist. It can be tempting to create a decision tree or checklist that neatly buckets some things as fine and others as not. I’ve tried it myself, as has almost everyone I know embarking on this journey. I hate to be the bearer of bad news, but this is not possible. It is the intersection of the technology, the data, the use case, and where and how it is applied that guides alignment decisions – and while cases may be similar, they are never the same.
- Responsibility by design. We aim for in-depth reviews early on in the development lifecycle, which has proven to be essential to ensuring alignment.
- Diversity of input. The standing members of our review committees are pan-Google and cross-functional, technical and non-technical. Multiple members have social science, philosophy and ethics backgrounds. We are purposefully multi-level, spanning members early in their career to senior executives. Internal and external input is a priority for us, especially when our own lived experiences cannot robustly inform our decisions. Direct input from external domain experts and impacted groups is also key. For example, the Human Rights Impact Assessments provided by organizations like BSR as a component of our Human Rights Diligence have helped us tremendously. Ensuring all voices are heard is essential, and not always easy.
- A culture of support. A top-level mandate is necessary, and also not sufficient. Cultural transformation across an organization is key. Part of this comes from training on tech ethics in order to actively connect ethics to technology that might otherwise be believed to be values-neutral. It’s not always comfortable to talk about how these transformative technologies can be harmful, but it’s important to build that commitment by modeling it, even when difficult.
- Transparency. Trust is at the core of why we have these principles and practices in place. We aren’t transparent about every decision – that would run against our commitments to privacy – and I believe that level of transparency isn’t necessary. No decision we make will ever be met with universal approval. This is okay. What we strive for is an understanding of how we make our decisions, because that can build trust even within disagreement.
- A humble approach. AI is changing rapidly and our world is not static. We try to consciously remember we are always learning, and while it’s an unreachable goal to certify a product as perfect, we can always improve. We will make difficult decisions over time, such as removing gender labels from our Cloud Vision API. These changes can be hard to explain, both internally and externally, but we try to honor the complex reality of advanced technologies.
- The work is not (easily) measurable. Principles alignment is often unquantifiable, subjective, culturally relative and impossible to pin down. It’s precisely because these challenges exist, however, that we take action.
We are unwavering in our commitment to this work
Humans are at the center of technology design, and humans are impacted by it – and humans have not always made decisions that are in line with everyone’s needs. I’ll offer two examples of what I mean, both of which have since been corrected.
- Until 2011, automobile manufacturers were not required to use crash test dummies that represented female bodies. As a result, women were 47% more likely than men to be seriously injured in a car accident.³
- For decades, adhesive bandages, pantyhose, and even crayons were designed with white skin tones as the benchmark, forcing anyone with darker skin to use a soft-pink bandage meant to match “skin color,” wear nude stockings that lightened their skin, or accept that the only “flesh”-colored crayon was a peachy tone.
Products and technology should work for everyone. Unfair bias can be caused by lots of factors, and we work to disentangle these root causes and interactions – including the societal context infused in data you may use as an input.
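One concrete way this kind of bias surfaces in practice is as a gap in a model's outcome rates across groups. The sketch below computes a demographic parity difference – the gap between the highest and lowest positive-prediction rates across groups – on made-up data. This is a generic fairness metric offered for illustration, not a description of Google's internal tooling; real analysis must also weigh the societal context behind the data, which no single number captures.

```python
from collections import defaultdict

def positive_rate_by_group(predictions, groups):
    """Fraction of positive (1) predictions for each group label."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical model outputs for members of two groups, A and B.
preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = demographic_parity_difference(preds, groups)
print(f"parity gap: {gap:.2f}")  # 0.75 for A vs. 0.25 for B -> gap of 0.50
```

A large gap is a signal to investigate, not a verdict: the root cause may lie in the training data, the labeling process, or the context of use, which is why disentangling these interactions matters.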
Our products and services don’t exist in a vacuum. We strive to take societal context into account in our approach and go beyond mere technical analysis. We try to identify the potentials for harm, and do our best to mitigate or avoid them, while maximizing the benefits technology can bring to all of us.
Responsible AI is a key theme at Google Cloud’s Next OnAir virtual conference, with my session highlighting our best practices and examples of responsible AI in action.
1. McKinsey & Company, “Notes from the AI Frontier: Modeling the Impact of AI on the World Economy” (Sept. 2018).
2. Capgemini, “Organizations must address ethics in AI to gain public’s trust and loyalty” (July 2019).