Put Behavior on the Scorecard

Most codes of conduct prohibit bad behavior but say little about the conditions that let good people think clearly under pressure. Health and employee benefits strategist Jaqueline Oliveira-Cella argues that the fix isn’t a better code but a performance review that scores how leaders hit their targets, not only whether they do.

A senior executive recently described spending nearly a year recovering from working under a C-suite leader who made him feel steadily diminished. The leader’s daily behavior wore down his confidence and narrowed his judgment, one interaction at a time. His performance was strong. His commitment was there. And none of what he experienced appeared anywhere in the organization’s code of conduct.

This should not come as much of a surprise. Most codes are built to prevent misconduct, when what organizations actually need is something that shapes the conditions in which good people either think clearly or don’t.

A harder question for the organization: What would it have taken for the code to have prompted someone to pick up on what was happening? Something short of a formal complaint or a policy violation, just the steady loss of someone’s capacity to do their best work. The honest answer is that the code had no tools for it.

The familiar failure

I reviewed publicly available codes of conduct from more than 30 large global organizations, including Amazon, Apple, Bank of America, Boeing, Goldman Sachs, Johnson & Johnson, JPMorgan Chase, Microsoft, Pfizer and Walmart. The companies span financial services, technology, healthcare, energy and consumer goods and collectively employ tens of millions of people.

They set clear boundaries around prohibited conduct. Where they consistently fall short is in translating values into behaviors specific enough to guide decisions when conditions are demanding: when priorities conflict, pressure rises and judgment is most likely to narrow.

Most codes fail in three ways.

First, they describe values without defining behaviors. Words like “integrity,” “respect” and “collaboration” appear frequently but are rarely translated into observable actions. What does respect look like when a team challenges a decision? What does integrity require when two priorities genuinely conflict? The code doesn’t say, so people infer, and what they infer is shaped by what they see rewarded, far more than what they see written.

Second, codes assume stable conditions. Under pressure, when priorities shift, timelines compress and information is incomplete, even well-intentioned people default to speed, self-protection and deference. The code often provides the least guidance in the situations where conduct risk is most likely to rise.

Third, incentives operate almost entirely outside the code. Performance systems typically reward output, responsiveness and individual achievement. Over time, people align with what is reinforced. When that happens, the code becomes wallpaper.

The result is predictable. The executive above was a victim of a culture shaped entirely by what got tolerated and rewarded, a culture the code could not see.

Measuring the wrong things

Most organizations are reasonably good at measuring failure after it happens — incidents, complaints, regrettable attrition, legal exposure. But the conditions that make those outcomes predictable tend to go untracked.

The EEOC has reported record results, including $660 million for employment discrimination victims in FY2025, the highest pre-litigation recovery in the agency’s 60-year history. False Claims Act settlements and judgments exceeded $6.8 billion in FY2025. In parallel, a 2026 Gallup benchmark report covering 141,000 employed adults across more than 160 countries puts global employee engagement at 20%, a new low, with 40% of employees reporting significant daily stress. Gallup puts the cost of disengagement at $438 billion in lost productivity annually. In the US, only 46% of employees strongly agree that they know what is expected of them, a 10-point drop since 2020.

These are lagging indicators, evidence that something was already broken long before it became measurable. Leading indicators would look different. They would measure whether the conditions for sound judgment were present: whether dissent was heard before decisions were finalized, whether concerns surfaced early enough to change course, whether people felt safe enough to slow down when slowing down was the right call.

Most organizations collect pieces of this data (engagement surveys, exit interviews, pulse checks, and disability, health and death claims), but they rarely connect them to the performance conversation, where behavior actually gets reinforced or corrected. Leaders don’t see the pattern until something breaks. By then, it’s already set.

Behavior as a measurable input to performance

The shift I’m proposing is straightforward to describe and harder to execute: Treat behavior as a measurable input to performance — tracked, evaluated and tied to how leaders are rewarded.

The standard “values rating” in annual reviews carries little consequence for how leaders are rewarded or whether the conditions for people to do their best work are actually protected. The same goes for 360-degree feedback filed away in an HR system. The idea here is structural: For each performance metric, define the behaviors most likely to achieve it sustainably, alongside the behaviors most likely to undermine it. Then make both visible in the performance conversation.

Consider what this looks like in practice. A leader’s performance dimension might be “drives results.” The KPI is straightforward: Did the team hit its targets? The behavioral layer asks how. Were priorities made explicit, or did the team operate in ambiguity? Were trade-offs surfaced, or avoided? Was the pace sustainable, or did it run on stored capacity that is now depleted? Did the leader actively coach others through the work, or solve problems alone? When results became apparent, did credit reach the people who produced them?

The final performance score takes both into account. A leader who hits targets by burning out the team, suppressing dissent and avoiding hard trade-offs scores differently from one who builds the conditions for sustained performance. Both achieved the KPI. Only one built something durable, and that difference is exactly what should show up in the performance conversation.

The mechanics are less complicated than they sound. Organizations can begin with three steps.

Step One: Identify where decisions most often break down. Prioritization, escalation and cross-team coordination are common fault lines. These are moments where ambiguity and pressure intersect and judgment is most likely to narrow, and they follow patterns that can be mapped.
Step Two: Define what effective behavior looks like in those moments. How should competing priorities be surfaced? When is it appropriate to slow down and reassess? What does handling dissent well actually look like in a meeting? Effective behavior has to be made explicit to be evaluated.
Step Three: Build behavioral metrics into the performance conversation.

The table below illustrates what this looks like across six common performance dimensions:

The final score, for compensation, promotion or development conversations, reflects both the KPI and the behavioral pattern. A leader who delivers results through behaviors that compromise judgment, trust or team capacity is performing at a different level than the KPI alone suggests.
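For readers who want to see the arithmetic, here is a minimal sketch of how a score could blend the KPI with the behavioral pattern for one performance dimension. It illustrates the idea and is not a prescribed formula: the 0-to-1 KPI scale, the 1-to-5 behavior ratings, the 40% behavioral weight and every name in the code (`DimensionReview`, `combined_score`, `behavior_weight`) are assumptions made for the example. The behavior labels are taken from the “drives results” questions above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DimensionReview:
    """One performance dimension for one leader (hypothetical structure)."""
    dimension: str                     # e.g. "drives results"
    kpi_attainment: float              # assumed scale: 0.0 to 1.0 of target achieved
    behavior_ratings: dict[str, int]   # assumed scale: 1 (undermines) to 5 (sustains)


def combined_score(review: DimensionReview, behavior_weight: float = 0.4) -> float:
    """Blend KPI attainment and the average behavior rating into a 0-100 score.

    behavior_weight is an assumed policy choice, not a recommended value.
    """
    if not 0.0 <= behavior_weight <= 1.0:
        raise ValueError("behavior_weight must be between 0 and 1")
    if not review.behavior_ratings:
        raise ValueError("at least one behavior rating is required")

    # Clamp KPI attainment so overshooting a target cannot offset poor behavior.
    kpi_part = min(max(review.kpi_attainment, 0.0), 1.0)
    # Rescale the 1-5 rating average to the same 0-1 range as the KPI.
    behavior_part = (mean(list(review.behavior_ratings.values())) - 1) / 4

    blended = (1 - behavior_weight) * kpi_part + behavior_weight * behavior_part
    return round(100 * blended, 1)


behaviors = [
    "priorities made explicit",
    "trade-offs surfaced",
    "sustainable pace",
    "coached others",
    "credit shared",
]

# Both leaders hit the target; only the behavioral pattern differs.
depleting = DimensionReview("drives results", 1.0, {b: 2 for b in behaviors})
durable = DimensionReview("drives results", 1.0, {b: 4 for b in behaviors})

print(combined_score(depleting), combined_score(durable))
```

In this sketch, both leaders achieve the same KPI, yet the one who ran the team down scores lower. How heavily behavior should weigh, and who rates it, are decisions each organization would have to make for itself.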

What this actually changes & the AI effect

The shift is perceptual as much as methodological.

When behavioral metrics become a genuine input to performance, tracked consistently and tied to real consequences, leaders start seeing the leading indicators they currently miss — decisions made without surfacing trade-offs, concerns that arrive too late to change course, teams that hit their numbers while quietly losing the capacity to sustain them.

The values in most codes are fine. The problem is that they live in a static document while the actual signals live in the performance system. Bringing them together is where the work is.

As organizations embed AI into core decision-making workflows, the stakes are rising. A McKinsey report frames AI, economic disruption and shifting workforce expectations as forces reshaping how organizations operate and lead. The message is practical: Scaling AI is not only a technical challenge; it is also a leadership and operating-system challenge.

AI can speed up analysis, recommendations and execution, but it can also make weak norms and strained relationships harder to see. If people are already hesitant to challenge a senior leader, will they challenge an AI-supported recommendation? If trade-offs are already unclear, will AI make them clearer or simply move the decision faster? And if AI replaces the conversations where trust is built, work may become more efficient while relationships weaken.

Codes of conduct need to address more than just what responsible AI means. Codes should be woven into AI-enabled workflows so guidance is available at the point where judgment is needed — drafting sensitive emails, summarizing meetings, assessing project status, updating pipeline activity, recording sales, training leaders, reviewing performance or escalating risk. If AI is becoming part of daily work, conduct guidance needs to show up there, too.

Some companies are beginning to build responsible AI principles, guardrails and review expectations into the design and deployment of AI systems, making guidance more active at the point of decision rather than only documented after the fact.

The same logic applies to human performance management. The principles have to be present where the decisions are made — in the performance conversation, in the tracked metrics, in what happens when a leader delivers results through behaviors that undermine the people around them.

The executive who spent a year recovering needed an organization that could see what was happening, one that had built the tools to name it, measure it and act on it. The code becomes real when it changes what leaders are held accountable for when the pressure is on.

