Can We Build a Performance Evaluation System That Earns Real Buy-In?

According to this year's JobKorea survey, 57.1% of employees said their company's performance evaluations were unreasonable, and a stunning 82.7% said the evaluations had pushed them to consider changing jobs or actually start looking. How should we read a situation in which more than half of an organization is unhappy with its evaluations?

Psychology offers a useful lens here: the Dunning–Kruger effect, a well-known cognitive bias in which people with limited ability overestimate their own competence while highly competent people underestimate themselves. If the theory holds, then a "fair evaluation" as each individual subjectively judges it may be an ideal that does not exist in reality. And if we treat fairness only as a passive state, the mere absence of individual complaints, dissatisfaction will simply persist. What matters more, I believe, is something stronger than fairness: employees' "buy-in," a felt trust that, even if they have some reservations, the criteria and procedures the organization uses will produce a reasonable outcome. So how do we build evaluations that employees can actually buy into?

The Precondition for Buy-In: Defining "Performance"

The first thing organizations with widespread evaluation complaints tend to consider is probably a system overhaul. In consulting conversations with clients, a fairly common refrain lately is that they are considering adopting OKRs to make evaluation fairer. That confuses ends with means, or at least misjudges their relative weight. The essence of evaluation is a judgment about performance, and unless people deeply buy into the definition of "performance" up front, they will never buy into the results either.

On that score, Microsoft is worth looking at. Microsoft defines performance as "Impact." Impact covers not just what an individual accomplishes on their own but also the effect they have on the success of others. Under this definition, employees naturally come to treat collaboration as the most important element of how they work, and they accept low marks for achievements that stay purely at the individual level. Of course, clearly defining company-wide performance and reaching alignment on it is not easy.

My recommendation: start by thinking about what is distinctive about your sector (IT, manufacturing, finance, etc.), and then about how the performance of individual functions within it — sales, research, production — can be defined in differentiated ways.

Think About "Institutional Reasonableness"

Designing and running a system that employees perceive as reasonable is another critical piece of the puzzle when it comes to earning buy-in.

As the business environment shifts rapidly and the MZ generation (Millennials and Gen Z) takes the lead, evaluation practices have changed considerably too. The most prominent themes are ongoing performance management and feedback, 360-degree (multi-rater) diagnostics, and a move toward absolute evaluation. For a sense of how these changes have been institutionalized, the Microsoft case in <Figure 1> is worth reviewing.

Continuous Performance Management and Feedback

Judging a year's worth of performance activities in a single year-end review is a structure that fundamentally struggles to earn employee trust, especially when people receive no feedback in the meantime and then get a result that does not match their expectations. Today, unlike in the past, sophisticated performance management systems make it possible to adjust goals flexibly and deliver real-time feedback, continuously building a shared understanding of performance levels throughout the year. Now that the infrastructure is in place, however, what needs attention is less the format than the substance. Formulaic feedback of the "nice job!" variety will not win employee buy-in.

In a JobKorea survey, MZ-generation employees chose "a manager who gives clear feedback" as their number-one ideal boss.

Leaders who have historically struggled with uncomfortable feedback should take this shift seriously and start giving unvarnished feedback on concrete performance.

Maximizing the Strengths of 360-Degree Diagnostics

Many organizations remain distrustful of 360-degree diagnostics. There is a vague worry that the process will devolve into a popularity contest, and negative incidents at certain companies — amply covered in the news — have reinforced that unease. And yet, given that the biggest single cause of evaluation complaints is typically judgments skewed by a single evaluator's subjectivity, 360-degree diagnostics remain a compelling way to earn employee buy-in.

Because of this, a growing number of organizations are looking for ways to maximize the strengths of the method while containing its weaknesses — limiting input to a choice of strength/weakness keywords, for example, or eliminating numerical scoring entirely and delivering only descriptive feedback. Some go further and use 360-degree diagnostics as the primary evaluation mechanism outright.

Netflix replaced its traditional annual review entirely with 360-degree feedback. Each employee must provide feedback on at least ten colleagues, and the written feedback itself takes the place of an evaluation rating.

Absolute Evaluation Tailored to Your Organization

For a long time, relative (forced) ranking has been blamed as the primary driver of the gap between actual performance and the ratings that represent it. Only a few years ago, most organizations saw a switch to absolute evaluation as premature — worried about ratings inflation and the over-rewarding that would follow.

But today's employees no longer buy into a system where their rating is adjusted after the fact because of someone else's performance. With a body of research building around approaches like limited absolute evaluation (which accepts relativity at a subset of rating levels) and simulations that link rewards to the resulting rating distribution, there is no longer a strong reason to hesitate on absolute evaluation.
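
As a rough illustration of what a simulation linking rewards to the rating distribution might look like, here is a minimal Python sketch. It assumes a fixed bonus pool that is split in proportion to a weight per rating, so that a more generous rating distribution lowers the payout per person rather than raising total cost. The rating labels, the weights in `PAYOUT_WEIGHTS`, the headcounts, and the pool size are all hypothetical values chosen for illustration; none of them come from a real company.

```python
from collections import Counter

# Hypothetical payout weights per rating level (an assumption, not from the article).
PAYOUT_WEIGHTS = {"S": 2.0, "A": 1.5, "B": 1.0, "C": 0.5}


def simulate_payouts(ratings, bonus_pool):
    """Split a fixed bonus pool in proportion to each person's rating weight.

    Returns the payout per person for every rating level present in `ratings`.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total_weight = sum(PAYOUT_WEIGHTS[r] * n for r, n in counts.items())
    unit = bonus_pool / total_weight  # payout for a weight of 1.0
    return {r: PAYOUT_WEIGHTS[r] * unit for r in counts}


def total_cost(ratings, bonus_pool):
    """Sum of all individual payouts; stays equal to the pool by construction."""
    payouts = simulate_payouts(ratings, bonus_pool)
    return sum(payouts[r] for r in ratings)


# Illustrative distributions for 100 employees (made-up numbers).
baseline = ["A"] * 20 + ["B"] * 60 + ["C"] * 20
inflated = ["A"] * 40 + ["B"] * 50 + ["C"] * 10

for label, ratings in (("baseline", baseline), ("inflated", inflated)):
    payouts = simulate_payouts(ratings, bonus_pool=100_000)
    print(label, {r: round(p) for r, p in sorted(payouts.items())})
```

Under these assumptions, doubling the share of A ratings reduces what each A rating pays while the total stays at the pool size, which is one way an organization could examine the over-reward concern before committing to absolute evaluation.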

One interesting study is worth mentioning: at S Group, where we consulted, leadership tracked the evaluation rating distribution for several years after moving to absolute evaluation. Contrary to the initial worries, the results showed only a very small upward drift (the share of A ratings rose by about 5%), and no broader evaluation distortions were detected.

Of course, these outcomes reflect the specific company, its leadership, and its evaluation practices, so interpret them with your own judgment.
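
For readers who want to run the same kind of check on their own data, the following Python sketch shows one way to measure how a rating's share shifts between two review cycles. The function names and the sample lists are hypothetical and are not the S Group data. Because a figure such as "about 5%" can be read either as percentage points or as a relative increase, the sketch reports both.

```python
from collections import Counter


def rating_shares(ratings):
    """Return each rating's share of the total as a fraction between 0 and 1."""
    counts = Counter(ratings)
    total = len(ratings)
    return {rating: n / total for rating, n in counts.items()}


def share_change(before, after, rating):
    """Return (percentage-point change, relative change) for one rating level.

    The relative change is None when the rating did not appear in `before`,
    because a ratio against a zero share is undefined.
    """
    old = rating_shares(before).get(rating, 0.0)
    new = rating_shares(after).get(rating, 0.0)
    points = (new - old) * 100
    relative = (new - old) / old if old > 0 else None
    return points, relative


# Made-up example: 100 employees in each cycle.
before = ["A"] * 20 + ["B"] * 60 + ["C"] * 20
after = ["A"] * 25 + ["B"] * 57 + ["C"] * 18

points, relative = share_change(before, after, "A")
print(f"A share: {points:+.1f} percentage points, {relative:+.0%} relative")
```

Running the comparison for each rating level, year by year, makes it easier to see whether a drift is confined to one grade or spread across the whole distribution.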

A Mechanism of Mutual Benefit

Genuine buy-in is hard to surface from a vertical, one-way relationship. From the organization's point of view, eliminating unfairness is only a hygiene factor. Real buy-in presumes voluntary participation grounded in mutual trust, and that voluntary participation tends to surface when motivators, such as personal benefit, are present.

Consider a recent trend as an example. Until recently, in many companies, job capability was assessed by grading individuals on a fixed scale and then aggregating a single composite score that fed mechanically and uniformly into rewards and promotions — an extremely organization-centric evaluation element.

By contrast, the more recent concept of "Skill-based HR" serves both sides. The organization can identify skill gaps and use them in hiring and deployment, and can differentiate rewards according to the importance of each skill. Individuals, in turn, get a personalized, granular skill profile they can use to proactively explore concrete career opportunities aligned with the skills they hold.

Compare the two: it is obvious which system's employees will more readily buy into their job evaluations and engage with them actively.

From that angle, HR should ask whether the individual activities across the evaluation process offer employees concrete benefits, such as growth and reward, that are compelling enough to draw them in.

Even in the AI Era, Evaluation Is Ultimately About People

Less than two years after ChatGPT's formal release, the pace at which AI is being adopted in HR is striking. In the evaluation space, too, there are ongoing attempts to use AI to secure fairness. There are indeed areas where the benefits of AI are clearly predictable. Summarizing a year's worth of performance data, analyzing communication and collaboration patterns, and drafting feedback — all of these, used well, can contribute substantially to building a rational basis for evaluation.

Whether AI can go beyond such a supporting role and actually "replace" human evaluation is a different question. It may feel like a distant prospect, but attempts in that direction are already underway. One AI solution offered by a company I'll call G has AI generate questions to diagnose job capability, propose model answers and scoring criteria, analyze the responses, and issue an evaluation opinion.

On a personal note, I recently ran sample tests on several major large language models (LLMs) and directly encountered errors of various kinds, including hallucinations, in which generative AI produces false information and presents it as fact. If AI-based evaluation continues to advance, verifying the trustworthiness of the AI model itself stands out as a very large piece of unfinished homework that will need to be addressed first.

In the end, how will people react if the era of AI evaluating humans really arrives? Will we willingly buy into the evaluation results AI produces?

A recent experiment at Cornell University is worth pausing on. The short version of the results: simply thinking that AI was doing the evaluation reduced the rate at which participants offered creative ideas or worked autonomously. Even when the feedback was identical, participants expressed dissatisfaction with AI-delivered feedback 23% more often than with the human-delivered control. The biggest takeaway is that even in the AI era, the essence of evaluation is still about people, and genuine buy-in is ultimately a matter of emotion. Recent advances can track gaze and expression, pick up vocal inflection, and count how often positive words are used, but we should never forget that, behind the screen, the subject is a human being with feelings.

An evaluation system employees can genuinely buy into cannot be built by stitching together a patchwork of plausible mechanisms. It calls for a holistic approach: establishing a framework of sound, reasonable systems on top of a clearly agreed definition of performance, and thinking through both the mechanism of mutual benefit and the role of technology. And more than anything else, it requires an HR sensibility that treats employees not simply as the objects of evaluation but as active participants and as people — and is willing to work at drawing out their positive emotions.