Most managers were never taught to manage
The promotion path in most companies rewards strong individual work with a management title, then leaves the new manager to figure out the people part alone. The data show how common this is. Gallup's State of the Global Workplace report finds that only 44% of managers have ever received formal management training, while managers account for roughly 70% of the variance in team engagement. The same research found that even basic training cuts active disengagement among managers by about half, which means the gap is not just common: it is the single biggest engagement lever most companies leave undeveloped.
The cost of leaving it unaddressed has been rising. Gallup recorded manager engagement falling from 31% in 2022 to 22% in 2025, the steepest drop of any group in the workforce, with the largest single-year fall between 2024 and 2025. Managers have lost what Gallup calls the engagement premium and are now about as engaged as the people they lead. When the person who sets the tone for a team is stretched and unsupported, the team feels it.
Judging people is where capability gets tested
A manager does many things, but the one that carries the most weight, and the most risk, is evaluating performance. A rating drives pay, promotion, and who stays. It is also the task where an untrained manager is most exposed, because human judgment of other people runs on predictable shortcuts. These are not character flaws. They are systematic errors that every rater makes, and the people making them are usually unaware they are doing so. Naming them is the first step to controlling them: recency bias, the halo or horns effect, leniency or severity, central tendency, and similar-to-me bias.
Calibration is the established correction
Calibration is a structured session where managers who supervise comparable groups compare their proposed ratings with each other, guided by HR or a neutral facilitator, before any review reaches an employee. The purpose is consistency: to make a 4 from one manager carry the same weight as a 4 from another. It is, in plain terms, a review of the reviews. SHRM describes the core sequence as managers posting names and proposed ratings for all to see, discussing each, and adjusting to assure accuracy and consistency before final appraisals are prepared.
The reason it works is that the errors above are hard to catch from inside your own head but easy to spot from outside. When one manager's team is rated uniformly higher than a comparable team, the discrepancy is visible in the room and gets examined. Performance-management specialist Dick Grote has noted that calibration also makes it easier for managers to deliver honest but negative appraisals, because the standard is shared rather than personal, and it exposes strong performers to a wider set of senior leaders.
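The comparison described above can be prepared before the session. The following is a minimal sketch in Python, not a prescribed method: the function name `flag_rating_outliers`, the sample ratings, and the 0.5-point threshold are all illustrative assumptions. It compares each manager's average proposed rating with the pooled average and lists the gaps large enough to deserve discussion.

```python
from statistics import mean


def flag_rating_outliers(ratings_by_manager, threshold=0.5):
    """Return managers whose average proposed rating differs from the
    pooled average by more than `threshold` points.

    The threshold is an assumed value for illustration; a real process
    would choose it to fit its own rating scale and team sizes.
    """
    # Pool every proposed rating to get the cross-team average.
    all_ratings = [r for ratings in ratings_by_manager.values() for r in ratings]
    overall = mean(all_ratings)

    flags = {}
    for manager, ratings in ratings_by_manager.items():
        gap = mean(ratings) - overall
        if abs(gap) > threshold:
            # Positive gap suggests possible leniency, negative possible severity.
            flags[manager] = round(gap, 2)
    return flags


# Hypothetical proposed ratings on a 1-5 scale for three comparable teams.
proposed = {
    "manager_a": [4, 5, 4, 5, 4],
    "manager_b": [3, 3, 4, 2, 3],
    "manager_c": [3, 4, 3, 4, 3],
}

print(flag_rating_outliers(proposed))
# → {'manager_a': 0.8, 'manager_b': -0.6}
```

A flag is a prompt for discussion, not a verdict. One team may genuinely be stronger than another, which is why the gap is examined in the room rather than corrected automatically.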
What separates a useful session from a political one
Calibration done badly is worse than none at all. The common failure, documented by SHRM, is that sessions defer to the loudest or highest-ranking person in the room and end up calibrating one set of biased ratings against another. A few conditions keep a session honest: a skilled, neutral facilitator, managers who bring data rather than views, and a clear rubric that everyone rates against.
For a manager who has never run one, the gap between knowing calibration matters and being able to facilitate it well is exactly the capability gap this note opened with. The skills are learnable, which is the encouraging part: structured preparation, evidence-based discussion, and a clear rubric turn a vague exercise into a defensible one.
Where these figures come from
Primary sources
- Gallup, State of the Global Workplace 2026. The source for managers driving roughly 70% of the variance in team engagement, the 44% who have received management training, the halving of active disengagement among trained managers, and manager engagement falling from 31% in 2022 to 22% in 2025. gallup.com. Checked 24 June 2026.
- SHRM, Improving Performance Evaluations Using Calibration. The source for the calibration sequence, where managers post and discuss proposed ratings then adjust for consistency, and for Dick Grote's points on honest appraisals, skilled facilitation, and bringing data rather than views. shrm.org. Checked 24 June 2026.
- SHRM Labs, Fixing Performance Reviews. The source for the failure mode where calibration defers to the loudest or highest-ranking manager and ends up calibrating biased ratings against other biased ratings. shrm.org. Checked 24 June 2026.
- SHRM Certified Professional, rater errors in performance measurement. The source for the taxonomy of rater errors: halo and horns, leniency and severity, central tendency, recency, and similar-to-me bias. SHRM-CP reference. Checked 24 June 2026.
- Dartmouth College HR, Common Rater Errors. A university HR reference confirming the standard rater-error definitions and the point that observers are usually unaware they are making them. dartmouth.edu. Checked 24 June 2026.
These figures describe patterns across organizations, not a rule for yours. The right rating scale, calibration cadence, and manager-development approach depend on your size, your roles, and how your performance process is built. This note is general information to support better management practice, not a mandate.
Common questions
Gallup's State of the Global Workplace research finds that only 44% of managers report ever receiving formal management training. The same research finds that managers drive about 70% of the variance in team engagement, so most companies are leaving their single biggest engagement lever undeveloped.
A calibration session is a structured meeting where managers who supervise comparable groups compare their proposed performance ratings before any review reaches an employee, guided by HR or a neutral facilitator. The purpose is to make a given rating mean the same thing across teams, so a 4 from one manager carries the same weight as a 4 from another.
The well-documented rater errors are recency bias (recent events outweigh the full period), the halo or horns effect (one trait colors every score), leniency or severity (the whole scale shifts up or down by manager), central tendency (everyone clusters in the middle), and similar-to-me bias (familiarity reads as competence). Most raters make them without realizing it.
Calibration does not mean forcing ratings onto a curve. Its goal is consistency, not a forced distribution. A good session aligns what each rating means across managers so the standard is shared. Slotting people into a predetermined curve is a different practice, and calibrating biased ratings against each other or against a curve is the common way the process goes wrong.