Jacob Kaplan-Moss

Measuring Hiring Manager Effectiveness

Hiring is one of the most important parts of a manager’s job. Make good hires and your team (and thus the whole company) will have better results. Make poor hires, and those people will drag the team down. In the worst cases, a toxic hire can drive other staff to quit, totally destroying the team.

Strangely, for such an important part of the job, hiring performance seems to be very poorly measured. I’ve never had a hiring-related metric be part of my own performance measures as a manager, nor have I heard of other managers whose hiring performance was measured. Mostly I’ve seen it measured in a weirdly punitive way: make a poor enough hire, and the manager gets fired, too. But we don’t seem to measure hiring performance in a way that can separate average from good, and good from great. More importantly, without good measures, we can’t drive improvement.

Over the last few years, I’ve developed a metric I use to measure hiring performance. It’s simple to calculate, and reasonably effective at revealing performance differences between managers. Here’s the formula:

  1. For every report you hire, track that report’s bottom-line performance review score for the first two years of their tenure. (If you’re not doing at-least-yearly performance reviews: you should, and here’s one good reason to start.)

    It doesn’t matter how frequently you do performance reviews (yearly, quarterly, etc.); we’ll control for that in Step 4.

    If someone transfers to a different team, or is promoted to a different role: continue to track their reviews. A hiring manager should continue to be measured by their hiring performance, even as that person moves around (but also see Step 3).

  2. Normalize those scores to a 5-point scale:

    Score  Meaning
    1      very poor performance; well below expectations
    2      somewhat below average performance; below expectations
    3      average/expected performance; meets expectations
    4      somewhat above average performance; exceeds expectations
    5      very high performance; vastly exceeds expectations

    This step’s optional, but it allows scores to be compared between companies with different performance review scales. This means a manager can track their own score across their career, if they want, and it means you can compare manager performance with peers at other organizations.

    For this to work as a point of comparison between teams, or for you to carry this with you across many teams and jobs, these ratings need to be fair. This means a “3” should be considered a normal, common rating for someone doing a fine job, not a slight. Many organizations suffer from grade inflation, where managers are expected to give only 4s and 5s; Marginalia item #2, below, addresses this.

  3. If you fire someone, score a zero.

    Having to fire someone is an extremely poor outcome – worse than the worst performance review – and thus rightfully drags the score way down.

    You should also record zeros for scenarios where someone didn’t technically get fired, but should have been. So if someone quits to avoid being fired, or quits when it becomes apparent that they simply can’t perform, or if HR blocks a firing and transfers the person instead, and so on: record a zero. Be honest.

  4. Calculate an overall score by summing the individual ratings and dividing by the number of reviews.

    This is how to normalize for reviews of different cadences (e.g. yearly, quarterly, whatever), or for a different number of reviews per direct report. For example, I prefer to give a review every time someone changes roles, or at least annually. (So a promotion, or a change in job responsibilities, or a transfer to a different team: these all trigger a review.) A short code sketch of the full calculation follows this list.
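
To make the arithmetic concrete, here’s a minimal Python sketch of the calculation described above. The function names and data shapes are my own, not part of the metric: it assumes each hire’s reviews have already been normalized to the 5-point scale of Step 2, with a 0 recorded for a firing per Step 3.

    from typing import Iterable

    def normalize(score: float, scale_max: float, scale_min: float = 1.0) -> float:
        """Map a review score from an org's native scale onto the 1-5 scale (Step 2)."""
        # Simple linear rescaling; assumes the native scale also runs low-to-high.
        return 1 + 4 * (score - scale_min) / (scale_max - scale_min)

    def hiring_score(reviews_per_hire: Iterable[Iterable[float]]) -> float:
        """
        Overall hiring score (Step 4): the mean of every individual review score
        across every person the manager hired, over each hire's first two years.
        A firing is recorded as a 0 in that hire's list of scores (Step 3).
        """
        all_scores = [score for hire in reviews_per_hire for score in hire]
        return sum(all_scores) / len(all_scores)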

Example

To make this concrete, here’s a fictional example. Suppose Shawn has hired six people over the last three years, and suppose they give performance reviews to those people twice a year. Their ratings might look like this:

Person  Hired     2018-Q1  2018-Q3  2019-Q1    2019-Q3  2020-Q1  2020-Q3
Alice   Nov 2017  3        4        4          3        -        -
Bob     Jan 2018  -        1        0 (fired)  -        -        -
Carol   Jan 2018  -        2        3          4        5        -
David   Nov 2018  -        -        4          3        2        1
Erin    Nov 2018  -        -        3          3        3        3
Frank   Feb 2020  -        -        -          -        -        3

Given this table, Shawn’s total rating can be calculated as

total scores = sum(all individual scores) = 54
score count  = count(individual scores)   = 19
rating       = total scores / score count = 2.84

2.84 isn’t particularly good. It’s below average, in fact: by definition, 3.0 is average. Notice what happens if we remove Bob from the equation (i.e., if Shawn avoids making that bad hire): the score increases to 3.12 (53/17), a reasonable score. This shows the intentionally large effect a poor hiring outcome has on this metric.
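
Using the hypothetical hiring_score function sketched after Step 4 above, the same numbers drop out directly (the dictionary below simply transcribes the table):

    # Shawn's hires; scores copied from the table above, with Bob's firing recorded as 0.
    shawn = {
        "Alice": [3, 4, 4, 3],
        "Bob":   [1, 0],          # fired in 2019-Q1
        "Carol": [2, 3, 4, 5],
        "David": [4, 3, 2, 1],
        "Erin":  [3, 3, 3, 3],
        "Frank": [3],
    }

    print(round(hiring_score(shawn.values()), 2))   # 2.84

    # Dropping the bad hire shows how much a single zero drags the score down:
    without_bob = [scores for name, scores in shawn.items() if name != "Bob"]
    print(round(hiring_score(without_bob), 2))      # 3.12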

Please use this and report back!

I hope you’ll use this metric, yourself or with your teams. If you do, please let me know how it goes! I’ve validated this myself at a couple of organizations, and it’s been very useful. But it’s not widely used enough for me to know how well it tracks across many teams and companies.

I’m particularly interested in establishing what a “good” score should be. Over 3.0, certainly, but by how much? Intuitively, I feel like any manager who’s below, say, a 3.5 probably needs some help improving their hiring practices… but is that true? I don’t really have great data on how “good” is “good enough”; it’s hard to say without more data and feedback.

Marginalia

If you’ve made it this far, here are a couple of sidebar items that didn’t quite fit above, but are worth mentioning for those who want to get into the weeds a bit more.

  1. Why two years? That’s the least scientific part of this metric; it’s pretty arbitrary. Over time, less and less of someone’s performance can be attributed to the skills they entered with, and hence less and less of it can be attributed to a good or bad hire. Two years just feels like a reasonable point at which we can stop attributing someone’s performance to their hiring manager’s skill.

  2. What if my workplace treats “3” as a poor review?

    If your workplace suffers from performance review grade inflation, you need to correct it for tracking to work well across teams. This means maintaining your own records with the normalized rating, separate from the “official” inflated ratings.

    I worked in one place where a 3 was considered a failing grade, but weirdly HR seemed highly reluctant to allow 5s, and would push back on more than one 5 per team per review cycle. This meant that I ended up more or less giving everyone 4s – but in my own records, I kept their normalized ratings.

    This is unfortunately fairly common. For whatever reason, the tech industry has decided that “does the job as expected” is somehow a bad thing. It’s not; most people should do the job basically ok. Look at it this way: if someone keeps getting 5s, they’re well overdue for a promotion. In a fair and normalized review system, 3s would be common and expected.

  3. Some background and theory: this is based on a very simple premise, which is that at its core a manager’s job can be distilled down to two components:

    • Results: a manager is responsible for the aggregate output of their team
    • Retention: a good manager keeps her people (the good ones)

    This measures both at once: results (in that someone’s performance review is an aggregate of the results they’re able to get), and retention (in that scoring a 0, i.e. failure to retain, hurts the score a ton.)

    It doesn’t measure other types of retention failure – e.g. someone leaving on good terms because they get a better offer – but it’s not really designed to capture that. Use other metrics for that.

  4. You can easily adapt this metric for a Director, VP, and all the way up: a manager-of-managers’ score is simply the aggregate across all of their reporting managers (see the sketch below). This can easily be used to reveal hiring performance differences on a department-wide or organization-wide basis.

    I’ve done this once and it was revelatory: it showed one VP with absolutely abysmal departmental performance. Some digging revealed that this VP had his team using entirely different hiring techniques from the company at large. Since his results were demonstrably worse, he switched, and we watched the numbers start to pick up.
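
    One way to read “aggregate” in code, continuing the sketch from earlier, is to pool every individual review score from every hire made by every reporting manager, rather than averaging each manager’s own average (which would over-weight managers with few hires). This interpretation is mine; the data shape matches the hypothetical hiring_score function above.

        def director_score(hires_by_manager: dict) -> float:
            """
            Roll the hiring score up to the director/VP level by pooling every
            review score from every hire made by every reporting manager.
            Each hire is a list of 0-5 scores, as in hiring_score above.
            """
            pooled = [
                score
                for hires in hires_by_manager.values()
                for hire in hires
                for score in hire
            ]
            return sum(pooled) / len(pooled)

        # e.g. director_score({"Shawn": list(shawn.values()), ...})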