Jacob Kaplan-Moss

Work Sample Tests:

The tradeoff between inclusivity and predictive value

In my previous article introducing work sample tests, I wrote:

Some form of work sample test is critical to hiring well. However, work sample tests are also a minefield: the space is littered with silly practices like whiteboarding, FizzBuzz, Leetcode, and “reverse a linked list”-style bullshit. The point of this series is to separate these silly practices from the good ones and to give you a framework and several examples to use in your hiring rounds.

It’s time to start building that framework. To begin, let’s explore why I say that work sample tests are a “minefield”:

The tradeoff between predictive value and inclusivity

When I think about good hiring processes, I’m trying to maximize two things:

  • Inclusivity: we want to make sure that our hiring process is accessible to as many candidates as possible. We don’t want a process that screens out good candidates for things that won’t matter to their ability to do the job. This is why, for example, good interview processes give candidates choices of times to interview: we don’t want to miss a great candidate just because they have a scheduling conflict!
  • Predictive value: we want our screening process, whatever it is, to correlate strongly with a candidate’s eventual job performance. When someone does well in the screening process, we want to be as confident as possible that they’ll be great at the job.

Unfortunately, work sample tests often bring these goals into conflict.

Work sample tests are predicated on the premise that the best way to figure out if a candidate can do the job is to just ask them to do the job. Of course, real jobs don’t take place in nice tidy short chunks. Real work is ongoing and long-term. Any distillation of a real job down into a small bite-sized chunk is necessarily a simplification. And the more we simplify, the more we lose predictive value. Thus, if we want to maximize predictive value, a work sample test should be as long as possible.

The gold standard of predictive value would be a contract-to-hire arrangement, where we contract with someone for weeks or even months before deciding whether to bring them on full-time. I’ll talk more about contract-to-hire later in this series; for now, I just want to note that it’s probably unrealistic for most situations. For one, if you have more than a couple of promising applicants, extending a long-term contract to each of them quickly becomes impractical. But more importantly for this article: there will be many promising candidates for whom “quit your job and contract with us for 3 months and then maybe we’ll offer you a full-time job” is an unfair request. Contract-to-hire would screen out anyone who couldn’t afford to take that risk. And unless you have some wild job requirements, “very high tolerance for financial risk” is unlikely to correlate at all with the duties of the job!

This is an extreme example, but it clearly shows how work sample tests cause inclusivity and predictive value to come into conflict. This conflict exists in every version of work sample tests. If we ask candidates to work on an exercise over a weekend, that screens out people who have other obligations on the weekend. If we ask candidates to spend a few days on-site with us, that screens out candidates who have child- or elder-care responsibilities at home, or who just can’t get away from their current job without causing problems. If we ask someone to write code on their personal computer, we screen out people who don’t own a computer!

Now look, my point is not that work sample tests are “bad” because they require some sort of inclusivity tradeoff. This is true of every step in the selection process, work sample tests or not. Here’s the other extreme: we ask almost nothing of a candidate, and hire based on an initial snap judgment (this is the “would want to have a beer with” test). This also fails, and the failure is worse: you inevitably hire someone who’s terrible at their job (or toxic), and the rest of your team is forced to work with that dingus.

The point is that there’s always a tradeoff between predictive value and inclusivity. The first, most important, and guiding principle of work sample tests is: construct a test that balances predictive value and inclusivity. Fair work sample tests will be predictive enough to give you a high degree of confidence that you’re making a good hire, while also being designed to be accessible to as many candidates as possible.

So that’s the first principle of good work sample tests. Next up in the series, I’ll cover my framework for constructing good work sample tests, most of which is predicated on this first principle. Then we’ll dive into some examples.

Watch this space, and if you want to be notified when these next parts ship, follow me on Twitter, or subscribe to my RSS feed or newsletter.