Jacob Kaplan-Moss

Work Sample Tests:

A Framework for Good Work Sample Tests: Eight Rules for Fair Tests

Welcome back to my series on work sample tests. Previously, I explained why they’re important but difficult to get right because of the tension between inclusivity and predictive value. (If you haven’t read those previous articles you probably should before continuing; I don’t know that what follows will make as much sense without that background.)

Now, let’s start moving from the abstract to the concrete. What makes a “good” work sample test? Here, “good” means that a test has high predictive value, while also being as fair as possible to all potential candidates. Several kinds of tests meet this bar – and I’ll cover some of them later in the series. Here’s my framework for constructing those. It’s a set of eight principles that, if followed, give you a great shot at constructing a good work sample test.

Summary

This is a long article: these principles are important, and they interrelate, so I want them all in one place. Let me start, then, with an executive summary of these eight principles:

  1. Simulate real work as closely as possible: always use exercises that are as close as possible to the real tasks candidates would perform if hired.
  2. Limit work sample tests to less than 3 hours
  3. … but allow candidates to find those three hours anywhere they can; be very flexible with scheduling, and avoid deadlines.
  4. Provide as much choice to candidates as possible: give them the choice of several kinds of work sample tests, programming languages, environments, etc.
  5. Use tests as the start of a discussion, not as a simple pass/fail.
  6. No surprises: tell candidates ahead of time about the work sample test, and give them clear instructions when assigning the test.
  7. Test your tests internally before giving them to candidates.
  8. Offer exercises late in the hiring process. I recommend they be the penultimate step, before your final wrap-up interview with the hiring manager.

1. Simulate real work as closely as possible

The goal of a work sample test is to be as close as possible to a real task that the person would perform if hired.

This is easier in some cases than in others. For example, when I’ve hired bug bounty triage staff, I’ve asked them to triage real past bug bounty submissions, the same as if they worked there¹. We then compared candidates' results to the work produced by our staff. That’s about as direct a simulation as you can get – the gold standard for an accurate simulation.

But for other roles, particularly software engineering, this is harder. The ideal would be to ask candidates to work on your actual real codebase, but frequently that’s not realistic. So you need to find a task that is very similar to real work while being standalone enough to be doable within the timebox.

My go-to exercise is usually some variation of “write a program to parse this file (CSV, JSON, etc.) and answer some questions about it”. (This’ll be the first exercise I cover in detail, in the next entry in this series.) I like this question because nearly every engineer at the places I’ve worked ends up facing some sort of version of this task. But that’s not universal; if the role you’re hiring for would never have to do something like this in real life, don’t use this exercise!
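To make that concrete, here is a minimal sketch of what a solution to this style of exercise might look like. The orders data, column names, and the two questions (total revenue and biggest customer) are all invented for illustration; a real exercise would ship its own file and questions:

```python
import csv
import io
from collections import defaultdict

# Hypothetical exercise input: a small CSV of orders. In a real
# exercise, this data would be a file shipped with the prompt.
ORDERS_CSV = """customer,amount
alice,30.00
bob,12.50
alice,7.25
carol,20.00
"""

def summarize(csv_text):
    """Answer two typical exercise questions:
    total revenue, and which customer spent the most."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["customer"]] += float(row["amount"])
    top_customer = max(totals, key=totals.get)
    return round(sum(totals.values()), 2), top_customer

print(summarize(ORDERS_CSV))  # prints (69.75, 'alice')
```

The point of an exercise like this isn’t algorithmic cleverness; it’s watching how a candidate handles everyday concerns – parsing, edge cases, naming, structure – on a task shaped like real work.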

Always use an exercise that closely simulates real work. This means:

  • Don’t include tasks in the exercise that wouldn’t be part of the real job. For example, one place I worked used pretty realistic work sample tests, but then asked candidates to give a presentation (meeting room, slides) about their work. This might have been fine for roles where public speaking was important, but that’s not what we were hiring for. These were pretty typical software engineering roles. Do typical software engineers need to get up in meetings and present their pull requests? If not, don’t ask candidates to do it.

  • Use the real languages and tooling used at the company (within reason). If you’re primarily a Python shop, don’t ask candidates to write C++. If the company uses multiple languages, let them use multiple languages. This is a rough rule; there are some exceptions that we’ll cover later.

  • Have the workflow simulate your real workflow. Pair programming is a good example: if your team often practices pair programming, your work sample tests should include pair programming. If you never pair program, it’s not as good a choice for a work sample test. (I’ll cover pair programming in more detail later in the series, too.)

  • Don’t introduce bullshit restrictions. I’ve seen some work sample tests that tell people not to use Google or Stack Overflow – or, worse, force them to use disgusting spyware crap that actively monitors their computer and even their room via webcam. Unless the job requires writing software on an air-gapped computer², restrictions like this make the work sample test more of a hazing ritual than anything predictive.

    One thing that makes great engineers productive is their ability to quickly find answers to questions they don’t already know! We have an amazing wealth of knowledge at our fingertips, and we want to hire candidates who know how to access that information.

2. Use a strict timebox: limit tests to under 3 hours

As previously covered, the more time you ask candidates to spend, the less inclusive and equitable your hiring process will be. How much time is appropriate?

Three hours. A work sample test that takes longer than three hours moves into “unreasonable” territory.

There’s nothing magic about this number; here’s how I get there. The rough norm in the tech industry (well, pre-COVID) is a day on-site. There’s wide variation here, but it’s pretty typical to ask candidates to spend one day, 9-5ish, at your office doing some interviews. This isn’t perfect: it’s not something that everyone can do. But few would argue that asking a candidate to spend a day interviewing is unusual. This typical practice establishes an upper bound on what’s reasonable to ask of candidates: about 8 hours, maximum, for all selection-related activities. Even that’s a lot: in the typical on-site scenario there’ll be some downtime: breaks, lunch, an office tour, etc. If you find yourself trying to justify “just one more thing” and doing clever math to fit into 8 hours, you’re probably going wrong. A good hiring process can complete in well under that mark. Any more than that is unreasonable, in my book, unless you pay candidates for their time.

A good interview process will involve several interviews (2-3 interviews, 60-90 minutes each), plus time on logistics (about an hour). That’s roughly 4-5 hours of interview and logistics time. Putting the limit for a work sample test at 3 hours ensures you stay well below the 8 hour total, with a bit of buffer.

A final note on this time limit: you should make this limit as “hard” a limit as you reasonably can. If it’s more of a suggestion than a limit, then you’re implicitly rewarding people who can afford to spend more time. This means your selection process is weighted towards people with lots of free time – people who don’t have family care obligations, demanding jobs, etc.

What I do is clearly explain to candidates that the exercise should take no more than 3 hours, and tell them “if you hit that 3-hour limit, stop.” Or use exercises that are synchronous and scheduled, so I can enforce the time limit. I also tell them that completion isn’t necessary to get an offer; I’m clear that the exercise is the start of a conversation, not a pass-fail. This is usually sufficient to make sure that the vast majority of candidates spend about the same amount of time.

3. Be very flexible with scheduling; avoid deadlines

Most work sample tests that I use are asynchronous – that is, I assign the work, but the candidate completes it on their own time. Not all are: some of the exercises I’ll cover are synchronous. But since I’m mostly hiring people to work on distributed, mostly-asynchronous teams, asynchronous exercises most closely match work conditions, so I use them the most. These teams are increasingly becoming the norm, so I feel comfortable recommending mostly asynchronous exercises to others.

Asynchronous exercises also have the advantage of providing maximum flexibility to candidates: they can fit those 3 hours around whatever else they have going on. You should always offer that scheduling flexibility; wouldn’t it be silly to miss out on a great candidate because they just can’t find the time to complete your challenge before some arbitrary deadline? I like to give candidates at least a week, preferably more, and to be very clear that there’s no hard deadline.

4. Provide as much choice to candidates as possible

This is one important case of a more general rule, which is to give candidates as much choice and flexibility as you possibly can during the whole process. It’s tempting to come up with all sorts of “rules” around work sample tests and to try to fully constrain what candidates can do. This can sometimes come from a good place – a desire for fairness – but remember that fair is not the same as equal.

In real jobs, people have different strengths and weaknesses and approach work tasks very differently. At work we care about outcomes, not methods; we want our colleagues to approach problems in ways that play to their strengths. We don’t insist that every programmer on our team fit into some average archetype of a programmer; we let them approach problems in the ways that work best for them. The same goes for job candidates.

The most important application of this rule is to offer multiple choices of exercise. I’ll write about several different kinds of work sample tests, but these are not mutually exclusive! It’s a great idea to offer candidates a few different choices, and let them select the one that’ll work best for them. I’ll usually offer at least three choices:

  1. Some form of take-home coding “homework”, which candidates can complete on their own time
  2. A synchronous, “let’s get on a video call and work together” option, which could be formal pair programming, or something less formal
  3. Some sort of “show me code you’ve written in the past” option, where candidates can use work they’ve done at some point in the past

I’ll also offer as many choices of language and environment as I possibly can. I tell candidates which languages will be used in the job, but invite them to submit code in similar languages as well. For example, if I’m hiring a Django developer, I’d be very happy to hire someone who knows Ruby/Rails and is happy to learn Python/Django. These are similar enough that if the rest of the package is there – great interpersonal skills, strong conceptual understanding, knowledge of other applicable skills like SQL, HTML, etc. – it’s an easy hire over someone who already knows Django but otherwise kinda sucks.

Make sure these choices really are choices. If you’re going to think worse of a candidate who chooses Ruby over Python, or who chooses pair programming over homework, check those biases. If those preferences aren’t genuine job requirements, do the work to get over them, and offer the option.

The exception, the place where you do need to lock down and not allow choice, is for skills or behavior that are fundamental to the job. For example, Microsoft recently was hiring for someone to work on CPython performance. If I were the hiring manager, I would feel comfortable making the work sample test require that the candidate use C: the job will require reading and writing some fairly complex C code, so I’d feel comfortable asking the candidate to demonstrate their skill in a work sample test.

5. Use tests as the start of a discussion, not as a pass/fail

It’s terrifically tempting to try to automate work sample tests – particularly those involving code submissions. After all: we’re asking candidates to write code, with an expected output, so why not just run the code and check that the output matches?

This is a mistake. Resist the temptation to automate.

Likewise, also resist the temptation to directly map “correct code output” to “pass”, and “incorrect output” to “fail”.

I’ve hired candidates who submitted broken code; I’ve also rejected candidates who submitted perfect code. I’ve never regretted those decisions. There’s a lot more to software development than writing code.

As I’ve said over and over in this series, in the real world, programming is a team sport. Think about the last few pull requests at your company: how many of them were 100% correct on the first try? How many times did the author need a bit of help, or just a question answered, before or after they submitted the pull request? In real jobs, the expectation isn’t that you get everything perfect on the first try (if it were, we wouldn’t need things like peer review, unit tests, or continuous integration). The expectation is closer to “do a reasonable job, then ask for and incorporate feedback.” We should set a similar bar for work sample tests: “perfect” doesn’t mean “job offer”, and mistakes don’t mean a rejection letter.

Instead, use the work sample test as the beginning of a conversation. Don’t just ask someone to write code; ask them to talk about it with you. Don’t just ask them to find software vulnerabilities; ask them to tell you how they found them. This expands the scope of the work sample test to include all the important communication behaviors we expect from good software engineers. We want people who can write good code and also explain how it works.

This also helps with cheating. Cheating – which I’m defining as submitting someone else’s work instead of your own – is rare, but not unheard of. I suspect cheating in roughly a dozen cases out of several hundred work sample tests I’ve offered, which leads me to believe that around 1-5% of candidates will try to cheat on a work sample test. This is rare enough that I’m not overly concerned but common enough to warrant a little bit of defensive action.

Luckily, defensive action also makes work sample tests better; adding a follow-up conversation makes the test more inclusive and raises its predictive value; win-win. So I don’t consider work sample tests to be good enough unless they incorporate some sort of communication or follow-up about the exercise. And, I always consider candidates' performance on work sample tests in the context of the rest of the selection process.

All of my example work sample tests in this series will have follow-up conversations as part of the exercise; yours should too.

6. No surprises

I remember showing up to what I thought was going to be a conventional interview, only to be plunked down in front of a strange laptop and asked to write some code. The computer was an old Windows machine running an IDE I’d never used, and the language was one I barely knew (it was listed in the job ad along with a half-dozen others I knew much better). It was pretty excruciating. Embarrassing for me, and a waste of time for them. Thankfully they’d only needed to cover my train ticket and not some expensive flight and hotel!

This sort of surprise work sample test is annoyingly common (though, thankfully, not always that bad). I had a friend get several interviews deep before suddenly being asked to do a work sample test that would have taken her at least 2-3 full days. If she’d known about this unreasonable a requirement, she never would have applied for the job. Again, a waste of time all around.

The principle here is no surprises: be very clear up-front about all the requirements. This means:

  • In the job posting, write a bit about the selection process (how many interviews, what kind of work sample test, how much time to expect, etc).
  • In your initial call with a candidate (a.k.a. “phone screen”), go over the selection process in a bit more detail, make sure they know how much time it’ll take and what the timeline looks like.
  • When you want to give a candidate a work sample test, give them a complete briefing that covers everything about the assignment, what’ll happen next, and so forth. 18F’s instructions to candidates are excellent (but, bias alert: I was the original author of that document).

There will always be some candidates who won’t want to do your work sample test, even if it’s one you think is reasonable. I have a friend who’s job hunting right now, and he’s decided not to accept work sample tests. He hates them and has decided that the hiring market for his skills is hot enough that he’ll find employers who won’t require them. He’s probably right! If he was applying for one of my jobs I’d try to find some sort of middle ground option we were both comfortable with, but if we couldn’t I’d be OK with it. If I’m comfortable that my process is fair – by following all the guidelines in this doc – I’ll feel fine if I lose candidates to places with less onerous hiring processes. I just want to be clear about the requirements right at the start so that I’m not wasting anyone’s time.

7. Test your tests internally before offering them to candidates

How do I know that an exercise will take less than three hours? I test my tests.

After developing a new work sample test, but before offering it to candidates, you should run a few people on your team through the exercise. Ideally, you’d test the test on staff in roles similar to the one you’re hiring for. If people already working for you perform poorly on the exercise, it’s probably too hard for candidates (or you’ve made some terrible hiring decisions in the past).

You should assume that people already on staff will on average have an easier time of it than candidates. They’re working on it on official work time, not trying to squeeze in your work sample test around other obligations. They’ll also generally be more familiar with the problem space than the average candidate because your work sample will closely match real work.

This means that if your team struggles, or if it takes them right up to that three-hour limit, the test is probably too hard or long. I generally look for exercises that most people on my team can complete in about one hour: that way I know for sure, even adjusting for candidates vs staff, I’ve got a properly scoped exercise.

8. Offer exercises later in the hiring process

My final principle answers a common question: when in the process should the work sample test be assigned?

I think they should be late in the process. Late enough that there are already some reasonable signs that this person might be a good match. Work sample tests require a big chunk of time and are often the least pleasant part of a selection process. It’s rude to ask candidates who have no real shot at an offer to spend three hours writing code.

This is why I particularly dislike work sample tests that are the initial “gate” to a job interview. It’s just incredibly rude to ask a candidate to spend hours writing code only to get to a hiring manager who rejects them after 15 minutes.

The earliest I would consider asking for a work sample test is after your initial phone screens. I usually recommend offering them as part of your general interview panel, scheduled among your other interviews. If you have a final “wrap up” interview with the hiring manager (I usually do this), I’d recommend putting the work sample test just before that final one. As long as the candidate knows what to expect, that feels fair.

Next up: examples!

With all of this foundation under our belts, I can start to share some examples of work sample tests that you can apply to your hiring process. I’ll cover:

  • coding “homework”
  • code review, both of the “you review their code” and “they review yours” form
  • simulations and “lab” exercises
  • and perhaps one or two others

If you want to be notified when those articles drop, you can follow me on Twitter, or subscribe to my RSS feed or newsletter.


  1. That is, submissions we’ve got in the past, and already triaged – not brand new ones. This is an important distinction. Asking them to do unpaid work is highly unethical and probably illegal.

  2. A very unlikely scenario, but not unheard-of!