Jacob Kaplan-Moss

Work Sample Tests:

Wrap Up and Q&A

This is the final post in my series on work sample tests. It’s a wrap-up post: I’ll address a few random points I couldn’t quite fit in elsewhere, and answer some questions from readers. I don’t think this’ll make much sense without having read the rest of the series, so you should probably do that before finishing this post.

How do work sample tests fit into on-site interviews?

Throughout the series, I’ve mostly assumed that these tests are taking place in the context of a distributed/remote interview.¹ How would they fit into a traditional, on-site interview?

I maintain that the candidate time investment limit needs to stay the same. A day of on-site interviews (which, remember, might include up to a day of travel on both ends) is about the upper limit of time investment I think is fair to ask of a candidate. This means that asking a candidate to do some coding at home and then come spend a full day on-site is unfair. Either have the coding homework time replace some of the on-site interview time, or use a work sample test that’s more suited to a synchronous experience. Pair programming and “reverse” code review seem like they’d work very well as part of an on-site loop.

Other than that, I don’t see other big differences between on-site and distributed interviews with regard to work sample tests.

Can work sample tests work for non-engineering roles?

I assume so. I’ve heard various anecdotes from folks using work sample tests for other kinds of roles: an editor told me about giving candidates unedited material to edit and comment upon; I’ve talked to support reps who role-play conversations with customers as part of an interview; I’ve talked to designers who give design challenges. I even ran across official guidance on work sample tests from the US Office of Personnel Management (OPM), the agency responsible for US Federal employment regulation and policy.

So I would tend to assume that the same principles apply to any kind of role. However, I’ve almost exclusively hired for engineering roles, and only really studied hiring practices in that context, so it’s hard for me to speak authoritatively.

If you’ve used work sample tests in non-engineering contexts, and want to write about it – let me know! I’d love to co-author a post with you, or link out to your blog, etc.

What about people you’ve worked with before? Should you let them skip the work sample test?

The point of a work sample test is to verify that someone can deliver on the core job requirements: write code, find security vulnerabilities, etc. If you, or someone on your team, has already worked with a candidate, and seen them do those things in other contexts, do you still need to give them a work sample test? Or is that a waste of their time?

I don’t have a hard and fast answer; it depends. Sometimes I think it’s better to make everyone do the same test, even if it’s redundant for some candidates; sometimes I think it’s better to be flexible and allow prior colleagues to skip that step. I ask myself a few questions when I’m trying to decide:

  • Is the previous work relevant and recent? If I’m hiring a Python programmer, and a candidate is someone I wrote Python with just six months ago, it seems silly to verify that they can write Python. But if the candidate is someone I worked with five years ago on a Java project, I should probably give them the test. Generally, if it’s been longer than two years, or the technologies are radically different, I’ll give them the test.
  • Would letting this person skip the work sample test mean missing out on other candidates? Often, the former colleague in this situation is one of the first candidates. They know you, saw the job post, and applied immediately. If you let them skip the test, you can be ready to make an offer in like a week or so! This is awesome for filling a role quickly, but often means you’ll miss out on candidates you don’t know. Are you comfortable making a very quick decision here even if it means missing out on some hypothetical great candidate you don’t already know? Generally, I’d like to have at least 3-5 candidates, including the one I know, before I’m comfortable making an offer. But if the person I know is just an amazing match, I might not follow this rule.
  • Is the rest of my team comfortable trusting my experience with this person over seeing their work first-hand? If anyone’s even a bit uncomfortable, better to ask the candidate to take the test.

Depending on the answers to those questions, I might let someone whose work I was familiar with skip a work sample test. I’d still want to have them go through a round of interviews, and there’s a lot to say about how to interview people you’ve worked with before. But this is a series about work sample tests, so that’ll have to wait for some other time.

How secret do work sample tests need to be?

The word “test” usually implies secrecy: something the test-taker shouldn’t know about ahead of time. Should work sample tests be kept secret? Do we need to take steps to ensure that candidates don’t know the details of the test before applying? Do we need to ask them not to post their solution code on GitHub?

Generally, no. Most work sample tests don’t need to be secret. My favorite single test, Ad Hoc’s SLCSP exercise, has solutions on GitHub if you look for them. But vanishingly few candidates cheat, and the ones that do are easily caught in the post-exercise interview (rule #5): they can’t explain the code or answer questions about how they wrote it (because they didn’t). Further, if someone did find existing code that solved the problem, read through it carefully to understand it, and then told me in an interview, “yeah, I found this on GitHub, and it solved the problem” – well, that’s pretty close to how any number of real-world problems get solved, right? I wouldn’t consider that cheating. Cheating, in the context of most work sample tests, is dishonesty, not reuse.

There are a few tests that require some degree of secrecy, though. Lab environments sometimes do: exercises where candidates look for bugs or vulnerabilities can be “spoiled” easily if the candidate knows where to look. So that test, and ones like it, benefit from being kept secret. Still, I’d argue this makes them worse simulations: in the real world, we have the entire sum of human knowledge at our fingertips; we want to hire people who know how to effectively use the Internet to help them solve problems.

If your test requires secrecy to be effective, that’s a sign it could be better. It might be acceptable given other tradeoffs, but it’s something to be wary of.

Should we pay candidates for their time on work sample tests?

I established a rule of no more than three hours for a work sample test, out of a maximum of eight hours for the entire interview loop (rule #2). And I wrote a few times that if companies want to exceed that mark, they should compensate candidates for their time.

Why stop there? Should we compensate all candidates for their time?

Sumana pointed me towards Software Freedom Conservancy, who did exactly that; every candidate who made it through the initial screens was paid for their time:

Because we are a small organization, adding another employee is a big deal. We knew that to do this job right we were going to need to take some time talking to them to figure out if they were the right fit for the role. We also know that not everybody does their best when put on the spot in an interview, and wanted to make sure that we allowed people the chance to know what we’d be asking and to prepare if they wanted to. We didn’t want to take our applicants’ time for granted, even though we are a small publicly supported organization.

Because of this, we decided to pay each [of] our five finalists $500 to proceed with the rest of the interview. While $500 is not a huge amount, we thought it was a nice amount for a charitable organization to give to an applicant who would dedicate some time and thought to our hiring process, which would cover strategic thinking about our organization’s mission and operations in our communications and other related areas.

I love this and am going to try for something similar in the future. I think it might be a hard sell to some organizations because it’s outside of the norm. But paying candidates for their time as a mark of respect is excellent.

How do you measure the effectiveness of a work sample test?

Throughout this series I’ve mentioned “effective” and “successful” work sample tests (and the occasional “ineffective” and “unsuccessful” ones). How do I know? What am I measuring to know if a test is working?

Measuring the effectiveness of hiring techniques – interviews, screens, tests – is a much larger topic and one I hope to write about in more detail in the future. For now, here’s a quick sketch:

  1. For a given role, establish competencies – the core behaviors that define success in the job. For example, for a penetration testing role, “finds vulnerabilities” might be a competency; “written communication” (i.e. vulnerability write-ups) might be another.
  2. Use those competencies as part of your performance review process; record everyone’s scores.
  3. Design an interview question or work sample test that measures that competency. Test it out internally to verify that performance on the question correlates with your staff members’ recent performance reviews.
  4. Hire some people using the question/test.
  5. Monitor their performance over time. If the question is effective, their performance on the question will correlate with their performance reviews for that competency.

Once again, this is a very quick sketch; I’ve left out a ton of nuance and detail. If you’d like to see me write about this in detail sooner rather than later, drop me a line.
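To make step 5 a bit more concrete, here’s a minimal sketch of what that check might look like in Python. The competency, the scores, and the data are all invented purely for illustration; the only real point is that you’re looking for a correlation between test performance and later review performance.

    # Minimal sketch: does performance on a work sample test predict later
    # performance reviews for the same competency? All numbers are made up.
    from statistics import correlation  # Python 3.10+, Pearson's r

    # (test_score, later_review_score) pairs for one competency,
    # e.g. "finds vulnerabilities", for people hired using the test
    hires = [
        (4, 3.5),
        (5, 4.0),
        (2, 2.5),
        (3, 3.0),
        (5, 4.5),
    ]

    test_scores = [t for t, _ in hires]
    review_scores = [r for _, r in hires]

    print(f"correlation: {correlation(test_scores, review_scores):.2f}")
    # A strong positive correlation suggests the test measures the competency;
    # a weak or negative one suggests the test needs rework.

With only a handful of hires the numbers will be noisy, which is part of why this is a sketch and not a recipe.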

How do you grade work sample tests?

In this series, and my previous series on interview questions, I glossed over how exactly I recommend measuring candidates’ responses. Should you have grading rubrics, and score candidates’ responses to each question? What about overall: should candidates get a numeric score for the interview?

This is another topic I hope to write about in detail at some point, but for now, another quick sketch:

It’s not critical: fundamentally, hiring is a binary decision; you either make someone an offer or you don’t. So if the only “score” you record out of an interview is a “hire”/“no-hire” recommendation, that’s sufficient.

But if you can establish grading rubrics, I think it’s a good practice. Certainly, it gives you a way to measure effectiveness (see above). I use a grading rubric based on the Apache Voting System’s +1/0/-1 scores, something like this:

  • +1: the candidate’s response to this question was incredible and makes a strong case to hire them
  • +0: a typical good answer, roughly what we were expecting
  • -0: not a strong answer, but not terrible enough to be a red flag
  • -1: red flag: a bad enough answer that I might not want to hire them based on this question alone

I like this system because it’s not linear: +0/-0 capture answers that are strong or weak but in a typical way, while +1/-1 capture exceptionally good or bad responses. I also like that it’s fairly easy to teach and understand. A more typical 1-5 scale often causes a fair bit of confusion over the difference between a “3” and a “4” or whatever. But I don’t think the specifics matter very much: as long as you’re consistent, any rubric will be useful.
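If it helps to see the rubric in a more concrete form, here’s a minimal, purely illustrative sketch of recording per-question scores with it. The question names and the “surface any -1” check are assumptions of mine, not a prescribed process:

    # Purely illustrative: recording per-question rubric scores for a candidate.
    from enum import Enum

    class Score(Enum):
        PLUS_ONE = "+1"    # exceptional answer; strong case to hire
        PLUS_ZERO = "+0"   # typical good answer, roughly what we expected
        MINUS_ZERO = "-0"  # weak answer, but not a red flag
        MINUS_ONE = "-1"   # red flag

    candidate_scores = {
        "work sample test": Score.PLUS_ZERO,
        "post-exercise interview": Score.PLUS_ONE,
        "written communication question": Score.MINUS_ZERO,
    }

    # Any -1 is worth surfacing on its own: per the rubric, a single red flag
    # can sink a candidacy by itself.
    red_flags = [q for q, s in candidate_scores.items() if s is Score.MINUS_ONE]
    print("red flags:", ", ".join(red_flags) or "none")

Using an enum rather than numbers fits the point above: the scale isn’t linear, so there’s no meaningful arithmetic to do on it.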

Could a short work sample test replace a traditional resume screen?

This question was asked by Alex Yang, founder of Mighty Acorn, a startup that runs work sample tests. They’re thinking about work sample tests in a way that matches much of what I’ve written here, so Alex reached out and we spoke for a while.

Resume screens are ridiculous. We look at this piece of paper and try to make a snap judgment about whether we should invest some time in an interview or reject the candidate out of hand. We’re not measuring someone’s suitability for the role; we’re measuring how well they write resumes. And if we don’t take steps to anonymize the resume, we can easily end up making decisions shaped by unconscious bias.

Unfortunately, most job listings get a lot of completely irrelevant applicants. It seems like some candidates believe the best approach is to blast their resume at every job opening, regardless of any potential fit. I’ve seen applications for quite senior engineering roles where the resume doesn’t show a single month of software development experience anywhere. Interviewing obviously unqualified applicants is a waste of everyone’s time, hence the resume screen.

So maybe we could solve two problems at once, and give a work sample test instead of a resume screen?

Perhaps, but I’m unconvinced. Work sample tests are usually time-consuming, and I don’t like the idea of asking applicants to spend that time until we’re both at least a little bit sure there’s a match (see rule #8). It might work if the work sample test were truly short, well under an hour.

One idea that came up in talking with Alex and his team was offering a work sample test as an alternative, should someone not pass the resume screen. Instead of rejecting them outright, give them a chance to get into the running by completing a short test. This would help surface candidates who do have some of the skills you’re looking for but have done a poor job showing it on their resume. It’s a very interesting idea, and one I’m hoping to try out sometime soon.

Fin

That’s all, folks. Thanks for reading; I hope you found this series useful. If it helped you improve your work sample tests, I’d like to hear about it. Get in touch!


  1. That’s the style of interviewing I’m most familiar with – I’ve worked on distributed teams for nearly 15 years – and COVID has made this practice much more common in the last few years. ↩︎