Jacob Kaplan-Moss

Work Sample Tests:

‘Reverse’ Code Review

Welcome back to my series on work sample tests. This post describes one of the kinds of work sample test I use (and suggest others use as well). For some theory and background, see the first few entries in the series: introduction, the tradeoff between inclusivity and predictive value, and my “rules of the road” for good work sample tests.


For most software engineering roles, the best work sample test will be some combination of coding homework, pair programming, and/or reviewing previously-written code (preferably all three). But that’s not true for every role; there are some circumstances where other types of tests fit better or are better at revealing some critical piece of information relevant to hiring.

So this post, and the next in the series, will cover a couple of those less common but sometimes useful types of tests. These will be a bit less well-defined than the previous ones; you’ll need to do more work to flesh these out into something you can use.

First up: a “reverse” code review. Instead of you reviewing the candidate’s code, you have them review yours.

Who this is for: strangely, I find this exercise most useful for either very junior roles (interns, apprentices, entry-level programmers), or very senior ones (staff engineer or higher), but not in between.

For juniors, my “default” exercises don’t work as well: they don’t as often have previous code to share, pair programming can feel overly intimidating, and it’s harder to be sure that they’ll be able to accomplish the coding homework within the three-hour limit (rule #2). For the most junior roles, this shouldn’t be a disqualifier: for these roles, you’re hiring as much for potential as for anything else. A reverse code review “scales down” really well: it meets candidates where they are and can be more revealing of the potential you’re looking for.

For the most senior roles, this exercise fills a different purpose. The previously-covered exercises are fine; candidates at this level should find any of them pretty easy. However, most candidates for a Staff-or-higher role will do very well on those work sample tests, so they end up not being very useful when you need to rank candidates. Ultimately you can (usually) only make one offer, so having six candidates who all were “great” can be a problem. A good problem, to be sure, but wouldn’t a test that could help you more cleanly find the best candidates be better?

Senior-and-above roles typically include mentorship and technical leadership among their responsibilities, and a reverse code review exercise can reveal how candidates might perform in that area. I find a much wider skill gap here. It’s rare to find experienced software developers who can’t complete a “parse this file” assignment, but it’s unfortunately quite common to find senior engineers who still haven’t learned not to be a jerk.

What it measures: primarily code comprehension and communication skills. You’re talking through and about code, but not having the candidate write it, so it measures the ability to think and talk about code more than the ability to write it. Be wary of assuming that good performance on this exercise indicates strong coding skills. It usually does, but not always: some people are substantially better at reading than writing code. Be sure that’s an acceptable tradeoff before using this exercise instead of one of the previous ones.

The Exercise

There are two variants of this exercise I’ve seen work well, depending on what sorts of behaviors you want to measure most directly:

  1. Pull request review: you send the candidate a pull request (or diff, set of patches, etc.), and have them review that pull request as if they were reviewing a colleague’s pull request. This works best when you want to measure asynchronous, written communication skills.

  2. Live code review: you send the candidate a code sample ahead of time, and then meet with them and have them talk you through what the code does, how it works, any problems they see in it, and so forth. This works best when you want to measure synchronous, spoken communication skills.

You’ll want to make sure that the code or pull request you send has some problems in it for the candidate to find. There should be a mix of small issues (things like typos, minor bugs, formatting problems, etc.) and larger ones (structural/factoring issues, poor test coverage, inaccurate comments, etc.) so that the candidate has plenty of potential problems to find.
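As a rough illustration of that mix, here is a minimal sketch of what a seeded sample might look like. Everything in it is hypothetical: the function names, the “name,score” input format, and the particular problems are invented for this example, not taken from a real exercise. The “SEEDED” comments are an answer key for the interviewer; you would strip them before sending the code to a candidate.

```python
# Hypothetical review sample; all names and the input format are invented.
# "SEEDED" comments are the interviewer's answer key, not part of what the
# candidate would receive.


def parse_scores(lines, results=[]):
    # SEEDED (major): the mutable default argument is shared between calls,
    # so results from one call leak into the next.
    """Parse 'name,score' lines into a list of (name, score) tuples."""
    for line in lines:
        name, score = line.split(",")
        results.append((name.strip(), int(score)))
    return results


def avrage(scores):
    # SEEDED (minor): typo in the function name ("avrage").
    """Return the mean score, or 0 for an empty list."""
    if not scores:
        return 0
    total = 0
    # SEEDED (major): off-by-one, the loop skips the last score.
    for i in range(len(scores) - 1):
        total += scores[i][1]
    return total / len(scores)

# SEEDED (major): there are no tests, which is how both bugs survived.
```

A sample like this gives a junior candidate an easy entry point (the typo) while leaving room for stronger candidates to find the bugs and to rank them by importance.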

In either case, remember to follow the rules of the road for good work sample tests. In particular:

  • Make sure the scope is small enough to comprehend and review in less than three hours (rule #2). This can be particularly tricky to get right for the pull request review since the candidate will need to comprehend both the change and the codebase that the pull request is made against. This almost certainly means you can’t use a real pull request from your company’s codebase; there’ll just be too much to comprehend.
  • You’ll want to do quite a bit of internal testing (rule #7) for this one; it can be difficult to come up with a good code sample here. You should test until you get consistent results that are under that time limit. This could take three or four iterations, or more; this isn’t an exercise to try to develop in a hurry.
  • For variant #2, remember to send the code sample well ahead of time (a week is ideal) so the candidate has time to prepare. Surprise code reviews are a terrible practice (rule #6).

The interview

Even though variant #1 is mostly asynchronous, still schedule some time to talk about the results (rule #5). You’ll want to follow up on why they made the recommendations they did; what other things they considered; etc. You likely don’t need more than 15-20 minutes to debrief about the pull request review, so you can roll this time into another interview slot.

For variant #2, the interview time should accomplish two things:

  1. Have the candidate walk you through the code, and explain how it works (to demonstrate their comprehension of the code).
  2. Have the candidate explain the problems/issues/concerns they see in the code and propose solutions.

You may want to also do a short debrief after the exercise. See the discussion about debriefs in the pair programming exercise for more.

Behaviors to look for

  • Do they understand what the code sample/PR does, more or less?
  • How’s their communication? Is it clear and focused, or hard to follow, rambling, or difficult to understand?
  • What’s the tone of their feedback, particularly anywhere they’re critical or pointing out problems in the code/PR?

Positive Signs

  • 👍 Kind, collaborative communication, focused on the code rather than on the person.
  • 👍 Asks questions about parts they don’t understand.
  • 👍 Clearly identifies problems in the code.
  • 👍 Communicates the relative level of importance of the various problems they find (e.g. separates out stylistic concerns from bugs or serious architectural problems).
  • 👍 Adapts their communication to your level of experience (e.g. asks questions like “does that term I used make sense or should I explain further?”).

Red Flags

  • 🚩 Is mean or unkind, particularly when directed towards individuals (e.g. “whoever wrote this code is an idiot”).
  • 🚩 Excessive use of jargon. Some jargon is normal, but they should mostly be terms that are assumed knowledge for this role, or if not they should be checking in with you that you’re following.
  • 🚩 Incorrect about what the code does or how it works.
  • 🚩 Doesn’t find problems with the code, or only finds minor issues (typos) and none of the major ones.


I don’t use this exercise very often, but it’s a useful option to have available. I particularly like how well it focuses on communication and collaboration skills; it makes this work sample test very nice for roles where those are critical skills.

Be careful to watch out for unconscious bias when evaluating candidates on this exercise. One of the criteria here is tone, which is very difficult to measure without getting caught up in ideas that can be linked to race or gender. It’s quite important to question your evaluations of tone and how they might be shaped by biases. But it’s still important to pay attention to tone since someone who can’t give feedback compassionately is not a good colleague.

As with a lot of interviewing practice, you also need to pay close attention to the power imbalance when scoring answers. Interviews are weird and stilted situations, where the interviewer is holding something the candidate wants (a job), and therefore this isn’t really a conversation between true peers. Part of this test asks the candidate to give you critical feedback. Candidates are walking a delicate line: you’ve asked them for feedback, but they don’t want to appear overly critical or like a jerk. Don’t be surprised if they’re a bit timid or overly deferential.

On the other hand, if you do see antisocial behaviors – if the candidate does act like a jerk – run. If a candidate is a jerk in a situation where they have every incentive to be on their best behavior, imagine how they’ll behave when they have more structural power.


If you have questions about this example or anything in the series, send me an email or tweet at me. If I get some good questions, I’ll answer them in a series wrap-up.


A fun thing about this kind of test is that sometimes it accidentally goes better than you might expect. Sumana has a funny story about the time a candidate scored a 13/12 on a code review work sample test.