Jacob Kaplan-Moss

Tracking Engineering Time

Here’s another answer to a common question that I got during a recent “Ask Me Anything” session at TinySeed (here’s the previous one).

Managers can sometimes find it difficult to understand what engineers are doing with their time — and whether that time is being allocated correctly. This is particularly true for two classes of managers:

  1. Managers without engineering backgrounds themselves – e.g. a non-engineering founder of a very small startup with a couple-three engineers working for them. (This is the case for a lot of the TinySeed founders, including the one who asked the question sparking this post.)
  2. Managers leading engineering from a distance – e.g. an engineering VP or Director with too many reports to really follow any one person’s individual work closely.

If you’re in a situation like these, you can usually pretty clearly see the outputs: how long does it take to ship new features? How often are there bugs or downtime in production? How many security issues get discovered from a pentest or bug bounty? And so forth. But if those outputs aren’t what you want – it’s taking too long to ship new features, or there are too many serious security issues for comfort, etc. – it can be super-hard to diagnose why you aren’t getting the results you want. Is the team shipping features too slowly because of too much technical debt? Or are engineers spending an appropriate amount of time on new features, but are wasting time working on the wrong features? Are the security issues a sign of an untrained staff? Too little security review? An underlying systemic weakness that needs time allocated to address?

Leaders in situations like this often have the distinct impression that the team is “wasting time”, or working on the “wrong” things. If they’re feeling uncharitable, they wonder if their staff is incompetent, lazy or sloppy.

How should a manager in this situation discover what their team is working on and figure out if their time is being allocated correctly?

This is my playbook:

  1. Measure the time engineers are spending. This doesn’t need to be super-precise, nor does it necessarily have to be wall-clock time (you can use story points, or ticket counts, or probably a dozen other things).

  2. Split that time into “buckets” that map to the kinds of activities that influence output. I start with three buckets: features (time spent developing new things), debt (time spent fixing old problems), and toil (time spent on routine tasks).

  3. Agree on the appropriate ratios for each bucket, and then adjust over time to influence the outputs you care about.

If this sounds suspiciously similar to my playbook for addressing technical debt: it sure is! The basic move here – measure, allocate, track, adjust – is one of the building blocks of good leadership and can be applied in many circumstances.

For details on each step, read on:

Measure time

We can’t improve what we don’t measure, so any technique here needs to start with some form of tracking. I’m generally a proponent of tracking real-world wall-clock time. Time is what we care about, so I don’t see the point in tracking a proxy instead of tracking time directly. (More about why I prefer tracking time to other things in this article on estimation.)

But, that said, two things:

  1. It’s not important to be super-accurate. Unless you’re doing something like billing clients by the quarter-hour, it’s usually a waste of time to insist that engineers track their time really closely. It’s usually sufficient to ask staff to estimate the time they spent on various tasks at the end of the week – or even to simply fill out a weekly form putting their hours into the buckets you care about, without tracking any individual projects or tasks. (There’s a minimal sketch of this kind of log at the end of this section.)

  2. Proxies (story points, ticket counts, etc.) can work, and if you’re already measuring one of these proxies it’s probably not worth introducing more tracking overhead. E.g. if you’ve already got a workflow that creates stories with story points, and engineers are already disciplined about tracking which stories they work on and close, I’d recommend piggy-backing off that data rather than adding new overhead. I really do believe that measuring time directly is better – but not so much better as to be worth additional toil if you can avoid it.

However you measure, you have to be consistent. Pick a technique, commit to it, and follow that same technique over time. This playbook relies on a stable measure of time investment that you continue to use; if you change up how you measure, it won’t work.
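To make this concrete, here’s a minimal sketch of what the underlying data could look like, assuming self-reported hours land in a simple CSV. The file layout, column names, and the `load_weekly_hours` helper are all just illustrative choices on my part, not a prescribed format:

```python
import csv
from collections import defaultdict

# The three starting buckets from the playbook above; swap in whatever
# categories you and your team agree on.
BUCKETS = {"features", "debt", "toil"}

def load_weekly_hours(path):
    """Sum self-reported hours per bucket from a simple CSV shaped like:

        engineer,week,bucket,hours
        alice,2024-W14,features,22
        alice,2024-W14,toil,6
    """
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            bucket = row["bucket"].strip().lower()
            if bucket in BUCKETS:  # ignore anything outside the agreed buckets
                totals[bucket] += float(row["hours"])
    return dict(totals)
```

The point isn’t the tooling (a shared spreadsheet works just as well); it’s that the shape of the data stays the same week over week.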

Attribute that time to categories that influence output

As your staff tracks time, they should attribute it to a few major categories. I recommend starting with at least these three:

  • Features — time spent working on new product features. This is all time spent in service of getting new features out the door, from ideation and planning to implementation, testing, and rollout. (Though, depending on your release process, some of that may be “toil”, and in some cases you might want to track testing time separately; both are covered below.)
  • Bugs/Debt — time spent fixing bugs, refactoring, or otherwise paying down tech debt. This is any time you’re spending not working on new things, but instead cleaning up or fixing old things. So fixing bugs, solving performance issues, remediating security vulnerabilities, etc.
  • Toil — time spent on routine, repetitive tasks. For example, if pushing a deployment to production takes an hour of hands-on babysitting, that’s an hour of toil: it’s not working on the feature, it’s just sitting around making sure it gets to prod safely. Toil is any process that is exactly the same every time and doesn’t deliver any actual value.

You can add other buckets if they’re important to reveal information you care about. A few optional buckets that I sometimes use are:

  • Testing — testing new features, writing regression tests for newly discovered bugs, etc. The ratio of Testing time to remaining engineering time can be interesting, sometimes revealing organizations that are skipping steps and shipping not-sufficiently-hardened features or fixes. However, most teams I’ve worked on practice some form of Test-Driven Development, where testing and development are inextricably linked, and thus it can be impossible to separate out testing time from other coding activities. So, I rarely use this category unless there really is separate and distinct testing time to be measured.
  • Meetings — usually, I just ignore meeting time in time measurement (e.g. meetings don’t count as Features, Debt, or Toil; they’re just gaps in the record). And it’s not usually clear what to do with a meeting measurement: if I tell you I had 8 hours of meetings this week, is that “too much”? How would you know what “the right amount” of meetings would be? Still, there are some circumstances where knowing how much time you’re spending coordinating work is useful. Meeting time can also be worth tracking in consulting contexts, where you might want to know how much meeting time you’re spending with clients. There are other places where tracking meeting time can be useful, I’m sure, just not a ton of them.
  • Incidents — usually I track time spent running an incident (be it an outage or a security incident) as Toil, and any long-term post-incident cleanup/remediation as Debt. But sometimes, particularly on teams with explicit incident response duties (e.g. ops teams, security response teams, etc.), it can be worth knowing how much time you’re spending firefighting vs preparing for future fires.

I’d caution you against having too many buckets. 3-5 seems fine, but there are diminishing returns to adding more granular buckets. And the more categories you add, the more time people will need to spend filling out timesheets, and thus compliance and accuracy will go down. Make this as easy as possible by having just a few buckets and not insisting on super-detailed granularity.

Monitor and make adjustments

With these buckets, you can begin to understand and quantify where time is being spent. For example:

  • The Feature/Debt ratio tells you how much time is being spent on net new work vs fixing old problems. E.g., if you feel like you’re not delivering new features fast enough, and you have a high ratio of Debt to Features, you can hypothesize that too much tech debt is holding you back. Or, if bugs are cropping up in production frequently but you have a very high Feature/Debt ratio, you can hypothesize that you’re prioritizing new features over quality. (The arithmetic for these ratios is sketched below.)
  • Toil shows how much time you’re “wasting” on repetitive work, and can help reveal places where automation might save time. There’s always going to be some percentage of toil - it’s not worth the time to automate every process - but if some team is spending 30% of its time on toil, there’s a very good chance you can get better performance through automation.
  • Different Feature/Debt ratios on different teams can reveal which teams are struggling with some form of technical debt and which teams have solid foundations for new feature development.

And so on: there are many insights you can glean from this data.
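As a rough illustration of that arithmetic, here’s how the ratios above might be computed from the per-bucket totals. The function and field names here are hypothetical, just one way to slice the data:

```python
def time_ratios(totals):
    """Turn per-bucket hour totals, e.g. {"features": 120, "debt": 45, "toil": 30},
    into the ratios discussed above."""
    tracked = sum(totals.values())
    if tracked == 0:
        return {}  # nothing reported this period
    debt = totals.get("debt", 0.0)
    return {
        # How much time goes to net-new work vs. fixing old problems.
        "feature_debt_ratio": totals.get("features", 0.0) / debt if debt else float("inf"),
        # Share of tracked time that is toil -- a candidate for automation.
        "toil_pct": 100.0 * totals.get("toil", 0.0) / tracked,
        "feature_pct": 100.0 * totals.get("features", 0.0) / tracked,
        "debt_pct": 100.0 * totals.get("debt", 0.0) / tracked,
    }

# Example: 120h features, 45h debt, 30h toil over a month
# -> Feature/Debt ratio of roughly 2.7, with toil at about 15% of tracked time.
```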

I suggest simply gathering data for a while, maybe a month, before starting to make major adjustments. This data tends to be noisy, so give it some time to shake out. Then you can start to nudge teams towards the outcomes you want to see. The specifics here are going to be highly situation-dependent, but I’ve seen a few successful patterns emerge:

  • Toil is an important metric to track. As I wrote above, some toil is normal and okay, but too much is like sand in the gears of your engineering process: it grinds everyone down. Successful teams often set an upper limit on the amount of toil they’ll tolerate – 10-20% is a good starting point. Any time the number rises above that limit for a sustained period, that triggers an investment in automation. (A simple version of this check is sketched at the end of this post.)
  • The Features/Debt ratio serves as a discussion point between Product and Engineering. Time spent on features vs debt is always a potential disagreement point between those teams, and data that both teams trust is hugely valuable. It’s super common for a Product leader to wonder why the team isn’t shipping faster, and there’s a huge difference between “we’re spending 90% of our time on new features and still not shipping as fast as you’d like” and “we’re spending 90% of our time cleaning up technical debt”. The perception can be true in both cases but the fixes are dramatically different.
  • These numbers can help prioritize major projects. If engineers have been asking to put a major refactoring on the roadmap, knowing how much debt-time that deferred maintenance is causing is critical in deciding when (or even if) it’s appropriate to schedule that project.

And so on. It’s a constant, ongoing conversation – but having the data means you can make decisions based on information and strong hypotheses, not “vibes”.
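For the toil ceiling specifically, the “sustained time above the limit” check is easy to automate once you have weekly numbers. A minimal sketch, assuming a 15% limit and a four-week window (both placeholders you’d agree on with your team):

```python
def toil_over_limit(weekly_toil_pcts, limit_pct=15.0, window=4):
    """True if average toil over the last `window` weeks exceeds `limit_pct`.
    Averaging over a window means a single bad week doesn't trigger anything."""
    if len(weekly_toil_pcts) < window:
        return False  # not enough data yet to call it "sustained"
    recent = weekly_toil_pcts[-window:]
    return sum(recent) / window > limit_pct

# Four weeks of toil percentages from the weekly reports:
toil_over_limit([12.0, 18.0, 22.0, 19.0])  # True: time to invest in automation
```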

If you try this – or something like it – I’d love to know how it goes! Send me an email: jacob at this domain.