Probably Are Gonna Need It: Application Security Edition

About a week ago, Luke Plant and Simon Willison wrote about their lists of exceptions to the YAGNI principle. As Simon writes:

YAGNI—You Ain’t Gonna Need It—is a rule that says you shouldn’t add a feature just because it might be useful in the future—only write code when it solves a direct problem.
When should you over-ride YAGNI? When the cost of adding something later is so dramatically expensive compared with the cost of adding it early on that it’s worth taking the risk. Or when you know from experience that an initial investment will pay off many times over.
[…]
Because I like attempting to coin phrases, I propose we call these PAGNIs — short for Probably Are Gonna Need Its.

I love this concept! It applies really well to security engineering: many risk mitigations are difficult to implement and address unlikely threats. You don’t want to over-invest in security engineering versus feature work early on: if you fail to get any customers it doesn’t matter how secure your app is. However, there is also some security engineering that is worth doing up-front: basic security mitigations that are easy to do at the beginning, but get progressively harder the longer you put them off.

Here are some of my Security PAGNIs, including some suggestions by my Twitter followers. To keep this article from getting out of control, I’m scoping this to application security only. If you’re interested in future articles about other areas of infosec, let me know!

Table stakes: use a library/framework that mitigates common flaws

Many attacks in the real world are essentially crimes of opportunity: attackers run broad scans looking for easy vulnerabilities, then exploit the ones they find. If you don’t address these basic flaws, you’ll be compromised before you know it.

Luckily, we have tools that can help. Modern web development libraries/frameworks have mitigations against many common flaws built-in. Django, for example, has built-in mitigations against XSS, CSRF, SQL injection, and a number of other common flaws. Using Django protects you (at least partially) against six out of the top 10 most common web vulnerabilities.

This isn’t an argument that you should be using Django – there are other great tools out there. But it makes a good benchmark: by my estimation, Django is a bit above average in the “secure by default” department. You should choose something at least as good as Django. If your library/framework of choice isn’t as good as (or better than) Django, choose something else.
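To make the SQL-injection point concrete, here's a minimal sketch using Python's built-in `sqlite3` module instead of Django, purely so it runs without dependencies. It contrasts pasting user input into the SQL text with passing it as a query parameter. Query parameterization is the approach Django's ORM uses to protect querysets. The table and users are made up for the demo.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # DON'T: the input becomes part of the SQL text, so it can rewrite the query.
    return conn.execute(f"SELECT email FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # DO: the value is sent separately from the SQL and is never parsed as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: treated as a literal name
```

A framework that makes the safe form the default path is doing exactly this for you. You only lose that protection when you drop down to hand-built SQL strings.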

Use cryptography libraries, too

Along similar lines: use good tools to help you with encryption. It’s famously hard to write encryption code correctly. One recent example: the flaws in Kaspersky Password Manager. If you need to encrypt or sign things, writing custom code should be your last resort. Instead, use libsodium, or, even better, your cloud platform’s secret management features (e.g. AWS Secrets Manager, GCP Secret Manager, etc.).
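As a small illustration of "use the library, don't roll your own," here is a sketch of the signing case using Python's standard-library `hmac`, `hashlib`, and `secrets` modules. This swaps in stdlib HMAC instead of libsodium only so the example runs without extra dependencies. For encryption, the advice above stands. Generating the key in-process is a demo assumption; a real key would come from your secret manager.

```python
import hashlib
import hmac
import secrets

# Demo assumption: in a real app, load this key from a secret manager.
SIGNING_KEY = secrets.token_bytes(32)


def sign(message: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SIGNING_KEY) -> bool:
    """Check a tag with a constant-time comparison to avoid timing leaks."""
    expected = sign(message, key)
    return hmac.compare_digest(expected, tag)


tag = sign(b"user_id=42")
print(verify(b"user_id=42", tag))  # True
print(verify(b"user_id=43", tag))  # False
```

The value here is that the subtle parts come from vetted code. For example, a hand-written `==` check on the tag can leak timing information that `compare_digest` is designed not to.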

Have a vulnerability disclosure policy and a `security@` email

More table stakes: make it easy and legal for people to tell you about security issues. Many (most?) security vulnerabilities aren’t found by attackers: they’re found by normal users of your site who stumble upon them. So, publish a vulnerability disclosure policy – here’s a good template – somewhere on your site (I suggest ‘/security/’), and set up a security@ email address.
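One common way to advertise both the policy and the address is a `security.txt` file. The format is standardized in RFC 9116 and served at `/.well-known/security.txt`. Here is a minimal sketch that generates one. The `example.com` addresses and the 180-day validity window are placeholder assumptions. `Contact` and `Expires` are the fields the RFC requires, and `Policy` links to your disclosure policy.

```python
from datetime import datetime, timedelta, timezone


def security_txt(contact_email: str, policy_url: str, days_valid: int = 180) -> str:
    """Build a minimal RFC 9116 security.txt body.

    Contact and Expires are required by the RFC; Policy points readers
    at the human-readable vulnerability disclosure policy.
    """
    expires = datetime.now(timezone.utc) + timedelta(days=days_valid)
    lines = [
        f"Contact: mailto:{contact_email}",
        f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"Policy: {policy_url}",
    ]
    return "\n".join(lines) + "\n"


print(security_txt("security@example.com", "https://example.com/security/"))
```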

Thanks to dkp for the suggestion.

Consider the “abusive ex” persona

some way to block unwanted communication from other users, built in to the platform at a fundamental level. (Because if you don't PAGNI it, it'll be too hard to add later so you'll never do it and will have to apply ineffective workarounds that don't stop harassment instead.)

— @sil

If you’re building any sort of social feature, you must consider abusive behavior from the beginning. It’s nearly impossible to retrofit mitigations for spam, harassment, or abuse, and it’s highly unethical to wait until an attack happens to do something about it.

I recommend you consider what I’ve heard called the “Abusive Ex” persona along with all of your other user archetypes. Think about what this kind of person could do with your platform, and build in mitigations (e.g. tools to block, mute, and report) from the beginning.
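Here's a minimal sketch of what "built in at a fundamental level" might look like: every code path that lets one user reach another goes through a single block check. The names `BlockList`, `BlockedError`, and `send_message` are hypothetical. The in-memory set stands in for a real database table.

```python
from dataclasses import dataclass, field


@dataclass
class BlockList:
    """Tracks who has blocked whom (hypothetical in-memory model)."""
    _blocks: set[tuple[str, str]] = field(default_factory=set)

    def block(self, blocker: str, blocked: str) -> None:
        self._blocks.add((blocker, blocked))

    def unblock(self, blocker: str, blocked: str) -> None:
        self._blocks.discard((blocker, blocked))

    def is_blocked(self, a: str, b: str) -> bool:
        """True if either user has blocked the other."""
        return (a, b) in self._blocks or (b, a) in self._blocks


class BlockedError(PermissionError):
    pass


def send_message(blocks: BlockList, inbox: dict[str, list[str]],
                 sender: str, recipient: str, text: str) -> None:
    # Central chokepoint: features added later (comments, mentions, invites)
    # should call the same check instead of reimplementing it.
    if blocks.is_blocked(sender, recipient):
        raise BlockedError(f"{sender} cannot contact {recipient}")
    inbox.setdefault(recipient, []).append(f"{sender}: {text}")


blocks = BlockList()
inbox: dict[str, list[str]] = {}
send_message(blocks, inbox, "mallory", "alice", "hi")
blocks.block("alice", "mallory")
try:
    send_message(blocks, inbox, "mallory", "alice", "hi again")
except BlockedError as exc:
    print(exc)  # mallory cannot contact alice
```

This sketch treats blocks as symmetric. Whether to also hide the existence of a block from the blocked user is a product decision it leaves open.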

Audit trails

Audit logs are impossible to add after the fact: who did what when? “I can’t see X anymore but I could yesterday!” “Who edited Z?” “What actions did suspicious/compromised user Y take?” Can make them customer-visible later, but for internal/support access they’re valuable.

— @amatix

Setting up proper logging can be mildly annoying. I have to re-read Django’s logging documentation every dang time I set it up; I just can’t seem to memorize how it all works. I’m often tempted to skip it. However, as Robert points out, that’s a bad idea: if something goes wrong, figuring out what happened after the fact can be impossible without a good audit trail.

So, take the time to get logging set up, and make sure your app is emitting messages any time anything interesting happens. Ideally, those log messages should be streamed to some sort of external log service, so that an attacker couldn’t delete them even if they compromised your application servers.
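Here's a minimal sketch of an audit helper built on Python's standard `logging` and `json` modules. It records who did what, to what, and when. The logger name `audit` and the event fields (`actor`, `action`, `target`) are illustrative choices, not anything Django prescribes. The handler that ships records to an external log service is left out because it depends on your infrastructure.

```python
import json
import logging
from datetime import datetime, timezone

# Dedicated audit logger. In production you'd attach a handler that forwards
# records to an external, append-only log service (assumption, not shown).
audit_logger = logging.getLogger("audit")
audit_logger.setLevel(logging.INFO)


def audit(actor: str, action: str, target: str, **details) -> dict:
    """Emit one JSON line describing who did what, to what, and when."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
        "details": details,
    }
    audit_logger.info(json.dumps(event, sort_keys=True))
    return event


logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
audit("support:dana", "user.email_changed", "user:42",
      old="a@example.com", new="b@example.com")
```

Emitting one structured record per event, rather than free-form text, makes questions like "what actions did user Y take?" a simple filter later on.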

Build safe admin interfaces

Many apps have some sort of administrative/staff interface. You might use admin interfaces for writers/editors to update content; for support to help customers; for back office workflow; for developers to tweak settings or feature flags; or more. Django’s built-in admin interface is one of its marquee features, so most apps I work on have one of these at first.

However, later on, these admin interfaces can become truly terrifying. They tend to grow features over time as your company grows and different departments need more things. Left unchecked, you can end up with a situation where hundreds or thousands of “admins” have access to all sorts of terrifying functions. A single security hole, or a single stolen credential, could allow an attacker to track users, download sensitive data, transfer money, and more.

Don’t let this happen. Early in your app’s lifecycle, take some steps to limit the “blast radius” of admin features:

Separate your admin from your production app. The Django default of hosting the admin panel at /admin/ isn’t a good one, security-wise. At a minimum, the admin app should be a different top-level domain (to prevent cookie/session attacks). Ideally, you’d use different authentication for staff and for end-users, and put the admin panel behind another layer of security like an app-aware proxy or VPN. And you should absolutely require multi-factor authentication for your admin app.
Have multiple admin apps: one app for the support team, another for your billing department, a third for developers to tweak feature flags, etc. That can help keep unrelated functionality out of the hands of users who don’t need it, and can prevent admin feature-creep.
Avoid singular “is-admin” flags; use role-based access control (RBAC) instead. That is, your admin app(s) shouldn’t assume that just because someone is logged in, they have access to everything. Each function should be gated behind a permissions check, and you should use roles/groups to assign the right functions to the right users.
Make sure that all admin actions, including log-in events, are audited.
Be especially careful if you have any sort of assume-user functionality (i.e. a way for someone to assume the identity of another user and log in as them; this is often a feature requested by support teams). Often, this is implemented by creating a login token or session cookie that’s indistinguishable from the real user. This is a problem for audit trails: it makes it impossible to tell whether a user took the action, or whether an admin assumed their account and did it. Make sure that any assumed-role actions are properly attributed to the actual underlying staff account!
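The RBAC, auditing, and assume-user points can work together. Here is a minimal sketch: every admin function is gated by a named permission rather than an is-admin flag, and every call is audited. Impersonation yields a session that still carries the real staff identity. The role names, permissions, and the in-memory `AUDIT_TRAIL` list are hypothetical stand-ins for real storage.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Optional

# Hypothetical roles: each grants a set of named permissions.
ROLES: dict[str, set[str]] = {
    "support": {"users.view", "users.impersonate"},
    "billing": {"users.view", "billing.refund"},
    "developer": {"flags.edit"},
}


@dataclass
class Staff:
    username: str
    roles: set[str] = field(default_factory=set)

    def has_perm(self, perm: str) -> bool:
        return any(perm in ROLES.get(role, set()) for role in self.roles)


@dataclass
class Session:
    staff: Staff                     # the real, authenticated staff member
    acting_as: Optional[str] = None  # end user being impersonated, if any


AUDIT_TRAIL: list[dict] = []


def requires(perm: str):
    """Gate an admin action behind one permission and audit every attempt."""
    def decorator(func):
        @wraps(func)
        def wrapper(session: Session, *args, **kwargs):
            allowed = session.staff.has_perm(perm)
            AUDIT_TRAIL.append({
                "staff": session.staff.username,  # always the real person
                "acting_as": session.acting_as,   # impersonation is never hidden
                "action": func.__name__,
                "allowed": allowed,
            })
            if not allowed:
                raise PermissionError(f"{session.staff.username} lacks {perm}")
            return func(session, *args, **kwargs)
        return wrapper
    return decorator


@requires("billing.refund")
def refund(session: Session, invoice_id: int) -> str:
    return f"refunded invoice {invoice_id}"


@requires("users.impersonate")
def impersonate(session: Session, user_id: str) -> Session:
    # The new session keeps the staff identity instead of minting a user token.
    return Session(staff=session.staff, acting_as=user_id)


alice = Session(Staff("alice", {"support"}))
as_user = impersonate(alice, "user:42")
try:
    refund(as_user, 7)
except PermissionError as exc:
    print(exc)  # alice lacks billing.refund
```

Because the audit entry is written before the permission decision is enforced, denied attempts show up in the trail too. That is often exactly what you want when investigating a compromised staff account.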

Yes, this is a fair bit of work, but it’s the epitome of a PAGNI. If you don’t do this up front, it’ll be incredibly hard to retrofit later, and you’ll have some pretty big risks until you do. If this all seems like too much… maybe just don’t have an admin panel!?

Build safe ways to move redacted data out of production

Having a mechanism to restore a redacted version of your production database to staging such that it's safer to develop against without risk of leaking sensitive stuff (like PII) if there's a bug in staging

— @simonw

As much as possible, we’d like to keep production data in production. But this is rarely possible in any sort of absolute way. We’d like to have staging sites that mirror production data as closely as possible. We’d like to be able to load portions of production data into development to diagnose data-dependent bugs. Business analysts and data science teams need access to account data to build models. And so forth.

All too often, this is done by taking an indiscriminate copy of the production data and loading it somewhere else. Thus, we end up transferring sensitive data (PII, billing information, health data, etc) out of production. This is always dangerous – development and staging servers are rarely as well-protected as production – and sometimes illegal.

So, early on – as soon as you start storing sensitive data – you’ll want to build some tooling to perform data transfers out of production. Make sure any sensitive data is removed or replaced with dummy data. The safest way to do this is with an allow-list (rather than a block-list). Make it so that you have to explicitly list the tables and columns that are safe to transfer, and assume all others are unsafe. That way, you can’t accidentally forget to block new sensitive columns as you add them.
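Here's a minimal sketch of the allow-list approach, operating on rows as plain dicts. The table names, columns, and fake-value rules are hypothetical. The key property is the default: a table or column that isn't listed is dropped, so a sensitive column added next month stays in production unless someone deliberately allows it.

```python
from typing import Any, Callable

# Hypothetical allow-list: only these tables and columns may leave production.
ALLOWED_COLUMNS: dict[str, set[str]] = {
    "users": {"id", "created_at", "plan"},
    "orders": {"id", "user_id", "total_cents"},
}

# Columns that staging needs to exist, but only with dummy values.
FAKE_VALUES: dict[tuple[str, str], Callable[[dict], Any]] = {
    ("users", "email"): lambda row: f"user{row['id']}@example.invalid",
}


def redact_table(table: str, rows: list[dict]) -> list[dict]:
    """Return copies of rows containing only allow-listed or faked columns."""
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        return []  # unknown table: assume it's sensitive and export nothing
    out = []
    for row in rows:
        clean = {col: val for col, val in row.items() if col in allowed}
        for (tbl, col), fake in FAKE_VALUES.items():
            if tbl == table and col in row:
                clean[col] = fake(row)
        out.append(clean)
    return out


prod_rows = [{"id": 1, "created_at": "2021-01-01", "plan": "pro",
              "email": "real@corp.example", "ssn": "123-45-6789"}]
print(redact_table("users", prod_rows))
# [{'id': 1, 'created_at': '2021-01-01', 'plan': 'pro', 'email': 'user1@example.invalid'}]
```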

Session or password invalidation

Eventually, every app needs a way to invalidate users’ sessions or passwords. Often this is in response to some sort of support issue. Sometimes it’s in response to an account takeover (one of your users has an account stolen, likely through a password reused from a prior breach). Sometimes it’s a rare session bug that requires you to invalidate existing sessions.

Either way, there are a few features that are fairly easy to implement but will save your bacon in the future:

a way to trigger a password reset for a user, or some group of users, or all users
a way to require a user (or some group, etc.) to change their password after their next login
a way to invalidate session tokens (for a user, for some users, etc), thus forcing re-authentication next time that person comes to the site
if you allow login via external identity providers (Twitter, GitHub, SAML, etc.), a way to force re-authentication against that external provider the next time the user hits your app

Build these early, and you won’t be scrambling to figure them out when you need them.
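Here's a minimal sketch of the session-invalidation and forced-password-change items. It assumes each session records when it was issued and each account stores a hypothetical `sessions_valid_after` cutoff. You don't hunt down and delete individual sessions; you move the cutoff forward for one user, a group, or everyone. Any older session then fails the check on its next request.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Account:
    user_id: str
    sessions_valid_after: float = 0.0  # sessions issued at or before this are invalid
    must_change_password: bool = False


@dataclass
class SessionToken:
    user_id: str
    issued_at: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


def issue_session(account: Account, now: Optional[float] = None) -> SessionToken:
    return SessionToken(account.user_id, issued_at=time.time() if now is None else now)


def is_session_valid(account: Account, session: SessionToken) -> bool:
    return session.issued_at > account.sessions_valid_after


def invalidate_sessions(accounts: list[Account], now: Optional[float] = None,
                        force_password_change: bool = False) -> None:
    """Log out one, some, or all users by moving their cutoff forward."""
    cutoff = time.time() if now is None else now
    for account in accounts:
        account.sessions_valid_after = cutoff
        if force_password_change:
            account.must_change_password = True


acct = Account("user:42")
old = issue_session(acct, now=100.0)
invalidate_sessions([acct], now=200.0, force_password_change=True)
print(is_session_valid(acct, old))  # False
```

Validity is decided by comparing timestamps, so nothing has to be deleted from a session store. That also makes the pattern usable with signed, stateless session cookies, as long as each request checks the account's current cutoff.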

What’d I miss?

I’m sure there are others! Let me know your suggestions. I may update this list over time as I get suggestions, or think of new ones.