The Sociology of Input Validation

Maybe writing programs that accept whatever data comes in, and dealing with the consequences later, isn’t as sloppy as it sounds.

Who cares if your data is bad?

Reject bad input immediately when the problem would be harder to fix once the data is stored in your database. If a user supplies an invalid email address through a web form, ask them to correct it right away. Otherwise, how would you get the address repaired later, when the only way to contact the user is the address that is wrong in the first place?
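As a rough illustration of rejecting at the form, here is a minimal Python sketch. The function name `check_signup_email` and the pattern are assumptions made for this example. The pattern is deliberately loose and is not a complete email validator; it only catches input that is obviously not an address while the user is still there to correct it.

```python
import re
from typing import Optional

# Deliberately loose: "something@something.tld" with no whitespace.
# An illustrative assumption, not a full email address grammar.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_signup_email(raw: str) -> Optional[str]:
    """Return an error message to show next to the form field,
    or None if the address looks usable."""
    email = raw.strip()
    if not email:
        return "Please enter an email address."
    if not EMAIL_PATTERN.match(email):
        return "This does not look like an email address. Please check it."
    return None
```

The form handler would redisplay the form with the returned message instead of saving the record, so the only person who can fix the address is asked while they are still present.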

But in other cases, deferring the fix might be best. Suppose a business partner sends invalid data from a remote system that you can’t reply to. If you reject the data, the sender will never know the message didn’t get through, and the people on the receiving end won’t be able to act either, since the data never entered their system. Rejecting bad input would cause invisible data loss.

Build better software with Conway’s Law

Conway’s Law says that organizations produce software that mirrors their own communication structure, so learn how the people who work with the data communicate.

If humans later process the data through a UI, they may be able to phone someone at the company that sent the broken records and sort out the problem by hand. So when we detect bad input, we could flag the incorrect data in the UI to attract an operator’s attention. It would make no sense to skip the bad data.
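One way to picture "accept and flag" is the Python sketch below. Everything in it is hypothetical: the `StoredRecord` type, the `order_id` and `quantity` fields, and the two checks are invented for illustration. The point is only that every incoming record is kept, and validation problems are stored next to it so a UI can highlight them.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StoredRecord:
    payload: dict[str, Any]
    # Human-readable problems found at ingestion; empty means "looks fine".
    issues: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        return bool(self.issues)


def find_issues(payload: dict[str, Any]) -> list[str]:
    """Describe what is wrong with a record without rejecting it."""
    issues = []
    if not payload.get("order_id"):
        issues.append("missing order_id")
    quantity = payload.get("quantity")
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        issues.append(f"quantity is not a positive integer: {quantity!r}")
    return issues


def ingest(payloads: list[dict[str, Any]]) -> list[StoredRecord]:
    """Store every record, good or bad, together with its issues."""
    return [StoredRecord(p, find_issues(p)) for p in payloads]


def review_queue(records: list[StoredRecord]) -> list[StoredRecord]:
    """Records the UI should flag for an operator."""
    return [r for r in records if r.needs_review]
```

Keeping the original payload untouched matters here: the operator who phones the sender needs to see exactly what arrived, not a cleaned-up guess.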

Sometimes you need to reject

Detecting broken input should not automatically mean rejecting it. But never accept SQL injection attempts! The tradeoffs between early and late validation apply to business data. When validating program code or configuration, fail early instead of behaving incorrectly in the middle of a run.
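For configuration, failing early could look like the following Python sketch. The setting names `batch_size` and `output_dir`, the `Settings` class, and the `ConfigError` exception are assumptions chosen for this example. The configuration is checked once at startup, and all problems are reported together before any work begins.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class ConfigError(ValueError):
    """Raised at startup when the configuration cannot be trusted."""


@dataclass(frozen=True)
class Settings:
    batch_size: int
    output_dir: str


def load_settings(raw: Mapping[str, Any]) -> Settings:
    """Validate the whole configuration up front and report every problem."""
    problems = []
    batch_size = raw.get("batch_size")
    if not isinstance(batch_size, int) or isinstance(batch_size, bool) or batch_size <= 0:
        problems.append(f"batch_size must be a positive integer, got {batch_size!r}")
    output_dir = raw.get("output_dir")
    if not isinstance(output_dir, str) or not output_dir:
        problems.append(f"output_dir must be a non-empty string, got {output_dir!r}")
    if problems:
        # Refuse to start rather than misbehave halfway through a run.
        raise ConfigError("; ".join(problems))
    return Settings(batch_size=batch_size, output_dir=output_dir)
```

Unlike the business data case, the person who can fix a broken configuration is usually the one starting the program, so an immediate, complete error message is the most useful response.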

Sharpen your intuition

To decide how to handle bad input, consider whether the error is likely to occur as part of normal operations, who can fix it, and whether it is best solved outside of your software.

Sometimes it’s best to alert people about suspicious data and collect as much information as possible, instead of just preventing the data from entering the system.