RFD 5

Working with Sensitive Data

This document describes how to handle sensitive data during our work. It does not include novel ideas, but summarizes relevant rules of thumb, tools and examples that should be kept in mind.

Sensitive data involves anything that could expose personal data or company secrets. This includes:

  • Passwords and (personal) access tokens

  • (Client) company data, including names, addresses, salaries, invoices and offers

  • API keys and authentication secrets

Handling sensitive data responsibly and securely is a continuous exercise. Its success depends on the systems and culture that are set up and acted on.

Share and Store, Use the Vault

We use Proton Pass to manage all secrets, for ourselves and our clients, that are not in-app secrets. It is the single source of truth for all sensitive data we get access to.

Sometimes, secrets need to be shared with colleagues, for example to set up CI/CD. These individual secrets must be shared through Proton, and Proton only, on a need-to-know basis. This means that you should not share passwords because they "might be useful in the future", but only because someone needs them for their work today.

Do not use any other channel (WhatsApp, email, Slack, Signal) to share secrets. By using the Proton vault, we maintain a clear audit log of accesses and changes.

Think about In-app Secrets

To access secrets in application code, the world seems to have decided that environment variables and .env files are a good solution. For us, they are just one of the least bad solutions out there.

In general, working with .env files is risky because:

  • They might accidentally get pushed to git

  • They rely on conventions and documentation, and cannot be reasoned about statically

Instead of storing secrets in .env files, we use Phase to manage in-app secrets, with different environments for development, staging and production. Using the Phase CLI, we can spawn shells with the required secrets injected, without ever seeing them. We also have Kubernetes adapters and Pulumi workflows available to use Phase in production.

Always try to use Phase first. When that is not an option, we must resort to .env files. In this case, they should be kept locally and never be pushed to version control. On the other hand, there are few things more annoying than running a program and discovering that you forgot to set up an environment variable (except, maybe, for rerunning that program and finding out you forgot about yet another environment variable).

Hence, environment variables should be described once in .env.example files that are pushed to version control. These example files can be easily derived from .env files and preserve documented constraints, but they omit sensitive data.
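To avoid the "one missing variable per run" loop, a program can compare its environment against .env.example at startup and report everything that is missing at once. The following is a minimal sketch, not an established part of our tooling: the function names are hypothetical, and it assumes the example file contains one KEY= entry per line.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def keys_in_example(example_text: str) -> list[str]:
    """Return the variable names declared in .env.example content."""
    keys = []
    for line in example_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        if "=" in stripped:
            keys.append(stripped.split("=", 1)[0].strip())
    return keys


def missing_variables(example_text: str, environ: Mapping[str, str]) -> list[str]:
    """Return every declared variable that is unset or empty in environ."""
    return [key for key in keys_in_example(example_text) if not environ.get(key)]


def require_environment(example_path: str = ".env.example") -> None:
    """Exit with one message that lists all missing variables."""
    missing = missing_variables(Path(example_path).read_text(), os.environ)
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Calling require_environment() as the first step of a program turns a series of failed runs into a single, complete error message.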

From Env to Env Example

An example .env file that is stored on your computer might be:

# This token is retrieved from github.com/gitleaks
# it must be set if you want to use active leak scanning
GITLEAKS_KEY=QTC3yvf6btr!jxu7nwv

# Pick a username of at least 10 characters
# changing the username renders old database entries invalid
APP_USERNAME=postgresrootuser

Clearly, these secrets should not be pushed to version control. Yet, from the .env file one can, and should, automatically derive a .env.example file like this:

# This token is retrieved from github.com/gitleaks
# it must be set if you want to use active leak scanning
GITLEAKS_KEY=

# Pick a username of at least 10 characters
# changing the username renders old database entries invalid
APP_USERNAME=
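The derivation from .env to .env.example can be automated in a few lines. This is a minimal sketch under stated assumptions: to_env_example is a hypothetical helper, and it assumes every secret sits on a single KEY=value line. Multi-line quoted values are not handled and would need a real dotenv parser, so review the output before committing it.

```python
def to_env_example(env_text: str) -> str:
    """Blank out every value in .env content, keeping comments and keys."""
    lines = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            # Keep the variable name, drop everything after the first "=".
            key = line.split("=", 1)[0].rstrip()
            lines.append(f"{key}=")
        else:
            # Comments and blank lines carry the documentation; keep them.
            lines.append(line)
    return "\n".join(lines) + "\n"
```

Reading .env, passing its content through to_env_example and writing the result to .env.example keeps the documented constraints in sync without copying any value.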

Gitignore

To keep .env files out of version control while still allowing example files, the suggested .gitignore lines are:

*.env*
!.env.example
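Ignore rules do not protect files that were already committed or force-added, so an extra check in CI or a pre-commit hook can help. The sketch below mirrors the two patterns above using Python's fnmatch module. The function name is hypothetical, and the list of paths is assumed to come from a command such as git ls-files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def leaked_env_files(tracked_paths: list[str]) -> list[str]:
    """Return tracked paths that look like .env files but are not examples."""
    offenders = []
    for path in tracked_paths:
        name = PurePosixPath(path).name
        # Same intent as the ignore rules: "*.env*" minus ".env.example".
        if fnmatchcase(name, "*.env*") and name != ".env.example":
            offenders.append(path)
    return offenders
```

If the returned list is not empty, the check should fail and name the offending files.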

Do not Expose Data in Logs

User data that is collected, synchronized or transferred must never be exposed in logs. If you want to understand the structure of the data you are working with, put any output that exposes it behind debug functionality. Debug functionality must never be enabled in production.
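One way to follow this rule is to log only counts at the normal level and to reveal field names and types, but never values, at debug level. The following is a minimal sketch using Python's standard logging module. The logger name "sync" and both function names are assumptions for illustration.

```python
import logging

logger = logging.getLogger("sync")


def summarize(record: dict) -> dict:
    """Describe a record by field name and value type, never by value."""
    return {key: type(value).__name__ for key, value in record.items()}


def log_record(record: dict) -> None:
    # Normal operation: state that something happened, without user data.
    logger.info("synchronized record with %d fields", len(record))
    # Debug only: show the structure (names and types), still no values.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("record structure: %s", summarize(record))
```

With the logger level set to INFO or higher in production, the structure line is never emitted. Field names can be sensitive too, which is why even the summary stays behind the debug level.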

Be Critical about Snippets You Share

Sometimes, you will want to use AI tools on sensitive data, for example when asking about Docker Compose setups or when generating code that parses JSON structures.

In these cases, make sure you remove or mask any sensitive data before it enters the chat interface. Data might already be sent to the server as soon as it enters the interface, even before you submit it.
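For JSON, the structure is usually all an AI tool needs. The sketch below replaces every value with a neutral placeholder while keeping keys and nesting. The helper names and the "<string>" placeholder are hypothetical, and the script is meant to run locally before anything is pasted.

```python
import json


def mask(value):
    """Replace every leaf value with a placeholder of the same JSON type."""
    if isinstance(value, dict):
        return {key: mask(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask(item) for item in value]
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return False
    if isinstance(value, (int, float)):
        return 0
    if isinstance(value, str):
        return "<string>"
    return value  # None stays None


def mask_json(text: str) -> str:
    """Parse JSON text and return it with all values masked."""
    return json.dumps(mask(json.loads(text)), indent=2)
```

Keys are kept as they are, so check them by hand: when client names or email addresses are used as keys, they need masking as well.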