How to use serverless as cronjobs to keep your Personal Access Tokens secure

Published on April 12, 2018

Keep your tokens secure


Last year we were really excited about the release of Personal Access Tokens, or PATs. In a nutshell, they are tokens bound to your Contentful user that you can use to perform actions such as calling the Content Management API with all your roles and permissions applied (if you want to know more about PATs, take a look at our knowledge base page).

Creating them is easy: a couple of clicks in Contentful's web app or an API call, and voilà, you have a fresh token to use right away. But because they're so easy to generate, we sometimes forget about them and leave them behind in our source code. Then we check that source code into source control, which is already poor security practice. It becomes a bigger problem when tools like GitHub make it really easy to search across thousands of public code repositories for them.

Leaking your Contentful PATs is dangerous and should be avoided at all costs. Remember that they have the same permissions as your Contentful user. This means that if you're a space admin and your token is leaked, anyone could use it to, for example, delete all your content. And things would only get worse if you were an organization admin.

It's important to think about these tokens as passwords. You would never write your password in a source code file, so why not treat your PATs the same way? As a rule of thumb, any time you are dealing with credentials it's better practice to read them from environment variables. This reduces the chances of accidentally leaking them.
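As a minimal sketch of that rule of thumb, the snippet below reads a token from the environment instead of embedding it in the source. The variable name CONTENTFUL_PAT and the helper load_pat are assumptions made for illustration, not names defined by Contentful.

```python
import os


def load_pat(env=os.environ, name="CONTENTFUL_PAT"):
    """Return the token stored in the given environment mapping.

    CONTENTFUL_PAT is a hypothetical variable name chosen for this example.
    """
    token = env.get(name, "").strip()
    if not token:
        # Fail loudly rather than falling back to a hard-coded value.
        raise RuntimeError(f"{name} is not set; export it instead of hard-coding the token")
    return token
```

You would export the variable in your shell or CI settings and call load_pat() wherever the token is needed, so the value itself never appears in a committed file.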

But we can go one step further and build some tooling to help us quickly identify PATs leaked in our organization. And that's what we did at our last hackathon! The idea was to have a cronjob that would run every day and find all the PATs leaked in GitHub repos belonging to our organization and its users. With the data gathered by the tool, we could then go in and at least revoke those tokens, even if we don't fix the code. What follows is a brief description of how we implemented it. You can also find all the code in its GitHub repo: https://github.com/madtrick/cfpat-audit

First of all, we need to write the script that will query GitHub for files with leaked tokens. Once we have all the offending files, we have to check whether they belong to users in our GitHub organization. You can use GitHub's organization members and code search APIs to do this. Included in the repo is an executable that you can run locally to find leaked tokens in your org.
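The search-and-filter step described above could be sketched roughly as follows. This is an independent Python illustration, not the code from the repo. It assumes that tokens can be found by searching for the prefix CFPAT-, that a GitHub API token is available in a GITHUB_TOKEN environment variable, and that reading only the first page of each response is enough for a demo. All function names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

API = "https://api.github.com"
TOKEN_PREFIX = "CFPAT-"  # assumption: the search term used to spot leaked PATs


def github_get(path, params=None):
    """GET a GitHub REST endpoint and return the decoded JSON body."""
    url = f"{API}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        # GITHUB_TOKEN is an assumed variable name for the API credential.
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def org_members(org):
    """Logins of the organization's members (first page only in this sketch)."""
    return {m["login"] for m in github_get(f"/orgs/{org}/members", {"per_page": 100})}


def search_code(query):
    """Files matching the query via the code search API (first page only)."""
    return github_get("/search/code", {"q": query, "per_page": 100})["items"]


def leaks_for_org(items, org, members):
    """Keep only search hits whose repository is owned by the org or a member."""
    owners = {login.lower() for login in members} | {org.lower()}
    return [
        {
            "repo": item["repository"]["full_name"],
            "path": item["path"],
            "url": item["html_url"],
        }
        for item in items
        if item["repository"]["owner"]["login"].lower() in owners
    ]


def find_leaked_tokens(org):
    """Search GitHub for the token prefix and filter the hits down to the org."""
    return leaks_for_org(search_code(TOKEN_PREFIX), org, org_members(org))
```

Keeping the filter in its own function, leaks_for_org, means it can be checked against canned search results without calling GitHub. A fuller version would also need to follow pagination and respect the search API's rate limits.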

Ok, so we have a script that we can use to get the list of files that are leaking PATs. And we want to run it regularly so we can react quickly to any incident and revoke the leaked tokens. But if we want to run this as a cronjob, that means at least setting up a machine and deploying the code there, and then of course making sure that this machine is up and running 24/7.

That can seem like a lot of work for such a small script. So we decided to be like the cool kids and use serverless computing: run a small script on a regular basis without having to worry about all the infrastructure requirements. Since we use AWS at Contentful, the choice was clear: we were going to use Lambda functions. Think of Lambda functions as event handlers that react to different triggers: API calls, CRUD operations on S3, ..., or scheduled events. Our Lambda function is simple and small.

It finds the leaked tokens for the org and then saves them in a file in S3. Additionally, though not described in this post, we set up an alert so we get notified each time a file is created in the bucket.
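A handler along those lines might look like the sketch below. It is an illustration in Python rather than the function from the repo. The environment variable names GITHUB_ORG and REPORT_BUCKET, the report key layout, and the choice to write a file only when something was found are all assumptions. find_leaked_tokens is a hypothetical stand-in for the search step.

```python
import json
import os
from datetime import datetime, timezone


def find_leaked_tokens(org):
    """Hypothetical placeholder for the search step; returns a list of leak records."""
    return []


def build_report(org, leaks, now):
    """Return the S3 key and JSON body for one run's findings."""
    key = f"{org}/{now:%Y-%m-%d}.json"  # assumed layout: one file per org per day
    body = json.dumps({"org": org, "leaks": leaks}, indent=2)
    return key, body


def handler(event, context, finder=find_leaked_tokens, s3_client=None):
    """Entry point invoked by the scheduled event.

    finder and s3_client are optional so the function can be exercised locally
    with stand-ins; Lambda itself only passes event and context.
    """
    org = os.environ["GITHUB_ORG"]  # assumed variable name
    leaks = finder(org)
    if not leaks:
        # Assumption: skip the upload so a new file always signals a finding.
        return {"leaks": 0}
    if s3_client is None:
        import boto3  # imported lazily so local runs do not require it
        s3_client = boto3.client("s3")
    key, body = build_report(org, leaks, datetime.now(timezone.utc))
    s3_client.put_object(
        Bucket=os.environ["REPORT_BUCKET"],  # assumed variable name
        Key=key,
        Body=body.encode("utf-8"),
    )
    return {"leaks": len(leaks), "key": key}
```

Writing a file only when leaks exist fits the alert described above, since every new object in the bucket then corresponds to something worth looking at.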

Getting your code up and running on AWS Lambda requires some initial effort: uploading the code, setting up the right roles, and configuring logging. That sounds like exactly the kind of work we were hoping to get rid of by using Lambda functions. Thankfully, there are frameworks like Serverless that abstract all of this away and help you a lot along the way. So, unsurprisingly, that's what we used.

We used a serverless.yml file to deploy and set up the function in AWS.

So the only thing left is to deploy it and wait for those tokens to come your way.

Writing this small script was fun and interesting. Lambda functions are great for dealing with event-based workflows, and pairing them with frameworks like Serverless makes them a breeze to use.
