How to use serverless as cronjobs to keep your Personal Access Tokens secure

Published on April 12, 2018

Last year we were really excited about the release of Personal Access Tokens (PATs). In a nutshell, they are tokens bound to your Contentful user that you can use to perform actions such as calling the Content Management API with all your roles and permissions applied (if you want to know more about PATs, take a look at our knowledge base page).

Creating them is easy: a couple of clicks in Contentful's web app or an API call, and voilà, you get yourself a fresh token to use right away. But because they're so easy to generate, we sometimes forget about them and leave them behind in our source code. Then we check that source code into source control, which is already a poor security practice. It becomes a bigger problem when tools like GitHub make it really easy to search across thousands of public code repositories for them.

Leaking your Contentful PATs is dangerous and should be avoided at all costs. Remember that they have the same permissions as your Contentful user. This means that if you're a space admin and your token is leaked, anyone could use it to, for example, delete all your content. And things would only get worse if you were an organization admin.

It's important to think about these tokens as your passwords. You would never write your password in any source code file so why not treat your PATs the same? As a rule of thumb, any time you are dealing with credentials it's better practice to access them using environment variables. This reduces the chances of accidentally leaking them.
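As a minimal sketch of that rule of thumb, the snippet below reads a token from the environment instead of hard-coding it. The variable name `CONTENTFUL_PAT` and both helper functions are assumptions made for illustration; they are not part of any Contentful SDK.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
TOKEN_ENV_VAR = "CONTENTFUL_PAT"


def load_token(env=os.environ):
    """Return the PAT from the environment, failing loudly if it is missing."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before running this script")
    return token


def auth_headers(token):
    """Build the bearer Authorization header for an HTTP API request."""
    return {"Authorization": f"Bearer {token}"}
```

Because the token never appears in a source file, there is nothing to leak when the file is committed. The value still has to be kept out of any checked-in `.env` or config file.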

But we can go one step further and build some tooling to help us quickly identify PATs leaked in our organization. And that's what we did at our last hackathon! The idea was to have a cronjob that would run every day and find all the PATs leaked in GitHub repos belonging to our organization and its users. With the data gathered by the tool, we could then go in and at least revoke those tokens, even if we don't fix the code. What follows is a brief description of how we implemented it. You can also find all the code in its GitHub repo: https://github.com/madtrick/cfpat-audit

First of all, we need to write the script that will query GitHub for files with leaked tokens. Once we have all the offending files, we have to check whether they belong to users in our GitHub organization. You can use GitHub's organization members and code search APIs to do this. The repo includes an executable that you can run locally to find leaked tokens in your org.
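The two-step approach described above (search, then filter by organization membership) could look roughly like the following. This is a sketch under stated assumptions, not the code from the repo: the function names are made up for illustration, the search term `CFPAT-` assumes that is the prefix of the tokens you are looking for, and only the first page of each API response is read. It uses GitHub's REST endpoints `/orgs/{org}/members` and `/search/code`, which require an authenticated request and are subject to rate limits.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def github_get(path, token, params=None):
    """GET a GitHub REST endpoint and decode the JSON response."""
    url = f"{API}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def org_member_logins(org, token):
    """Return the logins of the organization's members (first page only)."""
    members = github_get(f"/orgs/{org}/members", token, {"per_page": 100})
    return {member["login"] for member in members}


def search_code(query, token):
    """Return code search hits for the query (first page only)."""
    result = github_get("/search/code", token, {"q": query, "per_page": 100})
    return result["items"]


def leaks_owned_by(hits, owners):
    """Keep only hits whose repository belongs to one of the given owners."""
    owners = {owner.lower() for owner in owners}
    return [
        {
            "repo": hit["repository"]["full_name"],
            "path": hit["path"],
            "url": hit["html_url"],
        }
        for hit in hits
        if hit["repository"]["owner"]["login"].lower() in owners
    ]


def find_leaks(org, token, needle="CFPAT-"):
    """Search for the needle and keep hits owned by the org or its members."""
    owners = org_member_logins(org, token) | {org}
    return leaks_owned_by(search_code(needle, token), owners)
```

Keeping the filtering step in `leaks_owned_by` separate from the network calls makes it easy to check against canned search results without touching the GitHub API.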

OK, so we have a script that we can use to get the list of files that are leaking PATs. And we want to run it regularly so we can react quickly to any incident and revoke the leaked tokens. But if we want to run this as a cronjob, that means at least setting up a machine and deploying the code there, and then of course making sure that this machine is up and running 24/7.

That can seem like a lot of work for such a small script. So we decided to be like the cool kids and use serverless computing: run a small script on a regular basis without having to worry about all the infrastructure requirements. Since we use AWS at Contentful, the choice was clear: we were going to use Lambda functions. Think of Lambda functions as event handlers that react to different triggers, such as API calls, CRUD operations on S3, or scheduled events. Our Lambda function is simple and small.

It finds the leaked tokens for the org and then saves them in a file in S3. Additionally, although not described in this post, we set up an alert so we get notified each time a file is created in the bucket.
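A handler with that shape might look like the sketch below. It is an illustration under assumptions rather than the function from the repo: `report_key`, `store_report` and `make_handler` are hypothetical names, the `leaks/` key layout is invented, and skipping the write when nothing is found is a design choice made here so that an alert on file creation only fires for real findings. The only AWS call used is the S3 client's `put_object`.

```python
import json
from datetime import datetime, timezone


def report_key(now):
    """Name the report after the UTC day it was generated on."""
    return f"leaks/{now:%Y-%m-%d}.json"


def store_report(s3, bucket, leaks, now):
    """Write the leaks to S3 as JSON; skip the write if there are none."""
    if not leaks:
        return None
    key = report_key(now)
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(leaks, indent=2).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def make_handler(find_leaks, s3, bucket):
    """Build a Lambda-style handler from a leak finder and an S3 client."""
    def handler(event, context):
        # The scheduled event carries no data we need, so it is ignored.
        leaks = find_leaks()
        key = store_report(s3, bucket, leaks, datetime.now(timezone.utc))
        return {"leaks": len(leaks), "key": key}
    return handler


# Possible wiring inside the Lambda module (names are assumptions):
#   import os, boto3
#   handler = make_handler(my_find_leaks, boto3.client("s3"),
#                          os.environ["REPORT_BUCKET"])
```

Passing the S3 client and the finder in as arguments keeps the handler free of global state, so it can be exercised locally with a stand-in client before it is deployed.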

Getting your code up and running on AWS Lambda requires some initial effort: uploading the code, setting up the right roles, and configuring logging. This sounds like exactly the kind of work we were hoping to get rid of by using Lambda functions. Thankfully, there are frameworks like Serverless that abstract all of this away and help you a lot along the way. So, unsurprisingly, that's what we used.

We used a serverless.yml file to deploy and set up the function in AWS.

So the only thing left is to deploy it and wait for those tokens to come your way.

Writing this small script was fun and interesting. Lambda functions are great for dealing with event-based workflows, and frameworks like Serverless make them a breeze to use.

Meet the authors

Farruco Sanjurjo

Staff Software Engineer, Contentful

Farruco was formerly a staff software engineer at Contentful, with over 15 years building software and an interest in storage and distributed systems.
