My Experience Joining Plastiq – John
I must start by saying that my onboarding with Plastiq has been the best of my […]
At Plastiq we move fast and deploy many applications on any given day. We follow the Trunk-Based Development model, where we commit directly to the master branch. Builds and deployments are automatic and fast, after which unit, API, and end-to-end tests are run.
We want to get to a continuous deployment model where our code, after automated testing, is pushed automagically to production. If there are issues, the deploy is automatically rolled back and the engineer associated with it is notified, again in an automated fashion. To reach this level of automation, a significant change is required in how we develop software. The quality of our commits needs to be at an even higher standard, increasing our confidence in a successful deployment. Our continuous integration and automated testing systems need to be fast and targeted, so that we only test the parts of the system affected by a given commit. The release process needs checks in place to gradually roll the artifact out to a segment of users, watch error rates, and then proceed to a wider set of users until fully deployed. If at any stage there are issues, the release needs to be paused, possibly rolled back depending on severity, and the affected engineers notified via channels such as Slack, email, or SMS.
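The gradual release loop described above can be sketched in a few lines. This is a minimal illustration, not our actual tooling: the stage fractions, error threshold, and the `route_traffic`, `get_error_rate`, `rollback`, and `notify` hooks are all hypothetical placeholders for whatever traffic router, metrics source, and chat integration a team actually runs.

```python
# Sketch of a staged (canary) rollout with automated rollback.
# All hook functions are assumed placeholders, injected by the caller.

ROLLOUT_STAGES = [0.05, 0.25, 0.50, 1.00]  # fraction of users on the new build
ERROR_RATE_THRESHOLD = 0.01                # abort if more than 1% of requests fail


def canary_release(artifact, get_error_rate, route_traffic, rollback, notify):
    """Roll `artifact` out stage by stage; roll back and alert on elevated errors."""
    for fraction in ROLLOUT_STAGES:
        route_traffic(artifact, fraction)          # widen the rollout
        error_rate = get_error_rate(artifact)      # sample error metrics
        if error_rate > ERROR_RATE_THRESHOLD:
            rollback(artifact)
            notify(f"Rolled back {artifact}: error rate {error_rate:.2%} "
                   f"at {fraction:.0%} rollout")
            return False
    notify(f"{artifact} fully deployed")
    return True
```

In practice the "pause" step the text mentions would sit between stages as well (a bake time before widening), but the shape of the decision loop is the same.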
About three years ago, we were in a state of chaos. We had recently moved from Boston to San Francisco, and we were still building our team. We had a two-week development cycle followed by a three- or four-day feature and regression testing cycle that would culminate in the dreaded Thursday night release. Before we started testing, we would have to ask an engineer to merge their code into the release branch, then manually do a build followed by a manual deploy to our QA environment. We had zero automated tests; let me repeat: no automated tests of any kind. Our Thursday night releases were a taxing event: they started at 11pm with the majority of the engineering team online, waiting for the sky to fall so they could start patching the system, and usually ended past midnight. Needless to say, this was not a super great development environment. Below is the series of steps we took to get where we are today.
We started small and enabled automated builds, so that whenever there is a commit to master there is an automatic build of the artifact. This was relatively easy in that it didn’t change the way the team needed to work.
Next we enabled unit tests for the applications that had them. We disabled broken tests, kept the passing ones, and created a single dashboard where you could see the builds. If anything in the dashboard was red, there was a build issue. This was also a low-impact change, as it didn’t affect the way the team had to work. At this point we had a single dashboard that would give us the latest state of the build. Not all applications had unit tests, but we had a pattern in place that other applications could follow.
To keep the momentum going we needed more time, so we automated the most frequently performed task that took the longest: the manual regression suite that was run at every release.
With the extra time that we now had, we created a second environment to continuously build and deploy our applications. This gave us immediate feedback that our applications could be built and deployed.
Next, we enabled our small regression test to run after the builds completed and the applications were deployed to the integration environment. With this we now know that the basic functionality of the system in the integration environment has not regressed. Our next step was to retire the QA environment: our integration environment is now our QA and staging environment in one. When someone has to test something before it is released, there is one and only one place to look. No more confusion or misunderstanding about where to look for the changes. The integration environment is now the future state of the production environment.
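The post-deploy regression gate above boils down to running a small suite of smoke checks against the freshly deployed environment and reporting pass/fail. A minimal sketch, assuming each check is just a callable that returns whether a basic flow still works (the check names here are invented, not our actual suite):

```python
# Minimal post-deploy smoke-test runner: run each named check,
# treat a crash as a failure, and report which checks failed.

def run_smoke_suite(checks):
    """Run each check in `checks` (name -> callable); return (passed, failures)."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:  # a crashing check counts as a failure
            ok = False
        if not ok:
            failures.append(name)
    return (len(failures) == 0, failures)
```

A build would only be considered deployable when this gate returns a clean pass against the integration environment.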
Our next leap was to move away from building features in a branch and only merging when we were close to a production release: that approach caused a lot of merge issues and gave our QA members only a few days to test. We were rewriting our cardholder application, and we had to move fast to meet a company deadline. We decided that while we were doing this rewrite, it would be faster to just commit to master, break the build and/or application, then fix it. After we finished the rewrite we decided we liked this development model, so we kept it.
This was an easy step. Right after our new cardholder application was released, the QA members became responsible for production releases, and they released to production after every single defect fix or feature was added. The benefit was that when you release only a commit or two, only that commit or two can break, so deployments became lower risk and easier to test because of their smaller size. Once we fleshed out this new system, we made it easier for our developers to release their software, with a standard set of steps to release all applications. Making developers responsible for releases also allowed them to own their changes end to end, which increased the quality awareness of all our engineers.
We now want our commits to master to be automatically deployed to production. If there is an issue with a commit on its way to production, we envision a friendly Slack notification. If you have ideas on how to do this, we would love to hear from you. Check out our jobs page to view the current list of exciting opportunities available: https://www.plastiq.com/careers/
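For what it's worth, the friendly Slack message itself is the easy part: Slack's incoming-webhook API accepts a JSON `{"text": ...}` payload posted to a per-channel webhook URL. A minimal sketch using only the standard library (the webhook URL, service name, and message wording are placeholders, not our actual configuration):

```python
# Sketch: notify a Slack channel about a deploy event via an incoming webhook.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder


def build_deploy_alert(service, commit, status):
    """Build the incoming-webhook payload for a deploy status message."""
    text = f"Deploy of {service} @ {commit[:7]} {status}."
    return {"text": text}


def notify_slack(payload, url=SLACK_WEBHOOK_URL):
    """POST the payload to the Slack incoming webhook; True on HTTP 200."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```

The hard part, of course, is deciding *when* to send it: wiring the pipeline so that a failed canary stage or rollback reliably produces exactly one such message to the right engineer.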