Testing with DevOps and Reclaim
Several different people, including our most recent hire, Joe McMahon, recently recommended a book to me called The Phoenix Project. In the book, the authors use a fictional narrative about a sinking auto parts company whose IT unit is struggling to keep up with the modern demands of the business. Along the way the characters learn a lot of great lessons about how to be "lean" and work more efficiently. The book is effectively a Trojan horse for introducing the concepts of DevOps as well as project management tools like Kanban boards. I absolutely loved it and have found myself a bit inspired to act on some of the wisdom of the book as it relates to Reclaim Hosting.
Truth be told, I've been a "build the plane while flying it" kinda guy for a long time. When you're a one-man show it's necessary, and when you work with a group that works well in that dynamic (as DTLT most certainly did) it makes a lot of sense. When we started working on Reclaim Hosting we didn't keep things under wraps for very long. Our "pilot" was effectively launching and running the company for a year (albeit at a subsidized cost) and learning as we went. We only had a single server starting out, and when we ran into issues, well, we fixed them. But there was no "change control", no "staging", no "quality assurance" or other testing of that nature. Our test was whether actual clients could use our system, and our measure of that was our support tickets and complaints.
That works fine for a small company, especially in the very lean early years when complexity is minimal. We've grown a ton (for which I'm regularly amazed and consider myself wholly lucky) and things are quite a bit more complex these days. Between a move to virtualization and the launch of our Domain of One's Own institutional platform last year, we now manage upwards of 50 servers and that number grows quite regularly. Our shared hosting environment alone serves several thousand users. Building the plane while flying it hasn't been a great option for a while, and these days it's downright dangerous. We need to start building better systems to innovate in as a company, systems that don't negatively affect our production servers and still allow us to be lean and ship real code regularly.
A perfect example of this is our billing system. At Reclaim Hosting we currently use the WHMCS hosting automation and billing platform. For probably the first year I was great about keeping it up to date as new releases came out. But we've made a lot of changes to the code since then, and meanwhile a new point version promises a lot of major changes. Security patches will still be released for our version, but the clock is now ticking and we aren't benefiting from any of the new enhancements. Why not just update? Well, in short, we had no idea whether the update would break things so badly that we'd have to roll back to a backup. How will invoicing work? What about our custom theming? Will our registrar modules function properly? We need an environment where we can proactively answer those questions long before end users experience any possible issue. While this seems obvious now and is something I had pondered before, I never really had a sense of what I needed to do until this weekend.
We recently (finally!) moved our billing platform off our shared hosting servers entirely and dedicated a virtual server to it. So I started simple by setting up an additional virtual host to serve as a development environment and copying all the code over. I made a copy of the database and got a dev license for the software loaded in there. Easy enough, and now we finally have a space that mirrors our production billing system. I turned off things like email notifications, registrar actions, billing gateway usage, etc., so that nothing done in development affects our actual users, but we can still watch those actions run and see how things function. Ideally we could even automate the process of grabbing a recent copy of the database nightly and loading it in, while preserving the changes that make the dev environment what it is.
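To make the "refresh nightly but keep the dev changes" idea concrete, here is a minimal Python sketch. It assumes the settings can be treated as a simple key/value mapping, and the setting names (send_email, registrar_actions, payment_gateway_live) are hypothetical placeholders, not WHMCS's actual configuration keys. The idea is only that dev-only overrides live in one place and get reapplied after every copy from production.

```python
# Dev-only overrides, reapplied after each refresh from production.
# NOTE: these keys are hypothetical, not real WHMCS setting names.
DEV_OVERRIDES = {
    "send_email": False,            # no notifications to real users
    "registrar_actions": False,     # no real domain registrations
    "payment_gateway_live": False,  # no real charges
}


def refresh_dev_settings(production_settings, overrides=DEV_OVERRIDES):
    """Return a copy of the production settings with dev overrides applied.

    The production mapping itself is left untouched.
    """
    dev_settings = dict(production_settings)  # shallow copy of the snapshot
    dev_settings.update(overrides)            # dev values win on conflict
    return dev_settings


if __name__ == "__main__":
    # Stand-in for a snapshot taken from the production database.
    snapshot = {"send_email": True, "company_name": "Reclaim Hosting"}
    print(refresh_dev_settings(snapshot))
```

A nightly job could call something like refresh_dev_settings right after loading the fresh copy, so a refresh never silently turns email or billing back on in development.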
The next step I wanted to take was knowing what changes we make and when we make them. Version control seemed the perfect tool for this, so I set up a private git repo for our billing system and had the production environment pull in changes that are pushed up from development. All of a sudden I can see scenarios where we do our testing in development, commit sets of changes, and then release by pulling the code down to our production instance. We get all the benefits of moving fast and shipping code regularly, while also tracking the work we do with version control and working in safe environments, so that when a change finally rolls out we can be confident there won't be hiccups.
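The flow described above (commit in development, push, then pull on production) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the remote name "origin", the branch name "main", and the helper names release_commands and run_all are all made up for the example. The git subcommands themselves (add, commit, push, pull --ff-only) are standard.

```python
import subprocess


def release_commands(message, remote="origin", branch="main"):
    """Build the git commands for one release, without running them.

    Returns (dev_commands, prod_commands). The remote and branch names
    are assumptions for this sketch.
    """
    dev_commands = [
        ["git", "add", "--all"],           # stage every change made in dev
        ["git", "commit", "-m", message],  # record what changed and why
        ["git", "push", remote, branch],   # publish to the private repo
    ]
    prod_commands = [
        # --ff-only refuses to merge if production has diverged, which
        # flags any "just log in and fix it" edits made directly on the server.
        ["git", "pull", "--ff-only", remote, branch],
    ]
    return dev_commands, prod_commands


def run_all(commands, cwd):
    """Run each command inside the given checkout, stopping on the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    dev, prod = release_commands("Update invoice template")
    for command in dev + prod:
        print(" ".join(command))
```

Building the command lists separately from running them makes it easy to print a dry run first, then call run_all on the development checkout and again on production once the change has been tested.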
We've got a long way to go as a company. This was just one weekend with a single application, and we use several within our company. Habits also die hard, and the pull of "let me just log in and fix it" over documenting work and using staging is tough to resist, but the payoff is a streamlined flow of work and a company that always knows what each hand is doing as complexity grows. I'm actually looking forward to finding where the development bottlenecks are in our operations and where we might strategically keep pushing forward, so that we can start to institutionalize some of this now while we're still a young company. We're small but already a complex machine, and the more work we do now to lay the foundation for a streamlined system, the better it will serve us for years to come.