Every company I talk to has some plan for digital transformation. Most of them understand the implications of the rapidly accelerating pace of change, and the need to transform to thrive at this new pace. Yet there is no common definition for what this means, and certainly no agreed template that is simple to follow. A component that is often a part of the plan is some sort of cloud migration. While we know that “the cloud” isn’t an outcome itself, it is one of the best enablers of agility for companies wanting to transform. It wasn’t too many years ago that many companies were trying to decide if they were going to put a piece of their infrastructure into a public cloud. Today, the question has changed from if to when, but still there are few companies that are taking full advantage of the opportunity. Even after all the evidence, I still hear claims that the cloud is more expensive or less secure than what most companies can do themselves. Statements like these come from companies that haven’t figured out how to seize the opportunities yet.
I am not going to claim that my old team has it all figured out either, but while pushing the boundaries we learned enough to realize way more value than what we had originally dreamed possible. We started the same way that many companies did, trying to plan out and justify the whole migration before we started. When we did get started, we learned so much that we had to throw out the original plan. Now I tell companies that are just starting to pick something simple that will be on the high side of savings, and move it to learn. Capture the learnings and use the savings to fund the next move. If you do it right the whole migration funds itself.
Some context is useful before I explain a few of the lessons we learned. I ran the IT team at Microsoft, and while my bias for Azure is obvious, these lessons apply to any cloud migration, regardless of provider. It is also important to note that Satya, Microsoft’s CEO, viewed my team as a customer of Azure to drive feedback to make the product better. This meant that we needed to consume the Azure services the same way that customers would and pay for them at market rates. This made me a better customer, as I had to justify the costs in my budget and realize the value just like real customers. We moved an environment that was already highly optimized in an on-premises private cloud. It consisted of three thousand applications running on 60,000 virtual machines, using many petabytes of data. One of the big indicators of how much we learned as we went was that our total Azure bill from Microsoft when we were half done was the same as the bill when we were all done. Hopefully the optimizations we learned along the way can help you accelerate your own benefits.
Here is what we learned
When looking at the whole opportunity, it is important to understand that the bigger win is around the agility for your teams, enabling faster delivery of your mission-critical needs at the new pace of innovation. The total cost opportunity is a reduction of the expenses incurred to operate the physical data centers (depreciation of the space, power and cooling systems), elimination of the data center network, server and storage costs, plus part of the people cost dedicated to provisioning and managing all of this (savings from reallocation of employees and cutting of outsource contracts). These savings more than paid for our entire Azure bill. They also funded the cost of the migration and allowed us to return tens of millions of dollars to the company bottom line. More importantly, the agility increase allowed us to begin delivering at a higher level at a time when the speed to value was required to fuel the overall company transformation.
Step One: Eliminate what isn’t needed. Experiment with quick wins. Move application packages to SaaS.
It is helpful to have a good inventory of what exists in your data centers. Don’t waste time planning out the whole migration, but do pick some easy capabilities to move first to learn. Easy means lower regulatory or security risk with less integration to other apps. It helps save more if the app runs on old hardware that is near end of life or if the hardware can be repurposed to avoid some other spend.
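One way to apply these criteria to an inventory is a simple sort. The following is a minimal sketch, not the method my team used: the field names, the sample apps and the ordering of the criteria (regulatory risk first, then integration count, then hardware age) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    regulated: bool            # carries regulatory or heightened security risk
    integrations: int          # number of other apps it exchanges data with
    hardware_age_years: float  # age of the hardware it runs on today


def first_movers(candidates, limit=2):
    """Order apps so unregulated, loosely coupled, old-hardware apps come first."""
    ranked = sorted(
        candidates,
        # False sorts before True; fewer integrations first; older hardware first.
        key=lambda c: (c.regulated, c.integrations, -c.hardware_age_years),
    )
    return [c.name for c in ranked[:limit]]


# Hypothetical inventory entries, invented for this example.
apps = [
    Candidate("payroll", True, 6, 5.0),
    Candidate("team-survey", False, 0, 4.5),
    Candidate("intranet-news", False, 1, 2.0),
    Candidate("lab-scheduler", False, 0, 1.0),
]

print(first_movers(apps))  # ['team-survey', 'lab-scheduler']
```

A real inventory would carry more attributes, but even a crude ranking like this is enough to pick the first experiments without planning the whole migration.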
While you are doing the initial experiments to learn how to migrate to the cloud, there are two more things you can do. Leveraging data from the portfolio, in particular data about hardware utilization, eliminate apps and capabilities that are rarely used. This will be hard because someone will insist they are important, but usually it is the team that supports them as part of their job that fights the hardest. Historical data shows that 20-30% of most data center capacity can be turned off rather than migrated. This estimate is larger when you include non-production environments and what I’ll discuss in the next paragraph. The second task in parallel to migration experiments is to move all packaged applications to the SaaS version of that application. If the vendor you are using doesn’t provide a SaaS version or the app is home-grown, I recommend that you find an equivalent application that is SaaS or use a third party tool such as Corent SurPaaS/SaaS to quickly SaaS-enable the app. The principle here is that SaaS is way more economical than IaaS because you don’t need to manage or pay for the infrastructure.
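As a rough illustration of using utilization data to find capabilities that can be turned off, here is a minimal sketch. The record fields, the thresholds (5% average CPU, 10 monthly users) and the sample data are assumptions for the example, not figures from our migration; pick thresholds that fit your own portfolio.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    name: str
    avg_cpu_percent: float     # average CPU utilization of the app's hosts
    monthly_active_users: int  # distinct users seen in the last month


def retirement_candidates(inventory, max_cpu_percent=5.0, max_users=10):
    """Return apps whose hosts are nearly idle AND that almost nobody uses."""
    return [
        app.name
        for app in inventory
        if app.avg_cpu_percent <= max_cpu_percent
        and app.monthly_active_users <= max_users
    ]


# Hypothetical portfolio data, invented for this example.
inventory = [
    AppRecord("legacy-reporting", 1.2, 3),
    AppRecord("order-entry", 41.0, 5200),
    AppRecord("old-wiki", 0.8, 0),
    AppRecord("batch-archive", 2.5, 250),
]

print(retirement_candidates(inventory))  # ['legacy-reporting', 'old-wiki']
```

The output is a list of candidates to challenge, not a decision. Someone still has to confirm with the business that each one can go.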
Step Two: Eliminate persistent non-production environments
It took us a while to figure this out, but there is no need to use cloud capacity for things that are not running all the time. In particular, when we first moved all of our development and test environments to Azure, we realized that we could “snooze” capacity when it wasn’t in use. Then we learned that we could build provisioning into our test automation, so the first step of the test automation provisions the test environment, then it loads data as necessary, runs the tests, and then turns everything off, saving all telemetry from the tests for further evaluation. For most companies, somewhere around half of their capacity is used by development, test, integration test and user acceptance test environments. With these off except when necessary, the savings are significant. It also turns out the resulting test automation improves quality by facilitating CI/CD automation as well as significantly improving development agility.
Step Three: Right size your production capacity
Using the same principle that you don’t need cloud capacity provisioned for capabilities not in use, the cloud allowed us to scale for normal load and use automation to increase scale when needed, rather than provisioning for peaks as we had learned to do historically. We also leveraged cloud capabilities to spin up BC/DR environments when needed, rather than keeping them idle all the time. Depending on the level of volatility, this rightsizing can eliminate another half of the capacity that you need to consume. Also, avoid lifting and shifting workloads to IaaS whenever possible. It is way less efficient, so ideally you would refactor applications into PaaS and FaaS services unless you need to move something because it integrates closely with something else you are moving.
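To make the difference between provisioning for peaks and scaling for normal load concrete, here is a small worked sketch. The hourly load figures and the per-instance capacity are invented for illustration, and the sizing rule (round up load divided by capacity) is a deliberately simplified assumption rather than how any particular autoscaler works.

```python
import math


def instances_needed(current_load, capacity_per_instance, minimum=1):
    """Instances required to serve current_load, never below a floor."""
    return max(minimum, math.ceil(current_load / capacity_per_instance))


# Hypothetical hourly request counts with one short peak.
hourly_load = [200, 150, 100, 100, 300, 800, 1500, 4000, 1500, 800, 400, 250]
capacity = 500  # requests per hour one instance can handle (assumed)

# Provisioning for the peak keeps the peak instance count running every hour.
peak_provisioned = instances_needed(max(hourly_load), capacity) * len(hourly_load)

# Scaling with demand only runs what each hour actually needs.
autoscaled = sum(instances_needed(load, capacity) for load in hourly_load)

print(peak_provisioned, autoscaled)  # 96 25
```

With these made-up numbers the peak-provisioned approach consumes 96 instance-hours against 25 when scaling with demand. Your own ratio depends entirely on how volatile the workload is.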
Step Four: Use automated tools
When my team first did our cloud migration, we had to do everything manually, or develop our own tools. Now there are many good tools available. As an example, for very basic scanning/discovery and lift and shift, Microsoft offers Azure Migrate as a free tool. On the Azure Migrate page, they also list other tools for scanning and discovery (Cloudamize, Corent Tech SurPaaS, Turbonomic, Unicloud and Device42). For actual migration they also list Corent Tech SurPaaS and Carbonite as ISV tools that could be used. Other cloud providers may also provide tools, and some of the ISV tools listed here can be used for migration to other clouds.
Making this all work
Unfortunately, just following the steps above isn’t enough. In fact, the hardest part in my experience was related to people. Change is hard. The infrastructure team will shrink materially. Fortunately, the team that is left will become even more important in governing the templates and spend, and the application teams are going to need the other people with infrastructure skills to design all the cloud provisioning into the test automation. Be transparent with your team about the changes. Offer roles in the application and security teams to infrastructure people willing to move. Backfill the positions they leave with outsourced staff, so you can ramp down that contract as the migration progresses without needing to eliminate employees. Move the accountability for cloud spend to the application teams that have the ability to control it. Also hold the application teams accountable for their security, so the new security team becomes a wanted advisor rather than friction in the migration. Measure progress by reduction of your on-premises footprint, optimizing to eliminate whole data centers.
The migration isn’t done when everything is in the cloud
Continue to optimize once you move. You will learn more tricks to drive efficiencies. You should continue to refactor to eliminate anything that was just lifted to IaaS. Better yet, find more things you can move to SaaS. Long before you are done migrating, you will start realizing some agility benefits which should be contagious for your teams to drive the rest faster. If you aren’t seeing all these benefits yet, don’t wait to plan it all out. Try. Learn. Accelerate your transformation.
About the Author
As the most recent CIO of Microsoft, Jim DuBois is a Fortune 500 board director and global technology advisor. He has more than 30 years of experience in a broad range of IT and software development leadership roles including with Accenture and has worked with enterprise leaders from around the globe.