Why do Cloud Migration Projects Fail?

As regular readers will know, I have posted on several elements of this topic previously, but I decided to wrap this up with a more holistic (I do dislike that word) view, based on my experience of working in this area for the past 15 years or so.

What is Failure, or to put it better, what does success look like?

According to various sources (take these figures with a pinch of salt), cloud migration failure rates range from 30% to 80%. Even if we take the lower figure, 30% is still far too high. But what do we mean by failure? Does this figure include projects that have exceeded their original budget (McKinsey: 75%) or overrun their schedule (McKinsey: 37%), or is it a more technical measure, i.e. the migrated/transformed application did not meet the user/business requirements?

The bottom line for many of these projects is that no clear, measurable goal was ever defined. I have talked about data centre (DC) exit programmes previously, and arguably the objective is clear, i.e. close x number of DCs by a given date, but is it? This is more of an aspirational statement than a clearly defined metric, and it often does not reflect reality. For example, you may have applications in your DC that are not suited to cloud migration for technical or regulatory reasons. What do you do with these applications? You can’t move them to the cloud, so they must stay on-premises, but where will you host them if you close your data centres? (Probably a private cloud of some description, but are those costs in the business case?)

The above is a relatively simplistic problem statement, but quite frequently organisations do not know what the end goal is: whether it is to reduce costs, improve efficiency (always a bit nebulous), transform the IT landscape, etc. You need to fully understand the end state and design metrics that can be easily (and accurately) measured, with remediation plans ready if they are not met.

Understand the current Application and Infrastructure estate

I would argue that this is the most critical element in your cloud migration journey and, contrary to the subheading, it does not just include an inventory of applications and infrastructure. Running a discovery tool to get a view of your estate is very important and should not be ignored. However, typically, this will only give you a list of servers, hardware specifications, operating systems, usage statistics, applications (if known or discoverable by the tool), and integration points (assuming the application has communicated with all the relevant servers during the discovery phase).

This process typically does not provide business metrics such as the importance of the application, sensitivity of the data, current costs or hardware refresh dates, regulatory & compliance requirements, etc.

As a result, the discovery data gathered from various tools is simply not enough to accurately plan or cost a migration/transformation. At least some of these gaps are well known, and many organisations will run interviews or workshops with application owners to fill them. However, one thing I continually see is that organisations do not have an accurate handle on their current costs. Yes, they will know their core hosting costs (especially if they use a 3rd party), and they will know the costs associated with any run/support contract they have in place. Still, I have seen instances where the organisation did not have a clear picture of how hardware and licensing costs are currently managed, or what the implications would be when the application moves to the cloud. One example is the difference between an on-premises license and a cloud license: quite often there are differences, if not directly in cost, then in how the license is applied to a cloud instance.

Another key component that can be misunderstood is the required capacity, which can be tricky at the best of times. In an on-premises world, organisations typically scale for peak demand; some scaling can be applied at the hypervisor level (e.g. thin provisioning, auto-scaling), but ultimately the underlying host has to be able to manage the peak demand. It is important to note that you would typically only run a discovery tool for a relatively short period (in the scheme of things), so the peak load may not be captured during your discovery window.
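To make the discovery-window risk concrete, here is a minimal sizing sketch. It assumes CPU samples exported from a discovery tool into a hypothetical `Sample` record; the 31-day business cycle and 20% headroom are illustrative assumptions, not recommendations. It sizes from the observed peak and, crucially, flags when the window was too short to have seen a full cycle (e.g. month-end processing).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a full business cycle (e.g. month-end processing) spans ~31 days.
BUSINESS_CYCLE = timedelta(days=31)


@dataclass
class Sample:
    """One utilisation sample exported from a discovery tool (hypothetical format)."""
    timestamp: datetime
    cores_used: float


def size_from_discovery(samples: list[Sample], headroom: float = 0.2) -> tuple[int, bool]:
    """Return (recommended cores, whether the discovery window covers a full cycle).

    The recommendation is the observed peak plus a headroom margin, rounded up.
    If the window is shorter than BUSINESS_CYCLE, the true peak may not have been seen.
    """
    if not samples:
        raise ValueError("no discovery samples supplied")
    times = [s.timestamp for s in samples]
    window = max(times) - min(times)
    observed_peak = max(s.cores_used for s in samples)
    recommended = math.ceil(observed_peak * (1 + headroom))
    return recommended, window >= BUSINESS_CYCLE


# Usage: two weeks of daily samples, which is shorter than a month-end cycle.
start = datetime(2024, 1, 1)
samples = [Sample(start + timedelta(days=d), 2.0 + d % 5) for d in range(14)]
cores, covers_cycle = size_from_discovery(samples)
print(cores, covers_cycle)  # 8 False
```

A `False` in the second position is the signal to extend discovery or ask the application owner about known peaks before trusting the sizing figure.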

For public cloud providers, capacity planning is handled differently, and features such as scale sets need to be planned and implemented to manage the service effectively. When implemented correctly, these features can significantly change how applications are managed and what they cost.

A related but separate item is reserved instances. These provide a defined amount of compute or storage, usually at a lower cost, but for a fixed term (1-3 years). This is a great feature and is especially effective for core systems such as SAP, as these are unlikely to change over the 3-year term for which reserved instances are typically contracted. However, I have seen examples where reserved instances have been purchased but never used, which can negatively impact run costs.
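The arithmetic behind "purchased but never used" is worth spelling out. The sketch below is a simplification under stated assumptions: it treats a reservation as an effective hourly rate charged for every hour of the term (i.e. any upfront payment amortised), and the 0.40 and 0.25 rates are purely hypothetical. Real provider offerings have more payment and flexibility options than this models.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def break_even_utilisation(on_demand_hourly: float, reserved_effective_hourly: float) -> float:
    """Fraction of the year an instance must run for the reservation to beat on-demand.

    A reservation is paid for every hour of the term, used or not, so it only
    saves money if the workload runs more than this fraction of the time.
    """
    return reserved_effective_hourly / on_demand_hourly


def annual_costs(on_demand_hourly: float, reserved_effective_hourly: float,
                 hours_used: float) -> tuple[float, float]:
    """Return (on-demand cost, reserved cost) for one year at the given usage."""
    on_demand = on_demand_hourly * hours_used
    reserved = reserved_effective_hourly * HOURS_PER_YEAR  # charged regardless of use
    return on_demand, reserved


# Hypothetical rates: 0.40/hour on-demand versus an effective 0.25/hour reserved.
print(break_even_utilisation(0.40, 0.25))        # break-even at 62.5% utilisation
print(annual_costs(0.40, 0.25, HOURS_PER_YEAR))  # always-on core system: reservation wins
print(annual_costs(0.40, 0.25, 0))               # bought but never used: pure cost
```

An always-on core system sits well above the break-even line, which is why reservations suit workloads like SAP; an unused reservation still costs the full term.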

Move groups and dependencies

Many applications, from an infrastructure perspective, can be viewed as stand-alone; however, they are often linked, either by their users or via integrations with other systems, both internal and external. Whilst discovery tools (and the CMDB) may well be able to identify the integration points, they do not usually have sufficient information to identify user groups, etc.

Whilst it would be nice simply to collate a list of servers and move them based on criteria that IT or the consultancy has defined, for practical purposes you want to manage the impact on the business and users by moving related applications at the same time. In theory, this reduces the risk of failures and can reduce complexity (and, potentially, costs).
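One common way to turn dependency data into candidate move groups is to treat servers as nodes and links as edges, then take the connected components. The sketch below does this with union-find; the server names and links are hypothetical. It assumes links can combine technical integration points (from discovery tools or the CMDB) with business links such as shared user groups gathered from application-owner interviews, since the tools alone usually cannot see the latter.

```python
from collections import defaultdict


def move_groups(servers: list[str], links: list[tuple[str, str]]) -> list[list[str]]:
    """Group servers into move groups: the connected components of the link graph.

    `links` can mix technical integration points (from discovery/CMDB) with
    business links such as shared user groups (from application-owner interviews).
    Uses union-find with path halving.
    """
    parent = {s: s for s in servers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        # Servers seen only in links (e.g. missing from the inventory) are still grouped.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[str, list[str]] = defaultdict(list)
    for s in parent:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


servers = ["web01", "app01", "db01", "hr01", "hrdb01", "ftp01"]
links = [("web01", "app01"), ("app01", "db01"), ("hr01", "hrdb01")]
print(move_groups(servers, links))
# [['app01', 'db01', 'web01'], ['ftp01'], ['hr01', 'hrdb01']]
```

In practice the components become a starting point: very large groups usually need splitting by hand, with the cut links handled as temporary hybrid integrations.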

Controlling the Budget

I have already talked about some areas where costs can be impacted, i.e. capacity planning, licenses etc. However, this is not the only area where cost/budgetary overruns can occur.

I am not going to list everything that needs to be considered here, because the list would be very long, but there are several areas where I have seen changes in costs go unnoticed. One of the more interesting areas (especially considering my role) is the contractual terms of the various support agreements an organisation may have, particularly any infrastructure or application support contracts. I would recommend that anyone looking to migrate workloads to the cloud review these clauses. For example, I have seen provisions that restrict the number of servers leaving the estate (i.e. reducing the number of servers the team supports), often backed by financial penalties. Another common issue, especially with on-premises support services, is a different rate for supporting cloud workloads.

One of the benefits of the cloud, and one of its complexities, is that it is constantly evolving; license cost and compute changes are common examples. It is imperative that the run costs of the service are monitored and that clear mitigation plans are in place if costs fall outside the expected range. Several tools can help with this aspect of the service, such as Turbonomic and Apptio (both from IBM).
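At its simplest, "costs outside the expected range" is a tolerance check against a baseline. The sketch below is a hypothetical illustration of that logic only; it does not use the Turbonomic or Apptio APIs, and the `CostLine` record, the 10% tolerance and the figures are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    service: str      # e.g. "compute", "licensing"
    actual: float     # actual spend for the period
    expected: float   # forecast/baseline spend for the period


def outside_expected_range(lines: list[CostLine], tolerance: float = 0.10) -> list[tuple[str, float]]:
    """Return (service, relative deviation) for lines outside +/- tolerance of expected.

    Under-spend is flagged too: it can mean the forecast, or the commitments
    behind it, need revisiting.
    """
    flagged = []
    for line in lines:
        if line.expected <= 0:
            raise ValueError(f"{line.service}: expected cost must be positive")
        deviation = (line.actual - line.expected) / line.expected
        if abs(deviation) > tolerance:
            flagged.append((line.service, deviation))
    return flagged


month = [
    CostLine("compute", actual=12_500, expected=10_000),
    CostLine("licensing", actual=5_200, expected=5_000),
    CostLine("storage", actual=700, expected=1_000),
]
for service, deviation in outside_expected_range(month):
    print(f"{service}: {deviation:+.0%} vs expected")
# compute: +25% vs expected
# storage: -30% vs expected
```

Each flagged line should map to an owner and a pre-agreed mitigation (right-size, re-plan, or re-baseline), which is the part tooling cannot decide for you.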

Are your applications Cloud Aligned?

I have talked about this in my previous Cloud Repatriation article (https://www.linkedin.com/pulse/cloud-repatriation-valid-option-richard-hogan-wo5ee/?trackingId=Zcdzq12TQ2mO61yxSf33RA%3D%3D), but it is definitely worth restating. In my experience, organisations tend to start with a lift-and-shift migration and transform the application later, i.e. once the migrations have been completed, and there are valid reasons for doing this.

As a strategy, this is perfectly acceptable as long as the transformation element is completed within a reasonable timeline. The justification for this is that a cloud-aligned architecture (call it cloud native if you prefer) is typically more cost-effective and better suited to cloud hosting. In general, the longer you leave an application in its original on-premises architecture, the more it will cost and the more technical debt you accrue, and you may be forced into a change before you are ready, e.g. when the OS or application reaches end of support (Windows, SAP ECC, etc.). Also, don’t just focus on the server level; make sure you include ALL components of the service, including the data and integration layers.
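A technical debt register makes the "forced into a change" risk visible early. The sketch below is one possible shape for such a register, assuming a hypothetical `DebtItem` record per component (OS, application, data and integration layers). It surfaces items reaching end of support within a planning horizon that have no transformation planned. The applications and dates are illustrative, not real vendor end-of-support dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DebtItem:
    application: str
    component: str              # e.g. "OS", "application", "data layer", "integration"
    end_of_support: date        # vendor end-of-support date for this component
    transformation_planned: bool


def at_risk(register: list[DebtItem], today: date, horizon_days: int = 365) -> list[DebtItem]:
    """Items reaching (or past) end of support within the horizon with no
    transformation planned, soonest first."""
    cutoff = today + timedelta(days=horizon_days)
    risky = [i for i in register
             if not i.transformation_planned and i.end_of_support <= cutoff]
    return sorted(risky, key=lambda i: i.end_of_support)


# Hypothetical register entries; the dates are illustrative, not real vendor dates.
register = [
    DebtItem("Payroll", "OS", date(2025, 6, 30), False),
    DebtItem("CRM", "application", date(2025, 3, 31), True),
    DebtItem("Reporting", "data layer", date(2027, 1, 1), False),
    DebtItem("Intranet", "OS", date(2025, 2, 15), False),
]
for item in at_risk(register, today=date(2025, 1, 1)):
    print(item.application, item.component, item.end_of_support)
# Intranet OS 2025-02-15
# Payroll OS 2025-06-30
```

Tracking components rather than servers keeps the data and integration layers in view, in line with the point above.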

Is your Organisation Aligned with the Cloud strategy?

You would think this would be a given, but there is often a disconnect between the C-suite and the operational teams on the strategy and delivery of cloud migrations, and this can be further complicated by the organisational structure. This seems particularly common in financial services organisations, but whenever central IT is not the only place where IT decision-making and contracting occur, all the departments/units must be on board. For example, I have seen instances where central IT hosts most (not all) of the core solutions, but, for argument’s sake, Investment Banking “owns” the point solutions for its business. In this scenario, both teams must have a shared strategy and goal; otherwise, issues with planning, architecture and delivery can occur.

The same is true for the CISO office: cyber security should be involved right from the programme’s initiation. Unfortunately, this is not always the case, as cyber security is often seen as a blocker, incorrectly in my opinion.

For more on this topic, please refer to my earlier article https://www.linkedin.com/pulse/importance-cybersecurity-cloud-migrations-richard-hogan-708we/?trackingId=XmGdmZunQhy2lDRPYbO7Uw%3D%3D

Do you have a user adoption programme for both IT and end users?

It is generally accepted that if the user experience of the application is changing in any form, then end-user training and other supporting activities should be planned and initiated, ideally before the migration.

What I think is less well thought through is the impact on the IT teams that have to work on the new platform, both from an operational perspective and in terms of how new features are developed and deployed to it. In a similar vein to the architecture point, managing a solution in the cloud is different from managing the same solution on-premises, even if everything about the application is the same. This is compounded if the application model has been transformed as part of the journey to the cloud, for example if the organisation has adopted an Infrastructure as Code or DevSecOps approach, as this will bring new tools and processes into play.

Another consideration is how IT resources plan and deploy new features. If they continue to do this in the old on-premises way, you will inadvertently cause the very same issues you may have avoided with the initial migration/transformation activities, i.e. applications that are not architected with the cloud in mind, are incorrectly scaled, make ineffective use of cloud-native patterns, etc.

In Conclusion

Cloud migration projects are probably among the most ambitious and critical journeys an organisation can embark on and, judging by the statistics, can be very tricky to get right. In my opinion, they need to include several key elements to mitigate the inherent risks of such a programme, including (in no particular order):

Conduct a thorough end-to-end evaluation and discovery phase. Ensure that you include all aspects of the business, including cybersecurity, finance & procurement.
Understand what success looks like and define SMART metrics to track performance against. As you encounter issues, re-plan/baseline the programme if necessary.
Architect and implement a compliant but adaptable cloud landing zone; this should be able to be modified based on changing guidance/best practices from the cloud providers.
Develop and maintain a technical debt register for applications where a “lift and shift” migration was selected. Plan to mitigate any issues related to non-cloud-aligned architectures, paying particular attention to legacy N-Tier applications and data/storage architecture.
Review and update the operating model if necessary to ensure compliance with the new cloud platforms.
Continuously review and monitor cloud costs, both compute and licensing, and make amendments if necessary.
Migrate related applications together and prioritise the applications you are migrating based on a clearly understood and documented metric.
Ensure that all business elements are aligned and involved in and informed of the migration’s business goals, processes, activities, and responsibilities.
Develop a robust end-user and IT training plan, specifically focusing on any changes to BAU/operational procedures and architectural guidance for new applications.
Develop and implement an incident response plan specifically for cloud workloads, and ensure that you have coverage for early life support during and immediately after migration to the cloud.
Update the organisation’s Threat Management solution to fully integrate the cloud platforms, especially where SaaS and PaaS platforms have been adopted.
Do not be afraid to leave applications “behind”; in some cases it may well be better not to migrate an application if it is unsuitable.
In an ideal world, consider adopting a DevSecOps or SecOps model for the deployment and management of cloud-based workloads.

Please let me know in the comments if you think I have forgotten an element.
