In this blog post, we look at the best practices and expertise behind a well-architected framework, and at why it is so important to get it right: doing so will ultimately save your business from the headache of future IT issues.

We examine the five pillars of a well-architected framework, using the Amazon Web Services (AWS) framework as our reference. We then share a checklist of what to look out for to make sure your IT systems are well architected.

What does Well-Architected mean?

Anyone can put some pieces of software together and ‘make it work’; that requires only basic skills. To do it well, though, you need to go beyond developing software. Doing it well means the resulting application (or system, component, service) not only does what it is supposed to do but also fits well within the larger context. For that, you need to have mastered software engineering. Software engineering is a real craft, learned and refined in much the same way as a traditional craftsman learns a trade. This matters all the more since cloud computing changed a fundamental paradigm of IT. In the past, an application that did not perform well enough would be scaled up (moved to a bigger machine). Today, especially in the cloud, it is scaled out (spread across more machines).

In fact, cloud computing is built around the infrastructure’s ability to scale out at any time with minimal effort and disruption. With this paradigm shift, every modern IT system is inherently a distributed system. It potentially spans many different (hardware) nodes, networked together to deliver the expected business value. This creates immense opportunities to build very flexible and robust applications. At the same time, it greatly increases complexity, and with it the potential for flaws and vulnerabilities in the resulting service.

What is needed is a framework that guides how a system should be designed so that it embeds and operates well within the existing IT landscape and delivers value to its owner. Of the many articles and white papers written on the subject, one stands out for us: Amazon’s “AWS Well-Architected Framework”. It is written against the background of AWS’s vast service portfolio. Even so, we believe its essence and approaches are universal and apply to any careful, considered approach to system architecture.

The AWS Well-Architected Framework

The framework sets out five essential aspects, or ‘pillars’, to consider for good contemporary IT infrastructure design: 1) operations, 2) security, 3) reliability, 4) performance and 5) cost. It is intended as a guide both for designing greenfield applications and for transforming existing ones. As a high-level architectural document, it does not provide detailed solutions as such. Instead, it guides the reader through all the essential aspects of good architecture and through the requirements of the application in question.

The Five Pillars of a Well-Architected Framework:

  1. Operational Excellence – Software doesn’t just need to run; it needs to run well, within specified parameters, continuously and with minimal operational attention. This does not mean that software systems in production never change; in fact, the exact opposite is true. To minimise potential disruption and service outages, deploy small changes to the production system frequently. But to keep operational attention to a minimum, these changes must be reversible at any time, and the process of applying them must be repeatable, which calls for automation. A whole host of underpinning activities need to work together to make this happen.


Check: Does your organisation have the capacity to achieve this level of operational excellence?
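
To make ‘small, reversible, repeatable’ concrete, here is a minimal sketch in Python. It uses a hypothetical in-memory `ReleaseHistory` class rather than any real deployment tool. In practice, this bookkeeping lives in your CI/CD pipeline or orchestration platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReleaseHistory:
    """Hypothetical in-memory record of which versions a service has run, newest last."""

    versions: list[str] = field(default_factory=list)

    @property
    def current(self) -> str | None:
        # The live version is simply the most recent entry.
        return self.versions[-1] if self.versions else None

    def deploy(self, version: str) -> str:
        # Repeatable: re-running a deployment of the live version changes nothing,
        # so an automated pipeline can safely retry the same step.
        if version != self.current:
            self.versions.append(version)
        return version

    def rollback(self) -> str:
        # Reversible: discard the newest release and fall back to the one before it.
        if len(self.versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self.versions.pop()
        return self.versions[-1]


service = ReleaseHistory()
service.deploy("v1.4.0")
service.deploy("v1.4.1")
service.deploy("v1.4.1")   # a retried pipeline run is a no-op
print(service.versions)    # ['v1.4.0', 'v1.4.1']
print(service.rollback())  # v1.4.0
```

The point of the design is that both operations are safe to automate. Retrying a deployment cannot double-apply it, and undoing a change is a single, well-defined step rather than a manual repair.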

  2. Security – Security and safety concerns are not new, but the age of cloud computing has changed them. When organisations maintained their own on-premises infrastructure, they enjoyed natural perimeter protection through the physical barriers around their premises. As a result, applications were comparatively safe, especially at a time when computer systems and applications were far less connected than they are today. In today’s hyperconnected world, the potential for harm, whether malicious or accidental, requires a reversal of your approach to security. Instead of being an afterthought, security must be kept at the forefront of everything you do. A good principle to follow is to ask yourself, for each piece of information you store: do you really need it, and what might happen if it were leaked? The less data you store and process, the fewer headaches you will have. This is data austerity.

Check: How are you assessing the overall health and vulnerability threat of your system? Do you do that regularly? Is it automated? How do you respond to any weaknesses or vulnerabilities found in your system? 
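
One simple way to practise data austerity is to filter every record against an explicit allowlist before it is stored. The Python sketch below is illustrative only. The order service, its field names and the `minimise` helper are all hypothetical.

```python
# Hypothetical allowlist: the only fields this (imaginary) order service has a
# documented need to keep. Anything else is discarded before storage.
ALLOWED_FIELDS = frozenset({"order_id", "product_sku", "quantity", "postcode_district"})


def minimise(record: dict) -> dict:
    """Return a copy of `record` that keeps only allowlisted fields."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


incoming = {
    "order_id": 1042,
    "product_sku": "AB-17",
    "quantity": 2,
    "postcode_district": "EC1V",
    "date_of_birth": "1990-05-04",      # not needed to fulfil an order
    "card_number": "4111111111111111",  # hypothetical: left to a payment provider
}

print(minimise(incoming))
# {'order_id': 1042, 'product_sku': 'AB-17', 'quantity': 2, 'postcode_district': 'EC1V'}
```

An allowlist is preferable to a denylist here. If a new field appears upstream, it is dropped by default, so the service never starts storing data nobody decided it needs.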

  3. Reliability – Your services must perform well under defined and known operational parameters (performance efficiency, see below) with low maintenance effort (operational excellence, see above). They must also be resilient against adverse or changing conditions. A typical example is a sudden surge in demand that the existing resources cannot cope with, for instance because your company’s website was linked from a very popular news outlet. The simple answer of ‘throw more resources at it’ will fail in a networked environment. Your system components need to be designed to scale, to scale well, and to scale on demand (often summarised as autoscaling). Many more such scenarios need your attention if you are to design a reliable and resilient system.


Check: Are you ready for that, and capable of handling it?
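
Autoscaling is usually driven by a feedback rule. Below is a minimal Python sketch of a proportional target-tracking rule. It follows the formula documented for Kubernetes’ Horizontal Pod Autoscaler: desired capacity is the ceiling of current capacity × current metric ÷ target metric. The 60% CPU target and the instance bounds are hypothetical. Real autoscalers also add safeguards that this sketch omits, such as warm-up periods, cooldowns or tolerances.

```python
import math


def desired_capacity(
    current_instances: int,
    cpu_percent: int,
    target_percent: int = 60,  # hypothetical target average CPU utilisation
    min_instances: int = 2,    # hypothetical floor, keeps some redundancy
    max_instances: int = 20,   # hypothetical ceiling, caps runaway spend
) -> int:
    """Proportional target-tracking rule: size the fleet so average CPU returns to target."""
    if current_instances < 1 or target_percent < 1:
        raise ValueError("need at least one instance and a positive target")
    # If 4 instances run at 90% against a 60% target, 4 * 90 / 60 = 6 instances are needed.
    raw = math.ceil(current_instances * cpu_percent / target_percent)
    # Clamp to the configured bounds.
    return max(min_instances, min(max_instances, raw))


print(desired_capacity(4, 90))   # 6: scale out under a surge
print(desired_capacity(4, 30))   # 2: scale back in when quiet
print(desired_capacity(15, 95))  # 20: capped by max_instances (raw figure would be 24)
```

The upper bound matters as much as the rule itself. Without it, a traffic spike, or a bug that burns CPU, turns straight into an unbounded bill.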

  4. Performance efficiency – Almost any service can reach the required speed of operation if given enough resources. But that is expensive (see cost optimisation below). Frequently, how efficiently an application uses a given set of resources matters far more. Suppose one implementation of a process uses three times the resources of a comparable alternative that takes only a third longer to complete. In most circumstances, where time matters less than resource consumption, the more resource-efficient alternative should be used. This is not a one-off exercise. As part of operational excellence, your system (and with it your software engineering team) must maintain this level of performance efficiency across the many small updates deployed to production.


Check: And what about the decision to include a new library? What are the performance impacts of including a third-party library in your system?
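
The trade-off described under performance efficiency is easy to quantify if you measure consumption as resources held multiplied by the time they are held. Here that measure is ‘vCPU-seconds’, which is our assumed metric. The figures are hypothetical: the fast option finishes in 60 seconds on 6 vCPUs, and the lean one uses 2 vCPUs and takes a third longer.

```python
from fractions import Fraction


def vcpu_seconds(vcpus: int, seconds: Fraction) -> Fraction:
    """Resource consumption as vCPUs held multiplied by how long they are held."""
    return vcpus * seconds


baseline = Fraction(60)  # hypothetical: the fast option completes in 60 s

fast = vcpu_seconds(6, baseline)                   # three times the resources
lean = vcpu_seconds(2, baseline * Fraction(4, 3))  # a third more time (80 s)

print(fast, lean)   # 360 160
print(lean / fast)  # 4/9
```

The lean option consumes 4/9, or about 44%, of the resources in exchange for a 20-second delay. Whether that trade is acceptable depends on how time-critical the process is.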

  5. Cost optimisation – The ubiquitous argument that cloud computing is cheaper is very compelling, until it is no longer true. But why does that happen? Certainly, inefficient system design (see performance efficiency) plays a role, and it needs to be addressed continuously. But other factors are at play too. Poor communication between teams leads to duplicated effort and resource spend. Developers’ throwaway setups turn into runaway instances that continuously drain resources and rack up costs. Lost or forgotten data pools, backups, and copies of backups turn your cloud storage into a black hole. Before you can say ‘hang on a minute’, your initial cost benefits have disappeared.


Check: Do you know how to keep tabs on this?
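
One way to catch runaway instances is a regular sweep over your inventory that flags anything without an owner, along with development machines that have outlived their purpose. The Python sketch below assumes a hypothetical tagging convention, with `owner` and `environment` tags, and a hard-coded inventory. In a real setup, the inventory would come from your cloud provider’s API or billing export.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Instance:
    """Hypothetical inventory record for one compute instance."""

    instance_id: str
    tags: dict[str, str]
    launched_at: datetime


def find_suspects(instances: list[Instance], now: datetime,
                  max_dev_age: timedelta = timedelta(days=7)) -> list[tuple[str, str]]:
    """Flag instances nobody owns, and dev instances older than max_dev_age."""
    suspects = []
    for inst in instances:
        if "owner" not in inst.tags:
            suspects.append((inst.instance_id, "no owner tag"))
        elif inst.tags.get("environment") == "dev" and now - inst.launched_at > max_dev_age:
            suspects.append((inst.instance_id, "dev instance older than limit"))
    return suspects


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
fleet = [
    Instance("i-prod-web", {"owner": "web-team", "environment": "prod"},
             datetime(2023, 6, 1, tzinfo=timezone.utc)),
    Instance("i-dev-alice", {"owner": "alice", "environment": "dev"},
             datetime(2024, 1, 2, tzinfo=timezone.utc)),
    Instance("i-dev-orphan", {"environment": "dev"},
             datetime(2024, 1, 30, tzinfo=timezone.utc)),
    Instance("i-dev-bob", {"owner": "bob", "environment": "dev"},
             datetime(2024, 1, 29, tzinfo=timezone.utc)),
]

for instance_id, reason in find_suspects(fleet, now):
    print(instance_id, "->", reason)
# i-dev-alice -> dev instance older than limit
# i-dev-orphan -> no owner tag
```

The sweep only reports; it deliberately deletes nothing. Flagged instances go back to their owners, or to whoever owns the account, for a decision.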

Cloud computing offers compelling opportunities, but it behaves like a Swiss Army knife with 512 functions. If you don’t know how to use it, you’ll end up using the wrong tool for the job and hurting yourself, and your business, in the process.

The Well-Architected Framework Checklist

  1. Operational excellence

Check: Does your organisation have the capacity to achieve this level of operational excellence, with small, frequent, reversible and automated changes to production?

  2. Security

Check: How are you assessing the overall health and vulnerability threat of your system? Do you do that regularly? Is it automated? How do you respond to any weaknesses or vulnerabilities found in your system?

  3. Reliability

Check: Are your systems ready for sudden changes in demand, and capable of scaling to meet them?

  4. Performance efficiency

Check: And what about the decision to include a new library? What are the performance impacts of including a third-party library in your system?

  5. Cost optimisation

Check: Do you know how to keep tabs on your cloud costs?

Driven by best practices: DCL and the Well-Architected Framework

Digital Craftsmen (DCL) is a cloud-agnostic managed service provider. That means we provide solutions deployed in our own data centres, on AWS, Azure, Google Cloud Platform or other cloud provider platforms.

Our ISO 27001 and Cyber Essentials Plus certifications reflect the level of expertise we bring across all our services, so you know you can trust us.

If you have any questions for the Craftsmen team, drop us an email at [email protected] or call us on 020 3745 7706.
