AWS Resilience Technology Competency
Partner Offering Validation Checklist
Validity Period: August 2025-February 2026
This version of the checklist was released on August 29th, 2025. The next version of this checklist is expected to be released in February 2026. AWS Partners may continue to use this version of the checklist until May 2026. AWS Partners may submit applications using the previous release (February 2025) until November 27th, 2025. Please review the change log for a list of changes (if any) since the previous version.
Introduction
The goal of the AWS Competency Program is to recognize AWS Partner Network Partners (“AWS Partners”) who demonstrate and maintain technical proficiency and proven customer success in specialized AWS Partner solution areas. The AWS Competency Partner Validation Checklist (“Checklist”) is intended for AWS Partners who are interested in applying for an AWS Competency. This Checklist provides the criteria necessary to achieve the designation under the AWS Competency Program. AWS Partners undergo a technical validation of their capabilities upon applying for the specific AWS Competency. AWS leverages in-house expertise and a third-party firm to facilitate the technical validation. AWS reserves the right to make changes to this document at any time and without notice.
Expectation of Parties
It is expected that AWS Partners will review this document in detail before applying for the AWS Competency Program, even if all the prerequisites are met. If items in this document are unclear and require further explanation, please contact your AWS Partner Development Representative (“PDR”) or AWS Partner Development Manager (“PDM”) as the first step. Your PDR/PDM will contact the program office if further assistance is required.
AWS Partners should complete the Self-Assessment Spreadsheet linked at the top of this page prior to submitting a program application. Once it is completed, AWS Partners must submit an application in APN Partner Central. Visit the AWS Competency Program guide for step-by-step instructions on how to submit an application.
AWS will review the application and aim to respond with any questions within five business days. Incomplete applications will not be considered until all requirements are met. If the application is complete, AWS will send it to in-house solutions architect (SA) experts to complete a Technical Validation. A validation call may be required once the AWS SA has reviewed the self-assessment offline. The AWS SA will reach out directly if additional information is required or to schedule a validation call.
AWS Partners should prepare for the audit by reading the Checklist, completing a self-assessment using the Checklist, and gathering and organizing objective evidence to share with the auditor on the day of the audit.
AWS recommends that AWS Partners have individuals who are able to speak in-depth about how the solution meets the requirements described in this document during the audit. The best practice is for the AWS Partner to make the following personnel available for the audit: one or more highly technical AWS certified engineers/architects, an operations manager who is responsible for the operations and support elements, and a business development executive to conduct the overview presentation. AWS Partners should ensure that they have the necessary consents to share with the auditor (whether AWS or a third-party) all information contained within the objective evidence or any demonstrations prior to scheduling the audit.
AWS may revoke an AWS Partner’s Competency designation if, at any time, AWS determines in its sole discretion that such AWS Partner does not meet its AWS Competency Program requirements. If an AWS Partner’s Competency designation is revoked, such AWS Partner will (i) no longer receive benefits associated with its designation, (ii) immediately cease use of all materials provided to it in connection with the applicable Competency designation and (iii) immediately cease to identify itself as an AWS Partner of such AWS Competency.
Categories
AWS Resilience Technology Competency Partners provide technical solutions that help AWS customers build or improve the availability and resilience of critical workloads in the cloud. Critical workloads are workloads that are important to a customer's functioning, operations, and success, and whose downtime or impairment could result in financial impact, reputational damage to the brand, or regulatory fines.
The Resilience Technology Competency is structured into three (3) categories:
Design
This category focuses on the architectural design and implementation of highly available systems. The goal is to build redundancy, load balancing, high availability data access patterns, and automatic failover mechanisms into the application architecture to minimize the impact of individual component failures. This includes leveraging technologies such as load balancers, clustering solutions, and service mesh technology to provide redundancy and scalability. Partners in this category can also provide solutions for software development lifecycle testing and quality, including various forms of resilience testing such as load testing, unit testing, and integration testing. This category provides solutions that address the most common categories of failure, such as single points of failure, excessive load, excessive latency, misconfiguration and bugs, and shared fate. The objective is to proactively reduce the probability of service impairments through robust system design and to improve system service level objectives by implementing resilient, highly available mechanisms.
Operate
This category encompasses the operational aspects of resilience. It provides the capabilities to detect, mitigate, and learn from failures in real-time through advanced monitoring, alerting, and collaboration tools. Solutions in this category include monitoring, observability, incident management, remediation automation, communication and collaboration, chaos engineering, operational readiness reviews, and correction of error processes. The goal is to establish a continuous improvement cycle in which insights from incident handling and testing are used to enhance the overall resilience posture.
Recover
This category covers the technologies for recovering from major incidents, disasters, or catastrophic failures that may lead to extended downtime or data loss. It includes backup and recovery, disaster recovery, data protection and replication, and data integrity and consistency checks that enable rapid restoration of data and systems, as well as comprehensive disaster recovery planning and testing. The focus is on minimizing Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) to ensure business continuity and resilience against large-scale disruptions.
Prerequisites
The following items will be validated by the AWS Competency Program Manager; missing or incomplete information must be addressed prior to scheduling of the Technology Validation Review.
1.0 APN Program Membership
1.1 Program Guidelines
The AWS Partner must read the Program Guidelines and Definitions before applying to the AWS Resilience Competency Program. Click here for Program details.
1.2 Software Path Membership
The AWS Partner must be at the Validated or Differentiated stage within the Software Path. SI Partners should talk to their PDR/PDM about how to join the Software Path.
1.3 Foundational Technical Review
The AWS Partner solution must have a valid AWS Foundational Technical Review (FTR). FTRs completed for other solutions in the AWS Partner’s portfolio do not fulfill this requirement.
1.4 Solution Category
The AWS Partner must identify the specific AWS Resilience category and deployment model for their solutions. Deployment models must be one of:
- SaaS on AWS
- Customer deployed
1.5 AWS Partner Program Requirements
To maintain this Specialization, you must:
- Maintain an approved FTR for the Software Solution that is submitted for this Specialization
- Maintain the AWS Partner Central Solution attached to your application in "Active" status. This indicates the Solution is currently supported and available.
Important: If you fail to maintain either of these requirements, your Specialization will be marked as non-compliant. You will then have 6 months to regain compliance with the above criteria. If compliance is not regained, you will lose your Specialization and all corresponding benefits.
2.0 Example AWS Customer Deployments
2.1 Production AWS Customer Case Studies
The AWS Partner must privately share with AWS details about four (4) unique examples of Resilience projects executed for four (4) unique AWS customers. Each case study must demonstrate how the partner offering was used by a customer to solve a specific Resilience customer challenge using AWS.
In addition to the required case study details provided in AWS Partner Central, the partner must also provide architecture diagrams of the specific customer deployment and information listed in the technical requirements sections of this validation checklist.
The information provided for these case studies will be used by AWS for validation purposes only. The partner is not required to publish these details publicly.
AWS Partners can reuse the same case study across different AWS Specialization designations as long as the case study and implementation scope are relevant to those designations. The partner should make sure the existing case study clearly explains its relevance to each designation they are applying for.
AWS will accept one case study per customer. Each customer must be a separate legal entity to qualify. The partner may use an example for an internal or affiliate company of the partner if the offering is available to outside customers.
All case studies must describe deployments that have been performed within the past 18 months and must be for projects that are in production with customers, rather than in a ‘pilot’ or proof of concept stage.
All case studies provided will be examined in the Documentation Review of the Technical Validation. The partner offering will be removed from consideration if the partner cannot provide the documentation necessary to assess all case studies against each relevant validation checklist item, or if any of the validation checklist items are not met.
Case Study Submission
- The Partner Central application offers a form to attach the four (4) case studies as marketing assets to be published (public and anonymized case studies only) to external channels such as Partner Solutions Finder (PSF).
- The Partner Self-Assessment checklist (Excel file) available for download at the top of this Validation Checklist includes additional requirements for the submitted case studies. These requirements provide additional technical context for technical validation purposes only and will NOT be published.
2.2 Publicly Available Case Studies
At least two (2) of the provided case studies must be publicly available examples describing how the partner used AWS to help solve a specific customer challenge related to Resilience. These publicly available examples may be in the form of formal customer case studies, white papers, videos, or blog posts. The partner will provide the publicly available URL (published by the partner) in the AWS Partner Central ‘Case Study URL’ field, which must include the following details:
- AWS Customer name
- AWS Partner name
- AWS Customer challenge that aligns with the scope of the competency and selected category
- Using both high-level and technical details, describe how AWS was leveraged as part of the partner solution
- Outcome(s) and/or quantitative results
2.3 Anonymized Public Case Studies
In cases where the partner cannot publicly name customers due to the sensitive nature of the customer engagements, the partner may choose to anonymize the public case study. Anonymized public case study details will be published by AWS, but the customer name will remain private. The partner must provide the AWS Customer name in the ‘Company name’ field of the AWS Partner Central case study for validation purposes, but it will not be published by AWS. The case study fields that will be published to Partner Solutions Finder (PSF) by AWS include the ‘Title’, ‘Case Study Description’, and ‘Case Study URL’. The partner will provide the publicly available URL (published by the partner) in the AWS Partner Central ‘Case Study URL’ field, which must include the following details:
- AWS Customer Description (e.g. a top 5 US retailer, a Fortune 500 financial institution, etc.).
- AWS Partner name
- AWS Customer challenge that aligns with the scope of the competency and selected category
- Using both high-level and technical details, describe how AWS was leveraged as part of the partner solution
- Outcome(s) and/or quantitative results
For best practices on writing a public case study that will be accepted, see the Public Case Study Guide.
2.4 Resilience Tooling Customer Definition
For the purposes of public and private customer examples for the AWS Resilience Technology Competency, AWS Partners may use examples where their product was used directly by end customers, or where their product was used by AWS Resilience Consulting Competency Partners to deliver resilience projects for end customers.
2.5 Customer Example Outcomes/Results
All customer examples and public case studies used to fulfill requirements 2.1 and 2.2 must be for projects in which the customer workloads are production workloads running on AWS. Hybrid workload implementations will be accepted as long as part of the implementation runs on AWS, for example, using AWS as a disaster recovery location for on-premises environments.
3.0 AWS Partner Self-Assessment
3.1 AWS Partner Self-Assessment
The AWS Partner must conduct a self-assessment of their compliance with the requirements of the AWS Resilience Technology Partner Validation Checklist. A version of this Checklist is available in spreadsheet format. Links to the appropriate self-assessment spreadsheet can be found at the top of this page.
- AWS Partner must complete all sections of the Checklist.
- The completed self-assessment should be uploaded via the AWS Competency application in AWS Partner Central. It is recommended that the AWS Partner have their Partner Solutions Architect (PSA), Partner Development Representative (PDR), or Partner Development Manager (PDM) review the completed self-assessment before submitting it to AWS. This ensures that the AWS Partner’s AWS team is engaged and can provide recommendations prior to the review, helping to ensure a productive review experience.
Design Category Requirements
Instructions (PLEASE READ) This category contains five (5) use cases: Load Balancers, Clustering Solutions, Service Mesh Technologies, API Gateway, and Software Quality Control. Please respond to ALL requirements of ONLY one (1) applicable use case.
Design category focuses on the architectural design and implementation of highly available systems. The goal is to build redundancy, load balancing, high availability data access patterns, and automatic failover mechanisms into the application architecture to minimize the impact of individual component failures. Additional details can be found on the designation site.
Load Balancers
A load balancer is a critical component that distributes incoming network traffic across multiple servers or resources, acting as a traffic conductor to ensure no single server becomes overwhelmed. In the context of building resilient workloads, load balancers play a fundamental role in maintaining high availability and fault tolerance.
REGD-001 - Health Checks
Applies to: SaaS | Customer Deployed
Product must continuously monitor the health of backend resources, automatically removing unhealthy instances from the traffic flow and reinstating them once they are healthy.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
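For illustration only, the health-check behavior in REGD-001 can be sketched in Python as follows. The `TargetPool` class and its thresholds (remove a target after three consecutive failed checks, reinstate it after two consecutive passed checks) are hypothetical assumptions, not the behavior of any specific AWS or partner product.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A backend resource tracked by a hypothetical load balancer."""
    name: str
    healthy: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0


@dataclass
class TargetPool:
    # Assumed thresholds: remove after 3 failed checks, reinstate after 2 passed checks.
    unhealthy_threshold: int = 3
    healthy_threshold: int = 2
    targets: dict = field(default_factory=dict)

    def add(self, name: str) -> None:
        self.targets[name] = Target(name)

    def record_check(self, name: str, passed: bool) -> None:
        """Update one target's state from a single health-check result."""
        t = self.targets[name]
        if passed:
            t.consecutive_successes += 1
            t.consecutive_failures = 0
            if not t.healthy and t.consecutive_successes >= self.healthy_threshold:
                t.healthy = True  # reinstate into the traffic flow
        else:
            t.consecutive_failures += 1
            t.consecutive_successes = 0
            if t.healthy and t.consecutive_failures >= self.unhealthy_threshold:
                t.healthy = False  # remove from the traffic flow

    def in_service(self) -> list:
        """Names of targets currently eligible to receive traffic."""
        return sorted(n for n, t in self.targets.items() if t.healthy)
```

Requiring several consecutive results in each direction keeps a single dropped probe from removing an otherwise healthy target.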
REGD-002 - Traffic Distribution Algorithms
Applies to: SaaS | Customer Deployed
Product must offer multiple smart traffic distribution methods (e.g., round-robin, least connections, weighted, session stickiness) to optimize resource utilization and prevent overload.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
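The following Python sketch illustrates three of the distribution methods named in REGD-002 (round-robin, least connections, and weighted). The class names are hypothetical, and the sketch omits concerns such as concurrency and session stickiness that a production load balancer would handle.

```python
import itertools
import random


class RoundRobin:
    """Cycle through targets in a fixed order."""

    def __init__(self, targets):
        self._cycle = itertools.cycle(targets)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the target with the fewest open connections."""

    def __init__(self, targets):
        self.active = {t: 0 for t in targets}

    def pick(self):
        target = min(self.active, key=self.active.get)  # ties go to the first target
        self.active[target] += 1  # caller must call release() when the request ends
        return target

    def release(self, target):
        self.active[target] -= 1


class Weighted:
    """Choose targets at random in proportion to their configured weights."""

    def __init__(self, weights, rng=None):
        self.targets = list(weights)
        self.weights = list(weights.values())
        self.rng = rng or random.Random()

    def pick(self):
        return self.rng.choices(self.targets, weights=self.weights, k=1)[0]
```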
REGD-003 - Cross-Zone Balancing and affinity rules
Applies to: SaaS | Customer Deployed
Product must offer mechanisms to set up cross-Availability Zone balancing and affinity rules. Product must understand AWS Availability Zone and Region boundaries and must be able to distribute traffic across multiple Availability Zones. Product must also be able to restrict traffic to a single Availability Zone if the customer prefers to implement an Availability Zone (AZ) independence architecture pattern.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
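As a minimal sketch of the zone-aware behavior in REGD-003, the hypothetical `eligible_targets` function below either spreads traffic across all Availability Zones or, when zonal affinity is enabled, keeps it inside the client's zone. Returning an empty list instead of spilling over to another zone is an assumption made here to reflect an AZ independence pattern; actual products may offer configurable fallback behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    zone: str            # an Availability Zone identifier (hypothetical values below)
    healthy: bool = True


def eligible_targets(targets, client_zone, zonal_affinity):
    """Return the targets a request from `client_zone` may be routed to.

    zonal_affinity=False: spread across healthy targets in every zone (cross-zone).
    zonal_affinity=True: stay inside the client's zone; in this sketch an empty
    list is returned when that zone has no healthy target, rather than spilling over.
    """
    healthy = [t for t in targets if t.healthy]
    if zonal_affinity:
        return [t for t in healthy if t.zone == client_zone]
    return healthy
```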
Clustering solutions
Clustering solutions are specialized solutions that group multiple servers or instances together to act as a single system, providing high availability and load distribution for critical workloads. In the context of building resilient workloads, clustering plays a vital role by ensuring continuous service availability even if individual components fail.
REGD-004 - Automatic Failover
Applies to: SaaS | Customer Deployed
Product must be able to seamlessly transfer work, users, or jobs to healthy nodes when a failure is detected, minimizing downtime.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
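The failover behavior in REGD-004 can be illustrated with the following Python sketch, which assumes a hypothetical cluster state expressed as a mapping from job IDs to node names. Jobs on failed nodes are moved to the least-loaded healthy node, and jobs already on healthy nodes are left in place to avoid unnecessary disruption.

```python
def reassign_jobs(assignments: dict, healthy_nodes: set) -> dict:
    """Return a new job -> node mapping with jobs moved off failed nodes.

    assignments: hypothetical cluster state mapping job IDs to node names.
    healthy_nodes: node names that currently pass health checks.
    """
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes available for failover")
    # Count the jobs that stay on each healthy node.
    load = {node: 0 for node in healthy_nodes}
    result = {}
    for job, node in assignments.items():
        if node in healthy_nodes:
            load[node] += 1
            result[job] = node
    # Move each orphaned job to the least-loaded healthy node (ties broken by name).
    for job in sorted(j for j, n in assignments.items() if n not in healthy_nodes):
        target = min(sorted(load), key=load.get)
        load[target] += 1
        result[job] = target
    return result
```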
REGD-005 - Load distribution
Applies to: SaaS | Customer Deployed
Product must be able to intelligently distribute load across multiple nodes to prevent resource exhaustion and maintain optimal performance. Product must allow customers to scale up or down the number of workers in the cluster without any major availability impact.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-006 - Cluster health monitoring
Applies to: SaaS | Customer Deployed
Product must continuously monitor the health of all nodes and services, providing real-time status and proactive issue detection. A health dashboard must be available for customers to monitor overall cluster health.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
Service mesh technology
A service mesh is a dedicated infrastructure layer for managing service-to-service communication within a distributed microservices architecture. It plays a crucial role in building resilient workloads by providing advanced traffic management, observability, and security features.
REGD-007 - Intelligent load balancing and traffic routing
Applies to: SaaS | Customer Deployed
Product must implement intelligent load balancing and traffic routing. A service mesh dynamically distributes network traffic across multiple service instances based on various factors like latency, resource utilization, and health status, while providing advanced routing capabilities such as circuit breaking, retries, and traffic splitting.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-008 - Circuit breaking to prevent cascading failures
Applies to: SaaS | Customer Deployed
Product must implement circuit breaking. Circuit breaking is a resiliency pattern that automatically stops forwarding requests to a failing or degraded service instance when certain thresholds (like error rates or latency) are exceeded, allowing the service to recover and preventing the failure from cascading to other parts of the system.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
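A minimal Python sketch of the circuit-breaking pattern described in REGD-008 follows. It assumes a single-threaded caller and uses a count of consecutive failures as a simple stand-in for the error-rate or latency thresholds mentioned above; the `CircuitBreaker` class and its default values are hypothetical. The injectable `clock` parameter makes the closed, open, and half-open transitions easy to test.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    open -> half-open after `reset_timeout` seconds; a successful trial call
    in half-open closes the circuit, and a failed one reopens it."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; downstream considered unhealthy")
            self.state = "half_open"  # let a trial request through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self.failures = 0  # success resets the consecutive-failure count
        self.state = "closed"
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```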
REGD-009 - Throttling to regulate traffic flow
Applies to: SaaS | Customer Deployed
Product must implement throttling. Throttling is a traffic management mechanism that limits the rate of requests to a service by controlling how many calls can be made within a specified time period, preventing system overload and ensuring stable performance.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
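Throttling as described in REGD-009 is often implemented with a token bucket. The Python sketch below is one such implementation under stated assumptions: the `TokenBucket` class, its parameters, and the single-threaded usage are hypothetical, and other algorithms (such as fixed or sliding windows) are equally valid.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests and a sustained rate of
    `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a gateway would typically answer with HTTP 429 here
```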
REGD-010 - Observability integration
Applies to: SaaS | Customer Deployed
Product must implement observability or integrate with observability solutions. Observability integration refers to the service mesh's built-in capability to automatically collect and expose data about service-to-service communications, making the entire system's behavior visible and analyzable.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
API Gateway
API Gateway technology serves as a crucial intermediary layer between clients and backend services, managing API requests and responses. In the context of designing resilient workloads, API Gateway plays a pivotal role in enhancing system reliability and scalability.
REGD-011 - Intelligent load balancing and traffic routing
Applies to: SaaS | Customer Deployed
Product must implement intelligent load balancing and traffic routing. An API gateway dynamically distributes network traffic across multiple service instances based on various factors like latency, resource utilization, and health status, while providing advanced routing capabilities such as circuit breaking, retries, and traffic splitting.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-012 - Circuit breaking to prevent cascading failures
Applies to: SaaS | Customer Deployed
Product must implement circuit breaking. Circuit breaking is a resiliency pattern that automatically stops forwarding requests to a failing or degraded service instance when certain thresholds (like error rates or latency) are exceeded, allowing the service to recover and preventing the failure from cascading to other parts of the system.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-013 - Throttling to regulate traffic flow
Applies to: SaaS | Customer Deployed
Product must implement throttling. Throttling is a traffic management mechanism that limits the rate of requests to a service by controlling how many calls can be made within a specified time period, preventing system overload and ensuring stable performance.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-014 - Observability integration
Applies to: SaaS | Customer Deployed
Product must implement observability or integrate with observability solutions. Observability integration refers to the API gateway's built-in capability to automatically collect and expose data about service communications, making the entire system's behavior visible and analyzable.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-015 - Request validation
Applies to: SaaS | Customer Deployed
Product must allow customers to filter out malformed requests before they reach backend services.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
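To illustrate REGD-015, the sketch below rejects requests that are oversized, are not valid JSON, or do not match a simple field schema, before they would be forwarded to a backend. The `ORDER_SCHEMA` mapping, the size limit, and the `validate_request` function are hypothetical; API gateways commonly express such rules declaratively instead (Amazon API Gateway, for example, validates request bodies against JSON Schema models).

```python
import json

# Hypothetical schema for one API route: field name -> required Python type.
ORDER_SCHEMA = {"order_id": str, "quantity": int}


def validate_request(body: bytes, schema=ORDER_SCHEMA, max_bytes=10_000):
    """Return a list of validation errors; an empty list means the request
    may be forwarded to the backend."""
    if len(body) > max_bytes:
        return ["payload too large"]
    try:
        data = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return ["body is not valid JSON"]
    if not isinstance(data, dict):
        return ["body must be a JSON object"]
    errors = []
    for name, expected in schema.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif type(data[name]) is not expected:  # exact match, so true/false is not an int
            errors.append(f"wrong type for {name}")
    return errors
```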
Software quality control
Software quality control encompasses a range of tools, methodologies, and practices designed to ensure software meets high standards of reliability, performance, and security.
REGD-016 - Software testing
Applies to: SaaS | Customer Deployed
Product must implement software testing. Software testing is a systematic process of evaluating a software application to detect defects, verify functionality, and ensure it meets specified requirements before deployment. Examples include unit, integration, and performance testing, but the scope is not limited to these types.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-017 - Continuous integration and deployment (CI/CD) integration
Applies to: SaaS | Customer Deployed
Product must integrate with CI/CD pipelines. Continuous integration and deployment (CI/CD) integration allows customers to catch issues early and enables rapid, reliable updates.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGD-018 - Code analysis
Applies to: SaaS | Customer Deployed
Product must allow static and dynamic code analysis. Code analysis tools help identify potential vulnerabilities, bugs, and performance bottlenecks.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
Operate Category Requirements
Instructions (PLEASE READ) This category contains three (3) use cases: Monitoring & Observability, Incident Management & Alerting, and Chaos Engineering & Fault Injection Experimentation. Please respond to ALL requirements of ONLY one (1) applicable use case.
This category encompasses the operational aspects of resilience. It provides the capabilities to detect, mitigate, and learn from failures in real-time through advanced monitoring, alerting, and collaboration tools. Additional details can be found on the designation site.
Monitoring and Observability
Monitoring and Observability technology provides comprehensive visibility into the health, performance, and behavior of systems running on AWS. This technology serves as an early warning system and diagnostic tool that helps customers to identify the overall health of their workloads.
REGO-001 - Real-time metrics, logs and signals
Applies to: SaaS | Customer Deployed
Product must implement real-time metrics, logs, and signals. Real-time metrics, logs, and signals are continuous streams of data that provide immediate insights into system performance, application behavior, and infrastructure health through numerical measurements (metrics), detailed event records (logs), and various system indicators (signals) collected and analyzed as they occur.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-002 - Real-time metrics, traces and logs
Applies to: SaaS | Customer Deployed
Product must implement real-time metrics, traces, and logs. Real-time metrics, traces, and logs are continuous streams of data that provide immediate insights into system performance, application behavior, and infrastructure health through numerical measurements (metrics), detailed event records (logs), and records of each request's path across distributed services (traces), collected and analyzed as they occur.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-003 - Anomaly detection
Applies to: SaaS | Customer Deployed
Product must implement anomaly detection. Anomaly detection is an automated process that uses statistics or machine learning algorithms to identify unusual patterns, behaviors, or data points that deviate significantly from what is considered normal system behavior.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
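As one example of the statistical approach in REGO-003, the following Python sketch flags values that lie more than a configurable number of standard deviations from a rolling mean (a z-score test). The `ZScoreDetector` class, window size, and threshold are hypothetical choices; products may instead use seasonal models or machine learning, which this sketch does not attempt to reproduce.

```python
import statistics
from collections import deque


class ZScoreDetector:
    """Flag a data point as anomalous when it lies more than `threshold`
    standard deviations from the mean of the most recent `window` points."""

    def __init__(self, window=30, threshold=3.0, min_points=5):
        self.history = deque(maxlen=window)  # oldest points drop off automatically
        self.threshold = threshold
        self.min_points = min_points  # no verdicts until a baseline exists

    def observe(self, value):
        is_anomaly = False
        if len(self.history) >= self.min_points:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                is_anomaly = True
        self.history.append(value)
        return is_anomaly
```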
REGO-004 - Integration with incident management and automation tools
Applies to: SaaS | Customer Deployed
Product must be able to integrate with incident management or process automation tools, allowing customers to build a streamlined process that improves incident detection and remediation times.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
Incident management and alerting
Incident management and alerting technology enables customers to quickly detect, respond to, and resolve issues that could impact their systems' availability and performance. This technology forms the backbone of an effective incident response strategy, serving as the first line of defense against potential disruptions.
REGO-005 - Incident correlation and aggregation
Applies to: SaaS | Customer Deployed
Product must demonstrate the ability to correlate and aggregate alerts across diverse systems to reduce noise, identify root causes faster, and enable a focused, efficient incident response.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
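The sketch below illustrates one simple form of the correlation described in REGO-005: alerts for the same service that arrive within a time window of each other are grouped into one incident. The `Alert` fields, the five-minute window, and grouping by service name are hypothetical assumptions; production tools may also correlate on topology, tags, or learned patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # the monitoring system that raised the alert
    service: str      # the affected service; used here as the correlation key
    timestamp: float  # seconds since the epoch


def correlate(alerts, window_seconds=300):
    """Group alerts into incidents: an alert joins its service's open group if it
    arrives within `window_seconds` of that group's latest alert."""
    incidents = []
    open_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_by_service.get(alert.service)
        if group and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            incidents.append(group)
            open_by_service[alert.service] = group
    return incidents
```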
REGO-006 - Post mortem and root cause analysis
Applies to: SaaS | Customer Deployed
Product must demonstrate functionality that allows customers to take a structured and blameless approach to post-mortems and root cause analysis, capturing lessons learned and implementing corrective actions to prevent recurrence and continuously improve system resilience.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-007 - Runbooks
Applies to: SaaS | Customer Deployed
Product must implement support for customizable and context-aware runbooks that can be triggered or linked directly from alerts, enabling rapid, consistent, and automated incident response.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-008 - Operational readiness checklists
Applies to: SaaS | Customer Deployed
Product must implement customizable operational readiness checklists, enabling teams to verify critical dependencies, validate configurations, and ensure systems are prepared for production or failover scenarios.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-009 - On-Call Scheduling Systems
Applies to: SaaS | Customer Deployed
Product must implement on-call scheduling that automatically routes alerts to the appropriate responders based on real-time duty rosters and escalation policies.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
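A minimal sketch of the rotation lookup behind REGO-009 follows, assuming a hypothetical fixed-length rotation with optional overrides (for example, to cover vacations). The `on_call` function, the start date, and the roster are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def on_call(rotation: list, rotation_start: datetime, shift_length: timedelta,
            at: datetime, overrides=None) -> str:
    """Return who is on call at time `at`.

    rotation: ordered list of responders who take turns.
    rotation_start: when rotation[0]'s first shift began (timezone-aware).
    overrides: optional (start, end, responder) tuples that take precedence.
    """
    for start, end, responder in overrides or []:
        if start <= at < end:
            return responder  # an explicit override wins over the rotation
    shifts_elapsed = (at - rotation_start) // shift_length  # timedelta // timedelta -> int
    return rotation[shifts_elapsed % len(rotation)]


if __name__ == "__main__":
    start = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)  # hypothetical start
    team = ["alice", "bob", "carol"]                          # hypothetical roster
    print(on_call(team, start, timedelta(weeks=1), start + timedelta(days=8)))  # bob
```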
REGO-010 - Smart routing and escalation
Applies to: SaaS | Customer Deployed
Product must implement customizable alert workflows that support dynamic routing rules, multi-channel notifications, and automated escalation policies based on severity, team availability, and acknowledgment status.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
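The following Python sketch shows how an escalation policy of the kind described in REGO-010 might select notifications based on severity, time without acknowledgment, and acknowledgment status. The `POLICIES` table, level timings, and channel names are hypothetical, and the sketch does not model team availability.

```python
from dataclasses import dataclass


@dataclass
class EscalationLevel:
    after_minutes: int  # this level is reached once an alert is unacknowledged this long
    targets: list       # responders or teams to notify
    channels: list      # notification channels for this level


# Hypothetical policy: higher severity escalates faster and uses louder channels.
POLICIES = {
    "sev1": [EscalationLevel(0, ["primary"], ["push", "phone"]),
             EscalationLevel(5, ["secondary", "team-lead"], ["phone"])],
    "sev3": [EscalationLevel(0, ["primary"], ["email"]),
             EscalationLevel(60, ["secondary"], ["push"])],
}


def notifications_due(severity, minutes_unacknowledged, acknowledged):
    """Return the (target, channel) pairs an alert should have reached so far.
    Levels are cumulative; an acknowledged alert escalates no further."""
    if acknowledged:
        return []  # acknowledgment stops escalation
    due = []
    for level in POLICIES[severity]:
        if minutes_unacknowledged >= level.after_minutes:
            due.extend((t, c) for t in level.targets for c in level.channels)
    return due
```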
Chaos engineering / Fault Injection Experimentation
Fault Injection Experimentation (Chaos Engineering) is a disciplined approach to identifying and addressing potential failures in systems by deliberately introducing controlled disruptions into the production environment. It's essentially the practice of intentionally injecting faults to test how systems respond under stress, helping customers build more resilient workloads.
REGO-011 - Systematic and controlled experiments
Applies to: SaaS | Customer Deployed
Product must implement systematic and controlled experiments that simulate real-world failures (for example, network latency, server crashes, and API failures).
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
REGO-012 - Emergency levers to interrupt experiments
Applies to: SaaS | Customer Deployed
Product must implement automated and manual emergency stop mechanisms, such as API-triggered aborts, UI-based kill switches, and guardrails, that allow customers to immediately halt fault injection experiments when predefined risk thresholds are breached.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
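REGO-012 combines a manual stop with an automated guardrail. The following minimal Python sketch shows both mechanisms checked on every monitoring sample, with rollback always performed. The `Experiment` class, the status strings, and the use of an error-rate threshold as the guardrail are hypothetical choices made for illustration. In a real system, the samples would come from a monitoring service, and `stop()` would be called from an API handler or a UI button.

```python
import threading
from typing import Callable, Iterable


class Experiment:
    def __init__(self, max_error_rate: float) -> None:
        self.max_error_rate = max_error_rate  # guardrail: abort above this observed error rate
        self._kill_switch = threading.Event()  # manual stop, safe to set from another thread
        self.status = "pending"

    def stop(self) -> None:
        """Manual emergency stop (for example, wired to an API call or UI kill switch)."""
        self._kill_switch.set()

    def run(
        self,
        error_rates: Iterable[float],
        inject: Callable[[], None],
        rollback: Callable[[], None],
    ) -> str:
        """Inject the fault, watch each error-rate sample, and always roll back."""
        try:
            inject()
            self.status = "running"
            for rate in error_rates:
                if self._kill_switch.is_set():
                    self.status = "stopped_manually"
                    break
                if rate > self.max_error_rate:
                    self.status = "stopped_by_guardrail"
                    break
            else:
                self.status = "completed"
        finally:
            rollback()  # remove the injected fault whether the run completed or was aborted
        return self.status
```

Putting `rollback()` in a `finally` block makes sure the injected fault is removed even if the loop or the injection step raises an exception.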
-
REGO-013 - Reporting and monitoring
Applies to: SaaS | Customer Deployed
Product must implement a reporting feature that captures key metrics, logs, and system behavior during fault injection experiments, enabling users to generate reports that measure experiment impact and validate recovery mechanisms.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
Recover Category Requirements
Instructions (PLEASE READ) This category contains (2) Use Cases: Disaster Recovery Solutions, Disaster Recovery Orchestrators. Please respond to ALL requirements of ONLY (1) applicable use case.
This category covers the technologies for recovering from major incidents, disasters, or catastrophic failures that may lead to extended downtime or data loss. Additional details can be found on the designation site.
Disaster recovery solutions
Disaster Recovery (DR) technology encompasses solutions designed to restore business operations after a significant disruption. It enables customers to maintain business continuity by replicating critical data, applications, and infrastructure to alternate locations.
-
REGR-001 - Failover and failback procedures
Applies to: SaaS | Customer Deployed
Product must implement failover and failback procedures that seamlessly redirect traffic and restore workloads between primary and secondary environments with minimal manual intervention.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
-
REGR-002 - Disaster recovery testing and drills
Applies to: SaaS | Customer Deployed
Product must implement automated and auditable disaster recovery testing capabilities that allow customers to regularly simulate failovers, validate recovery procedures, and ensure readiness without impacting production systems.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
-
REGR-003 - Close-to-zero or Close-to-minutes recovery point objectives (RPO), data consistency, and multiple recovery points
Applies to: SaaS | Customer Deployed
Product must implement continuous (synchronous or asynchronous), application-aware data replication with point-in-time snapshot capabilities and built-in consistency markers across regions or failover targets.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
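The following minimal Python sketch shows how multiple recovery points and consistency markers relate to the RPO in REGR-003. The sketch selects the latest application-consistent point taken before a failure and reports the resulting data-loss window. The `RecoveryPoint` record and its `consistent` flag are hypothetical stand-ins for a product's snapshot metadata and consistency markers.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Sequence


class RecoveryPoint(NamedTuple):
    taken_at: datetime
    consistent: bool  # assumed flag: True if the point carries an application-consistency marker


def select_recovery_point(points: Sequence[RecoveryPoint], failure_at: datetime) -> Optional[RecoveryPoint]:
    """Latest consistent recovery point taken at or before the failure, if any."""
    candidates = [p for p in points if p.consistent and p.taken_at <= failure_at]
    return max(candidates, key=lambda p: p.taken_at, default=None)


def data_loss_window(points: Sequence[RecoveryPoint], failure_at: datetime) -> Optional[timedelta]:
    """Effective RPO for this failure: time between the chosen point and the failure."""
    point = select_recovery_point(points, failure_at)
    return None if point is None else failure_at - point.taken_at
```

This sketch shows why consistency markers matter. If the most recent point is not consistent, recovery falls back to an older point, and the effective data loss window grows even though replication is continuous.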
Disaster recovery orchestrator
A Disaster Recovery Orchestrator is a solution that automates and coordinates the failover, recovery, and failback of applications, infrastructure, and data across environments to ensure minimal downtime and consistent recovery processes during a disruption.
-
REGR-004 - Disaster recovery orchestration
Applies to: SaaS | Customer Deployed
Product must implement a centralized orchestration feature that allows customers to configure, monitor, and automate recovery workflows across all environments from a single interface.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
-
REGR-005 - Disaster recovery testing and drills
Applies to: SaaS | Customer Deployed
Product must implement automated and auditable disaster recovery testing capabilities that allow customers to regularly simulate failovers, validate recovery procedures, and ensure readiness without impacting production systems.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
-
REGR-006 - Pre-Built and Customizable Runbooks
Applies to: SaaS | Customer Deployed
Product must implement a flexible orchestration engine that provides out-of-the-box recovery workflows while allowing users to define custom steps, scripts, and application-specific logic to meet unique recovery requirements.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
-
REGR-007 - Reporting, Compliance and Audit Logging
Applies to: SaaS | Customer Deployed
Product must implement detailed, immutable logging and reporting capabilities that track every recovery action, configuration change, and test execution, enabling traceability and regulatory compliance.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
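REGR-007 calls for audit logging that can be trusted. One common technique for making an audit log tamper-evident is a hash chain: each entry stores the SHA-256 hash of the previous entry, so altering any earlier record breaks verification. The following minimal Python sketch shows this technique. The entry fields and the `append_entry` and `verify` functions are hypothetical. A real product would also record timestamps and actor identities.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: List[Dict[str, Any]], action: str, detail: Dict[str, Any]) -> Dict[str, Any]:
    """Append a recovery action, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "action": action, "detail": detail, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails the check."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        body = {k: entry[k] for k in ("seq", "action", "detail", "prev_hash")}
        if entry["seq"] != i or entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

A hash chain only makes tampering detectable. To prevent modification outright, the log typically also needs to be written to write-once (WORM) storage.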
-
REGR-008 - Application-Aware Orchestration
Applies to: SaaS | Customer Deployed
Product must implement dependency mapping and coordinated recovery workflows that ensure applications and their underlying services are restored in the correct order with consistent state across all tiers.
Evidence: Product or service documentation with clear description of the technical solution that meets the above criteria.
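The dependency-ordered restoration described in REGR-008 is a topological sort of the application's dependency graph. The following minimal Python sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group services into recovery "waves". Each wave contains only services whose dependencies were restored in earlier waves, so the services within a wave can be restored in parallel. The service names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def recovery_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into restore waves; each maps to the set of services it depends on."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are already restored
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as restored, unlocking its dependents
    return waves
```

For example, if `web` depends on `api` and `api` depends on `db` and `cache`, the sketch restores `cache` and `db` first, then `api`, then `web`. A circular dependency is reported as an error instead of producing an arbitrary order.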
Common Competency Requirements
The following common requirements are applicable to all AWS Partners applying to the AWS Resilience Technology Competency.
Common Technical Requirements
The following requirements are applicable to all solutions in the Resilience Competency.
-
RECR-001 - AWS Resilience Competency fit
Applies to: SaaS | Customer Deployed
The partner must demonstrate how customers can utilize the partner’s solution to improve their resilience posture.
Please describe how your service or solution helps customers to improve their resilience posture by achieving one or more of the following indicators:
- Increased uptime and availability
- Reduced recovery point objectives (RPO)
- Reduced recovery time objectives (RTO)
- Decreased mean time to detect outages, impairments or incidents
- Decreased mean time to respond to outages, impairments or incidents
- Streamlined team communication during outages, impairments or incidents
- Resilience risk assessment, failure mode analysis and remediation
- Detection, prevention, and mitigation mechanisms related to outages, impairments or incidents
These indicators are not exhaustive; they illustrate how the partner solution can enhance the customer's resilience.
Please provide the following as evidence: 1) Partner website or landing page that contains a description of the related product and/or services, describing how it can help customers improve one or more of the indicators listed above.
-
RECR-002 - Improving resilience of workloads running on AWS
Applies to: SaaS | Customer Deployed
The AWS Partner solution must demonstrate its capabilities to enhance the resilience of customer workloads that are running on AWS. The partner service/solution must integrate with, extend, or provide additional resilience capabilities to AWS services.
Please describe your AWS services integration capability and include the following components:
- List of AWS services that are currently supported for integration
Please provide the following as evidence:
- Copy/Link to customer-facing manuals or documentation listing supported AWS services and integrations.
-
RECR-003 - The solution must demonstrate its resilience capabilities
Applies to: SaaS | Customer Deployed
The AWS Partner solution supporting the customer's resilience must demonstrate its own resilience. The partner's solution is a critical component of the customer's overall resilience strategy. If the partner's solution itself is not resilient, it could become a point of failure, undermining the very resilience it is meant to support. By ensuring the partner's solution is resilient, highly available, and scalable, the customer can have confidence that their critical workloads can rely on the partner solution even in the face of disruption.
Please describe the following about your service or solution:
- Expected availability metrics: Service level agreement (SLA) and other metrics that define the expected availability of the partner's solution (e.g. 99.9% availability).
- Health Dashboards: Real-time monitoring of the partner's solution components, with the capability to detect failures or performance issues quickly.
- For SaaS - Product or service health dashboard demonstrating that the related product or service constantly meets the defined Service Level Objectives / Service Level Agreements.
- For Customer Deployed - Documentation related to the operational procedures that customers must follow to monitor the resilience of the deployed solution.
- Disaster Recovery Capabilities: The partner's solution should have well-defined, measurable recovery objectives with realistic bounds, along with backup, replication, and recovery mechanisms that minimize downtime and data loss in the event of a major incident or disaster.
- For SaaS - Disaster recovery practice and the frequency at which drills are executed
- For Customer Deployed - Written documentation or process that customers must follow to implement disaster recovery for the related product or service
Please provide the following as evidence:
- Copy/Link to customer-facing documentation describing the above information.
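An availability SLA can be expressed as a downtime budget. As a worked example, assuming a 30-day measurement window (actual SLA windows and exclusions vary by provider), 99.9% availability allows 30 × 24 × 60 × 0.001 = 43.2 minutes of downtime. The function name and the default window in the following Python sketch are illustrative assumptions:

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Maximum downtime (in minutes) permitted by an availability target over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)
```

Each additional "nine" cuts the budget by a factor of ten. Over the same 30-day window, 99.99% availability allows about 4.32 minutes of downtime.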
-
RECR-004 - API, integration or automation capabilities
Applies to: SaaS | Customer Deployed
The AWS Partner solution must expose APIs (Application Programming Interface), integration, or automation capabilities that can be leveraged by customers to construct or integrate with other solutions, thereby allowing customers to establish and build a cohesive framework to improve their resilience posture.
Please describe how your service or product can be leveraged for automation or integration with other solutions, allowing customers to build a cohesive resilience framework.
Please provide the following as evidence:
- Copy/Link to customer-facing documentation of APIs, automation, or integration capabilities with other vendors that can help customers build frameworks to increase resilience.
Resources
- AWS Specialization Programs Guide
- Provides step-by-step instructions when applying for an AWS Specialization.
- AWS Partner Specialization Program Benefits Guide
- Provides a deeper description of the program benefits.
- AWS Competency Application Process
- Provides high-level visibility into the AWS Competency application process and timelines for associated process steps.
- How to build a microsite
- Provides guidance on how to build a microsite to highlight your AWS Specialization.
- How to build a public case study
- Provides guidance on how to build a public customer case study that will meet program requirements and showcase your success with AWS Customers.
- How to build an architecture diagram
- Provides guidance on how to build an architecture diagram that will meet program requirements.
- Well-Architected Website
- Learn about the Well-Architected Framework and its approach.
- SaaS Best Practices
- Provides best practices for SaaS.
- Change Log
- Provides the changes between previous and current versions.
- Deployment Pipeline Reference Architecture
- Learn about the stages and actions for different types of pipelines that exist in modern systems.