Member post by Msys Technologies

Abstract

So, everybody loves using the word “migration”. Migrate this, automate that. Just dump it out and automatically migrate it to that <insert-cool-new-cloud-acronym>. Is it that simple, though? Well, let’s find out in this white paper, which addresses the crucial task of data migration using Amazon Web Services (AWS) Database Migration Service (DMS) in conjunction with Infrastructure as Code (IAC) principles implemented through Terraform. 

The white paper outlines the significance of automating data migration procedures, explains the core features of AWS DMS and Terraform, and provides step-by-step insights into implementing an automated migration workflow. By focusing on best practices, testing methodologies, and real-world implementation scenarios, it aims to help organizations migrate their data to the AWS Cloud efficiently.

  1. Understanding Data Migration with AWS Database Migration Service (AWS DMS)

AWS Database Migration Service (AWS DMS) is a cloud service for migrating relational databases, NoSQL databases, and other types of data stores. At a basic level, AWS DMS is a server in the AWS Cloud that runs replication software. You create a source and a target connection to tell AWS DMS where to extract data from and where to load it. Next, you schedule a task on this server to move your data. The service also supports change data capture (CDC), where it replicates changes from source to target on an ongoing basis.

Diagram 1: AWS DMS Architectural Overview

In short, AWS DMS helps customers migrate databases to the AWS Cloud quickly and securely by replicating data from any supported source to any supported target.

1.1 Prerequisites For AWS DMS: Ensuring Smooth Transition

Data migration demands meticulous planning and preparation. Several prerequisites must be met to set the stage for a successful migration using AWS Database Migration Service (AWS DMS) in conjunction with Terraform Infrastructure as Code (IAC). These prerequisites act as foundational stepping stones, ensuring a seamless and secure transition to the cloud. 

Listed below are requirements organizations must fulfill before starting their data migration journey.

1.2 Use Cases of AWS DMS

AWS Database Migration Service (AWS DMS) covers diverse use cases ranging from like-to-like migrations to more intricate cross-platform transitions. We’ll now discuss these scenarios to uncover how AWS DMS empowers organizations to migrate data efficiently, enabling upgrades, technology shifts, and integration across diverse environments.

  I. Homogeneous Database Migration

Homogeneous database migration moves data between source and target databases that run the same or a compatible engine, such as Oracle to Amazon RDS for Oracle, MySQL to Amazon Aurora, MySQL to Amazon RDS for MySQL, or Microsoft SQL Server to Amazon RDS for SQL Server. It is a one-step process because the schema structure and data types of the source and target databases are consistent.

Diagram 1.2.I: Homogeneous Database Migration

  II. Heterogeneous Database Migration

Heterogeneous database migration moves data between source and target databases that run different engines, such as Oracle to Amazon Aurora, Oracle to PostgreSQL, or Microsoft SQL Server to MySQL. In such scenarios, the source schema and code must first be converted to match the target database. We used the AWS Schema Conversion Tool (desktop app) to transform the source schema and code, which makes this migration a two-step procedure.

Diagram 1.2.II: AWS DMS Schema Migration

Source schema and code conversion covers tables, views, stored procedures, functions, data types, synonyms, and so on. Any objects that DMS Schema Conversion can’t convert automatically are clearly marked so that they can be converted manually to complete the migration.

Diagram flow showing step 1 and step 2 of heterogeneous database migration
Diagram 1.2.II : Heterogeneous Database Migration

1.3 Components of Database Migration Service in AWS

Below are some of the components you need to be aware of before starting a migration with AWS DMS.

  I. Replication Instance

A replication instance is a managed Amazon Elastic Compute Cloud (EC2) instance that hosts replication tasks. It connects to the source data store, reads the source data, formats it for consumption by the target data store, and loads it into the target. Choosing the Multi-AZ option provides high availability and failover support.

  II. Source & Target Endpoints

AWS DMS uses endpoints to connect target and source databases. These endpoints provide connection, data store type, and location information about your data store. AWS DMS uses this information to connect to a data store and migrate data from a source endpoint to a target endpoint. 

Below are lists of supported endpoints for both source and target. 

  1. Source Endpoint

Oracle Database, Microsoft SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP Adaptive Server Enterprise (ASE), IBM DB2, Microsoft Azure SQL Database, Google Cloud for MySQL, Amazon RDS (Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and MariaDB), Amazon Aurora with MySQL compatibility, Amazon Aurora with PostgreSQL compatibility, Amazon S3, and Amazon DocumentDB.

  2. Target Endpoint

Oracle Database, Microsoft SQL Server, MySQL, MariaDB, PostgreSQL, SAP Adaptive Server Enterprise (ASE), Redis, Amazon RDS (Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and MariaDB), Amazon Aurora with MySQL compatibility, Amazon Aurora with PostgreSQL compatibility, Amazon Aurora Serverless v2, Amazon Redshift, Amazon Redshift Serverless, Amazon S3, Amazon DynamoDB, Amazon OpenSearch Service, Amazon ElastiCache for Redis, Amazon Kinesis Data Streams, Amazon DocumentDB with MongoDB compatibility, Amazon Neptune, Apache Kafka, and Babelfish for Aurora PostgreSQL.

III. Replication Tasks

The fundamental purpose of a replication task is to move data from a source endpoint to a target endpoint. Creating the task means specifying the tables (or views) and schemas to migrate, along with any special processing requirements such as logging settings, control table data, and error handling. A replication task must exist before the migration can start, and its definition includes the migration type, the source and target endpoints, and the replication instance that will run it.
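As a rough illustration of how the tables and schemas for a task might be specified, the Python sketch below builds a DMS table-mapping document with one selection rule per schema. The helper name `build_table_mappings` and the schema names are hypothetical and not taken from the pipeline described here; the rule fields follow the table-mapping JSON format as we understand it from the AWS DMS documentation.

```python
import json


def build_table_mappings(schemas):
    """Return a table-mapping JSON string that includes every table in each schema."""
    rules = []
    for index, schema in enumerate(schemas, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),              # rule ids must be unique within a task
            "rule-name": f"include-{schema}",   # hypothetical naming scheme
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules})


if __name__ == "__main__":
    print(build_table_mappings(["sales", "hr"]))
```

The resulting string is the kind of value a task definition would carry as its table mappings, whether the task is created through Terraform or through the API.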

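A replication task encompasses three core migration types: full-load (migrate existing data), full-load-and-cdc (migrate existing data and replicate ongoing changes), and cdc (replicate data changes only).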

These types give rise to three prominent phases: the full load of existing data, the application of cached changes, and ongoing replication.

This intricate orchestration illustrates how AWS DMS methodically guides the data migration journey through distinct phases, ensuring data integrity and consistency throughout the process.

IV. CloudWatch Events

The data migration process uses Amazon EventBridge (formerly Amazon CloudWatch Events) to deliver notifications about AWS DMS events, including replication task initiation and deletion and replication instance establishment and removal. EventBridge receives these events and routes them according to predefined event rules.
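To make the idea of an event rule concrete, here is a minimal Python sketch of an event pattern for DMS task events together with a deliberately simplified matcher. The pattern values (`aws.dms` and the detail-type string) reflect our understanding of how DMS events appear in EventBridge and should be treated as assumptions; the `matches` helper is hypothetical and covers only top-level string fields, whereas real EventBridge matching is richer.

```python
# Assumed event pattern: task state-change events emitted by AWS DMS.
DMS_TASK_PATTERN = {
    "source": ["aws.dms"],
    "detail-type": ["DMS Replication Task State Change"],
}


def matches(pattern, event):
    """Simplified matcher: each pattern key must hold one of the listed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


if __name__ == "__main__":
    sample = {"source": "aws.dms", "detail-type": "DMS Replication Task State Change"}
    print(matches(DMS_TASK_PATTERN, sample))
```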

V. Lambda Function

We’ve implemented an AWS Lambda function to initiate replication tasks. Whenever an event signaling task creation transpires within AWS DMS, the Lambda function is automatically activated through EventBridge rules that have been meticulously configured.
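The following Python sketch shows one way such a Lambda function could look. It is not the function used in this project. It assumes that the task ARN arrives in the event's `resources` list and that a task should be started only when DMS reports the `ready` status. The helper name `start_ready_task` is hypothetical; `describe_replication_tasks` and `start_replication_task` are boto3 DMS client calls.

```python
def start_ready_task(dms_client, task_arn):
    """Start the task if DMS reports it as ready; return True when a start was requested."""
    response = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    tasks = response.get("ReplicationTasks", [])
    if not tasks or tasks[0].get("Status") != "ready":
        return False
    dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )
    return True


def lambda_handler(event, context):
    """Lambda entry point; assumes task ARNs are listed under the event's 'resources' key."""
    import boto3  # imported lazily so the helper above can be exercised without AWS access

    dms_client = boto3.client("dms")
    started = [arn for arn in event.get("resources", []) if start_ready_task(dms_client, arn)]
    return {"started": started}
```

Passing the client into the helper keeps the start logic testable with a stub object, which is useful when the pipeline itself is under test.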

VI. Resource Limits

In managing the AWS Database Migration Service, we adhere to default resource quotas that serve as soft limits. However, we have the flexibility to elevate these limits to ensure optimal performance according to the specific demands of our migration process, utilizing AWS support tickets for assistance. 

There are several key resource limits for AWS DMS, a selection of which is outlined below:

For illustrative purposes, let’s consider a scenario where we need to migrate 100 databases from an On-Prem MySQL source to RDS MySQL. Using the provided limits, we can calculate the migration process as follows:

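Each database requires two endpoints (one source and one target), and the calculation takes the limit of endpoints per replication instance as 100. Hence, Total Tasks per Replication Instance = Endpoints per Replication Instance / Endpoints per Database = 100/2 = 50.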

This calculation indicates that we can migrate up to 50 databases per replication instance. By employing two replication instances, we can accomplish the migration of the entire batch in a single endeavor. This exemplifies the strategic utilization of our resource quotas for efficient and effective database migration.
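The same arithmetic can be written as a small Python helper. The constants mirror the figures used in the calculation above (100 endpoints per replication instance, two endpoints per database); the function name `plan_instances` is a hypothetical choice for illustration.

```python
import math

ENDPOINTS_PER_INSTANCE = 100  # limit used in the calculation above
ENDPOINTS_PER_DATABASE = 2    # one source endpoint plus one target endpoint


def plan_instances(database_count):
    """Return (databases per replication instance, replication instances needed)."""
    per_instance = ENDPOINTS_PER_INSTANCE // ENDPOINTS_PER_DATABASE
    return per_instance, math.ceil(database_count / per_instance)


if __name__ == "__main__":
    print(plan_instances(100))
```

Rounding up matters as soon as the database count is not a multiple of 50: 101 databases would need a third replication instance.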

2.0 Automating Data Migration with Terraform IAC: Process Overview

In the world of data migration, combining Terraform or Terragrunt with AWS Database Migration Service (DMS) brings automation and security to the forefront. This combination simplifies data migration while also managing the creation and removal of AWS infrastructure with strong security measures. 

Let’s explore the step-by-step process that underpins this seamless and secure data migration:

Step 1: Fetching Migration Database List: The journey commences with the retrieval of a migration database list, setting the stage for the subsequent actions.

Step 2: Target Schema/Database Creation (Homogeneous Migration): For homogeneous migrations, the process involves the creation of target schema or database structures, laying a robust foundation for data transition.

Step 3: Replication Subnet Group Creation: Enabling seamless network communication, the creation of replication subnet groups forms a pivotal step to facilitate data movement.

Step 4: Source/Target Connection Endpoints: Every database designated for migration is equipped with source and target connection endpoints, fortifying the architecture for swift and secure data transfer.

Step 5: Replication Instance Creation: To execute the actual data migration, the process entails the creation of replication instances, poised to handle the intricate data transition.

Step 6: CloudWatch Event and Lambda Integration: A CloudWatch event coupled with a Lambda function comes into play, orchestrated to trigger the initiation of replication tasks, a foundational step in the migration process.

Step 7: Replication Task Creation and Assignment: Replication tasks, tailored to each database, are meticulously created and assigned to designated replication instances, harmonizing the migration landscape.

Step 8: Migration Task Initiation: As the culmination of this comprehensive process, migration tasks are set in motion for each database, marking the commencement of the data migration journey.

2.1 Data Migration Automation: Understanding AWS DMS Pipeline with Terraform IAC Architecture

AWS Database Migration Service (DMS) with Terraform Infrastructure as Code (IAC) not only streamlines the migration journey but also enhances it with automation and efficiency. This section delves into the intricacies of the AWS DMS architecture orchestrated through Terraform IAC, outlining each pivotal step that culminates in a seamless and secure data transition.

AWS DMS Architecture with Terraform IAC

The architecture for data migration automation is initiated through the dynamic framework of Jenkins pipelines. This framework employs an array of input parameters to tailor the migration process, providing flexibility and customization.

Here’s an overview of the comprehensive architecture:

Diagram 2.1: AWS DMS Architecture with Terraform IAC

Step 1: Jenkins Pipeline Parameters

The Jenkins pipeline for AWS DMS begins by setting essential input parameters, ranging from region and environment details to Terragrunt module specifics and migration preferences. 

Below are some of the input parameters that form the bedrock of a customizable and controlled migration process.

Step 2: Execution Stages

Based on the input parameters, the pipeline advances through distinct execution stages:

Source Code Checkout for IAC: The pipeline starts by checking out the source code for Infrastructure as Code (IAC), ensuring a coherent foundation for the subsequent steps.

Migration Database List: Depending on the migration type selected, the pipeline either fetches the migration database list automatically from the source instance or utilizes a manually provided list.

Schema/Database Creation: The target instance is prepared by creating the requisite schema or database structures for data migration.

Terraform/Terragrunt Execution: The AWS DMS migration journey is facilitated through the execution of Terraform or Terragrunt modules, orchestrating the actual data migration.

Notifications: Notifications are dispatched via MS Teams or email, ensuring transparent communication throughout the migration process.

Step 3: Automatic and Manual List Fetching

The migration database list can be fetched automatically from the source instance using a shell script if FETCH_DBLIST is set to automatic. Alternatively, for manual control, users can provide a selective list of databases to migrate.
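The selection logic can be sketched in Python as follows. The pipeline itself uses a shell script, so this is only a stand-in. It assumes a MySQL source, that system schemas are excluded in automatic mode, and that any mode other than `automatic` means a user-supplied list; the function name `migration_db_list` is hypothetical.

```python
# Schemas that ship with MySQL and normally should not be migrated.
MYSQL_SYSTEM_SCHEMAS = {"information_schema", "mysql", "performance_schema", "sys"}


def migration_db_list(fetch_mode, source_databases, manual_list=()):
    """Return the sorted list of databases to migrate for the given FETCH_DBLIST mode."""
    if fetch_mode == "automatic":
        return sorted(db for db in source_databases if db.lower() not in MYSQL_SYSTEM_SCHEMAS)
    # Manual mode: keep only the requested names that actually exist on the source.
    available = set(source_databases)
    return sorted(db for db in manual_list if db in available)


if __name__ == "__main__":
    print(migration_db_list("automatic", ["sales", "mysql", "hr", "sys"]))
```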

Step 4: Migration Types

Depending on the migration type specified in the MIGRATION_TYPE parameter, the Terraform/Terragrunt module initiates full-load, full-load-and-cdc, or cdc migrations.

Step 5: Automation Control

Through the START_TASKS parameter, migration tasks can be configured to start automatically or manually, providing control over the migration process.
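A pipeline would typically validate these parameters before running any Terraform. The Python sketch below shows one possible check. The parameter names come from the text, while the class name `PipelineParams` and the literal values `automatic` and `manual` for FETCH_DBLIST and START_TASKS are assumptions made for illustration.

```python
from dataclasses import dataclass

MIGRATION_TYPES = ("full-load", "full-load-and-cdc", "cdc")


@dataclass(frozen=True)
class PipelineParams:
    fetch_dblist: str    # FETCH_DBLIST: "automatic" or "manual" (values assumed)
    migration_type: str  # MIGRATION_TYPE: one of MIGRATION_TYPES
    start_tasks: str     # START_TASKS: "automatic" or "manual" (values assumed)

    def __post_init__(self):
        # Fail fast on bad input so no infrastructure is created for an invalid run.
        if self.migration_type not in MIGRATION_TYPES:
            raise ValueError(f"unsupported MIGRATION_TYPE: {self.migration_type}")
        for name in ("fetch_dblist", "start_tasks"):
            if getattr(self, name) not in ("automatic", "manual"):
                raise ValueError(f"{name} must be 'automatic' or 'manual'")


if __name__ == "__main__":
    print(PipelineParams("automatic", "full-load-and-cdc", "manual"))
```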

Step 6: Credentials Management

During the execution of DMS Terraform/Terragrunt modules, source and target instance database credentials are retrieved from AWS Secrets Manager, ensuring security and privacy.
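For readers who want to see what a credential lookup involves, here is a minimal Python sketch. In the pipeline the lookup happens inside the Terraform/Terragrunt modules, so this only illustrates the idea. It assumes the secret is stored as a JSON string with `username` and `password` keys, which may not match your secret layout; the helper name `load_db_credentials` is hypothetical, and `get_secret_value` is the boto3 Secrets Manager client call.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a secret and return (username, password); the JSON key names are assumed."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

With boto3 installed, the client would be created with `boto3.client("secretsmanager")`. Keeping credentials out of pipeline parameters and state files is the point of the indirection.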

Step 7: Endpoint Creation

AWS DMS endpoints are established for both source and target instances, facilitating seamless connection and data transfer for each database.

Step 8: Replication Instances

Replication instances are created based on database count or quota limits, enhancing the efficiency of data replication.

Step 9: CloudWatch Integration

AWS CloudWatch event configuration triggers a Lambda function upon the creation of AWS DMS replication tasks.

Step 10: Replication Task Creation

Replication tasks are created for individual databases and assigned to available replication instances, optimizing data transfer.
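One simple way to assign databases to replication instances is to fill each instance up to its per-instance capacity in order. The Python sketch below illustrates that approach under the assumption of 50 databases per instance; the function name `assign_databases` and the instance names are hypothetical, and the actual module may distribute tasks differently.

```python
def assign_databases(databases, instance_names, per_instance=50):
    """Map each replication instance name to the slice of databases it will replicate."""
    if len(databases) > per_instance * len(instance_names):
        raise ValueError("not enough replication instances for the database list")
    return {
        name: databases[i * per_instance:(i + 1) * per_instance]
        for i, name in enumerate(instance_names)
    }


if __name__ == "__main__":
    plan = assign_databases([f"db{i}" for i in range(100)], ["dms-1", "dms-2"])
    print({name: len(dbs) for name, dbs in plan.items()})
```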

Step 11: Task Automation

Replication tasks automatically commence using the Lambda function when they are in the Ready State.

Step 12: Monitoring Migration

The AWS DMS Console offers real-time monitoring of data migration progress, providing insights into the migration journey.

Step 13: Ongoing Changes

After full migration, ongoing changes are seamlessly replicated into the target instance, ensuring data consistency.

Step 14: Automated Validation

Migrated data is automatically validated against source and target instances based on provided validation configurations, reinforcing data integrity.

Step 15: Completion and Configuration

Post-validation, it is imperative to ensure user migration and database configurations are completed.

Step 16: Target Testing and Validation

Once all is set, the application configuration is updated to utilize the target instance for testing, ensuring functionality.

Step 17: Cutover Replication

After thorough testing, the cutover replication from the source instance is executed. A final snapshot of the source instance is taken, concluding the process.

AWS DMS architecture orchestrated through Terraform IAC epitomizes the integration of automation, precision, and security, guiding a seamless data migration journey from inception to culmination.

3. Managing Key Challenges: Data Validation And Agility & Cost Optimization

In the realm of data migration, upholding data integrity and optimizing costs stand as critical benchmarks. Fortunately, AWS DMS offers a strategic approach that aligns precisely with these imperatives.

I. AWS DMS Support For Data Validation 

AWS DMS offers robust data validation support to ensure the accuracy of migrated data. Validation begins immediately after the full load of a table and extends to the incremental changes in CDC-enabled tasks. During validation, AWS DMS compares each source row with its corresponding target row, verifies data consistency, and flags any mismatches. For CDC-only tasks, pre-existing table data is validated before validation of new data begins.
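Validation is switched on through the task settings. As a small sketch, the Python helper below produces a task-settings fragment with validation enabled. The `ValidationSettings` keys follow the DMS task-settings format as we understand it; the helper name `task_settings_with_validation` and the chosen thread count are illustrative assumptions.

```python
import json


def task_settings_with_validation(thread_count=5):
    """Return a task-settings JSON string that turns on DMS data validation."""
    return json.dumps({
        "ValidationSettings": {
            "EnableValidation": True,
            "ThreadCount": thread_count,  # number of validation threads
        }
    })


if __name__ == "__main__":
    print(task_settings_with_validation())
```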

Supported Source Endpoints

Oracle, PostgreSQL-compatible database (PostgreSQL, Aurora PostgreSQL, or Aurora Serverless for PostgreSQL), MySQL-compatible database (MySQL, MariaDB, Aurora MySQL, or Aurora Serverless for MySQL), Microsoft SQL Server, and IBM DB2. 

Table Statistics

When data validation is enabled, AWS DMS provides comprehensive table-level statistics. These valuable insights can be accessed through the console, AWS CLI, or the AWS DMS API. For those utilizing CloudWatch, task monitoring incorporating table statistical data is achievable through available metrics.
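As an example of using these statistics, the Python sketch below picks out tables with validation mismatches from a list of table-statistics records. It assumes records shaped like the `TableStatistics` entries returned by the DMS `describe_table_statistics` API call, with `SchemaName`, `TableName`, and `ValidationFailedRecords` fields; the function name `tables_with_mismatches` is hypothetical.

```python
def tables_with_mismatches(table_statistics):
    """Return {'schema.table': failed_record_count} for tables that failed validation."""
    return {
        f"{row['SchemaName']}.{row['TableName']}": row["ValidationFailedRecords"]
        for row in table_statistics
        if row.get("ValidationFailedRecords", 0) > 0
    }


if __name__ == "__main__":
    stats = [
        {"SchemaName": "sales", "TableName": "orders", "ValidationFailedRecords": 3},
        {"SchemaName": "sales", "TableName": "items", "ValidationFailedRecords": 0},
    ]
    print(tables_with_mismatches(stats))
```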

II. Serverless Replication For Operational Agility And Cost Optimization

AWS DMS Serverless stands as a remarkable feature designed to enhance operational agility and optimize cost-effectiveness. By offering automatic provisioning, dynamic scaling, inherent high availability, and a pay-as-you-use billing model, it revolutionizes the landscape of data migration.

Similar to the functionality of the current AWS DMS (referred to in this document as AWS DMS Standard), AWS DMS Serverless allows you to establish source and target connections using endpoints.

Once these connections are established, the subsequent step involves the creation of a replication configuration. This encompasses the configuration settings tailored to the specific replication task. Management of replications is intuitive, encompassing the ability to initiate, halt, modify, or delete them. Each replication can be finely tuned, ensuring alignment with the requirements of your database migration endeavor.
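To give a feel for what a replication configuration contains, the Python sketch below assembles the arguments for one. The field names follow the DMS `CreateReplicationConfig` API as we understand it and should be checked against the current API reference; the helper name `serverless_replication_config`, the default capacity values, and the ARNs in the usage example are hypothetical.

```python
import json


def serverless_replication_config(identifier, source_arn, target_arn, table_mappings,
                                  replication_type="full-load-and-cdc",
                                  min_dcu=1, max_dcu=16):
    """Build keyword arguments for a DMS Serverless replication configuration."""
    if replication_type not in ("full-load", "cdc", "full-load-and-cdc"):
        raise ValueError(f"unsupported replication type: {replication_type}")
    if min_dcu > max_dcu:
        raise ValueError("min_dcu must not exceed max_dcu")
    return {
        "ReplicationConfigIdentifier": identifier,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationType": replication_type,
        "TableMappings": json.dumps(table_mappings),
        # Capacity bounds within which the service scales the replication.
        "ComputeConfig": {"MinCapacityUnits": min_dcu, "MaxCapacityUnits": max_dcu},
    }


if __name__ == "__main__":
    config = serverless_replication_config(
        "demo-config", "arn:source-example", "arn:target-example", {"rules": []}
    )
    print(config["ComputeConfig"])
```

The capacity bounds are the main difference from a standard task definition: instead of choosing an instance class up front, you state the range within which the service may scale.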

Diagram 3.II: AWS DMS Serverless Replication

4. Key Features and Benefits of AWS DMS with Terraform

This section unveils the remarkable features that arise from the fusion of AWS DMS and Terraform IAC, revolutionizing data migration by encapsulating benefits that span cost-efficiency, ease of use, minimized downtime, and robust replication. 

Let’s delve into these transformative attributes that amplify the data migration experience.

  1. Cost Optimization

Migrating data shouldn’t break the bank. AWS DMS Migration offers a low or optimal cost model. Pay solely for the utilized compute resources during migration and any additional log storage, ensuring a cost-effective transition.

  2. Ease of Use

The migration process is streamlined with simplicity. No specific driver or application installation is required, and often no changes to the source database are needed. One-click resource creation powers the entire migration journey, enhancing user-friendliness.

  3. Continuous Replication and Minimal Downtime

AWS DMS ensures continuous replication of your source database, even while it’s operational. This feature facilitates minimal downtime and empowers seamless database switching at your convenience.

  4. Reliability and High Availability

Multi-AZ capability brings high availability to database migration, with redundant replication instances supporting continuous data replication. In this pipeline, replication instances are provisioned according to database count or quota limits, which adds to reliability.

  5. Ongoing Replication

Use ongoing replication tasks to keep the source and target databases synchronized, ensuring data consistency over time.

  6. Diverse Source/Target Support

AWS DMS supports migrations ranging from like-to-like, such as MySQL to MySQL, to heterogeneous migrations across various platforms like Oracle to Amazon Aurora. It accommodates transfers between SQL, NoSQL, and text-based targets.

  7. Database Consolidation

Unify multiple source databases into a single target database effortlessly. This feature applies to both homogeneous and heterogeneous migrations across all supported database engines.

  8. Efficiency in Schema Conversion and Migration

Beyond data migration, AWS DMS requires minimal manual effort for tasks such as migrating users, stored procedures, triggers, schema conversion in heterogeneous migrations, and validating the target database against application functionality.

  9. Automated Provisioning with Terraform IAC

Harness the potential of Terraform/Terragrunt for automated creation and destruction of AWS DMS replication tasks. Ideal for migrations involving multiple databases, this approach simplifies management and enhances scalability.

  10. Automated Pipeline Integration

Seamlessly integrate with CI/CD pipelines for comprehensive migration management across stages. This integration offers efficient monitoring and progress tracking, streamlining the migration process.

Incorporating the fusion of AWS DMS and Terraform IAC, these features redefine the data migration landscape, imparting agility, cost-effectiveness, and robustness to the journey of data transition.

5. Conclusion

The combination of AWS Database Migration Service (AWS DMS) and Terraform Infrastructure as Code (IAC) has emerged as a game-changing force for data migration.

This white paper serves as a guiding compass to that synergy. From understanding the core concepts to navigating complexities, these insights equip you with the tools to embark on migration journeys that are streamlined, secure, and poised for success.

As businesses navigate digital transformation, this collaboration promises a future where data migration becomes an avenue for optimization and growth, reshaping the data landscape with precision and efficiency.
