Member post by Msys Technologies
So, everybody loves using the word “migration”. Migrate this, automate that. Just dump it out and automatically migrate it to that <insert-cool-new-cloud-acronym>. Is it that simple, though? Well, let’s find out in this white paper, which addresses the crucial task of data migration using Amazon Web Services (AWS) Database Migration Service (DMS) in conjunction with Infrastructure as Code (IAC) principles implemented through Terraform.
The white paper outlines the significance of automating data migration procedures, explains the core features of AWS DMS and Terraform, and provides step-by-step insights into implementing an automated migration workflow. It helps organizations migrate their data to the AWS Cloud efficiently by focusing on best practices, testing methodologies, and real-world implementation scenarios.
1. Understanding Data Migration with AWS Database Migration Service (AWS DMS)
AWS Database Migration Service (AWS DMS) is a cloud service for migrating relational databases, NoSQL databases, and other types of data stores. At a basic level, AWS DMS is a server in the AWS Cloud that runs replication software. You create a source and target connection to tell AWS DMS where to extract data from and where to load it. Next, you schedule a task on this server to move your data. The service also supports change data capture (CDC), where it replicates changes from source to target on an ongoing basis.
In short, AWS DMS helps customers migrate databases to the AWS Cloud quickly and securely by replicating data from any supported source to any supported target.
1.1 Prerequisites For AWS DMS: Ensuring Smooth Transition
Data migration demands meticulous planning and preparation. Several prerequisites must be met to set the stage for a successful migration using AWS Database Migration Service (AWS DMS) in conjunction with Terraform Infrastructure as Code (IAC). These prerequisites act as foundational stepping stones, ensuring a seamless and secure transition to the cloud.
Listed below are requirements organizations must fulfill before starting their data migration journey.
- Access to Source or Target Endpoints through Firewall and Security Groups.
- Source Endpoint Connection.
- Target Endpoint Connection.
- Replication Instance.
- Target Schema or Database.
- CloudWatch event to trigger the Lambda function.
- Lambda function to start the replication task.
- Resource Limit Increase.
1.2 Use Cases of AWS DMS
AWS Database Migration Service (AWS DMS) covers diverse use cases ranging from like-to-like migrations to more intricate cross-platform transitions. We’ll now discuss these scenarios to uncover how AWS DMS empowers organizations to migrate data efficiently, enabling upgrades, technology shifts, and integration across diverse environments.
- Homogeneous Database Migration
Homogeneous database migration involves data migration between identical or consistent target and source databases, such as Oracle to Amazon RDS for Oracle, MySQL to Amazon Aurora, MySQL to Amazon RDS for MySQL, or Microsoft SQL Server to Amazon RDS for SQL Server. It is a one-step method since the target and source databases’ schema structure and data types are consistent.
- Heterogeneous Database Migration
Heterogeneous database migration involves data migration between source and target databases that are not identical, such as Oracle to Amazon Aurora, Oracle to PostgreSQL, or Microsoft SQL Server to MySQL. In such scenarios, the source schema and code must first be converted to match the target database, because schema structures and data types differ. We used the AWS Schema Conversion Tool (desktop application) to transform the source schema and code, making this migration a two-step procedure.
Source schema and code conversion covers tables, views, stored procedures, functions, data types, synonyms, and so on. Any objects that the conversion tool can’t convert automatically are clearly marked so that they can be converted manually to complete the migration.
1.3 Components of Database Migration Service in AWS
Below are the components you need to be aware of before starting a migration with AWS DMS.
I. Replication Instance
A replication instance is a managed Amazon Elastic Compute Cloud (EC2) instance that hosts one or more replication tasks. It connects to the source data store, reads the source data, formats it for consumption by the target data store, and loads it into the target. Choosing the Multi-AZ option provides high availability and failover support.
II. Source & Target Endpoints
AWS DMS uses endpoints to connect target and source databases. These endpoints provide connection, data store type, and location information about your data store. AWS DMS uses this information to connect to a data store and migrate data from a source endpoint to a target endpoint.
Below are lists of supported endpoints for both source and target.
- Source Endpoint
Oracle Database, Microsoft SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP Adaptive Server Enterprise (ASE), IBM DB2, Microsoft Azure SQL Database, Google Cloud for MySQL, Amazon RDS for Oracle, Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, Amazon Aurora with MySQL compatibility, Amazon Aurora with PostgreSQL compatibility, Amazon S3, and Amazon DocumentDB.
- Target Endpoint
Oracle Database, Microsoft SQL Server, MySQL, MariaDB, PostgreSQL, SAP Adaptive Server Enterprise (ASE), Redis, Amazon RDS for Oracle, Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, Amazon Aurora with MySQL compatibility, Amazon Aurora with PostgreSQL compatibility, Amazon Aurora Serverless v2, Amazon Redshift, Amazon Redshift Serverless, Amazon S3, Amazon DynamoDB, Amazon OpenSearch Service, Amazon ElastiCache for Redis, Amazon Kinesis Data Streams, Amazon DocumentDB with MongoDB compatibility, Amazon Neptune, Apache Kafka, and Babelfish for Aurora PostgreSQL.
III. Replication Tasks
The purpose of a replication task is to move data from a source endpoint to a target endpoint. Creating one means specifying the tables (or views) and schemas to migrate, along with any special processing requirements such as logging, control table data, and error handling. The replication task must exist before the migration can start. Creating it also involves defining the migration type, the source and target endpoints, and the replication instance that will run it.
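The table and schema selection described above is passed to AWS DMS as a table-mapping JSON document made of selection rules. The following is a minimal sketch of building one in Python. The helper name `build_table_mappings` and the schema and table names are hypothetical and used only for illustration.

```python
import json


def build_table_mappings(schema: str, tables: list[str]) -> str:
    """Return a DMS table-mapping document that includes the given tables."""
    rules = []
    for index, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),
            "rule-name": f"include-{index}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)


# Hypothetical schema name; "%" is the wildcard that selects every table.
print(build_table_mappings("sales", ["%"]))
```

Generating the document from a list keeps one rule per table, which makes it easy to review exactly what a task will migrate.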
A replication task supports three core migration types, plus a validation-only mode:
- Full Load: This entails migrating existing data exclusively.
- Full Load with CDC (Change Data Capture): Here, existing data migration is coupled with the ongoing replication of changes.
- CDC Only (Change Data Capture): Solely replicating changes in data.
- Validation Only: Concentrating solely on data validation.
These types give rise to three prominent phases:
- Migration of Existing Data (Full Load): During this phase, AWS DMS copies data from the tables in the source data store to the corresponding tables in the target data store.
- Cached Changes Application: While a full load is underway, changes made to the loading tables are cached on the replication server. Once the full load for a particular table concludes, AWS DMS promptly proceeds to apply the cached changes pertaining to that table.
- Ongoing Replication (Change Data Capture): At the outset of this phase, a backlog of transactions typically causes a delay between the source and target databases. Over time, as the migration processes through this transaction backlog, a point of equilibrium is achieved, leading to a harmonious and steady migration flow.
This intricate orchestration illustrates how AWS DMS methodically guides the data migration journey through distinct phases, ensuring data integrity and consistency throughout the process.
IV. CloudWatch Events
The data migration process uses Amazon EventBridge (formerly Amazon CloudWatch Events) to deliver notifications about AWS DMS events, including replication task initiation and deletion and replication instance creation and removal. EventBridge receives these events and routes notifications according to predefined event rules.
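An event rule is defined by an event pattern. The sketch below shows what a pattern for DMS task events might look like, together with a deliberately simplified local matcher. The `detail-type` string and the sample event are assumptions to verify against the current AWS documentation, and real EventBridge matching supports nested keys and operators that this sketch ignores.

```python
import json

# Event pattern for a rule that listens for DMS replication task events.
# Assumption: the detail-type string below matches what AWS DMS emits.
DMS_TASK_PATTERN = {
    "source": ["aws.dms"],
    "detail-type": ["DMS Replication Task State Change"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Hypothetical, trimmed-down event used for illustration only.
sample_event = {
    "source": "aws.dms",
    "detail-type": "DMS Replication Task State Change",
    "resources": ["arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"],
}

print(json.dumps(DMS_TASK_PATTERN))
print(matches(DMS_TASK_PATTERN, sample_event))
```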
V. Lambda Function
We’ve implemented an AWS Lambda function to initiate replication tasks. Whenever an event signaling task creation transpires within AWS DMS, the Lambda function is automatically activated through EventBridge rules that have been meticulously configured.
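A handler for this purpose could look roughly like the sketch below. It is not the authors’ function. It assumes the task ARN arrives as the first entry of the event’s `resources` list, and it accepts an optional `dms_client` argument (a hypothetical addition) so the logic can be exercised without AWS access. The two client calls, `describe_replication_tasks` and `start_replication_task`, are standard boto3 DMS operations.

```python
def lambda_handler(event, context, dms_client=None):
    """Start the replication task named in the event once it is ready."""
    if dms_client is None:
        import boto3  # provided by the AWS Lambda Python runtime
        dms_client = boto3.client("dms")

    # Assumption: the task ARN is the first entry in the event's "resources".
    task_arn = event["resources"][0]

    response = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    status = response["ReplicationTasks"][0]["Status"]
    if status != "ready":
        # Leave the task alone; a later event can trigger another attempt.
        return {"task": task_arn, "started": False, "status": status}

    dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="start-replication",
    )
    return {"task": task_arn, "started": True, "status": status}
```

Checking the status first avoids calling the start operation on a task that is still being created.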
VI. Resource Limits
In managing the AWS Database Migration Service, we adhere to default resource quotas that serve as soft limits. However, we have the flexibility to elevate these limits to ensure optimal performance according to the specific demands of our migration process, utilizing AWS support tickets for assistance.
There are several key resource limits for AWS DMS, a selection of which is outlined below:
- Endpoints per user account (default: 1000)
- Endpoints per replication instance (default: 100)
- Tasks per user account (default: 600)
- Tasks per replication instance (default: 200)
- Replication instances per user account (default: 60)
For illustrative purposes, let’s consider a scenario where we need to migrate 100 databases from an On-Prem MySQL source to RDS MySQL. Using the provided limits, we can calculate the migration process as follows:
- Tasks per Database = 1
- Endpoints per Database = 2
- Endpoints per Replication Instance = 100
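Hence, Databases per Replication Instance = Endpoints per Replication Instance / Endpoints per Database = 100 / 2 = 50. With one task per database, that is 50 tasks per instance, well within the default limit of 200 tasks per replication instance.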
This calculation indicates that we can migrate up to 50 databases per replication instance. By employing two replication instances, we can accomplish the migration of the entire batch in a single endeavor. This exemplifies the strategic utilization of our resource quotas for efficient and effective database migration.
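The same arithmetic can be written as a small function, which is handy when the database count or the quotas change. This is a sketch using the default limits quoted above; the function name `instances_needed` is illustrative.

```python
import math

# Default soft limits quoted in this section.
ENDPOINTS_PER_INSTANCE = 100
TASKS_PER_INSTANCE = 200


def instances_needed(databases: int, endpoints_per_db: int = 2,
                     tasks_per_db: int = 1) -> int:
    """Return how many replication instances a batch of databases requires."""
    # The tighter of the two per-instance limits decides the capacity.
    per_instance = min(ENDPOINTS_PER_INSTANCE // endpoints_per_db,
                       TASKS_PER_INSTANCE // tasks_per_db)
    return math.ceil(databases / per_instance)


print(instances_needed(100))  # 2
```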
2.0 Automating Data Migration with Terraform IAC: Process Overview
In the world of data migration, combining Terraform or Terragrunt with AWS Database Migration Service (DMS) brings automation and security to the forefront. This combination simplifies data migration while also managing the creation and removal of AWS infrastructure with strong security measures.
Let’s explore the step-by-step process that underpins this data migration:
Step 1: Fetching Migration Database List: The journey commences with the retrieval of a migration database list, setting the stage for the subsequent actions.
Step 2: Target Schema/Database Creation (Homogeneous Migration): For homogeneous migrations, the process involves the creation of target schema or database structures, laying a robust foundation for data transition.
Step 3: Replication Subnet Group Creation: Enabling seamless network communication, the creation of replication subnet groups forms a pivotal step to facilitate data movement.
Step 4: Source/Target Connection Endpoints: Every database designated for migration is equipped with source and target connection endpoints, fortifying the architecture for swift and secure data transfer.
Step 5: Replication Instance Creation: To execute the actual data migration, the process entails the creation of replication instances, poised to handle the intricate data transition.
Step 6: CloudWatch Event and Lambda Integration: A CloudWatch event coupled with a Lambda function comes into play, orchestrated to trigger the initiation of replication tasks, a foundational step in the migration process.
Step 7: Replication Task Creation and Assignment: Replication tasks, tailored to each database, are meticulously created and assigned to designated replication instances, harmonizing the migration landscape.
Step 8: Migration Task Initiation: As the culmination of this comprehensive process, migration tasks are set in motion for each database, marking the commencement of the data migration journey.
2.1. Data Migration Automation: Understanding AWS DMS Pipeline with Terraform IAC Architecture
AWS Database Migration Service (DMS) with Terraform Infrastructure as Code (IAC) not only streamlines the migration journey but also enhances it with automation and efficiency. This section delves into the intricacies of the AWS DMS architecture orchestrated through Terraform IAC, outlining each pivotal step that culminates in a seamless and secure data transition.
AWS DMS Architecture with Terraform IAC
The architecture for data migration automation is initiated through the dynamic framework of Jenkins pipelines. This framework employs an array of input parameters to tailor the migration process, providing flexibility and customization.
Here’s an overview of the comprehensive architecture:
Step 1: Jenkins Pipeline Parameters
The Jenkins pipeline for AWS DMS begins by setting essential input parameters, ranging from region and environment details to Terragrunt module specifics and migration preferences.
Below are some of the input parameters that form the bedrock of a customizable and controlled migration process.
- AWS_REGION – Populating region list from repository.
- APP_ENVIRONMENT – Populating application environment list from repository.
- TG_MODULE – Populating Terragrunt module folder list from repository.
- TG_ACTION – Allows users to select the Terragrunt action [validate, plan, apply].
- TG_EXTRA_FLAGS – Allows users to pass additional terragrunt flags.
- FETCH_DBLIST – Migration DB list generation types [AUTOMATIC, MANUAL].
- CUSTOM_DBLIST – SQL Server custom DB list for Migration if FETCH_DBLIST is MANUAL.
- MIGRATION_TYPE – Allows users to select the DMS migration type [full-load, full-load-and-cdc, cdc].
- START_TASKS – Allows users to enable / disable migration task execution.
- TEAMS – MS Teams channel for build notification.
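Because several of these parameters accept only a fixed set of values, it can help to validate them before any infrastructure is touched. The sketch below checks the allowed values listed above. The function `validate_params` and the dictionary shape are assumptions for illustration, not part of the Jenkins pipeline described here.

```python
# Allowed values taken from the parameter list above.
ALLOWED = {
    "TG_ACTION": {"validate", "plan", "apply"},
    "FETCH_DBLIST": {"AUTOMATIC", "MANUAL"},
    "MIGRATION_TYPE": {"full-load", "full-load-and-cdc", "cdc"},
}


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is valid."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = params.get(name)
        if value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    # A manual run is only meaningful when a custom list is supplied.
    if params.get("FETCH_DBLIST") == "MANUAL" and not params.get("CUSTOM_DBLIST"):
        problems.append("CUSTOM_DBLIST is required when FETCH_DBLIST is MANUAL")
    return problems


# Hypothetical parameter set for illustration.
print(validate_params({
    "TG_ACTION": "plan",
    "FETCH_DBLIST": "MANUAL",
    "CUSTOM_DBLIST": "",
    "MIGRATION_TYPE": "full-load",
}))
```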
Step 2: Execution Stages
Based on the input parameters, the pipeline advances through distinct execution stages:
Source Code Checkout for IAC: The pipeline starts by checking out the source code for Infrastructure as Code (IAC), ensuring a coherent foundation for the subsequent steps.
Migration Database List: Depending on the FETCH_DBLIST setting, the pipeline either fetches the migration database list automatically from the source instance or uses a manually provided list.
Schema/Database Creation: The target instance is prepared by creating the requisite schema or database structures for data migration.
Terraform/Terragrunt Execution: The AWS DMS migration journey is facilitated through the execution of Terraform or Terragrunt modules, orchestrating the actual data migration.
Notifications: Notifications are dispatched via MS Teams or email, ensuring transparent communication throughout the migration process.
Step 3: Automatic and Manual List Fetching
The migration database list can be fetched automatically from the source instance using a shell script if FETCH_DBLIST is set to AUTOMATIC. Alternatively, for manual control, users can provide a selective list for migration.
Step 4: Migration Types
Depending on the migration type specified in MIGRATION_TYPE, the Terraform/Terragrunt module initiates full-load, full-load-and-cdc, or cdc migrations.
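The choice between the two modes can be sketched as follows. The pipeline described here uses a shell script for the automatic path; this Python sketch swaps that for a caller-supplied function, and it assumes CUSTOM_DBLIST is a comma-separated string. Both are assumptions made for illustration.

```python
from typing import Callable


def resolve_db_list(fetch_mode: str, custom_list: str,
                    fetch_from_source: Callable[[], list[str]]) -> list[str]:
    """Return the databases to migrate, de-duplicated in original order."""
    if fetch_mode == "AUTOMATIC":
        names = fetch_from_source()
    elif fetch_mode == "MANUAL":
        # Assumption: the manual list is comma-separated.
        names = custom_list.split(",")
    else:
        raise ValueError(f"Unknown FETCH_DBLIST value: {fetch_mode!r}")

    unique: dict[str, None] = {}
    for name in names:
        name = name.strip()
        if name:
            unique.setdefault(name, None)
    return list(unique)


# Hypothetical database names.
print(resolve_db_list("MANUAL", "orders, billing,orders", lambda: []))
```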
Step 5: Automation Control
The migration task initiation can be configured in START_TASKS to start automatically or manually, providing control over the migration process.
Step 6: Credentials Management
During the execution of DMS Terraform/Terragrunt modules, source and target instance database credentials are retrieved from AWS Secrets Manager, ensuring security and privacy.
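In the setup described here the lookup happens inside the Terraform/Terragrunt modules. Purely to illustrate what that lookup involves, here is a Python sketch using the Secrets Manager `get_secret_value` operation. It assumes the secret is stored as JSON with `username` and `password` keys, and the secret name and the optional `secrets_client` argument are hypothetical.

```python
import json


def load_db_credentials(secret_id: str, secrets_client=None) -> dict:
    """Fetch database credentials from AWS Secrets Manager."""
    if secrets_client is None:
        import boto3  # only needed when no client is injected
        secrets_client = boto3.client("secretsmanager")

    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    # Assumption: the secret JSON carries "username" and "password" keys.
    return {"username": secret["username"], "password": secret["password"]}
```

Keeping credentials in a secret store means they never need to appear in pipeline parameters or in the repository.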
Step 7: Endpoint Creation
AWS DMS endpoints are established for both source and target instances, facilitating seamless connection and data transfer for each database.
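Since every database needs one source and one target endpoint, the definitions can be generated from the database name. The sketch below builds the keyword arguments that the DMS `create_endpoint` operation accepts. The host names, the SQL Server defaults, and the helper name `endpoint_pair` are assumptions, and credentials are deliberately left out because they come from the secret store.

```python
import re


def endpoint_pair(db_name: str, source_host: str, target_host: str,
                  engine: str = "sqlserver", port: int = 1433) -> dict:
    """Build keyword arguments for one source and one target endpoint."""
    # Endpoint identifiers allow only letters, digits, and hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", db_name).strip("-").lower()

    def one(kind: str, host: str) -> dict:
        return {
            "EndpointIdentifier": f"{kind}-{slug}",
            "EndpointType": kind,  # "source" or "target"
            "EngineName": engine,
            "ServerName": host,
            "Port": port,
            "DatabaseName": db_name,
        }

    return {"source": one("source", source_host),
            "target": one("target", target_host)}


# Hypothetical database and host names.
pair = endpoint_pair("orders_db", "onprem.example.internal", "rds.example.internal")
print(pair["source"]["EndpointIdentifier"])  # source-orders-db
```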
Step 8: Replication Instances
Replication instances are created based on database count or quota limits, enhancing the efficiency of data replication.
Step 9: CloudWatch Integration
AWS CloudWatch event configuration triggers a Lambda function upon the creation of AWS DMS replication tasks.
Step 10: Replication Task Creation
Replication tasks are created for individual databases and assigned to available replication instances, optimizing data transfer.
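One simple way to spread tasks across instances is to fill each instance up to its capacity before moving to the next. The sketch below assumes a capacity of 50 databases per instance, which follows from the default endpoint quota with two endpoints per database. The instance naming scheme is hypothetical.

```python
def assign_to_instances(databases: list[str],
                        per_instance: int = 50) -> dict[str, list[str]]:
    """Group databases into consecutive batches, one batch per instance."""
    plan: dict[str, list[str]] = {}
    for start in range(0, len(databases), per_instance):
        # Hypothetical naming scheme for the replication instances.
        name = f"dms-instance-{start // per_instance + 1}"
        plan[name] = databases[start:start + per_instance]
    return plan


# Hypothetical database names db001 .. db100.
plan = assign_to_instances([f"db{n:03d}" for n in range(1, 101)])
print({name: len(batch) for name, batch in plan.items()})
```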
Step 11: Task Automation
Replication tasks are started automatically by the Lambda function once they reach the Ready state.
Step 12: Monitoring Migration
The AWS DMS Console offers real-time monitoring of data migration progress, providing insights into the migration journey.
Step 13: Ongoing Changes
After full migration, ongoing changes are seamlessly replicated into the target instance, ensuring data consistency.
Step 14: Automated Validation
Migrated data is automatically validated against source and target instances based on provided validation configurations, reinforcing data integrity.
Step 15: Completion and Configuration
Post-validation, it is imperative to ensure user migration and database configurations are completed.
Step 16: Target Testing and Validation
Once all is set, the application configuration is updated to utilize the target instance for testing, ensuring functionality.
Step 17: Cutover Replication
After thorough testing, the cutover replication from the source instance is executed. A final snapshot of the source instance is taken, concluding the process.
AWS DMS architecture orchestrated through Terraform IAC epitomizes the integration of automation, precision, and security, guiding a seamless data migration journey from inception to culmination.
3. Managing Key Challenges: Data Validation, Agility, and Cost Optimization
In the realm of data migration, upholding data integrity and optimizing costs stand as critical benchmarks. Fortunately, AWS DMS offers a strategic approach that aligns precisely with these imperatives.
I. AWS DMS Support For Data Validation
AWS DMS offers data validation support to verify the accuracy of migrated data. Validation starts immediately after the full load of a table completes and extends to the incremental changes in CDC-enabled tasks. During validation, AWS DMS compares each source row with its corresponding target row and flags any mismatches. For CDC-only tasks, pre-existing table data is validated before validation of new data begins.
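Validation is switched on through the `ValidationSettings` block of a task’s settings JSON. The sketch below builds that block in Python. The helper name and its defaults are illustrative, and the setting names (`EnableValidation`, `ValidationOnly`, `ThreadCount`) should be checked against the current AWS DMS task settings reference.

```python
import json


def build_validation_settings(enable: bool = True, validation_only: bool = False,
                              thread_count: int = 5) -> str:
    """Return task-settings JSON containing only the validation block."""
    if validation_only and not enable:
        raise ValueError("A validation-only task must have validation enabled")
    settings = {
        "ValidationSettings": {
            "EnableValidation": enable,
            "ValidationOnly": validation_only,
            "ThreadCount": thread_count,
        }
    }
    return json.dumps(settings, indent=2)


print(build_validation_settings())
```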
Supported Source Endpoints
Oracle, PostgreSQL-compatible database (PostgreSQL, Aurora PostgreSQL, or Aurora Serverless for PostgreSQL), MySQL-compatible database (MySQL, MariaDB, Aurora MySQL, or Aurora Serverless for MySQL), Microsoft SQL Server, and IBM DB2.
When data validation is enabled, AWS DMS provides table-level validation statistics. These can be accessed through the console, the AWS CLI, or the AWS DMS API. Validation metrics are also published to CloudWatch, so they can be included in task monitoring.
II. Serverless Replication For Operational Agility And Cost Optimization
AWS DMS Serverless stands as a remarkable feature designed to enhance operational agility and optimize cost-effectiveness. By offering automatic provisioning, dynamic scaling, inherent high availability, and a pay-as-you-use billing model, it revolutionizes the landscape of data migration.
Similar to the functionality of the current AWS DMS (referred to in this document as AWS DMS Standard), AWS DMS Serverless allows you to establish source and target connections using endpoints.
Once these connections are established, the subsequent step involves the creation of a replication configuration. This encompasses the configuration settings tailored to the specific replication task. Management of replications is intuitive, encompassing the ability to initiate, halt, modify, or delete them. Each replication can be finely tuned, ensuring alignment with the requirements of your database migration endeavor.
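A replication configuration bundles the endpoints, the replication type, the table mappings, and the capacity range. The sketch below assembles keyword arguments in the shape expected by the boto3 `create_replication_config` operation. The ARNs, identifier, and capacity values are hypothetical, and the parameter names should be confirmed against the current API reference before use.

```python
def replication_config_kwargs(identifier: str, source_arn: str, target_arn: str,
                              table_mappings: str,
                              replication_type: str = "full-load-and-cdc",
                              min_dcu: int = 1, max_dcu: int = 16,
                              multi_az: bool = False) -> dict:
    """Assemble arguments for a DMS Serverless replication configuration."""
    if replication_type not in {"full-load", "cdc", "full-load-and-cdc"}:
        raise ValueError(f"Unsupported replication type: {replication_type!r}")
    if min_dcu > max_dcu:
        raise ValueError("Minimum capacity cannot exceed maximum capacity")
    return {
        "ReplicationConfigIdentifier": identifier,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationType": replication_type,
        "TableMappings": table_mappings,
        # The capacity range within which the service scales.
        "ComputeConfig": {
            "MinCapacityUnits": min_dcu,
            "MaxCapacityUnits": max_dcu,
            "MultiAZ": multi_az,
        },
    }
```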
4. Key Features and Benefits of AWS DMS with Terraform
This section unveils the remarkable features that arise from the fusion of AWS DMS and Terraform IAC, revolutionizing data migration by encapsulating benefits that span cost-efficiency, ease of use, minimized downtime, and robust replication.
Let’s delve into these transformative attributes that amplify the data migration experience.
- Cost Optimization
Migrating data shouldn’t break the bank. AWS DMS follows a pay-as-you-go model: you pay only for the compute resources used during migration and for any additional log storage.
- Ease of Use
The migration process is streamlined with simplicity. No specific driver or application installation is required, and often no changes to the source database are needed. One-click resource creation powers the entire migration journey, enhancing user-friendliness.
- Continuous Replication and Minimal Downtime
AWS DMS ensures continuous replication of your source database, even while it’s operational. This feature facilitates minimal downtime and empowers seamless database switching at your convenience.
- Reliability and High Availability
Multi-AZ capability brings high availability to database migration, with a standby replication instance supporting continuous replication. In this setup, the number of replication instances is derived from the database count and quota limits, while AWS DMS Serverless can scale capacity automatically.
- Ongoing Replication
Use ongoing replication tasks to keep the source and target databases synchronized, ensuring data consistency over time.
- Diverse Source/Target Support
AWS DMS supports migrations ranging from like-to-like, such as MySQL to MySQL, to heterogeneous migrations across various platforms like Oracle to Amazon Aurora. It accommodates transfers between SQL, NoSQL, and text-based targets.
- Database Consolidation
Unify multiple source databases into a single target database effortlessly. This feature applies to both homogeneous and heterogeneous migrations across all supported database engines.
- Efficiency in Schema Conversion and Migration
Beyond data migration, AWS DMS requires minimal manual effort for tasks such as migrating users, stored procedures, triggers, schema conversion in heterogeneous migrations, and validating the target database against application functionality.
- Automated Provisioning with Terraform IAC
Use Terraform/Terragrunt for automated creation and destruction of AWS DMS replication tasks. Ideal for migrations involving multiple databases, this approach simplifies management and enhances scalability.
- Automated Pipeline Integration
Seamlessly integrate with CI/CD pipelines for comprehensive migration management across stages. This integration offers efficient monitoring and progress tracking, streamlining the migration process.
Incorporating the fusion of AWS DMS and Terraform IAC, these features redefine the data migration landscape, imparting agility, cost-effectiveness, and robustness to the journey of data transition.
The combination of AWS Database Migration Service (AWS DMS) and Terraform Infrastructure as Code (IAC) is a powerful one, and this white paper has examined how the two technologies work together.
From core concepts to the more complex parts of the pipeline, these insights equip you to run migrations that are streamlined, secure, and poised for success.
As businesses navigate digital transformation, this collaboration promises a future where data migration becomes an avenue for optimization and growth, reshaping the data landscape with precision and efficiency.