Content Migration Step by Step and Tips for Success
Content migration can be one of the more complex and confusing aspects of a Content Management project, and it is often left to the last second. In this instance, we will use Sitecore as an example of the things you should be thinking about, alongside the tools to use, for a successful content migration. The steps and tips listed here apply to any CMS project and focus on Enterprise Content Management best practices.
Contents
- 1 Content Migration vs. Content Synchronization
- 2 Content Migration Step by Step
- 2.1 Content Migration Tool Selection Rationale
- 2.2 Site Structure Extraction
- 2.3 Content Audit
- 2.4 Site Structure Mapping
- 2.5 Content Mapping and Content Transformation Process
- 2.6 Redirect Strategy
- 2.7 Content Migration Batch Strategy
- 2.8 Content Migration Test
- 2.9 Content Migration Execution by Batch
- 2.10 Final Content Migration Review
- 3 Content Migration Tips for Success
Content Migration vs. Content Synchronization
It’s important to differentiate between content migration and content synchronization. Roughly defined, content migration is the migration of content from another environment that may have differing content types, metadata, schema, site structure, and unstructured data. Examples of content migration cases are as follows:
- Migrating from a set of connected Sitecore 8.2 environments to a set of connected Sitecore 10.2 environments. A set of environments is defined as lower and upper environments in connection with each other such as Dev, QA, UAT, Staging, and Production
- Migrating from a set of connected Sitecore 9.3 environments to a set of connected Sitecore 9.3 environments with a differing architecture
Conversely, content synchronization is the process of synchronizing content between a set of connected environments that have closely equal or equal content types, metadata, schema, site structure, and structured data. Examples of content synchronization cases are as follows:
- Ensuring the content in a set of connected Sitecore 10.2 environments is synchronized across lower and upper environments
- Within the same set of connected environments, updating a lower environment with content from the production environment for the sake of development or testing
The slight deviation in “closely equal” vs. “equal” in content synchronization exists because, during development activities in the new destination environment, a new content type or metadata field could be created as part of a new page or component and then synchronized for content editors to leverage. That is a far different use case than a content migration from a previous Sitecore version/environment with differing content types, schema, and site structure requiring detailed analysis and transformation.
Content Migration Step by Step
The following are not exhaustive descriptions but provide the key steps you should be considering for content migration.
Content Migration Tool Selection Rationale
Selecting the right tool for your content migration is critical. Due to the need to manually consider data transformations, as described in the Content Migration vs. Content Synchronization section, it will be necessary to extract, transform, and then import data.
Content Migration Tool Requirements
Given the need to analyze and most likely transform data prior to importing content into a new Sitecore environment, there are three primary requirements for a content migration tool:
- Ability to export content items (types) with the corresponding metadata from the source environment in CSV format, for analysis outside of the migration tool. This is critical given the stakeholders required to analyze content types
- Ability to generate import files from the destination environment per content item (type) with the corresponding metadata to a CSV
- Ability to import content items (types) with the corresponding metadata from a CSV
Optional requirements (nice-to-haves) are as follows:
- Ability to view the source and destination Content Tree via a GUI
- Does not require customization of the tool to execute a migration
Content Migration Tools to Consider
Razl
Razl is a Sitecore database comparison tool that allows a view of both source and destination Sitecore content trees. While it provides the ability to view differences and migrate data between source and destination, this is cumbersome for larger batches of items, especially where transforming data is concerned.
Sitecore Sidekick Content Migrator
Sitecore Sidekick Content Migrator is a subset of Sitecore Sidekick that allows content to be synchronized between Sitecore environments. It does not provide data editing or transformation capabilities.
URL: Sitecore Sidekick – Content Migrator | Bending Sitecore (jeffdarchuk.com)
Unicorn Sync
Unicorn Sync is a heavily used tool for synchronizing data between Sitecore environments, usually leveraged during a code release to move the “templates, renderings, and other database items between Sitecore instances”. It lacks editing and transformation capabilities.
Sitecore PowerShell Extensions
Sitecore PowerShell Extensions (SPE) is a widely used tool that provides a PowerShell-like CLI and scripting environment to automate tasks within a Sitecore environment. SPE cmdlets such as Export-Item and Import-Item, together with supporting PowerShell, give developers the ability to export and import items via CSV files.
URL: Introduction – Sitecore PowerShell Extensions
Sitecore Data Importer (Bundled with SPE)
Bundled with Sitecore PowerShell Extensions (SPE) and available as a standalone module, Sitecore Data Importer allows the export and import of Sitecore items via its interface inside of the Sitecore Desktop Mode.
Sitecore Content Export Tool
The Sitecore Content Export Tool is a well-documented module whose explicit purpose is to export and import Sitecore items as CSV files. It also runs as a Sitecore Job to get around the 230-second hard limit Azure places on long-running processes.
URL: estockwell-alpert/ContentExportTool: Content Export Tool for Sitecore (github.com)
Content Migration Tool Closely Meeting Requirements
Given the requirements to export, transform, and import Sitecore items via an external file (CSV), the Sitecore Content Export Tool is one of my favorites. If it does not meet the business/security requirements of an organization, the Sitecore Data Importer, and finally scripting via Sitecore PowerShell Extensions can be leveraged. It’s important to note that all of these tools connect to the Sitecore API and will have a similar security framework.
Site Structure Extraction
Beyond content itself, it’s imperative to extract the site structure of the current Content Tree in the source environment to understand the hierarchy of where content resides. This matters beyond ensuring content types are captured, because Sitecore leverages an inheritance model where metadata can be shared in a hierarchical manner.
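As a rough illustration of what an extracted site structure can look like once it is outside of Sitecore, the following minimal sketch nests a flat list of item paths into a hierarchy and prints it as an indented outline. The paths are hypothetical, and the sketch assumes your export includes one full item path per item.

```python
def build_tree(paths):
    """Nest a flat list of item paths into a dict-of-dicts hierarchy."""
    root = {}
    for path in paths:
        node = root
        for segment in path.strip("/").split("/"):
            node = node.setdefault(segment, {})
    return root


def render_tree(tree, depth=0):
    """Return indented lines, one per item, with children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render_tree(tree[name], depth + 1))
    return lines


# Hypothetical item paths, as they might appear in an export of the source Content Tree.
source_paths = [
    "/sitecore/content/Home",
    "/sitecore/content/Home/About",
    "/sitecore/content/Home/Products/Widgets",
]

tree = build_tree(source_paths)
print("\n".join(render_tree(tree)))
```

The indented outline can be pasted into a spreadsheet or document so that non-technical stakeholders can review the hierarchy.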
Content Audit
A content audit is required to list all the content types in the source Sitecore environment, as well as any content residing outside of Sitecore that will be migrated into the destination Sitecore environment. Note that metadata per content type should be exported from Sitecore and saved as individual CSV files in preparation for transformation (if required) and migration to the destination Sitecore environment.
For content currently residing outside of Sitecore, CSV files should be created by content type in preparation for import into Sitecore. This is particularly important when considering unstructured data, such as a set of documents or images that need to be added to Sitecore.
Lastly, there may be a desire to create new content within Sitecore via import, such as a new content type with a large number of items. Following the previous methodology, each content type and its corresponding metadata needs to be captured in a CSV file per content type.
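If your export arrives as one combined file rather than one file per content type, the split described above could be scripted. This is a minimal sketch only: the column names ("Item Path", "Template", "Title", "Alt Text") are hypothetical and will differ depending on the export tool and your templates.

```python
import csv
import io
from collections import defaultdict


def split_by_content_type(export_csv, type_column="Template"):
    """Group exported rows by content type and return {type name: CSV text}."""
    rows_by_type = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_csv)):
        rows_by_type[row[type_column]].append(row)

    files = {}
    for type_name, rows in rows_by_type.items():
        # Keep only the columns that carry a value for this content type.
        columns = [c for c in rows[0] if any(r[c] for r in rows)]
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        files[type_name] = out.getvalue()
    return files


# Hypothetical combined export covering two content types.
combined_export = (
    "Item Path,Template,Title,Alt Text\n"
    "/sitecore/content/Home,Page,Welcome,\n"
    "/sitecore/media library/logo,Image,,Company logo\n"
)

per_type_files = split_by_content_type(combined_export)
for name, text in per_type_files.items():
    print(f"--- {name}.csv ---")
    print(text)
```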
The listing of these content types, along with the relationships between them, should be presented in a Content Model, inclusive of key details such as which content types are in scope for migration.
High Level Content Model
The following example high-level content model lists the content types, relationships, and other corresponding details. The complete content model consists of the “High-Level Content Model” and the CSV files per content type that include the metadata details. You can also create a highly detailed content model in diagram form once you have determined the relationships between content types and their metadata.
Content Type Name | Type of Content | Current Content Location | Structured Data (Y/N) | New Content (Y/N) | Related Content Types | In Scope for Migration (Y/N) |
CT1 | Page | Sitecore | Y | N | CT2, CT4 | Y |
CT2 | Component | Sitecore | Y | N | CT1 | Y |
CT3 | Document | Sitecore | Y | N | | N |
CT4 | Image | Sitecore | Y | N | CT2 | Y |
CT5 | Document | External | N | N | | Y |
CT6 | Page | Not Created Yet | Y | Y | | Y |
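A content model like this can also be kept in a machine-readable form so that simple checks run automatically. The sketch below is one possible representation, not a Sitecore feature. It assumes the rows for CT3, CT5, and CT6 have no related content types and that CT3 is the only type out of scope. The check flags any in-scope content type that relates to a type that is not being migrated.

```python
from dataclasses import dataclass, field


@dataclass
class ContentType:
    name: str
    kind: str
    location: str
    structured: bool
    is_new: bool
    related: list = field(default_factory=list)
    in_scope: bool = True


# The example high-level content model, row by row (see the assumptions above).
model = [
    ContentType("CT1", "Page", "Sitecore", True, False, ["CT2", "CT4"], True),
    ContentType("CT2", "Component", "Sitecore", True, False, ["CT1"], True),
    ContentType("CT3", "Document", "Sitecore", True, False, [], False),
    ContentType("CT4", "Image", "Sitecore", True, False, ["CT2"], True),
    ContentType("CT5", "Document", "External", False, False, [], True),
    ContentType("CT6", "Page", "Not Created Yet", True, True, [], True),
]


def out_of_scope_dependencies(content_model):
    """Return (type, related type) pairs where an in-scope type
    relates to a type that is missing or out of scope."""
    in_scope = {ct.name for ct in content_model if ct.in_scope}
    return [
        (ct.name, rel)
        for ct in content_model
        if ct.in_scope
        for rel in ct.related
        if rel not in in_scope
    ]


print(out_of_scope_dependencies(model))
```

An empty result means every relationship of an in-scope content type points at another in-scope content type.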
Site Structure Mapping
As there are likely differences between the source Content Tree hierarchy and the destination Content Tree hierarchy, site structure mapping is the process of comparing both the source and destination to ensure that the destination hierarchy meets 1) the desired Content Tree structure and 2) includes the desired content types in their proper place within the hierarchy.
This is achieved by leveraging the Excel document created in the previous section named “Site Structure Extraction” and creating the destination site structure prior to comparing. Once the comparison is complete, the destination hierarchy may need to be changed to reflect the desired hierarchy, especially when it comes to content types and any inheritance within the Content Tree.
Note that in some cases, especially where practices for Enterprise Content Management were lacking or governance was not in place, it may be better to create a fresh Content Model and then map the data that will fill the new Content Tree from the source dataset.
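The comparison itself can be partly automated once both structures exist as lists of item paths. The following is a minimal sketch under that assumption. The paths and the planned move in path_mapping are hypothetical.

```python
def compare_structures(source_paths, destination_paths, path_mapping=None):
    """Compare source and destination trees after applying planned moves.

    path_mapping maps a source path to its intended destination path;
    unmapped source paths are expected to stay where they are.
    """
    path_mapping = path_mapping or {}
    expected = {path_mapping.get(p, p) for p in source_paths}
    destination = set(destination_paths)
    return {
        "missing_in_destination": sorted(expected - destination),
        "unexpected_in_destination": sorted(destination - expected),
    }


# Hypothetical example: a news item moves under a new "Newsroom" folder.
source = [
    "/sitecore/content/Home",
    "/sitecore/content/Home/News/2020 Launch",
]
planned_moves = {
    "/sitecore/content/Home/News/2020 Launch": "/sitecore/content/Home/Newsroom/2020 Launch",
}
destination = [
    "/sitecore/content/Home",
    "/sitecore/content/Home/Newsroom",
    "/sitecore/content/Home/Newsroom/2020 Launch",
]

report = compare_structures(source, destination, planned_moves)
print(report)
```

Items listed as unexpected are not necessarily errors. Here the new "Newsroom" folder has no source equivalent, so it is surfaced for review.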
Content Mapping and Content Transformation Process
With the Content Model complete from the Content Audit, this section focuses on mapping the content from the source environment to the destination environment inclusive of any content transformation such as additional fields/metadata, field value changes, and field type changes (ex. Boolean vs. string).
It’s important to note that some page content types in the source environment may be components in the destination environment, and vice versa. To that end, the mapping considers differing mappings and potential transformations such as page to page, page to component, document to document, etc.
The following steps create the mapped and transformed import files in CSV format in preparation for migration, and assume you are using the Sitecore Content Export Tool:
- For each content type (template), create an import spreadsheet in CSV format from the destination environment that contains the template fields (metadata), using the “Download Sample CSV” option in the Content Import section of the Content Export Tool. It will populate the template path for you, which is required for import per item
- Map each content type from the source environment to the corresponding content type in the destination environment
- Once mapped, copy the values from the source environment content type CSV file into the destination content type CSV, being mindful that not only may field values change or need to be added, but field types may also need to be updated (ex. Boolean vs. string values)
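For larger content types, the copy-and-transform step above could be scripted rather than done by hand. The sketch below is illustrative only: the field names, the template path, and the mapping are hypothetical, and it assumes the destination checkbox field stores "1" for checked and an empty value for unchecked, which you should confirm against a sample export from your own destination environment.

```python
import csv
import io


def to_checkbox(value):
    """Convert a textual yes/no value to a checkbox value ("1" or "")."""
    return "1" if value.strip().lower() in {"true", "yes", "y", "1"} else ""


# Hypothetical mapping: destination field -> (source field, converter).
FIELD_MAP = {
    "Title": ("Headline", str.strip),
    "Show In Navigation": ("Include In Menu", to_checkbox),
}


def transform_rows(source_csv, template_path, field_map=FIELD_MAP):
    """Map source rows onto destination fields, converting values as needed."""
    out_rows = []
    for row in csv.DictReader(io.StringIO(source_csv)):
        new_row = {"Item Path": row["Item Path"], "Template": template_path}
        for dest_field, (src_field, convert) in field_map.items():
            new_row[dest_field] = convert(row.get(src_field) or "")
        out_rows.append(new_row)
    return out_rows


# Hypothetical source export for one content type.
source_export = (
    "Item Path,Headline,Include In Menu\n"
    "/sitecore/content/Home/About, About Us ,Yes\n"
)

import_rows = transform_rows(source_export, "/sitecore/templates/Project/Page")
print(import_rows)
```

Keeping the mapping in one table-like structure makes it easy to review with content editors before anything is imported.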
Redirect Strategy
Content that ends up in a new location due to site structure changes is bound to need redirecting. You should carefully evaluate whether content needs to be redirected, and which content. Use this post as a guide: Got Redirects? Successful Redirect Governance in Sitecore
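One way to get a first list of redirect candidates is to compare old and new URLs from the site structure mapping. The following is a minimal sketch with hypothetical URLs. It assumes URLs are compared case-insensitively and without trailing slashes, which may not match your site's configuration.

```python
def build_redirects(url_mapping):
    """Return (old_url, new_url) pairs for content whose URL changed.

    url_mapping maps an old URL to its new URL, or to None when the
    content is retired and needs a separate decision.
    """
    redirects = []
    for old_url, new_url in sorted(url_mapping.items()):
        if new_url is None:
            continue  # retired content: decide separately whether to redirect
        if old_url.rstrip("/").lower() != new_url.rstrip("/").lower():
            redirects.append((old_url, new_url))
    return redirects


# Hypothetical URL mapping taken from a site structure mapping exercise.
url_mapping = {
    "/about-us": "/company/about",
    "/contact": "/contact/",
    "/old-promo": None,
}

redirects = build_redirects(url_mapping)
print(redirects)
```

The output is only a candidate list. Each entry still needs the governance review described in the post referenced above.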
Content Migration Batch Strategy
The batch strategy for content migration is focused on breaking the content migration into smaller batches for the purposes of ensuring testing by section or another definable unit, such as a sub site. Specific guidelines could be as follows:
- Each batch should take no longer than 8 hours to migrate
- Batches should be in a unit that is easily identifiable (ex. A site section or sub site) and testable within 8 hours post batch migration
- Batches should be tested where each content type is validated as successfully migrated within the Content Tree in the CMS and viewable on page per item, document, or component… as one content type could have multiple items serving pages, documents, components, etc.
- Batch per sub site due to the amount of interrelated content and templates required to render on a page (ex. Card Types and Containers within a Page that has other elements)
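As a starting point for a batch plan, items can be grouped by the sub site they live under. This sketch assumes each sub site is a direct child of a common root ("/sitecore/content" here, which is an assumption about your tree) and uses hypothetical paths.

```python
from collections import defaultdict


def plan_batches(item_paths, site_root="/sitecore/content"):
    """Group item paths into named batches, one per sub site under site_root."""
    groups = defaultdict(list)
    prefix = site_root.rstrip("/") + "/"
    for path in item_paths:
        if not path.startswith(prefix):
            groups["(outside site root)"].append(path)
            continue
        sub_site = path[len(prefix):].split("/", 1)[0]
        groups[sub_site].append(path)
    return {
        f"Batch {number} - {name}": paths
        for number, (name, paths) in enumerate(sorted(groups.items()), start=1)
    }


# Hypothetical item paths across two sub sites.
items = [
    "/sitecore/content/SiteA/Home",
    "/sitecore/content/SiteB/Home",
    "/sitecore/content/SiteA/Home/About",
]

batches = plan_batches(items)
for batch_name, paths in batches.items():
    print(batch_name, len(paths))
```

A sub site that turns out to be too large for the 8-hour guidelines can then be split further, for example by site section.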
Content Migration Test
A content migration test is a critical part of the overall content migration strategy. This section details key aspects for performing a content migration test.
Goal of Content Migration Testing
The goals of content migration testing are as follows:
- Validate that the content migration tool meets the content migration requirements
- Validate that the content migration process is governable, inclusive of testing and validation
- Inform whether the batch strategy is effective or needs to be changed (ex. Broken into more batches, or perhaps fewer with a differing unit to batch such as a site section instead of a sub site)
- Estimate the time it takes to complete a batch based on the combination of actual import/migration time and validation process time
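The timings measured during the test can be extrapolated to a full batch. The arithmetic below is a simple linear estimate, which is an assumption: real imports may not scale linearly. The per-item timings are hypothetical, and the 8-hour limit comes from the batch strategy guidelines.

```python
def estimate_batch(item_count, import_sec_per_item, validate_sec_per_item, limit_hours=8.0):
    """Extrapolate import and validation time for a batch from test timings."""
    import_hours = item_count * import_sec_per_item / 3600
    validation_hours = item_count * validate_sec_per_item / 3600
    return {
        "import_hours": round(import_hours, 2),
        "validation_hours": round(validation_hours, 2),
        "within_limit": import_hours <= limit_hours and validation_hours <= limit_hours,
    }


# Hypothetical timings measured during the content migration test.
estimate = estimate_batch(item_count=12000, import_sec_per_item=1.5, validate_sec_per_item=3.0)
print(estimate)
```

In this example the import fits within 8 hours but validation does not, which would suggest splitting the batch or sampling during validation.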
Content Migration Batch Content Types and Size
The content migration test batch should contain a representative set of content types for the unit defined as part of the content migration strategy. As an example, if you have selected a batch strategy to migrate by sub site, you would run a test that includes all the content types associated with the sub site.
The depth (size) of content per content type does not need to include all data, but rather 10-15 items per content type to ensure schema and values migrate as intended.
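Selecting the test items can be scripted so that every content type is represented. This is a minimal sketch that takes up to 15 items per content type. The "Template" key is a hypothetical column name, and a fixed random seed is used so the same sample can be reproduced.

```python
import random
from collections import defaultdict


def sample_test_batch(items, per_type=15, seed=0):
    """Pick up to per_type items for each content type."""
    by_type = defaultdict(list)
    for item in items:
        by_type[item["Template"]].append(item)

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for type_name in sorted(by_type):
        group = by_type[type_name]
        sample.extend(rng.sample(group, min(per_type, len(group))))
    return sample


# Hypothetical inventory: 40 pages and 3 images.
inventory = [{"Item Path": f"/page/{i}", "Template": "Page"} for i in range(40)]
inventory += [{"Item Path": f"/image/{i}", "Template": "Image"} for i in range(3)]

test_batch = sample_test_batch(inventory)
print(len(test_batch))
```

Content types with fewer items than the sample size are included in full.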
Content Migration Test Validation
Per the “Content Migration Batch Strategy”, validation is achieved by viewing migrated content types with their corresponding items within the Content Tree in the CMS and on page per item, document, or component… as one content type could have multiple items serving pages, documents, components, etc. It’s imperative to ensure the metadata values and expected field types were successfully migrated.
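Part of the metadata check can be automated by exporting the migrated items from the destination and comparing them to the import file. The sketch below assumes both sides are available as lists of rows keyed by a hypothetical "Item Path" column. It complements, and does not replace, viewing the content in the Content Tree and on page.

```python
def validate_batch(expected_rows, actual_rows, key="Item Path"):
    """Compare expected import rows to rows exported from the destination.

    Returns a list of (item, field, expected value, actual value) tuples.
    """
    actual_by_key = {row[key]: row for row in actual_rows}
    problems = []
    for expected in expected_rows:
        actual = actual_by_key.get(expected[key])
        if actual is None:
            problems.append((expected[key], "missing item", None, None))
            continue
        for field_name, value in expected.items():
            if actual.get(field_name) != value:
                problems.append((expected[key], field_name, value, actual.get(field_name)))
    return problems


# Hypothetical rows: the checkbox value did not migrate in the expected form.
expected = [{"Item Path": "/a", "Title": "About", "Show In Navigation": "1"}]
actual = [{"Item Path": "/a", "Title": "About", "Show In Navigation": "True"}]

problems = validate_batch(expected, actual)
print(problems)
```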
Content Migration Execution by Batch
This section lists the batches to be executed in the content migration. Please note the importance of validation and ensure that there is plenty of time in between batches for testing/validation by checking the migrated content type in the destination CMS Content Tree, as well as on page, with particular attention to ensuring the appropriate metadata fields, field types, and values carried over via the migration.
If “Batch #” is not detailed enough as a name, feel free to add a suffix naming the unit that the batch is tied to (ex. Site section, sub site, etc.). An example name could be “Batch 1 – Sub Site Name”. Repeat the chart for all the batches that will be migrated.
Batch 1
Content Type Name | Date/Time to Execute | Executed | Validated | Notes |
Batch 2
Content Type Name | Date/Time to Execute | Executed | Validated | Notes |
Final Content Migration Review
Post completion of all batches, validate the hierarchy and content types within the destination CMS to ensure the whole site structure is reflective of all the desired content types and the site structure is in the correct order. Given the rigor of content mapping/transformation and site structure mapping, it should be in alignment, especially with per batch validation… but this final step ensures the “whole view” is evaluated prior to going live and closing out the content migration.
Content Migration Tips for Success
The following tips help your content migration be successful… and with the least amount of pain possible:
- Start early in your project and don’t wait until the very end to begin content migration
- Gather a team that includes your Sitecore Administrator and Lead Sitecore Content Editor
- Don’t forget about unstructured content residing outside of Sitecore (ex. links to documents) and the need to structure what has previously been unstructured
- Don’t feel stuck in your existing Sitecore Content Tree/Hierarchy… rather consider a fresh taxonomy as appropriate
- Evaluate all of your templates and fields to determine if you have the proper base templates… and what templates need to be removed or new ones created as part of the migration
- Determine and maintain a regular Sitecore upgrade cycle… as this will keep you evaluating your content on a regular basis and make future migrations less painful/potentially a synchronization
- Test the content migration tool(s) you are using before migrating any content to ensure they are meeting your criteria and your CMS can handle the load (ex. a timeout on a long running content export would be problematic)
- Content mapping/transformation can be tedious, so take breaks and make it as “fun” as possible by trying things like teaming up in a “Content Migration Escape Room” (aka, company conference room) where each section mapped equals an escape
- Don’t underestimate the level of effort required for content migration… but don’t overestimate it either. If you document and follow a process, you will start seeing the benefits typically around the batch testing phase of the migration
- Don’t take shortcuts, as you need to embrace Enterprise Content Management best practices to reduce the technical debt that unstructured and ad hoc content schema can cause