Deduplication of Data during Import using Data Import Wizard and Duplicate Detection Rules

Colin Maitland, 04 August 2013

When using the Data Import Wizard in Microsoft Dynamics CRM 2011, duplicate detection rules may be used to deduplicate data during the import. In this blog post I describe how this may be done.

The following two screenshots show a sample combined set of Contacts and Accounts to be imported. The highlighting shows that some of the Accounts, e.g. Alpine Ski House, A. Datum Corporation and Coho Winery, are duplicated because they are related to more than one Contact.

Contacts and Accounts

Prior to running the data import, the combined list of Contact and Account records may be split into two separate lists. 

Contacts

Accounts

Because all we have done is split the original list of records, the list of Accounts still contains duplicates for Alpine Ski House, A. Datum Corporation and Coho Winery.
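
The split described above can be sketched in Python. This is only an illustration, assuming the combined list is available as CSV text; the column names and the contact names are hypothetical and are not taken from the sample data in the screenshots. It shows that splitting alone carries the duplicate Accounts over into the Accounts list.

```python
import csv
import io

# Hypothetical combined list: column names and contact names are assumptions.
COMBINED_CSV = """First Name,Last Name,Account Name,City
Ann,Lee,Alpine Ski House,Denver
Bob,Ray,Coho Winery,Napa
Cara,Diaz,Alpine Ski House,Denver
"""

# Hypothetical column assignments for each target record type.
CONTACT_COLUMNS = ["First Name", "Last Name", "Account Name"]
ACCOUNT_COLUMNS = ["Account Name", "City"]


def split_combined(text):
    """Split combined rows into a Contacts list and an Accounts list."""
    rows = list(csv.DictReader(io.StringIO(text)))
    contacts = [{col: row[col] for col in CONTACT_COLUMNS} for row in rows]
    accounts = [{col: row[col] for col in ACCOUNT_COLUMNS} for row in rows]
    return contacts, accounts


contacts, accounts = split_combined(COMBINED_CSV)

# One Account row per Contact, so shared Accounts appear more than once.
print([account["Account Name"] for account in accounts])
```

Each Contact row keeps its Account Name so that the Contact can still be related to its Account when the Contacts are imported.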

When there are only a small number of records, as in this example, it is easy and quick to identify and remove duplicates manually. However, when working with a large number of records, manually identifying and removing duplicates is harder and takes much longer.

A simple solution is to use Duplicate Detection Rules, in conjunction with the Data Import Wizard, to remove the duplicate records. This method only applies when importing the Accounts and Contacts as separate imports.

The following steps demonstrate the process:

Step 1: Enable Duplicate Detection During Data Import 

Select Settings, Data Management, Duplicate Detection Settings and ensure that the Enable Duplicate Detection During Data Import option is selected.

Step 2: Configure and Publish Duplicate Detection Rules 

Select Settings, Data Management, Duplicate Detection Rules and ensure that appropriate Duplicate Detection Rules have been created and published for the record types to be deduplicated.

The most appropriate Duplicate Detection Rule for this example is the Accounts with the same Name rule.

Care should be taken to ensure that the impact of all published Duplicate Detection Rules for the selected record type is understood. In this example, retaining the ‘…same Account Number’, ‘…E-mail Address’, ‘…Phone Number’ and ‘…Website’ Duplicate Detection Rules will not cause unwanted duplicate detections. However, retaining other Duplicate Detection Rules, if any, such as ‘…same City’, which does not exist in this example, may cause unwanted duplicate detections. If required, unwanted Duplicate Detection Rules may be unpublished prior to the data import and then published again after the data import.
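
To illustrate why every published rule matters, the following Python sketch treats each rule as a single field comparison and reports which rules would flag a record. This is a simplified model and not how Microsoft Dynamics CRM evaluates rules internally; the rule-to-field mapping, the city values and the choice to skip blank values are all assumptions made for illustration.

```python
# Hypothetical published rules, modelled as rule name -> field compared.
PUBLISHED_RULES = {
    "same Name": "name",
    "same City": "city",  # the unwanted rule mentioned in the text
}


def matching_rules(candidate, existing, rules):
    """Return the names of rules under which candidate duplicates an existing record."""
    hits = []
    for rule_name, field in rules.items():
        value = candidate.get(field, "")
        # Assumption: blank values never count as a match.
        if value and any(record.get(field, "") == value for record in existing):
            hits.append(rule_name)
    return hits


# City values are invented for illustration.
existing = [{"name": "Coho Winery", "city": "Napa"}]
candidate = {"name": "Alpine Ski House", "city": "Napa"}

hits = matching_rules(candidate, existing, PUBLISHED_RULES)
print(hits)

# With the unwanted rule unpublished, the same record is no longer flagged.
name_only = {"same Name": "name"}
hits_name_only = matching_rules(candidate, existing, name_only)
print(hits_name_only)
```

In this model a different Account in the same city is rejected as a duplicate until the ‘…same City’ rule is unpublished.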

Step 3: Import Data Using Data Import Wizard 

Use the Data Import Wizard to import the records and ensure that the Allow Duplicates option is set to No on the Review Settings and Import Data screen as shown in the sixth screenshot below.

Step 4: Review Import Failures

When the data import is completed, you may review the Import Failures log to see a list of the records that were not imported because they are duplicates. The following error will be displayed for each: “A record was not created or updated because a duplicate of the current record already exists”.

The following screenshots show that the records on rows 6 (Alpine Ski House), 11 (A. Datum Corporation) and 13 (Coho Winery) were not imported.

The highlighting in the following screenshot shows which of these records were imported (GREEN) and which were not (YELLOW), i.e. Coho Winery, Alpine Ski House and A. Datum Corporation on rows 2, 5 and 7 were imported but the duplicates of these on rows 6, 11 and 13 were not.
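
The outcome described above, where the first occurrence of each Account is imported and later occurrences are rejected, can be modelled with a short Python sketch. It assumes that names are compared after trimming whitespace and ignoring case, and that row 1 of the file is the header; both are assumptions, and the sample list below is a shortened illustration, so its row numbers do not match those in the screenshots.

```python
def simulate_import(account_names, first_row=2):
    """Return (imported, rejected) as lists of (row_number, name) tuples."""
    seen = set()
    imported, rejected = [], []
    for row, name in enumerate(account_names, start=first_row):
        # Assumption: names match after trimming and case folding.
        key = name.strip().casefold()
        if key in seen:
            rejected.append((row, name))  # a duplicate of an earlier row
        else:
            seen.add(key)
            imported.append((row, name))  # first occurrence is created
    return imported, rejected


names = [
    "Coho Winery",
    "Alpine Ski House",
    "A. Datum Corporation",
    "Coho Winery",
    "Alpine Ski House",
]

imported, rejected = simulate_import(names)
print("Imported:", imported)
print("Rejected:", rejected)
```

The rejected list corresponds to the entries that would appear in the Import Failures log.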

The following screenshot shows the Accounts imported into Microsoft Dynamics CRM:

In this example, after the Accounts were imported, the Contacts were then also imported. These are shown in the following screenshot:

As mentioned previously, a limitation of this method is that it does not work when several import files (such as Accounts.xml and Contacts.xml) are combined into a single Zip file (such as Accounts and Contacts.zip) for import.

In this example, the six Contacts related to the three duplicate Accounts are not imported when they should be.

This is because the data import process attempts to match the Contacts to Accounts before the Accounts have been deduplicated, and so an ‘A duplicate lookup reference was found’ error occurs.
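
The failure can be modelled in Python as a lookup that must resolve to exactly one Account. This is a conceptual sketch only, not the actual import implementation; the function, the exception class and the id values are hypothetical.

```python
class DuplicateLookupError(Exception):
    """Raised when a lookup value matches more than one candidate record."""


def resolve_account(name, accounts):
    """Return the single Account matching name, or None if there is no match."""
    matches = [account for account in accounts if account["name"] == name]
    if len(matches) > 1:
        raise DuplicateLookupError("A duplicate lookup reference was found")
    return matches[0] if matches else None


# Accounts as they appear in the file before deduplication (ids are invented).
accounts_in_file = [
    {"id": 1, "name": "Coho Winery"},
    {"id": 2, "name": "Alpine Ski House"},
    {"id": 3, "name": "Coho Winery"},
]

try:
    resolve_account("Coho Winery", accounts_in_file)
    lookup_failed = False
except DuplicateLookupError as error:
    lookup_failed = True
    print(error)

# After deduplication the same lookup resolves to one Account.
deduplicated = accounts_in_file[:2]
resolved = resolve_account("Coho Winery", deduplicated)
print(resolved)
```

Importing the Accounts first, as a separate import, means the lookup is only ever resolved against the deduplicated Accounts.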

Finally, in this example, it would be desirable to ensure that both the Primary Contact and the Parent Customer relationships are retained. This is only possible by importing the Accounts and Contacts from a single Zip file rather than as two separate imports.