How to find duplicate field values in a Veeva Align dataset containing more records than Microsoft Excel is capable of opening?
This article demonstrates the Integration Status 2 (IS2) table as an example object from Veeva Align. This method can be carried out to any other Veeva Align objects.
I. The Problem: IS2 record example as to how the duplication looks like
Any records where the crm_record_id__aln is null are also considered duplicates in some cases, depending on the IS2 action. In the event there are such records, these have to be analyzed independently, and requires intermediate understanding of Veeva Align.
The first step is to extract all records of the specific object, in this case of the Integration Status 2 object.
II. Extracting IS2 records from Veeva Align using Vault Loader
It is recommended to extract large amounts of data using Vault Loader.
The Vault Loader Extract limit is set to 1 million in Align Vaults by default. In case, there are more than 1 million records, raise a Veeva Support ticket to increase it to 10 million. The extract limit cannot be increased any further. If there are more than 10 million records, filters are required to be added into the where clause of the Vault Loader Extract.
In the Extract tab of the Loader section, select the affected object, check Override Default Column Headers and select the desired fields. Having the ID, Name and CRM Record ID are required.
Click on Extract. The process of the job and the final .CSV result file can be downloaded from Admin -> Operations -> Job Status -> History section by locating the Vault Loader extract job.
III. The tool to use
In the event, the amount of the extracted Integration Status 2 records is too large, the data cannot be manipulated or viewed using Microsoft Excel.
For this reason, another tool is required to be able to perform Database queries. It is recommended to use SQL tools, specifically in this case, DB Browser for SQLite will be demonstrated.
From the website of the software:
- Download the .zip (No Installer) Distribution (32 bit / 64 bit) and extract it.
- In the main folder, open DB Browser for SQLite.exe
- Click on New Database.
- Provide a name to the database, keep the default extension and save it. A popup window appears: Edit Table Definition
- Click on Cancel.
- Click on File -> Import -> Table from CSV File...
- Select the extracted .CSV File.
- Provide a name to the table (The default name is going to be the name of the file).
- Check Column names in first line
- Ensure to have UTF-8 selected
- Verify that the seperators are correct, by default the field seperator is comma and the quote character is double quotes
- Uncheck Trim Fields
Navigate to the Execute SQL tab.
The SQL query to get the duplicates:
SELECT id, tablename.crm_record_id__
GROUP by tablename.crm_record_id__aln
Where tablename is the name of the table in DB Browser for SQL lite.
In this example, the id field has been added, which will show one of the duplicates.
IV. Records where the crm_record_id__aln is null
With DB Browser for SQLite, run the following query:
SELECT id, crm_record_id__aln from is2 WHERE crm_record_id__aln IS NULL
With Vault Loader, they can be extracted by entering the following filter to the Loader where clause: crm_record_id__aln = null
V. Result extraction
The result dataset of the query can be extracted by clicking on the following icon:
- DB Browser for SQLite is a Third Party Software and we recommend using the .zip (No Installer) distribution. The software package is completely safe, harmless, and tested.
- The presented data is not customer specific.
Align Help Documentation:
3rd Party Tool: