Audience:
Admin
Application:
Application (Product) | Application Area | Release Number | Release Number |
Align |
Vault/VOF | N/A |
N/A |
Question:
How to find duplicate field values in a Veeva Align dataset containing more records than Microsoft Excel is capable of opening?
Answer:
This article demonstrates the Integration Status 2 (IS2) table as an example object from Veeva Align. This method can be used for other Veeva Align objects.
I. The Problem: Example IS2 record example with duplicate crm_record_id__aln
ID | crm_record_id__aln |
V2J0000001GS718 | 0JB1800000BrC7eGAF |
V2J0000001GS730 | 0JB1800000BrC7eGAF |
The first step is to extract all records of the specific object, in this case of the Integration Status 2 object.
II. Extracting IS2 records from Veeva Align using Vault Loader
It is recommended to extract large amounts of data using Vault Loader.
The Vault Loader Extract limit is set to 1 million in Align Vaults by default. The extract limit cannot be increased any further. If there are more than 1 million records, filters are required to be added into the where clause of the Vault Loader Extract. The Product team recommends using the preferred workaround of filtering results with the Where Clause field using the MAXROWS and SKIP VQL clauses: Using the Where Clause Field
We recommend alternative methods, such as using Scheduled Data Exports — as the object's initial export includes all records — or combining the batch files after extracting.
In the Extract tab of the Loader section, select the affected object, check Override Default Column Headers and select the desired fields. Having the ID, Name and CRM Record ID are required.
Click on Extract. The process of the job and the final .CSV result file can be downloaded from Admin -> Operations -> Job Status -> History section by locating the Vault Loader extract job.
III. The tool to use
When the amount of the extracted records is too large to be viewed in Excel, another tool is required to be able to perform Database operations. It is recommended to use a SQL database tool: as an example, DB Browser for SQLite will be demonstrated.
From the website of the software:
- Download the .zip (No Installer) Distribution (32 bit / 64 bit) and extract it.
- In the main folder, open DB Browser for SQLite.exe
- Click on New Database.
- Provide a name to the database, keep the default extension and save it. A popup window appears: Edit Table Definition
- Click on Cancel.
- Click on File -> Import -> Table from CSV File...
- Select the extracted .CSV File.
- Provide a name to the table (The default name is going to be the name of the file).
- Check Column names in first line
- Ensure to have UTF-8 selected
- Verify that the seperators are correct, by default the field seperator is comma and the quote character is double quotes
- Uncheck Trim Fields
Click OK.
Navigate to the Execute SQL tab.
The SQL query to get the duplicates:
SELECT id, tablename.crm_record_id__aln, count(tablename.id)
from tablename
GROUP by tablename.crm_record_id__aln
HAVING count(tablename.id)>1
Where tablename is the name of the table in DB Browser for SQL lite.
In this example, the id field has been added, which will show one of the duplicates.
IV. Records where the crm_record_id__aln is null
With DB Browser for SQLite, run the following query:
SELECT id, crm_record_id__aln from is2 WHERE crm_record_id__aln IS NULL
With Vault Loader, they can be extracted by entering the following filter to the Loader where clause: crm_record_id__aln = null
V. Result extraction
The result dataset of the query can be extracted by clicking on the following icon:
- DB Browser for SQLite is a Third Party Software and we recommend using the .zip (No Installer) distribution.
- The presented data is not customer specific.
Additional Resources:
Align Help Documentation:
Vault Help Documentation:
3rd Party Tool:
Knowledge Article: