A quick way to find duplicates from external lab data

SAS offers several ways to find duplicate observations in a data set.

I have just received data from an external service provider and I want to compare it with my EDC lab data. My goal is to find any duplicates before the external data is used or compared with my lab data.

My external data has over 400 records, so this would be a tedious job in Excel (the original format). A manual review could take a few hours, but with SAS we can do it in less than 30 minutes.

A simple SAS program:
[Image: over 400 records…]

This is our SAS code:

[Image: SAS code to find duplicate data from an external source]
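Since the original code was posted as a screenshot, here is a minimal sketch of one common way to list duplicates: sort by the key variables, then keep every record that is not alone in its key group. It is not necessarily the program shown in the image. The data set name (labdata), the variables (subjid, labtest, visit, result), and the sample rows are all made up for illustration.

```sas
/* Made-up sample rows standing in for the imported external lab file. */
/* The last two rows are completely empty.                             */
data labdata;
    infile datalines dsd truncover;
    input subjid :$8. labtest :$8. visit :$8. result;
    datalines;
1001,ALT,WEEK1,32
1001,ALT,WEEK1,32
1002,AST,WEEK1,28
,,,
,,,
;
run;

/* Sort by the variables that should identify a record uniquely */
/* (hypothetical keys; use the ones from your own study).       */
proc sort data=labdata out=lab_sorted;
    by subjid labtest visit;
run;

/* A record belongs to a duplicate group when it is not both */
/* the first and the last record for its key combination.    */
data lab_dups;
    set lab_sorted;
    by subjid labtest visit;
    if not (first.visit and last.visit);
run;

/* List the suspected duplicates for review */
proc print data=lab_dups;
run;
```

Because missing key values sort together, repeated empty rows fall into one group and are listed along with the real duplicates. Nothing is deleted here; the listing is only for review.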

This is our final output. We found the duplicates, including empty rows.

It is important to note that not all duplicates can be deleted or found. Further review by your clinical team members is required. Once we are confident that duplicates have been removed, we know our data set is more accurate.

One last thing: how do we get this data into SAS so we can perform this task? Here’s a snippet:

  1. Import the file using PROC IMPORT (assuming your site licenses SAS/ACCESS Interface to PC Files)
  2. You can use a permanent or temporary library. In this example, we created a temporary library named IMPORT
  3. We want to bring over the column names as variable names using the GETNAMES=YES statement

DATAFILE="H:\Labs\LocalLab\LabDataReview.xls"
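Put together, the three steps might look like the sketch below. The file path comes from the snippet above; everything else is an assumption: the data set name labdata is hypothetical, DBMS=XLS is one of several engines that can read an .xls file, and the IMPORT libref is pointed at the WORK folder so that it stays temporary.

```sas
/* Assumption: make IMPORT a temporary library by pointing it */
/* at the same folder as WORK, which is cleared at session end */
libname import "%sysfunc(pathname(work))";

/* Read the Excel file; labdata is a hypothetical data set name */
proc import datafile="H:\Labs\LocalLab\LabDataReview.xls"
            out=import.labdata
            dbms=xls
            replace;
    getnames=yes;   /* first row of the sheet supplies the variable names */
run;
```

If your installation uses a different engine for Excel files (for example DBMS=EXCEL), only the DBMS= value needs to change.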

Now you have learned a quick way to find duplicates in your external data. I hope this piece of code will make any data manager's job easier. Remember, you can always enhance this program (add labels, formats, etc.) and improve its efficiency.


Published by AnayansiVanDerBerg

“I am someone who influences my own development. I look for a company where I have the opportunity to pursue my interests across functions and geographies, and where a job title is not considered the final definition of who I am, but the starting point.”

One thought on “A quick way to find duplicates from external lab data”

  1. What to do when you find duplicates in your dataset?
    Step 1 – Determine why you have duplicates. Is the variable supposed to be unique? Check your annotated CRF or study design documentation.
    Step 2 – Eliminate the duplicates. You can do this by using the PROC SORT procedure.
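As a rough sketch of Step 2, PROC SORT with the NODUPKEY option keeps one record per key combination, and DUPOUT= saves the removed records so the clinical team can still review them. The data set names, key variables, and sample rows below are made up for illustration.

```sas
/* Made-up sample rows; replace with your own data set */
data labdata;
    infile datalines dsd truncover;
    input subjid :$8. labtest :$8. visit :$8. result;
    datalines;
1001,ALT,WEEK1,32
1001,ALT,WEEK1,32
1002,AST,WEEK1,28
;
run;

/* Keep the first record for each key combination in lab_unique; */
/* the records that were dropped go to lab_removed for review.   */
proc sort data=labdata out=lab_unique nodupkey dupout=lab_removed;
    by subjid labtest visit;   /* hypothetical key variables */
run;
```

Writing to OUT= rather than sorting in place leaves the original data set untouched, which is useful if a removed record later turns out to be legitimate.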
