There are many different ways to find duplicate observations in SAS.
I have just received data from an external service provider, and I want to compare it with my EDC lab data. Before the data is useful, and before I can compare it with my lab data, I need to find any duplicates.
My external data has over 400 records, so reviewing it by hand in Excel (the original format) would be tedious and could take a few hours. With SAS, we can do it in less than 30 minutes.
A simple SAS program:
This is our SAS code:
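As a minimal sketch (not necessarily the program used here), one common way to list every copy of a duplicated record is to sort by the identifying variables and keep each row whose BY group has more than one member, using the FIRST. and LAST. flags. The variable names SUBJID, VISIT, LBTESTCD and LBORRES, and the sample rows, are hypothetical stand-ins for the imported file.

```sas
/* Hypothetical sample rows standing in for the imported external lab data. */
/* Variable names (SUBJID, VISIT, LBTESTCD, LBORRES) are assumptions.        */
data work.import;
  infile datalines dsd truncover;
  input subjid :$8. visit :$8. lbtestcd :$8. lborres;
  datalines;
1001,SCREEN,ALT,25
1001,SCREEN,ALT,25
1001,WEEK4,ALT,31
1002,SCREEN,AST,19
,,,
,,,
1002,WEEK4,AST,22
;
run;

/* Sort so that identical records sit next to each other */
proc sort data=work.import out=work.sorted;
  by subjid visit lbtestcd lborres;
run;

/* Keep every row that is not both the first and the last row of its   */
/* BY group, i.e. every copy of a repeated record. Missing values count */
/* as equal in BY processing, so fully blank rows are flagged as well.  */
data work.dups;
  set work.sorted;
  by subjid visit lbtestcd lborres;
  if not (first.lborres and last.lborres);
run;

proc print data=work.dups;
run;
```

Keeping all copies, rather than only the extra ones, lets the reviewer see each duplicate side by side with its original.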
This is our final output. We found the duplicates, including empty rows.
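Empty rows often come along with an Excel import. A minimal sketch for separating them from real records uses the CMISS function, which counts missing values across both character and numeric arguments. The variable names and sample rows are hypothetical, and the constant 4 must match the number of variables you list.

```sas
/* Hypothetical sample: two real records and two fully blank rows */
data work.import;
  infile datalines dsd truncover;
  input subjid :$8. visit :$8. lbtestcd :$8. lborres;
  datalines;
1001,SCREEN,ALT,25
,,,
1002,SCREEN,AST,19
,,,
;
run;

/* A row is treated as blank when all 4 listed variables are missing */
data work.blank_rows work.records;
  set work.import;
  if cmiss(subjid, visit, lbtestcd, lborres) = 4 then output work.blank_rows;
  else output work.records;
run;
```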
It is important to note that not every duplicate can be found automatically, and not every duplicate should be deleted. Further review by your clinical team members is required. Once we are confident that the duplicates have been removed, we know our data set is more accurate.
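After the clinical team has confirmed which records are true duplicates, one option is PROC SORT with NODUPKEY and BY _ALL_, so that two rows count as duplicates only when every variable matches. DUPOUT= writes the dropped rows to a separate data set for an audit trail. This is a sketch under assumptions: the sample rows are made up, and the output names WORK.IMPORT_NODUP and WORK.REMOVED are hypothetical.

```sas
/* Hypothetical sample with one repeated record and two blank rows */
data work.import;
  infile datalines dsd truncover;
  input subjid :$8. visit :$8. lbtestcd :$8. lborres;
  datalines;
1001,SCREEN,ALT,25
1001,SCREEN,ALT,25
1001,WEEK4,ALT,31
1002,SCREEN,AST,19
,,,
,,,
1002,WEEK4,AST,22
;
run;

/* BY _ALL_ compares every variable; NODUPKEY keeps the first row of */
/* each group and DUPOUT= saves the removed copies for review.       */
proc sort data=work.import out=work.import_nodup
          nodupkey dupout=work.removed;
  by _all_;
run;
```

One fully blank row still survives this step, so you may also want to filter out rows where every variable is missing.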
One last thing: how do we get this data into SAS so we can perform this task? Here’s a snippet:
- Import the file using PROC IMPORT (assuming you have the SAS/ACCESS license needed to read Excel files)
- You can write the output to a permanent or a temporary library. In this example, the output is a temporary data set named IMPORT in the WORK library
- Use the GETNAMES=YES statement to bring over the column headers as variable names
PROC IMPORT OUT= WORK.IMPORT
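A fuller version of the PROC IMPORT statement might look like the sketch below. The file path is hypothetical, and DBMS=XLSX is an assumption about the file type; replace both with your own. Reading Excel files this way typically relies on SAS/ACCESS Interface to PC Files.

```sas
/* Hypothetical path: point DATAFILE= at your provider's spreadsheet */
proc import out=work.import
            datafile="C:\data\external_lab.xlsx"
            dbms=xlsx
            replace;      /* overwrite WORK.IMPORT if it already exists */
  getnames=yes;           /* use the first row as variable names */
run;

/* Check the variable names and types that were brought over */
proc contents data=work.import;
run;
```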
Now you have learned a quick way to find duplicates in your external data. I hope this piece of code makes every data manager's job easier. Remember, you can always enhance this program (add labels, formats, etc.) and improve its efficiency.