Using PROC PRINT to Validate Clinical Data
When your data isn’t clean, you need to locate the errors and then correct them. SAS procedures can help you determine whether or not your data is clean. Today, we will cover PROC PRINT.
- The first step is to identify the errors in a raw data file. Usually, the DVP/DVS (Data Validation Plan/Specifications) section of our DMP (Data Management Plan) defines what is considered ‘clean’ data and what counts as a data error.
- Study your data.
- Validate the data using PROC PRINT.
- Clean the data using DATA steps with assignment and IF-THEN-ELSE statements.
When you validate your data, you are looking for:
- Missing values
- Invalid values
- Out-of-range values
- Duplicate values
In the example below, we find missing values in our lab data ranges table. We would also like to convert the lab test names to uppercase.
As the screenshot above shows, our PROC PRINT program identified all missing and invalid values according to our specifications. Six observations need to be cleaned up.
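Since the screenshot is not reproduced here, the following is a minimal sketch of what such a PROC PRINT check could look like. It assumes a hypothetical dataset named labranges with variables labTest, unit, lowRange, highRange and EffectiveEnddate; only labTest, unit and EffectiveEnddate appear in this post, and the data values are invented for illustration. The WHERE statement lists only the observations that break the (assumed) rules, so PROC PRINT shows just the records that need cleaning.

```sas
/* Hypothetical lab data ranges table; values are invented for illustration */
data labranges;
  infile datalines dsd truncover;
  input labTest :$20. unit :$10. lowRange highRange
        EffectiveEnddate :date9.;
  format EffectiveEnddate date9.;
  datalines;
HEMOGLOBIN,g/dL,12,16,31DEC2025
platelets,,150,400,31DEC2025
WBC,k/cumm,4,11,
;
run;

/* Print only the observations that violate the assumed specifications */
title 'Lab ranges with missing, invalid or out-of-range values';
proc print data=labranges;
  where missing(labTest) or missing(unit)
     or missing(lowRange) or missing(highRange)
     or missing(EffectiveEnddate)
     or labTest ne upcase(labTest)   /* test name not in uppercase */
     or lowRange >= highRange;       /* lower limit not below upper limit */
run;
title;
```

Adding one condition per rule in the DVP/DVS keeps the check easy to trace back to the specification.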
Cleaning Data Using Assignment Statements and If-Then-Else in SAS
We can use the DATA step to update datasets (tables/domains) that contain invalid or missing data, as per protocol requirements.
In our example, we have a lab data ranges table for a study that has already started, but some of its information is missing or invalid.
To convert our lab test names to uppercase, we will use an assignment statement. For the rest of the data cleaning, we will use IF-THEN statements.
In our final dataset, we can verify that there are no missing values. We converted labTest to uppercase, and we updated unit and EffectiveEnddate to k/cumm and 31DEC2025, respectively.
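A minimal sketch of such a cleaning step follows. The dataset names labranges and labranges_clean and the input values are hypothetical; the replacement values k/cumm and 31DEC2025 come from this post, but the rule for when to apply them (here, only when the value is missing) is an assumption, so adjust the IF conditions to match your own specifications.

```sas
/* Hypothetical input with a lowercase test name and missing values */
data labranges;
  infile datalines dsd truncover;
  input labTest :$20. unit :$10. EffectiveEnddate :date9.;
  format EffectiveEnddate date9.;
  datalines;
HEMOGLOBIN,g/dL,31DEC2025
platelets,,31DEC2025
WBC,k/cumm,
;
run;

data labranges_clean;
  set labranges;
  /* Assignment statement: convert the lab test name to uppercase */
  labTest = upcase(labTest);
  /* IF-THEN statements: fill in values per the assumed specification */
  if missing(unit) then unit = 'k/cumm';
  if missing(EffectiveEnddate) then EffectiveEnddate = '31DEC2025'd;
run;

/* Re-run the check: nothing should be listed after cleaning */
proc print data=labranges_clean;
  where missing(unit) or missing(EffectiveEnddate)
     or labTest ne upcase(labTest);
run;
```

If the cleanup worked, the final PROC PRINT selects no observations and the SAS log notes that none were selected.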
PROC PRINT by itself cannot detect values that are not unique. We will cover that in our next blog post, ‘Using PROC FREQ to Validate Clinical Data’. To find or remove duplicates, check out my previous post, ‘Finding Duplicate data’.
Alternatively, you can remove observations with duplicate ID values using PROC SORT with the NODUPKEY option: proc sort data=<dataset> out=sorted nodupkey equals; by ID; run;
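A slightly fuller sketch of the PROC SORT approach, assuming a hypothetical dataset labdata keyed by ID (the values are invented). With NODUPKEY, only the first observation for each BY value is kept, and EQUALS preserves the input order within each BY group, so "first" means first as read. The DUPOUT= option writes the discarded observations to a separate dataset so you can review what was removed before accepting the result.

```sas
/* Hypothetical data with a repeated ID */
data labdata;
  input ID labTest :$20. result;
  datalines;
101 HEMOGLOBIN 13.5
102 HEMOGLOBIN 14.1
101 HEMOGLOBIN 13.5
;
run;

/* Keep the first observation per ID; send the removed ones to dups */
proc sort data=labdata out=sorted nodupkey dupout=dups equals;
  by ID;
run;

/* Review the observations that NODUPKEY removed */
proc print data=dups;
run;
```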