Using PROC UNIVARIATE to Validate Clinical Data

When your data isn’t clean, you need to find the errors and confirm them. SAS procedures can help determine whether the data is clean. Today, we will cover PROC UNIVARIATE.

  • The first step is to identify the errors in the raw data file. The DVP/DVS section of the DMP usually defines what counts as ‘clean’ data and what counts as a data error.
    • Study your data
  • Then validate the data with PROC UNIVARIATE.
  • Find extreme values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

Previously, we used PROC FREQ to find missing and unique values. Today, we will use PROC UNIVARIATE, which is useful for finding outliers: values that fall outside the expected range.

proc univariate data=labdata nextrobs=10;
run;

Figure: Lab data result using PROC UNIVARIATE

For data validation, the last two tables of this report are the most useful. The missing values table shows that the variable LBRESULT has 260 missing values. There are 457 observations. The extreme observations table lists the lowest and highest values in the dataset, which are possible outliers. The nextrobs=10 option sets the number of extreme observations displayed at each end of the range (the default is 5). To suppress the table, use nextrobs=0.
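
If you only need those two tables, one possible approach is to restrict the output with ODS and add an ID statement so that each extreme value can be traced back to a record. This is a minimal sketch, not the exact code behind the report above: USUBJID is a hypothetical subject identifier that may be named differently in your data, and the VAR statement assumes LBRESULT is the only variable you want to check.

```sas
/* Sketch only: USUBJID is a hypothetical subject identifier.         */
/* LABDATA and LBRESULT are the dataset and variable discussed above. */
ods select ExtremeObs MissingValues;   /* print only these two tables */

proc univariate data=labdata nextrobs=10;
  var lbresult;   /* analyze only the lab result variable */
  id usubjid;     /* label each extreme observation with this ID */
run;
```

The ODS SELECT statement applies only to the next procedure step, so later procedures print their full output again. The missing values table appears only when the analysis variable actually has missing values.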
