Using PROC FREQ to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and validate them.  We can use SAS Procedures to determine whether or not the data is clean. Today, we will cover the PROC FREQ procedure.

  • First step is to identify the errors in a raw data file. Usually, in our DMP, in the DVP/DVS section, we can identify what it is considered ‘clean’ or data errors.
    • Study your data
  • Then validate using PROC FREQ procedure.
  • Spot distinct values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-ranges values
  • Duplicate values

Previously, we used PROC PRINT to find missing/invalid values. Today, we will use PROC FREQ  to view a frequency table of the unique values for a variable. The TABLES statement in a PROC FREQ step specified which frequency tables to produce.

proc freq data=labdataranges nlevels;
table _all_ / noprint;

So how many unique lab test do we have on our raw data file? We know that our sas data set has 12 records. The Levels column from this report,  the labtest=3 uniques. Which means, we must have 9 duplicates labtest in total. For this type of data [lab ranges] though, this is correct. We are using it as an example as you can check any type of data.

Proc Freq sas




Lab test data ranges













So remember, to view the distinct values for a variable, you use PROC FREQ that produces frequency tables (nway/one way) . You can view the frequency, percent, cumulative frequency, and cumulative percentage. With the NLEVELS options, PROC FREQ displays a table that provides the number of distinct values for each variable name in the table statement.

Example: SEX variable has the correct values F or M as expected; however, it is missing for two observations.

Missing values proc freq






To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn