Using PROC FREQ to Validate Clinical Data
When your data isn’t clean, you need to locate the errors and validate the data. SAS procedures can help you determine whether or not the data is clean. Today, we will cover PROC FREQ.
- The first step is to identify the errors in the raw data file. Usually, the DVP/DVS section of the DMP defines what is considered ‘clean’ data and what counts as a data error.
- Study your data
- Then validate it using PROC FREQ.
- Spot distinct values
When you validate your data, you are looking for:
- Missing values
- Invalid values
- Out-of-range values
- Duplicate values
Previously, we used PROC PRINT to find missing/invalid values. Today, we will use PROC FREQ to view a frequency table of the unique values for a variable. The TABLES statement in a PROC FREQ step specifies which frequency tables to produce.
/* NLEVELS reports the number of distinct values per variable; */
/* NOPRINT on TABLES suppresses the individual frequency tables. */
proc freq data=labdataranges nlevels;
   tables _all_ / noprint;
run;
So how many unique lab tests do we have in our raw data file? We know that our SAS data set has 12 records. The Levels column of this report shows that labtest has 3 distinct values, which means the other 9 records repeat a labtest value that already appears. For this type of data (lab ranges), though, this is correct. We are using it only as an example; you can check any type of data the same way.
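To see which values repeat, and how often, the counts can be written to a data set instead of being read off a report. The following is a minimal sketch, not the original program: `labranges_demo` and `labtest_counts` are hypothetical names, and the test names and range values are made up purely for illustration (12 records, 3 distinct lab tests, mirroring the numbers above).

```sas
/* Hypothetical stand-in for the lab ranges data.          */
/* Test names and low/high values are invented, not real   */
/* reference ranges.                                       */
data labranges_demo;
   input labtest $ sex $ low high;
   datalines;
ALT F 1 10
ALT F 2 20
ALT M 1 10
ALT M 2 20
AST F 1 10
AST F 2 20
AST M 1 10
AST M 2 20
GLU F 1 10
GLU F 2 20
GLU M 1 10
GLU M 2 20
;
run;

/* NOPRINT on the PROC statement suppresses all displayed output. */
/* OUT= saves one row per distinct labtest with its COUNT;        */
/* WHERE= keeps only the values that occur more than once.        */
proc freq data=labranges_demo noprint;
   tables labtest / out=labtest_counts(keep=labtest count where=(count > 1));
run;

proc print data=labtest_counts noobs;
run;
```

In this made-up data each test appears 4 times, so `labtest_counts` holds three rows with COUNT = 4: 3 repeats per test, 9 in total. On a key variable such as a patient ID, any row surviving the `count > 1` filter would point to a potential duplicate.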
So remember: to view the distinct values of a variable, use PROC FREQ, which produces one-way to n-way frequency tables. For a one-way table, you can view the frequency, percent, cumulative frequency, and cumulative percent of each value. With the NLEVELS option, PROC FREQ also displays a table that gives the number of distinct values for each variable named in the TABLES statement.
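As a quick illustration of one-way and two-way requests, here is a small sketch that uses `sashelp.class`, a sample data set shipped with SAS, as a stand-in for clinical data (this is an assumption for the example; substitute your own data set and variables).

```sas
/* sashelp.class is used only as a convenient stand-in data set. */
proc freq data=sashelp.class nlevels;
   tables sex age;    /* two one-way tables: frequency, percent, and cumulative columns */
   tables sex*age;    /* one two-way (crosstabulation) table */
run;
```

Because NLEVELS is specified, the output also includes a table with the number of levels for the variables named in the TABLES statements.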
Example: The SEX variable has the expected values F and M; however, it is missing for two observations.
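A check like this one can be sketched as follows. The data set `demog_demo`, the variable `patid`, and all values are hypothetical and exist only to reproduce the situation described: valid F/M values with two missing observations.

```sas
/* Hypothetical demographics data; a period is read as a missing value. */
data demog_demo;
   input patid $ sex $;
   datalines;
001 F
002 M
003 .
004 F
005 .
006 M
;
run;

proc freq data=demog_demo nlevels;
   /* Default: missing values are left out of the table and */
   /* reported underneath it as "Frequency Missing".        */
   tables sex;
   /* MISSING treats the missing value as a level of its own, */
   /* so it appears as a row and counts toward the percents.  */
   tables sex / missing;
run;
```

The first request is usually enough to spot that values are missing; the MISSING option is useful when you want the missing category counted alongside F and M.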