Using PROC UNIVARIATE to Validate Clinical Data
When your data isn’t clean, you need to locate the errors and validate the data. SAS procedures can help determine whether or not the data is clean. Today, we will cover PROC UNIVARIATE.
- The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of the DMP defines what is considered ‘clean’ data and what is considered a data error.
- Study your data
- Then validate using PROC UNIVARIATE.
- Find extreme values
When you validate your data, you are looking for:
- Missing values
- Invalid values
- Out-of-range values
- Duplicate values
Previously, we used PROC FREQ to find missing and unique values. Today, we will use PROC UNIVARIATE, which is useful for finding outliers: values that fall outside the expected range.
proc univariate data=labdata nextrobs=10;
run;
For validating data, you will be most interested in the last two tables of this report: Extreme Observations and Missing Values. The missing values table shows that the variable LBRESULT has 260 missing values. There are 457 observations. The extreme observations table shows the lowest and highest values (possible outliers) in the dataset. The NEXTROBS=10 option specifies the number of extreme observations to display at each end, so the report lists the 10 lowest and the 10 highest values (the default is 5). To suppress the table, use NEXTROBS=0.
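If only the two validation tables are of interest, the report can be trimmed with ODS SELECT. The following is a minimal sketch, not necessarily how the original report was produced: it assumes LBRESULT is a numeric variable in LABDATA, and USUBJID is a hypothetical subject identifier used here only to label the extreme rows.

```sas
/* Sketch: print only the Extreme Observations and Missing Values tables. */
/* Assumption: LBRESULT is numeric. USUBJID is a hypothetical identifier  */
/* variable; replace it with the subject ID used in your own dataset.     */
ods select ExtremeObs MissingValues;

proc univariate data=labdata nextrobs=10;
   var lbresult;   /* restrict the analysis to the lab result variable  */
   id usubjid;     /* show the subject ID next to each extreme value    */
run;
```

The ID statement adds the identifier to the extreme observations table, which makes it easier to trace a suspicious value back to the subject it belongs to. The ODS SELECT list applies only to this PROC step and is reset afterwards.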