Using PROC PRINT to Validate Clinical Data

When your data isn't clean, you need to locate the errors and correct them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover the PROC PRINT procedure.

  • The first step is to identify the errors in a raw data file. Usually the DVP/DVS section of our DMP defines what is considered 'clean' data and what counts as a data error.
    • Study your data
  • Then validate using the PROC PRINT procedure.
  • Finally, clean the data using DATA steps with assignment statements and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

In the example below, our lab data ranges table contains missing values. We would also like to convert the lab test names to uppercase.

[Screenshot: clinical raw data]
[Screenshot: PROC PRINT data validation code]
[Screenshot: PROC PRINT output, data validation]

From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.
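
Since the validation code itself is only visible in the screenshot, here is a minimal sketch of the same idea. The data set name work.lab_ranges and the sample rows are made up for illustration; only the variable names labTest, unit and EffectiveEndDate come from this post. The WHERE statement limits the listing to records that fail the checks.

```sas
/* Hypothetical sample data; names and values are illustrative only */
data work.lab_ranges;
    infile datalines dsd truncover;
    input labTest :$20. unit :$10. EffectiveEndDate :date9.;
    format EffectiveEndDate date9.;
    datalines;
Hemoglobin,g/dL,31DEC2025
wbc,,31DEC2025
Platelets,k/cumm,
;
run;

/* Print only the records with a missing unit or a missing end date */
proc print data=work.lab_ranges;
    where missing(unit) or missing(EffectiveEndDate);
    var labTest unit EffectiveEndDate;
    title 'Lab ranges: records with missing values';
run;
title;
```

Your own WHERE conditions should follow whatever your validation specifications define as missing or invalid.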

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use the DATA step to update datasets/tables/domains when data is invalid or missing as per protocol requirements.

In our example, we have lab data ranges for a study that has started, but certain information is missing or invalid.

To convert our lab test to uppercase, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.

[Screenshot: DATA step code for data cleaning]

[Screenshot: final dataset after data validation and data cleaning]

From our final dataset, we can verify that there are no missing values. We converted labTest to uppercase and updated unit and EffectiveEndDate to k/cumm and 31DEC2025 respectively.
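
The cleaning step described in this section could look roughly like the sketch below. It is an approximation, not the exact code from the screenshot: the data set names and sample rows are hypothetical, and it assumes that every missing unit should become k/cumm, which in a real study would depend on the lab test.

```sas
/* Hypothetical raw data; names and values are illustrative only */
data work.lab_ranges;
    infile datalines dsd truncover;
    input labTest :$20. unit :$10. EffectiveEndDate :date9.;
    format EffectiveEndDate date9.;
    datalines;
wbc,,31DEC2025
Platelets,k/cumm,
;
run;

data work.lab_ranges_clean;
    set work.lab_ranges;
    /* Assignment statement: convert the lab test name to uppercase */
    labTest = upcase(labTest);
    /* IF-THEN statements: fill in values only where they are missing */
    if missing(unit) then unit = 'k/cumm';
    if missing(EffectiveEndDate) then EffectiveEndDate = '31DEC2025'd;
run;

/* Re-run the check: no rows should be listed if the cleaning worked */
proc print data=work.lab_ranges_clean;
    where missing(unit) or missing(EffectiveEndDate);
run;
```

Writing to a new data set rather than overwriting the raw one keeps the original values available for comparison.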

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog, 'Using PROC FREQ to Validate Clinical Data'. To find or remove duplicates, check out my previous post, 'Finding Duplicate data'.

Alternatively, use PROC SORT with the NODUPKEY option: proc sort data=<dataset> out=sorted nodupkey equals; by ID; run;
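
As a small illustration of the PROC SORT approach, the sketch below also adds the DUPOUT= option so that the removed duplicates are saved for review instead of being dropped silently. The data set work.subjects and its rows are hypothetical.

```sas
/* Hypothetical data with a repeated ID */
data work.subjects;
    input ID visit $;
    datalines;
101 V1
102 V1
101 V2
;
run;

/* NODUPKEY keeps one record per ID; EQUALS preserves input order within */
/* each ID; DUPOUT= writes the discarded records to a separate data set  */
proc sort data=work.subjects out=work.sorted dupout=work.dups
          nodupkey equals;
    by ID;
run;

/* Review the records that were removed as duplicates */
proc print data=work.dups;
run;
```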


Five (5) Types of Errors When Writing SAS Programs

Fortunately, the SAS system fixes some mistakes made by SAS programmers. For example, SAS has gotten so smart over the years that it is now hard to get an error just by misspelling a keyword. If you misspell a keyword in a SAS program, SAS will often figure out what you meant, run the statement correctly in spite of your poor spelling skills, and write a warning to the log. But SAS cannot fix all programming errors for you, so here are some of the most common errors and how to debug them.

Syntax = compile-time errors, when a statement does not follow the rules of the SAS language
For example: missing semicolon [proc means data=work.demog run;]

Semantic = compile-time errors, when the language element is correctly formed but is not valid for that usage
For example: referencing a libref that has not been assigned. (A DATA step that produces wrong results with no error message is a logic error; the log will not flag it.)

Execution-time = when SAS attempts to execute a program and execution fails

Data = execution-time errors, when data values are invalid
For example: missing values were generated, numeric-to-character conversion, invalid data, or a truncated character field

Macro-related = when you use the macro facility incorrectly
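
To make the data error type concrete, here is a minimal sketch with made-up values. The second record has text where a number is expected, so SAS sets age to missing for that record and writes an 'Invalid data for age' note to the log, but the step still completes.

```sas
/* Hypothetical data: the value abc cannot be read as a number */
data work.demog;
    input patient_id age;
    datalines;
1 34
2 abc
3 51
;
run;

/* The bad record is still in the data set, with age set to missing */
proc print data=work.demog;
run;
```

Because the step finishes without an ERROR message, this kind of problem is easy to miss unless you read the notes in the log.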

The most important rule in debugging SAS programs is to always check the SAS log. It is important to review the log messages each time you submit your program. Start at the top of the log and look for ERROR, WARNING and NOTE messages.

WARNING: The data set WORK.DEMOG may be incomplete. When this step was stopped there were 0 observations and 5 variables.

This message tells you that SAS did run the DATA step, but for some reason the resulting data set has zero observations. This could be a non-issue, but generally speaking, when you go to the trouble of creating a data set, you want some data in it.

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.DEMOG has 2 observations and 2 variables.

Notes are simply there to inform you of the status of your program. If you were expecting 27 observations, one for each input record, then the note above would tell you that something went wrong. Notes can also help you streamline your code by writing more efficient programs. For example, if a report takes too long to run, the run-time (total process time) notes in the log show you which steps to look at.

A missing semicolon is notorious for producing misleading error messages. The compiler depends on a sequence of keywords to identify the type of statement. If you leave out a semicolon, you hide the keyword of the next statement. The compiler is likely to find something wrong, but it is usually not the real mistake, which is the missing semicolon. Hence the errors and warnings are just hints about what the compiler is seeing instead of the underlying problem.
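
The short sketch below shows the effect. It uses sashelp.class, a sample data set that normally ships with SAS, as a stand-in for your own data. The first step is broken on purpose, so expect a syntax error in the log that points at RUN rather than at the missing semicolon.

```sas
/* Broken on purpose: no semicolon after the PROC MEANS statement,  */
/* so RUN is read as part of that statement, not as a new statement */
proc means data=sashelp.class
run;

/* Corrected version: the semicolon ends the PROC MEANS statement */
proc means data=sashelp.class;
run;
```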

One final note: you can insert PUTLOG statements to help identify errors:

data demog;
set edc.demog_summary;
by patient_id;
if first.patient_id=1 then race='white';
/* Write the current value of race to the log for every observation */
putlog race=;
run;

proc print data=demog;
run;

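The DATA step debugger offers SAS programmers another way to investigate logic errors. SAS processes a DATA step in two phases: it compiles the step, then executes it. A step can compile cleanly and still produce wrong results, and the debugger lets you watch the execution phase one statement at a time. To invoke the debugger, add / DEBUG to the end of your DATA statement and then run your DATA step.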

If we modify the previous DATA step:

data demog / DEBUG;
set edc.demog_summary;
by patient_id;
if first.patient_id=1 then race='white';
run;

After you submit the above code, two windows appear: the DEBUGGER LOG window and the DEBUGGER SOURCE window. As you may have imagined, the DEBUGGER LOG window contains messages from the debugger and a command line. The SOURCE window contains your DATA step statements with the current line highlighted. SAS executes each line of your program for the first observation, then returns to the top of the DATA step for the second observation, and so on.

As you can see, there are many ways to check your SAS programs for errors, even when the output looks fine. Notes are just as important as warnings and error messages. I strongly recommend that you learn how to use the debugger, as it can save lots of time when debugging your program!

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.