Tag Archives: SAS training

Using PROC PRINT to Validate Clinical Data

Using PROC PRINT to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and validate them.  We can use SAS Procedures to determine whether or not the data is clean. Today, we will cover the PROC PRINT procedure.

  • First step is to identify the errors in a raw data file. Usually, in our DMP, in the DVP/DVS section, we can identify what it is considered ‘clean’ or data errors.
    • Study your data
  • Then validate using PROC PRINT procedure.
  • We will clean the data using data set steps with assignments and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-ranges values
  • Duplicate values

In the example below, our lab data ranges table we find missing values. We also would like to update the lab test to UPPER case.

Clinical Raw data
Proc Print data val code
PROC PRINT output – data validation

 

From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use the data step to update the datasets/tables/domains when there is an invalid or missing data as per protocol requirements.

In our example, we have a lab data ranges for a study that has started but certain information is missing or invalid.

To convert our lab test in upper case, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.

Proc Print data cleaning

 

 

 

 

 

 

 

Data Validation and data cleaning final dataset

 

 

 

 

 

 

 

From our final dataset, we can verify that there are no missing values. We converted our labTest in uppercase and we updated the unit and  EffectiveEnddate to k/cumm and 31DEC2025 respectively.

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog ‘Using PROC FREQ to Validate Clinical Data’. To find duplicates/remove duplicates, check out my previous post-Finding Duplicate data.

or use a proc sort data=<dataset> out=sorted nodupkey equals; by ID; run;

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

SAS Proficiency Test

Here are some common SAS questions:

1- What happens if no VAR statement is used in PROC PRINT?

2- Understand how SAS processes a data step

3- Explain what a (data step) SELECT statement does

4- What is the default length of a numeric variable?

5-What is the difference between DO WHILE and DO UNTIL?

6-What does BY after SET A; do?

7-What does BY after SET A B; do?

8-What is a sub-setting IF statement?

If you pass it doesn’t mean you’re good; but if you’re good, you should pass

9- What’s the difference between INFILE and INPUT?

10-What information is given by a Proc Contents?

11-How do you modify the header?

12-Which elements can you change? (in reference to question 10-11)

13-Which elements can you NOT change without re-creating the dataset?(in reference to question 10-11)

14-What is the purpose of the single trailing @ sign?

15-What statements are used to include or exclude specific variables in a data step?

16-How can you generate a 2-dimensional cross-tabulation?

17-In the macro language, all macro commands begin with?

18-What occurs if multiple datasets are included in a SET statement?

19-What symbol is used to concatenate two character values?

20-Explain the substr function

21-What are the differences among these functions? Round(), Ceil(), Floor()

22-What happens if you use the same in= variable for 2 datasets?

23-How do you begin a data step if you do not want to create a SAS dataset?

24-What are the automatic variables first. and last. ?

25-What does a 2 level input dataset name such as MydataSet.Mydata indicate ?

26-What is the default libname, where one-level dataset names go?

27-What statement allows you to keep track of previous values of variables or keep a running total?

28-What happens if you retain the value of a variable in the incoming dataset?

29-When an invalid data field is encountered when inputting a numeric variable, what happens?

30-What’s the difference between $w. and $CHARw. informats?

31-Explain the _type_ variable generated by proc summary

32-What is the purpose of a %GLOBAL statement?

33-What do CALL SYMPUT and CALL SYMGET do?

34-What date is the reference date for calculating the value of a SAS date variable?

35-What is _n_?

There are 10 kinds of people in the world: those who understand binary, and those who don’t.

Those are SAS most common questions you will be asked during a SAS programming job interview or if you plan in taking the SAS based certification.

In future blogs, I will try to cover each of those questions individually with some demonstrations.

Good luck!

Your comments and questions are valued and encouraged.
Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica, Open Source and Oracle Clinical.

 

anayansi gamboa

SAS for True Beginners Part 2

FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

Source: Wilton Graves, University of Georgia

SAS for True Beginners Part 1

FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

Source: Wilton Graves, University of Georgia