Using PROC PRINT to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and correct them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover the PRINT procedure (PROC PRINT).

  • The first step is to identify the errors in the raw data file. The DVP/DVS section of our DMP usually defines what is considered ‘clean’ data and what counts as a data error.
    • Study your data.
  • Then validate the data using the PROC PRINT procedure.
  • Finally, clean the data using DATA steps with assignment and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

In the example below, our lab data ranges table contains missing values. We would also like to convert the lab test names to uppercase.

[Screenshot: clinical raw data]
[Screenshot: PROC PRINT data validation code]
[Screenshot: PROC PRINT output, data validation]


From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.
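
A validation step of this kind could be sketched as follows. The dataset name work.labranges and the sample rows are assumptions made up for illustration; the variable names LabTest, Unit and EffectiveEndDate follow the final dataset described later in this post, and the three checks shown (missing unit, missing end date, lab test not in uppercase) are only one possible reading of the specification.

```sas
/* Hypothetical sample data: dataset name and rows are assumptions */
data work.labranges;
   infile datalines dsd truncover;
   input LabTest :$20. Unit :$10. EffectiveEndDate :date9.;
   format EffectiveEndDate date9.;
   datalines;
wbc,k/cumm,31DEC2025
HGB,,31DEC2025
PLT,k/cumm,
RBC,m/cumm,31DEC2025
;
run;

/* Print only the observations that fail at least one check */
proc print data=work.labranges;
   where missing(Unit)
      or missing(EffectiveEndDate)
      or LabTest ne upcase(LabTest);
   var LabTest Unit EffectiveEndDate;
   title 'Lab range records that need cleaning';
run;
title;
```

The WHERE statement keeps the listing short: only rows with a problem are printed, so the clean RBC row in the sample data does not appear in the output.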

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use the DATA step to update the datasets/tables/domains when there is invalid or missing data, as per protocol requirements.

In our example, we have lab data ranges for a study that has already started, but certain information is missing or invalid.

To convert our lab test names to uppercase, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.
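
The assignment and IF statements described above might look like the sketch below. The dataset names work.labranges and work.labranges_clean and the sample rows are assumptions; the replacement values k/cumm and 31DEC2025 are the ones reported for the final dataset in this post.

```sas
/* Hypothetical input: dataset name and rows are assumptions */
data work.labranges;
   infile datalines dsd truncover;
   input LabTest :$20. Unit :$10. EffectiveEndDate :date9.;
   format EffectiveEndDate date9.;
   datalines;
wbc,k/cumm,31DEC2025
PLT,,
;
run;

data work.labranges_clean;
   set work.labranges;
   /* Assignment statement: executes for every observation */
   LabTest = upcase(LabTest);
   /* IF-THEN statements: execute only when the condition is true */
   if missing(Unit) then Unit = 'k/cumm';
   if missing(EffectiveEndDate) then EffectiveEndDate = '31DEC2025'd;
run;

/* Re-check the cleaned data */
proc print data=work.labranges_clean;
run;
```

Writing the result to a new dataset, rather than overwriting the input, keeps the raw data available for comparison.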

[Screenshot: data cleaning code]

[Screenshot: final dataset after data validation and data cleaning]

From our final dataset, we can verify that there are no missing values. We converted labTest to uppercase, and we updated the missing unit and EffectiveEnddate values to k/cumm and 31DEC2025 respectively.

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog, ‘Using PROC FREQ to Validate Clinical Data’. To find or remove duplicates, check out my previous post, ‘Finding Duplicate Data’, or use PROC SORT with the NODUPKEY option:

proc sort data=<dataset> out=sorted nodupkey equals; by ID; run;
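
A slightly fuller sketch of the PROC SORT approach is shown below. The dataset names and sample rows are assumptions for illustration. The DUPOUT= option, which is not used in the one-liner above, writes the removed duplicates to a separate dataset so they can be reviewed instead of silently dropped.

```sas
/* Hypothetical data with one duplicated ID */
data work.labs;
   input ID LabTest $;
   datalines;
1 WBC
2 HGB
2 HGB
3 PLT
;
run;

/* NODUPKEY keeps the first observation for each BY value;   */
/* EQUALS preserves the original order within each BY group; */
/* DUPOUT= captures the observations that were removed.      */
proc sort data=work.labs out=work.sorted dupout=work.dups nodupkey equals;
   by ID;
run;

/* Review the duplicates that were removed */
proc print data=work.dups;
run;
```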


From Non-SAS Programmer to SAS Programmer Part II

Previously, we wrote about how you can become a SAS Programmer with little or no programming background.

Today, I want to share a new link where you can download SAS Studio for free and practice. Thanks to Andrew from statskom for the tip. Visit his blog for more SAS tips.

Here are the steps you need to follow in order to use the free SAS University Edition provided by SAS:

1- Create a SAS profile and select the environment based on your operating system in order to download the SAS® University Edition. I chose Oracle VirtualBox. The available options are Oracle VirtualBox in Windows, Macintosh, and Linux operating environments.

2- You will receive an email with a link to download the SAS edition for the environment you selected in step 1. Click the link. It could take up to an hour for the entire program to download.

[Screenshot: SAS University Edition]

3- Go to https://www.virtualbox.org/wiki/Downloads to install Oracle VirtualBox.

4- Add the SAS University Edition vApp downloaded in step 2 to the VirtualBox installed in step 3.

5- Create a folder for your data and results.

6- Start the SAS University Edition vApp.

7- Open the SAS University Edition by opening your web browser and going to http://localhost:10080. From the SAS University Edition: Information Center, click Start SAS Studio.

There you have it! You now have access to SAS and can start practicing your new programming language.
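
As a first program to try in SAS Studio, something like the sketch below should work. It assumes the SASHELP sample library is available in your installation, which is normally the case; SASHELP.CLASS is a small sample table that ships with SAS.

```sas
/* Print the first five rows of a sample table.            */
/* No VAR statement is used, so all variables are printed. */
proc print data=sashelp.class (obs=5);
   title 'First five rows of SASHELP.CLASS';
run;
title;
```

Paste the code into the program editor, run it, and check the Results and Log tabs.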

For more information about the SAS University Edition, see the FAQs and videos at http://support.sas.com/software/products/university-edition/index.html.

For Data Management and EDC training, please contact RA eClinical Solutions.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica, Open Source and Oracle Clinical.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

SAS Proficiency Test

Here are some common SAS questions:

1- What happens if no VAR statement is used in PROC PRINT?

2- Explain how SAS processes a DATA step.

3- Explain what a (DATA step) SELECT statement does.

4- What is the default length of a numeric variable?

5- What is the difference between DO WHILE and DO UNTIL?

6- What does BY after SET A; do?

7- What does BY after SET A B; do?

8- What is a subsetting IF statement?

If you pass it doesn’t mean you’re good; but if you’re good, you should pass

9- What’s the difference between INFILE and INPUT?

10- What information is given by a PROC CONTENTS?

11- How do you modify the header?

12- Which elements can you change? (in reference to questions 10-11)

13- Which elements can you NOT change without re-creating the dataset? (in reference to questions 10-11)

14- What is the purpose of the single trailing @ sign?

15- What statements are used to include or exclude specific variables in a DATA step?

16- How can you generate a 2-dimensional cross-tabulation?

17- In the macro language, all macro commands begin with?

18- What occurs if multiple datasets are included in a SET statement?

19- What symbol is used to concatenate two character values?

20- Explain the SUBSTR function.

21- What are the differences among these functions? ROUND(), CEIL(), FLOOR()

22- What happens if you use the same IN= variable for 2 datasets?

23- How do you begin a DATA step if you do not want to create a SAS dataset?

24- What are the automatic variables first. and last.?

25- What does a 2-level input dataset name such as MydataSet.Mydata indicate?

26- What is the default libname, where one-level dataset names go?

27- What statement allows you to keep track of previous values of variables or keep a running total?

28- What happens if you retain the value of a variable in the incoming dataset?

29- When an invalid data field is encountered when inputting a numeric variable, what happens?

30- What’s the difference between the $w. and $CHARw. informats?

31- Explain the _type_ variable generated by PROC SUMMARY.

32- What is the purpose of a %GLOBAL statement?

33- What do CALL SYMPUT and SYMGET do?

34- What date is the reference date for calculating the value of a SAS date variable?

35- What is _n_?

There are 10 kinds of people in the world: those who understand binary, and those who don’t.

These are the most common questions you will be asked during a SAS programming job interview, or if you plan to take a SAS certification exam.

In future blogs, I will try to cover each of those questions individually with some demonstrations.
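
As a small preview, the sketch below touches on questions 5, 19, 21, 23, 34 and 35 in a single DATA step. It is meant as a starting point for experimenting, not as a set of model answers; the variable names are arbitrary.

```sas
data _null_;   /* Q23: _NULL_ runs the step without creating a dataset */

   /* Q5: DO WHILE tests the condition at the top of the loop, */
   /* so the body is skipped when the condition is false       */
   i = 5;
   do while (i < 5);
      put 'DO WHILE body ran';
   end;

   /* DO UNTIL tests the condition at the bottom of the loop,  */
   /* so the body always runs at least once                    */
   do until (i >= 5);
      put 'DO UNTIL body ran once';
   end;

   /* Q19: || concatenates two character values */
   full = 'SAS' || ' ' || 'Studio';
   put full=;

   /* Q21: nearest integer, next integer up, next integer down */
   r = round(2.6);
   c = ceil(2.1);
   f = floor(2.9);
   put r= c= f=;   /* writes r=3 c=3 f=2 to the log */

   /* Q34: SAS dates are stored as days since 01JAN1960 */
   d = '01JAN1960'd;
   put d=;         /* writes d=0 to the log */

   /* Q35: _N_ counts the iterations of the DATA step */
   put _n_=;
run;
```

All output goes to the SAS log, so this step can be run without any input data.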

Good luck!

Your comments and questions are valued and encouraged.

From Non-SAS Programmer to SAS Programmer

SAS programmers come from many different educational backgrounds. Many started their careers as Data Managers in a CRO environment and grew to become SAS programmers. Others have gone to college and pursued degrees in math, statistics or computer science.

Do you have SAS skills? First, you need to find out more about the skills desired in statistical programming and slowly start to learn what SAS programmers and statisticians do in the pharmaceutical industry. It is also important to understand the drug development and regulatory process, so that you have a better understanding of the industry as a whole as well as the drug approval process.

In addition, I have personally attended several workshops on Statistics for Non-Statisticians provided by several of my past employers/clients (GSK, Sanofi-Aventis, etc.) so I could have a greater understanding of the role of statistics. I am personally more inclined towards EDC development than towards becoming a biostatistician, but these are just a few of the steps you could take to grow your career as a SAS programmer.

Practice, Practice, Practice!

To begin learning how to actually program in SAS, it would be a good idea to enroll in a SAS course provided by the SAS Institute near you or via eLearning. I have taken the course SAS Programming 1: Essentials, and I would recommend it. You could also attend SUGI conferences and join other user groups near your city/country. Seek every opportunity to gain further understanding of how to program efficiently in the pharmaceutical industry. It could well land you a Junior SAS programming position.

Transitioning to a SAS Programming role: Now that you have gotten your first SAS programming job, you will need to continue your professional development and attend additional training, workshops, seminars and study workgroup meetings. The SAS Institute provides second-level, more advanced courses: Programming II: Manipulating Data with the Data Step, SAS Macro Language, and SAS Macro Programming Advanced Topics. There are also SAS certification courses available to help you prepare to become a SAS certified programmer.

There is a light at the end of the tunnel: Advance!

Your ongoing development will be very exciting and challenging. Continue attending SAS classes as needed, and attend industry-related conferences such as PharmaSUG to gain additional knowledge and insight on how to perform your job more effectively and efficiently.

As you can see, it is possible to ‘grow’ a SAS programmer from a non-programming background into an experienced programmer. All of the classes, training, and projects you work on are crucial in expanding your SAS knowledge and will open up a very exciting career ahead of you.
