Category Archives: Tips

Using PROC UNIVARIATE to Validate Clinical Data

Using PROC UNIVARIATE to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and validate them.  We can use SAS Procedures to determine whether or not the data is clean. Today, we will cover the PROC  UNIVARIATE procedure.

  • First step is to identify the errors in a raw data file. Usually, in our DMP, in the DVP/DVS section, we can identify what it is considered ‘clean’ or data errors.
    • Study your data
  • Then validate using PROC UNIVARIATE procedure.
  • Find extreme values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-ranges values
  • Duplicate values

Previously, we used PROC FREQ to find missing/unique values. Today, we will use PROC UNIVARIATE which is useful for finding data outliers, which are data that falls outside expected values.

proc univariate data=labdata nextrobs=10;
var LBRESULT;
run;

Lab data result using Univariate

 

 

 

 

 

 

 

 

 

For validating data, you will be more interested in the last two tables from this report. The missing values table shows that the variable LBRESULT has 260 missing values. There are 457 observations. The extreme observations table can tell us the lowest and highest values (possible outliers) from our dataset. The nextrobs=10 specify the number of extreme observations to display on the report. To suppress it use nextrobs=0.

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

Advertisements

Using PROC FREQ to Validate Clinical Data

Using PROC FREQ to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and validate them.  We can use SAS Procedures to determine whether or not the data is clean. Today, we will cover the PROC FREQ procedure.

  • First step is to identify the errors in a raw data file. Usually, in our DMP, in the DVP/DVS section, we can identify what it is considered ‘clean’ or data errors.
    • Study your data
  • Then validate using PROC FREQ procedure.
  • Spot distinct values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-ranges values
  • Duplicate values

Previously, we used PROC PRINT to find missing/invalid values. Today, we will use PROC FREQ  to view a frequency table of the unique values for a variable. The TABLES statement in a PROC FREQ step specified which frequency tables to produce.

proc freq data=labdataranges nlevels;
table _all_ / noprint;
run;

So how many unique lab test do we have on our raw data file? We know that our sas data set has 12 records. The Levels column from this report,  the labtest=3 uniques. Which means, we must have 9 duplicates labtest in total. For this type of data [lab ranges] though, this is correct. We are using it as an example as you can check any type of data.

Proc Freq sas

 

 

 

Lab test data ranges

 

 

 

 

 

 

 

 

 

 

 

 

So remember, to view the distinct values for a variable, you use PROC FREQ that produces frequency tables (nway/one way) . You can view the frequency, percent, cumulative frequency, and cumulative percentage. With the NLEVELS options, PROC FREQ displays a table that provides the number of distinct values for each variable name in the table statement.

Example: SEX variable has the correct values F or M as expected; however, it is missing for two observations.

Missing values proc freq

 

 

 

 

 

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

Using PROC PRINT to Validate Clinical Data

Using PROC PRINT to Validate Clinical Data

When your data isn’t clean, you need to locate the errors and validate them.  We can use SAS Procedures to determine whether or not the data is clean. Today, we will cover the PROC PRINT procedure.

  • First step is to identify the errors in a raw data file. Usually, in our DMP, in the DVP/DVS section, we can identify what it is considered ‘clean’ or data errors.
    • Study your data
  • Then validate using PROC PRINT procedure.
  • We will clean the data using data set steps with assignments and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-ranges values
  • Duplicate values

In the example below, our lab data ranges table we find missing values. We also would like to update the lab test to UPPER case.

Clinical Raw data
Proc Print data val code
PROC PRINT output – data validation

 

From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use the data step to update the datasets/tables/domains when there is an invalid or missing data as per protocol requirements.

In our example, we have a lab data ranges for a study that has started but certain information is missing or invalid.

To convert our lab test in upper case, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.

Proc Print data cleaning

 

 

 

 

 

 

 

Data Validation and data cleaning final dataset

 

 

 

 

 

 

 

From our final dataset, we can verify that there are no missing values. We converted our labTest in uppercase and we updated the unit and  EffectiveEnddate to k/cumm and 31DEC2025 respectively.

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog ‘Using PROC FREQ to Validate Clinical Data’. To find duplicates/remove duplicates, check out my previous post-Finding Duplicate data.

or use a proc sort data=<dataset> out=sorted nodupkey equals; by ID; run;

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

Count the number of discrepancies per procedure – OracleClinical (OC)

Let’s now write a quick program to count the number of discrepancies per procedure in OC/OCRDC:

Remember to comment /**/ or ***comment here*; what the program does. It is a good clinical practice to document everything so anyone can read your program and make the necessary updates, if necessary.

proc sql;
connect to oracle(path=ocpath);
create table discr as select * from connection to oracle
(Select  p.name, pd.test_order_sn detail, count(pd.test_order_sn) count, p.procedure_id procid
from discrepancy_management dm,
procedures p,
procedure_details pd
where dm.clinical_study_id=9999
and dm.procedure_id = p.procedure_id
and dm.procedure_detail_id=pd.procedure_detail_id
and p.PROCEDURE_VER_SN=pd.PROCEDURE_DETAIL_PROC_VER_SN
and dm.PROCEDURE_VER_SN=p.PROCEDURE_VER_SN
and dm.de_sub_TYPE_CODE=’MULTIVARIATE’
group by p.name, pd.test_order_sn, p.procedure_id
order by count(p.name)desc
);
/*document your code*/
proc sql;
connect to oracle(path=ocpath);
create table name as select * from connection to oracle
(select distinct p.procedure_id procid, p.name, pd.TEST_ORDER_SN detail
from  procedures p,
procedure_details pd
where p.clinical_study_id= 9999 *replace with your studyid;
and p.procedure_status_code !=’R’
and p.procedure_id=pd.procedure_id
order by procid
);
quit;

/* merge # of discrepancies with name */
proc sort data=discr;
by procid;
run;

proc sort data=name;
by procid;
run;

data discname;
merge discr (in=d) name (in=n);
by procid;
if n;
run;

proc sort data=discname ;
by descending count ;
run;

/* print out  */
proc print data=discname label;
var name numdisc percent numdcf;
label numdisc = ‘Number of discrepancies’
numdcf = ‘Number of DCFs’;
title “Number of discrepancies per Procedure”;
title2 “RA eClnica”;
run;

You could also export the report to Excel xls and have your DM / data manager review it.

Good luck and let me know if it was helpful.

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

A quick way to find duplicates from external lab data

There are many different options to find a duplicate obs or data using SAS.

I have just received data from an external service provider and I want to compare it with my EDC lab data. My goal is to find any duplicates before is useful or before I can compare it with my lab data.

My external data has over 400 records so this will be a tedious job if I use excel (the original format). This review could take over a few hours, but with SAS, we can do this in less than 30 mins.

A simple SAS program:
over 400 records…

This is our SAS code:

SAS code to find duplicate data from an external source

This is our final output. We found the duplicates including empty rows.

external data
external data, lab data

 

 

 

 

 

 

 

 

It is important to note that not all duplicates can be deleted or found. Further review by your clinical team members is required. Once we are confident that duplicates have been removed, we know our data set is more accurate.

One last thing: How to get this data onto SAS so we can perform this task? Here’s a snippet:

  1. Import the file using PROC IMPORT (assuming you have DBMS ACCESS)
  2. You can use a permanent or temporary library. In this example, we created a temporary lib name = IMPORT
  3. We want to bring over the column names as var names using GETNAMES=YES statement

PROC IMPORT OUT= WORK.IMPORT
DATAFILE= “H:\Labs\LocalLab\LabDataReview.xls”
DBMS=EXCEL2000 REPLACE;
GETNAMES=YES;
RUN;

Now you have learned a quick way to find duplicates on your external data. I hope this piece of code will make any data manager job easier. Remember, you can always enhance this program (add labels, formats, etc) and improve efficiency.

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

eClinical Training

eClinical Training

Training—Make the most of affordable, informative, educational events to polish your professional skills.

Electronic Data Capturing

Medidata Rave
Web-based Data Management System for the collection of clinical trials data.
Our consultants are available short- and long-term projects.
Credentials:
– Medidata Rave Certified
– 4+ years experience in study build
– Custom Functions / C# Developers
– Crystal Reports Developers / JReview Developers

InForm Architect / Central Designer / InForm EDC
Learn InForm at your own paced or via online

Data Management Training 101
For entry levels and those seeking to enter the clinical research industry.

BASE SAS Training 101
Interested in learning SAS for clinical programmers? Coming soon…Contact us for more details…

One-on-One Training
Interested in learning RAVE / InForm / Oracle Clinical? Contact us for more details…

From Non-SAS Programmer to SAS Programmer Part II

Previously, we wrote about how you can become a SAS Programmer with little or no programming background.

Today, I want to share a new link where you can download SAS Studio for free and practice. I have to give a thank to Andrew from statskom for the tip. Visit his blog for more SAS tips.

Here is a quick step on what you need in order to use the SAS University version for free provided by SAS:

1- Create a SAS profile and select the environment based on your operating system in order to download the SAS® University Edition. I  chose Oracle VirtualBox. The options available are: Oracle VirtualBox in Windows, Macintosh, and Linux operating environments.

2- You will receive an email where you can you download your SAS edition as per your selected environment on step 1. Click the link. It could take up to an hour for the entire program to download.

SAS University Edition

3-Go to https://www.virtualbox.org/wiki/Downloads to install the OracleVirtualBox.

4-Add the SAS University Edition vApp downloaded on step 2 to VirtualBox step 3.

OracleVM

5-Create a folder for your data and results.

6- Start the SAS University Edition vApp

7-Open the SAS University Edition by opening your web browser and typing  http://localhost:10080. From the the SAS University Edition: Information Center, click Start SAS Studio.

There you have it! You have now access to SAS and can start practicing your new programming language.

anayansigamboa sas studio anayansigamboa sas studio anayansigamboa sas studio anayansigamboa sas studio

For more information about the SAS University Edition, see the FAQs and videos at http://support.sas.com/software/products/university-edition/index.html.

For Data Management and EDC training, please contact RA eClinical Solutions.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica, Open Source and Oracle Clinica.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

Society of the Sojourner – Part II

Well it’s been quite a journey since my last post – sojourner. To sum it up I would say that I have come to the realisation that the world we live in, with its social structures that we have come accustomed to, are quite different than what they appear.

Anayansi decided to change her name to “Sojourner” literally because of her belief.  She openly believed that God told her to change her name, and it is so interesting how her name (Anayansi) has a profound meaning [She was an indigenous woman, a woman in love, who opened her heart to the blond conquistador with his steel armour, newly arrived to win glory in the Isthmus of Panama-KEY TO HAPPINESS]. She changed her name to “Sojourner” because she planned to travel.

anayansigamboa sojourner Panama
Images courtesy of Google -Panama Kuna Island, San Blas

As for my story, what I feared in 2007/2008 when I began to awake to the political and social turmoil going on in the United States of America is now coming true, I decided to return to my home country.  I had a fear then for the lack of respect for the sovereignty of individual human beings. Government has a monopoly on human beings. Without a birth certificate or passport or ID, you are nobody even though we are alive, living flesh and blood man; we are who we say we are and no government ID can change that. We have been brainwashed in believing, we are some name written on a passport ID.

I would like to go into the issues I have with the whole government system but lets just not waste time. Let’s just say, once you are awake, your eyes can never closed again. It is like a light bulb is on 24-7.

So while I was preparing to travel back to Panama [“Panama” is of indian origin, from a word meaning “the abundance of fish” or “the abundance of butterflies”], a friend of mine from the States told me to visit this psychic lady from down town (she was the one interested in having a palm reading). I don’t believe it much so I went along. After her reading, the lady asked me if I wanted a palm reading. I said, “why not.” Nothing to lose as I am already here.

All I can remember her saying ”…soon you will meet someone and move…I see a lot of water surrounding this place…lots of water…” I said to myself. “That must be Panama.” We have the canal, rivers, lakes and two oceans…When her last words were “I see a trip to Europe”. That was 2008 and now fast forward to 2014.

anayansigamboa sojourner Panama
Images courtesy of Google – Panama Canal, Amador, Balboa, Punta Pacifica

My decision was, that I must leave this city; it was not place for me; yea, I felt on called it spirit to leave it and to travel east.

It’s been 5 years since I got out of Dodge for the…onto Society of the Sojourner III -to be continued. Until then…

anayansigamboa sojourner Panama
Images courtesy of Google – Panama City, Balboa, Pacific Ocean

 Saludos y hasta luego!

Notes: sojourner means a temporary resident. Who knows that one of the most beautiful stories of the Spanish Conquest belongs to Panama? The story in question is related with Vasco Núñez de Balboa, who was a Spanish explorer and conquistador. He was the first European to see and stand in the waters of the Pacific Ocean, on September 1513. He claimed the Pacific Ocean and all its shores for Spain, which opened the way for Spanish exploration and conquest along the western coast of South America. But it was the Portuguese explorer, Ferdinand de Magalhães (not Balboa), who, because its waters seemed so calm, gave this ocean the name Pacific (meaning peaceful).

anayansigamboa Panama Pacific Ocean
Courtesy of Google – Balboa, Parque Anayansi, Pacific Ocean

What is your story? Leave your comments below.

Anayansi Gamboa, MPM, an EDC Developer Consultant and clinical programmer for the Pharmaceutical and Biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Fair Use Notice: This article/image/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law).

If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer:De inhoud van deze columns weerspiegelen niet per definitie de mening van {EDC Developer}.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

CDISC/CDASH Standards at your Fingertips

A standard database structure using CDISC (Clinical Data Interchange Standards Consortium) and CDASH (Clinical Data Acquisition Standards Harmonization) standards can facilitate the collection, exchange, reporting, and submission of clinical data to the FDA and EMEA. CDISC and CDASH standards provide reusability and scalability to EDC (electronic data capture) trials.

There are some defiance in implementing CDISC in EDC CDMS:

1. Key personnel in companies must be committed to implementing the CDISC/CDASH standards.

2. There is an initial cost for deployment of new technology: SDTM Data Translation Software, Data Storage and Hosting, Data Distribution and Reporting Software.

3. It can be difficult to understand and interpret complex SDTM Metadata concepts and the different implementation guides.

4. Deciding at what point in a study to apply the standards can be challenging: in the study design process, during data collection within the CDMS [CDASH via EDC tools], in SAS prior to report generation [ADaM], or after study completion prior to submission [SDTM].

5. Data management staff [CDM, clinical programmers], biostatisticians, and clinical monitors may find it difficult to converge on a new standard when designing standard libraries and processes.

6. Implementing new standards involves reorganizing the operations of (an organization) so as to improve efficiency [processes and SOPs].

7. Members of Data Management team must be retrained on the use of new software and CDISC/CDASH standards.

standards8. There are technical obstacles related to implementation in several EDC systems, including 8 character limitations [SAS] on numerous variables, determining when to use supplemental qualifiers versus creating new domains, and creating vertical data structure.

Comments? Join us at {EDC Developer}

Anayansi Gamboa, MPM, an EDC Developer Consultant and clinical programmer for the Pharmaceutical and Biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Subscribe to my blog’s RSS feed and email newsletter to get immediate updates on latest news, articles, and tips. I am available on LinkedIn. Connect with me there for technical discussions.

Fair Use Notice: This article/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer:De inhoud van deze columns weerspiegelen niet per definitie de mening van {EDC Developer}.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.