Category Archives: SAS

Freelancer / Consultant / EDC Developer / Clinical Programmer

* Setting up a project in EDC (Oracle InForm, Medidata Rave, OpenClinica, OCRDC)
* Creation of electronic case report forms (eCRFs)
* Validation of programs, edit checks
* Write validation test scripts
* Execute validation test scripts
* Write custom functions
* Implement study build best practices
* Knowledge of the process of clinical trials and the CDISC data structure

 


Top 3 Posts at (EDC Developer)

First, I would like to thank everyone who has read the articles posted at {EDC} Developer, especially my colleagues and friends from India. The most reads and hits have come from people living in India.

New to the industry? Want to get in as a clinical data manager or clinical programmer? Looking for a particular topic or an answer to a question? Check the Contact Me section.

Here are the most-searched articles of the past few months:

1- Data Management: Queries in Clinical Trials

2- How to document the testing done on the edit checks?

3- Why use JReview for your Clinical Trials?

Other most-read articles:

Role of Project Management and the Project Manager in Clinical Data Management

4 Programming Languages You Should Learn Right Now (eClinical Speaking)

Data Management Plan in Clinical Trials

Top search terms used to find {EDC} Developer:

1- types of edit checks in clinical data management

2- Rave programming

3- pharmaceutical terminology list

4- seeking Rave training (a better source is mdsol.com)

5- EDC programmer

6- Central Designer tips and tricks

Thank you for reading!

Using PROC UNIVARIATE to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover PROC UNIVARIATE.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data and what counts as a data error.
    • Study your data
  • Then validate using PROC UNIVARIATE.
  • Find extreme values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

Previously, we used PROC FREQ to find missing and unique values. Today, we will use PROC UNIVARIATE, which is useful for finding outliers, that is, values that fall outside the expected range.

proc univariate data=labdata nextrobs=10;
var LBRESULT;
run;

[Figure: Lab data result using Univariate]

For validating data, you will be most interested in the last two tables of this report. The Missing Values table shows that the variable LBRESULT has 260 missing values. There are 457 observations. The Extreme Observations table shows the lowest and highest values (possible outliers) in our dataset. The NEXTROBS=10 option specifies the number of extreme observations to display at each end of the report. To suppress the table, use NEXTROBS=0.
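If only those two tables are of interest, one option is to restrict the output with ODS SELECT. The following is a minimal sketch that reuses the labdata data set and LBRESULT variable from the example above; ExtremeObs and MissingValues are the ODS table names PROC UNIVARIATE uses for the Extreme Observations and Missing Values tables.

```sas
/* Show only the Extreme Observations and Missing Values tables */
ods select ExtremeObs MissingValues;

proc univariate data=labdata nextrobs=10;
  var LBRESULT;   /* lab result variable from the example above */
run;
```

The ODS SELECT statement applies to the next procedure step only, so later procedures print their full default output again.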

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

Using PROC FREQ to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover PROC FREQ.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data and what counts as a data error.
    • Study your data
  • Then validate using PROC FREQ.
  • Spot distinct values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

Previously, we used PROC PRINT to find missing/invalid values. Today, we will use PROC FREQ to view a frequency table of the unique values for a variable. The TABLES statement in a PROC FREQ step specifies which frequency tables to produce.

proc freq data=labdataranges nlevels;
table _all_ / noprint;
run;

So how many unique lab tests do we have in our raw data file? We know that our SAS data set has 12 records. The Levels column of this report shows that labtest has 3 unique values, which means that 9 of the 12 labtest values are repeats. For this type of data (lab ranges), that is correct. We use it only as an example; you can check any type of data the same way.

[Figure: PROC FREQ output in SAS]

[Figure: Lab test data ranges]

So remember, to view the distinct values of a variable, use PROC FREQ, which produces frequency tables (one-way to n-way). You can view the frequency, percent, cumulative frequency, and cumulative percent. With the NLEVELS option, PROC FREQ also displays a table that gives the number of distinct values for each variable named in the TABLES statement.

Example: SEX variable has the correct values F or M as expected; however, it is missing for two observations.
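A check like this one could be sketched as follows. The data set name demog is an assumption for illustration; only the SEX variable comes from the example. The MISSING option asks PROC FREQ to list missing values as their own row in the table instead of only reporting a "Frequency Missing" count underneath it.

```sas
/* demog is a hypothetical demographics data set containing SEX */
proc freq data=demog;
  tables sex / missing;   /* show missing values as a separate level */
run;
```

Any level other than F, M or missing that appears in the table would be an invalid value to query.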

[Figure: Missing values in PROC FREQ output]


Using PROC PRINT to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover PROC PRINT.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data and what counts as a data error.
    • Study your data
  • Then validate using PROC PRINT.
  • We will clean the data using DATA steps with assignment and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

In the example below, we find missing values in our lab data ranges table. We would also like to convert the lab test names to uppercase.

[Figure: Clinical raw data]
[Figure: PROC PRINT data validation code]
[Figure: PROC PRINT output – data validation]

 

From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.
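The validation code itself appears only as a screenshot, so here is a minimal sketch of what such a PROC PRINT check might look like. It assumes the labdataranges data set used in the PROC FREQ post and the variable names labtest, unit and EffectiveEndDate mentioned in this post; your own specifications will define which conditions count as invalid.

```sas
/* List only the observations that have at least one missing value */
proc print data=labdataranges;
  where missing(labtest)
     or missing(unit)
     or missing(EffectiveEndDate);
  title "Lab data ranges: observations with missing values";
run;
title;   /* clear the title so it does not carry over to later output */
```

The MISSING function works for both character and numeric variables, so the same WHERE clause covers the text fields and the date.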

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use the DATA step to update the datasets/tables/domains when there is invalid or missing data as per protocol requirements.

In our example, we have lab data ranges for a study that has started, but certain information is missing or invalid.

To convert our lab test to uppercase, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.
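As a rough sketch of this kind of cleaning step (the original code is shown only as a screenshot), the DATA step below assumes the labdataranges data set, that unit is a character variable, and that EffectiveEndDate is a numeric SAS date. The output name labranges_clean is hypothetical, and the replacement values k/cumm and 31DEC2025 are the ones reported for the final dataset in this post.

```sas
data labranges_clean;                 /* hypothetical output data set name */
  set labdataranges;

  /* Assignment statement: convert the lab test name to uppercase */
  labtest = upcase(labtest);

  /* IF-THEN statements: fill in missing values per the specifications */
  if missing(unit) then unit = 'k/cumm';
  if missing(EffectiveEndDate) then EffectiveEndDate = '31DEC2025'd;

  format EffectiveEndDate date9.;
run;
```

In a real study the replacement value usually depends on the lab test, so the IF conditions would also test labtest before assigning a unit.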

[Figure: PROC PRINT data cleaning code]

[Figure: Data validation and data cleaning final dataset]

From our final dataset, we can verify that there are no missing values. We converted labTest to uppercase and updated unit and EffectiveEnddate to k/cumm and 31DEC2025, respectively.

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog post, ‘Using PROC FREQ to Validate Clinical Data’. To find or remove duplicates, check out my previous post, Finding Duplicate Data.

Or use PROC SORT with the NODUPKEY option:

proc sort data=<dataset> out=sorted nodupkey equals;
by ID;
run;


 

Count the number of discrepancies per procedure – OracleClinical (OC)

Let’s now write a quick program to count the number of discrepancies per procedure in OC/OCRDC:

Remember to add comments, using /* comment */ or * comment;, that explain what the program does. It is good practice to document everything so that anyone can read your program and make updates when necessary.

/* count multivariate discrepancies per procedure detail */
proc sql;
connect to oracle(path=ocpath);
create table discr as select * from connection to oracle
(select p.name, pd.test_order_sn detail, count(pd.test_order_sn) numdisc, p.procedure_id procid
from discrepancy_management dm,
procedures p,
procedure_details pd
where dm.clinical_study_id = 9999 /* replace with your study ID */
and dm.procedure_id = p.procedure_id
and dm.procedure_detail_id = pd.procedure_detail_id
and p.PROCEDURE_VER_SN = pd.PROCEDURE_DETAIL_PROC_VER_SN
and dm.PROCEDURE_VER_SN = p.PROCEDURE_VER_SN
and dm.de_sub_TYPE_CODE = 'MULTIVARIATE'
group by p.name, pd.test_order_sn, p.procedure_id
order by count(pd.test_order_sn) desc
);
disconnect from oracle;
quit;

/*document your code*/
/* list every active (non-retired) procedure detail for the study */
proc sql;
connect to oracle(path=ocpath);
create table name as select * from connection to oracle
(select distinct p.procedure_id procid, p.name, pd.TEST_ORDER_SN detail
from procedures p,
procedure_details pd
where p.clinical_study_id = 9999 /* replace with your study ID */
and p.procedure_status_code != 'R'
and p.procedure_id = pd.procedure_id
order by procid
);
disconnect from oracle;
quit;

/* merge # of discrepancies with name */
/* both tables have one row per procedure detail, so match on both keys */
proc sort data=discr;
by procid detail;
run;

proc sort data=name;
by procid detail;
run;

data discname;
merge discr (in=d) name (in=n);
by procid detail;
if n;
/* procedure details with no discrepancies get a count of zero */
if numdisc = . then numdisc = 0;
run;

proc sort data=discname;
by descending numdisc;
run;

/* print out */
/* percent and numdcf are not created by the queries above; */
/* add them to the VAR statement once they have been derived */
proc print data=discname label;
var name detail numdisc;
label numdisc = 'Number of discrepancies';
title "Number of discrepancies per Procedure";
title2 "RA eClinica";
run;

You could also export the report to Excel and have your data manager (DM) review it.
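One way to do the export is PROC EXPORT. This is a sketch only: the output path is a made-up example, and DBMS=XLSX assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Export the merged discrepancy counts for review in Excel */
proc export data=discname
  outfile="C:\temp\discrepancies_per_procedure.xlsx"  /* hypothetical path */
  dbms=xlsx
  replace;                                            /* overwrite an existing file */
run;
```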

Good luck and let me know if it was helpful.


A quick way to find duplicates from external lab data

There are many different ways to find duplicate observations using SAS.

I have just received data from an external service provider and I want to compare it with my EDC lab data. My goal is to find any duplicates before the data is used or compared with my lab data.

My external data has over 400 records, so this would be a tedious job in Excel (the original format). A manual review could take a few hours, but with SAS we can do it in less than 30 minutes.

A simple SAS program:
over 400 records…

This is our SAS code:

SAS code to find duplicate data from an external source
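The original code is only available as a screenshot, so the following is a hedged sketch of one common approach rather than the exact program. It assumes the imported data set WORK.IMPORT from the PROC IMPORT step in this post; the key variables subjid, labtest and visit, and the data set names lab_sorted and dups, are hypothetical, so replace the keys with whatever should uniquely identify a lab record in your data.

```sas
/* Sort by the variables that should uniquely identify a record */
proc sort data=work.import out=lab_sorted;
  by subjid labtest visit;
run;

/* Keep every observation whose key occurs more than once */
data dups;
  set lab_sorted;
  by subjid labtest visit;
  if not (first.visit and last.visit);
run;

proc print data=dups;
  title "Possible duplicate records in external lab data";
run;
title;
```

Empty rows have missing values for all key variables, so if there is more than one of them they sort together and are flagged as duplicates as well.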

This is our final output. We found the duplicates including empty rows.

[Figure: external data]
[Figure: external data, lab data]

It is important to note that not all duplicates can be deleted or found. Further review by your clinical team members is required. Once we are confident that duplicates have been removed, we know our data set is more accurate.

One last thing: how do we get this data into SAS so we can perform this task? Here’s a snippet:

  1. Import the file using PROC IMPORT (DBMS=EXCEL2000 assumes that SAS/ACCESS Interface to PC Files is licensed)
  2. You can write to a permanent or temporary library. In this example, we create a data set named IMPORT in the temporary WORK library
  3. We bring over the column names as variable names using the GETNAMES=YES statement

PROC IMPORT OUT= WORK.IMPORT
DATAFILE= "H:\Labs\LocalLab\LabDataReview.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=YES;
RUN;

Now you have learned a quick way to find duplicates in your external data. I hope this piece of code will make any data manager’s job easier. Remember, you can always enhance this program (add labels, formats, etc.) and improve its efficiency.


eClinical Training


Training—Make the most of affordable, informative, educational events to polish your professional skills.

Electronic Data Capturing

Medidata Rave
Web-based Data Management System for the collection of clinical trials data.
Our consultants are available for short- and long-term projects.
Credentials:
– Medidata Rave Certified
– 4+ years experience in study build
– Custom Functions / C# Developers
– Crystal Reports Developers / JReview Developers

InForm Architect / Central Designer / InForm EDC
Learn InForm at your own pace or online

Data Management Training 101
For entry levels and those seeking to enter the clinical research industry.

BASE SAS Training 101
Interested in learning SAS for clinical programmers? Coming soon…Contact us for more details…

One-on-One Training
Interested in learning RAVE / InForm / Oracle Clinical? Contact us for more details…

Solving Data Collection Challenges

Cross-partnerships between sponsors and CROs for the collection and analysis of clinical trial data are complex. As a result, a number of issues are encountered during the running of a trial.

As with many projects, a standardization effort like CDISC is a huge undertaking. It requires resources, technology and knowledge transfer. The industry (the FDA, for example) has been working on standardization for years, but in September 2013 it became official when the FDA released a ‘Position Statement’.

 Data Collection

According to the WHO, data collection is defined as the ongoing systematic collection, analysis, and interpretation of health data necessary for designing, implementing, and evaluating public health prevention programs.

Sources of data: primarily case report books or (e)CRF forms, laboratory data and patient report data or diaries.

 Challenges of data collection

It is important for the CROs / service providers to be aware of the potential challenges they may face when using different data collection methods for partnership clinical studies. Having several clients does not mean having several standards or naming conventions. This is the main reason why CDISC is here. So why are many CROs or service providers not using CDISC standards?

Another challenge is time limitations. Some clinical trials run for just a few weeks / months.

It may be difficult to understand the partnership in the time available. Hence, most CROs and service providers prefer to perform manual mapping at the end of the trial, which leads to re-work and manual effort.

Funding is also a key challenge for CDISC-compliant data collection. Small researchers or biotechnology companies that do not have the resources in-house outsource this task to CROs or service providers and are not interested in whether it is compliant as long as it saves them money. But does saving money now mean spending more later, in the close-out phase?

[Figure: Anayansi Gamboa - Data Status]

If there is a shortage of funding this may not allow the CRO or service provider all the opportunities that would assist them in capturing the information they need as per CDISC standards.

We really don’t have the level of expertise or the person dedicated to this that would bring, you know, the whole thing to fruition on the scale in which it’s envisioned – Researcher

Role of the Library

There is a clear need for libraries (GL) to move beyond passively providing technology to embrace the changes within the industry. The librarian functions as one of the most important of medical educators. This role is frequently unrecognized, and for that reason, too little attention is given to this role. There has been too little attention paid to the research role that should be played by the librarian. With the development of new methods of information storage and dissemination, it is imperative that the persons primarily responsible for this function should be actively engaged in research. We have little information at the present time as to the relative effectiveness of these various media. We need research in this area. Librarians should assume an active role in incorporating into their area of responsibility the various types of storage media. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC232677/]

Review and Revise

At the review and revise stage it might be useful for the CRO or service provider to consider what the main issues are when collecting and organizing the data on the study. Some of these issues include: ensuring sponsors, partners and key stakeholders were engaged in the scoping phase and defining its purpose; the objectives have been considered; the appropriate data collection methods have been used; the data has been verified through the use of multiple sources and that sponsors have approved the data that is used in the final clinical data report.

Current data management systems must be fundamentally improved so that they can meet the capacity demand for secure storage and transmission of research data. And while there can be no definitive tools and guidelines, it is certain that we must start using CDISC standards from the data collection step to avoid reinventing the wheel each time a new sponsor or clinical researcher asks you to run their clinical trial.

RA eClinica is an established consultancy company for all essential aspects of statistics, clinical data management and EDC solutions. Our services are targeted to clients in the pharmaceutical and biotech sector, health insurers and medical devices.

The company is headquartered in Panama City, with representative offices and business partners in the United States, India and the European Union. To discuss our services and how you can benefit from our SMEs and cost-effective implementation of CDISC SDTM clinical data, contact us.

The Only Three (3) [Programming] Languages You Should Learn Right Now (eClinical Speaking)

In a previous article that I wrote in 2012, I mentioned 4 programming languages that you should be learning when it comes to the development of clinical trials. Why is this important, you may ask? A clinical trial is a method to determine whether a new drug or treatment works on a disease or is beneficial to patients. If you have never written a line of code in your life, you are in the right place. If you have some programming experience but are interested in learning clinical programming, this information can be helpful.

[Figure: Anayansi Gamboa - Clinical Data Management Process]

But shouldn’t I be Learning ________?

Here are the latest eClinical programming languages you should learn:

1. SAS®: Data analysis and result reporting are two major tasks for SAS® programmers. SAS currently offers a certification as a Clinical Trials Programmer. Some of the skills you should learn are:

  • clinical trials process
  • accessing, managing, and transforming clinical trials data
  • statistical procedures and macro programming
  • reporting clinical trials results
  • validating clinical trial data reporting

2. ODM/XML: The Operational Data Model (ODM) uses XML to build the standard data exchange models that are being developed to support the acquisition, exchange and archiving of operational data.

3. CDISC Language: Yes. This is not just any code. This is the standard language on clinical trials and you should be learning it right now. The future is here now. The EDC code as we know it will eventually go away as more and more vendors try to adapt their systems and technologies to meet rules and regulations. Some of the skills you should learn:

  • Annotation of variables and variable values – SDTM aCRF
  • Define XML – CDISC SDTM datasets
  • ADaM datasets – CDISC ADaM datasets

CDISC has established data standards to speed up data review, and the FDA is now suggesting that this will soon become the norm. Pharmaceutical and biotechnology companies and many sponsors within clinical research are now better equipped to improve CDISC implementation.

Everyone should learn to code

Therefore, SAS® and XML now work together. The XML engine in SAS® v9.0 was built so that one can import a wide variety of XML documents. SAS® does what it does best (statistics), and XML does what it does best: creating report-quality tables by taking advantage of the full feature set of the publishing software. This combination can produce report-quality tables in an automated, hands-off, lights-out process.
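As an illustration of the XML engine, the sketch below reads a simple XML file through the XML LIBNAME engine. The file path, the libref labxml and the table name labdata are all hypothetical, and the sketch assumes a plain rectangular XML layout in which each repeating labdata element holds one record; CDISC ODM files need additional engine options that are not shown here.

```sas
/* Point a libref at an XML file using the XML LIBNAME engine */
libname labxml xml "C:\temp\labdata.xml";   /* hypothetical file */

/* Read the XML table into a regular SAS data set */
data work.labdata_from_xml;
  set labxml.labdata;                       /* hypothetical table name */
run;

libname labxml clear;
```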

Standards are more than just CDISC

If you are looking for your next career move in Clinical Data Management, then SAS and CDISC SDTM should put you on the right path for career development and job security.

Conclusion: Learn the basic and advanced SAS clinical programming concepts, such as reading and manipulating clinical data. Using the clinical features and basic SAS programming concepts of clinical trials, you will be able to import ADaM, CDISC or other standards for domain structure and contents into the metadata, build clinical domain target table metadata from those standards, create jobs to load clinical domains, validate the structure and content of the clinical domains based on the standards, and generate CDISC-standard define.xml files that describe the domain tables for clinical submissions.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica – Open Source and Oracle Clinical.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

Source:

SAS Institute
CDISC