Category Archives: SAS

Today is GDPR day! GDPR is in place! Are you ready for it?

According to the EU General Data Protection Regulation (GDPR), which comes into effect today, May 25th, 2018, most companies will need to inform you of their privacy policy for processing and protecting your personal information.

The General Data Protection Regulation (GDPR) is already in place, but many companies are not yet ready - more precisely, only 45% of organizations said they had a structured plan to comply with it.

A recent survey also reveals that 54% of large organizations (those with more than 5,000 employees) say they are prepared to deal with GDPR; among small organizations, the figure drops to 37%. And only 24% of companies use external consultants to become compliant.

With this Regulation, individuals have the right to request that their personal data be erased or transferred to another organization. This raises questions as to what tools and processes companies will need to implement. For 48% of respondents, simply locating personal data in their own databases is a challenge. In those cases, complying with the GDPR rules will be an even more serious task.

55% of organizations are not prepared for GDPR

For EU citizens and residents, this is a welcome law. But US citizens and residents will continue to suffer identity theft and data privacy violations at the hands of the same companies the EU is trying to fine and control under this law. The Googles, the Facebooks, and the Twitters of the world, along with most social media, will be scrutinized heavily from this day on.

Who does the GDPR affect?
The GDPR not only applies to organizations located within the EU but it will also apply to organizations located outside of the EU if they offer goods or services to, or monitor the behavior of, EU data subjects. It applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

What are the penalties for non-compliance?
Organizations can be fined up to 4% of annual global turnover or €20 million for breaching GDPR. This is the maximum fine that can be imposed for the most serious infringements, e.g. not having sufficient customer consent to process data or violating the core of Privacy by Design concepts. There is a tiered approach to fines: for example, a company can be fined 2% for not having its records in order (article 28), not notifying the supervising authority and data subject about a breach, or not conducting an impact assessment. It is important to note that these rules apply to both controllers and processors — meaning ‘clouds’ will not be exempt from GDPR enforcement.

Source:

https://www.eugdpr.org/

https://ec.europa.eu/commission/priorities/justice-and-fundamental-rights/data-protection/2018-reform-eu-data-protection-rules_en

Fair Use Notice: Images/logos/graphics on this page contain some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law).


Freelancer / Consultant / EDC Developer / Clinical Programmer

* Setting up a project in EDC (Oracle InForm, Medidata Rave, OpenClinica, OCRDC)
* Creation of electronic case report forms (eCRFs)
* Validation of programs, edit checks
* Write validation test scripts
* Execute validation test scripts
* Write custom functions
* Implement study build best practices
* Knowledge of the process of clinical trials and the CDISC data structure

 

Top 3 Posts at {EDC} Developer

First, I would like to thank everyone who has read the articles posted at {EDC} Developer, especially my colleagues and friends from India. The highest readership and hits have come from people living in India.

New to the industry? Want to get in as a clinical data manager or clinical programmer? Looking for a particular topic or an answer to a question? Check the Contact Me section.

Here are the most-searched articles of the past few months:

1- Data Management: Queries in Clinical Trials

2- How to document the testing done on the edit checks?

3- Why use JReview for your Clinical Trials?

Others most read articles:

Role of Project Management and the Project Manager in Clinical Data Management

4 Programming Languages You Should Learn Right Now (eClinical Speaking)

Data Management Plan in Clinical Trials

And the top search terms used to find {EDC} Developer:

1- types of edit checks in clinical data management

2- Rave programming

3- pharmaceutical terminology list

4- seeking rave training (better source is mdsol.com)

5- edc programmer

6- central design tips and tricks

Thank you for reading!

Using PROC UNIVARIATE to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover the PROC UNIVARIATE procedure.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data versus a data error.
    • Study your data
  • Then validate using the PROC UNIVARIATE procedure.
  • Find extreme values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

Previously, we used PROC FREQ to find missing/unique values. Today, we will use PROC UNIVARIATE, which is useful for finding data outliers: data points that fall outside expected values.

proc univariate data=labdata nextrobs=10;
var LBRESULT;
run;

Lab data result using Univariate

For validating data, you will be most interested in the last two tables of this report. The missing values table shows that the variable LBRESULT has 260 missing values out of 457 observations. The extreme observations table shows the lowest and highest values (possible outliers) in our dataset. The NEXTROBS=10 option specifies the number of extreme observations to display in the report; to suppress the table, use NEXTROBS=0.
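The same idea can be sketched in code: instead of reading the missing-value count and extremes off the printed report, the OUTPUT statement can capture them in a data set (the data set and variable names follow the example above; the output names are my own):

```sas
/* Capture the missing-value count and range for LBRESULT in a data set */
proc univariate data=labdata noprint;
  var LBRESULT;
  output out=lb_stats nmiss=n_missing n=n_nonmissing min=lowest max=highest;
run;

proc print data=lb_stats;
  title 'Missing-value count and range for LBRESULT';
run;
```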

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

Using PROC FREQ to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover the PROC FREQ procedure.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data versus a data error.
    • Study your data
  • Then validate using the PROC FREQ procedure.
  • Spot distinct values

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

Previously, we used PROC PRINT to find missing/invalid values. Today, we will use PROC FREQ to view a frequency table of the unique values for a variable. The TABLES statement in a PROC FREQ step specifies which frequency tables to produce.

proc freq data=labdataranges nlevels;
table _all_ / noprint;
run;

So how many unique lab tests do we have in our raw data file? We know that our SAS data set has 12 records. The Levels column of this report shows labtest = 3 unique values, which means we must have 9 duplicate lab tests in total. For this type of data (lab ranges), that is expected; we use it here only as an example, since you can check any type of data this way.

Proc Freq sas

Lab test data ranges

So remember: to view the distinct values for a variable, you use PROC FREQ, which produces frequency tables (one-way/n-way). You can view the frequency, percent, cumulative frequency, and cumulative percentage. With the NLEVELS option, PROC FREQ displays a table that provides the number of distinct values for each variable in the TABLES statement.

Example: the SEX variable has the correct values F or M as expected; however, it is missing for two observations.
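To make missing values show up directly in the frequency table, the MISSING option on the TABLES statement counts them as a level of their own (the data set name demog is an assumption):

```sas
proc freq data=demog;
  tables SEX / missing;  /* missing values are counted as their own level */
run;
```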

Missing values proc freq

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

Using PROC PRINT to Validate Clinical Data


When your data isn’t clean, you need to locate the errors and validate them. We can use SAS procedures to determine whether or not the data is clean. Today, we will cover the PROC PRINT procedure.

  • The first step is to identify the errors in a raw data file. Usually, the DVP/DVS section of our DMP defines what is considered ‘clean’ data versus a data error.
    • Study your data
  • Then validate using the PROC PRINT procedure.
  • We will clean the data using DATA steps with assignment and IF-THEN-ELSE statements.

When you validate your data, you are looking for:

  • Missing values
  • Invalid values
  • Out-of-range values
  • Duplicate values

In the example below, we find missing values in our lab data ranges table. We would also like to update the lab test names to UPPERCASE.
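A minimal sketch of such a PROC PRINT check, assuming the lab ranges live in a data set called labdataranges with variables labTest, unit, and EffectiveEnddate (these names are my own, based on the narrative):

```sas
/* List only the observations with missing values for review */
proc print data=labdataranges;
  where missing(labTest) or missing(unit) or missing(EffectiveEnddate);
run;
```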

Clinical Raw data
Proc Print data val code
PROC PRINT output – data validation

 

From the screenshot above, our PROC PRINT program identified all missing / invalid values as per our specifications. We need to clean up 6 observations.

Cleaning Data Using Assignment Statements and If-Then-Else in SAS

We can use a DATA step to update datasets/tables/domains when there are invalid or missing data, as per protocol requirements.

In our example, we have lab data ranges for a study that has started, but certain information is missing or invalid.

To convert our lab tests to uppercase, we will use an assignment statement. For the rest of the data cleaning, we will use IF statements.
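The cleaning step described above can be sketched as follows (the variable names and their character/date types are assumptions; adjust them to your own lab ranges table):

```sas
data labdataranges_clean;
  set labdataranges;
  labTest = upcase(labTest);              /* assignment statement: uppercase lab test */
  if missing(unit) then unit = 'k/cumm';  /* IF-THEN fills a missing unit */
  if EffectiveEnddate = . then EffectiveEnddate = '31DEC2025'd;  /* SAS date literal */
  format EffectiveEnddate date9.;
run;
```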

Proc Print data cleaning

Data Validation and data cleaning final dataset

From our final dataset, we can verify that there are no missing values. We converted our labTest values to uppercase and updated the unit and EffectiveEnddate to k/cumm and 31DEC2025, respectively.

You cannot use PROC PRINT to detect values that are not unique. We will do that in our next blog, ‘Using PROC FREQ to Validate Clinical Data’. To find or remove duplicates, check out my previous post, Finding Duplicate Data, or use:

proc sort data=<dataset> out=sorted nodupkey equals;
by ID;
run;

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

 

Count the number of discrepancies per procedure – OracleClinical (OC)

Let’s now write a quick program to count the number of discrepancies per procedure in OC/OCRDC:

Remember to comment (/* comment */ or * comment here ;) what the program does. It is good clinical practice to document everything so anyone can read your program and make the necessary updates.

proc sql;
connect to oracle(path=ocpath);

/* count the number of discrepancies per procedure */
create table discr as select * from connection to oracle
(select p.name, pd.test_order_sn detail, count(pd.test_order_sn) count, p.procedure_id procid
from discrepancy_management dm,
procedures p,
procedure_details pd
where dm.clinical_study_id=9999 /* replace with your study id */
and dm.procedure_id=p.procedure_id
and dm.procedure_detail_id=pd.procedure_detail_id
and p.PROCEDURE_VER_SN=pd.PROCEDURE_DETAIL_PROC_VER_SN
and dm.PROCEDURE_VER_SN=p.PROCEDURE_VER_SN
and dm.de_sub_TYPE_CODE='MULTIVARIATE'
group by p.name, pd.test_order_sn, p.procedure_id
order by count(p.name) desc
);

/* procedure names and details, excluding retired procedures */
create table name as select * from connection to oracle
(select distinct p.procedure_id procid, p.name, pd.TEST_ORDER_SN detail
from procedures p,
procedure_details pd
where p.clinical_study_id=9999 /* replace with your study id */
and p.procedure_status_code != 'R'
and p.procedure_id=pd.procedure_id
order by procid
);

disconnect from oracle;
quit;

/* merge # of discrepancies with name */
proc sort data=discr;
by procid;
run;

proc sort data=name;
by procid;
run;

data discname;
merge discr (in=d) name (in=n);
by procid;
if n;
run;

proc sort data=discname ;
by descending count ;
run;

/* print out */
proc print data=discname label;
var name count;
label count = 'Number of discrepancies';
title "Number of discrepancies per Procedure";
title2 "RA eClinica";
run;

You could also export the report to Excel and have your DM / data manager review it.
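One hedged way to produce that Excel file (assuming the SAS/ACCESS Interface to PC Files is licensed; the output file name is a placeholder):

```sas
proc export data=discname
  outfile="discrepancies_per_procedure.xlsx"
  dbms=xlsx replace;
run;
```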

Good luck and let me know if it was helpful.

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

A quick way to find duplicates from external lab data

There are many different options for finding duplicate observations or data using SAS.

I have just received data from an external service provider, and I want to compare it with my EDC lab data. My goal is to find any duplicates before the data is useful, i.e. before I can compare it with my lab data.

My external data has over 400 records, so this would be a tedious job in Excel (the original format). The review could take a few hours, but with SAS, we can do it in less than 30 minutes.

A simple SAS program:
over 400 records…

This is our SAS code:

SAS code to find duplicate data from an external source

This is our final output. We found the duplicates including empty rows.

external data
external data, lab data

It is important to note that not all duplicates can be found or deleted automatically. Further review by your clinical team members is required. Once we are confident that duplicates have been removed, we know our data set is more accurate.
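As a sketch, PROC SORT can separate the first occurrence of each key from its duplicates so the clinical team can review the duplicates on their own (the BY variables here are assumptions; use whatever uniquely identifies a lab record in your study):

```sas
/* lab_dedup keeps the first record per key; lab_dups collects the rest */
proc sort data=work.import out=lab_dedup dupout=lab_dups nodupkey;
  by SubjectID LabTest CollectionDate;
run;
```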

One last thing: how do we get this data into SAS so we can perform this task? Here’s a snippet:

  1. Import the file using PROC IMPORT (assuming you have SAS/ACCESS licensed)
  2. You can use a permanent or a temporary library. In this example, we create a temporary data set named IMPORT in the WORK library
  3. We want to bring over the column names as variable names using the GETNAMES=YES statement

PROC IMPORT OUT= WORK.IMPORT
DATAFILE= "H:\Labs\LocalLab\LabDataReview.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=YES;
RUN;
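If your SAS installation is recent enough to read .xlsx files directly (an assumption about your setup), the same import can be written against the XLSX engine:

```sas
proc import out=work.import
  datafile="H:\Labs\LocalLab\LabDataReview.xlsx"  /* hypothetical .xlsx copy of the file */
  dbms=xlsx replace;
  getnames=yes;
run;
```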

Now you have learned a quick way to find duplicates in your external data. I hope this piece of code makes any data manager’s job easier. Remember, you can always enhance this program (add labels, formats, etc.) and improve efficiency.

To hire me for services, you may contact me via Contact Me OR Join me on LinkedIn

eClinical Training


Training—Make the most of affordable, informative, educational events to polish your professional skills.

Electronic Data Capturing

Medidata Rave
Web-based Data Management System for the collection of clinical trials data.
Our consultants are available for short- and long-term projects.
Credentials:
– Medidata Rave Certified
– 4+ years experience in study build
– Custom Functions / C# Developers
– Crystal Reports Developers / JReview Developers

InForm Architect / Central Designer / InForm EDC
Learn InForm at your own pace or online

Data Management Training 101
For entry levels and those seeking to enter the clinical research industry.

BASE SAS Training 101
Interested in learning SAS for clinical programmers? Coming soon…Contact us for more details…

One-on-One Training
Interested in learning RAVE / InForm / Oracle Clinical? Contact us for more details…

Solving Data Collection Challenges

Cross-partnerships between sponsors and CROs for the collection and analysis of clinical trial data are complex. As a result, a number of issues are encountered during the running of a trial.

As with many projects, standardization efforts like CDISC are a huge undertaking. They require resources, technology, and knowledge transfer. The industry (the FDA, for example) has been working on standardization for years, but in September 2013 it became official, when the FDA released a ‘Position Statement‘.

Data Collection

According to the WHO, data collection is defined as the ongoing systematic collection, analysis, and interpretation of health data necessary for designing, implementing, and evaluating public health prevention programs.

Sources of data: primarily case report books or (e)CRF forms, laboratory data and patient report data or diaries.

Challenges of data collection

It is important for the CROs / service providers to be aware of the potential challenges they may face when using different data collection methods for partnership clinical studies. Having several clients does not mean having several standards or naming conventions. This is the main reason why CDISC is here. So why are many CROs or service providers not using CDISC standards?

Another challenge is time limitation. Some clinical trials run for just a few weeks or months.

It can be difficult to understand the partnership in the amount of time available. Hence, most CROs and service providers prefer to perform manual mapping at the end of the trial, which results in re-work and manual effort.

Funding also poses a key challenge for a CDISC-compliant data collection study. Small research or biotechnology companies that do not have the resources in-house outsource this task to CROs or service providers and are not interested in whether it is compliant as long as it saves them money. But does it save money now rather than later, in the close-out phase?

Anayansi Gamboa - Data Status

If there is a shortage of funding, the CRO or service provider may not have all the opportunities that would assist them in capturing the information they need as per CDISC standards.

We really don’t have the level of expertise or the person dedicated to this that would bring, you know, the whole thing to fruition on the scale in which it’s envisioned – Researcher

Role of the Library

There is a clear need for libraries (GL) to move beyond passively providing technology to embrace the changes within the industry. The librarian functions as one of the most important of medical educators. This role is frequently unrecognized, and for that reason, too little attention is given to this role. There has been too little attention paid to the research role that should be played by the librarian. With the development of new methods of information storage and dissemination, it is imperative that the persons primarily responsible for this function should be actively engaged in research. We have little information at the present time as to the relative effectiveness of these various media. We need research in this area. Librarians should assume an active role in incorporating into their area of responsibility the various types of storage media. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC232677/]

Review and Revise

At the review and revise stage it might be useful for the CRO or service provider to consider what the main issues are when collecting and organizing the data on the study. Some of these issues include: ensuring sponsors, partners and key stakeholders were engaged in the scoping phase and defining its purpose; the objectives have been considered; the appropriate data collection methods have been used; the data has been verified through the use of multiple sources and that sponsors have approved the data that is used in the final clinical data report.

Current data management systems must be fundamentally improved so that they can meet the capacity demand for secure storage and transmission of research data. And while there can be no definitive tools and guidelines, it is certain that we must start using CDISC standards from the data collection step onward, to avoid re-inventing the wheel each time a new sponsor or clinical researcher asks you to run their clinical trial.

RA eClinica is an established consultancy company for all essential aspects of statistics, clinical data management and EDC solutions. Our services are targeted at clients in the pharmaceutical and biotech sectors, health insurers and medical device companies.

The company is headquartered in Panama City, with representative offices and business partners in the United States, India and the European Union. To discuss our services and how you can benefit from our SMEs and cost-effective implementation of CDISC SDTM clinical data standards, click here.