Category Archives: Advanced SAS

Solving Data Collection Challenges

Cross-partnerships between sponsors and CROs for the collection and analysis of clinical trial data are complex. As a result, a number of issues are encountered during the running of a trial.

As with many projects, standardization projects like CDISC are a huge undertaking. They require resources, technology and knowledge transfer. The industry (the FDA, for example) has been working on standardization for years, but in September 2013 it became official when the FDA released a ‘Position Statement‘.

 Data Collection

According to the WHO, data collection is defined as the ongoing systematic collection, analysis, and interpretation of health data necessary for designing, implementing, and evaluating public health prevention programs.

Sources of data: primarily case report books or (e)CRF forms, laboratory data, and patient-reported data or diaries.

 Challenges of data collection

It is important for the CROs / service providers to be aware of the potential challenges they may face when using different data collection methods for partnership clinical studies. Having several clients does not mean having several standards or naming conventions. This is the main reason why CDISC is here. So why are many CROs or service providers not using CDISC standards?

Another challenge is time limitations. Some clinical trials run for just a few weeks or months.

It can be difficult to understand the partnership in the amount of time available. Hence, most CROs and service providers prefer to perform manual mapping at the end of the trial, which leads to re-work and manual work.

Funding is also a key challenge for CDISC-compliant data collection studies. Small researchers or biotechnology companies that do not have the resources in-house outsource this task to CROs or service providers and are not interested in whether it is compliant, as long as it saves them money. But does it save money now only to cost more later, in the close-out phase?

A shortage of funding may deny the CRO or service provider the opportunities that would help them capture the information they need as per CDISC standards.

We really don’t have the level of expertise or the person dedicated to this that would bring, you know, the whole thing to fruition on the scale in which it’s envisioned – Researcher

Role of the Library

There is a clear need for libraries (GL) to move beyond passively providing technology to embrace the changes within the industry. The librarian functions as one of the most important of medical educators. This role is frequently unrecognized, and for that reason, too little attention is given to this role. There has been too little attention paid to the research role that should be played by the librarian. With the development of new methods of information storage and dissemination, it is imperative that the persons primarily responsible for this function should be actively engaged in research. We have little information at the present time as to the relative effectiveness of these various media. We need research in this area. Librarians should assume an active role in incorporating into their area of responsibility the various types of storage media.

Review and Revise

At the review and revise stage it might be useful for the CRO or service provider to consider what the main issues are when collecting and organizing the data on the study. Some of these issues include: ensuring sponsors, partners and key stakeholders were engaged in the scoping phase and defining its purpose; the objectives have been considered; the appropriate data collection methods have been used; the data has been verified through the use of multiple sources and that sponsors have approved the data that is used in the final clinical data report.

Current data management systems must be fundamentally improved so that they can meet the capacity demand for secure storage and transmission of research data. And while there can be no definitive tools and guidelines, it is certain that we must start using CDISC standards from the data collection step to avoid re-inventing the wheel each time a new sponsor or clinical researcher asks you to run their clinical trial.

RA eClinica is an established consultancy company for all essential aspects of statistics, clinical data management and EDC solutions. Our services are targeted to clients in the pharmaceutical and biotech sector, health insurers and medical devices.

The company is headquartered in Panama City and has representation offices with business partners in the United States, India and the European Union. To discuss our services and how you can benefit from our SMEs and cost-effective implementation of CDISC SDTM clinical data, get in touch.


The Only Three (3) [Programming] Languages You Should Learn Right Now (eClinical Speaking)

In a previous article that I wrote in 2012, I mentioned 4 programming languages that you should be learning when it comes to the development of clinical trials. Why is this important, you may ask? Clinical trials are a method to determine whether a new drug or treatment will work on a disease or be beneficial to patients. If you have never written a line of code in your life, you are in the right place. If you have some programming experience but are interested in learning clinical programming, this information can be helpful.

But shouldn’t I be Learning ________?

Here are the latest eClinical programming languages you should learn:

1. SAS®: Data analysis and result reporting are the two major tasks for SAS® programmers. Currently, SAS offers a certification as a Clinical Trials Programmer. Some of the skills you should learn are:

  • clinical trials process
  • accessing, managing, and transforming clinical trials data
  • statistical procedures and macro programming
  • reporting clinical trials results
  • validating clinical trial data reporting

2. ODM/XML: The Operational Data Model, or ODM, uses XML to build the standard data exchange models that are being developed to support the acquisition, exchange and archiving of operational data.

3. CDISC Language: Yes. This is not just any code. This is the standard language on clinical trials and you should be learning it right now. The future is here now. The EDC code as we know it will eventually go away as more and more vendors try to adapt their systems and technologies to meet rules and regulations. Some of the skills you should learn:

  • Annotation of variables and variable values – SDTM aCRF
  • Define XML – CDISC SDTM datasets
  • ADaM datasets – CDISC ADaM datasets

CDISC has established data standards to speed up data review, and the FDA is now suggesting that this will soon become the norm. Pharmaceutical companies, biotechnology companies and many sponsors within clinical research are now better equipped to improve CDISC implementation.

Everyone should learn to code

Therefore, SAS® and XML are now cooperating. The XML engine in SAS® v9.0 is built so that one can import a wide variety of XML documents. SAS® does what it does best, statistics, and XML does what it does best, creating report-quality tables by taking advantage of the full feature set of the publishing software. This conversion can produce report-quality tables in an automated, hands-off, lights-out process.
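
As a rough illustration of the XML engine, here is a minimal sketch that exports a dataset to XML and reads it back. It assumes the SASHELP.CLASS sample table that ships with SAS; the path /tmp/class.xml and the librefs xmlout and xmlin are hypothetical names chosen for this example, not anything from a real study.

```sas
/* Hypothetical output path: change it to a location you can write to */
libname xmlout xml '/tmp/class.xml';

/* Export: writing to the XML libref creates the XML document */
data xmlout.class;
  set sashelp.class;
run;

libname xmlout clear;

/* Import: a second libref on the same file reads the document back */
libname xmlin xml '/tmp/class.xml';

proc print data=xmlin.class;
run;

libname xmlin clear;
```

The same LIBNAME pattern should apply to other generic XML files, although documents with a more complex structure typically need an XMLMap to describe how elements map to columns.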

Standards are more than just CDISC

If you are looking for your next career in Clinical Data Management, then SAS and CDISC SDTM should put you on the right path for career development and job security.

Conclusion: Learn the basic and advanced SAS clinical programming concepts, such as reading and manipulating clinical data. Using the clinical features and basic SAS programming concepts of clinical trials, you will be able to import ADaM, CDISC or other standards for domain structure and contents into the metadata, build clinical domain target table metadata from those standards, create jobs to load clinical domains, validate the structure and content of the clinical domains based on the standards, and generate CDISC standard define.xml files that describe the domain tables for clinical submissions.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica – Open Source and Oracle Clinical.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.


SAS Institute

Project Plan: CDISC Implementation

CDISC standards have been in development for many years. There are now methodologies and technologies that make it easier to transform non-standard data into CDISC-compliant data. Clinical trials have evolved and become more complex, and this requires a new set of skills from outside clinical research: project management.

As with many projects, CDISC is a huge undertaking. It requires resources, technology and knowledge transfer. The industry (the FDA, for example) has been working on standardization for years, but in September 2013 it became official when the FDA released a ‘Position Statement‘.

So what is CDISC? We can say that it is a set of naming conventions for XPT files and field names, together with rules for handling unusual data. Currently, there are two main components of CDISC: SDTM (Study Data Tabulation Model) and ADaM (Analysis Data Model).

As a project manager with the right tool, you can look to a single source of project information to manage the project through its life-cycle: from planning, through execution, to completion.

1) Define Scope: This is where you’re tested on everything that has to do with getting a project up and running: what’s in the charter, developing the preliminary scope, understanding what your stakeholders need, and how your organization handles projects.

The scope document is a form of a requirement document which will help you identify the goals for this project. It can also be used as a communication method to other managers and team members to set the appropriate level of expectations.

The project scope management plan is a really important tool in your project. You need to make sure that what you’re delivering matches what you wrote down in the scope statement.

2) Define Tasks: We now need to document all the tasks that are required to implement CDISC and transform your data to it.

Project Tasks (Work packages)    Estimates (work unit)
Initial data standards review    27
Data Integrity review            17
Create transformation models     35

The work breakdown structure (WBS) provides the foundation for defining work as it relates to project objectives. It establishes the scope of work in terms of deliverables and facilitates communication between the project manager and stakeholders throughout the life of the project. Hence, even though it is preliminary at first, it is a key input to other project management processes and deliverables.

3) Project Plan: Once we have completed the initiation phase (preliminary estimates), we need to create a project plan that assigns resources to the project and schedules those tasks. Project schedules can be presented in many ways, including simple lists, bar charts with dates, and network logic diagrams with dates, to name just a few. A sample of the project plan is shown below:

[Image: project plan sample, from the Meta‐Xceed paper about CDISC]

4) Validation Step: Remember 21 CFR Part 11 compliance for Computer Systems Validation? The risk management effort is not a one-time activity on the project. Uncertainty is directly associated with the change being produced by a project. The following lists some of the tasks that are performed as it pertains to validation.

  • Risk Assessment: Different organizations have different approaches towards validation of programs. This is partly due to varying interpretations of the regulations and also due to how different managers and organizations function. Assess the level of validation that needs to take place.
  • Test Plan: Testing should proceed in accordance with the project plan; where it does not, the test plan determines how to address any deviation. Test planning is essential to ensure that testing reveals as many errors as possible and that the result reaches acceptable levels of quality.


  • Summary Results: These are all the findings documented during testing.

An effective risk management process involves first identifying and defining risk factors that could affect the various stages of the CDISC implementation process as well as specific aspects of the project.

5) Transformation Specification: Dataset transformation is a process in which a set of source datasets and their variables are changed to meet new standard requirements. Several kinds of change occur during this step. For example, a variable name must be no more than 8 characters long, a variable label must be no more than 40 characters long, and values from multiple sources (datasets) may need to be combined into one variable.
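
A check like the one described above can be sketched with PROC SQL and the DICTIONARY.COLUMNS view. This is only an illustration under stated assumptions: the dataset work.demo and its variable names are hypothetical, and the WORK library stands in for wherever your source data lives.

```sas
/* Hypothetical source dataset with one variable name that is too long */
data work.demo;
  systolic_bp = 120;   /* 11 characters, over the 8-character limit */
  age = 40;
  label systolic_bp = 'Systolic blood pressure'
        age         = 'Age in years';
run;

/* List variables whose name exceeds 8 characters or label exceeds 40 */
proc sql;
  create table work.attr_issues as
  select memname,
         name,
         length(name)  as name_len,
         label,
         length(label) as lbl_len
  from dictionary.columns
  where libname = 'WORK'
    and memtype = 'DATA'
    and (length(name) > 8 or length(label) > 40);
quit;

proc print data=work.attr_issues noobs;
run;
```

The resulting table can feed the transformation specification directly: each row is a variable that needs a new name or a shorter label before the data can meet the standard.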

6) Applying Transformation: This is done according to the specification; however, that document remains active for the duration of a project and can change. There are now many tools available to help with this task, as it can be time-consuming and resource-intensive to update the source code (SAS) manually: Transdata, CDISCXpres, SAS CDIDefine-it, just to name a few.

7) Verification Reports: The validation test plan will detail the specific test cases that need to be implemented  to ensure quality of the transformation. For example, a common report is the “Duplicate Variable” report.

8) Special Purpose Domain: CDISC has several special purpose domains: CO (comments), RELREC (related records or relationship between two datasets) and SUPPQUAL (supplemental qualifiers for non-standard variables).

9) Data Definition Documentation: In order to understand what all the variables are and how they are derived, we need an annotation document. This is the document that will be included during data submission. SAS PROC CONTENTS can help in the generation of this type of metadata documentation. The last step in the project plan for CDISC implementation is to generate the documentation in either PDF or XML format.
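
As a minimal sketch of how PROC CONTENTS can feed this kind of documentation, the step below writes variable-level metadata to a dataset instead of the listing. It uses the SASHELP.CLASS sample table as a stand-in for a real domain; the output name work.class_meta is just an illustrative choice.

```sas
/* Write variable-level metadata to a dataset instead of the listing */
proc contents data=sashelp.class
              out=work.class_meta(keep=memname name type length label varnum)
              noprint;
run;

/* Order the metadata by variable position in the dataset */
proc sort data=work.class_meta;
  by varnum;
run;

proc print data=work.class_meta noobs;
run;
```

In the OUT= dataset, TYPE is numeric (1 for numeric variables, 2 for character), so a real define document would usually recode it before publishing.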

CDISC has established data standards to speed up data review, and the FDA is now suggesting that this will soon become the norm. Pharmaceutical companies, biotechnology companies and many sponsors within clinical research are now better equipped to improve CDISC implementation.

Need SAS programmers? RA eClinica can help provide resources in-house / off-shore to facilitate FDA review by supporting CDISC mapping, SDTM validation tool, data conversion and CDASH compliant eCRFs.


CDISC/CDASH Standards at your Fingertips

A standard database structure using CDISC (Clinical Data Interchange Standards Consortium) and CDASH (Clinical Data Acquisition Standards Harmonization) standards can facilitate the collection, exchange, reporting, and submission of clinical data to the FDA and EMEA. CDISC and CDASH standards provide reusability and scalability to EDC (electronic data capture) trials.

There are some challenges in implementing CDISC in an EDC CDMS:

1. Key personnel in companies must be committed to implementing the CDISC/CDASH standards.

2. There is an initial cost for deployment of new technology: SDTM Data Translation Software, Data Storage and Hosting, Data Distribution and Reporting Software.

3. It can be difficult to understand and interpret complex SDTM Metadata concepts and the different implementation guides.

4. Deciding at what point in a study to apply the standards can be challenging: in the study design process, during data collection within the CDMS [CDASH via EDC tools], in SAS prior to report generation [ADaM], or after study completion prior to submission [SDTM].

5. Data management staff [CDM, clinical programmers], biostatisticians, and clinical monitors may find it difficult to converge on a new standard when designing standard libraries and processes.

6. Implementing new standards involves reorganizing the operations of an organization so as to improve efficiency [processes and SOPs].

7. Members of the Data Management team must be retrained on the use of new software and CDISC/CDASH standards.

8. There are technical obstacles related to implementation in several EDC systems, including the 8-character limitation [SAS] on numerous variables, determining when to use supplemental qualifiers versus creating new domains, and creating a vertical data structure.
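
To make the "vertical data structure" in the last point more concrete, here is a small sketch that turns one row per subject into one row per subject per test with PROC TRANSPOSE. The dataset, subject IDs and values are made up for illustration, and the output column names testcd and result are placeholders rather than real SDTM variables.

```sas
/* Hypothetical horizontal vital-signs data: one row per subject */
data work.vitals_wide;
  input usubjid $ sysbp diabp pulse;
  datalines;
S001 120 80 72
S002 135 85 68
;
run;

proc sort data=work.vitals_wide;
  by usubjid;
run;

/* Vertical structure: one row per subject per test */
proc transpose data=work.vitals_wide
               out=work.vitals_long(rename=(_name_=testcd col1=result));
  by usubjid;
  var sysbp diabp pulse;
run;

proc print data=work.vitals_long noobs;
run;
```

Mapping these placeholder columns to actual SDTM variables (test codes, character results, units and so on) would still need the study's own specification.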

Comments? Join us at {EDC Developer}

Anayansi Gamboa, MPM, is an EDC Developer Consultant and clinical programmer for the Pharmaceutical and Biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Subscribe to my blog’s RSS feed and email newsletter to get immediate updates on latest news, articles, and tips. I am available on LinkedIn. Connect with me there for technical discussions.

Fair Use Notice: This article/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law). If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding, we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer: The content of these columns does not necessarily reflect the opinion of {EDC Developer}.


Did You Know?


SAS PROC CONTENTS option
PROC CONTENTS has an option that arranges the output variable list in the order the variables appear in the dataset, as opposed to arranging the variables alphabetically.

proc contents data=<dsn> varnum;
run;

  • POSITION option

proc contents data=mysasdata position;
run;

  • VARNUM option

proc contents data=mysasdata varnum;
run;

SAS: Problem Solving 1

Today we want to provide you with some problem-resolution options for a simple situation.


We have 3 variables that we will call Var1, Var2, Var3. Their values range from 1 to 9, and we would like to create new variables that flag a response based on its value in any of the 3 variables.


For example, if response 7 has the value 1 in Var1, then VarFlag1 should be equal to 1.

If the same response 7 has the value 3 in Var3, then VarFlag3 should be equal to 1.

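Solution 1: Data step

data mydata;
  input Var1 Var2 Var3;
  array vars{3} Var1 Var2 Var3;
  array flags{9} flag1-flag9;
  /* start every flag at 0, then switch on the flag for each value seen */
  do k = 1 to 9;
    flags{k} = 0;
  end;
  do i = 1 to 3;
    if 1 <= vars{i} <= 9 then flags{vars{i}} = 1;
  end;
  drop i k;
  datalines;
2 7 9
;
run;

proc print; run;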

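Solution 2: array solution

Inside the data step, each flag can also be assigned directly from a comparison:

array flag{*} flag1-flag9;
do j = 1 to 9;
  flag{j} = (Var1 = j or Var2 = j or Var3 = j);
end;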

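Solution 3: Macro solution

%macro Set_Flags(Flag, num);
  %local i;
  /* generate one assignment statement per flag */
  %do i = 1 %to &num;
    &Flag.&i = (Var1 = &i or Var2 = &i or Var3 = &i);
  %end;
%mend Set_Flags;

data Mydata;
  set Mydata;
  %Set_Flags(flag, 9)
run;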

Professional Timeline – Clinical Programmer

Professional Timeline

Curriculum Vitae


anayansi gamboa

SAS Proficiency Test

Here are some common SAS questions:

1- What happens if no VAR statement is used in PROC PRINT?

2- Understand how SAS processes a data step

3- Explain what a (data step) SELECT statement does

4- What is the default length of a numeric variable?

5-What is the difference between DO WHILE and DO UNTIL?

6-What does BY after SET A; do?

7-What does BY after SET A B; do?

8-What is a sub-setting IF statement?

If you pass it doesn’t mean you’re good; but if you’re good, you should pass

9- What’s the difference between INFILE and INPUT?

10-What information is given by a Proc Contents?

11-How do you modify the header?

12-Which elements can you change? (in reference to question 10-11)

13-Which elements can you NOT change without re-creating the dataset?(in reference to question 10-11)

14-What is the purpose of the single trailing @ sign?

15-What statements are used to include or exclude specific variables in a data step?

16-How can you generate a 2-dimensional cross-tabulation?

17-In the macro language, all macro commands begin with?

18-What occurs if multiple datasets are included in a SET statement?

19-What symbol is used to concatenate two character values?

20-Explain the substr function

21-What are the differences among these functions? Round(), Ceil(), Floor()

22-What happens if you use the same in= variable for 2 datasets?

23-How do you begin a data step if you do not want to create a SAS dataset?

24-What are the automatic variables first. and last. ?

25-What does a 2 level input dataset name such as MydataSet.Mydata indicate ?

26-What is the default libname, where one-level dataset names go?

27-What statement allows you to keep track of previous values of variables or keep a running total?

28-What happens if you retain the value of a variable in the incoming dataset?

29-When an invalid data field is encountered when inputting a numeric variable, what happens?

30-What’s the difference between $w. and $CHARw. informats?

31-Explain the _type_ variable generated by proc summary

32-What is the purpose of a %GLOBAL statement?

33-What do CALL SYMPUT and the SYMGET function do?

34-What date is the reference date for calculating the value of a SAS date variable?

35-What is _n_?

There are 10 kinds of people in the world: those who understand binary, and those who don’t.

Those are the most common SAS questions you will be asked during a SAS programming job interview or if you plan on taking a SAS certification exam.

In future blogs, I will try to cover each of those questions individually with some demonstrations.

Good luck!

Your comments and questions are valued and encouraged.



Describe Your Table in SAS to Write the SQL Code

By Stephen Overton posted on the

Ever had the need to write SQL create statements for existing tables but felt too lazy to write them by hand? Ever wanted to reverse engineer tables into SQL code? Have no fear, PROC SQL is here. Use the DESCRIBE statement to get the full-blown SQL code to create the table. This is particularly good for generating empty table structures to insert data into with ETL code. In the data warehousing world, the SQL code that creates empty tables is referred to as Data Definition Language (DDL).

Example: Describe Statement Using PROC SQL
Running the following describe statement produces the SQL create statement to define the table and columns.

proc sql;
describe table SASHELP.CLASS;
quit;

The SAS log then shows the corresponding create table statement.


SAS Support provides additional information on the DESCRIBE statement.
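
For the empty-table use case mentioned above, a minimal sketch might look like the following. It assumes the SASHELP.CLASS sample table; work.class_empty is a hypothetical target name, and the LIKE clause is used here as a shortcut instead of pasting the generated DDL.

```sas
proc sql;
  /* Create an empty copy of the table structure (no rows) */
  create table work.class_empty like sashelp.class;

  /* Confirm the structure of the new, empty table in the log */
  describe table work.class_empty;
quit;
```

If the table has to be created in another database, copying the create statement from the log and adjusting the data types is likely the more portable route.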

Source: Steve Overton, SAS BI Notes