Oracle Architecture

A pretty reckless description of what happens when you start Oracle and when you execute a statement. If you’re a developer, it could influence how you code. If you’re a junior DBA, or a would-be junior DBA, I hope it will help you make sense of some parts of the “Concepts” manual.

FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

Source:  Konagora

Got EDC?

Clinical trials play a key role in the pharmaceutical, biotechnology and medical device industries. With a large number of drugs coming off patent, companies are under pressure to develop and test new drugs as swiftly and efficiently as possible. This requires an increase in clinical trials and a reduction in the time cycle of those trials.

What is Electronic Data Capture?

Electronic Data Capture, or EDC, is the gathering of data collected by humans into computer systems without the need for manual data re-entry. EDC systems can speed time-to-market, reduce data entry errors, and provide for early analysis and trend monitoring.

Data entry can be achieved using a number of mechanisms. Users can enter data directly into an electronic device such as a laptop PC, handheld device, tablet PC or touch screen, or via a tone-dialing system such as an IVRS.

EDC has been around for more than 10 years, yet still only about a third of all studies use it – the rest use a paper-based data collection process.

While EDC has many advantages, a barrier to success is the expectation that the system is ready at the time of enrollment of the first subject (patient). To achieve this target, one must have agreement on the data to be collected during the trial. I have found that this requirement frequently changes between the finalization of the protocol and first subject in. When paper case report forms are used, such changes can be accommodated more easily.

For EDC to become more widely used in the pharmaceutical industry, companies need to address the challenges prior to implementation. While pilot studies have been successful, pharmaceutical companies have not yet implemented EDC across the majority of their clinical trials. They are constrained by a lack of strategic planning, the varying requirements of each trial, the relative immaturity and fragmentation of the EDC software market, and the need to address both process and change management.

Challenges:

  • What data are you gathering?
  • Are the site and clinical personnel fully trained? What hardware do they have available?
  • Validation – Is validation of input data required?
  • Workflow – Do the data need validating, reviewing, approving and releasing for general consumption locally and centrally?
  • Integration – Does the system need integrating with other computer systems?
  • Regulations – Does the system have to conform to any externally imposed regulations, such as 21 CFR Part 11, Good Manufacturing Practice, or Good Laboratory Practice?

EDC eliminates the need to transcribe data from paper and therefore eliminates transcription errors.

Benefits:

  • Elimination of double data entry – The manual re-entry of data recorded on paper is expensive and unreliable
  • Validation – Immediate validation of data entry
  • User-friendly web forms – Data entry is quick and efficient
  • Access to real-time data – Faster executive-level decisions and information

EDC helps reduce invalid data entries and speed up the availability of drug trial information.

While the EDC technology still faces some challenges, its benefits will drive acceptance. A leading stimulus to growth will be the reduction in price and the increased sophistication and power of small handheld devices. These developments are only useful if accepted by the users. Not all EDC systems are created equal, and you must carefully pick the system that best meets your needs.

If a company changes from a paper-based system to an EDC system, this ‘innovation’ will have a human side. Successful implementation is not just a matter of installing the software and announcing the change. Stakeholders who are responsible for collecting, processing and communicating clinical trial data must adopt and adapt to the new systems – ensuring that a technical innovation is actually successfully adopted.

With the current CDISC standards initiatives, it is inevitable that the FDA will eventually demand all information be provided in this format — although no date has yet been set.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

SAS for True Beginners Part 2


Source: Wilton Graves, University of Georgia

SAS for True Beginners Part 1


Source: Wilton Graves, University of Georgia

iReview in Clinical Data Management

JReview® is the web-enabled version of Integrated Review™ (iReview). It allows users to view, create, print, and interact with their Integrated Review™ objects locally on an Intranet or securely over the Internet. JReview® can be run in two different modes of operation (authoring and non-authoring) in addition to two modes of communication (clear-text and SSL).

iReview Common Development Practice:

  • iReview allows you to save the library of objects to be deployed at the “Global” level in the production environment.
  • Create separate categories (folders for DEV/QC/UAT) before approval (deployment into production):
    – “Development”
    – “QC”
  • Create study-specific folders under those categories (e.g. DEV/QC/UAT)
  • Configure UserGroups to manage privileges appropriately at the category level:
    – “Developers can access Development”
    – “QC/UAT can access QC”

QC/UAT PROCESS

  • You can query iReview metadata
  • Business rule verification by checking:
    – Panel names, item names
    – Object location, e.g. Public, private or usergroup
  • Use SQL to query iReview object metadata (see the sketch after this list)
  • The information in CONTENTBLOCK is parsed to get additional metadata for a particular iReview object
  • Define a detailed QC checklist for each object in the Global Library
  • Maintain a lessons-learned document (knowledge base) to improve the development process

  • Continuously improve processes by collecting metrics:
    – Development time
    – QC time
    – Rework time
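
A minimal sketch of such a metadata query, assuming a libref IRDB has been assigned to the iReview repository schema. The table and column names here are hypothetical, since the actual repository schema varies by Integrated Review/JReview version:

/* Assume libref IRDB points at the iReview repository schema (connection details omitted) */
proc sql;
  select objectname,              /* hypothetical column names */
         objecttype,
         location                 /* e.g. Public, private or usergroup */
    from irdb.review_objects      /* hypothetical table name */
   where objectname like 'AE%';
quit;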

Advanced Functionality

  • Deploy reports with dynamic filter values
  • Filter values are not static and change during trial conduct
  • Deployment for non-technical end users
  • Provide easy access to reports
  • Create lookup table(s) in the backend
  • Populate lookup table(s) with study-specific filter values
  • Using “Filter Output” in IR, add appropriate nested queries to the WHERE clause
  • Use ImportSQL for more complex dynamic filtering, so there is no need to hardcode values in the front end
  • This saves development time by avoiding the creation of study-specific filters and increases re-usability
  • Flexibility to activate/inactivate filter values via the backend (see the sketch below)
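
A minimal sketch of the lookup-table approach described in the list above. All names are hypothetical, and WORK is used only to keep the sketch self-contained; in practice the lookup table lives in the study database backend:

proc sql;
  /* Backend lookup table holding study-specific filter values */
  create table work.filter_sites
    (siteid char(10), active char(1));
  insert into work.filter_sites
    values ('1001', 'Y')
    values ('1002', 'N');
quit;

/* Nested query added to the report's WHERE clause via "Filter Output": */
/*   siteid in (select siteid from filter_sites where active = 'Y')    */

Activating or inactivating a site is then a one-row update in the backend, with no change to the deployed report.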

Import SQL

  • Modifying an Import SQL panel by adding more items will not impact existing reports already using this Import SQL
  • Import SQL has a limitation of a maximum of 2,000 characters; exceeding it results in an error
  • A workaround is to create a stored procedure or a view (see the sketch below)
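
For example, a long SELECT can be moved into a view so that the Import SQL text itself stays well under the 2,000-character limit. A sketch with hypothetical dataset and variable names:

proc sql;
  /* Wrap the long query in a view on the backend */
  create view work.v_ae_listing as
    select patno, aeterm, aestdt, aesev
      from work.ae
     where aeser = 'Y';
quit;

/* The Import SQL then reduces to:  select * from v_ae_listing */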

Patient Selection Criteria

  • Modifying a PSC has no impact on existing saved reports that use it

Object Specifications window

  • Removing objects (missing folders):
    – When all the objects are removed from a folder in the Object Specifications window, the folder with no objects will be hidden but not removed, e.g. Drug Safety ..> All AEs ..> SAE Reports ..> SAE reconciliation
  • Removing all objects under the “SAE Reports” folder will result in the “SAE Reports” folder being hidden
  • The workaround is to use the Category section of the Object Management tool to remove these hidden folders

Navigating iReview Windows

  • If you have hundreds of saved objects, typing the first few letters (similar to Windows Explorer) will help with easy scrolling and navigation in the Object Specifications window

Reference: Integrated Clinical Systems, Inc.

Concatenating SAS data sets with the APPEND procedure

The APPEND procedure is an efficient method for concatenating observations from a smaller data set to a larger data set. The BASE= data set option is reserved for the larger of the two data sets with the DATA= option for the smaller data set. Essentially, the APPEND procedure avoids reading any observations in the BASE= data set by positioning the record pointer at the end of the BASE= data set.

If no additional processing is necessary, using PROC APPEND or the APPEND statement in PROC DATASETS is more efficient than using a DATA step to concatenate data sets.

Each observation from the smaller data set is then appended, one at a time, to the end of the BASE= data set. In the next example, the BASE= data set identifies a larger data set called MASTER and the DATA= data set identifies the smaller SOURCEDATA data set.

When one or more variables in the input data set (DATA=) are not present in the BASE= data set, the FORCE option must be specified on the PROC APPEND statement to prevent an error:

PROC APPEND BASE=master DATA=sourcedata;
RUN;

WARNING: Variable charvar has different lengths on BASE and DATA files (BASE 25 DATA 30).
ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files.

As the log message suggests, no appending is done, and the FORCE option is necessary to append the input files. The corrected syntax would look like this:

PROC APPEND BASE=master DATA=sourcedata FORCE;
RUN;

WARNING: Variable charvar has different lengths on BASE and DATA files (BASE 25 DATA 30).
NOTE: FORCE is specified, so dropping/truncating will occur.

When two or more data sets need to be concatenated, multiple APPEND procedures are issued. In the next example, two separate PROC APPEND steps are specified to concatenate the two smaller data sets (mysasdata1 and mysasdata2) at the end of the larger BASE= data set.

PROC APPEND BASE=master DATA=mysasdata1;
RUN;

PROC APPEND BASE=master DATA=mysasdata2;
RUN;
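
For comparison, the equivalent DATA step concatenation reads and rewrites every observation of the larger MASTER data set, which is exactly the work PROC APPEND avoids:

DATA master;
SET master mysasdata1 mysasdata2;
RUN;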

You cannot use PROC APPEND to add observations to a SAS data set in a sequential library.

Reference: SAS 9.2 Language Reference Concepts 2nd edition

From Non-SAS Programmer to SAS Programmer

SAS programmers come from many different educational backgrounds. Many have started their careers as Data Managers in a CRO environment and grew to become SAS programmers. Others have gone to college and pursued degrees in math, statistics or computer science.

Do you have SAS skills? First, you need to find out more about the skills desired for statistical programming and start to slowly learn what SAS programmers and statisticians do in the pharmaceutical industry. It is also important to understand the drug development and regulatory process so that you have a better understanding of the industry as a whole as well as the drug approval process.

In addition, I have personally attended several workshops on Statistics for Non-statisticians provided by several of my past employers/clients (GSK, Sanofi-Aventis, etc.) so I could gain a greater understanding of the statistician’s role. I am personally more inclined toward EDC development than toward becoming a biostatistician, but these are just a few of the steps you could take to grow your career as a SAS programmer.

Practice, Practice, Practice!

To begin learning how to actually program in SAS, it would be a good idea to enroll in a SAS course provided by the SAS Institute near you or via eLearning. I have taken the course SAS Programming 1: Essentials, and I would recommend it. You could also join SUGI conferences and other user groups near your city/country. Seek every opportunity to gain further understanding of how to program efficiently in the pharmaceutical industry. It could well land you a junior SAS programming position.

Transitioning to a SAS programming role: Now that you have gotten your first SAS programming job, you will need to continue your professional development and attend additional training, workshops, seminars and study workgroup meetings. The SAS Institute provides second-level, more advanced courses: Programming II: Manipulating Data with the DATA Step, SAS Macro Language, and advanced SAS macro programming topics. There are also SAS certification courses available to help you prepare to become a SAS certified programmer.

There is a light at the end of the tunnel: Advance!

Your ongoing development will be very exciting and challenging. Continue attending SAS classes as needed, and attend industry-related conferences such as PharmaSUG to gain additional knowledge and insight on how to perform your job more effectively and efficiently.

As you can see, it is possible to ‘grow’ a SAS programmer from a non-programming background into an experienced programmer. All of the classes, training, and projects you will work on are crucial in expanding your SAS knowledge and will allow you to have a very exciting career ahead of you.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.

Good Programming Practice for Clinical Trials by Sunil Gupta

The following are draft recommendations for Good Programming Practice for analysis, reporting and data manipulation in the clinical trials and healthcare industries.

The purpose is to encourage contributions from across companies, non-profit organizations and regulators in an attempt to create a consensus recommendation. The ambition is that this page becomes recognized by the Pharmaceutical Industry, Clinical Research and Health Care Organizations as well as Regulatory Authorities.

The hope is that the Practice can be reviewed and endorsed by the relevant management teams of several pharmaceutical companies and major Contract Research Organizations and promoted through relevant professional organizations such as PharmaSUG, PhUSE, PSI and CDISC.

Introduction

The Good Programming Practices are defined in order to:

  • Ensure the clarity of the code and facilitate code review;
  • Save time in case of maintenance, and ease the transfer of code among programmers or companies;
  • Minimize the need for code maintenance by robust programming;
  • Minimize the development effort by development and re-use of standard code and by use of dynamic (easily adaptable) code;
  • Minimize the resources needed at execution time (improve the efficiency of the code);
  • Reduce the risk of logical errors;
  • Meet regulatory requirements regarding validation and 21 CFR Part 11 compliance.

Note: As is often the case, the various guidelines provided hereafter may conflict with one another if applied in too rigorous a way. Clarity, efficiency, re-usability, adaptability and robustness of the code are all important, and must be balanced in the programming practice.

Regulatory Requirements

Validation

21 CFR Part 11 compliance

Readability and Maintainability

Language

English is an international language, and for practical reasons (regulatory authorities, in-licensing, out-licensing, partnerships, mergers) study protocols and study reports are mostly written in English. It is therefore recommended to write SAS code and comments in English.

Header and Revision History

  • Include a header for every program (template below).
**********************************************************;
* Program name      :
*
* Author            :
*
* Date created      :
*
* Study             : (Study number)
*                     (Study title)
*
* Purpose           :
*
* Template          :
*
* Inputs            :
*
* Outputs           :
*
* Program completed : Yes/No
*
* Updated by        : (Name) – (Date):
*                     (Modification and Reason)
**********************************************************;
  • In addition to your name or initials, use your login ID to identify yourself in the header, so there is no ambiguity about the identity of each programmer.
  • Update the revision history at each code modification made after the finalization of the first version of a program.

Note: When you copy a program from another study, you become the author of this program, and you should clear the revision history. You can specify the origin of the program under the “Template” section of the header.

Below is an example with comments of an alternative comment block that I think is more useful for Open Source programming. PaulOldenKamp 16:54, 5 April 2009 (UTC)

/** ----------------------------------------------------------------------------------------
$Id: os3a_autoexec.sas 152 2008-11-17 01:48:40Z Paul_ok01 $
                                  <== Id info automatically inserted with each commit
                                      to Subversion version control
Application:      OS3A - Common Programs
Description:      OS3A session initialization program.
Previous Program: None
Saved as:         c:\os3a\trunk\os3a_autoexec.sas
                  http://os3a.svn.sourceforge.net/viewvc/os3a/trunk/os3a_autoexec.sas
                                  <== locations, local and web, where the pgm can be found
Change History:
  Date        Prog  Ref  Description
  04/26/2008  PMO   [1]  Initial programming for os3a
                                  <== date, programmer initials, ref number [1] to link
                                      to specific location of change
Copyright:         Copyright (c) 2008 OS3A Program. All rights reserved.
                                  <== Always tell who owns the program so one can ask
                                      permission from the copyright holder
Copyright Contact: paul_ok01@users.sourceforge.net
License:           Eclipse Public License v1.0
                                  <== Tell folks how they are licensed to use the program.
  This program and the accompanying materials are made available under the terms of the
  Eclipse Public License v1.0 which accompanies this distribution, and is available at
  www.eclipse.org/legal/epl-v10.html
Contributors:      Paul OldenKamp, POK_Programming@OldenKamp.org - Develop initial pgm.
                                  <== Identify significant contributors
@purpose OS3A system initialization program.
  Set up initial options and global macro variables.
                                  <== tag identifies start of info used by the Codedoc
                                      Perl script to produce external HTML documentation
@param  SYSPARM    - input provided from program initiation call; default: main_FORE for production.
@symbol sysRoot    - system root location; Windows - C:/, UNIX - /
@symbol remove_cmd - system command to remove file; Windows - erase, UNIX - rm
@symbol os3aRoot   - directory location of os3a root
@symbol futsRoot   - directory location of FUTS top level macros.
@symbol Root       - directory location of sub-project identified with Four Letter Acronym
----------------------------------------------------------------------------------------- */

The results from encoding the header and comments in a SAS program can be seen on the CodeDoc web page. See http://www.thotwave.com/products/codedoc.jsp.

Comments

  • Include a comment before each major DATA/PROC step, especially when you are doing something complex or non-standard. Comments should be comprehensive, and should describe the rationale and not simply the action. For example, do not comment “Access demography data”; instead explain which data elements and why they are needed.
  • Organize the comments into a hierarchy.
  • Do not number the comments.

Reason: It avoids heavy update when removing or inserting sections.

Naming Conventions

  • Use explicit and meaningful names for variables and datasets, with a maximum length of 8.
  • For permanent datasets, use a meaningful dataset label and variable labels.
  • When possible, do not use the same name for a dataset more than once in a program.

Note: However, keep in mind that large intermediate files take a lot of SAS Workspace.

  • Name IN= variables using “in” plus a meaningful reference to the dataset.

Example:

data aelst;
  merge aesaes (in=inae) patpat (in=inpat);
  by patno;
  if inae and inpat;
run;
  • Labels must have a maximum length of 40 characters.

Code Structure

  • It is mandatory to include libnames, options and formats in a separate setup program unless these are temporary formats or temporary options that are reset after being used.

Reason: It will guarantee that changes of the environment are taken into account in all programs run afterwards.

  • Use standard company macros to read in libnames and settings, to write out datasets, and for standard calculating and reporting.
  • One statement per line, but several are allowed if small and repeated or related. Long statements should be split across multiple lines.
  • Control system settings to show all executed code in the log as the default, in as clear manner as possible. The log should not be so lengthy that the programmer cannot easily navigate (if so, use highly visible comments with sufficient white space). System settings should be able to be easily changed in order for a user to debug a section of code or a macro, in order to temporarily display the %included code, resolved macro names, and logic.
  • Use a standard sequence for placing statements and group like statements together.
  1. Within a program:
    1. %LET statements and macro definitions
    2. Input steps
    3. Calculations
    4. Save final (permanent) datasets and created outputs
  2. Within a DATA step:
    1. All non-executable statements first (e.g. ATTRIB, LENGTH, KEEP…)
    2. All executable statements next

Reason: It increases the readability of the program.

  • Left-justify DATA, PROC, OPTIONS statements, indent all statements within.

Example:

proc means data=osevit;
  var prmres;
  by prmcod treat;
run;
  • End every DATA/PROC step with a left-aligned RUN statement.

Reason: It explicitly defines the step boundary.

  • Insert at least one blank line after each RUN statement in DATA/PROC steps.
  • Indent statements within a DO loop, align END with DO.
  • Avoid having too many nested DO loop and IF-ELSE statements.
  • In case of interlinked DO loop, add a comment at the start (DO) and end (END) of each loop.

Example:

data test01;
  do patno=1 to 40;    * cycle thru patients;
    do visit=1 to 3;   * cycle thru visits;
      output;
    end;               * cycle thru visits;
  end;                 * cycle thru patients;
run;
  • Insert parentheses in meaningful places in order to clarify the sequence in which mathematical or logical operations are performed.

Example:

data test02;
  set test01;
  if (visit=0  and vdate lt adate1)
     or (visit=99 and vdate gt adate2) then delete;
run;

Style conventions

Draft section: this may not be specific to clinical programming, but may be of use when considering a general standard for sharing programs.

Use of analysis datasets

This section is reserved for a discussion of why programming output directly from raw data is generally avoided.

Efficiency

  • When you input or output a SAS dataset, use a KEEP (preferred to DROP) statement to keep only the needed variables.

Reason: The SAS system loads only the specified variables into the Program Data Vector, eliminating all other variables from being loaded.
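
Example (a sketch with hypothetical dataset and variable names):

data demo01;
  set rawdata (keep=patno age sex);  * only these three variables enter the PDV;
run;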

  • When subsetting a SAS dataset, use a WHERE statement rather than IF, if possible.

Reason: WHERE subsets the data before entering it into the Program Data Vector, whereas IF subsets the data after inputting the entire dataset.
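
Example (continuing the hypothetical data above):

data adults;
  set demo01;
  where age ge 18;  * subsets before observations enter the PDV;
run;

* An IF subset, by contrast, is applied only after each full observation is read;
data adults2;
  set demo01;
  if age ge 18;
run;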

  • When using IF conditions, use IF/ELSE for mutually exclusive conditions, and check the most likely condition first.

Reason: The ELSE/IF will check only those observations that fail the first IF condition. With the IF/IF, all observations will be checked twice. Also, consider the use of a SELECT statement instead of IF/ELSE, as it may be more readable.

  • Avoid unnecessary sorting. A CLASS statement can be used in some procedures to perform by-group processing without sorting the data.

Example:

proc means data=osevit;
  var prmres;
  class treat;
run;
  • If possible (i.e. not a sorting variable), use character values for categorical variables or flags instead of numeric values.

Reason: It saves space. A character “1” uses one byte (if length is set to one), whereas a numeric 1 uses eight bytes.

  • Use the LENGTH statement to reduce variable size.

Reason: Storage space can be reduced significantly.
Note: Keep in mind that a too limited variable length could reduce the robustness of the code (lead to truncation with different sets of data).

  • Use simple macros for repeating code.
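
Example (a minimal sketch; the dataset and variable names are hypothetical):

%macro freqcheck(ds, var);
  proc freq data=&ds;
    tables &var / missing;
  run;
%mend freqcheck;

%freqcheck(ae, aeterm)
%freqcheck(cm, cmdecod)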

Robustness

  • Use the MSGLEVEL=I option in order to have all informational, note, warning, and error messages sent to the LOG.
  • In the final code there should be no dead code, i.e. code that does not work or that is not used; it must be removed from the program.
  • Code to allow checking of the program or of the data (on all data or on a subset of patients such as clean patients, discontinued patients, patients with SAE or patients with odd data) is encouraged and should be built throughout the program. This code can be easily activated during the development phase or commented out during a production run using the piece of code detailed in Section 6.
  • It is not acceptable to have avoidable notes or warnings in the log (mandatory).

Reason: They can often lead to ambiguities, confusion, or actual error (e.g. erroneous merging, uninitialized variables, automatic numeric/character conversions, automatic formatting, operation on missing data…).
Note: If such a warning message is unavoidable, an explanation has to be given in the program (mandatory).

  • Always use DATA= in a PROC statement (mandatory).

Reason: It ensures correct dataset referencing, makes program easy to follow, and provides internal documentation.

  • Be careful when merging datasets. Erroneous merging may occur when:
  1. No BY statement is specified (set system option MERGENOBY=WARN or ERROR).
  2. Some variables, other than BY variables, exist in both datasets (set system option MSGLEVEL=I; SAS then writes a warning to the SAS log whenever a MERGE statement would cause variables to be overwritten, in which case the values from the last dataset on the MERGE statement are kept).
  3. More than one dataset contains repeats of BY values. A WARNING, though not an ERROR, is produced in the log. If you really need such a many-to-many merge, PROC SQL is the way to perform it.

Reason: One has to routinely and carefully check the SAS log, as the situations above lead to WARNING rather than ERROR messages, yet the resulting dataset is rarely correct.
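
Example: both safeguards can be switched on globally at the top of the program:

options mergenoby=error  /* stop when a MERGE has no BY statement */
        msglevel=i;      /* report variables overwritten during a merge */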

  • When coding IF-THEN-ELSE constructs use a final ELSE statement to trap any observations that do not meet the conditions in the IF-THEN clauses.

Reason: You can only be sure that all possible combinations of data are covered if there is a final ELSE statement.
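
Example (hypothetical variables):

if sex = 'M' then sexcd = 1;
else if sex = 'F' then sexcd = 2;
else do;                            * trap anything unexpected;
  sexcd = .;
  put 'Unexpected value: ' sex= patno=;
end;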

  • When coding a user-defined FORMAT, include the keyword ‘other’ on the left side of the equals sign so that all possible values have an entry in the format.

Reason: A missing entry in a user-defined FORMAT can be difficult to detect. The simplest way to identify this potential problem is to ensure that all values are assigned a format.
Note: This does not apply to INFORMATs. It could be more helpful to get a WARNING message when trying to INPUT data of unexpected format.
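
Example (a sketch):

proc format;
  value $sexfmt
    'M'   = 'Male'
    'F'   = 'Female'
    other = '** UNEXPECTED **';  * makes a missing entry easy to spot in output;
run;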

  • Try to produce code that will operate correctly with unusual data in unexpected situations (e.g. missing data).

Code for Data Checks

Build checks so that their purpose is clear, so that they can be toggled on or off, and remove them once they are no longer needed.

Activate/Deactivate Pieces of Code

In the beginning of the program, define a macro variable that you set to blank during the development phase or that you set equal to * for the production run:

%let c=;  or  %let c=*;

For the pieces of code that check the data/program, start each line with the macro variable defined above:

&c title "Check the visits for each patient";
&c proc freq data=patvis01;
&c    table patno*visit;
&c run;

This code will be executed if &c is blank (development), but will be commented out when &c=* (production).

Perform Checks on a Subset of Patients

In a separate code that you store under the study MACRO folder, list the subset of patients (clean patients, discontinued patients, patients with SAE or patients with odd data) that you want to look at:

%macro select;
2076 2162 2271 2449
%mend;

In the beginning of the program, define a second macro variable that you set equal to * when you want to perform checks on all data or to blank when you are interested in a subset of patients:

%let s=*;  or  %let s=;

For each checking code, add a piece of code that allows subsetting the data, and start each line of this piece of code with the 2 macro variables defined above:

&c title "Check the visits for each patient";
&c proc freq data=patvis01;
&c    table patno*visit;
&c &s where patno in (%select);
&c run;

The check will be performed only if &c is blank, and it will be applied to all patients if &s=*, or to the subset of patients if &s is blank.
Better still: input the list of check case IDs as a dataset.

Floating Point Error

Consider the real number system that we are familiar with. This decimal system, extending without bound in both directions, is obviously infinite. Most computers use floating point representation, in which a finite set of numbers is used to represent the infinite real number system. Thus, we can deduce that we will have some sort of error appearing from time to time. This is more generally termed floating point error and occurs in computers due to hardware limitations.
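
A small SAS illustration of the effect, together with the usual ROUND workaround:

data _null_;
  x = 0.1 + 0.2;
  if x = 0.3 then put 'exactly equal';
  else put 'NOT equal: x is stored as ' x hex16.;
  * compare after rounding to a sensible precision instead;
  if round(x, 1e-9) = 0.3 then put 'equal after rounding';
run;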

The following paper goes some way toward explaining why and how this happens, and also suggests possible approaches to the issue.

Paper reference: http://www.lexjansen.com/phuse/2008/cs/cs08.pdf

Data Imputation versus Hardcoding

Please contribute

  • Definitions
    • Data Imputation
    • Hardcoding
  • Issues
  • Recommendation

Integrity of a data transfer

At a minimum, all data transfers should be validated by checking the observation counts for each SAS dataset, or the record counts in other formats, against counts provided by the sender. It is also helpful if the sender can provide a checksum for each file transferred, since this ensures that all content made its way to you without transmission errors. There are many freeware programs available to calculate checksums for any file, at websites such as http://sourceforge.net/.
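
For example, the observation counts of all transferred SAS datasets can be pulled from the dictionary tables in one step (the libref TRANSFER is hypothetical):

proc sql;
  select memname, nobs
    from dictionary.tables
   where libname = 'TRANSFER';  /* libname values are stored in uppercase */
quit;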

Macros

Draft section: recommendations particularly relevant for the development and use of macros and macro libraries.

Macros are particularly useful under the following circumstances:

  • Program code is used repeatedly
  • A number of steps must be taken conditionally, and the logic for these is clearly fixed (no need to think of all the steps that should be included in a program under a specific situation: the macro will deduce them for you and generate the appropriate data step or proc step code)
  • There is no trivial solution via “ordinary” SAS code
  • Their application must be easier than programming the code itself!
  • Their usage helps users avoid errors and omissions.

If used appropriately the following benefits can be achieved:

  • Increase in quality by avoiding programming bugs and errors
  • Savings in time and resources
  • Enforcement of standards, e.g. standard methods and standard outputs
  • Work can be more enjoyable as programmers can focus on the non-routine work

Ideally Macro development should follow a few rules:

  • Macro headers should clearly state all changes to environment and data that result from execution. Changes should be limited to those necessary for the focused purpose of the macro:
    • strictly controlled changes to input data and creation of output data
    • clear temporary data set clutter
    • no unexpected changes to system settings (options, titles, footnotes, etc)
    • no unexpected changes to external symbol tables
  • Scope of macro variables should be explicitly controlled using %global and %local statements.
  • Method of macro variable creation should demonstrate awareness of default scope.
  • The log matters:
    • Use Base SAS techniques whenever possible to avoid excessive code generation (log bloat). For example, macro definition should use DATA step array and DO loop processing rather than Macro %DO looping.
    • But use pure Macro Language for routine utility macros (see details, below).
    • Use appropriate comment style in macro definitions to properly annotate the SAS log when MPRINT is on. For example, use %* style commenting to explain macro logic, but /* style commenting to explain resulting code. (Or * style or PUT statement commenting as appropriate.)
    • Allow the users to control the appearance of the log via MPRINT, SYMBOLGEN, and MLOGIC.
  • Code within a macro definition should be germane, limited to the specific purpose of the macro. The use of a central repository for macros (“Macro library”) is suggested.
  • Macro Library: Code for routine tasks (eg, parameter checking, system and environment checking, messaging, etc.) should be handled by dedicated utility macros. Code for such routine tasks should not overwhelm the current macro definition, obscuring the purpose, and creating unnecessary maintenance overhead and lack of consistency within a library.
  • Macro Library: Parameter naming conventions should be used for common parameters such as input/output libnames and data sets. Explicit and transparent control of macro variable scope again becomes crucial to avoid accidental change of external symbol tables
  • Macro Library: Use pure Macro Language definitions whenever possible to improve program flow and avoid producing unnecessary Base SAS code. Returning a list of data set variables, checking for macro variable existence, or returning a data set observation count can all be achieved without Base SAS code. Such macros can be called “inline” without unnecessary overhead or interruption of program flow.

For example, instead of a %count_ds_obs definition that uses DATA step code and interrupts program flow, like

%let n_obs = %count_ds_obs(DSIN=myData);
%if &n_obs > 0 %then %do;
  ... more statements ...
%end;

an inline, pure Macro Language implementation allows streamlined code:

%if %inline_ds_obs(DSIN=myData) > 0 %then %do;
  ... more statements ...
%end;
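
One possible pure Macro Language implementation of such an inline counter uses %SYSFUNC with the OPEN, ATTRN and CLOSE functions, so no DATA step code is generated. A sketch (the macro name simply mirrors the example above):

%macro inline_ds_obs(dsin=);
  %local dsid nobs rc;
  %let dsid = %sysfunc(open(&dsin));    %* open the data set;
  %if &dsid %then %do;
    %let nobs = %sysfunc(attrn(&dsid, nlobs));  %* logical observation count;
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %let nobs = 0;                  %* data set could not be opened;
  &nobs
%mend inline_ds_obs;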

Source: Sunil Gupta, Senior SAS Consultant, Gupta Programming http://www.sascommunity.org/wiki/Good_Programming_Practice_for_Clinical_Trials

CDISC Will Be Required by Kit Howard

Over the past decade or so, a tremendous amount of development work has gone into creating standards for data to be submitted electronically to the FDA. There are those, however, who question whether these standards will ever be required, and believe that unless the FDA requires them, the standards don’t have to be adopted. Indeed, in the absence of such a requirement, there is sometimes a good business case for not adopting the standards, for example, if the company already has their own robust standards, or when the exit strategy is to achieve proof of concept and then sell. The writing is on the wall, however. Standards will be required, at least for data submitted to FDA, and that includes CDER (drugs), CBER (biologics) and CDRH (devices). This article will present evidence of that, and cover sources for the agency’s expectations.

In February 2012, FDA issued a Notice of Proposed Rule Making called Electronic Submission of Data From Studies Evaluating Human Drugs and Biologics. This Notice outlined the FDA’s intention to require that all data submitted in support of clinical trials would have to be sent in “an electronic format that FDA can process, review, and archive.” This will cover new drug applications (NDAs, usually submitted to CDER), biologic license applications (BLAs, usually submitted to CBER and sometimes jointly to CDRH), and abbreviated NDAs (ANDAs, usually submitted to CDER and CBER, and sometimes jointly to CDRH), and all of the associated supplements and amendments. Later in the Notice, it states that “FDA’s proposed rule would address the submission of study data in a standardized electronic format.” While CDISC is not specifically mentioned, it is referenced in several other FDA documents.

There isn’t currently a rule requiring standards that specifically mentions CDRH, but there is a very strong internal commitment to adopting CDISC, so the probability is good that there will eventually be some kind of regulatory requirement.

Guidance
Providing Regulatory Submissions in Electronic Format — Standardized Study Data: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM292334.pdf

FDA is issuing a series of guidances designed to help sponsors submit data electronically, and the most recent is a draft guidance titled Providing Regulatory Submissions in Electronic Format — Standardized Study Data. Its primary purpose is to state that data submitted to the agency must be submitted electronically and in a standardized structure. The draft was issued in February, and the comments period closed on April 23.

The guidance covers a much wider set of submissions than the proposed rule, including both clinical and nonclinical data submitted in investigational new drug applications (INDs), NDAs, ANDAs and BLAs, and also medical device data in investigational device exemptions (IDEs), premarketing notifications (510(k)s), and premarketing approval applications (PMAs). It also includes all supplements and amendments. It effectively means that all clinical and nonclinical data going to CDER, CBER and CDRH are covered.

The guidance describes the kinds of data the agency receives and why it will require that the data be standardized. The reasons are much as you’d expect ‐ they can receive, import, analyze and report on the data much more efficiently and spend much less time working out what data are where. The guidance then goes on to discuss processes for data that were captured in a standardized format and those that weren’t.

It is important to note that the guidance does not state that CDISC standards must be used, but rather says that standards must be used, and to see the FDA’s standards page for the specific standards that are currently required. This permits changes to the standards to happen without having to update the guidance. Most of the data‐specific examples use CDISC, and the strong hint is that this is what should be used.

Here are some key points:

  • The guidance clearly states that collecting data in the same standardized format in which they will be submitted is preferable to converting after collection, a practice that is discouraged.
  • For both clinical and nonclinical studies, the sponsor should outline their standards plan in the IND or IDE. This means that the standards strategy must be established before the clinical program starts. This has implications for companies that currently expect to do only a few studies and license the product out ‐ we should expect to see conformance to standards become a part of the assessment of quality of study data.
  • When study data are submitted, the cover letter should include a description of the standards used. The guidance specifically states that it does not define what should be collected, just that everything that is collected should be submitted in a standardized fashion.
  • Each submission should contain a Reviewers’ Guide that orients the user to the structure and content of the datasets. There is no guidance currently available as to the desired content of the guide at this time.
  • Controlled terminologies must be used, and the submission must note what ones were used. The guidance discusses the value of such terminologies.
  • In cases where non-standard data are mapped to the standards, any variable that cannot be fully or exactly mapped must be documented and explained.
  • Data validation as per the guidance is the process of making sure that the data conform to the standards rules. It is one aspect of data quality. There are two types of validation rules – technical validation rules (e.g., is the variable name correct?) and business validation rules (do the data make sense, e.g., is the heart rate compatible with life?).
  • The guidance notes that different parts of the FDA are adopting standards at different rates, and therefore there may be exceptions to the requirements. Sponsors can request meetings with the review division to determine what the requirements will be for their studies/submissions.

Online Resources for FDA’s Standards Expectations

The FDA is developing a robust set of online resources to help organizations understand and adopt standards in the way that is most useful for all parties involved. Most can be found via the primary data standards page, FDA Resources for Data Standards, http://www.fda.gov/ForIndustry/DataStandards/default.htm, which can be accessed from the main FDA page by clicking on For Industry and scrolling down under For Industry Topics to Data Standards. Note that the FDA websites are reorganized periodically, so this may change in time. Among other things, this includes links to the pages for the Structured Product Labeling and Individual Case Safety Report (terminology below), the Study Data Standards page (discussed below), and a link to the FDA’s Data Council, which has overall responsibility for FDA standards. Be aware that there are no links back to the Data Standards page from any of the linked pages, which is rather annoying.

This part of the article lists and describes some of the resources available on this page that may be of particular use.

The Study Data Standards page has the largest number of standards resources. Its url is: http://www.fda.gov/ForIndustry/DataStandards/StudyDataStandards/default.htm. Here are some of the resources you can find there.

Study Data Specifications: a highly technical description of the software format the datasets should use (SAS XPORT), the file naming conventions, conventions for splitting large files, the format for the data tabulations (CDISC SDTM (human trials) or SEND (animal trials)), and the structure for data listings and patient profiles (this is quite limited). While CDISC ADaM is not specified, there are a number of rules for structuring the analysis datasets, including the need to provide both text and numeric versions of coded variables, and dates should be numeric rather than ISO 8601. A define.xml is required that describes the dataset structures, an annotated CRF must be included, and a folder structure for the files is defined.

Study Data File Format Standards: a list of the file formats and what can be used to send what to whom (e.g., send study datasets to CDER, CBER and CDRH using SAS XPORT, send SDTM and ADaM define.xml to CBER and CDER using XML v1.0)

Study Data Exchange and Analysis Standards: specifies SDTM versions 1.1 and 1.2, the SDTM IG v3.1.1 and 3.1.2, SEND IG v3.0, and ADaM v2.1.

Study Data Terminology Standards: these are the code lists and dictionaries that should be used, including CDISC Terminology v 2011‐06‐10 or later, MedDRA v8 or later, CDRH Device Problem codes, and others.

Study Data Validation Rules: this is a list of rules to which submission data must conform, or it will be rejected by the agency’s data uploader.

There are also center-specific pages that may be useful.

CBER: http://www.fda.gov/BiologicsBloodVaccines/DevelopmentApprovalProcess/ucm209137.htm. This page has a very specific list of the steps you need to take to be ready to submit data to CBER in electronic format. It is critical that sponsors review and follow this list, as it is the first thing that the CBER contact (Amy Malla) will ask you. CBER started accepting CDISC-formatted data in December 2010.

CDER: http://www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/ucm248635.htm. The webpage starts with the following statement:

CDER strongly encourages IND sponsors and NDA applicants to consider the implementation and use of data standards for the submission of applications. Such implementation should occur as early as possible in the product development lifecycle, so that data standards are accounted for in the design, conduct, and analysis of studies.

It can’t get much blunter than that. It then references the Study Data Specifications document discussed above, the CDISC SDTM, SEND and ADaM, and includes a list of Data Standards Common Issues that lists the problems that CDER most commonly sees with respect to the standardization of submissions. Sponsors can use this to ensure they do not commit these errors. This document also suggests that CDASH‐style CRFs are best for simplifying the process of creating SDTM domains.

CDER began accepting CDISC-formatted data some years ago.

  • CDRH: there isn’t a standards-specific page for CDRH, but they started accepting CDISC-formatted data in May 2011. The same person (Amy Malla) is the contact for submitting data electronically to CDRH, and the list of steps on the CBER page is intended for use for CDRH data as well. CDRH has some standard terminologies defined that sponsors are expected to use.

Some other resources that may be helpful are as follows.

  • A consolidated list of guidances (both draft and final) relating to electronic submissions http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm064994.htm
  • FDA terminology hosted by the National Cancer Institute’s Enterprise Vocabulary Services (the same organization that hosts CDISC’s controlled terms) http://www.cancer.gov/cancertopics/cancerlibrary/terminologyresources/fda
  • Structured product labeling terms, including Mechanism of Action, Physiologic Effect, Structural Class and Problem List Subset
  • Unique Ingredient Identifier (UNII)
  • Individual Case Safety Report
  • CDRH’s terminology, including Event Problem Codes that cover Patient Problem Codes, Device Component Codes and Device Problem Codes. Note that these are listed with the link of Center for Devices and Radiological Health, rather than by the name of the code lists.

For the sake of completeness, here are the links for the CDISC standards

These are currently available resources, but of course this will grow as the agency develops more.

Kit Howard is a Clinical Data Standards, Quality and Management consultant for Kestrel Consultants.

Reference: The Proposed Rule

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

FDA and Clinical Trials
