Tag Archives: open source

The Only Three (3) [Programming] Languages You Should Learn Right Now (eClinical Speaking)

In a previous article, written in 2012, I mentioned four programming languages that you should learn if you work on the development of clinical trials. Why is this important, you may ask? A clinical trial is a method for determining whether a new drug or treatment works against a disease or otherwise benefits patients. If you have never written a line of code in your life, you are in the right place. If you have some programming experience but are interested in learning clinical programming, this information can be helpful.

But shouldn’t I be Learning ________?

Here are the latest eClinical programming languages you should learn:

1. SAS®: Data analysis and result reporting are the two major tasks of SAS® programmers. SAS currently offers a Clinical Trials Programmer certification. Some of the skills you should learn are:

  • clinical trials process
  • accessing, managing, and transforming clinical trials data
  • statistical procedures and macro programming
  • reporting clinical trials results
  • validating clinical trial data reporting

2. ODM/XML: The Operational Data Model (ODM) uses XML to define standard data exchange models that support the acquisition, exchange and archiving of operational data.

3. CDISC Language: Yes. This is not just any code. This is the standard language in clinical trials and you should be learning it right now. The future is here now. The EDC code as we know it will eventually go away as more and more vendors adapt their systems and technologies to meet rules and regulations. Some of the skills you should learn:

  • Annotation of variables and variable values – SDTM aCRF
  • Define XML – CDISC SDTM datasets
  • ADaM datasets – CDISC ADaM datasets

CDISC has established data standards to speed up data review, and the FDA is now suggesting that this will soon become the norm. Pharmaceutical companies, biotechnology companies and many other sponsors within clinical research are now better equipped to improve CDISC implementation.
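To make the ODM/XML item in the list above more concrete, here is a minimal Python sketch (Python is used purely for illustration) that reads item values out of a small ODM-style XML fragment. The fragment is hand-written and heavily simplified: all OIDs (`ST.DEMO`, `SUBJ-001`, `IT.SYSBP` and so on) are hypothetical, and a real ODM file also carries a namespace, study metadata and audit information that are left out here.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ODM-style fragment (hypothetical OIDs, no namespace).
ODM_SNIPPET = """
<ODM>
  <ClinicalData StudyOID="ST.DEMO">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.VISIT1">
        <FormData FormOID="FM.VITALS">
          <ItemGroupData ItemGroupOID="IG.VS">
            <ItemData ItemOID="IT.SYSBP" Value="120"/>
            <ItemData ItemOID="IT.DIABP" Value="80"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""


def extract_items(odm_text):
    """Return one flat record per ItemData element."""
    root = ET.fromstring(odm_text)
    records = []
    for subject in root.iter("SubjectData"):
        for event in subject.iter("StudyEventData"):
            for item in event.iter("ItemData"):
                records.append({
                    "subject": subject.get("SubjectKey"),
                    "event": event.get("StudyEventOID"),
                    "item": item.get("ItemOID"),
                    "value": item.get("Value"),
                })
    return records


if __name__ == "__main__":
    for record in extract_items(ODM_SNIPPET):
        print(record)
```

The nesting of subject, study event, form, item group and item is what lets a receiving system rebuild a flat dataset from the XML, which is the "data exchange" role described above.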

Everyone should learn to code

SAS® and XML now work together. The XML engine in SAS® v9.0 is built so that one can import a wide variety of XML documents. SAS® does what it does best, statistics, and XML does what it does best, creating report-quality tables by taking advantage of the full feature set of the publishing software. This combination can produce report-quality tables in an automated, hands-off, lights-out process.

Standards are more than just CDISC

If you are looking for your next career in Clinical Data Management, then SAS and CDISC SDTM should put you on the right path to career development and job security.

Conclusion: Learn both the basic and the advanced SAS clinical programming concepts, such as reading and manipulating clinical data. With the clinical features of SAS and basic SAS programming concepts, you will be able to import CDISC (for example ADaM) or other standards for domain structure and contents into the metadata, build clinical domain target table metadata from those standards, create jobs to load clinical domains, validate the structure and content of the clinical domains against the standards, and generate CDISC-standard define.xml files that describe the domain tables for clinical submissions.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica – Open Source and Oracle Clinical.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

Source:

SAS Institute
CDISC


OpenClinica: Printing subject casebooks, blank casebooks and blank CRFs

Wanna print subject casebooks using OpenClinica? This article is an extract from a video demo from the OpenClinica blog website. Click the link below now.

Source: http://blog.openclinica.com/2014/10/06/video-demos-printing-subject-casebooks-blank-casebooks-and-blank-crfs/

Happy Printing!!!

 

Fair Use Notice: This video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law). If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding, we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision, it would be wise to contact or consult a professional before making that decision.

Disclaimer: The content of these columns does not necessarily reflect the opinion of {EDC Developer}.

ClinCapture® Tutorial – How to enter a patient, schedule a visit and start entering data

ClinCapture® Tutorial – How to enter a patient, schedule a visit and start entering data – YouTube.

ClinCapture® is the most advanced open-source electronic data capture (EDC) system designed to streamline your clinical trials. As an open-source solution, ClinCapture® is tailored to meet the needs of life science companies looking to run cost-effective clinical trials.

ClinCapture® can be rapidly deployed and easily adopted, customizable to specific study requirements. ClinCapture® is repeatedly chosen as a preferred EDC solution because it is intuitive, flexible, and proven.

Source: Clinovo

FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

 


4 Programming Languages You Should Learn Right Now (eClinical Speaking)

I am a strong believer that learning a new language makes you better at the others, but I am not a “learn to code” advocate, since a foreign language (I know three languages, am currently learning my fourth, and have a “to learn” list that includes Italian and Arabic, if I ever find some free time) or even music (I love to play drums) is equally beneficial. But if you want to get a job in the pharmaceutical industry, here is the list of programming languages you should learn:

  1. C#:

What it is: A general-purpose, compiled, object-oriented programming language developed by Microsoft as part of its .NET initiative.

Why you should learn it: If you are looking to become a Medidata Custom Function programmer or Oracle InForm EDC Developer then you should.

2. Python:

What it is: An interpreted, dynamically typed, object-oriented, open-source programming language that uses automatic memory management.

Why you should learn it: If, like me, you are always looking to learn new technology, love Google platforms and perhaps want to become a Timaeus Trial Builder, you should learn it. It is used in a lot of open-source technologies.


3. PL/SQL or SQL:

What it is: PL/SQL stands for Procedural Language/SQL. It is Oracle's procedural extension to SQL.

Why you should learn it: If, like me, you are addicted to databases, then Oracle should be your choice. If you want to become an Oracle Clinical programmer or database administrator, you should learn Oracle PL/SQL.

4. SAS:

What it is: SAS stands for “Statistical Analysis System” (software). It is the most powerful and comprehensive statistics software available.

Why you should learn it: SAS skills are in high demand nowadays. If you are able to obtain the SAS Certification and a few years of experience in the pharmaceutical industry, you will be in good shape. If you are new and looking for training, there are several options available, from SAS Institute to private vendors such as Clinovo to learning on your own. I must warn you that it will be difficult to get a job without experience. Nevertheless, once you are in, it can only get better.

Remember that your job is not just to code but to solve real problems. Your ability to code covers a wide range of skills, including critical thinking, problem analysis and solving, and logic.

So which one are you going to give a try?

Let me know what your preference is. Happy Programming!

The best thing about a boolean is even if you are wrong, you are only off by a bit. (Anonymous)


Source:
SAS Institute
Learn PL/SQL

Good Documentation Practice (GDP) for the EDC / SAS Developer

When writing program code, whether to validate software or to implement validation checks, we often have to write comments to explain why we did something.

Since the FDA regulates computerized systems used in clinical trials under Title 21 of the Code of Federal Regulations, Part 11 (21 CFR Part 11; see my other article about 21 CFR Part 11), we need to make sure our code and programs are documented. As you have heard before, if it is not documented, it never happened. Nevertheless, no regulatory agency explicitly mandates it.

“GDP is an expected practice”

So how much documentation is needed? We could get into endless discussions of when we should comment, what we should comment, and how much we should comment. I have had plenty of discussions about comments with people with various opinions on the subject.

Here is a good documentation practice for SAS code. The example program header was written to validate Clintrial (Oracle) and contains:

  • Program name, version, programmer and purpose.
  • Modifications
  • Risk Assessments

The second section contains information about:

  • quality testing, user testing
  • Macros, global variables and any other code that is reusable.

The documentation must tell the entire story of your program and must be readable by internal and external staff. Two other important things to remember: your program must be accurate (“error free”), and each section of your program must be traceable, showing who updated it, what was changed and why.

Most companies have SOPs that require you to record certain information. But do we understand what it is we are recording, or when it was recorded?

“Standardized Documentation is KEY”

Do you have a preference? Tell me about it in the comments!


OpenClinica 3.1: conditional CRF: showing or hiding items, based on input

Short instructional video to show how conditional logic works in OpenClinica 3.1


Source: OpenClinica

Got EDC?

Clinical trials play a key role in the pharmaceutical, biotechnology and medical device industry. With a large number of drugs coming off patent, companies are under pressure to develop and test new drugs as swiftly and efficiently as possible. This requires an increase in clinical trials and a reduction in the time cycle of those trials.

What is Electronic Data Capture?

Electronic Data Capture or EDC is the gathering of data collected by humans into computer systems without the need for manual data re-entry. EDC systems can speed time-to-market, reduce data entry errors, and provide for early analysis and trend monitoring.

Data entry can be achieved using a number of mechanisms. Users can enter data directly into an electronic device such as a laptop PC, handheld device, tablet PC or touch screen, or through a tone dialing system such as IVRS.

EDC has been around for more than 10 years, yet only about a third of all studies use it; the rest still use a paper-based data collection process.

While EDC has many advantages, a barrier to success is the expectation that the system is ready at the time of the enrollment of the first subject (patient). In order to achieve this target, one must have agreement on the data to be collected during the trial. I have found that this requirement frequently changes between the finalization of the protocol and first subject in. When paper case report forms are used, these changes to data can be more easily accommodated.

For EDC to become more widely used in the pharmaceutical industry, companies need to address the challenges listed below prior to implementation. While pilot studies have been successful, pharmaceutical companies have not yet implemented EDC across the majority of their clinical trials. They are constrained by a lack of strategic planning, the varying requirements of each trial, the relative immaturity and fragmentation of the EDC software market and the need to address both process and change management.

Challenges:

  • What data are you gathering?
  • Are the site and clinical personnel fully trained? What hardware do they have available?
  • Validation – Is validation of input data required?
  • Workflow – Do the data need validating, reviewing, approving and releasing for general consumption locally and centrally?
  • Integration – Does the system need integrating with other computer systems?
  • Regulations – Does the system have to conform to any externally imposed regulations, such as 21 CFR Part 11, Good Manufacturing Practice, or Good Laboratory Practice?

EDC eliminates the transcription of data from paper and therefore transcription errors.

Benefits:

  • No Double Data Entry – The manual re-entry of data recorded on paper, which is expensive and unreliable, is no longer needed
  • Validation – Immediate validation of data entry
  • User-Friendly Web Forms – Data entry is quick and efficient
  • Access to real-time data – Supports quick executive-level decisions and information

EDC helps reduce invalid data entries and speed up the availability of drug trial information.
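The "immediate validation" benefit listed above is usually delivered through edit checks that run as soon as a value is entered. Below is a minimal Python sketch of such a check. The field names and limits are hypothetical, and real EDC systems define edit checks in their own configuration or scripting layer rather than in this form.

```python
# Hypothetical plausibility limits per field: (lowest allowed, highest allowed).
RANGE_CHECKS = {
    "systolic_bp": (60, 250),
    "heart_rate": (30, 220),
}


def validate_entry(field, raw_value):
    """Return a list of query messages for one entered value (empty list = clean)."""
    if raw_value is None or str(raw_value).strip() == "":
        return [f"{field}: value is missing"]
    try:
        value = float(raw_value)
    except ValueError:
        return [f"{field}: '{raw_value}' is not a number"]
    if field not in RANGE_CHECKS:
        return []  # no range check defined for this field
    low, high = RANGE_CHECKS[field]
    if not low <= value <= high:
        return [f"{field}: {value:g} is outside the expected range {low}-{high}"]
    return []


if __name__ == "__main__":
    for field, entered in [("systolic_bp", "120"), ("systolic_bp", "300"), ("heart_rate", "")]:
        print(field, repr(entered), validate_entry(field, entered))
```

Because the check runs at entry time, the site sees the query while the source document is still at hand, which is the advantage over catching the same problem weeks later on paper.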

While EDC technology still faces some challenges, its benefits will drive acceptance. A leading stimulus for growth will be the falling price and the increasing sophistication and power of small handheld devices. These developments are only useful if accepted by the users. Not all EDC systems are created equal, and one must carefully pick the system that best meets one's needs.

If a company changes from a paper-based system to an EDC system, this ‘innovation’ will have a human side. Successful implementation is not just a matter of installing the software and announcing the change. Stakeholders who are responsible for collecting, processing and communicating clinical trial data must adopt and adapt to the new systems, which is what ensures that a technical innovation is actually adopted.

With the current CDISC standards initiatives, it is inevitable that the FDA will eventually demand that all information be provided in this format, although no date has yet been set.


Good Programming Practice for Clinical Trials by Sunil Gupta

The following are draft recommendations for a Good Programming Practice for analysis, reporting and data manipulation in the Clinical Trials and Healthcare industries.

The purpose is to encourage contributions from across companies, non-profit organizations and regulators in an attempt to create a consensus recommendation. The ambition is that this page becomes recognized by the Pharmaceutical Industry, Clinical Research and Health Care Organizations as well as Regulatory Authorities.

The hope is that the Practice can be reviewed and endorsed by the relevant management teams of several pharmaceutical companies and major Contract Research Organizations, and promoted through relevant professional organizations such as PharmaSUG, PhUSE, PSI and CDISC.

Introduction

The Good Programming Practices are defined in order to:

  • Ensure the clarity of the code and facilitate code review;
  • Save time in case of maintenance, and ease the transfer of code among programmers or companies;
  • Minimize the need for code maintenance by robust programming;
  • Minimize the development effort by development and re-use of standard code and by use of dynamic (easily adaptable) code;
  • Minimize the resources needed at execution time (improve the efficiency of the code);
  • Reduce the risk of logical errors;
  • Meet regulatory requirements regarding validation and 21 CFR Part 11 compliance.

Note: As often, the various guidelines provided hereafter may conflict with one another if applied in too rigorous a way. Clarity, efficiency, re-usability, adaptability and robustness of the code are all important, and must be balanced in the programming practice.

Regulatory Requirements

Validation

21 CFR Part 11 compliance

Readability and Maintainability

Language

English is an international language, and study protocols and study reports are, for practical reasons (regulatory authorities, in-licensing, out-licensing, partnerships, mergers), mostly written in English. It is therefore recommended to write SAS code and comments in English.

Header and Revision History

  • Include a header for every program (template below).
**********************************************************;
* Program name      :
*
* Author            :
*
* Date created      :
*
* Study             : (Study number)
*                     (Study title)
*
* Purpose           :
*
* Template          :
*
* Inputs            :
*
* Outputs           :
*
* Program completed : Yes/No
*
* Updated by        : (Name) – (Date):
*                            (Modification and Reason)
**********************************************************;
  • In addition to your name or initials, use your login ID to identify yourself in the header, so that there is no ambiguity about the identity of each programmer.
  • Update the revision history at each code modification made after the finalization of the first version of a program.

Note: When you copy a program from another study, you become the author of this program, and you should clear the revision history. You can specify the origin of the program under the “Template” section of the header.

Below is an example with comments of an alternative comment block that I think is more useful for Open Source programming. PaulOldenKamp 16:54, 5 April 2009 (UTC)

/** ----------------------------------------------------------------------------------------
$Id: os3a_autoexec.sas 152 2008-11-17 01:48:40Z Paul_ok01 $
                       <== Id info automatically inserted with each commit to Subversion version control

Application:       OS3A - Common Programs
Description:       OS3A session initialization program.
Previous Program:  None
Saved as:          c:\os3a\trunk\os3a_autoexec.sas
                   http://os3a.svn.sourceforge.net/viewvc/os3a/trunk/os3a_autoexec.sas
                       <== locations, local and web, where the pgm can be found

Change History:
Date        Prog  Ref  Description
04/26/2008  PMO   [1]  Initial programming for os3a
                       <== date, programmer initials, ref number [1] to link to specific location of change

Copyright:         Copyright (c) 2008 OS3A Program. All rights reserved.
Copyright Contact: paul_ok01@users.sourceforge.net
                       <== Always tell who owns the program so one can ask permission from the copyright holder

License:           Eclipse Public License v1.0
                       <== Tell folks how they are licenced to use the program.
   This program and the accompanying materials are made available under
   the terms of the Eclipse Public License v1.0 which accompanies this
   distribution, and is available at www.eclipse.org/legal/epl-v10.html

Contributors:
   Paul OldenKamp, POK_Programming@OldenKamp.org - Develop initial pgm.
                       <== Identify significant contributors

                       <== tag identifies start of info used by the Codedoc Perl script to produce external HTML documentation
@purpose OS3A system initialization program.
         Set up initial options and global macro variables.
@param  SYSPARM    - input provided from program initiation call; default: main_FORE for production.
@symbol sysRoot    - system root location; Windows - C:/, UNIX - /
@symbol remove_cmd - system command to remove file; Windows - erase, UNIX - rm
@symbol os3aRoot   - directory location of os3a root
@symbol futsRoot   - directory location of FUTS top level macros.
@symbol Root       - directory location of sub-project identified with Four Letter Acronym
----------------------------------------------------------------------------------------- */

The results from encoding the header and comments in a SAS program can be seen on the CodeDoc web page. See http://www.thotwave.com/products/codedoc.jsp.
CodeDoc Download Page

Comments

  • Include a comment before each major DATA/PROC step, especially when you are doing something complex or non-standard. Comments should be comprehensive, and should describe the rationale and not simply the action. For example, do not comment “Access demography data”; instead explain which data elements and why they are needed.
  • Organize the comments into a hierarchy.
  • Do not number the comments.

Reason: It avoids heavy updates when removing or inserting sections.

Naming Conventions

  • Use explicit and meaningful names for variables and datasets, with a maximum length of 8 characters.
  • For permanent datasets, use a meaningful dataset label and variable labels.
  • Where possible, do not use the same dataset name more than once in a program.

Note: However, keep in mind that large intermediate files take up a lot of SAS Workspace.

  • Name each IN= variable using “in” plus a meaningful reference to the dataset.

Example:

data aelst;
   merge aesaes (in=inae) patpat (in=inpat);
   by patno;
   if inae and inpat;
run;
  • Labels must have a maximum length of 40 characters.

Code Structure

  • It is mandatory to include libnames, options and formats in a separate setup program unless these are temporary formats or temporary options that are reset after being used.

Reason: It will guarantee that changes of the environment are taken into account in all programs run afterwards.

  • Use standard company macros to read in libnames and settings, to write out datasets, and for standard calculating and reporting.
  • One statement per line, but several are allowed if small and repeated or related. Long statements should be split across multiple lines.
  • Control system settings to show all executed code in the log by default, in as clear a manner as possible. The log should not be so lengthy that the programmer cannot easily navigate it (if it is, use highly visible comments with sufficient white space). System settings should be easy to change so that a user can debug a section of code or a macro by temporarily displaying the %included code, resolved macro names, and logic.
  • Use a standard sequence for placing statements and group like statements together.
  1. Within a program:
    1. %LET statements and macro definitions
    2. Input steps
    3. Calculations
    4. Save final (permanent) datasets and created outputs
  2. Within a DATA step:
    1. All non-executable statements first (e.g. ATTRIB, LENGTH, KEEP…)
    2. All executable statements next

Reason: It increases the readability of the program.

  • Left-justify DATA, PROC, OPTIONS statements, indent all statements within.

Example:

proc means data=osevit;
   var prmres;
   by prmcod treat;
run;
  • End every DATA/PROC step with a left-aligned RUN statement.

Reason: It explicitly defines the step boundary.

  • Insert at least one blank line after each RUN statement in DATA/PROC steps.
  • Indent statements within a DO loop, align END with DO.
  • Avoid having too many nested DO loops and IF-ELSE statements.
  • For nested DO loops, add a comment at the start (DO) and end (END) of each loop.

Example:

data test01;
  do patno=1 to 40; * cycle thru patients;
    do visit=1 to 3; * cycle thru visits;
      output;
    end; * cycle thru visits;
  end; * cycle thru patients;
run;
  • Insert parentheses in meaningful places in order to clarify the sequence in which mathematical or logical operations are performed.

Example:

data test02;
  set test01;
  if (visit=0  and vdate lt adate1)
   or (visit=99 and vdate gt adate2) then delete;
run;

Style conventions

Draft section : this may not be specific to clinical programming, but may be of use when considering a general standard for sharing programs.

Use of analysis datasets

For discussion of why programming output directly from raw data is generally avoided.

Efficiency

  • When you input or output a SAS dataset, use KEEP (preferred to DROP) to keep only the needed variables: the KEEP= dataset option on the input dataset, and the KEEP statement or KEEP= option on the output dataset.

Reason: With KEEP= on the input dataset, the SAS system loads only the specified variables into the Program Data Vector, so all other variables are never loaded.

  • When subsetting a SAS dataset, use a WHERE statement rather than IF, if possible.

Reason: WHERE subsets the data before entering it into the Program Data Vector, whereas IF subsets the data after inputting the entire dataset.

  • When using IF condition, use IF/ELSE for mutually exclusive conditions, and check the most likely condition first.

Reason: The ELSE/IF will check only those observations that fail the first IF condition. With the IF/IF, all observations will be checked twice. Also, consider the use of a SELECT statement instead of IF/ELSE, as it may be more readable.

  • Avoid unnecessary sorting. The CLASS statement can be used in some procedures to perform by-group processing without sorting the data.

Example:

proc means data=osevit;
  var prmres;
  class treat;
run;
  • If possible (i.e. not a sorting variable), use character values for categorical variables or flags instead of numeric values.

Reason: It saves space. A character “1” uses one byte (if length is set to one), whereas a numeric 1 uses eight bytes.

  • Use the LENGTH statement to reduce variable size.

Reason: Storage space can be reduced significantly.
Note: Keep in mind that too short a variable length could reduce the robustness of the code (leading to truncation with different sets of data).

  • Use simple macros for repeating code.

Robustness

  • Use the MSGLEVEL=I option in order to have all informational, note, warning, and error messages sent to the LOG.
  • In the final code, there should be no dead code that does not work or that is not used. This must be removed from the program.
  • Code to allow checking of the program or of the data (on all data or on a subset of patients such as clean patients, discontinued patients, patients with SAE or patients with odd data) is encouraged and should be built throughout the program. This code can be easily activated during the development phase or commented out during a production run using the piece of code detailed in Section 6.
  • It is not acceptable to have avoidable notes or warnings in the log (mandatory).

Reason: They can often lead to ambiguities, confusion, or actual error (e.g. erroneous merging, uninitialized variables, automatic numeric/character conversions, automatic formatting, operation on missing data…).
Note: If such a warning message is unavoidable, an explanation has to be given in the program (mandatory).

  • Always use DATA= in a PROC statement (mandatory).

Reason: It ensures correct dataset referencing, makes program easy to follow, and provides internal documentation.

  • Be careful when merging datasets. Erroneous merging may occur when:
  1. No BY statement is specified (set system option MERGENOBY=WARN or ERROR).
  2. Some variables, other than BY variables, exist in both datasets. Set system option MSGLEVEL=I: SAS then writes a message to the SAS log whenever a MERGE statement would cause variables to be overwritten, in which case the values from the last dataset on the MERGE statement are kept.
  3. More than one dataset contains repeats of BY values. Only a NOTE, not an ERROR, is produced in the LOG. If you really need such a many-to-many merge, PROC SQL is the usual way to perform it.

Reason: One has to check the SAS log carefully and routinely, as the situations above lead to NOTE or WARNING messages rather than ERROR messages, yet the resulting dataset is rarely correct.
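The third pitfall, repeated BY values in more than one dataset, can also be checked for before the merge is attempted. The following sketch is written in Python rather than SAS, and the example data and the helper names `repeated_keys` and `many_to_many_keys` are made up for illustration. It reports the BY values that repeat on both sides and would therefore lead to a many-to-many merge.

```python
from collections import Counter


def repeated_keys(rows, key):
    """Return the set of key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return {value for value, n in counts.items() if n > 1}


def many_to_many_keys(left, right, key):
    """Return the key values that repeat in BOTH datasets, sorted."""
    return sorted(repeated_keys(left, key) & repeated_keys(right, key))


if __name__ == "__main__":
    # Made-up example data: patient 2 has repeats in both datasets.
    ae = [{"patno": 1, "aeterm": "Headache"},
          {"patno": 2, "aeterm": "Nausea"},
          {"patno": 2, "aeterm": "Rash"}]
    visits = [{"patno": 1, "visit": 1},
              {"patno": 2, "visit": 1},
              {"patno": 2, "visit": 2}]
    print(many_to_many_keys(ae, visits, "patno"))
```

An empty result means at most one side repeats each BY value, so the merge is one-to-one or one-to-many. Any value returned deserves a look before the merged dataset is trusted.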

  • When coding IF-THEN-ELSE constructs use a final ELSE statement to trap any observations that do not meet the conditions in the IF-THEN clauses.

Reason: You can only be sure that all possible combinations of data are covered if there is a final ELSE statement.

  • When coding a user-defined FORMAT, include the keyword ‘other’ on the left side of the equals sign so that all possible values have an entry in the format.

Reason: A missing entry in a user-defined FORMAT can be difficult to detect. The simplest way to identify this potential problem is to ensure that all values are assigned a format.
Note: This does not apply to INFORMATs. It could be more helpful to get a WARNING message when trying to INPUT data of unexpected format.

  • Try to produce code that will operate correctly with unusual data in unexpected situations (e.g. missing data).
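Two of the recommendations above, a final ELSE and an ‘other’ entry in a user-defined format, come down to the same defensive habit: give unexpected values an explicit, visible outcome. As a sketch in Python rather than SAS, with a hypothetical decode table and hypothetical age bands:

```python
# Hypothetical decode table, playing the role of a user-defined format.
SEX_FORMAT = {"M": "Male", "F": "Female"}


def decode_sex(code):
    """Decode a value; anything unexpected gets an explicit, visible label."""
    # The default plays the role of the 'other' entry in a format.
    return SEX_FORMAT.get(code, "UNEXPECTED VALUE")


def age_group(age):
    """Assign an age group; missing ages are labelled and the final else traps impossible ones."""
    if age is None:
        return "MISSING"
    elif 0 <= age < 18:
        return "<18"
    elif 18 <= age < 65:
        return "18-64"
    elif 65 <= age <= 120:
        return ">=65"
    else:
        # Final catch-all: negative or implausible ages end up here.
        return "CHECK DATA"


if __name__ == "__main__":
    print(decode_sex("M"), decode_sex("X"))
    print(age_group(None), age_group(17), age_group(130))
```

The labels "UNEXPECTED VALUE" and "CHECK DATA" are deliberately conspicuous, so that a frequency table of the derived variable exposes the problem at once.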

Code for Data Checks

Build checks so that their purpose is clear, so that they can be toggled on or off, and remove them once they are no longer needed.

Activate/Deactivate Pieces of Code

In the beginning of the program, define a macro variable that you set to blank during the development phase or that you set equal to * for the production run:

%let c=;  or  %let c=*;

For the pieces of code that check the data/program, start each line with the macro variable defined above:

&c title "Check the visits for each patient";
&c proc freq data=patvis01;
&c    table patno*visit;
&c run;

This code will be executed if &c is blank (development), but will be commented out when &c=* (production).

Perform Checks on a Subset of Patients

In a separate code that you store under the study MACRO folder, list the subset of patients (clean patients, discontinued patients, patients with SAE or patients with odd data) that you want to look at:

%macro select;
2076 2162 2271 2449
%mend;

In the beginning of the program, define a second macro variable that you set equal to * when you want to perform checks on all data or to blank when you are interested in a subset of patients:

%let s=*;  or  %let s=;

For each checking code, add a piece of code that allows subsetting the data, and start each line of this piece of code with the 2 macro variables defined above:

&c title "Check the visits for each patient";
&c proc freq data=patvis01;
&c    table patno*visit;
&c &s where patno in (%select);
&c run;

The check will be performed only if &c is blank. It will be applied to all patients if &s=*, or to the subset of patients if &s is blank.
Better still: input the list of check case IDs as a dataset.
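One possible way to keep the check case IDs in a dataset is sketched below. The dataset name work.check_ids is hypothetical, and the sketch assumes that patno is numeric in patvis01; in practice the ID dataset would more likely be stored in a permanent study library than typed in with DATALINES.

```sas
/* Hypothetical dataset holding the patients to be checked */
data work.check_ids;
  input patno;
  datalines;
2076
2162
2271
2449
;
run;

/* Restrict the visit check to the patients listed in the ID dataset */
title "Check the visits for selected patients";
proc sql;
  select patno, visit, count(*) as n_records
    from patvis01
    where patno in (select patno from work.check_ids)
    group by patno, visit;
quit;
title;
```

Changing the subset then means editing data rather than code, and the same ID dataset can be shared by several checking programs.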

Floating Point Error

Consider the real number system that we are familiar with. This system (0 → ± ∞) is infinite. Most computers use floating point representation, in which a finite set of numbers is used to represent the infinite real number system. Some values therefore cannot be stored exactly, and small errors appear from time to time. This is generally termed Floating Point Error and occurs in computers due to hardware limitations.

The following paper goes some way towards explaining why and how this happens, and suggests possible ways to approach the issue.

Paper reference:- http://www.lexjansen.com/phuse/2008/cs/cs08.pdf
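A small illustration of the problem, offered as a sketch rather than a recommendation from the paper: neither 0.1, 0.2 nor 0.3 can be stored exactly in binary floating point, so on platforms that use IEEE double precision the exact comparison below typically fails, while a comparison after rounding succeeds. The rounding unit 1e-9 is an arbitrary choice for this example and should be adapted to the precision of the data.

```sas
data _null_;
  x = 0.1 + 0.2;

  /* Show how far the stored sum is from the stored constant 0.3 */
  diff = x - 0.3;
  put diff= e12.;

  /* Exact comparison of two floating point values */
  if x = 0.3 then put "Exact comparison: equal";
  else put "Exact comparison: not equal";

  /* Comparison after rounding both sides to the same unit */
  if round(x, 1e-9) = round(0.3, 1e-9) then put "Rounded comparison: equal";
  else put "Rounded comparison: not equal";
run;
```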

Integrity of a data transfer

At a minimum, all data transfers should be validated by checking the observation counts for each SAS dataset, or the record counts in other formats, against counts provided by the sender. It is also helpful if the sender can provide a checksum for each file transferred, since this also confirms that all content made its way to you without transmission errors. There are many freeware programs available to calculate checksums for any file at websites such as http://sourceforge.net/ .
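The observation count check could be sketched along these lines. Everything named here is an assumption made for the example: the libref RECEIVED is taken to be already assigned to the folder holding the transferred datasets, and work.manifest stands in for whatever list of counts the sender provides.

```sas
/* Hypothetical manifest of counts supplied by the sender */
data work.manifest;
  length memname $32;
  input memname $ expected_nobs;
  datalines;
DM 120
AE 843
;
run;

/* Compare expected counts with the counts SAS reports for the       */
/* received datasets. A dataset that did not arrive has a missing    */
/* actual count and is therefore also flagged as a mismatch.         */
proc sql;
  create table work.count_check as
  select m.memname,
         m.expected_nobs,
         t.nobs as actual_nobs,
         case when t.nobs = m.expected_nobs then 'OK'
              else 'MISMATCH'
         end as status
    from work.manifest as m
    left join dictionary.tables as t
      on t.libname = 'RECEIVED'
     and t.memname = upcase(m.memname);
quit;

proc print data=work.count_check;
run;
```

The NOBS column in DICTIONARY.TABLES is a physical count, so it includes observations marked for deletion; for freshly transferred datasets this normally makes no difference.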

Macros

Draft section: recommendations particularly relevant for the development and use of macros and macro libraries.

Macros are particularly useful under the following circumstances:

  • Program code is used repeatedly
  • A number of steps must be taken conditionally, and the logic for these is clearly fixed (no need to think of all the steps that should be included in a program under a specific situation: the macro will deduce them for you and generate the appropriate data step or proc step code)
  • There is no trivial solution via “ordinary” SAS code
  • Calling the macro is easier than programming the code itself
  • Using the macro helps users avoid errors and omissions

If used appropriately the following benefits can be achieved:

  • Increase in quality by avoiding programming bugs and errors
  • Savings in time and resources
  • Enforcement of standards, e.g. standard methods and standard outputs
  • Work can be more enjoyable as programmers can focus on the non-routine work

Ideally, macro development should follow a few rules:

  • Macro headers should clearly state all changes to environment and data that result from execution. Changes should be limited to those necessary for the focused purpose of the macro:
    • strictly controlled changes to input data and creation of output data
    • clear temporary data set clutter
    • no unexpected changes to system settings (options, titles, footnotes, etc)
    • no unexpected changes to external symbol tables
  • Scope of macro variables should be explicitly controlled using %global and %local statements.
  • Method of macro variable creation should demonstrate awareness of default scope:
  • The log matters:
    • Use Base SAS techniques whenever possible to avoid excessive code generation (log bloat). For example, a macro definition should use DATA step array and DO loop processing rather than macro %DO looping.
    • But use pure Macro Language for routine utility macros (see details, below).
    • Use an appropriate comment style in macro definitions to properly annotate the SAS log when MPRINT is on. For example, use %* style commenting to explain macro logic, but /* style commenting to explain the resulting code. (Or * style or PUT statement commenting as appropriate.)
    • Allow the users to control the appearance of the log via MPRINT, SYMBOLGEN, and MLOGIC.
  • Code within a macro definition should be germane, limited to the specific purpose of the macro. The use of a central repository for macros (“Macro library”) is suggested.
  • Macro Library: Code for routine tasks (eg, parameter checking, system and environment checking, messaging, etc.) should be handled by dedicated utility macros. Code for such routine tasks should not overwhelm the current macro definition, obscuring the purpose, and creating unnecessary maintenance overhead and lack of consistency within a library.
  • Macro Library: Parameter naming conventions should be used for common parameters such as input/output libnames and data sets. Explicit and transparent control of macro variable scope again becomes crucial to avoid accidental changes to external symbol tables.
  • Macro Library: Use pure Macro Language definitions whenever possible to improve program flow and avoid producing unnecessary Base SAS code. Returning a list of data set variables, checking for macro variable existence and returning a data set observation count can all be achieved without Base SAS code. Such macros can be called “inline” without unnecessary overhead or interruption of program flow.

For example, instead of a %count_ds_obs definition that uses DATA step code and interrupts program flow, like

%let n_obs = %count_ds_obs(DSIN=myData);
%if &n_obs > 0 %then %do;
  ... more statements ...
%end;

an inline, pure Macro Language implementation allows streamlined code:

%if %inline_ds_obs(DSIN=myData) > 0 %then %do;
  ... more statements ...
%end;
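The source does not show the definition of %inline_ds_obs. One way such a macro might be written, using only %SYSFUNC with the OPEN, ATTRN and CLOSE functions, is sketched below. The return value of -1 for a data set that cannot be opened and the wrapper macro %report_if_any are assumptions made for this example. Note that the working variables are declared %local, in line with the scope rule above.

```sas
%macro inline_ds_obs(DSIN=);
  %* Keep all working variables out of the calling symbol table;
  %local dsid nobs rc;
  %let nobs = -1;
  %let dsid = %sysfunc(open(&DSIN));
  %if &dsid > 0 %then %do;
    %* NLOBS is the number of logical (not deleted) observations;
    %let nobs = %sysfunc(attrn(&dsid, nlobs));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %* The line below has no semicolon: the value is returned inline;
  &nobs
%mend inline_ds_obs;

/* Hypothetical caller that prints a few rows only when the data set has any */
%macro report_if_any(DSIN=);
  %if %inline_ds_obs(DSIN=&DSIN) > 0 %then %do;
    proc print data=&DSIN(obs=5);
    run;
  %end;
  %else %put NOTE: &DSIN is empty or could not be opened.;
%mend report_if_any;

%report_if_any(DSIN=sashelp.class)
```

Because the macro generates no DATA step or PROC code, it can be used anywhere a number is expected. For views and some engines ATTRN may not know the observation count and can return -1, which this sketch treats the same as a data set that could not be opened.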

Source: Sunil Gupta, Senior SAS Consultant, Gupta Programming http://www.sascommunity.org/wiki/Good_Programming_Practice_for_Clinical_Trials

SDTM Terminology

Under the current version of the Study Data Tabulation Model Implementation Guide, Version 3.1.2 (SDTMIG v3.1.2), answers to CDASH questions should comply with SDTM terminology.

Some key points to remember:

  • You need to define which codelists will be applied to which questions
  • Most data entry systems require a concise list of potential terms per variable/field
  • SDTM terminology codelists are big, e.g. C66786 – COUNTRY which covers all potential COUNTRYs for all clinical trials / site management
  • Controlled terminology gap – we need to develop new terms

So how did SDTM terminology become possible?

In order to improve the efficiency of human drug review through required electronic submissions and standardization of electronic drug application data, FDA and industry leaders are working together in this initiative.

Format in EXCEL


SDTM controlled terminology is extracted from the NCIt by an automated procedure that creates a report organized into terminology codelists. These codelists correspond to CDISC variables.

To access SDTM Controlled Terminology, visit the CDISC website and click on Standards & Innovations -> Terminology.

This is the superset of codelists that are used to both collect and submit SDTM data.

Excel Format – Column descriptions in the Controlled Terminology (SDTM subset)

Column: Description

Code (Column A): Unique numeric code randomly generated by NCI Thesaurus (NCIt) and assigned to individual CDISC controlled terms.

Codelist Code (Column B): Unique numeric code randomly generated by NCI Thesaurus (NCIt) and assigned to the SDTM parent codelist names. This code is repeated for each controlled term (aka permissible value) belonging to a codelist. As of 9/22/2008, this code was dropped for parent codelist entries, where it created confusion. **NOTE: light blue highlighting is used to identify the beginning of a new SDTM codelist and its applicable term set.

Codelist Extensible (Yes/No) (Column C): Defines if controlled terms may be added to the codelist. New terms may be added to existing codelist values as long as they are not duplicates or synonyms of existing terms. The expectation is that sponsors will use the published controlled terminology as a standard baseline, and codelists defined as “extensible” (or “Yes”) may have terms added by the sponsor internally.

Codelist Name (Column D): Contains the descriptive name of the codelist, which is also referred to as the codelist label in the SDTM IG. As with the Codelist Code, the Codelist Name is repeated for each controlled term belonging to a codelist.

CDISC Submission Value (Column E): IMPORTANT COLUMN. Currently (as per SDTMIG 3.1.2) this is the specific value expected for submissions. Each value corresponds to an SDTM Codelist Name as indicated by light blue shading.

CDISC Synonym(s) (Column F): This identifies the applicable synonyms for a CDISC Preferred Term in Column E. **NOTE: this is especially important in instances where a Test name or Parameter Test name contains a corresponding Test Code or Parameter Test Code.

CDISC Definition (Column G): This identifies the CDISC definition for a particular term. In many cases an existing NCI definition has been used. The source for a definition is noted in parentheses (e.g. NCI, CDISC glossary, FDA).

NCI Preferred Term (Column H): This identifies the NCI preferred name for a term as identified in NCIt. **NOTE: this column designates the human readable, fully specified preferred term corresponding to the NCI c-code, and is especially helpful for searching NCIt to get the entire concept with links to all instances of the term.

Above: example of the spreadsheet for the ‘Route of Administration’ codelist.

Reference: Clinical Data Interchange Standards Consortium, Inc (CDISC) Representing Controlled Terminology in CDASH

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.

OpenClinica – Open Source for Clinical Research


Source: OpenClinica