Tag Archives: Statistical Analysis System

CDISC/CDASH Standards at your Fingertips

A standard database structure using CDISC (Clinical Data Interchange Standards Consortium) and CDASH (Clinical Data Acquisition Standards Harmonization) standards can facilitate the collection, exchange, reporting, and submission of clinical data to the FDA and EMEA. CDISC and CDASH standards provide reusability and scalability to EDC (electronic data capture) trials.

There are some defiance in implementing CDISC in EDC CDMS:

1. Key personnel in companies must be committed to implementing the CDISC/CDASH standards.

2. There is an initial cost for deployment of new technology: SDTM Data Translation Software, Data Storage and Hosting, Data Distribution and Reporting Software.

3. It can be difficult to understand and interpret complex SDTM Metadata concepts and the different implementation guides.

4. Deciding at what point in a study to apply the standards can be challenging: in the study design process, during data collection within the CDMS [CDASH via EDC tools], in SAS prior to report generation [ADaM], or after study completion prior to submission [SDTM].

5. Data management staff [CDM, clinical programmers], biostatisticians, and clinical monitors may find it difficult to converge on a new standard when designing standard libraries and processes.

6. Implementing new standards involves reorganizing the operations of (an organization) so as to improve efficiency [processes and SOPs].

7. Members of Data Management team must be retrained on the use of new software and CDISC/CDASH standards.

standards8. There are technical obstacles related to implementation in several EDC systems, including 8 character limitations [SAS] on numerous variables, determining when to use supplemental qualifiers versus creating new domains, and creating vertical data structure.

Comments? Join us at {EDC Developer}

Anayansi Gamboa, MPM, an EDC Developer Consultant and clinical programmer for the Pharmaceutical and Biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Subscribe to my blog’s RSS feed and email newsletter to get immediate updates on latest news, articles, and tips. I am available on LinkedIn. Connect with me there for technical discussions.

Fair Use Notice: This article/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer:De inhoud van deze columns weerspiegelen niet per definitie de mening van {EDC Developer}.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

Advertisements

SAS Cheat Sheet – Part 4

INFORMATS

Code Description
<w.d> Reads standard numeric data
<datew.> Reads date values (ddmmmyy)/td>;
<$w.> Reads standard character data
<$VARYINGw> Reads character data of varying length

SAS represents a date internally as the number of days
between January 1, 1960 and the specified date. Dates
before 1960 are represented with negative numbers. A
SAS date value is a numeric variable.

SAS DateValue Calendar DateValue
0 January 1, 1960
30 January 31, 1960
366 January 1, 1961
-365 January 1, 1959

How do you use an InFormat to create a SAS date?

If the raw fields are aligned in columns, use formatted input and specify the informat.
e.g. input @1 visitdt yymmdd8.;

With list input, use a separate INFORMAT statement or a modifier.
informat visitdtyymmdd8.;
input visitdt;
input visitdt : yymmdd8.;

How do you refer to a particular date in SAS?

To create a SAS date constant, write the date enclosed in single or double quotes, followed by a D.
e.g. age = ’20-MAR-2012’d – incdt;
where visitdt lt “1may12”D;

How do you work with SAS date values?

When a variable is a SAS date value, you can easily apply operations such as addition and subtraction. To find the number of days between two dates, simply subtract the two SAS date variables.
e.g. daysbtwn = visitdt1 – visitdt2;

Comparison operators can also be used.
if visitdt1 <; visitdt2 then do;

SAS Cheat Sheet – Part 6

The macro facility is a tool you can use in your programming./*

Macro Language

%DO macro-var=start_value %TO end_value

%DO %WHILE (expression); /*Executes a section of a macro repetitively while a condition is true*/

%DO %UNTIL (expression); /*Executes a section of a macro repetitively until a condition is true*/

Macro variables can be stored in either the global symbol table or in a local symbol table

%GLOBAL macro-variable(s); /*Creates macro variables that are available during the execution of an entire SAS session*/

%IF expression %THEN action; <%ELSE action;> /Conditionally process a portion of a macro*/

%LENGTH (character string | text expression) /*Returns the length of a string*/

The name assigned to a macro variable must be a valid SAS name

%LET macro-variable =<value>; /*Creates a macro variable and assigns it a value*/

%MACRO mname (<pp1><…,ppn><kp1=value<..<kpn=v>);/*Begins a macro definition*/

%MEND <macro-name>; /*Ends a macro definition*/

%SCAN(argument,n<,delimiters>)/*Search for a word that is specified by its position in a string*/

%SUBSTR (argument,position<,length>)/*Produce a substring of a character string*/

%UPCASE (character string | text expression) /*Convert values to uppercase*/

Macro variable values are text values

Macro Quoting

%QUOTE | %NRQUOTE and %BQUOTE | %NRBQUOTE /*Mask special characters and mnemonic operators in a resolved value at macro execution */

%STR | %NRSTR /*Mask special characters and mnemonic operators in constant text at macro compilation */

%SUPERQ /*Masks special characters/mnemonic operators at macro execution but prevents further resolution of the value*/

Example:

*----------------------------------------------------------------*//* This macro will produce summary statistics ----------------   */

%Macro safety1;
%odscmd (start,portnum=11,rptname=AEActivityReport);
Data _null_;
thedate=today();
fstdate = mdy(month(thedate), 1, year(thedate));
lstdate = intnx(‘month’, fstdate, 1)-1;
call symput (‘strtdate’, put(fstdate, date9.));
call symput (‘stpdate’, put(thedate, date9.));
run;

%put strtdate: &strtdate;
%put stpdate: &stpdate;
* more code goes here;

/*----------------------------------------------------------------*/

SAS Cheat Sheet – Part 3

FORMATS

Code Description
<w.d> standard numeric
<COMMAw.d> writes numeric values with commas and decimal points
<Zw.d> print leading zeros
<$w.> writes standard character data
<$CHARw.> writes standard character data
<$VARYINGw.> Writes character data of varying length

Format is used to change how the values of variables are displayed/portrayed in your SAS output. A more formal term for this process is “altering the external representation of the values of variables in a SAS data set.

So how to add leading zeros ? e.g., 999 –> 0999

In order to create variable LEADZEROS_ID, you can use the PUT function and the format Zw. which adds leading zeros to the specified length. Here’s an example:

leadzeros_id=put(id,z4.);

data leadzeros;

input ID PETNAME $ SEX $ AGE WEIGHT;
datalines;
1 Mark M 1 2.5
2 Ace M 3 44
3 Fuzzy M 2 18
4 Champ M 4 55
5 Tom M 6 63.5 62.5
6 Ivon M 7 83
7 Balboa M 12 64.5
;
run;
data leadzeros;
set leadzeros;
leadzeros_num = put(id,z4.); *z4. format is used so 3 leading zeros will be added if the ID length is 1;
run;

Another common format is the COMMAw.d: this format writes numeric values with commas that separate every three digits and a period that separates the decimal fraction.

w = specifies the width of the output field
d = optionally specifies the number of digits to the right of the decimal point in the numeric value

e.g. put @10 profit comma10.2;
so this number 45678.45 will be formatted to 45,678.45

Concatenating SAS data sets with the APPEND procedure

The APPEND procedure is an efficient method for concatenating observations from a smaller data set to a larger data set. The BASE= data set option is reserved for the larger of the two data sets with the DATA= option for the smaller data set. Essentially, the APPEND procedure avoids reading any observations in the BASE= data set by positioning the record pointer at the end of the BASE= data set.

If no additional processing is necessary, using PROC APPEND or the APPEND statement in PROC DATASETS is more efficient than using a DATA step to concatenate data sets.

Each observation from the smaller data set is then applied one at a time to the end of the BASE= data set. In the next example, the BASE= data set identifies a larger data set called MASTER and the DATA= data set identifies the smaller SOURCEDATA data set.

When one or more variables in the input data set (DATA=) are not present in the BASE= data set, an optional FORCE option can be specified as an option with the PROC APPEND statement to prevent a syntax error

PROC APPEND
BASE=master
DATA=sourcedata;
RUN;

WARNING: Variable charvar has different lengths on BASE and DATA files (BASE 25 DATA 30).
ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files.

As the LOG message suggest, no appending is done and use of the FORCE option is necessary to append the input files. The consequent required syntax to correct this issue would look like this:

PROC APPEND
BASE=master
DATA=sourcedata force;
RUN;

WARNING: Variable charvar has different lengths on BASE and DATA files (BASE 25 DATA 30).
NOTE: FORCE is specified, so dropping/truncating will occur.

When two or more data sets need to be concatenated, multiple APPEND procedures are issued. In the next example, two separate PROC APPEND steps are specified to concatenate the two smaller data sets (mysasdata1 and mysasdata2) at the end of the larger BASE= data set.

PROC APPEND
BASE=master
DATA=mysasdata1;
RUN;

PROC APPEND
BASE=master
DATA=mysasdata2;
RUN;

You cannot use PROC APPEND to add observations to a SAS data set in a sequential library

Reference: SAS 9.2 Language Reference Concepts 2nd edition

From Non-SAS Programmer to SAS Programmer

SAS Programmers come from many different educational backgrounds. Many has started their careers as a Data Manager in a CRO environment and grew to become a SAS programmer. Others have gone to college and pursued degrees in math, statistics or computer science degree.

Do you have SAS Skills? First, you need to find out more about statistical programming desire skills and start to slowly learn what SAS programmers and statisticians do in the pharmaceutical industry. It is also important to understand the Drug Development and Regulatory process so that you have a better understanding of the industry as a whole as well as the drug approval process.

In addition, I have personally attended several workshop on Statistics for Non-statistician provided by several of my past employers/clients (GSK, Sanofi-Aventis, etc) so I could have a greater understanding of statistics role. I am personally more inclined to the EDC development than becoming a biostatistician but these are just some of the few steps you could take to grow your career as a SAS programmer.

Practice, Practice, Practice!

To begin learning how to actually program in SAS, it would be a good idea to enroll to a SAS course provided by the SAS Institute near you or via eLearning. I have taken the course SAS Programming 1: Essentials, and I would recommended. You could also join SUGI conferences and other user groups near your city/country. Seek every opportunity to help you gain further understanding on how to efficiently program in the pharmaceutical industry. It could well land you a Junior SAS programming position.

Transitioning to a SAS Programming role: Now that you have gotten your first SAS programming job, you will need to continue your professional development and attend additional training, workshops, seminars and study workgroup meetings. The SAS Institute provide a second level, more advance course Programming II: Manipulating Data with the Data Step, SAS Macro Language and SAS macro Programming Advanced topics. There are also SAS certifications courses available to help you prepare to become a SAS certified programmer.

There is a light at the end of the tunnel: Advance!

Your ongoing development will be very exciting and challenging. Continued attending SAS classes as needed and attending industry related conferences such as PharmaSUG to gain additional knowledge and insight on how to perform your job more effectively and efficiently.

As you can see, it is possible to ‘grow’ a SAS programmer from a non-programming background to an experience programmer. All of the classes, training, and projects you will work on are crucial in expanding your SAS knowledge and will allow you to have a very exciting career opportunity ahead of you.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.

Good Programming Practice for Clinical Trials by Sunil Gupta

The following are draft recommendations a Good Programming Practice for analysis, reporting and data manipulation in Clinical Trials and the Healthcare industries.

The purpose is to encourage contributions from across companies, non-profit organizations and regulators in an attempt to create a consensus recommendation. The ambition is that this page becomes recognized by the Pharmaceutical Industry, Clinical Research and Health Care Organizations as well as Regulatory Authorities.

The hope is that the Practice can be reviewed and endorsedby the relevant management teams of several Pharmaceutical companies and major Contract Research Organizations and promoted through relevant professional organizations such as PharmaSUG, PhUSE, PSI, CDISC.

Introduction

The Good Programming Practices are defined in order to:

  • Ensure the clarity of the code and facilitate code review;
  • Save time in case of maintenance, and ease the transfer of code among programmers or companies;
  • Minimize the need for code maintenance by robust programming;
  • Minimize the development effort by development and re-use of standard code and by use of dynamic (easily adaptable) code;
  • Minimize the resources needed at execution time (improve the efficiency of the code);
  • Reduce the risk of logical errors.
  • Meet regulatory requirements regarding validation and 21CFRPart11 compliance

Note: As often, the various guidelines provided hereafter may conflict with one another if applied in too rigorous a way. Clarity, efficiency, re-usability, adaptability and robustness of the code are all important, and must be balanced in the programming practice.

Regulatory Requirements

Validation

21CFRPart11 compliance

Readability and Maintainability

Language

English is an international language and study protocols, study reports for practical reasons (regulatory authorities, inlicensing, outlicensing, partnerships, mergers) are mostly written in English, therefore it is recommended to write the SAS code and comments in English.

Header and Revision History

  • Include a header for every program (template below).
**********************************************************;* Program name      :** Author            :** Date created      :** Study             : (Study number)*                     (Study title)** Purpose           :** Template          :** Inputs            :** Outputs           :** Program completed : Yes/No** Updated by        : (Name) – (Date): *                            (Modification and Reason)**********************************************************;
  • In addition to your name or initials, use your login ID to identify yourself in the header. This is so there is no ambiguity on the identify of each programmer.
  • Update the revision history at each code modification made after the finalization of the first version of a program.

Note: When you copy a program from another study, you became the author of this program, and you should clear the revision history. You can specify the origin of the program under the “Template” section of the header.

Below is an example with comments of an alternative comment block that I think is more useful for Open Source programming. PaulOldenKamp 16:54, 5 April 2009 (UTC)

/** ---------------------------------------------------------------------------------------- $Id: os3a_autoexec.sas 152 2008-11-17 01:48:40Z Paul_ok01 $ <== Id info automatically inserted with each commit to Subversion version control Application: OS3A - Common Programs Description: OS3A session initialization program. Previous Program: None Saved as: c:\os3a\trunk\os3a_autoexec.sas <== locations, local and web, where the pgm http://os3a.svn.sourceforge.net/viewvc/os3a/trunk/os3a_autoexec.sas can be found Change History: Date Prog Ref Description 04/26/2008 PMO [1] Initial programming for os3a <== date, programmer initials, ref number [1] to link to specific location of change Copyright: Copyright (c) 2008 OS3A Program. All rights reserved. <== Always tell who owns the program so one Copyright Contact: paul_ok01@users.sourceforge.net can ask permission from the copyright holder License: Eclipse Public License v1.0 <== Tell folks how they are licenced to use This program and the accompanying materials are made available under the program. the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at www.eclipse.org/legal/epl-v10.html Contributors: Paul OldenKamp, POK_Programming@OldenKamp.org - Develop initial pgm. <== Identify significant contributors <== tag identifies start of info @purpose OS3A system initialization program. used by the Codedoc Perl script to Set up initial options and global macro variables. produce external HTML documentation @param SYSPARM - input provided from program initiation call; default: main_FORE for production. @symbol sysRoot - system root location; Windows - C:/, UNIX - / @symbol remove_cmd - system command to remove file; Windows - erase, UNIX - rm @symbol os3aRoot - directory location of os3a root @symbol futsRoot - directory location of FUTS top level macros. @symbol Root - directory location of sub-project identified with Four Letter Acronym ----------------------------------------------------------------------------------------- */

The results from encoding the header and comments in a SAS program can be seen on the CodeDoc web page. See http://www.thotwave.com/products/codedoc.jsp.
CodeDoc Download Page

Comments

  • Include a comment before each major DATA/PROC step, especially when you are doing something complex or non-standard. Comments should be comprehensive, and should describe the rationale and not simply the action. For example, do not comment “Access demography data”; instead explain which data elements and why they are needed.
  • Organize the comments into a hierarchy.
  • Do not include numbers in comments.

Reason: It avoids heavy update when removing or inserting sections.

Naming Conventions

  • Use explicit and meaningful names for variables and datasets, with a maximum length of 8.
  • For permanent datasets, use a meaningful dataset label and variable labels.
  • When possible, never use the same name for a dataset more than once in the program.

Note: However, keep in mind that large intermediate files take a lot of SAS Workspace.

  • Name IN variable using “in” plus a meaningful reference to the dataset.

Example:

data aelst;   merge aesaes (in=inae) patpat (in=inpat);   by patno;   if inae and inpat;run;
  • Labels must have a maximum length of 40 characters.

Code Structure

  • It is mandatory to include libnames, options and formats in a separate setup program unless these are temporary formats or temporary options that are reset after being used.

Reason: It will guarantee that changes of the environment are taken into account in all programs run afterwards.

  • Use standard company macros to read in libnames and settings, to write out datasets, and for standard calculating and reporting.
  • One statement per line, but several are allowed if small and repeated or related. Long statements should be split across multiple lines.
  • Control system settings to show all executed code in the log as the default, in as clear manner as possible. The log should not be so lengthy that the programmer cannot easily navigate (if so, use highly visible comments with sufficient white space). System settings should be able to be easily changed in order for a user to debug a section of code or a macro, in order to temporarily display the %included code, resolved macro names, and logic.
  • Use a standard sequence for placing statements and group like statements together.
  1. Within a program:
    1. %LET statements and macro definitions
    2. Input steps
    3. Calculations
    4. Save final (permanent) datasets and created outputs
  2. Within a DATA step:
    1. All non-executable statements first (e.g. ATTRIB, LENGTH, KEEP…)
    2. All executable statements next

Reason: It increases the readability of the program.

  • Left-justify DATA, PROC, OPTIONS statements, indent all statements within.

Example:

proc means data=osevit;   var prmres;   by prmcod treat;run;
  • End every DATA/PROC step with a left-aligned RUN statement.

Reason: It explicitly defines the step boundary.

  • Insert at least one blank line after each RUN statement in DATA/PROC steps.
  • Indent statements within a DO loop, align END with DO.
  • Avoid having too many nested DO loop and IF-ELSE statements.
  • In case of interlinked DO loop, add a comment at the start (DO) and end (END) of each loop.

Example:

data test01;  do patno=1 to 40; * cycle thru patients;    do visit=1 to 3; * cycle thru visits;      output;     end; * cycle thru visits;  end; * cycle thru patients;run;
  • Insert parentheses in meaningful places in order to clarify the sequence in which mathematical or logical operations are performed.

Example:

data test02;  set test01;  if (visit=0  and vdate lt adate1)   or (visit=99 and vdate gt adate2) then delete;run;

Style conventions

Draft section : this may not be specific to clinical programming, but may be of use when considering a general standard for sharing programs.

Use of analysis datasets

For discussion of why programming output directly from raw data is generally avoided.

Efficiency

  • When you input or output a SAS dataset, use a KEEP (preferred to DROP) statement to keep only the needed variables.

Reason: The SAS system loads only the specified variables into the Program Data Vector, eliminating all other variables from being loaded.

  • When subsetting a SAS dataset, use a WHERE statement rather than IF, if possible.

Reason: WHERE subsets the data before entering it into the Program Data Vector, whereas IF subsets the data after inputting the entire dataset.

  • When using IF condition, use IF/ELSE for mutually exclusive conditions, and check the most likely condition first.

Reason: The ELSE/IF will check only those observations that fail the first IF condition. With the IF/IF, all observations will be checked twice. Also, consider the use of a SELECT statement instead of IF/ELSE, as it may be more readable.

  • Avoid unnecessary sorting. CLASS statement can be used in some procedure to perform by-group processing without sorting the data.

Example:

proc means data=osevit;  var prmres;  class treat;run;
  • If possible (i.e. not a sorting variable), use character values for categorical variables or flags instead of numeric values.

Reason: It saves space. A character “1” uses one byte (if length is set to one), whereas a numeric 1 uses eight bytes.

  • Use the LENGTH statement to reduce variable size.

Reason: Storage space can be reduced significantly.
Note: Keep in mind that a too limited variable length could reduce the robustness of the code (lead to truncation with different sets of data).

  • Use simple macros for repeating code.

Robustness

  • Use the MSGLEVEL=I option in order to have all informational, note, warning, and error messages sent to the LOG.
  • In the final code, there should be no dead code that does not work or that is not used. This must be removed from the program.
  • Code to allow checking of the program or of the data (on all data or on a subset of patients such as clean patients, discontinued patients, patients with SAE or patients with odd data) is encouraged and should be built throughout the program. This code can be easily activated during the development phase or commented out during a production run using the piece of code detailed in Section 6.
  • It is not acceptable to have avoidable notes or warnings in the log (mandatory).

Reason: They can often lead to ambiguities, confusion, or actual error (e.g. erroneous merging, uninitialized variables, automatic numeric/character conversions, automatic formatting, operation on missing data…).
Note: If such a warning message is unavoidable, an explanation has to be given in the program (mandatory).

  • Always use DATA= in a PROC statement (mandatory).

Reason: It ensures correct dataset referencing, makes program easy to follow, and provides internal documentation.

  • Be careful when merging datasets. Erroneous merging may occur when:
  1. No BY statement is specified (set system option MERGENOBY=WARN or ERROR).
  2. Some variables, other than BY variables, exist in the two datasets (set system option MSGLEVEL=I), S writes a warning to the SAS log whenever a MERGE statement would cause variables to be overwritten at which the values of the last dataset on the MERGE statement are kept).
  3. More than one dataset contain repeats of BY values. A WARNING though not an ERROR is produced in the LOG. If you really need, PROC SQL is the only way to perform such many-to-many merges.

Reason: One has to routinely carefully check the SASLOG as the above leads to WARNING messages rather ERROR messages yet the resulting dataset is rarely correct.

  • When coding IF-THEN-ELSE constructs use a final ELSE statement to trap any observations that do not meet the conditions in the IF-THEN clauses.

Reason: You can only be sure that all possible combinations of data are covered if there is a final ELSE statement.

  • When coding a user-defined FORMAT, include the keyword ‘other’ on the left side of the equals sign so that all possible values have an entry in the format.

Reason: A missing entry in a user-defined FORMAT can be difficult to detect. The simplest way to identify this potential problem is to ensure that all values are assigned a format.
Note: This does not apply to INFORMATs. It could be more helpful to get a WARNING message when trying to INPUT data of unexpected format.

  • Try to produce code that will operate correctly with unusual data in unexpected situations (e.g. missing data).

Code for Data Checks

Build checks so that their purpose is clear, so that they can be toggled on or off, and remove them once they are no longer needed.

Activate/Deactivate Pieces of Code

In the beginning of the program, define a macro variable that you set to blank during the development phase or that you set equal to * for the production run:

%let c=;  or  %let c=*;

For the pieces of code that check the data/program, start each line with the macro variable defined above:

&c title “Check the visits for each patient”;&c proc freq data=patvis01;&c    table patno*visit;&c run;

This code will be executed if &c is blank (development), but will be commented out when &c=* (production).

Perform Checks on a Subset of Patients

In a separate code that you store under the study MACRO folder, list the subset of patients (clean patients, discontinued patients, patients with SAE or patients with odd data) that you want to look at:

%macro select;2076 2162 2271 2449%mend;

In the beginning of the program, define a second macro variable that you set equal to * when you want to perform checks on all data or to blank when you are interested in a subset of patients:

%let s=*;  or  %let s=;

For each checking code, add a piece of code that allows subsetting the data, and start each line of this piece of code with the 2 macro variables defined above:

&c title “Check the visits for each patient”;&c proc freq data=patvis01;&c    table patno*visit;&c &s where patno in (%select);&c run;

The check will be performed only if &c is blank, and it will be applied to all patients if &s=* or on the subset of patients if &s is blank.
Better still: input the list of check case IDs as a dataset.

Floating Point Error

Consider the real number system that we are familiar with. This decimal system (0 → ± ∞) is obviously infinite. Most computers use floating point representation, in which, a finite set of numbers is used to represent the infinite real number system. Thus, we can deduce that we will have some sort of error appearing from time to time. This is more generally termed Floating Point Error and occurs in computers due to hardware limitations.

The following paper goes someway in explaining why and how this happens and also possible solutions in how to approach this issue.

Paper reference:- http://www.lexjansen.com/phuse/2008/cs/cs08.pdf

Data Imputation versus Hardcoding

Please contribute

  • Definitions
    • Data Imputation
    • Hardcoding
  • Issues
  • Recommendation

Integrity of a data transfer

At a minimum, all data transfers should be validated by checking the observation counts for each SAS dataset
or the record counts in other formats, against counts provided by the sender. It is also helpful if the sender
can provide a checksum for each file transferred since this also ensures all content made it’s way to you
without transmission errors. There are many freeware programs available to calculate checksums for any file at websites such as http://sourceforge.net/ .

Macros

Draft section : recommendations particularly relevant for the development and use of macros and macro libraries.

Macros are particularly useful under the following circumstances:

  • Program code is used repeatedly
  • A number of steps must be taken conditionally, and the logic for these is clearly fixed (no need to think of all the steps that should be included in a program under a specific situation: the macro will deduce them for you and generate the appropriate data step or proc step code)
  • There is no trivial solution via “ordinary” SAS code
  • Their application must be easier as to program the code itself!
  • The usage helps users avoiding errors and omissions.

If used appropriately the following benefits can be achieved:

  • Increase in quality by avoiding programming bugs and errors
  • Savings in time and resources
  • Enforcement of standards, e.g. standard methods and standard outputs
  • Work can be more enjoyable as programmers can focus on the non-routine work

Ideally Macro development should follow a few rules:

  • Macro headers should clearly state all changes to environment and data that result from execution. Changes should be limited to those necessary for the focused purpose of the macro:
    • strictly controlled changes to input data and creation of output data
    • clear temporary data set clutter
    • no unexpected changes to system settings (options, titles, footnotes, etc)
    • no unexpected changes to external symbol tables
  • Scope of macro variables should be explicitly controlled using %global and %local statements.
  • Method of macro variable creation should demonstrate awareness of default scope:
  • The log matters:
    • Use Base SAS techniques whenever possible to avoid excessive code generation (log bloat). For example, macro definition should use DATA step array and DO loop processing rather than Macro %DO looping.
    • But use pure Macro Language for routine utility macros (see details, below).
    • Use appropriate comment style in macro definitions to properly annotate the SAS log when MPRINT in on. For example, use %* style commenting to explain macro logic, but /* style commenting to explain resulting code. (Or * style or PUT statement commenting as appropriate.)
    • Allow the users to control the appearance of the log via MPRINT, SYMBOLGEN, and MLOGIC.
  • Code within a macro definition should be germane, limited to the specific purpose of the macro. The use of a central repository for macros (“Macro library”) is suggested.
  • Macro Library: Code for routine tasks (eg, parameter checking, system and environment checking, messaging, etc.) should be handled by dedicated utility macros. Code for such routine tasks should not overwhelm the current macro definition, obscuring the purpose, and creating unnecessary maintenance overhead and lack of consistency within a library.
  • Macro Library: Parameter naming conventions should be used for common parameters such as input/output libnames and data sets. Explicit and transparent control of macro variable scope again becomes crucial to avoid accidental change of external symbol tables
  • Macro Library: Use pure Macro Language definitions whenever possible to improve program flow and avoid producing unnecessary Base SAS code. Returning a list of data set variable, checking for macro var existence, returning data set obs count can all be achieved without BASE SAS code. Such macros can be called “inline” without unnecessary overhead or interruption of program flow.

For example, instead of %count_ds_obs definition that uses DATA Step code and interrupts program flow like

%let n_obs = %count_ds_obs(DSIN=myData);%if &n_obs > 0 %then %do;  ... more statements ...%end;

an inline, pure Macro Language implemetation allows streamlined code:

%if %inline_ds_obs(DSIN=myData) > 0 %then %do;  ... more statements ...%end;

Source: Sunil Gupta, Senior SAS Consultant, Gupta Programming http://www.sascommunity.org/wiki/Good_Programming_Practice_for_Clinical_Trials

SAS Programmers Tools

Are you new to SAS and wondering how to write SAS programs?

Most SAS programmers use the built-in SAS enhanced editor for their daily works. Sometimes, this editor is replaced by the code editor of SAS Enterprise Guide which provide other features like the Log tab, Output data tab and Results tab. However, some SAS users like their text editor to be very customizable and full of features which may or may not be in the enhanced editor.

If you find that your current editor is insufficient in handling your work you are not alone. We have found some alternative editors and below are some of the text editors I have come across that you can use instead of the pre-built SAS editor:

TextPad: is a full-featured text editor offering a spelling checker, macros, and powerful formatting and file-storage options from Helios Software Solutions.

This is a great program – it’s a powerful text editing tool that’s really comfortable to use. Textpad has a very clean, simple interface that deals only with text – that is, it doesn’t let you change font halfway down the page, or make text underlined or italic; it’s built purely to deal with the content, and does that job EXTREMELY well.

These features are excellent for SAS macro programming and SCL programming. Besides these, Textpad has a built-in compiler for Java which allows for rapid switching to Java coding that is occasionally required.

Below is a screen shot of the editor:

Textpad has many macro features that allows for repetitive actions to be recorded and recycled easily.

Crimson Editor is a professional source editor for Windows Open source from Ingyu Kang and one of the most popular editor available for programmers to use.

This editor also allow programmers to install schematics (define tools) that will highlight sections of your SAS programs.

Below is a screen shot of the editor:

In summary, there are many options to help a SAS programmer increase efficiency, write cleaner code, or make SAS life easier. There are other popular editors such as Emacs but I don’t have a lot of experience using it thus I cannot comment on it properly. Your style of programming will influence the type of editor you will use.

Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.