Category Archives: Macros

Project Plan: CDISC Implementation

CDISC standards have been in development for many years. There are now methodologies and technologies that would make the transformation of non-standard data into CDISC-compliance with ease. Clinical trials have evolved and become more complex and this requires a new set of skills outside of clinical research – Project Management.

As with many projects, CDISC is a huge undertake. It requires resources, technology and knowledge-transfer. The industry (FDA for example) has been working on standardization for years but on September 2013, it became official, in which the FDA released a ‘Position Statement‘.

So what is CDISC? We can say that it is way of naming convention for XPT files, or field names naming conventions or rules for handling unusual data. Currently, there are two main components of CDISC: SDTM (Study Data Tabulation Model) and aDAM (Analysis Data Model).

As a project manager and with the right tool, you can look to a single source project information to manage the project through its life-cycle – from planning, through execution, to completion.

1) Define Scope: This is where you’re tested on everything that has to do with getting a project up and running: what’s in the charter, developing the preliminary scope, understanding what your stakeholders need, and how your organization handles projects.

The scope document is a form of a requirement document which will help you identify the goals for this project. It can also be used as a communication method to other managers and team members to set the appropriate level of expectations.

The project scope management plan is a really important tool in your project. You need to make sure that what you’re delivering matches what you wrote down in the scope statement.

2) Define Tasks: we now need to document all the tasks that are required in implementing and transforming your data to CDISC.

Project Tasks  (Work packages) Estimates (work unit)
Initial data standards review 27
Data Integrity review 17
Create transformation models 35

The work breakdown structure (wbs) provides the foundation for defining work as it relates to project objectives. The scope of work in terms of deliverables and to facilitate communication between the project manager and stakeholders throughout the life of the project. Hence, even though, preliminary at first, it is a key input to other project management processes and deliverables.

3) Project Plan: Once we completed the initiation phase (preliminary estimates), we need to create a project plan assigning resources to project and schedule those tasks. Project schedules can be presented in many ways, including simple lists, bar charts with dates, and network logic diagrams with dates, to name just a few. A sample of the project plan is shown below:

project plan sample
image from Meta‐Xceed paper about CDISC

4) Validation Step: Remember 21 CFR Part 11 compliance for Computer Systems Validation? The risk management effort is not a one-time activity on the project. Uncertainty is directly associated with the change being produced by a project. The following lists some of the tasks that are performed as it pertains to validation.

  • Risk Assessment: Different organizations have different approaches towards validation of programs. This is partly due to varying interpretations of the regulations and also  due to how different managers and organizations function. Assess the level of validation that needs to take place.
  • Test Plan: In accordance with the project plan and, if not, to determine how to address any deviation. Test planning is essential in:  ensuring testing identifies and reveals as many errors as possible and to acceptable levels of quality.

test plan-cdisc

  • Summary Results: This is all the findings documented during testing.

An effective risk management process involves first identifying and defining risk factors that could affect the various stages of the CDISC implementation process as well as specific aspects of the project. riskplan

5) Transformation Specification: Dataset transformation is a process in which a set of source datasets and its variables are changed to  meet new standard requirements. Some changes will occur during this step: For example, variable name must be 8 chars long. The variable label must not be more than 40 chars in length. Combining values from multiple sources (datasets) into on variable.

6) Applying Transformation: This is done according to specification however, this document is active during the duration of a project and can change. There are now many tools available to help with this tasks as it could be time consuming and resource intensive to update the source code (SAS) manually. Transdata, CDISCXpres, SAS CDIDefine-it; just to name a few.

7) Verification Reports: The validation test plan will detail the specific test cases that need to be implemented  to ensure quality of the transformation. For example, a common report is the “Duplicate Variable” report.

8) Special Purpose Domain: CDISC has several special purpose domains: CO (comments), RELREC (related records or relationship between two datasets) and SUPPQUAL (supplemental qualifiers for non-standards variables).

9) Data Definition Documentation: In order to understand what all the variables are and how they are derived, we need a annotation document. This is the document that will be included during data submission. SAS PROC CONTENTS can help in the generation of this type of metadata documentation. The last step in the project plan for CDISC implementation is to generate the documentation in either PDF  or XML format.

CDISC has established data standards to speed-up data review and FDA is now suggesting that soon this will become the norm. Pharmaceuticals, bio-technologies companies and many sponsors within clinical research are now better equipped to improve CDISC implementation.

Need SAS programmers? RA eClinica can help provide resources in-house / off-shore to facilitate FDA review by supporting CDISC mapping, SDTM validation tool, data conversion and CDASH compliant eCRFs.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

Advertisements

CDISC/CDASH Standards at your Fingertips

A standard database structure using CDISC (Clinical Data Interchange Standards Consortium) and CDASH (Clinical Data Acquisition Standards Harmonization) standards can facilitate the collection, exchange, reporting, and submission of clinical data to the FDA and EMEA. CDISC and CDASH standards provide reusability and scalability to EDC (electronic data capture) trials.

There are some defiance in implementing CDISC in EDC CDMS:

1. Key personnel in companies must be committed to implementing the CDISC/CDASH standards.

2. There is an initial cost for deployment of new technology: SDTM Data Translation Software, Data Storage and Hosting, Data Distribution and Reporting Software.

3. It can be difficult to understand and interpret complex SDTM Metadata concepts and the different implementation guides.

4. Deciding at what point in a study to apply the standards can be challenging: in the study design process, during data collection within the CDMS [CDASH via EDC tools], in SAS prior to report generation [ADaM], or after study completion prior to submission [SDTM].

5. Data management staff [CDM, clinical programmers], biostatisticians, and clinical monitors may find it difficult to converge on a new standard when designing standard libraries and processes.

6. Implementing new standards involves reorganizing the operations of (an organization) so as to improve efficiency [processes and SOPs].

7. Members of Data Management team must be retrained on the use of new software and CDISC/CDASH standards.

standards8. There are technical obstacles related to implementation in several EDC systems, including 8 character limitations [SAS] on numerous variables, determining when to use supplemental qualifiers versus creating new domains, and creating vertical data structure.

Comments? Join us at {EDC Developer}

Anayansi Gamboa, MPM, an EDC Developer Consultant and clinical programmer for the Pharmaceutical and Biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Subscribe to my blog’s RSS feed and email newsletter to get immediate updates on latest news, articles, and tips. I am available on LinkedIn. Connect with me there for technical discussions.

Fair Use Notice: This article/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer:De inhoud van deze columns weerspiegelen niet per definitie de mening van {EDC Developer}.

Disclaimer: The legal entity on this blog is registered as Doing Business As (DBA) – Trade Name – Fictitious Name – Assumed Name as “GAMBOA”.

SAS: Problem Solving 1

Today we want to provide you with some problem-resolution options for a simple situation.

Problem:

We have 3 variables that we will call Var1, Var2, Var3. Their values ranges from 1-9 and we would like to create new variables that would flag a response based on their value on any of the 3 previous variables.

Sample:

Response 7 has one (1) variable Var1, then VarFlag1 should be equal to 1

If the same response 7 has a value 3 on Var3, then the VarFlag3 should be equal to 1

Solution 1: Data step

data mydata;
input Var1, Var2, Var3;
array vars Var1 Var2 Var3;
array flags flag1-flag9;
do over vars;
if 1 <=vars<=9 then
flags[vars]=1;
end;

cards;
123
987
2.5 7 9;
run;

proc print; run;

Solution 2: array solution

array flag{*} flag1-flag9;
do j=1 to 9;
flag{j}=(index(Var1||Var2||Var3,j)>0);
end;

Solution 2: Macro solution

%macro SET_Flags(Flag,num);
%do 1=1 % to &n;
&Flag.&i=(Var=&i or Var=&i or Var=&i);
%end;
%mend Set_Flags;

Data Mydata;
%Set_Flags(Flag,5);
run;

Professional Timeline – Clinical Programmer

Professional Timeline

Curriculum Vitae
CV

 

anayansi gamboa

SAS Proficiency Test

Here are some common SAS questions:

1- What happens if no VAR statement is used in PROC PRINT?

2- Understand how SAS processes a data step

3- Explain what a (data step) SELECT statement does

4- What is the default length of a numeric variable?

5-What is the difference between DO WHILE and DO UNTIL?

6-What does BY after SET A; do?

7-What does BY after SET A B; do?

8-What is a sub-setting IF statement?

If you pass it doesn’t mean you’re good; but if you’re good, you should pass

9- What’s the difference between INFILE and INPUT?

10-What information is given by a Proc Contents?

11-How do you modify the header?

12-Which elements can you change? (in reference to question 10-11)

13-Which elements can you NOT change without re-creating the dataset?(in reference to question 10-11)

14-What is the purpose of the single trailing @ sign?

15-What statements are used to include or exclude specific variables in a data step?

16-How can you generate a 2-dimensional cross-tabulation?

17-In the macro language, all macro commands begin with?

18-What occurs if multiple datasets are included in a SET statement?

19-What symbol is used to concatenate two character values?

20-Explain the substr function

21-What are the differences among these functions? Round(), Ceil(), Floor()

22-What happens if you use the same in= variable for 2 datasets?

23-How do you begin a data step if you do not want to create a SAS dataset?

24-What are the automatic variables first. and last. ?

25-What does a 2 level input dataset name such as MydataSet.Mydata indicate ?

26-What is the default libname, where one-level dataset names go?

27-What statement allows you to keep track of previous values of variables or keep a running total?

28-What happens if you retain the value of a variable in the incoming dataset?

29-When an invalid data field is encountered when inputting a numeric variable, what happens?

30-What’s the difference between $w. and $CHARw. informats?

31-Explain the _type_ variable generated by proc summary

32-What is the purpose of a %GLOBAL statement?

33-What do CALL SYMPUT and CALL SYMGET do?

34-What date is the reference date for calculating the value of a SAS date variable?

35-What is _n_?

There are 10 kinds of people in the world: those who understand binary, and those who don’t.

Those are SAS most common questions you will be asked during a SAS programming job interview or if you plan in taking the SAS based certification.

In future blogs, I will try to cover each of those questions individually with some demonstrations.

Good luck!

Your comments and questions are valued and encouraged.
Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica, Open Source and Oracle Clinical.

 

anayansi gamboa

How to Use a SAS Macro Video

Although all SAS users are familiar with procedures (or procs), many users may not be familiar with macros. This four-minute video demonstrates how to run a macro. The new %LCA_Distal macro is used as an example, but the steps are generally applicable to any macro, whether or not it was created by The Methodology Center.

Source: Penn State Methodology Center

FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

A neat new trick to trim your macro variables in 9.3

SAS macro variables are a great way to store a calculated value, so you can use it later in your code. They are not just limited to the data step — you can also use macro variables in title statements, axis statements, etc.

By default, the macro variable will be padded with blanks (per the width of the format). Here’s a simple example. Notice how the ‘avg’ (62.3) has extra spaces padded on the left in the title and in the label for the reference line:

goptions xpixels=600 ypixels=500 gunit=pct htitle=5 htext=3;

proc sql;
select avg(height) format=comma12.1 into :avg from sashelp.class;
quit; run;

axis1 label=(‘Inches’) reflabel=(c=red “&avg”);
axis2 label=none offset=(3,6);

pattern1 v=solid c=pink;
pattern2 v=solid c=cx67C8FF;

title “The average class height is &avg inches”;
proc gchart data=sashelp.class;
hbar name / type=sum sumvar=height descending
subgroup=sex nostats nolegend coutline=gray
ref=&avg cref=red raxis=axis1 maxis=axis2 noframe;
run;

One way to have sql trim the blanks when creating the macro variable is to use the the ‘separated by’ option, and tell it the values are separated by blanks. This was more intended for the scenario where you’re outputting multiple values into multiple macro variables… but is also a clever way to trim the blanks when creating a single macro variable. See how much nicer the title and reference line label look with the blanks trimmed!

proc sql;
select avg(height) format=comma12.1 into :avg separated by ‘ ‘ from sashelp.class;
quit; run;

And in SAS 9.3, we’ve added an even more elegant solution – the ‘trimmed’ option!

proc sql;
select avg(height) format=comma12.1 into :avg trimmed from sashelp.class;
quit; run;

You can learn lots of ‘tricks’ like this, that will make your graphs look better (and make your life simpler) in the SAS/GRAPH training course!

Source: Robert Ellison – http://blogs.sas.com/content/sastraining/2013/01/17/a-neat-new-trick-to-trim-your-macro-variables-in-9-3/
FAIR USE-“Copyright

Disclaimer Under Section 107 of the Copyright Act 1976, allowance is
made for “fair use” for purposes such as criticism, comment, news
reporting, teaching, scholarship, and research. Fair use is a use
permitted by copyright statute that might otherwise be infringing.
Non-profit, educational or personal use tips the balance in favor of
fair use.”

SAS Cheat Sheet – Part 6

The macro facility is a tool you can use in your programming./*

Macro Language

%DO macro-var=start_value %TO end_value

%DO %WHILE (expression); /*Executes a section of a macro repetitively while a condition is true*/

%DO %UNTIL (expression); /*Executes a section of a macro repetitively until a condition is true*/

Macro variables can be stored in either the global symbol table or in a local symbol table

%GLOBAL macro-variable(s); /*Creates macro variables that are available during the execution of an entire SAS session*/

%IF expression %THEN action; <%ELSE action;> /Conditionally process a portion of a macro*/

%LENGTH (character string | text expression) /*Returns the length of a string*/

The name assigned to a macro variable must be a valid SAS name

%LET macro-variable =<value>; /*Creates a macro variable and assigns it a value*/

%MACRO mname (<pp1><…,ppn><kp1=value<..<kpn=v>);/*Begins a macro definition*/

%MEND <macro-name>; /*Ends a macro definition*/

%SCAN(argument,n<,delimiters>)/*Search for a word that is specified by its position in a string*/

%SUBSTR (argument,position<,length>)/*Produce a substring of a character string*/

%UPCASE (character string | text expression) /*Convert values to uppercase*/

Macro variable values are text values

Macro Quoting

%QUOTE | %NRQUOTE and %BQUOTE | %NRBQUOTE /*Mask special characters and mnemonic operators in a resolved value at macro execution */

%STR | %NRSTR /*Mask special characters and mnemonic operators in constant text at macro compilation */

%SUPERQ /*Masks special characters/mnemonic operators at macro execution but prevents further resolution of the value*/

Example:

*----------------------------------------------------------------*//* This macro will produce summary statistics ----------------   */

%Macro safety1;
%odscmd (start,portnum=11,rptname=AEActivityReport);
Data _null_;
thedate=today();
fstdate = mdy(month(thedate), 1, year(thedate));
lstdate = intnx(‘month’, fstdate, 1)-1;
call symput (‘strtdate’, put(fstdate, date9.));
call symput (‘stpdate’, put(thedate, date9.));
run;

%put strtdate: &strtdate;
%put stpdate: &stpdate;
* more code goes here;

/*----------------------------------------------------------------*/

Good Programming Practice for Clinical Trials by Sunil Gupta

The following are draft recommendations a Good Programming Practice for analysis, reporting and data manipulation in Clinical Trials and the Healthcare industries.

The purpose is to encourage contributions from across companies, non-profit organizations and regulators in an attempt to create a consensus recommendation. The ambition is that this page becomes recognized by the Pharmaceutical Industry, Clinical Research and Health Care Organizations as well as Regulatory Authorities.

The hope is that the Practice can be reviewed and endorsedby the relevant management teams of several Pharmaceutical companies and major Contract Research Organizations and promoted through relevant professional organizations such as PharmaSUG, PhUSE, PSI, CDISC.

Introduction

The Good Programming Practices are defined in order to:

  • Ensure the clarity of the code and facilitate code review;
  • Save time in case of maintenance, and ease the transfer of code among programmers or companies;
  • Minimize the need for code maintenance by robust programming;
  • Minimize the development effort by development and re-use of standard code and by use of dynamic (easily adaptable) code;
  • Minimize the resources needed at execution time (improve the efficiency of the code);
  • Reduce the risk of logical errors.
  • Meet regulatory requirements regarding validation and 21CFRPart11 compliance

Note: As often, the various guidelines provided hereafter may conflict with one another if applied in too rigorous a way. Clarity, efficiency, re-usability, adaptability and robustness of the code are all important, and must be balanced in the programming practice.

Regulatory Requirements

Validation

21CFRPart11 compliance

Readability and Maintainability

Language

English is an international language and study protocols, study reports for practical reasons (regulatory authorities, inlicensing, outlicensing, partnerships, mergers) are mostly written in English, therefore it is recommended to write the SAS code and comments in English.

Header and Revision History

  • Include a header for every program (template below).
**********************************************************;* Program name      :** Author            :** Date created      :** Study             : (Study number)*                     (Study title)** Purpose           :** Template          :** Inputs            :** Outputs           :** Program completed : Yes/No** Updated by        : (Name) – (Date): *                            (Modification and Reason)**********************************************************;
  • In addition to your name or initials, use your login ID to identify yourself in the header. This is so there is no ambiguity on the identify of each programmer.
  • Update the revision history at each code modification made after the finalization of the first version of a program.

Note: When you copy a program from another study, you became the author of this program, and you should clear the revision history. You can specify the origin of the program under the “Template” section of the header.

Below is an example with comments of an alternative comment block that I think is more useful for Open Source programming. PaulOldenKamp 16:54, 5 April 2009 (UTC)

/** ---------------------------------------------------------------------------------------- $Id: os3a_autoexec.sas 152 2008-11-17 01:48:40Z Paul_ok01 $ <== Id info automatically inserted with each commit to Subversion version control Application: OS3A - Common Programs Description: OS3A session initialization program. Previous Program: None Saved as: c:\os3a\trunk\os3a_autoexec.sas <== locations, local and web, where the pgm http://os3a.svn.sourceforge.net/viewvc/os3a/trunk/os3a_autoexec.sas can be found Change History: Date Prog Ref Description 04/26/2008 PMO [1] Initial programming for os3a <== date, programmer initials, ref number [1] to link to specific location of change Copyright: Copyright (c) 2008 OS3A Program. All rights reserved. <== Always tell who owns the program so one Copyright Contact: paul_ok01@users.sourceforge.net can ask permission from the copyright holder License: Eclipse Public License v1.0 <== Tell folks how they are licenced to use This program and the accompanying materials are made available under the program. the terms of the Eclipse Public License v1.0 which accompanies this distribution, and is available at www.eclipse.org/legal/epl-v10.html Contributors: Paul OldenKamp, POK_Programming@OldenKamp.org - Develop initial pgm. <== Identify significant contributors <== tag identifies start of info @purpose OS3A system initialization program. used by the Codedoc Perl script to Set up initial options and global macro variables. produce external HTML documentation @param SYSPARM - input provided from program initiation call; default: main_FORE for production. @symbol sysRoot - system root location; Windows - C:/, UNIX - / @symbol remove_cmd - system command to remove file; Windows - erase, UNIX - rm @symbol os3aRoot - directory location of os3a root @symbol futsRoot - directory location of FUTS top level macros. @symbol Root - directory location of sub-project identified with Four Letter Acronym ----------------------------------------------------------------------------------------- */

The results from encoding the header and comments in a SAS program can be seen on the CodeDoc web page. See http://www.thotwave.com/products/codedoc.jsp.
CodeDoc Download Page

Comments

  • Include a comment before each major DATA/PROC step, especially when you are doing something complex or non-standard. Comments should be comprehensive, and should describe the rationale and not simply the action. For example, do not comment “Access demography data”; instead explain which data elements and why they are needed.
  • Organize the comments into a hierarchy.
  • Do not include numbers in comments.

Reason: It avoids heavy update when removing or inserting sections.

Naming Conventions

  • Use explicit and meaningful names for variables and datasets, with a maximum length of 8.
  • For permanent datasets, use a meaningful dataset label and variable labels.
  • When possible, never use the same name for a dataset more than once in the program.

Note: However, keep in mind that large intermediate files take a lot of SAS Workspace.

  • Name IN variable using “in” plus a meaningful reference to the dataset.

Example:

data aelst;   merge aesaes (in=inae) patpat (in=inpat);   by patno;   if inae and inpat;run;
  • Labels must have a maximum length of 40 characters.

Code Structure

  • It is mandatory to include libnames, options and formats in a separate setup program unless these are temporary formats or temporary options that are reset after being used.

Reason: It will guarantee that changes of the environment are taken into account in all programs run afterwards.

  • Use standard company macros to read in libnames and settings, to write out datasets, and for standard calculating and reporting.
  • One statement per line, but several are allowed if small and repeated or related. Long statements should be split across multiple lines.
  • Control system settings to show all executed code in the log as the default, in as clear manner as possible. The log should not be so lengthy that the programmer cannot easily navigate (if so, use highly visible comments with sufficient white space). System settings should be able to be easily changed in order for a user to debug a section of code or a macro, in order to temporarily display the %included code, resolved macro names, and logic.
  • Use a standard sequence for placing statements and group like statements together.
  1. Within a program:
    1. %LET statements and macro definitions
    2. Input steps
    3. Calculations
    4. Save final (permanent) datasets and created outputs
  2. Within a DATA step:
    1. All non-executable statements first (e.g. ATTRIB, LENGTH, KEEP…)
    2. All executable statements next

Reason: It increases the readability of the program.

  • Left-justify DATA, PROC, OPTIONS statements, indent all statements within.

Example:

proc means data=osevit;   var prmres;   by prmcod treat;run;
  • End every DATA/PROC step with a left-aligned RUN statement.

Reason: It explicitly defines the step boundary.

  • Insert at least one blank line after each RUN statement in DATA/PROC steps.
  • Indent statements within a DO loop, align END with DO.
  • Avoid having too many nested DO loop and IF-ELSE statements.
  • In case of interlinked DO loop, add a comment at the start (DO) and end (END) of each loop.

Example:

data test01;  do patno=1 to 40; * cycle thru patients;    do visit=1 to 3; * cycle thru visits;      output;     end; * cycle thru visits;  end; * cycle thru patients;run;
  • Insert parentheses in meaningful places in order to clarify the sequence in which mathematical or logical operations are performed.

Example:

data test02;  set test01;  if (visit=0  and vdate lt adate1)   or (visit=99 and vdate gt adate2) then delete;run;

Style conventions

Draft section : this may not be specific to clinical programming, but may be of use when considering a general standard for sharing programs.

Use of analysis datasets

For discussion of why programming output directly from raw data is generally avoided.

Efficiency

  • When you input or output a SAS dataset, use a KEEP (preferred to DROP) statement to keep only the needed variables.

Reason: The SAS system loads only the specified variables into the Program Data Vector, eliminating all other variables from being loaded.

  • When subsetting a SAS dataset, use a WHERE statement rather than IF, if possible.

Reason: WHERE subsets the data before entering it into the Program Data Vector, whereas IF subsets the data after inputting the entire dataset.

  • When using IF condition, use IF/ELSE for mutually exclusive conditions, and check the most likely condition first.

Reason: The ELSE/IF will check only those observations that fail the first IF condition. With the IF/IF, all observations will be checked twice. Also, consider the use of a SELECT statement instead of IF/ELSE, as it may be more readable.

  • Avoid unnecessary sorting. CLASS statement can be used in some procedure to perform by-group processing without sorting the data.

Example:

proc means data=osevit;  var prmres;  class treat;run;
  • If possible (i.e. not a sorting variable), use character values for categorical variables or flags instead of numeric values.

Reason: It saves space. A character “1” uses one byte (if length is set to one), whereas a numeric 1 uses eight bytes.

  • Use the LENGTH statement to reduce variable size.

Reason: Storage space can be reduced significantly.
Note: Keep in mind that a too limited variable length could reduce the robustness of the code (lead to truncation with different sets of data).

  • Use simple macros for repeating code.

Robustness

  • Use the MSGLEVEL=I option in order to have all informational, note, warning, and error messages sent to the LOG.
  • In the final code, there should be no dead code that does not work or that is not used. This must be removed from the program.
  • Code to allow checking of the program or of the data (on all data or on a subset of patients such as clean patients, discontinued patients, patients with SAE or patients with odd data) is encouraged and should be built throughout the program. This code can be easily activated during the development phase or commented out during a production run using the piece of code detailed in Section 6.
  • It is not acceptable to have avoidable notes or warnings in the log (mandatory).

Reason: They can often lead to ambiguities, confusion, or actual error (e.g. erroneous merging, uninitialized variables, automatic numeric/character conversions, automatic formatting, operation on missing data…).
Note: If such a warning message is unavoidable, an explanation has to be given in the program (mandatory).

  • Always use DATA= in a PROC statement (mandatory).

Reason: It ensures correct dataset referencing, makes program easy to follow, and provides internal documentation.

  • Be careful when merging datasets. Erroneous merging may occur when:
  1. No BY statement is specified (set system option MERGENOBY=WARN or ERROR).
  2. Some variables, other than BY variables, exist in the two datasets (set system option MSGLEVEL=I), S writes a warning to the SAS log whenever a MERGE statement would cause variables to be overwritten at which the values of the last dataset on the MERGE statement are kept).
  3. More than one dataset contain repeats of BY values. A WARNING though not an ERROR is produced in the LOG. If you really need, PROC SQL is the only way to perform such many-to-many merges.

Reason: One has to routinely carefully check the SASLOG as the above leads to WARNING messages rather ERROR messages yet the resulting dataset is rarely correct.

  • When coding IF-THEN-ELSE constructs use a final ELSE statement to trap any observations that do not meet the conditions in the IF-THEN clauses.

Reason: You can only be sure that all possible combinations of data are covered if there is a final ELSE statement.

  • When coding a user-defined FORMAT, include the keyword ‘other’ on the left side of the equals sign so that all possible values have an entry in the format.

Reason: A missing entry in a user-defined FORMAT can be difficult to detect. The simplest way to identify this potential problem is to ensure that all values are assigned a format.
Note: This does not apply to INFORMATs. It could be more helpful to get a WARNING message when trying to INPUT data of unexpected format.

  • Try to produce code that will operate correctly with unusual data in unexpected situations (e.g. missing data).

Code for Data Checks

Build checks so that their purpose is clear, so that they can be toggled on or off, and remove them once they are no longer needed.

Activate/Deactivate Pieces of Code

In the beginning of the program, define a macro variable that you set to blank during the development phase or that you set equal to * for the production run:

%let c=;  or  %let c=*;

For the pieces of code that check the data/program, start each line with the macro variable defined above:

&c title “Check the visits for each patient”;&c proc freq data=patvis01;&c    table patno*visit;&c run;

This code will be executed if &c is blank (development), but will be commented out when &c=* (production).

Perform Checks on a Subset of Patients

In a separate code that you store under the study MACRO folder, list the subset of patients (clean patients, discontinued patients, patients with SAE or patients with odd data) that you want to look at:

%macro select;2076 2162 2271 2449%mend;

In the beginning of the program, define a second macro variable that you set equal to * when you want to perform checks on all data or to blank when you are interested in a subset of patients:

%let s=*;  or  %let s=;

For each checking code, add a piece of code that allows subsetting the data, and start each line of this piece of code with the 2 macro variables defined above:

&c title “Check the visits for each patient”;&c proc freq data=patvis01;&c    table patno*visit;&c &s where patno in (%select);&c run;

The check will be performed only if &c is blank, and it will be applied to all patients if &s=* or on the subset of patients if &s is blank.
Better still: input the list of check case IDs as a dataset.

Floating Point Error

Consider the real number system that we are familiar with. This decimal system (0 → ± ∞) is obviously infinite. Most computers use floating point representation, in which, a finite set of numbers is used to represent the infinite real number system. Thus, we can deduce that we will have some sort of error appearing from time to time. This is more generally termed Floating Point Error and occurs in computers due to hardware limitations.

The following paper goes someway in explaining why and how this happens and also possible solutions in how to approach this issue.

Paper reference:- http://www.lexjansen.com/phuse/2008/cs/cs08.pdf

Data Imputation versus Hardcoding

Please contribute

  • Definitions
    • Data Imputation
    • Hardcoding
  • Issues
  • Recommendation

Integrity of a data transfer

At a minimum, all data transfers should be validated by checking the observation counts for each SAS dataset
or the record counts in other formats, against counts provided by the sender. It is also helpful if the sender
can provide a checksum for each file transferred since this also ensures all content made it’s way to you
without transmission errors. There are many freeware programs available to calculate checksums for any file at websites such as http://sourceforge.net/ .

Macros

Draft section : recommendations particularly relevant for the development and use of macros and macro libraries.

Macros are particularly useful under the following circumstances:

  • Program code is used repeatedly
  • A number of steps must be taken conditionally, and the logic for these is clearly fixed (no need to think of all the steps that should be included in a program under a specific situation: the macro will deduce them for you and generate the appropriate data step or proc step code)
  • There is no trivial solution via “ordinary” SAS code
  • Their application must be easier as to program the code itself!
  • The usage helps users avoiding errors and omissions.

If used appropriately the following benefits can be achieved:

  • Increase in quality by avoiding programming bugs and errors
  • Savings in time and resources
  • Enforcement of standards, e.g. standard methods and standard outputs
  • Work can be more enjoyable as programmers can focus on the non-routine work

Ideally Macro development should follow a few rules:

  • Macro headers should clearly state all changes to environment and data that result from execution. Changes should be limited to those necessary for the focused purpose of the macro:
    • strictly controlled changes to input data and creation of output data
    • clear temporary data set clutter
    • no unexpected changes to system settings (options, titles, footnotes, etc)
    • no unexpected changes to external symbol tables
  • Scope of macro variables should be explicitly controlled using %global and %local statements.
  • Method of macro variable creation should demonstrate awareness of default scope:
  • The log matters:
    • Use Base SAS techniques whenever possible to avoid excessive code generation (log bloat). For example, macro definition should use DATA step array and DO loop processing rather than Macro %DO looping.
    • But use pure Macro Language for routine utility macros (see details, below).
    • Use appropriate comment style in macro definitions to properly annotate the SAS log when MPRINT in on. For example, use %* style commenting to explain macro logic, but /* style commenting to explain resulting code. (Or * style or PUT statement commenting as appropriate.)
    • Allow the users to control the appearance of the log via MPRINT, SYMBOLGEN, and MLOGIC.
  • Code within a macro definition should be germane, limited to the specific purpose of the macro. The use of a central repository for macros (“Macro library”) is suggested.
  • Macro Library: Code for routine tasks (eg, parameter checking, system and environment checking, messaging, etc.) should be handled by dedicated utility macros. Code for such routine tasks should not overwhelm the current macro definition, obscuring the purpose, and creating unnecessary maintenance overhead and lack of consistency within a library.
  • Macro Library: Parameter naming conventions should be used for common parameters such as input/output libnames and data sets. Explicit and transparent control of macro variable scope again becomes crucial to avoid accidental change of external symbol tables
  • Macro Library: Use pure Macro Language definitions whenever possible to improve program flow and avoid producing unnecessary Base SAS code. Returning a list of data set variable, checking for macro var existence, returning data set obs count can all be achieved without BASE SAS code. Such macros can be called “inline” without unnecessary overhead or interruption of program flow.

For example, instead of %count_ds_obs definition that uses DATA Step code and interrupts program flow like

%let n_obs = %count_ds_obs(DSIN=myData);%if &n_obs > 0 %then %do;  ... more statements ...%end;

an inline, pure Macro Language implemetation allows streamlined code:

%if %inline_ds_obs(DSIN=myData) > 0 %then %do;  ... more statements ...%end;

Source: Sunil Gupta, Senior SAS Consultant, Gupta Programming http://www.sascommunity.org/wiki/Good_Programming_Practice_for_Clinical_Trials