Tag Archives: Data Validation

Understanding Audit Trail Requirements in Electronic GxP Systems

Computerized systems are used throughout the life sciences industry to support various regulated activities, which in turn generate many types of electronic records.  These electronic records must be maintained according to regulatory requirements contained within FDA’s 21 CFR Part 11 for US jurisdictions and Eudralex Volume 4 Annex 11 for EU jurisdictions.  Therefore, we must ensure the GxP system which maintains the electronic record(s) is capable of meeting these regulatory requirements.

What to look for in an audit trail?

  • Is the audit trail activated? Is there an SOP covering it?
  • Is there a record of reviews? (Most companies trust the electronic system’s audit trail and generate an electronic or paper version of it without a full review.)
  • How do you prevent or detect any deletion or modification of audit trail data? Is staff trained?
  • Can the audit trail be filtered?

Can you prove data manipulation did not occur?

Persons must still comply with all applicable predicate rule requirements related to documentation of, for example, date (e.g. 58.130(e)), time, or sequencing of events, as well as any requirements for ensuring that changes to records do not obscure previous entries.

Consideration should be given, based on a risk assessment, to building into the system the creation of a record of all GMP-relevant changes and deletions (a system generated “audit trail”).

Audit trail content and why each element is required:

  • Identification of the user making the entry – needed to ensure traceability. This could be a user’s unique ID; however, there should be a way of correlating this ID to the person.
  • Date and time stamp – a critical element in documenting a sequence of events and vital to establishing an electronic record’s trustworthiness and reliability. It can also be an effective deterrent to records falsification.
  • Link to record – needed to ensure traceability. This could be the record’s unique ID.
  • Original value and new value – needed in order to have a complete history and to be able to reconstruct the sequence of events.
  • Reason for change – only required if stipulated by the regulations pertaining to the audit-trailed record (see below).

FDA / Regulators findings and complaints during Inspection of Audit Trail Data:

  • Audit trail user IDs can be hard to interpret (e.g. user123); use full user names, or each user ID will require additional mapping.
  • Field IDs or variable names are used instead of SAS labels or field labels; map field names to their respective field text (e.g. display “Reported Term for the Adverse Event” rather than AETERM).
  • Default values should be easily explained or meaningful (see annotated CRF).
  • Limited access to audit trail files (many systems with different reporting or extraction tools; data is not fully integrated; too many files that cannot be easily combined).
  • No audit trail review process. Be prepared to update SOPs or current working practices to add audit trail review time. It is expected that, at least every 90 days, qualified staff perform a review of the audit trail for their trials. Proper documentation, filing and signatures should be in place.
  • Avoid using Excel or CSV files. Auditors are now asking for SAS datasets of the audit trails, and they are being trained to generate their own output based on a pre-defined set of parameters, allowing them to summarize data and produce graphs (see the sketch after this list).
  • Formatting issues when exporting to Excel, for example: number and date fields change to text fields.
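As a rough illustration of the kind of summary an auditor or reviewer might run on an audit trail delivered as a SAS dataset, here is a minimal sketch. The dataset name audit_trail and its variables (userid, action, fieldname) are assumptions for illustration, not any specific system’s export layout.

/* Map raw field names to their CRF field labels (values are illustrative) */
proc format;
  value $fldlbl
    'AETERM'  = 'Reported Term for the Adverse Event'
    'AESTDTC' = 'Adverse Event Start Date'
    other     = 'Unmapped field';
run;

/* Summarize audit trail actions by user and by field */
proc freq data=audit_trail;
  tables userid*action / nocol nopercent;
  tables fieldname;
  format fieldname $fldlbl.;
run;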
Audit Trail Review

What data must be “audit trailed”?

When it comes to determining on which data the audit trail must be applied, the regulatory agencies (i.e. FDA and EMA) recommend following a risk based approach.

Following a “risk based approach”

In 2003, the FDA issued recommendations for compliance with 21 CFR Part 11 in the “Guidance for Industry – Part 11, Electronic Records; Electronic Signatures — Scope and Application” (see reference: Ref. [04]).  This guidance narrowed the scope of 21 CFR Part 11 and identified portions of the regulations where the agency would apply enforcement discretion, including audit trails. The agency recommends considering the following when deciding whether to apply audit trails:

  • the need to comply with predicate rule requirements
  • a justified and documented risk assessment to determine the potential effect on product quality, product safety and record integrity

With respect to predicate rule requirements, the agency states, “Persons must still comply with all applicable predicate rule requirements related to documentation of, for example, date (e.g., § 58.130(e)), time, or sequencing of events, as well as any requirements for ensuring that changes to records do not obscure previous entries.”  In the docket concerning the 21 CFR Part 11 Final Rule, the FDA states, “in general, the kinds of operator actions that need to be covered by an audit trail are those important enough to memorialize in the electronic record itself.” These are actions which would typically be recorded in corresponding paper records according to existing recordkeeping requirements.

The European regulatory agency also recommends following a risk based approach.  The Eudralex Annex 11 regulations state, “consideration should be given, based on a risk assessment, to building into the system the creation of a record of all GMP-relevant changes and deletions (a system generated “audit trail”).”

MHRA Audit

When does the Audit Trail begin?

The question of when to begin capturing audit trail information comes up quite often, as audit trail initiation requirements differ for data and document records.

For data records:

If the data is recorded directly to electronic storage by a person, the audit trail begins the instant the data hits the durable media.  It should be noted that the audit trail does not need to capture every keystroke that is made before the data is committed to permanent storage. This can be illustrated with an example involving a system that manages information related to the manufacturing of active pharmaceutical ingredients.  If, during the process, an operator makes an error while typing the lot number of an ingredient, the audit trail does not need to record every time the operator pressed the backspace key or the subsequent keystrokes to correct the typing error prior to pressing the “return” key (where pressing the return key causes the information to be saved to a disk file).  However, any subsequent “saved” corrections made after the data is committed to permanent storage must be part of the audit trail.

For document records:

If the document is subject to review and approval, the audit trail begins upon approval and issuance of the document.  A document record undergoing routine modifications must be version controlled and managed via a controlled change process. However, interim changes that are performed in a controlled manner, e.g. during drafting or collection of review comments, do not need to be audit trailed.  Once the new version of a document record is issued, it supersedes all previous versions.

Questions from Auditors: Got Answers?

When was data locked? Can you find this information easily on your audit trail files?

When was the database/system released for the trial? Again, how easily can you run a query and find this information?

When did data entry by investigator (site personnel) commence?

When was access given to site staff?

Source:

Part of this article was taken, with permission, from Montrium – Understanding Audit Trail Requirements in Electronic GXP Systems

Fair Use Notice: Images/logos/graphics on this page contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law).


How to Avoid Electronic Data Integrity Issues: 7 Techniques for your Next Validation Project

The idea for this article was taken (with permission from the original authors) from Montrium:  how-to-avoid-electronic-data-integrity-issues-7-techniques-for-your-next-validation-project

Regulatory agencies around the globe are causing life science companies to be increasingly concerned with data integrity.  This comes as no surprise, given that guidance documents for data integrity have been published by the MHRA, FDA (draft), and WHO (draft).  In fact, the recent rise in awareness of the topic has been so tremendous that, less than two years after the original publication, the MHRA released a new draft of its guidance whose scope has been broadened from GMP to all GxP data.

Is data integrity an issue of good documentation practices? You can read GCP information about this topic here.

Good Documentation Practices for SAS / EDC Developers

Are you practising GCP?

In computerised systems, failures in data integrity management can arise from poor system controls or a complete lack thereof.  Human error or lack of awareness may also cause data integrity issues.  Deficiencies in data integrity management are critical because they may lead to issues with product quality and/or patient safety and, ultimately, may manifest themselves through patient injury or even death.

I was recently at a vendor qualification for a tool that uses a hand-held device to take readings while the physician or expert manually applies pressure to parts of a patient’s body (e.g. for pain-related assessments). I was not impressed. Even though it seems like a nice device with its own software, the entire process was manual and, therefore, of questionable data integrity. The measurements seemed to be all over the place, and you would need the right personnel at the clinical site to obtain an accurate reading, since the result depends entirely on how the operator uses the device.

I also questioned the calibration of this device. The salesperson’s answer? “Well, it is reading 0 and therefore it is calibrated.” Really? You mean to tell me you have no way of proving when you performed calibration? Where is the paper trail proving your device is accurate? You mean to tell me I have to trust your word? Or your device’s screen that reads “0”? Well, I have news for you: tell that to the regulators when they audit the trial.

What is Data Integrity?

Data can be defined as any original and true copy of paper or electronic records.  In the broadest sense, data integrity refers to the extent to which data are complete, consistent and accurate.

To have integrity and to meet regulatory expectations, data must at least meet the ALCOA criteria. Data that is ALCOA-plus is even better.

ALCOA: Attributable, Legible, Contemporaneous, Original, Accurate.

 

What is a Computerised System?

A computerised system is not only the set of hardware and software; it also includes the people and documentation (including user guides and operating procedures) that are used to accomplish a set of specific functions.  It is a regulatory expectation that computer hardware and software are qualified, while the complete computerised system is validated to demonstrate that it is fit for its intended use.

How can you demonstrate Electronic Data Integrity through Validation?

Here are some techniques to assist you in ensuring the reliability of GxP data generated and maintained in computerised systems.

Specifications

What to do: Outline your expectations for data integrity within a requirements specification.

For example:

  • Define requirements for the data review processes.
  • Define requirements for data retention (retention period and data format).

Why you should do this: Validation is meant to demonstrate a system’s fitness for intended use.  If you define requirements for data integrity, you will be more inclined to verify that both system and procedural controls for data integrity are in place.

What to do: Verify that the system has adequate technical controls to prevent unauthorised changes to the configuration settings.

For example:

  • Define the system configuration parameters within a configuration specification.
  • Verify that the system configuration is “locked” to end users.  Only authorized administrators should have access to the areas of the system where configuration changes can be made.

Why you should do this: The inspection agencies expect you to be able to reconstruct any of the activities resulting in the generation of a given raw data set.  A static system configuration is key to being able to do this.

 

Verification of Procedural Controls

What to do: Confirm that procedures are in place to oversee the creation of user accounts.

For example:

  • Confirm that user accounts are uniquely tied to specific individuals.
  • Confirm that generic system administrator accounts have been disabled.
  • Confirm that user accounts can be disabled.

Why you should do this: Shared logins or generic user accounts should not be used, since these would render data non-attributable to individuals.

System administrator privileges (allowing activities such as data deletion or system configuration changes) should be assigned to unique named accounts.  Individuals with administrator access should log in under their named accounts so that audit trails can be attributed to those specific individuals.

What to do: Confirm that procedures are in place to oversee user access management.

For example:

  • Verify that a security matrix is maintained, listing the individuals authorized to access the system and with what privileges.

Why you should do this: A security matrix is a visual tool for reviewing and evaluating whether appropriate permissions are assigned to an individual. The risk of tampering with data is reduced if users are restricted to areas of the system that solely allow them to perform their job functions.

What to do: Confirm that procedures are in place to oversee training.

For example:

  • Ensure that only qualified users are granted access to the system.

Why you should do this: People make up the part of the system that is most prone to error (intentional or not).  Untrained or unqualified users may use the system incorrectly, leading to the generation of inaccurate data or even rendering the system inoperable.

Procedures can be implemented to instruct people on the correct usage of the system.  If followed, procedures can minimize data integrity issues caused by human error. Individuals should also be sensitized to the consequences and potential harm that could arise from data integrity issues resulting from system misuse.

Logical security procedures may outline controls (such as password policies) and codes of conduct (such as prohibition of password sharing) that contribute to maintaining data integrity.

 

Testing of Technical Controls

What to do: Verify calculations performed on GxP data.

For example:

  • Devise a test scenario where input data is manipulated and double-check that the calculated output is exact.

Why you should do this: When calculations are part of the system’s intended use, they must be verified to ensure that they produce accurate results. (A SAS sketch of this kind of independent check appears after this table.)

What to do: Verify the system is capable of generating audit trails for GxP records.

For example:

  • Devise a test scenario where data is created, modified, and deleted.  Verify each action is captured in a computer-generated audit trail.
  • Verify the audit trail includes the identity of the user performing the action on the record.
  • Verify the audit trail includes a time stamp.
  • Verify the system time zone settings and synchronisation.

Why you should do this: With the intent of minimizing the falsification of data, GxP record-keeping practices prevent data from being lost or obscured.  Audit trails capture who, when and why a record was created, modified or deleted.  The record’s chronology allows for reconstruction of the course of events related to the record.

The content of the audit trails ensures that data is always attributable and contemporaneous.

For data and the corresponding audit trails to be contemporaneous, system time settings must be accurate.
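As an example of the independent calculation check mentioned above, here is a minimal SAS sketch that recomputes a derived value from the raw inputs and compares it to the value exported by the system. The dataset and variable names (export, weight_kg, height_m, bmi_system) are assumptions for illustration only.

data check_calc;
  set export;
  /* Recompute BMI independently from the raw inputs */
  bmi_recalc = weight_kg / (height_m ** 2);
  /* Flag any discrepancy beyond a small rounding tolerance */
  if abs(bmi_recalc - bmi_system) > 0.01 then flag = 'DIFF';
run;

/* List only the records where the recalculated value disagrees with the system value */
proc print data=check_calc;
  where flag = 'DIFF';
run;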

 

 

 

Who can delete data?

Systems must be adequately validated and have sufficient controls to prevent unauthorized access or changes to data.

Implement a data integrity lifecycle concept:

  • Activate the audit trail and its backup
  • Backup and archiving processes
  • Disaster recovery plan
  • Verification of restoration of raw data (see the PROC COMPARE sketch after this list)
  • Security, user access and role privileges (Admin)
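One simple way to support the verification-of-restoration item above is to compare the restored dataset against the original extract. Here is a minimal SAS sketch; the dataset names raw_original and raw_restored are illustrative assumptions.

/* Compare the restored copy of the raw data against the original extract */
proc compare base=raw_original compare=raw_restored listall;
  /* LISTALL also reports variables and observations found in only one of the two datasets */
run;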

Warning Signs – Red Flags

  • Poor design and configuration of systems
  • Data review limited to printed records – no review of e-source data
  • System administrators can delete data during QC (no proper documentation)
  • Shared identities/passwords
  • Lack of a culture of quality
  • Poor documentation practices
  • Old computerized systems not complying with Part 11 or Annex 11
  • Lack of audit trail and data reviews
  • Is QA oversight lacking? A symptom of a weak QMS?
I love being audited

 

 

 

 

 

 

Perform Self Audits

  • Focus on raw data handling and data review/verification
  • Consider external support to avoid bias
  • Verify the expected sequence of activities: dates, times, quantities, identifiers (such as batch, sample or equipment numbers) and signatures (see the sketch after this list)
  • Constantly double-check and cross-reference
  • Verify signatures against a master signature list
  • Check the source of materials received
  • Review batch records for inconsistencies
  • Interview staff, not the managers
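As a simple illustration of checking the expected sequence of activities, here is a minimal SAS sketch that flags records whose dates are out of order. The dataset name batch_records and the variables manufacture_date and sample_date are assumptions for illustration.

/* Flag records where the sample collection date precedes the batch manufacture date */
data sequence_check;
  set batch_records;
  if sample_date < manufacture_date then seq_flag = 'OUT OF SEQUENCE';
run;

/* Review only the flagged records */
proc print data=sequence_check;
  where seq_flag ne '';
run;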

FDA 483 observations

“…over-writing electronic raw data…..”

“…OOS not investigated as required by SOP….”

“….records are not completed contemporaneously”

“… back-dating….”

“… fabricating data…”

“…. No saving electronic or hard copy data…”

“…results failing specifications are retested until
acceptable results are obtained….”

  • No traceability of reported data to source documents

Conclusion:

Even though we try to comply with regulations (regulatory expectations from different agencies, e.g. EMA, MHRA, FDA, etc.), data integrity issues are not always easy to detect. It is important that staff working in a regulated environment be properly trained and that continuous refresher training be provided throughout their careers (awareness training on new regulations and updates to existing regulations).

Companies should also implement a self-audit program and develop a strong quality culture by applying lessons learned from audits.

Sources:

You can read more about data integrity findings by searching the following topics:

MHRA GMP Data Integrity Definitions & Guidance for the Industry,
MHRA DI blogs: org behaviour, ALCOA principles
FDA Warning Letters and Import Alerts
EUDRA GMDP database noncompliance

“The Mind-Numbing Way FDA Uncovers Data Integrity Lapses”, Gold Sheet, 30 January 2015

Data Integrity Pitfalls – Expectations and Experiences


How to write query texts – 6 template sentences

How do you write queries that unambiguously express what is asked for, use short, polite sentences, and objectively explain the underlying inconsistency?

First of all, here are my general guidelines.

My preference is to use no more capitals than needed. Capitals in the middle of a query text, e.g. for CRF fields or for tick-box options, can distract from the actual question being asked. For example, compare the same query text with and without extra capitals: “Please verify stop date. (Ensure that stop date is after or at start date and that stop date is not a future date.)” versus “Please verify Stop date. (Ensure that Stop date is after or at Start date AND that Stop date is not a future date.)”

Refer to CRF fields as they are shown on the CRF, so the involved field(s) can easily be found.

I prefer to leave out any ‘the’ before a CRF field reference, for more to-the-point query texts. For example, compare the same query text with and without ‘the’ before data fields: “Please verify stop date. (Ensure that stop date is after or at start date and that stop date is not a future date.)” versus “Please verify the stop date. (Ensure that the stop date is after or at the start date and that the stop date is not a future date.)”

Consistency in phrasing a query text can help you quickly write query texts or pre-program them in a structured, familiar way. That is the thought behind the following six template sentences for query texts, which you can use to help you write or program your queries.

The six ‘template’ sentences for query texts:

Please provide…

For asking the study site people to provide required data from patient care recordings. Examples: Please provide date of visit. Please provide date of blood specimen collection. Please provide platelet count. Please provide % plasma cells bone marrow aspirate. Please provide calcium result.

Please complete…

For asking the study site people to complete required data as required by the study CRF design (not necessarily required for patient care). Examples: Please complete centre number. Please complete subject number. Other frequency is specified, please complete frequency drop-down list accordingly.

Please verify…

For asking the study site people to check that date and time fields fulfil expected timelines, or to check field formats. Examples: Please verify start date. (Ensure that start date is before date of visit.) Please verify stop date. (Ensure that stop date is after or at start date and that stop date is not a future date.) Please verify date of blood specimen collection. (Ensure that date of blood specimen collection is before or equal to date of visit and after date of previous visit.) Please verify date last pregnancy test performed. Please verify date of informed consent. (Ensure date of informed consent is equal to date of screening or prior to date of screening.) Please verify date as DDMMYYYY.

…., please correct.

For asking the study site people to correct a data recording inconsistent with another data recording. Example: Visit number should be greater than 2, please correct.

…., please tick…

For asking the study site people to complete required tick boxes. Examples: Gender, please tick male or female. Pregnancy test result, please tick negative or positive. Any new adverse events or changes in adverse events since the previous visit, please tick yes or no. Laboratory assessment performed since the previous visit, please tick yes or no. LDH, please tick normal, abnormal or not done.

Please specify…

For asking the study site people to specify the previous data recording. Examples: Please specify other dose. Please specify other frequency. Please specify other method used. Please specify other indication for treatment.

Finally, for query texts popping up during CRF data recording, it can be helpful to include location information, like: Page 12: Please verify start date. (Ensure that start date is after or at start date on page 11.)

Good luck finding your way to structure query texts…

Source:

This article was written by Maritza Witteveen of ProCDM (clinical data management). You can subscribe to her blog posts at www.procdm.nl.

Comments? Join us at {EDC Developer}

Anayansi Gamboa, MPM, is an EDC Developer Consultant and clinical programmer for the pharmaceutical and biotech industry with more than 13 years of experience.

Available for short-term contracts or ad-hoc requests. See my specialties section (Oracle, SQL Server, EDC Inform, EDC Rave, OpenClinica, SAS and other CDM tools)

As the 3 C’s of life states: Choices, Chances and Changes- you must make a choice to take a chance or your life will never change. I continually seek to implement means of improving processes to reduce cycle time and decrease work effort.

Subscribe to my blog’s RSS feed and email newsletter to get immediate updates on latest news, articles, and tips. I am available on LinkedIn. Connect with me there for technical discussions.

Fair Use Notice: This article/video contains some copyrighted material whose use has not been authorized by the copyright owners. We believe that this not-for-profit, educational, and/or criticism or commentary use on the Web constitutes a fair use of the copyrighted material (as provided for in section 107 of the US Copyright Law. If you wish to use this copyrighted material for purposes that go beyond fair use, you must obtain permission from the copyright owner. Fair Use notwithstanding we will immediately comply with any copyright owner who wants their material removed or modified, wants us to link to their website or wants us to add their photo.

Disclaimer: The EDC Developer blog is “one man’s opinion”. Anything that is said on the report is either opinion, criticism, information or commentary. If making any type of investment or legal decision it would be wise to contact or consult a professional before making that decision.

Disclaimer: The content of these columns does not necessarily reflect the opinion of {EDC Developer}.

Did You Know?


SAS PROC CONTENTS options
By default, PROC CONTENTS lists the variables alphabetically. The VARNUM (or POSITION) option arranges the output variable list in the order the variables appear in the dataset instead.

proc contents data=<dsn> varnum;
run;

  • POSITION option

proc contents data=mysasdata position;
run;

  • VARNUM option

proc contents data=mysasdata varnum;
run;

Central Designer – Troubleshooting Tips

If an edit check or function fails to behave as expected, it is time to use your ‘troubleshooting’ skills. The following tips may help you when you are troubleshooting rules in InForm:

Rules:

  • check if rules are running
  • check the rule logic
  • check Rule Dependencies: a rule on a form has access to items on that form, but not other forms or other visits
  • check InForm machine’s Application Event Log

Though some vendors will correct major problems with their products by releasing entirely new versions, other vendors may fix minor bugs by issuing patches, small software updates that address problems detected by users or developers.

Check the release notes for Central Designer for known problems. The release notes provide descriptions and workaround solutions for known problems.
Remember that there is a report available you can run, the “Data Entry Rule Actions Report”. This report outputs all data entry rules in CSV format and can be formatted into edit check specification documentation for QA testing.
A rule can be written in more than one way, which makes it difficult to impose any restrictions.
Scenario: the Route item has three choices: OP, SC and IV. A query should fire if the user does not choose either OP or SC. This rule could be written in many ways, for example:

  • Value = route.Value, with If (value == 3)
  • Value = route.Value == 3, with If (value == true)
  • Value = !(route.Value == 3), with If (value == false)
  • Value = (route.Value == 1 || route.Value == 2), with If (value == false)
  • Value = route.Value != 1 && route.Value != 2, with If (value == true)

Keep it consistent across the trial. Do not overuse conditional statements when a simple range check should be programmed.

Note: Be aware that if you want to reuse a rule that uses data from a logical schema in another study, the other study must also contain the logical schema.

If you have explored most of the obvious possibilities and still cannot get your rule or edit check to work, ask someone on your team to peer review the build.

 

  • unit test your code
  • context available for defining test cases
  • Site name, date/time, locale; Form associations; Empty values; Unknown dates; Repeating objects
  • test case results: Pass or Fail based on expected results
  • perform formal QA / QT

Remember to check the Event log via Control Panel -> Administrative Tools -> Event Viewer

Reference Document : Central Designer – Rule Troubleshooting.pdf

Your comments and questions are valued and encouraged.
Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica, Open Source and Oracle Clinical.

CDISC Clinical Research “A” Terminology

acronym: A word formed from the beginning letters (e.g., ANSI) or a combination of syllables and letters (e.g., MedDRA) of a name or phrase.

admission criteria: Basis for selecting the target population for a clinical trial. Subjects must be screened to ensure that their characteristics match a list of admission criteria and that none of their characteristics match any single one of the exclusion criteria set up for the study.

algorithm: Step-by-step procedure for solving a mathematical problem; also used to describe step-by-step procedures for making a series of choices among alternative decisions to reach a calculated result or decision.

amendment: A written description of a change(s) to, or formal clarification of, a protocol.

analysis dataset: An organized collection of data or information with a common theme arranged in rows and columns and represented as a single file; comparable to a database table.

analysis variables: Variables used to test the statistical hypotheses identified in the protocol and analysis plan; variables to be analyzed.

approvable letter: An official communication from FDA to an NDA/BLA sponsor that lists issues to be resolved before an approval can be issued. [Modified from 21 CFR 314.3; Guidance to Industry and FDA Staff]

arm: A planned sequence of elements, typically equivalent to a treatment group.

attribute (n): In data modeling, refers to specific items of data that can be collected for a class.

audit: A systematic and independent examination of trial-related activities and documents to determine whether the evaluated trial-related activities were conducted and the data were recorded, analyzed, and accurately reported according to the protocol, sponsor’s standard operating procedures (SOPs), good clinical practice (GCP), and the applicable regulatory requirement(s). [ICH E6 Glossary]

audit report: A written evaluation by the auditor of the results of the audit. [Modified from ICH E6 Glossary]

audit trail: A process that captures details such as additions, deletions, or alterations of information in an electronic record without obliterating the original record. An audit trail facilitates the reconstruction of the history of such actions relating to the electronic record.

Source: Applied Clinical Trials


What is Clinical Reviewer?

Clinical Review on iPad

  1. CRF Images – View CRF (Case Report Form) data with pinch zoom.
  2. SAS Datasets – Search and view data exported from SAS datasets.
  3. DEFINE.XML – Import metadata directly from DEFINE.XML.
  4. Controlled Terminology – View all metadata, including coded terms defined in DEFINE.XML.
  5. Secure Data – Data transferred to local memory is viewed in “Airplane” mode with self-deleting expiration.

Clinical data can be imported from the standard CDISC DEFINE.XML format directly into the Clinical Reviewer app on iPad. Take advantage of the multi-touch interface to view Case Report Forms or SAS datasets directly. Metadata, including variable- and value-level metadata, is viewable as defined in DEFINE.XML.

Watch this tutorial and see for yourself…

-FAIR USE-
“Copyright Disclaimer Under Section 107 of the Copyright Act 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research. Fair use is a use permitted by copyright statute that might otherwise be infringing. Non-profit, educational or personal use tips the balance in favor of fair use.”

Source: Meta-x


How to Use SAS – Lesson 5 – Data Reduction and Data Cleaning

This video series is intended to help you learn how to program using SAS for your statistical needs. Lesson 5 introduces the concept of data reduction (also known as subsetting data sets). I discuss how one can subset a data set (i.e. reduce a data set’s number of observations) based on some criteria using the IF statement in the DATA STEP, or using the WHERE statement in a PROC STEP. I also discuss using the KEEP, DROP, and RENAME statements for reducing data to only a handful of the original variables (i.e. reduce a data set’s number of variables). Furthermore, I show how one can label variables so that descriptive information can be presented in output, and value formats so that specific values are easy to understand. Finally, I provide basic examples of each of these for three hypothetical data sets.

Helpful Notes:

1. There are two places you can reduce the data you analyze; in the DATA STEP, and in the PROC STEP.

2. To subset data in the DATA STEP, use the IF statement.

3. To subset data in the PROC STEP, use the WHERE statement.

4. Another way to reduce data is to eliminate variables using a KEEP or DROP statement. This method is useful if you are creating a second data set or analytic version of your main dataset.

5. The RENAME statement simply changes a variable’s name.

Today’s Code:

data main;
input x y z;
cards;
1 2 3
7 8 9
;
run;

proc contents data=main; run;
proc print data=main; run;

/* 1. Reduce data in the DATA STEP using a simple IF statement */
data reduced_main; set main;
if x = 1;
run;

proc print data=main; run;
proc print data=reduced_main; run;

/* 2. Reduce data in the PROC STEP using a simple WHERE statement */
proc print data=main;
where x = 1;
run;

proc print data=main; run;
proc print data=reduced_main; run;

/* 3. Reduce data in the DATA STEP by KEEPing only the variables you do want */
data reduced_main; set main;
KEEP x y;
run;

proc print data=main; run;
proc print data=reduced_main; run;

/* 4. Reduce data in the DATA STEP by DROPing the variables you don’t want */
data reduced_main; set main;
DROP y;
run;

proc print data=main; run;
proc print data=reduced_main; run;

/* 5. Clean up variables using the RENAME statement within a DATA STEP */
data clean_main; set main;
rename x = ID y = month z = day;
run;

proc contents data=main; run;
proc contents data=clean_main; run;

/* 6. Clean up variables using a LABEL statement within a DATA STEP */
data clean_main; set clean_main;
label ID = "Identification Number" month = "Month of the Year" day = "Day of the Year";
run;

proc contents data=main; run;
proc contents data=clean_main; run;

/* 7. FORMAT value labels using the PROC FORMAT and FORMAT statements */
PROC FORMAT;
value months 1="January" 2="February" 3="March" 4="April" 5="May" 6="June" 7="July" 8="August" 9="September" 10="October" 11="November" 12="December";
run;

data clean_main; set clean_main;
format month months.;
run;

proc freq data=clean_main;
table month;
run;



Assigning Libraries to Access and Store SAS® Data

Use SAS learning software to learn how to assign libraries to access and store SAS data.
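As a quick illustration of what assigning a library looks like in code, here is a minimal SAS sketch; the library reference mylib and the folder path are placeholders for illustration, not paths from the course.

/* Assign a library reference named MYLIB pointing to a folder (path is illustrative) */
libname mylib "C:\sas\data";

/* Store a dataset permanently in that library */
data mylib.demo;
  input subjid $ age;
  datalines;
001 34
002 57
;
run;

/* Read it back from the library */
proc print data=mylib.demo;
run;

/* De-assign the library reference when finished */
libname mylib clear;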


Source: http://support.sas.com/learn/ondemand/professionals