This video series is intended to help you learn to program in SAS for your statistical needs. Lesson 4 introduces several methods of combining SAS data sets. I discuss how to combine two or more data sets in the DATA step using the SET statement. I also describe how to use the MERGE statement to bring together two or more data sets that share a common index variable. Next, I describe the SORT procedure (PROC SORT), which must be run before using the MERGE statement with a BY statement. Finally, I provide basic methods of merging data sets using PROC SQL.
1. Use one SET statement when the data sets have the same variables but different observations; the observations are stacked one after another.
2. Use two SET statements when the data sets have different variables but the same observations; rows are combined by position, so both data sets must be in the same order.
3. Use the MERGE statement when the data sets share a common index variable and contribute new variables, new observations, or both.
4. The MERGE statement requires that you first sort each data set on the index variable with the SORT procedure (PROC SORT).
5. Make sure that you add a BY statement after the MERGE statement in your DATA step. Without it, SAS pairs observations by position rather than by the index variable, and the new data set is merged incorrectly.
6. PROC SQL is an advanced method of merging data that can be very powerful for large data sets. It uses different kinds of JOINs, which I will cover in more detail in a later video.
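To see why the BY statement in tip 5 matters, the sketch below merges two small hypothetical data sets (demo_left and demo_right, which are not part of the lesson's data) with and without BY. Without BY, SAS performs a one-to-one merge by position, and where both data sets contain the same variable, the value from the data set listed last replaces the earlier one.

```sas
/* Hypothetical demo data; both are already in id order, so PROC SORT is skipped */
data demo_left;
  input id v1;
  datalines;
1 10
2 20
;
run;

data demo_right;
  input id v2;
  datalines;
2 200
3 300
;
run;

/* No BY statement: rows are paired by position (row 1 with row 1, and so on).
   id exists in both inputs, so the value from demo_right (listed last) wins. */
data no_by;
  merge demo_left demo_right;
run;

/* BY statement: rows are matched on the value of id (a match-merge) */
data with_by;
  merge demo_left demo_right;
  by id;
run;

proc print data=no_by; run;
proc print data=with_by; run;
```

In this sketch, no_by has two rows, and its first row pairs v1=10 (which belongs to id 1) with id=2. with_by has three rows, one per id, with a missing value (.) wherever one data set has no row for that id.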
/* Create the original data set, main */
data main;
  input x y z;
  datalines;
1 2 3
7 8 9
;
run;
/* 1. Use one SET statement when you have the same variables, but different observations */
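data more_people;
  input x y z;
  datalines;
4 5 6
3 6 9
;
run;

/* One SET statement listing both data sets stacks them: main's rows, then more_people's */
data final;
  set main more_people;
run;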
proc print data=final; run;
/* 2. Use two SET statements when you have different variables, but the same observations */
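/* more_vars is a hypothetical name; the original listing does not preserve this data set's name */
data more_vars;
  input a b c;
  datalines;
20 40 60
10 20 30
;
run;

/* Each SET statement reads one row per iteration, so rows are combined by position */
data new_final;
  set main;
  set more_vars;
run;
proc print data=new_final; run;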
/* 3. Use the MERGE statement when you have a common index variable, and any new variables or observations */
data more_vars_and_people;
  input x a b c;
  datalines;
1 20 40 60
7 10 20 30
2 11 12 13
3 14 15 16
;
run;
* The MERGE statement requires that you use an index variable to merge on (e.g. an ID variable).;
* Thus, you must SORT your data BY that index variable.;
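proc sort data=main;
  by x;
run;
proc sort data=more_vars_and_people;
  by x;
run;

/* Match-merge on x: the BY statement is required */
data merged_final;
  merge main more_vars_and_people;
  by x;
run;
proc print data=merged_final; run;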
/* 4. SQL (Structured Query Language) is a language for querying and combining database tables. Here, I provide a basic example that merges the two data sets with a LEFT JOIN. I will cover JOIN types in more detail in a follow-up video. For now, think of a LEFT JOIN as one that keeps every row of the first (left) data set, main, and adds data from the second data set, more_vars_and_people, only for rows whose x matches a row in main. */
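proc sql;
  /* R.* would repeat x, which SAS reports with a warning, so R's columns are listed explicitly */
  create table sql_final as
  select L.*, R.a, R.b, R.c
  from main as L
  LEFT JOIN more_vars_and_people as R
  on L.x = R.x;
quit;
proc print data=sql_final; run;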
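For comparison, the same LEFT JOIN can be sketched as a DATA step match-merge using the IN= data set option, which flags whether the current BY group received a row from a given data set. The first steps only recreate the lesson's main and more_vars_and_people data sets so the sketch runs on its own; skip them if you have already run the listing. The names in_main and left_final are hypothetical.

```sas
/* Recreate the two lesson data sets so this sketch runs on its own */
data main;
  input x y z;
  datalines;
1 2 3
7 8 9
;
run;

data more_vars_and_people;
  input x a b c;
  datalines;
1 20 40 60
7 10 20 30
2 11 12 13
3 14 15 16
;
run;

/* MERGE with BY needs both inputs sorted by x */
proc sort data=main; by x; run;
proc sort data=more_vars_and_people; by x; run;

data left_final;
  /* in_main = 1 when the current value of x was read from main */
  merge main(in=in_main) more_vars_and_people;
  by x;
  /* Keep only values of x that occur in main, as a LEFT JOIN does */
  if in_main;
run;

proc print data=left_final; run;
```

With one row per x in each data set, as here, left_final should hold the same two rows as sql_final (x = 1 and x = 7). When x repeats in both data sets, the two approaches differ: PROC SQL forms every combination of matching rows, while MERGE pairs them one by one within each BY group.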
Anayansi Gamboa has an extensive background in clinical data management as well as experience with different EDC systems including Oracle InForm, InForm Architect, Central Designer, CIS, Clintrial, Medidata Rave, Central Coding, OpenClinica Open Source and Oracle Clinical.