Chapter Contents

Previous

Next
The SQL Procedure

joined-table


Joins a table with itself or with other tables.

Restrictions: Joins are limited to 32 tables.
See also: FROM Clause and query-expression
Featured in: Joining Two Tables , Performing an Outer Join , Joining Three Tables , Producing All the Possible Combinations of the Values in a Column , and Matching Case Rows and Control Rows


table-name <<AS> alias>, table-name <<AS> alias>
<, table-name <<AS> alias>...>
table-name <INNER> JOIN table-name
ON sql-expression
table-name LEFT JOIN table-name ON sql-expression
table-name RIGHT JOIN table-name ON sql-expression
table-name FULL JOIN table-name ON sql-expression


Joining Tables
When multiple tables, views, or query-expressions are listed in the FROM clause, they are processed to form one table. The resulting table contains data from each contributing table. These queries are referred to as joins.

Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table (Cartesian product) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows may be eliminated by the WHERE clause or summarized by a summary function.

A common type of join is an equijoin, in which the values from a column in the first table must equal the values of a column in the second table.


Table Limit
PROC SQL can process a maximum of 32 tables for a join. If you are using views in a join, the number of tables on which the views are based count toward the 32-table limit. Each CONNECTION TO component in the Pass-Through Facility counts as one table.


Specifying the Rows to Be Returned
The WHERE clause or ON clause contains the conditions (sql-expression) under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins.

The expression is evaluated for each row from each table in the intermediate table described earlier in Joining Tables . The row is considered to be matching if the result of the expression is true (a nonzero, nonmissing value) for that row.

Note:   You can follow the ON clause with a WHERE clause to further subset the query result. See Performing an Outer Join for an example.  [cautionend]


Table Aliases
Table aliases are used in joins to distinguish the columns of one table from those in the other table(s). A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. See FROM Clause for more information on table aliases.


Joining a Table with Itself
A single table can be joined with itself to produce more information. These joins are sometimes called reflexive joins. In these joins, the same table is listed twice in the FROM clause. Each instance of the table must have a table alias or you will not be able to distinguish between references to columns in either instance of the table. See Producing All the Possible Combinations of the Values in a Column and Matching Case Rows and Control Rows for examples.


Inner Joins
An inner join returns a result table for all the rows in a table that have one or more matching rows in the other table(s), as specified by the sql-expression. Inner joins can be performed on up to 32 tables in the same query-expression.

You can perform an inner join by using a list of table-names separated by commas or by using the INNER, JOIN, and ON keywords.

The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join: [HTML Output]  [Listing Output]The following example joins the LEFTTAB and RIGHTTAB tables to get the Cartesian product of the two tables. The Cartesian product is the result of combining every row from one table with every row from another table. You get the Cartesian product when you join two tables and do not subset them with a WHERE clause or ON clause.

proc sql;
   title 'The Cartesian Product of';
   title2 'LEFTTAB and RIGHTTAB';
   select *
      from lefttab, righttab;
[HTML Output]  [Listing Output] The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the FROM clause. The following query represents an equijoin because the values of Continent from each table are matched. The column names are prefixed with the table aliases so that the correct columns can be selected.
proc sql;
   title 'Inner Join';
   select *
      from lefttab as l, righttab as r
      where l.continent=r.continent;
[HTML Output]  [Listing Output] The following PROC SQL step is equivalent to the previous one and shows how to write an equijoin using the INNER JOIN and ON keywords.
proc sql;
   title 'Inner Join';
   select *
      from lefttab as l inner join 
           righttab as r
      on l.continent=r.continent;

See Joining Two Tables , Producing All the Possible Combinations of the Values in a Column , and Matching Case Rows and Control Rows for more examples.


Outer Joins
Outer joins are inner joins that have been augmented with rows that did not match with any row from the other table in the join. The three types of outer joins are left, right, and full.

A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the first (LEFTTAB) table that do not match any row in the second (RIGHTTAB) table.

proc sql;
   title 'Left Outer Join';
   select *
      from lefttab as l left join 
           righttab as r
      on l.continent=r.continent;
[HTML Output]  [Listing Output] A right outer join, specified with the keywords RIGHT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the second (RIGHTTAB) table that do not match any row in the first (LEFTTAB) table.
proc sql;
   title 'Right Outer Join';
   select *
      from lefttab as l right join 
           righttab as r
      on l.continent=r.continent;
[HTML Output]  [Listing Output]

A full outer join, specified with the keywords FULL JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from each table that do not match any row in the other table.

proc sql;
   title 'Full Outer Join';
   select *
      from lefttab as l full join 
           righttab as r
      on l.continent=r.continent;
[HTML Output]  [Listing Output] See Performing an Outer Join for another example.


Joining More Than Two Tables
Inner joins are usually performed on two or three tables, but they can be performed on up to 32 tables in PROC SQL. A join on three tables is described here to explain how and why the relationships work among the tables.

In a three-way join, the sql-expression consists of two conditions: one relates the first table to the second table and the other relates the second table to the third table. It is possible to break this example into stages, performing a two-way join into a temporary table and then joining that table with the third one for the same result. However, PROC SQL can do it all in one step as shown in the next example.

The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To calculate the total revenue from exports for each country, you need to multiply the amount exported (AMOUNT table) by the price of each unit (PRICE table), and you must know the commodity that each country exports (COMM table). [HTML Output]  [Listing Output]

proc sql;
   title  'Total Export Revenue';
   select c.Country, p.Export, p.Price, 
          a.Quantity,a.quantity*p.price 
          as Total
      from comm c, price p, amount a
      where c.export=p.export 
            and c.country=a.country;
[HTML Output]  [Listing Output] See Joining Three Tables for another example.


Comparison of Joins and Subqueries
You can often use a subquery and a join to get the same result. However, it is often more efficient to use a join if the outer query and the subquery do not return duplicate rows. For example, the following queries produce the same result. The second query is more efficient:

proc sql;
   select IDNumber, Birth
      from proclib.payroll
      where IDNumber in (select idnum
                      from proclib.staff
                      where lname like 'B%');

proc sql;
   select  p.IDNumber, p.Birth
     from proclib.payroll p, proclib.staff s
     where p.idnumber=s.idnum 
           and s.lname like 'B%';

Note:   PROCLIB.PAYROLL is shown in Creating a Table from a Query's Result .  [cautionend]


Chapter Contents

Previous

Next

Top of Page

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.