The SELECT statement is perhaps the most frequently used SQL command. It retrieves records (or specified fields from records) from one or more tables in the database. In this section, we will cover some simple SELECT statements, which take the following general form: SELECT fieldlist FROM tablename [WHERE where clause] Note In this chapter, statement formats are listed with the SQL keywords in all capital letters. The italicized words indicate terms that a programmer would replace in an actual statement, and phrases inside square brackets ([ ]) are optional parts of the statement. The simplest, most common example of a SELECT statement is selecting all fields from a single table. To display all of the data in the person table, execute the following SELECT statement: SELECT * FROM Person The statement would list every field of every record in the Person table. Note SQL keywords such as SELECT are not case sensitive. However, table and field names may be case sensitive depending on the configuration of your database management system. Choosing the Field List In the fieldlist portion of a SELECT statement, you can specify the asterisk (*) or a list of specific field names. The asterisk indicates that you want to display all fields in the table. If you want to display only certain fields, you need to include a comma-separated list of fields, as in the following example: SELECT SSN, LastName, FirstName FROM Person Figure 20.17 shows the output from the preceding SQL statement. Figure 20.17. A field list allows you to return only the desired fields for each record. In general, it is good programming practice to specify a field list whenever possible, rather than just using asterisk, for the following reasons: Field lists return only the desired fields, minimizing network traffic. Field lists control the order in which fields are returned to the calling program. For example, you might decide to change the field order of the Person by adding a new field before the FirstName field, such as MiddleInitial. However, the order of fields returned by the SELECT statement would be unaffected by these changes. Although I do not recommend it, modern database systems allow you to include spaces in your field and table names. If you decide to do this, your SQL statements must include square brackets ([ ]) around each name with a space: SELECT SSN, [Last Name], [First Name] FROM Person In this example, the brackets indicate to the query processor that the words Last and Name together represent a single field name. Again, the authors recommend for simplicity's sake that you do not use spaces in database field names. Filtering the Records One of the most powerful features of SELECT (and other SQL commands) is the ability to control the records affected using a WHERE clause. A WHERE clause includes the keyword WHERE followed by a logical expression used to identify the desired records. For example, to return only the names of the Smiths in the Person table, you could execute the following SELECT statement: SELECT LastName, FirstName FROM Person WHERE LastName = 'Smith' This statement would return the LastName and FirstName fields only for those records that matched the WHERE expression; in this case, only records where the LastName field value is Smith. Also note that the fields in the field list are independent from those in the WHERE clause. A WHERE clause can use any field from the tables specified in the FROM clause: SELECT LastName, FirstName FROM Person WHERE ZipCode='38117' In the previous example, we are using the ZipCode field to filter records, but only displaying the name fields. In the SQL examples thus far, we have included only one field name, and asked for an exact match using the equals (=) operator. However, the logical expressions WHERE clause can contain a number of conditions which can be combined using AND, OR, and NOT: SELECT * FROM Person WHERE Age >= 30 AND Age < 50 The preceding SQL query returns records from the Person table where the person's age is between 30 and 49. Table 20.2 lists the field comparison operators: Table 20.2. Comparison Operators Used in the WHERE Clause Operator | Definition |
---|
< | Less than | <= | Less than or equal to | = | Equal to | >= | Greater than or equal to | > | Greater than | <> | Not equal to | For all comparisons, both expressions must be of the same type (for example, both must be numbers or both must be strings). You can compare field values or a field value to a literal value of the same type. The comparison values for strings and dates require special formats. Strings must be enclosed in quotes. Dates can be enclosed between pound signs (for example #5/15/94.htm#), and on some databases single quotes. Note In a WHERE clause, literal string expressions are enclosed in quotes (') and numbers are not. Some database management systems allow you to use double quotes ("), while others do not. If your string value contains a quote, you need to double the single quote, as in the following example: SELECT * from Person Where Name = 'O''Connor' The query tool included with SQL Server 2000 expects single quotes (double-quotes are used to identify field names, unless you execute the [SET QUOTED_IDENTIFIER OFF] statement.) In addition to comparisons, you can use other types of logical statements in a WHERE clause. For example, at the beginning of this chapter we mentioned a query that would return records where someone's name started with a certain letter. The following SQL statement uses the LIKE keyword to perform pattern matching in the WHERE clause: SELECT * FROM Person WHERE Age >= 30 AND LastName LIKE 'S%' Notice the percent sign in the string expression. This is a wildcard indicator that means any characters match the pattern. The preceding statement would return any records where the age was 30 or greater and the last name begins with the letter S. Table 20.3 lists some other keywords used with the WHERE clause: Table 20.3. Keywords Used in the WHERE Clause Keyword | Action |
---|
LIKE | Compares field to pattern | IN | Compares field to list of valid values | BETWEEN | Compares field to value range | The following SQL statements illustrate the use of the above keywords: SELECT * From Person Where State IN ('TN', 'AL', 'GA') SELECT * From Person Where Age NOT BETWEEN 30 And 49 SELECT * From Person Where FirstName BETWEEN 'B' AND 'D' It should be noted that the range specified in BETWEEN is an inclusive search, meaning that if the value is equal to one of the endpoints of the range it is included. The preceding sample statements also indicate using a NOT operator to return records outside the range. Note Some flavors of SQL, such as Microsoft Access, use the asterisk (*) as a wild card indicator. However, most databases (even Microsoft Access when connecting via ADO) use the percent symbol (%). This is very confusing, because you can actually create stored queries in Access that don't work when run from within Access but do work when executed by ADO! To avoid confusion, we suggest you look up LIKE in your database's help files to familiarize yourself with the syntax. Note An entirely separate SQL query, known as a sub-select, can be used with the IN clause to provide a list of valid values from another table: SELECT * from Person WHERE State IN (SELECT State FROM_ MyFavoriteStates) Understanding the Null Value As we mentioned earlier, each record in a table has the same set of fields, even if it does not have a value in each field. A special value, null, is used to indicate an empty field. When you create a table, you can determine whether the database will allow users to place a null value in the field. (Primary keys cannot be null). The expression IS NULL can be used in a WHERE clause to check for an empty field: Select * from Employee Where HireDate IS NOT NULL Controlling the Order of Returned Records One additional feature of the SELECT statement is the ability to return records in a desired order. For example, you might want to print a class roster ordered by name, or mailing labels ordered by ZIP Code for a discounted postage rate. To control the order of records returned from a SELECT query, you add an ORDER BY clause to your SQL statements. The general format and position of an ORDER BY clause is shown next: SELECT fieldlist FROM tablename [WHERE where clause] [ORDER BY fieldlist [DESC]] You can order records by one or more fields, as in the following examples: SELECT LastName, FirstName, Age FROM Person ORDER BY LastName, FirstName SELECT LastName, FirstName, Age FROM Person ORDER BY Age DESC The default sort order for all fields is ascending (that is, A-Z, 0-9). You can place the DESC keyword after a field name to indicate you want to sort the field in descending order. (The DESC keyword affects only the field immediately preceding it.) Figure 20.18 shows the order of some records returned with the previous two ORDER BY clauses. Figure 20.18. The ORDER BY clause specifies the order of the records returned from a SELECT query. Joining Tables with SELECT When you design database tables, you use key fields so that you can relate the tables to each other. In our earlier example, the SSN field relates the records in the Person table to the records in the Employees table. You can use these same key fields in a SELECT statement to set the table relationships so that you can display the related data. There are two types of clauses you can use to specify the relationships between tables: JOIN The JOIN clause is located near the FROM keyword and can be used to combine two tables on their related fields. The syntax of the JOIN clause is: SELECT fieldlist FROM table1name jointype JOIN table2name ON join expression WHERE Although we have already discussed how the WHERE clause can be used to filter records, it can also be used to join tables, because the join is itself a type of filter. This type of join syntax is older than the JOIN clause but still widely used. The only difference between this and the SQL statements we have already discussed is the multiple tables listed in the FROM clause: SELECT fieldlist FROM table1name, table2name WHERE join expression As an example, the following two queries join the Person table to the Employee table, returning the joined records. Figure 20.19 shows the records returned when either of the following statements is executed: Figure 20.19. With the default join type (inner join), only records that appear in both tables are included in the result. SELECT * from Person INNER JOIN Employee ON Person.SSN = Employee.SSN SELECT * from Person, Employee WHERE Person.SSN = Employee.SSN When you join tables, you specify the join expression using the key field or fields to link the two tables together. If more than one field makes up the key, simply continue the join expression using the AND keyword to join the additional fields. In our example, the key field is named SSN in both tables, but the fields' names do not necessarily have to match. (Note that if they do match, the table name has to be specified so that the query processor can identify the correct field, as in the previous example.) Note Some programmers prefer the JOIN clause syntax, because table joins usually do not change and tend to clutter up the filter conditions in the WHERE clause. Others programmers like the WHERE clause syntax, because it can be easier to read when joining a lot of tables. Still others do their joins graphically in Access and let the program write the SQL statement for them! Using Aliases for Table Names As you can see in Figure 20.19, the asterisk returns all of the fields from both tables in the join. If you want to specify a field list to limit the fields returned, you must specify the table name for any fields that exist in both tables: SELECT Person.SSN, LastName, FirstName, Salary FROM Person_ INNER JOIN Employee ON Person.SSN = Employee.SSN Because the SSN field exists in both tables, we have to specify the table name when we list it in the field list, join expression, or WHERE clause. LastName, FirstName, and Salary only exist in one table, so the query processor automatically knows which field you are talking about. Note To specify all the fields from just one of the tables in the join, use the table name followed by a period and then an asterisk. The preceding SQL statement works, but in the author's opinion it could be written more clearly. First, readability will become a problem if you add more fields, because the full table name will appear repeatedly. Second, well-written SQL statements consistently identify the table name for every field or do not identify it at all; the previous statement is not consistent. Using aliases for the table names in your SQL queries solves both of these problems. You create an alias by simply typing a space after the table name then the desired alias name. Typically, aliases are only one or two letters: SELECT P.SSN, P.LastName, P.FirstName, E.Salary FROM Person P_ INNER JOIN Employee E ON P.SSN = E.SSN The preceding SELECT statement assigns the alias P to the Person table and the alias E to the Employee table. The shorter alias takes up less space than the table name and can be used in all parts of the SQL statement. Understanding Join Types When you join tables together, you link them by one or more key fields. If a record from one table does not match any keys in the other table, that record falls out of the join and is not returned in the results. For example, notice that Figure 20.3 shows that a record exists in the Person table for Marge Thrasher (whose SSN is 111-22-3333). However, in Figure 20.19, Marge is missing from the query results. This is because no record existed in the employee table with her SSN. The join shown in Figure 20.19 is the default type of join, known as an inner join. In an inner join, records have to exist in all of the joined tables to be included in the result. Note Microsoft Access requires the word INNER to be placed in the SQL statement before the word JOIN, but SQL Server allows you to just use the word JOIN by itself to indicate an inner join. Joining fields in a WHERE clause using the equality operator is also an inner join. SQL also supports another type of join, the outer join. If you join two tables using an outer join, records from one table will always appear in the results, even if no corresponding record exists in the other table. For example, if we wanted to get a list of all persons and include salaries if possible, the following query would suffice: SELECT P.SSN, P.LastName, P.FirstName, E.Salary FROM Person P_ LEFT OUTER JOIN Employee E ON P.SSN = E.SSN The results of the previous query would include our missing person, Marge, even though she is not an employee. Normally, her record from the Person table would fall out of the results, because there is no corresponding record in the Employee table. However, the LEFT OUTER JOIN statement forces the query to always include records from the leftmost table (Person). Table 20.4 summarizes the differences between the types of SQL joins. Table 20.4. Records Returned Based on the Type of Join Used Join Type | Records from Left Table | Records from Right Table |
---|
INNER | Only records with corresponding record in the right table | Only records with corresponding record in the left table | LEFT OUTER | All records | Only records with corresponding record in the left table | RIGHT OUTER | Only records with corresponding record in the right table | All records | Note The word OUTER is usually optional and you can just specify LEFT or RIGHT. If you use the WHERE join syntax, place an asterisk on the left or right side of the equals to indicate the type of outer join, as in the following left join example: SELECT * from Person, Employee WHERE Person.SSN *= Employee.SSN For more on join types, Working with More Complex Joins Simple joins between two tables are fairly easy to understand. However, you will often be joining more than two tables using a number of fields and a mixture of join types. When using the WHERE join syntax, you just list the tables in the FROM clause, making sure to join all the appropriate fields in the WHERE clause. However, with the join syntax, you may want (or need, depending on the DBMS) to add parentheses to separate multiple joins, as in the following example: SELECT S.Description, P.LastName, P.FirstName FROM (StateList S INNER JOIN Person P ON S.State = P.State) INNER JOIN Employee E ON E.SSN = P.SSN ORDER BY S.Description, P.LastName, P.FirstName The previous query uses a lookup table, StateList, to provide the state names. The easiest way to visualize the way the joins work in the preceding statement is to use the parentheses as a guide; the first join combines the StateList and Person tables, and the second part joins the result of the first to the Employee table. Note You can join across to another database on the same SQL server (provided you have permission) by using the dot notation, as in the following example: select * from master..sysprocesses The table mentioned previously is a system table used to track current processes and connections on the server. Other join expressions you might run into are the WHERE EXISTS and WHERE NOT EXISTS clauses: SELECT * FROM StateList S WHERE NOT EXISTS (Select * FROM Person P Where P.State=S.State) The previous statement displays all the states for which there is no entry in the Person table; that is, you have no employees or members in the states returned by this query. Although it may not look like it, the query is a join. The outermost SELECT statement is joined to the sub-select statement by the State field. EXISTS and NOT EXISTS are both features that can be replaced by joins. For example, you may recall that an outer join returns a record containing fields from both tables, even if a record does not exist. By checking for the Null value, the previous NOT EXISTS example could be rewritten as follows: SELECT S.* FROM StateList S LEFT OUTER JOIN Person P ON P.State=S.State WHERE P.State IS NULL Calculated Values and Aggregate Functions In addition to retrieving data from one or more tables, SQL SELECT statements can be used to create calculated values. These values can be entirely made up or based on existing fields values combined with mathematical operations. In addition, built-in aggregation functions can perform calculations on field values from multiple records. The results of these calculations are not stored in the database, but instead calculated during the execution of your SELECT query. Calculating Values Using mathematical operators and aliases, you can create fields that are calculated on the fly by your SELECT statement. For example, the following SQL statement calculates total cost based on quantity and unit price: Select itemno, unitprice, quantity, unitprice * quantity AS TotalCost From orders Notice when using aliases with field names, you have to supply the keyword AS. (If you do not supply an alias, some DBMSs like Access will assign a default name such as Expr1001 to your calculated field.) You also can perform manipulation on string fields, such as concatenation: Select LastName, FirstName, RTrim(LastName) + ', '_ + FirstName AS CompleteName From Person The preceding statement uses string concatenation and the RTRIM function to create a combined names field. For example, the CompleteName field in the record for Jane Doe would contain the following value: Doe, Jane When the results of the query are returned to your program, the fake complete name field will look just like any other field to the program. Aggregation and Group By In addition to the operations we just described that work on individual records, SQL provides several built-in functions that aggregate or group records. For example, the COUNT function can be used to count records in a table: SELECT Count(*) FROM Person WHERE State = 'FL' The preceding statement returns the total number of people in the Person table whose address is in Florida. Similar functions exist for other types of aggregations, including the following: Function | Purpose |
---|
SUM | Adds values together | MAX | Finds the largest value | MIN | Finds the minimum value | AVG | Finds the average | COUNT | Returns record count | Note To find out about all of the built-in functions of your database management system, consult its help file. Each of the functions listed earlier can be used on the table as a whole or on just one set of records by adding a WHERE clause. However, these aggregation functions become even more powerful when combined with the GROUP BY clause. GROUP BY allows you to aggregate multiple groups of records in a single query. Consider our earlier example, where we counted the number of people in Florida. Suppose we wanted to count all of the records in the Person table by state. You could run a query for each state, but a more efficient method would be to use GROUP BY: SELECT State, COUNT(*) As PersonCount FROM Person_ GROUP BY State ORDER BY PersonCount DESC The previous statement lists each state and the number of records associated with it, in descending order. Note that when you use GROUP BY, any other field not being aggregated must appear in the GROUP BY clause. |