Variables
The MySQL stored program language is technically a strongly typed language in the sense that before you can work with a variable, you must first declare it. And when you declare it, you specify its type and, optionally, an initial or default value. Be aware, however, when not in "strict" mode (sql_mode contains neither STRICT_TRANS_TABLES nor STRICT_ALL_TABLES), MySQL will generate warnings only when you violate a variable's type or storage limits.
We strongly urge you, therefore, to take special care with declaring your variables.
DAT-01: Use a consistent and meaningful variable naming style |
All of us have a distinct variable naming style, often based on the conventions of our first programming language. In the very early days of programming, programmers were required to keep variable names short so as to reduce memory overhead. Various programming languages impose additional restrictions on the programmer: case-sensitivity, maximum lengths, and allowable characters, for instance.
However programmers might differ as regards the "one true style" that is optimal for a particular language, almost every programmer would agree that, above all, variable names should be meaningful, and whatever style might be employed, it should be employed consistently throughout your programs.
Meaningful variable names are those that clearly articulate the data that the variable holds. It's as simple as that. Avoid variable names that contain confusing or ambiguous abbreviations, and certainly avoid meaningless variable names such as v1, v2, etc.
Beyond being meaningful, conventions can help us understand the scope, data type, or some other property of our variables . In MySQL stored programs, we could use a convention that allows us to:
- Determine the data type of a variable from its name.
- Distinguish table column names from local variables or parameters.
- Identify the type of data held in the variables: data from a cursor, intermediate data, bits of SQL for a dynamic SQL, etc.
We believe that most of the above items are matters of personal preference and that, while arguments can be made for and against any or all of these styles, you can write high-quality code regardless of the style you adopt. We feel that the following recommendations, however, should be followed:
- You should generally identify local variables with a prefix or a suffix, especially if they are used to receive values from cursors. Creating local variables with the same name as a column returned by a cursor is dangerous.
- Because variable names are case insensitive, "camel" notationin which capitalization is used to separate "words" within a variable nameis possibly inappropriate since isNull and isnull will reference the same variable.
Example
If you're not sold on the value of meaningful variable names, try to work out what this assignment statement is doing:
SET ns=gs-tx+bn-fd;
Now try one with meaningful variable names:
SET net_salary=gross_salary-tax+bonus-fund401k;
DAT-02: Avoid overriding variable declarations within "inner" blocks |
It is possible to declare a variable inside an inner block that has the same name as a variable in the enclosing block. Neverthelessthough legalthis practice can be extremely confusing.
For instance, in the following example the v_counter variable is declared both within the inner block and within the outer block:
DECLARE v_counter INT DEFAULT 1; . . . Lots of code . . . inr_blk: BEGIN DECLARE v_counter INT DEFAULT 2; . . . Lots of code . . . SET v_counter=v_counter+1; END inr_blk;
There are two undesirable consequences to this practice:
- Someone reading the code might be confused as to which variable is being updated.
- It is not possible in the inner block to modify the value of a variable in the outer block. The SQL:2003 specification allows us to prefix a variable name with its block label, but this isn't supported in MySQL yet.
It's much better to ensure that every variable declared in a stored program has a unique name, regardless of its block scope.
DAT-03: Replace complex expressions with functions |
A Boolean function evaluates to one of three values: TRUE (1), FALSE (0), or NULL. You can use Boolean functions to hide complex expressions; the result is code that is virtually as readable as "straight" Englishor whatever language you use to communicate with other human beings.
Example
Consider this code:
SELECT salary, status, hire_date INTO v_total_sal, v_emp_status, v_hire_date FROM employees WHERE employee_id=in_employee_id; IF (v_total_sal BETWEEN 10000 AND 50000) AND v_emp_status = 'N' AND DATEDIFF(NOW( ), v_hire_date)> 365 THEN CALL give_raise (in_employee_id); END IF;
Wow, that's hard to understand! It would be much easier if the code looked like this:
IF eligible_for_raise(in_employee_id) THEN CALL give_raise(in_employee_id); END IF;
Benefits
It will be much easier for anyone to read your code; you can literally read it. If you then need to understand how the Boolean expression is computed, you can look "under the covers."
This is a technique that can be applied (with care) to existing "spaghetti code." As you go into a program to fix or enhance it, look for opportunities to simplify and shorten executable sections by shifting complexity to other functions or procedures.
Challenges
Before you modify existing code, make sure you have solid unit test scripts in place so you can quickly verify that your changes haven't introduced bugs into the program.
DAT-04: Remove unused variables and code |
You should go through your programs and remove any part of your code that is no longer used. This is a relatively straightforward process for variables and named constants. Simply execute searches for a variable's name in that variable's scope. If you find that the only place it appears is in its declaration, delete the declaration and, by doing so, delete one more potential question mark from your code.
There is never a better time to review all the steps you took, and to understand the reasons you took them, than immediately upon completion of your program. If you wait, you will find it particularly difficult to remember those parts of the program that were needed at one point but were rendered unnecessary in the end. "Dead zones" in your code become sources of deep insecurity for maintenance programmers.
Example
The following block of code has several dead zones that could cause a variety of problems. Can you find them all?
CREATE PROCEDURE weekly_check ( in_isbn VARCHAR(20), in_author VARCHAR(60) ) BEGIN DECLARE v_count INT; DECLARE v_counter INT; DECLARE v_available INT; DECLARE v_new_location INT DEFAULT 1056; DECLARE v_published_date DATE DEFAULT NOW( ); SET v_published_date=book_published_date(in_isbn); IF DATE_SUB(NOW( ), INTERVAL 60 DAY) > v_published_date THEN CALL review_usage( ); ELSEIF DATE_SUB(NOW( ), INTERVAL 24 DAY) > v_published_date THEN CALL check_availability (in_isbn, v_available, v_count); IF v_available AND /* Turn off due to Req A12.6 */ FALSE THEN CALL transfer_book (in_isbn, v_count - 1, v_new_location); END IF; -- Check for reserves -- CALL analyze_requests (isbn_in); END IF; END$$
Here are a few potential dead spots:
- The in_author parameter is declared but never used. It doesn't even have a default value, so you have to pass in an ignored value.
- v_counter is declared but not used.
- v_published_date is assigned a default value of NOW( ), which is immediately overridden by the call to book_published_date.
- The call to TRansfer_book has been turned off with the addition of AND FALSE.
- The call to analyze_requests has been commented out.
Benefits
It's much easier to maintain, debug, and enhance code that doesn't have "dead zones."
Challenges
There are sometimes valid reasons for keeping dead code in place. You may want to turn off code temporarily. Also, you may need to comment out some logic but still show that this action was done and explain why. In such cases, make sure that you include the necessary documentation in the code. Even better, use problem-tracking or bug-reporting software to keep a comprehensive history of any changes made to code.
DAT-05: Don't assume that the result of an expression is TRUE or FALSE; it could be NULL |
Three-valued logicthe logic that includes NULLsis an essential part of the relational database model. However, the tendency of humans to think in terms of two-valued logican expression is either TRUE or FALSEcan lead to serious logic bugs.
For instance, consider the following logic, which is intended to retire employees older than 65 years and older, and keep those younger than 65 years:
IF v_date_of_birth > DATE_SUB(NOW( ), INTERVAL 65 YEAR) THEN CALL keep_employee( v_employee_id); ELSE CALL retire_employee( v_employee_id); END IF;
This logic seems valid from a two-valued logic perspective, but what if v_date_of_birth is NULL? If the date of birth is NULL, then the date comparison will return NULL, rather than TRUE or FALSE. Consequently, the ELSE condition will be executed and we will retire an employee, although in fact we have no idea how old the employee is.
NULL values can be handled in a couple of ways:
- Explicitly check that a value is NOT NULL before attempting a comparison.
- Explicitly check each condition: don't assume that an expression that is not TRUE, is necessarily FALSE.
If we are worried about the date of birth being NULL in the above example, we might recode it as follows:
SET v_age_in_years=DATEDIFF(NOW( ), v_date_of_birth)/365.25; IF v_age_in_years > 65 THEN CALL retire_employee( v_employee_id); ELSEIF v_age_in_years <= 65 THEN CALL keep_employee( v_employee_id); ELSE CALL invalid_dob_error( v_employee_id); END IF;
DAT-06: Employ "user" variables for global data sparingly |
A global variable is a data structure that can be referenced outside the scope or block in which it's declared. In MySQL, we can use "user" variableswhich can be recognized by being prefixed with @to set values that are available to any program within the current session.
In the following procedure, for example, we store the number of customers into the user variable @customer_count:
CREATE PROCEDURE sp_customer_count( ) SELECT COUNT(*) INTO @customer_count FROM customers;
Other procedures can examine the @customer_count and make decisions without having to recalculate the value. For instance, in this procedure we use the session variable in our setup logic:
CREATE PROCEDURE sp_crm_setup ( ) BEGIN IF @customer_count IS NULL THEN CALL sp_customer_count( ); END IF; IF @customer_count > 1000 THEN . . . Logic for larger enterprises . . ..
There is no doubt that the use of global variables can create easy solutions for difficult problems. However, the modern consensus is that global variables create their own problems and that these problems generally overwhelm any of the advantages they might confer.
Global variables defeat modularity and hinder code reuse, because any module that uses a global variable becomes dependent on some other module that creates or initializes the global variable. In the case of MySQL user variableswhich don't require a formal declarationthere is also the chance that two programmers might create identical "global" variables of the same name, thus causing subtle bugs that might occur only when modules are called in a certain order.
References
Code Complete, by Steve McConnell (Microsoft Press) contains an excellent discussion on the pros and cons of global variables.
DAT-07: Create stored programs in strict mode to avoid invalid data assignments |
Stored program type checking is very dependent on the setting of the sql_mode configuration variable. If a program is created when the sql_mode variable includes one of the "strict" settings (STRICT_TRANS_TABLES or STRICT_ALL_TABLES), then the program will reject invalid variable assignments with an error. If neither of the strict modes is in effect, then the stored program will generate a warning when invalid data assignments occur, but will continue execution.
For instance, in the following program, we accidentally declared a variable as CHAR(1) instead of INT:
CREATE PROCEDURE TenPlusTen( ) BEGIN DECLARE a INTEGER DEFAULT 10; DECLARE b CHAR(1) DEFAULT 10; DECLARE c INTEGER; SET c=a+b; SELECT c ; END;
If created in "non-strict" mode, this program generates a warning, but continues execution and returns the wrong result (10+10=11?):
mysql> CALL TenPlusTen( ); +------+ | C | +------+ | 11 | +------+ 1 row in set (0.00 sec) Query OK, 0 rows affected, 1 warning (0.01 sec) mysql> SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1265 | Data truncated for column 'B' at row 1 | +---------+------+----------------------------------------+ 1 row in set (0.00 sec)
If created in strict mode, the program generates an error during execution, which is clearly better than returning the wrong result:
mysql> CALL TenPlusTen( ); ERROR 1406 (22001): Data too long for column 'b' at row 1
Non-strict stored program behavior can lead to unexpected and subtle bugs, and we recommend that you use strict mode when creating your stored programs. To enable strict mode, assign one of STRICT_TRANS_TABLES or STRICT_ALL_TABLES to your sql_mode variable:
SET sql_mode='STRICT_TRANS_TABLES';
Remember, it is the sql_mode that was in effect when the program is created that determines program behavior.