Program Construction
There are as many ways to write and structure a program as there are programmersor so it sometimes seems. We offer suggestions on how to structure your programs and how best to design parameter lists that we have found effective.
PRG-01: Encapsulate business rules and formulas behind accurately named functions |
This might be one of the most important best practices you will ever readand, we hope, follow. There is only one aspect of every software project that never changes: the fact that everything always changes. Business requirements, data structures, user interfaces: all of these things change and change frequently. Your job as a programmer is to write code that adapts easily to these changes.
So whenever you need to express a business rule (such as, "Is this string a valid ISBN?"), put it inside a subroutine that hides the individual steps (which might change) and returns the results (if any).
And whenever you need a formula (such as, "the total fine for an overdue book is the number of days overdue times $.50"), express that formula inside its own function.
Example
Suppose that you must be at least 10 years old to borrow books from the library. This is a simple formula and very unlikely to change. We set about building the application by creating the following logic:
IF v_dob > DATE_SUB(now( ), INTERVAL 10 YEAR) THEN SELECT Borrower must be at least 10 yrs old; ELSE INSERT INTO borrower (firstname,surname,date_of_birth) VALUES( v_firstname, v_surname, v_dob); END IF;
Later, while building a batch-processing script that checks and loads over 10,000 borrower applications, we include the following check in the program:
load_data:BEGIN IF DATEDIFF(now( ), v_dob)/365 < 10 THEN select (Borrower is not ten years old.); ELSE . . . load data . . . END IF; END load_data;
And so on from there. We are left, unfortunately, with a real job on our hands when we get a memo that says: "In order to support a new city-wide initiative to increase literacy, the minimum age for a library card has been changed from 10 to 8." And then, of course and by the way, there is the minor bug we introduced into our second construction of the rule (we forgot about leap years).
If only we had created a simple function the first time we needed to calculate minimum valid age! It would be something like this:
CREATE FUNCTION borrower_old_enough (in_dob DATE) RETURNS INT NO SQL BEGIN DECLARE v_retval INT DEFAULT 0; IF (in_dob < DATE_SUB(NOW( ), INTERVAL 10 YEAR)) THEN SET v_retval=1; ELSE SET v_retval=0; END IF; RETURN( v_retval); END;
And this function copes correctly with a NULL input, for which we forgot to check in those other programs. We can correct the age calculation logic in one place and easily change the minimum age from 10 to 8:
Benefits
You can update business rules and formulas in your code about as quickly and as often as users change that which was supposedly "cast in stone." Developers apply those rules consistently throughout the application base, since they are simply calling a program.
Your code is much easier to understand, since developers don have to wade through complex logic to understand which business rule is being implemented.
Challenges
Its mostly a matter of discipline and advance planning. Before you start building your application, create a set of programs to hold business rules and formulas for distinct areas of functionality. Make sure that the names of the programs clearly identify their purpose. Then promote and use them rigorously throughout the development organization.
PRG-02: Standardize module structure using function and procedure templates |
Once you adopt a set of guidelines for how developers should write procedures and functions, you need to help those developers follow their best practices. The bottom line is that guidelines will be followed if you make it easier to follow them than to ignore them.
For module standards, you can use either of the following approaches:
- Create a static template file that contains the generic logical structure for a procedure and/or function. Developers then copy that file to their own file, "de-genericize" the template by performing search-and-replace operations on placeholder strings with their own specific values (such as table names), and modify it from there.
- Use a program (one that youve written or a commercially available tool) that generates the code you want. This approach can be more flexible and can save you time, depending on how sophisticated a generator you use/create.
Example
Heres a simple function template that reinforces the single RETURN recommendation and encourages a standard header.
CREATE FUNCTION f_
Some third-party products (Toad for MySQL, for instance) allow you to define such a template and have it automatically applied to new stored programs.
The quality of each individual program is higher, since its more likely to conform to best practices. Programs are more consistent across the team and therefore easier to maintain and enhance. Sure, you
e laughing out loud. You write code for the real world. Its really complicated. Only 50 or 60 lines? You
e lucky if your programs are less than 500 lines! Well, its not a matter of complexity; its more an issue of how you handle that complexity. If your executable sections go on for hundreds of lines, with a loop starting on page 2 and ending on page 6, and so on, you will have a hard time "grasping the whole" and following the logic of the program. An alternative is to use step-wise refinement (a.k.a. "top down decomposition"): don dive into all the details immediately. Instead, start with a general description (written in actual code, mind you) of what your program is supposed to do. Then implement all subprogram calls in that description following the same method. The result is that at any given level of refinement, you can take in and easily comprehend the full underlying logic at that level. This technique is also referred to as "divide and conquer."
Consider the following procedure. The entire program might be hundreds of lines long, but the main body of assign_workload (starting with BEGIN /*main*/) is only 24 lines long. Not only that, you can read it pretty much as an exciting novel: "For every telesales rep, if that persons case load is less than his departments average, assign the next open case to that person and schedule the next appointment for that case" (well, maybe not that exciting).
CREATE PROCEDURE assign_workload( )
BEGIN /*main*/
DECLARE v_last_row INT DEFAULT 0;
DECLARE v_case_id, v_telesales_id, v_department_id INT;
DECLARE telesales_cur CURSOR FOR
SELECT telesales_id,department_id FROM telesales;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_last_row=1;
OPEN telesales_cur;
ts_loop:LOOP
FETCH telesales_cur INTO v_telesales_id, v_department_id;
IF v_last_row THEN LEAVE ts_loop; END IF;
IF analysis_caseload( v_telesales_id)<
analysis_avg_cases( v_department_id) THEN
SET v_case_id=assign_next_open_case( v_telesales_id);
CALL schedule_case( v_case_id);
END IF;
END LOOP;
CLOSE telesales_cur;
SET v_last_row=0;
END$$
You can implement complicated functionality with a minimum number of bugs by using step-wise refinement. A developer can understand and maintain a program with confidence if he can read and grasp the logic of the code.
You have to be disciplined about holding off writing the low-level implementation of functionality. Instead, come up with accurate, descriptive names for procedures and functions that contain the implementations themselves.
http://www.construx.com: Contains lots of good advice on writing modular code. Build lots of individual programs. Design each program so that it has a single, clearly defined purpose. That purpose should, of course, be expressed in the programs name, as well as in the program header. Avoid throwing extraneous functionality inside a program. Such statements are called side-effects and can cause lots of problems for people using your codewhich means your code won get used, except perhaps as source for a cut-and-paste session (orin hardcopy formfor kindling).
Heres a program that by name and "core" functionality displays information about all books published within a certain date range:
CREATE PROCEDURE book_details (
in_start_date DATE,
in_end_date DATE)
BEGIN
DECLARE v_title, v_author VARCHAR(60);
DECLARE v_last_book, v_book_id INT DEFAULT 0;
DECLARE book_cur CURSOR FOR
SELECT book_id,title,author
FROM books
WHERE date_published BETWEEN in_start_date
AND in_end_date;
OPEN book_cur;
book_loop:LOOP
FETCH book_cur INTO v_book_id, v_title, v_author;
IF v_last_book THEN LEAVE book_loop; END IF;
CALL details_show( v_title, v_author);
CALL update_borrow_history ( v_book_id);
END LOOP;
END$$
Notice, however, that it also updates the borrowing history for that book. Now, it might well be that at this point in time the display_book_info procedure is called only when the borrowing history also needs to be updated, justifying to some extent the way this program is written. However, regardless of current requirements, the name of the program is clearly misleading; there is no way to know that display_book_info also updates borrowing history. This is a hidden side-effect, and one that can cause serious problems over time.
Your code can be used with greater confidence, since it does only what it says (via its name, for the most part) its going to do. Developers will call and combine single-purpose programs as needed to get their jobs done. Many studies have confirmed that excessive nesting of IF, CASE, or LOOP structures leads to code that is difficult to understand. More than two or three levels of nesting is probably undesirable. Consider the following logic:
IF v_state=CA THEN
IF v_quantity > 100 THEN
IF v_customer_status=A THEN
IF v_product_code=X THEN
SET v_discount=.04;
ELSEIF v_product_code=Y THEN
SET v_discount=.04;
ELSE
SET v_discount=.01;
END IF;
ELSE
SET v_discount=0;
END IF;
ELSEIF v_quantity > 50 THEN
SET v_discount=.1;
. . . More logic . . .
END IF;
Its fairly difficult to determine which set of conditions is applied to any particular discount. For instance, consider the highlighted line aboveit takes a bit of puzzling to work out which states, quantities, and so on are associated with this discount: and that is with the vast majority of the logic removed. There are a few possible solutions to this deep nesting: A good general rule to follow as you write your stored programs is: "one way in and one way out." In other words, there should be just one way to enter or call a program (there is; you don have any choice in this matter). And there should be one way out, one exit path from a program (or loop) on successful termination. By following this rule, you end up with code that is much easier to trace, debug, and maintain. For a function, this means you should think of the executable section as a funnel; all the lines of code narrow down to the last executable statement:
RETURN return value;
Heres a simple function that relies on multiple RETURNs:
CREATE FUNCTION status_desc (in_cd CHAR(1))
RETURNS VARCHAR(20)
DETERMINISTIC
BEGIN
IF in_cd = C THEN
RETURN CLOSED;
ELSEIF in_cd = O THEN
RETURN OPEN;
ELSEIF in_cd = I THEN
RETURN INACTIVE;
END IF;
END;
At first glance, this function looks very reasonable. Yet this function has a deep flaw, due to the reliance upon separate RETURNs: if you don pass in "C", "O", or "I" for the cd_in argument, the function raises:
mysql> SELECT status_desc(A);
ERROR 1321 (2F005): FUNCTION status_desc ended without RETURN
Heres a rewrite that relies upon a single RETURN at the end of the function:
CREATE FUNCTION status_desc (in_cd CHAR(1))
RETURNS VARCHAR(20)
DETERMINISTIC
BEGIN
DECLARE v_status VARCHAR(20) ;
IF in_cd = C THEN
SET v_status=CLOSED;
ELSEIF in_cd = O THEN
SET v_status=OPEN;
ELSEIF in_cd = I THEN
SET v_status=INACTIVE;
END IF;
RETURN v_status;
END$$
This program also safely and correctly returns NULL if the program doesn receive a value of "C", "O", or "I", unlike the first implementation.
You
e less likely to write a function that raises the exception ERROR 1321 (2F005): FUNCTION %s ended without RETURNa nasty and embarrassing error. A single RETURN function is easier to trace and debug, since you don have to worry about multiple
exit pathways from the function. Because you often need to create both an UPDATE and an INSERT trigger to maintain a derived or denormalized column, you might find yourself replicating the same logic in each trigger. For instance, in a previous example we created BEFORE UPDATE and BEFORE INSERT TRiggers to calculate free shipping and discount rate. If the logic is nontrivial, you should implement the logic in a stored procedure or function and call that routine from your trigger.
Imagine that we are trying to automate the maintenance of a superannuation (18K plan) for our employees. We might create a trigger as follows to automate this processing upon insertion of a new employee row:
CREATE TRIGGER employees_bu
BEFORE UPDATE
ON employees
FOR EACH ROW
BEGIN
DECLARE v_18k_contrib NUMERIC(4,2);
IF NEW.salary <20000 THEN
SET NEW.contrib_18k=0;
ELSEIF NEW.salary <40000 THEN
SET NEW.contrib_18k=NEW.salary*.015;
ELSEIF NEW.salary<55000 THEN
SET NEW.contrib_18k=NEW.salary*.02;
ELSE
SET NEW.contrib_18k=NEW.salary*.025;
END IF;
END$$
But we need to ensure that this column is maintained when we create a new employee row. Instead of performing a copy-and-paste into a BEFORE INSERT trigger, we should locate this logic in a stored function as follows:
CREATE FUNCTION emp18k_contrib(in_salary NUMERIC(10,2))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE v_contrib NUMERIC(10,2);
IF in_salary <20000 THEN
SET v_contrib=0;
ELSEIF in_salary <40000 THEN
SET v_contrib=in_salary*.015;
ELSEIF in_salary<55000 THEN
SET v_contrib=in_salary*.02;
ELSE
SET v_contrib=in_salary*.025;
END IF;
RETURN( v_contrib);
END;
Now we can use that function in both the INSERT and the UPDATE triggers. If the logic changes, we can modify the logic in one place and can therefore eliminate the risk of any inconsistency between inserted and updated rows.
DROP TRIGGER employees_bu$$
CREATE TRIGGER employees_bu
BEFORE UPDATE
ON employees
FOR EACH ROW
BEGIN
SET NEW.contrib_18k=emp18k_contrib(NEW.salary);
END;
Benefits
PRG-03: Limit execution section sizes to a single page (50-60 lines) using modularization Example
Benefits
Challenges
Resources
PRG-04: Avoid side-effects in your programs Example
Benefits
PRG-05: Avoid deep nesting of conditionals and loops PRG-06: Limit functions to a single RETURN statement in the executable section Example
Benefits
PRG-07: Use stored programs to implement code common to multiple triggers Example
Категории