Databases and Persistence

We all want to leave behind something that will outlast us, and Ruby processes are no exception. Every program you write leaves some record of its activity, even if its just data written to standard output. Most larger programs take this one step further: they store data from one run in a structured file, so that on another run they can pick up where they left off. There are a number of ways to persist data, from simple to insanely complex.

Simple persistence mechanisms like YAML let you write Ruby data structures to disk and load them back later. This is great for simple programs that don handle much data. Your program can store its entire state in a disk file, and load the file on its next invocation to pick up where it left off. If you never keep more data than can fit into memory, the simplest way to make it permanent is to store it with YAML, Marshal, or Madeleine, and reload it later (see Recipes 13.1, 13.2, and 13.3). Madeleine also lets you revisit the prior states of your data.

If your dataset won fit in memory, you need a database: a way of storing data on disk (usually in an indexed binary format) and retrieving parts of it quickly. The Berkeley database is the simplest database we cover: it operates like a hash, albeit a hash potentially much bigger than any you could keep in memory (Recipe 13.6).

But when most people think of a "database" they think of a relational database: MySQL, Postgres, Oracle, SQLite, or the like. A persistence mechanism stores data as Ruby data structures, and a Berkeley DB stores data as a hash of strings. But relational databases store data in the form of structured records with typed fields.

Because the tables of a relational database can have a complex structure and contain gigabytes of data, their contents are not accessed like normal Ruby data structures. Instead they e queried with SQL, a special programming language based on relational algebra. Most of the development time that goes into Ruby database libraries is spent trying to hide this fact. Several libraries hide the details of communication between a Ruby program and a SQL database; the balance of this chapter is devoted to showing how to use them.

Every relational database exposes a C API, and Ruby bindings to each API are available. We show you how to use the two most popular open source databases: MySQL (Recipe 13.9) and Postgres (Recipe 13.10).[1] But every database has different bindings, and speaks a slightly different variant of SQL. Fortunately, there are other libraries that hide these differences behind a layer of abstraction. Once you install the bindings, you can install abstraction layers atop them and rely on the abstraction layer to keep track of the differences between databases.

[1] SQLite deserves an honorable mention because, unlike other relational databases, it doesn require a server to run. The client code can directly query the database file. This makes things a lot easier to set up. Note that SQLite has two incompatible file formats (version 2 and version 3), and a gem exists for each version. You probably want the sqlite3-ruby gem.

Rubys simplest database abstraction library is DBI (its modeled after Perls DBI module). It does nothing more than provide a uniform interface to the different database bindings. You still have to write all the SQL yourself (and if you e serious about database neutrality, you must use the lowest common denominator of SQL), but you only have to learn a single binding API.

The more popular database abstraction libraries are ActiveRecord (the library of choice for Rails applications) and Og. Not only do these libraries hide the differences between databases, they hide most of the actual SQL. The database tables are represented as Ruby classes, the rows in the database tables as instances of those classes. You can find, create, and modify database rows by manipulating normal-looking Ruby objects. Neither Og nor ActiveRecord can do everything that raw SQL can, so you may also need to use DBI or one of the database-specific bindings.

One standard argument for database abstraction layers is that they make it easy to switch an applications underlying database without having to rewrite all the code. They certainly do make this easier, but it almost never happens.[2] The real advantage is that with abstraction layers, you don have to learn all the different database bindings. Even if you never change databases for any given project, throughout your career youll find yourself using different databases on different projects. Learning how to use a database abstraction layer can save you from having to learn multiple database-specific bindings.

[2] What does happen is that you may write a product designed to work with whatever database the user has installed. You can always require that your users run a specific database.

Whether you use ActiveRecord, Og, DBI, or database-specific bindings, youll need an actual database for your code to connect to. The recipes in this chapter assume youve got a database called cookbook and that you connect to it with the username "cookbook_user" and the password "password".

Heres how to set up cookbook as a MySQL database:

$ mysql -u root Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 6 to server version: 4.0.24_Debian-10-log Type help; or \h for help. Type \c to clear the buffer. mysql> create database cookbook; Query OK, 1 row affected (0.00 sec) mysql> grant all privileges on cookbook.* to cookbook_user@localhost identified by password; Query OK, 0 rows affected (0.00 sec)

Heres how to set cookbook up as a Postgres database (youll probably need to run these commands as the postgres user):

$ createuser Enter name of user to add: cookbook_user Enter password for new user: password Enter it again: password Shall the new user be allowed to create databases? (y/n) y Shall the new user be allowed to create more new users? (y/n) n CREATE USER $ createdb cookbook CREATE DATABASE

To avoid showing you the database connection code in every single recipe, weve factored it out into a library. If you want to run the code in this chapters recipes, you should put the following code in a file called cookbook_dbconnect.rb. Keep it in the directory where you keep the recipe code, or somewhere in your library include path, so that require cookbook_dbconnect will work.

This file defines database connection functions for DBI, ActiveRecord, and Og:

# cookbook_dbconnect.rb require ubygems require dbi require active_record require og

The with_db method gets a database connection through DBI and runs a code block in the context of that connection:

def with_db DBI.connect("dbi:Mysql:cookbook:localhost", "cookbook_user", "password") do |c| yield c end end

The activerecord_connect method only needs to be called once at the beginning of a program: after that, ActiveRecord will acquire database connections as needed.

def activerecord_connect ActiveRecord::Base.establish_connection(:adapter => "mysql", :host => "localhost", :username => "cookbook_user", :password => "password", :database => "cookbook") end

For your reference, this table presents the ActiveRecord adapter names for various kinds of databases.

Table 13-1.

Database

Adapter name

MySQL

mysql

PostgreSQL

postgresql

Oracle

oci

Microsoft SQL Server

sqlserver

SQLite 2

sqlite

SQLite 3

sqlite3

DB2

db2

The og_connect also needs to be called only once. One caveat: you must call it after youve defined the classes for your Og data model.

def og_connect Og.setup( { :destroy => false, :store => :mysql, :user => "cookbook_user", :password => "password", :name => "cookbook" } ) end

This version of cookbook_dbconnect assumes you e running against a MySQL database. For a different database, you just need to change the database name so that DBI, ActiveRecord, and Og know which adapter they should use.

Here are some resources for more information about databases in Ruby:

Категории