Compressing and Archiving Files with Gzip and Tar

Problem

You want to write compressed data to a file to save space, or uncompress the contents of a compressed file. If you're compressing data, you might want to compress multiple files into a single archive file.

Solution

The most common compression format on Unix systems is gzip. Ruby's zlib library lets you read from and write to gzipped I/O streams as though they were normal files. The most useful classes in this library are GzipWriter and GzipReader.[3]

[3] The compressed strings in these examples are actually larger than the originals. This is because I used very short strings to save space in the book, and short strings don't compress well. Any compression technique introduces some overhead; with gzip, you don't actually save any space by compressing a text string of less than about 100 bytes.

Here's GzipWriter being used to create a compressed file, and GzipReader decompressing the same file:

    require 'zlib'

    file = 'compressed.gz'
    Zlib::GzipWriter.open(file) do |gzip|
      gzip << "For my next trick, I'll be written to a compressed file."
      gzip.close
    end

    # Peek at the first ten raw bytes of the file: the gzip header.
    open(file, 'rb') { |f| f.read(10) }
    # => "\037\213\010\000\201\2766D\000\003"

    Zlib::GzipReader.open(file) { |gzip| gzip.read }
    # => "For my next trick, I'll be written to a compressed file."

Discussion

GzipWriter and GzipReader are most commonly used to write to files on disk, but you can wrap any file-like object in the appropriate class and automatically compress everything you write to it, or decompress everything you read from it.

The following code works the same way as the compression code in the Solution, but it's more flexible: the File object that's passed into the Zlib::GzipWriter constructor could just as easily be a Socket or other file-like object.

    open('compressed.gz', 'wb') do |file|
      gzip = Zlib::GzipWriter.new(file)
      gzip << "For my next trick, I'll be written to a compressed file."
      gzip.close
    end
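To illustrate the point about other file-like objects, here is a minimal sketch that sends gzipped data through an in-process pipe instead of a file on disk. The helper name gzip_through_pipe is hypothetical, and the sketch assumes a payload small enough to fit in the pipe's buffer, since the write finishes before the read begins.

```ruby
require 'zlib'

# Hypothetical helper (not part of zlib): compress +text+ into one end of
# a pipe and decompress it from the other end.
# Assumption: +text+ is small enough to fit in the operating system's pipe
# buffer, because nothing reads from the pipe until the writer is closed.
def gzip_through_pipe(text)
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  gzip_out = Zlib::GzipWriter.new(writer)
  gzip_out << text
  gzip_out.close   # finishes the gzip stream and closes the write end

  gzip_in = Zlib::GzipReader.new(reader)
  result = gzip_in.read
  gzip_in.close    # also closes the read end
  result
end

puts gzip_through_pipe('Sent through a pipe.')
```

Neither GzipWriter nor GzipReader is told that it is talking to a pipe; each only needs an object that can be written to or read from.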

If you need to compress or decompress a string, use the Zlib::Deflate or Zlib::Inflate classes rather than constructing a StringIO object:

    deflated = Zlib::Deflate.deflate("I'm a compressed string.")
    # => "x\234\363T\317UHTH…"
    Zlib::Inflate.inflate(deflated)
    # => "I'm a compressed string."
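The overhead on short strings is easy to measure. The sketch below compares sizes before and after deflating a short string and a longer, repetitive one. The helper name deflate_sizes and the sample strings are assumptions made for illustration.

```ruby
require 'zlib'

# Hypothetical helper: return [original_size, deflated_size] in bytes,
# after checking that the data survives a round trip.
def deflate_sizes(text)
  deflated = Zlib::Deflate.deflate(text)
  # Inflate returns a binary string, so compare against the binary form.
  raise 'round trip failed' unless Zlib::Inflate.inflate(deflated) == text.b
  [text.bytesize, deflated.bytesize]
end

short = "I'm a compressed string."
long  = 'All work and no play makes Jack a dull boy. ' * 50

[short, long].each do |text|
  original, deflated = deflate_sizes(text)
  puts "#{original} bytes before, #{deflated} bytes after"
end
```

The short string should come out larger than it went in, while the repetitive one should shrink considerably.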

Tar files

Gzip compresses a single file. What if you want to smash multiple files together into a single archive file? The standard archive format for Unix is tar, and tar files are sometimes called tarballs. A tarball might also be compressed with gzip to save space, but on Unix the archiving and the compression are separate steps (unlike on Windows, where a ZIP file both archives multiple files and compresses them).

The Minitar library is the simplest way to create tarballs in pure Ruby. It's available as the archive-tar-minitar gem (later releases are published under the name minitar).[4]

[4] The RubyGems package defines the Gem::Package::TarWriter and Gem::Package::TarReader classes, which expose an interface similar to Minitar's. You can use these classes if you're fanatical about minimizing your dependencies, but I don't recommend it. These classes only implement the bare-bones functionality necessary to pack and unpack gem-like tarballs, and they also make your code look like it has something to do with RubyGems.
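For comparison, here is a rough sketch of the RubyGems classes mentioned in the footnote, building a small tarball in memory and listing its entries. The helper names build_plain_tar and list_entries are hypothetical. Note that add_file_simple needs the size of each file up front, which is part of what makes this interface more bare-bones.

```ruby
require 'rubygems/package'
require 'stringio'

# Hypothetical helper: build a small tar archive in memory, returned as a
# binary string.
def build_plain_tar
  io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(io) do |tar|
    # add_file_simple takes the entry name, permission mode, and byte size.
    tar.add_file_simple('file1', 0644, 15) { |f| f.write('This is file 1.') }
    tar.mkdir('subdirectory', 0755)
    tar.add_file_simple('subdirectory/file2', 0600, 15) { |f| f.write('This is file 2.') }
  end
  io.string
end

# Hypothetical helper: return [name, size] pairs for every entry.
def list_entries(tar_data)
  entries = []
  Gem::Package::TarReader.new(StringIO.new(tar_data)) do |tar|
    tar.each { |entry| entries << [entry.full_name, entry.header.size] }
  end
  entries
end

list_entries(build_plain_tar).each do |name, size|
  puts %{I see an entry "#{name}" that's #{size} bytes long.}
end
```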

Here's some code that creates a tarball containing two files and a directory. Note the Unix permission modes (0644, 0755, and 0600). These are the permissions the files will have when they're extracted, perhaps by the Unix tar command.

    require 'rubygems'
    require 'archive/tar/minitar'

    open('tarball.tar', 'wb') do |f|
      Archive::Tar::Minitar::Writer.open(f) do |w|
        w.add_file('file1', :mode => 0644, :mtime => Time.now) do |stream, io|
          stream.write('This is file 1.')
        end

        w.mkdir('subdirectory', :mode => 0755, :mtime => Time.now)

        w.add_file('subdirectory/file2', :mode => 0600, :mtime => Time.now) do |stream, io|
          stream.write('This is file 2.')
        end
      end
    end

Here's a method that reads a tarball and prints out its contents:

    def browse_tarball(filename)
      open(filename, 'rb') do |f|
        Archive::Tar::Minitar::Reader.open(f).each do |entry|
          puts %{I see a file "#{entry.name}" that's #{entry.size} bytes long.}
        end
      end
    end

    browse_tarball('tarball.tar')
    # I see a file "file1" that's 15 bytes long.
    # I see a file "subdirectory" that's 0 bytes long.
    # I see a file "subdirectory/file2" that's 15 bytes long.

And here's a simple method for archiving a number of disk files into a compressed tarball. Note how the Minitar Writer is wrapped within a GzipWriter, which automatically compresses the data as it's written. Minitar doesn't have to know about the GzipWriter, because all file-like objects look more or less the same.

    def make_tarball(destination, *paths)
      Zlib::GzipWriter.open(destination) do |gzip|
        out = Archive::Tar::Minitar::Output.new(gzip)
        paths.each do |file|
          puts "Packing #{file}"
          Archive::Tar::Minitar.pack_file(file, out)
        end
        out.close
      end
    end

This code creates some files and tars them up:

    Dir.mkdir('colors')
    paths = ['colors/burgundy', 'colors/beige', 'colors/clear']
    paths.each do |path|
      open(path, 'w') do |f|
        f.puts %{This is a dummy file.}
      end
    end

    make_tarball('new_tarball.tgz', *paths)
    # Packing colors/burgundy
    # Packing colors/beige
    # Packing colors/clear
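Reading a compressed tarball works the same way in reverse: wrap the source in a GzipReader and hand that to a tar reader. The sketch below is an assumption-laden illustration rather than the recipe's own code. To stay free of extra dependencies it uses the RubyGems tar classes and an in-memory StringIO, and the helper names pack_tgz and unpack_tgz are hypothetical. Minitar's Reader should accept a GzipReader in the same position, since it only needs a file-like object.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Hypothetical helper: pack {name => content} into a gzipped tarball held
# in a binary string.
def pack_tgz(files)
  buffer = StringIO.new(String.new)
  gzip = Zlib::GzipWriter.new(buffer)
  Gem::Package::TarWriter.new(gzip) do |tar|
    files.each do |name, content|
      tar.add_file_simple(name, 0644, content.bytesize) { |io| io.write(content) }
    end
  end
  gzip.finish   # writes the gzip trailer but leaves the StringIO open
  buffer.string
end

# Hypothetical helper: decompress the data and return {name => content}
# for every regular file in the archive.
def unpack_tgz(data)
  contents = {}
  gzip = Zlib::GzipReader.new(StringIO.new(data))
  Gem::Package::TarReader.new(gzip) do |tar|
    tar.each do |entry|
      contents[entry.full_name] = entry.read if entry.file?
    end
  end
  gzip.close
  contents
end

archive = pack_tgz('colors/burgundy' => "This is a dummy file.\n",
                   'colors/beige'    => "This is a dummy file.\n")
unpack_tgz(archive).each do |name, content|
  puts "Unpacked #{name} (#{content.bytesize} bytes)"
end
```

The tar reader never learns that its input was compressed; it sees only the decompressed byte stream that GzipReader provides.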

See Also
