Compressing and Archiving Files with Gzip and Tar
Problem
You want to write compressed data to a file to save space, or uncompress the contents of a compressed file. If you're compressing data, you might want to compress multiple files into a single archive file.
Solution
The most common compression format on Unix systems is gzip. Ruby's zlib library lets you read from and write to gzipped I/O streams as though they were normal files. The most useful classes in this library are GzipWriter and GzipReader.[3]
[3] The compressed strings in these examples are actually larger than the originals. This is because I used very short strings to save space in the book, and short strings don't compress well. Any compression technique introduces some overhead; with gzip, you don't actually save any space by compressing a text string of less than about 100 bytes.
Here's GzipWriter being used to create a compressed file, and GzipReader decompressing the same file:
require 'zlib'

file = 'compressed.gz'
Zlib::GzipWriter.open(file) do |gzip|
  gzip << "For my next trick, I'll be written to a compressed file."
  gzip.close
end

open(file, 'rb') { |f| f.read(10) }
# => "\037\213\010\000\201\2766D\000\003"

Zlib::GzipReader.open(file) { |gzip| gzip.read }
# => "For my next trick, I'll be written to a compressed file."
Discussion
GzipWriter and GzipReader are most commonly used to write to files on disk, but you can wrap any file-like object in the appropriate class and automatically compress everything you write to it, or decompress everything you read from it.
The following code works the same way as the compression code in the Solution, but it's more flexible: the File object that's passed into the Zlib::GzipWriter constructor could just as easily be a Socket or other file-like object.
open('compressed.gz', 'wb') do |file|
  gzip = Zlib::GzipWriter.new(file)
  gzip << "For my next trick, I'll be written to a compressed file."
  gzip.close
end
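To illustrate the "any file-like object" point, here is a minimal sketch that compresses into an in-memory StringIO instead of a File, then reads the result back. The choice of StringIO and the variable names are assumptions made for illustration; a Socket should work the same way.

```ruby
require 'stringio'
require 'zlib'

message = "For my next trick, I'll be written to a compressed buffer."

# Compress into an in-memory buffer instead of a file on disk.
buffer = StringIO.new(String.new)
gzip = Zlib::GzipWriter.new(buffer)
gzip << message
gzip.finish # write the gzip trailer, but leave the StringIO itself open

compressed = buffer.string

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
magic = compressed.bytes.first(2)

# Decompress by wrapping a second StringIO in a GzipReader.
restored = Zlib::GzipReader.new(StringIO.new(compressed)).read
```

Calling finish rather than close is a deliberate choice here: close would also close the underlying StringIO, which is usually not what you want when the caller owns that object.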
If you need to compress or decompress a string, use the Zlib::Deflate or Zlib::Inflate classes rather than constructing a StringIO object:
deflated = Zlib::Deflate.deflate("I'm a compressed string.")
# => "x\234\363T\317UHTH…"
Zlib::Inflate.inflate(deflated)
# => "I'm a compressed string."
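The overhead described in the footnote is easy to observe. This sketch deflates one very short string and one long, repetitive string and compares byte counts before and after. The sample strings are made up for illustration, and the exact sizes will depend on your zlib version, so none are shown.

```ruby
require 'zlib'

short = "I'm a compressed string."
long  = 'This sentence repeats, so it compresses well. ' * 50

# Record [original size, deflated size] for each sample.
sizes = [short, long].map do |text|
  [text.bytesize, Zlib::Deflate.deflate(text).bytesize]
end

sizes.each do |before, after|
  puts "#{before} bytes -> #{after} bytes"
end
```

The short string should come out larger than it went in, while the repetitive one should shrink considerably.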
Tar files
Gzip compresses a single file. What if you want to smash multiple files together into a single archive file? The standard archive format for Unix is tar, and tar files are sometimes called tarballs. A tarball might also be compressed with gzip to save space, but on Unix the archiving and the compression are separate steps (unlike on Windows, where a ZIP file both archives multiple files and compresses them).
The Minitar library is the simplest way to create tarballs in pure Ruby. It's available as the archive-tar-minitar gem.[4]
[4] The RubyGems package defines the Gem::Package::TarWriter and Gem::Package::TarReader classes, which expose an interface similar to Minitar's. You can use these classes if you're fanatical about minimizing your dependencies, but I don't recommend it. These classes only implement the bare-bones functionality necessary to pack and unpack gem-like tarballs, and they also make your code look like it has something to do with RubyGems.
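For readers who do want the dependency-free route mentioned in the footnote, here is a rough sketch of the RubyGems classes at work, building a tiny tarball in memory and listing it. It assumes a reasonably recent RubyGems in which these classes are loaded with require 'rubygems/package'; the in-memory StringIO and the entry names are illustrative choices only.

```ruby
require 'rubygems/package'
require 'stringio'

# Build a small tarball in memory with the tar writer bundled with RubyGems.
tar_io = StringIO.new(String.new)
Gem::Package::TarWriter.new(tar_io) do |tar|
  contents = 'This is file 1.'
  # add_file_simple needs the size up front; the block receives a stream.
  tar.add_file_simple('file1', 0644, contents.bytesize) do |stream|
    stream.write(contents)
  end
  tar.mkdir('subdirectory', 0755)
end

# Read the entry names back from a fresh StringIO over the same bytes.
names = []
Gem::Package::TarReader.new(StringIO.new(tar_io.string)) do |tar|
  tar.each { |entry| names << entry.full_name }
end

puts names
```

Note that add_file_simple has to be told the file size before any data is written, which is one of the bare-bones aspects the footnote warns about.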
Here's some code that creates a tarball containing two files and a directory. Note the Unix permission modes (0644, 0755, and 0600). These are the permissions the files will have when they're extracted, perhaps by the Unix tar command.
require 'rubygems'
require 'archive/tar/minitar'

open('tarball.tar', 'wb') do |f|
  Archive::Tar::Minitar::Writer.open(f) do |w|
    w.add_file('file1', :mode => 0644, :mtime => Time.now) do |stream, io|
      stream.write('This is file 1.')
    end
    w.mkdir('subdirectory', :mode => 0755, :mtime => Time.now)
    w.add_file('subdirectory/file2', :mode => 0600, :mtime => Time.now) do |stream, io|
      stream.write('This is file 2.')
    end
  end
end
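The permission modes in the tarball example are octal literals: the leading zero tells Ruby to read 0644 as base 8. As a small aid to reading them, this sketch turns a mode into the rwx notation that ls prints. The helper name permission_string is hypothetical and not part of Minitar.

```ruby
# Render the low nine permission bits of a mode as an ls-style string.
# (permission_string is a made-up helper name for this illustration.)
def permission_string(mode)
  'rwxrwxrwx'.each_char.with_index.map do |flag, i|
    # Bit 8 is the owner's read bit; bit 0 is the "other" execute bit.
    mode[8 - i] == 1 ? flag : '-'
  end.join
end

[0644, 0755, 0600].each do |mode|
  puts format('%04o  %s', mode, permission_string(mode))
end
# 0644  rw-r--r--
# 0755  rwxr-xr-x
# 0600  rw-------
```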
Here's a method that reads a tarball and prints out its contents:
def browse_tarball(filename)
  open(filename, 'rb') do |f|
    Archive::Tar::Minitar::Reader.open(f).each do |entry|
      puts %{I see a file "#{entry.name}" that's #{entry.size} bytes long.}
    end
  end
end

browse_tarball('tarball.tar')
# I see a file "file1" that's 15 bytes long.
# I see a file "subdirectory" that's 0 bytes long.
# I see a file "subdirectory/file2" that's 15 bytes long.
And here's a simple method for archiving a number of disk files into a compressed tarball. Note how the Minitar Writer is wrapped within a GzipWriter, which automatically compresses the data as it's written. Minitar doesn't have to know about the GzipWriter, because all file-like objects look more or less the same.
def make_tarball(destination, *paths)
  Zlib::GzipWriter.open(destination) do |gzip|
    out = Archive::Tar::Minitar::Output.new(gzip)
    paths.each do |file|
      puts "Packing #{file}"
      Archive::Tar::Minitar.pack_file(file, out)
    end
    out.close
  end
end
This code creates some files and tars them up:
Dir.mkdir('colors')
paths = ['colors/burgundy', 'colors/beige', 'colors/clear']
paths.each do |path|
  open(path, 'w') do |f|
    f.puts %{This is a dummy file.}
  end
end

make_tarball('new_tarball.tgz', *paths)
# Packing colors/burgundy
# Packing colors/beige
# Packing colors/clear
# => #
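The same layering works in the other direction: wrap a GzipReader around the compressed data and hand it to a tar reader. The sketch below shows the round trip entirely in memory. To keep it free of gem dependencies it substitutes the Gem::Package::TarWriter and Gem::Package::TarReader classes from RubyGems for Minitar, and the method names pack_tgz and list_tgz are hypothetical; treat it as an illustration of the wrapping technique rather than a replacement for make_tarball.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Pack a hash of { name => contents } into a gzipped tarball string.
def pack_tgz(files)
  buffer = StringIO.new(String.new)
  Zlib::GzipWriter.wrap(buffer) do |gzip|
    # The tar writer only sees a file-like object; it never knows
    # that everything it writes is being compressed.
    Gem::Package::TarWriter.new(gzip) do |tar|
      files.each do |name, contents|
        tar.add_file_simple(name, 0644, contents.bytesize) do |stream|
          stream.write(contents)
        end
      end
    end
  end
  buffer.string
end

# Return [name, contents] pairs from a gzipped tarball string.
def list_tgz(data)
  entries = []
  Zlib::GzipReader.wrap(StringIO.new(data)) do |gzip|
    Gem::Package::TarReader.new(gzip) do |tar|
      tar.each { |entry| entries << [entry.full_name, entry.read] }
    end
  end
  entries
end

files = {
  'colors/burgundy' => "This is a dummy file.\n",
  'colors/beige'    => "This is a dummy file.\n"
}
listing = list_tgz(pack_tgz(files))
listing.each { |name, contents| puts "#{name}: #{contents.bytesize} bytes" }
```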
See Also
- On Windows, both compression and archiving are usually handled with ZIP files; see the next recipe, Recipe 12.11, "Reading and Writing ZIP Files," for details
- Recipe 14.3, "Customizing HTTP Request Headers," uses zlib to decompress the gzipped body of a response from a web server