Compressing Whitespace in an XML Document
Problem
When REXML parses a document, it respects the original whitespace of the document's text nodes. You want to make the document smaller by compressing extra whitespace.
Solution
Parse the document by creating a REXML::Document out of it. Within the Document constructor, tell the parser to compress all runs of whitespace characters:
require 'rexml/document'
text = %{
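A minimal, self-contained sketch of the approach described in the Solution. The sample document in `text` is made up for illustration (an assumption, not the original listing); `:compress_whitespace => :all` is the option the recipe names.

```ruby
require 'rexml/document'

# Hypothetical sample document containing runs of extra spaces.
text = %{<doc><a>Some      whitespace</a>    <b>Some   more</b></doc>}

# Default parse: text nodes keep their original whitespace.
original = REXML::Document.new(text).to_s

# Ask the parser to compress whitespace in every element.
compressed = REXML::Document.new(text, { :compress_whitespace => :all }).to_s

puts compressed
# => <doc><a>Some whitespace</a> <b>Some more</b></doc>

# The compressed serialization is shorter than the original one.
puts compressed.size < original.size
# => true
```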
Discussion
Sometimes whitespace within a document is significant, but usually (as with HTML) it can be compressed without changing the meaning of the document. The resulting document takes up less space on the disk and requires less bandwidth to transmit.
Whitespace compression doesn't have to be all-or-nothing. REXML gives two ways to configure it. Instead of passing :all as a value for :compress_whitespace, you can pass in a list of tag names. Whitespace will only be compressed in those tags:
REXML::Document.new(text, { :compress_whitespace => %w{a} }).to_s
# => "
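As a sketch of per-tag compression, the following uses a made-up sample document (the `text` string is an assumption for illustration). Only the text inside `a` elements is compressed; the whitespace directly inside `doc` and inside `b` is left alone.

```ruby
require 'rexml/document'

# Hypothetical sample document containing runs of extra spaces.
text = %{<doc><a>Some      whitespace</a>    <b>Some   more</b></doc>}

# Compress whitespace only within <a> elements.
only_a = REXML::Document.new(text, { :compress_whitespace => %w{a} }).to_s

puts only_a
# => <doc><a>Some whitespace</a>    <b>Some   more</b></doc>
```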
You can also switch it around: pass in :respect_whitespace and a list of tag names whose whitespace you don't want to be compressed. This is useful if you know that whitespace is significant within certain parts of your document.
REXML::Document.new(text, { :respect_whitespace => %w{a} }).to_s
# => "
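The inverse configuration can be sketched the same way, again with a made-up sample document (the `text` string is an assumption for illustration). Here whitespace inside `a` is preserved, while the text in every other element is compressed.

```ruby
require 'rexml/document'

# Hypothetical sample document containing runs of extra spaces.
text = %{<doc><a>Some      whitespace</a>    <b>Some   more</b></doc>}

# Preserve whitespace only within <a>; compress it everywhere else.
keep_a = REXML::Document.new(text, { :respect_whitespace => %w{a} }).to_s

puts keep_a
# => <doc><a>Some      whitespace</a> <b>Some more</b></doc>
```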
What about text nodes containing only whitespace? These are often inserted by XML pretty-printers, and they can usually be totally discarded without altering the meaning of a document. If you add :ignore_whitespace_nodes => :all to the parser configuration, REXML will simply decline to create text nodes that contain nothing but whitespace characters. Here's a comparison of :compress_whitespace alone, and in conjunction with :ignore_whitespace_nodes:
text = %{
By itself, :compress_whitespace shouldn't make a document less human-readable, but :ignore_whitespace_nodes almost certainly will.
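A sketch of that comparison, assuming a small pretty-printed document invented for illustration (the `text` string and variable names are hypothetical). It parses the same input twice and compares the serialized output and the number of child nodes under the root.

```ruby
require 'rexml/document'

# Hypothetical pretty-printed document: newlines and indentation
# between elements become whitespace-only text nodes.
text = %{<doc>
  <a>Some   text</a>
  <b>More   text</b>
</doc>}

# Compression alone keeps the whitespace-only nodes, just shortened.
compress_only = REXML::Document.new(text, { :compress_whitespace => :all })

# Adding :ignore_whitespace_nodes drops those nodes entirely.
compress_and_ignore = REXML::Document.new(text,
  { :compress_whitespace => :all, :ignore_whitespace_nodes => :all })

puts compress_only.root.children.size
# => 5
puts compress_and_ignore.root.children.size
# => 2
puts compress_and_ignore.to_s
# => <doc><a>Some text</a><b>More text</b></doc>
```

With compression alone, the line breaks between elements survive, so the output keeps its one-element-per-line layout. In the REXML versions this sketch was written against, compression appears to collapse repeated runs of the same whitespace character, so a newline followed by two spaces becomes a newline and one space rather than a single space.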
See Also