Java Cookbook, Second Edition
Recipe 10.14 Reading "Continued" Lines
Problem
You need to read lines that are continued with backslashes (\) or that are continued with leading spaces (such as email or news headers).
Solution
Use my IndentContLineReader or EscContLineReader classes.
Discussion
This functionality is likely to be reused, so it should be encapsulated in general-purpose classes. I offer the IndentContLineReader and EscContLineReader classes. EscContLineReader reads lines normally, but if a line ends with the escape character (by default, the backslash), the escape character is deleted and the following line is joined to the preceding line. So if you have lines like this in the input:

    Here is something I wanted to say:\
    Try and Buy in every way.
    Go Team!

and you read them using EscContLineReader's readLine( ) method, you get the following lines:

    Here is something I wanted to say: Try and Buy in every way.
    Go Team!

Note in particular that my reader inserts a space character between the joined parts of the continued line. An IOException is thrown if a file ends with the escape character.

IndentContLineReader reads lines, but if a line begins with a space or tab, that line is joined to the preceding line. This is designed for reading email or Usenet news ("message") header lines. Here is an example input file:

    From: ian Tuesday, January 1, 2000 8:45 AM EST
    To: Book-reviewers List
    Received: by darwinsys.com (OpenBSD 2.6)
     from localhost
     at Tuesday, January 1, 2000 8:45 AM EST
    Subject: Hey, it's 2000 and MY computer is still up

When read using an IndentContLineReader, this text comes out with the continued lines joined together into longer single lines:

    From: ian Tuesday, January 1, 2000 8:45 AM EST
    To: Book-reviewers List
    Received: by darwinsys.com (OpenBSD 2.6) from localhost at Tuesday, January 1, 2000 8:45 AM EST
    Subject: Hey, it's 2000 and MY computer is still up

This class has a setContinuationMode(boolean) method that lets you turn continuation mode off. This would normally be used to process the body of a message.
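To see the indentation rule on its own, here is a minimal standalone sketch; it is not the book's class, and the names UnfoldSketch and unfold are hypothetical. It applies the same test IndentContLineReader uses (a nonempty line whose first character is whitespace continues the previous line) and, like that class, keeps the continuation's leading whitespace when appending. The sample input assumes the Received header from the example is folded onto three physical lines, each continuation indented by one space.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a physical line that starts with whitespace
 *  is appended to the logical line before it. */
public class UnfoldSketch {
    public static List<String> unfold(BufferedReader in) throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;   // logical line being built, if any
        String line;
        while ((line = in.readLine()) != null) {
            boolean continued = !line.isEmpty()
                    && Character.isWhitespace(line.charAt(0));
            if (continued && current != null) {
                // Keep the leading whitespace, as IndentContLineReader does
                current.append(line);
            } else {
                if (current != null) {
                    out.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            out.add(current.toString());    // flush the last logical line
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String header = "From: ian Tuesday, January 1, 2000 8:45 AM EST\n"
                + "To: Book-reviewers List\n"
                + "Received: by darwinsys.com (OpenBSD 2.6)\n"
                + " from localhost\n"
                + " at Tuesday, January 1, 2000 8:45 AM EST\n"
                + "Subject: Hey, it's 2000 and MY computer is still up\n";
        for (String s : unfold(new BufferedReader(new StringReader(header)))) {
            System.out.println(s);
        }
    }
}
```

Collecting every logical line into a list is simpler, but it has to read the whole input up front. IndentContLineReader instead reads one physical line ahead and parks it in a one-line "pushback" holder, which is what lets it work on a stream and have continuation mode switched off partway through, at the header/body boundary.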
Since the header and the body are separated by a blank line in the text representation of messages, we can process the entire message correctly as follows:

    IndentContLineReader is = new IndentContLineReader(
        new StringReader(sampleTxt));
    String aLine;
    // Print Mail/News Header
    System.out.println("----- Message Header -----");
    while ((aLine = is.readLine()) != null && aLine.length() > 0) {
        System.out.println(is.getLineNumber() + ": " + aLine);
    }
    // Make "is" behave like normal BufferedReader
    is.setContinuationMode(false);
    System.out.println();
    // Print Message Body
    System.out.println("----- Message Body -----");
    while ((aLine = is.readLine()) != null) {
        System.out.println(is.getLineNumber() + ": " + aLine);
    }

Each of the Reader classes is subclassed from LineNumberReader so that you can use getLineNumber( ). This is a very useful feature when reporting errors back to the user who prepared an input file; it can save them considerable hunting around in the file if you tell them the line number on which the error occurred.

The Reader classes are actually subclassed from an abstract ContLineReader class (itself a subclass of LineNumberReader), which I'll present first in Example 10-6. This class encapsulates the basic functionality for keeping track of lines that need to be joined together, and for enabling or disabling the continuation processing.

Example 10-6. ContLineReader.java

    import java.io.*;

    /** Subclass of LineNumberReader to allow reading of continued lines
     * using the readLine() method. The other Reader methods (read(), etc.)
     * do no continuation processing and must not be mixed with readLine().
     * Must subclass to provide the actual implementation of readLine().
     */
    public abstract class ContLineReader extends LineNumberReader {
        /** Line number of first line in current (possibly continued) line */
        protected int firstLineNumber = 0;
        /** True if handling continuations, false if not; false == "PRE" mode */
        protected boolean doContinue = true;

        /** Set the continuation mode */
        public void setContinuationMode(boolean b) {
            doContinue = b;
        }

        /** Get the continuation mode */
        public boolean getContinuationMode() {
            return doContinue;
        }

        /** Read one (possibly continued) line, joining continuation
         * lines according to the subclass's rules. */
        @Override
        public abstract String readLine() throws IOException;

        /** Read one real (physical) line. Provided as a convenience for the
         * subclasses, so they don't embarrass themselves trying to
         * call "super.readLine()", which isn't very practical...
         */
        public String readPhysicalLine() throws IOException {
            return super.readLine();
        }

        // Can NOT override getLineNumber in this class to return the #
        // of the beginning of the continued line, since the subclasses
        // all call super.getLineNumber...

        /** Construct a ContLineReader with the default input-buffer size. */
        public ContLineReader(Reader in) {
            super(in);
        }

        /** Construct a ContLineReader using the given input-buffer size. */
        public ContLineReader(Reader in, int sz) {
            super(in, sz);
        }

        // Methods that do NOT handle continuations - redirect straight to parent

        /** Read a single character, returned as an int. */
        @Override
        public int read() throws IOException {
            return super.read();
        }

        /** Read characters into a portion of an array. */
        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return super.read(cbuf, off, len);
        }

        @Override
        public boolean markSupported() {
            return false;
        }
    }

The ContLineReader class ends with the two read( ) methods, which pass straight through to LineNumberReader with no continuation processing, and a markSupported( ) method that returns false. The IndentContLineReader class extends this to allow merging of lines based on indentation. Example 10-7 shows the code for the IndentContLineReader class.

Example 10-7. IndentContLineReader.java

    import java.io.*;

    /** Subclass of ContLineReader for lines continued by indentation of
     * following line (like RFC822 mail, Usenet News, etc.).
     */
    public class IndentContLineReader extends ContLineReader {
        /** Line number of first line in current (possibly continued) line */
        @Override
        public int getLineNumber() {
            return firstLineNumber;
        }

        /** Physical line read too far ahead, held for the next readLine() */
        protected String prevLine;

        /** Read one (possibly continued) line, appending any following
         * physical lines that begin with whitespace.
         */
        @Override
        public String readLine() throws IOException {
            String s;

            // If we saved a previous line, start with it. Else,
            // read the first line of possible continuation.
            if (prevLine != null) {
                s = prevLine;
                prevLine = null;
            } else {
                s = readPhysicalLine();
            }

            // Save the line number of the first line.
            firstLineNumber = super.getLineNumber();

            // Now we have one line. If we are not in continuation
            // mode, or if a previous readPhysicalLine() returned null,
            // we are finished, so return it.
            if (!doContinue || s == null)
                return s;

            // Otherwise, start building the joined line
            StringBuilder sb = new StringBuilder(s);

            // Read as many continued lines as there are, if any.
            while (true) {
                String nextPart = readPhysicalLine();
                if (nextPart == null) {
                    // Egad! EOF within continued line.
                    // Return what we have so far.
                    return sb.toString();
                }
                // If the next line begins with whitespace, it's a continuation
                if (nextPart.length() > 0 &&
                        Character.isWhitespace(nextPart.charAt(0))) {
                    sb.append(nextPart);    // and add line.
                } else {
                    // else we just read too far, so put in "pushback" holder
                    prevLine = nextPart;
                    break;
                }
            }
            return sb.toString();       // return the joined line
        }

        /** Construct an IndentContLineReader with the default input-buffer size. */
        public IndentContLineReader(Reader in) {
            super(in);
        }

        /** Construct an IndentContLineReader using the given input-buffer size. */
        public IndentContLineReader(Reader in, int sz) {
            super(in, sz);
        }

        // Built-in test case
        protected static String sampleTxt =
            "From: ian today now\n" +
            "Received: by foo.bar.com\n" +
            " at 12:34:56 January 1, 2000\n" +
            "X-Silly-Headers: Too Many\n" +
            "This line should be line 5.\n" +
            "Test more indented line continues from line 6:\n" +
            " space indented.\n" +
            " tab indented;\n" +
            "\n" +
            "This is line 10\n" +
            "the start of a hypothetical mail/news message, \n" +
            "that is, it follows a null line.\n" +
            " Let us see how it fares if indented.\n" +
            " also space-indented.\n" +
            "\n" +
            "How about text ending without a newline?";

        // A simple main program for testing the class.
        public static void main(String[] argv) throws IOException {
            IndentContLineReader is = new IndentContLineReader(
                new StringReader(sampleTxt));
            String aLine;
            // Print Mail/News Header
            System.out.println("----- Message Header -----");
            while ((aLine = is.readLine()) != null && aLine.length() > 0) {
                System.out.println(is.getLineNumber() + ": " + aLine);
            }
            // Make "is" behave like normal BufferedReader
            is.setContinuationMode(false);
            System.out.println();
            // Print Message Body
            System.out.println("----- Message Body -----");
            while ((aLine = is.readLine()) != null) {
                System.out.println(is.getLineNumber() + ": " + aLine);
            }
            is.close();
        }
    }
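This section describes EscContLineReader but does not list it. As a rough standalone sketch of the behavior described in the Discussion (not the book's implementation), the hypothetical class EscJoinSketch below reads from a LineNumberReader: while a physical line ends with the escape character, it drops that character, appends a space, and joins the next physical line; it throws an IOException if the input ends right after an escape. The inserted space follows the sample output in the Discussion. It also assumes a trailing escape is never itself escaped, so a line ending in two backslashes is still treated as continued.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

/** Hypothetical sketch of backslash-continued line joining;
 *  not the book's EscContLineReader. */
public class EscJoinSketch {
    /** Read one logical line: while a physical line ends with esc,
     *  drop esc, add a space, and append the next physical line.
     *  Returns null at end of input. */
    public static String readLogicalLine(LineNumberReader in, char esc)
            throws IOException {
        String s = in.readLine();
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (s.length() > 0 && s.charAt(s.length() - 1) == esc) {
            sb.append(s, 0, s.length() - 1).append(' ');
            s = in.readLine();
            if (s == null) {
                // Input ended right after an escape character
                throw new IOException("EOF after escape character, after line "
                        + in.getLineNumber());
            }
        }
        sb.append(s);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "Here is something I wanted to say:\\\n"
                + "Try and Buy in every way.\n"
                + "Go Team!\n";
        try (LineNumberReader in = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = readLogicalLine(in, '\\')) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Running main prints "Here is something I wanted to say: Try and Buy in every way." followed by "Go Team!". Reading through a LineNumberReader lets the error message carry the reader's line count at the point of failure, in the spirit of the error-reporting advice above.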