Performing Random Access on Read-Once Input Streams

Problem

You have an IO object, probably a socket, that doesn't support random-access methods like seek, pos=, and rewind. You want to treat this object like a file on disk, where you can jump around and reread parts of the file.

Solution

The simplest solution is to read the entire contents of the socket (or as much as you're going to need) and put it into a StringIO object. You can then treat the StringIO object exactly like a file:

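require 'socket'
require 'stringio'

sock = TCPSocket.open("www.example.com", 80)
sock.write("GET /\n")

file = StringIO.new(sock.read)
file.read(10)                          # => the first 10 bytes of the response
file.rewind
file.read(10)                          # => the same 10 bytes again
file.pos = 90
file.read(15)                          # => " this web page "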

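The same technique can be tried without a network connection. The sketch below is an illustration rather than part of the recipe: it assumes IO.pipe as a stand-in for the socket (a pipe is also read-once), and the sample text is made up.

```ruby
require 'stringio'

# Assumption: IO.pipe stands in for the socket, since a pipe is also read-once.
reader, writer = IO.pipe
writer.write("line one\nline two\nline three\n")
writer.close                      # lets reader.read see end-of-stream

file = StringIO.new(reader.read)  # snapshot the whole stream in memory
reader.close

first = file.read(8)              # the first 8 bytes of the snapshot
file.rewind
again = file.read(8)              # the same bytes, served from memory
file.pos = 9
second = file.read(8)             # jump straight to the second line
```

Because the snapshot lives entirely in memory, this approach only suits streams that end and that fit comfortably in RAM.
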
Discussion

A socket is supposed to work just like a file, but sometimes the illusion breaks down. Since the data is coming from another computer over which you have no control, you can't just go back and reread data you've already read. That data has already been sent over the pipe, and the server doesn't care if you lost it or need to process it again.
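
A quick way to see the illusion break is to ask a read-once stream to go backward. This sketch again assumes IO.pipe as a stand-in for a socket. On Linux and macOS the failed rewind should surface as Errno::ESPIPE, though the exact error may vary by platform.

```ruby
# Assumption: a pipe behaves like a socket here; neither supports seeking.
reader, writer = IO.pipe
writer.write("abcdef")
writer.close

head = reader.read(3)             # consumes the first three bytes

seek_error = begin
  reader.rewind                   # try to go back to the start
  nil
rescue SystemCallError => e       # typically Errno::ESPIPE ("Illegal seek")
  e.class
end

rest = reader.read                # only the unread bytes remain
reader.close
```

The bytes already consumed are gone unless the program kept its own copy, which is exactly what the buffering in this recipe provides.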

If you have enough memory to hold the entire contents of a socket, it's easy to put the results into a form that more closely simulates a file on disk. But you might not want to read the entire socket, or the socket may be one that keeps sending data until you close it. In that case you'll need to buffer the data as you read it. Instead of using memory for the entire contents of the socket (which may be unbounded), you'll use memory only for the data you've actually read.

This code defines a BufferedIO class that adds data to an internal StringIO as it's read from its source:

class BufferedIO
  def initialize(io)
    @buff = StringIO.new
    @source = io
  end

  def read(x=nil)
    # How many bytes beyond the end of the buffer does this read need?
    to_read = x ? x + @buff.pos - @buff.size : nil
    _append(@source.read(to_read)) if !to_read or to_read > 0
    @buff.read(x)
  end

  def pos=(x)
    read(x - @buff.pos) if x > @buff.size
    @buff.pos = x
  end

  def seek(x, whence=IO::SEEK_SET)
    case whence
    when IO::SEEK_SET then self.pos = x
    when IO::SEEK_CUR then self.pos = @buff.pos + x
    when IO::SEEK_END
      # Note: SEEK_END reads all the socket data.
      # As with IO#seek, x is normally zero or negative here.
      read
      self.pos = @buff.size + x
    end
    pos
  end

  # Some methods can simply be delegated to the buffer.
  ["pos", "rewind", "tell"].each do |m|
    define_method(m) { @buff.send(m) }
  end

  private

  # Add data to the end of the buffer without moving the read position.
  # (StringIO#<< writes at the current position, so seek to the end first.)
  def _append(s)
    return if s.nil? or s.empty?   # the source has no more data
    oldpos = @buff.pos
    @buff.seek(0, IO::SEEK_END)
    @buff.write(s)
    @buff.pos = oldpos
  end
end

Now you can seek, rewind, and generally move around in an input socket as if it were a disk file. You only have to read as much data as you need:

sock = TCPSocket.open("www.example.com", 80)
sock.write("GET /\n")
file = BufferedIO.new(sock)

file.read(10)                          # => the first 10 bytes of the response
file.rewind                            # => 0
file.read(10)                          # => the same 10 bytes again
file.pos = 90                          # => 90
file.read(15)                          # => " this web page "
file.seek(-10, IO::SEEK_CUR)           # => 95
file.read(10)                          # => " web page "
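
To check the claim that only the needed data is read, the sketch below counts how many bytes are pulled from the source. It is an illustration under stated assumptions, not the recipe's own code: CountingSource and TinyBufferedIO are hypothetical names, TinyBufferedIO is a condensed stand-in that supports only read and pos=, and a 1,000-byte StringIO plays the part of the socket.

```ruby
require 'stringio'

# Hypothetical helper: wraps a source and counts the bytes pulled from it.
class CountingSource
  attr_reader :bytes_pulled

  def initialize(io)
    @io = io
    @bytes_pulled = 0
  end

  def read(n = nil)
    data = @io.read(n)
    @bytes_pulled += data.bytesize if data
    data
  end
end

# Hypothetical condensed stand-in for BufferedIO: read and pos= only.
class TinyBufferedIO
  def initialize(io)
    @buff = StringIO.new
    @source = io
  end

  def read(n = nil)
    missing = n ? n + @buff.pos - @buff.size : nil
    fetch(missing) if missing.nil? || missing > 0
    @buff.read(n)
  end

  def pos=(x)
    fetch(x - @buff.size) if x > @buff.size
    @buff.pos = x
  end

  private

  # Pull data from the source and store it at the end of the buffer
  # without moving the current read position.
  def fetch(n)
    data = @source.read(n)
    return if data.nil? || data.empty?
    here = @buff.pos
    @buff.seek(0, IO::SEEK_END)
    @buff.write(data)
    @buff.pos = here
  end
end

source = CountingSource.new(StringIO.new("x" * 1000))  # pretend socket
file = TinyBufferedIO.new(source)

file.read(10)
after_first_read = source.bytes_pulled   # 10 bytes fetched so far
file.pos = 0
file.read(10)                            # served entirely from the buffer
after_reread = source.bytes_pulled       # still 10
file.pos = 90
file.read(15)
after_jump = source.bytes_pulled         # 105: everything up to byte 105
```

Jumping forward still has to pull every intervening byte from the source, so the buffer grows to the highest position visited even if most of that data is never reread.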

BufferedIO doesn't implement all the methods of IO, only the ones not implemented by socket-type IO objects. If you need the other methods, you should be able to implement the ones you need using the existing methods as guidelines. For instance, you could implement readline like this:

class BufferedIO
  def readline
    oldpos = @buff.pos
    line = @buff.readline unless @buff.eof?
    if !line or line[-1] != ?\n
      _append(@source.readline)   # Finish the line
      @buff.pos = oldpos          # Go back to where we were
      line = @buff.readline       # Read the line again
    end
    line
  end
end

file.readline    # => the rest of the current line, which begins: by typing "example.com",

 
