Sockets
Before data is sent across the Internet from one host to another, it is split into packets of varying but finite size called datagrams. Datagrams range in size from a few dozen bytes to about 60,000 bytes. Anything larger, and often things smaller, must be split into smaller packets before it's transmitted. The advantage of this scheme is that if one packet is lost, it can be retransmitted without requiring redelivery of all other packets. Furthermore, if packets arrive out of order, they can be reordered at the receiving end of the connection.
Fortunately, packets are invisible to the Java programmer. The host's native networking software splits data into packets on the sending end and reassembles packets on the receiving end. Instead, the Java programmer is presented with a higher-level abstraction called a socket. The socket provides a reliable connection for the transmission of data between two hosts. It isolates you from the details of packet encodings, lost and retransmitted packets, and packets that arrive out of order. A socket performs four fundamental operations:
- Connects to a remote machine
- Sends data
- Receives data
- Closes the connection
A socket may not connect to more than one remote host. However, a socket may both send data to and receive data from the remote host it's connected to.
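The four operations can be sketched in a few lines. The following is a minimal illustration, not a production client: the class name FourOperations and the helpers startEchoServer and roundTrip are hypothetical, and the "remote machine" is assumed to be a throwaway echo server on the loopback address so that the sketch runs without outside network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FourOperations {

    // Hypothetical helper: a one-shot echo server on a free loopback port.
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            try (Socket peer = server.accept()) {
                InputStream in = peer.getInputStream();
                OutputStream out = peer.getOutputStream();
                // Echo every byte back until the client closes its end.
                for (int c = in.read(); c != -1; c = in.read()) {
                    out.write(c);
                }
            } catch (IOException ignored) {
                // The server socket was closed or the client went away.
            }
        });
        t.setDaemon(true);
        t.start();
        return server;
    }

    static String roundTrip(String message) throws IOException {
        try (ServerSocket server = startEchoServer()) {
            // 1. Connect to the remote machine.
            Socket s = new Socket(server.getInetAddress(), server.getLocalPort());
            try {
                byte[] data = message.getBytes(StandardCharsets.US_ASCII);
                // 2. Send data.
                OutputStream out = s.getOutputStream();
                out.write(data);
                out.flush();
                // 3. Receive data: read back as many bytes as were sent.
                InputStream in = s.getInputStream();
                byte[] reply = new byte[data.length];
                int n = 0;
                while (n < reply.length) {
                    int r = in.read(reply, n, reply.length - n);
                    if (r == -1) break; // peer closed early
                    n += r;
                }
                return new String(reply, 0, n, StandardCharsets.US_ASCII);
            } finally {
                // 4. Close the connection.
                s.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello")); // prints "hello"
    }
}
```

The same Socket object both sends and receives, but only ever talks to the one host it was connected to when it was constructed.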
The java.net.Socket class is Java's interface to a network socket and allows you to perform all four fundamental socket operations. It provides raw, uninterpreted communication between two hosts. You can connect to remote machines; you can send data; you can receive data; you can close the connection. No part of the protocol is abstracted out, as is the case with URL and URLConnection. The programmer is completely responsible for the interaction between the client and the server.
To open a connection, call one of the Socket constructors, specifying the host to which you want to connect. Each Socket object is associated with exactly one remote host. To connect to a different host, you must create a new Socket object:
    public Socket(String host, int port) throws UnknownHostException, IOException
    public Socket(InetAddress address, int port) throws IOException
    public Socket(String host, int port, InetAddress localAddress, int localPort) throws IOException
    public Socket(InetAddress address, int port, InetAddress localAddress, int localPort) throws IOException
The host argument is a string like "www.oreilly.com" or "duke.poly.edu" that specifies the particular host to connect to. It may even be a numeric, dotted quad string such as "199.1.32.90". This argument may also be passed as a java.net.InetAddress object.
The port argument is the port on the remote host to connect to. A computer's network interface is logically subdivided into 65,536 different ports. As data traverses the Internet in packets, each packet carries both the address of the host it's going to and a port number on that host. A host reads the port number from each packet it receives to decide which program should receive that chunk of data. Many services run on well-known ports. For example, HTTP servers generally listen on port 80.
The optional localAddress and localPort arguments specify which address and port on the local host the socket connects from, assuming more than one is available. Most hosts have many available ports but only one address. If these arguments are left out, the constructor chooses reasonable values.
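As a rough sketch of how these constructor arguments fit together, the following connects to a listener on the loopback address twice: once with a dotted quad string as the host, and once with an InetAddress plus an explicit local address and port. The class name ConstructorDemo and the method connectWithLocalBinding are hypothetical, and the loopback listener is an assumption made so the code runs without a real remote host. A local port of 0 asks the system to pick any free port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConstructorDemo {

    // Returns {remotePort, localPort} for a connection made with the
    // four-argument constructor.
    static int[] connectWithLocalBinding() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            int remotePort = server.getLocalPort();

            // Two-argument form: the host given as a numeric string.
            try (Socket byName = new Socket(loopback.getHostAddress(), remotePort)) {
                System.out.println("connected to " + byName.getInetAddress());
            }

            // Four-argument form: remote address and port, then the local
            // address and port to connect from (0 = any free port).
            try (Socket s = new Socket(loopback, remotePort, loopback, 0)) {
                return new int[] { s.getPort(), s.getLocalPort() };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ports = connectWithLocalBinding();
        System.out.println("remote port: " + ports[0]);
        System.out.println("local port:  " + ports[1]);
    }
}
```

The port numbers printed vary from run to run, because both the listener and the client let the system choose a free port.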
Data is sent across the socket via streams. These are the methods to get both streams for the socket:
    public InputStream getInputStream( ) throws IOException
    public OutputStream getOutputStream( ) throws IOException
There's also a method to close the socket:
public void close( ) throws IOException
This closes the socket's input and output streams as well. Any attempt to read from or write to them after the socket is closed throws an IOException.
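The effect of close( ) on the streams can be checked with a small experiment. This is a sketch under assumptions: the class name ClosedSocketDemo and the method readAfterCloseFails are hypothetical, and the peer is a loopback listener that never sends anything.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ClosedSocketDemo {

    // Returns true if reading from the socket's input stream after
    // close() throws an IOException.
    static boolean readAfterCloseFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Socket s = new Socket(loopback, server.getLocalPort());
            InputStream in = s.getInputStream();
            s.close(); // also closes the socket's input and output streams
            try {
                in.read();
                return false; // not expected: the read succeeded
            } catch (IOException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterCloseFails()); // prints "true"
    }
}
```

In practice this means the socket should be closed only after all reading and writing is finished, typically in a finally block or a try-with-resources statement.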
Example 5-3 is yet another program that connects to a web server and downloads a specified URL. However, since this one uses raw sockets, it needs to both send the HTTP request and read the headers in the response. These are not parsed away as they are by the URL and URLConnection classes; you use an output stream to send the request explicitly and an input stream to read the data, including HTTP headers, back. Only HTTP URLs are supported.
Example 5-3. The SocketTyper program
    import java.net.*;
    import java.io.*;

    public class SocketTyper {

      public static void main(String[] args) throws IOException {

        if (args.length != 1) {
          System.err.println("Usage: java SocketTyper url1");
          return;
        }

        URL u = new URL(args[0]);
        if (!u.getProtocol( ).equalsIgnoreCase("http")) {
          System.err.println("Sorry, " + u.getProtocol( ) + " is not supported");
          return;
        }

        String host = u.getHost( );
        int port = u.getPort( );
        String file = u.getFile( );
        // getFile( ) returns an empty string when the URL has no path
        if (file == null || file.isEmpty( )) file = "/";
        // default port
        if (port <= 0) port = 80;

        Socket s = null;
        try {
          s = new Socket(host, port);
          // Each header line ends with a carriage return/linefeed pair;
          // an empty line terminates the request header.
          String request = "GET " + file + " HTTP/1.1\r\n"
           + "User-Agent: SocketTyper\r\n"
           + "Accept: text/*\r\n"
           + "Host: " + host + "\r\n"
           + "\r\n";
          byte[] b = request.getBytes("US-ASCII");

          OutputStream out = s.getOutputStream( );
          InputStream in = s.getInputStream( );
          out.write(b);
          out.flush( );

          // Copy the response, headers included, until the stream ends
          for (int c = in.read( ); c != -1; c = in.read( )) {
            System.out.write(c);
          }
          System.out.flush( );
        }
        finally {
          if (s != null && s.isConnected( )) s.close( );
        }
      }
    }
For example, when SocketTyper connects to http://www.oreilly.com/, here is what you see:
    $ java SocketTyper http://www.oreilly.com/
    HTTP/1.1 200 OK
    Date: Mon, 23 May 2005 14:03:17 GMT
    Server: Apache/1.3.33 (Unix) PHP/4.3.10 mod_perl/1.29
    P3P: policyref="http://www.oreillynet.com/w3c/p3p.xml",CP="CAO DSP COR CURa ADMa DEVa TAIa PSAa PSDa IVAa IVDa CONo OUR DELa PUBi OTRa IND PHY ONL UNI PUR COM NAV INT DEM CNT STA PRE"
    Last-Modified: Mon, 23 May 2005 08:20:30 GMT
    ETag: "20653-db8c-4291924e"
    Accept-Ranges: bytes
    Content-Length: 56204
    Content-Type: text/html
    X-Cache: MISS from www.oreilly.com
    ...
Notice the header lines here, which you didn't see in Example 5-1. When you use the URL class to download a web page, the associated protocol handler consumes the HTTP header before you get a stream.
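A program that works at the socket level has to separate the header from the body itself. One possible approach, sketched here under assumptions, is to read bytes until the blank line (a carriage return/linefeed pair immediately followed by another) that ends the header. The class name HeaderSplitter and the method readHeader are hypothetical, and the sample response in main is made up for illustration rather than captured from a real server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HeaderSplitter {

    // Hypothetical helper: consumes bytes up to and including the first
    // CR LF CR LF sequence and returns them as text. The stream is left
    // positioned at the first byte of the body.
    static String readHeader(InputStream in) throws IOException {
        final byte[] end = {'\r', '\n', '\r', '\n'};
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of the terminator have been seen in a row
        for (int c = in.read(); c != -1; c = in.read()) {
            header.write(c);
            if (c == end[matched]) {
                matched++;
            } else {
                // A stray CR may still be the start of the terminator.
                matched = (c == '\r') ? 1 : 0;
            }
            if (matched == end.length) break;
        }
        return new String(header.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample response, used only to exercise readHeader.
        String response = "HTTP/1.1 200 OK\r\n"
            + "Content-Type: text/html\r\n"
            + "\r\n"
            + "<html></html>";
        InputStream in = new ByteArrayInputStream(
            response.getBytes(StandardCharsets.US_ASCII));

        System.out.print(readHeader(in));
        System.out.println("--- body follows ---");
        for (int c = in.read(); c != -1; c = in.read()) {
            System.out.write(c);
        }
        System.out.println();
    }
}
```

In SocketTyper, the same helper could be applied to the stream returned by getInputStream( ) before the copy loop, so that only the body is printed.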