Java Network Programming, Third Edition

     

The java.net.HttpURLConnection class is an abstract subclass of URLConnection; it provides some additional methods that are helpful when working specifically with http URLs:

public abstract class HttpURLConnection extends URLConnection

In particular, it contains methods to get and set the request method, decide whether to follow redirects, get the response code and message, and figure out whether a proxy server is being used. It also includes several dozen mnemonic constants matching the various HTTP response codes. Finally, it overrides the getPermission() method from the URLConnection superclass, although it doesn't change the semantics of this method at all.

Since this class is abstract and its only constructor is protected, you can't directly create instances of HttpURLConnection. However, if you construct a URL object using an http URL and invoke its openConnection( ) method, the URLConnection object returned will be an instance of HttpURLConnection. Cast that URLConnection to HttpURLConnection like this:

URL u = new URL("http://www.amnesty.org/");
URLConnection uc = u.openConnection( );
HttpURLConnection http = (HttpURLConnection) uc;

Or, skipping a step, like this:

URL u = new URL("http://www.amnesty.org/");
HttpURLConnection http = (HttpURLConnection) u.openConnection( );

There's another HttpURLConnection class in the undocumented sun.net.www.protocol.http package, a concrete subclass of java.net.HttpURLConnection that actually implements the abstract connect( ) method:

public class HttpURLConnection extends java.net.HttpURLConnection

There's little reason to access this class directly. It doesn't add any important methods that aren't already declared in java.net.HttpURLConnection or java.net.URLConnection. However, any URLConnection you open to an http URL will be an instance of this class.
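The cast can be guarded with an instanceof test, since URLs with other schemes yield other URLConnection subclasses. The following is a minimal sketch, not a definitive implementation: the class and method names (HttpConnectionCast, openHttp) and the example URLs are assumptions made for illustration, and the concrete class name it prints depends on the Java implementation in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class HttpConnectionCast {

  // Returns the connection as an HttpURLConnection, or null if the URL's
  // protocol handler produced some other kind of URLConnection.
  static HttpURLConnection openHttp(String spec) {
    try {
      // openConnection() only creates the object; nothing is sent yet.
      // URI.toURL() is used here because newer JDKs deprecate the URL constructors.
      URLConnection uc = URI.create(spec).toURL().openConnection();
      return (uc instanceof HttpURLConnection) ? (HttpURLConnection) uc : null;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    HttpURLConnection http = openHttp("http://www.example.com/");
    // The concrete class is implementation-specific.
    System.out.println(http.getClass().getName());
    // A file URL is not served by the HTTP protocol handler.
    System.out.println(openHttp("file:///tmp/") == null);
  }
}
```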

15.11.1 The Request Method

When a web client contacts a web server, the first thing it sends is a request line. Typically, this line begins with GET and is followed by the name of the file that the client wants to retrieve and the version of the HTTP protocol that the client understands. For example:

GET /catalog/jfcnut/index.html HTTP/1.0

However, web clients can do more than simply GET files from web servers. They can POST responses to forms. They can PUT a file on a web server or DELETE a file from a server. And they can ask for just the HEAD of a document. They can ask the web server for a list of the OPTIONS supported at a given URL. They can even TRACE the request itself. All of these are accomplished by changing the request method from GET to a different keyword. For example, here's how a browser asks for just the header of a document using HEAD:

HEAD /catalog/jfcnut/index.html HTTP/1.1
User-Agent: Java/1.4.2_05
Host: www.oreilly.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: close

By default, HttpURLConnection uses the GET method. However, you can change this with the setRequestMethod( ) method:

public void setRequestMethod(String method) throws ProtocolException

The method argument should be one of these seven case-sensitive strings:

  • GET

  • POST

  • HEAD

  • PUT

  • OPTIONS

  • DELETE

  • TRACE

If it's some other method, then a java.net.ProtocolException, a subclass of IOException, is thrown. However, it's generally not enough to simply set the request method. Depending on what you're trying to do, you may need to adjust the HTTP header and provide a message body as well. For instance, POSTing a form requires you to provide a Content-length header. We've already explored the GET and POST methods. Let's look at the other five possibilities.
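The validation performed by setRequestMethod( ) can be observed without any network traffic, because the check happens before the connection is opened. This is a small sketch under stated assumptions: the class name RequestMethodCheck, the helper accepts( ), and the placeholder URL are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

class RequestMethodCheck {

  // Returns true if setRequestMethod() accepts the given method name.
  static boolean accepts(String method) {
    try {
      // The connection object is created but never connected.
      HttpURLConnection http = (HttpURLConnection)
          URI.create("http://www.example.com/").toURL().openConnection();
      http.setRequestMethod(method);
      return method.equals(http.getRequestMethod());
    } catch (ProtocolException ex) {
      return false; // rejected: not one of the seven supported names
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    String[] candidates = {"GET", "HEAD", "TRACE", "get", "PROPFIND"};
    for (String m : candidates) {
      System.out.println(m + " accepted: " + accepts(m));
    }
  }
}
```

The lowercase "get" is included to show that the comparison is case-sensitive.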

Some web servers support additional, nonstandard request methods. For instance, Apache 1.3 also supports CONNECT, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, and UNLOCK. However, Java doesn't support any of these.

15.11.1.1 HEAD

The HEAD method is possibly the simplest of all the request methods. It behaves much like GET. However, it tells the server only to return the HTTP header, not to actually send the file. The most common use of this method is to check whether a file has been modified since the last time it was cached. Example 15-9 is a simple program that uses the HEAD request method and prints the last time a file on a server was modified.

Example 15-9. Get the time when a URL was last changed

import java.net.*;
import java.io.*;
import java.util.*;

public class LastModified {

  public static void main(String args[]) {

    for (int i = 0; i < args.length; i++) {
      try {
        URL u = new URL(args[i]);
        HttpURLConnection http = (HttpURLConnection) u.openConnection( );
        // Ask for the header only; the body is never transferred
        http.setRequestMethod("HEAD");
        System.out.println(u + " was last modified at "
         + new Date(http.getLastModified( )));
      } // end try
      catch (MalformedURLException ex) {
        System.err.println(args[i] + " is not a URL I understand");
      }
      catch (IOException ex) {
        System.err.println(ex);
      }
      System.out.println( );
    } // end for

  } // end main

} // end LastModified

Here's the output from one run:

D:\JAVA\JNP3\examples> java LastModified http://www.ibiblio.org/xml/
http://www.ibiblio.org/xml/ was last modified at Thu Aug 19 06:06:57 PDT 2004

It wasn't absolutely necessary to use the HEAD method here. We'd have gotten the same results with GET. But if we used GET, the entire file at http://www.ibiblio.org/xml/ would have been sent across the network, whereas all we cared about was one line in the header. When you can use HEAD, it's much more efficient to do so.

15.11.1.2 OPTIONS

The OPTIONS request method asks what options are supported for a particular URL. If the request URL is an asterisk (*), the request applies to the server as a whole rather than to one particular URL on the server. For example:

OPTIONS /xml/ HTTP/1.1
User-Agent: Java/1.4.2_05
Host: www.ibiblio.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: close

The server responds to an OPTIONS request by sending an HTTP header with a list of the commands allowed on that URL. For example, when the previous command was sent, here's what Apache responded:

Date: Thu, 21 Oct 2004 18:06:10 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Content-Length: 0
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
Connection: close

The list of legal commands is found in the Allow field. However, in practice these are just the commands the server understands, not necessarily the ones it will actually perform on that URL. For instance, let's look at what happens when you try the DELETE request method.
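From Java, the same exchange amounts to setting the request method to OPTIONS and reading the Allow field from the response header. The sketch below is illustrative only: the class OptionsProbe and its methods are made-up names, and so that it runs without an outside server it talks to a throwaway loopback server built with the JDK's com.sun.net.httpserver.HttpServer, which returns an invented Allow value.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

class OptionsProbe {

  // Sends an OPTIONS request and returns the Allow header, or null if absent.
  static String allowedMethods(URL url) throws IOException {
    HttpURLConnection http = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    http.setRequestMethod("OPTIONS");
    try {
      http.getResponseCode(); // forces the request to be sent
      return http.getHeaderField("Allow");
    } finally {
      http.disconnect();
    }
  }

  // Assumption for the demo: a local stand-in server with a made-up Allow header.
  static String probeLocalServer() {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        exchange.getResponseHeaders().set("Allow", "GET, HEAD, OPTIONS");
        exchange.sendResponseHeaders(200, -1); // -1 means no response body
        exchange.close();
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        return allowedMethods(URI.create("http://127.0.0.1:" + port + "/xml/").toURL());
      } finally {
        server.stop(0);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println("Allow: " + probeLocalServer());
  }
}
```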

15.11.1.3 DELETE

The DELETE method removes a file at a specified URL from a web server. Since this request is an obvious security risk, not all servers will be configured to support it, and those that are will generally demand some sort of authentication. A typical DELETE request looks like this:

DELETE /javafaq/2004march.html HTTP/1.1
User-Agent: Java/1.4.2_05
Host: www.ibiblio.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: close

The server is free to refuse this request or ask for identification. For example:

Date: Thu, 19 Aug 2004 14:32:15 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
content-length: 313

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>405 Method Not Allowed</TITLE>
</HEAD><BODY>
<H1>Method Not Allowed</H1>
The requested method DELETE is not allowed for the URL /javafaq/2004march.html.<P>
<HR>
<ADDRESS>Apache/1.3.4 Server at www.ibiblio.org Port 80</ADDRESS>
</BODY></HTML>

Even if the server accepts this request, its response is implementation-dependent. Some servers may delete the file; others simply move it to a trash directory. Others simply mark it as not readable. Details are left up to the server vendor.
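A client that issues DELETE should therefore be prepared for a refusal. The following sketch shows one way to react to the response code; it is an illustration under assumptions, not a prescribed pattern. The class DeleteAttempt, its methods, and the result strings are invented here, and the loopback server (the JDK's com.sun.net.httpserver.HttpServer) is a stand-in that always answers 405 with a made-up Allow header.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

class DeleteAttempt {

  // Sends DELETE and summarizes how the server reacted.
  static String delete(URL url) throws IOException {
    HttpURLConnection http = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    http.setRequestMethod("DELETE");
    try {
      int code = http.getResponseCode();
      if (code == HttpURLConnection.HTTP_BAD_METHOD) {          // 405
        return "refused; server allows: " + http.getHeaderField("Allow");
      } else if (code == HttpURLConnection.HTTP_UNAUTHORIZED) { // 401
        return "authentication required";
      }
      return code + " " + http.getResponseMessage();
    } finally {
      http.disconnect();
    }
  }

  // Assumption for the demo: a local stand-in server that refuses every request.
  static String deleteFromLocalServer() {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        exchange.getResponseHeaders().set("Allow", "GET, HEAD");
        exchange.sendResponseHeaders(405, -1); // no response body
        exchange.close();
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        return delete(URI.create("http://127.0.0.1:" + port + "/javafaq/2004march.html").toURL());
      } finally {
        server.stop(0);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(deleteFromLocalServer());
  }
}
```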

15.11.1.4 PUT

Many HTML editors and other programs that want to store files on a web server use the PUT method. It allows clients to place documents in the abstract hierarchy of the site without necessarily knowing how the site maps to the actual local filesystem. This contrasts with FTP, where the user has to know the actual directory structure as opposed to the server's virtual directory structure.

Here's how a browser might PUT a file on a web server:

PUT /hello.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.6 [en] (WinNT; I)
Pragma: no-cache
Host: www.ibiblio.org
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Content-Length: 364

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="Author" content="Elliotte Rusty Harold">
   <meta name="GENERATOR" content="Mozilla/4.6 [en] (WinNT; I) [Netscape]">
   <title>Mine</title>
</head>
<body>
<b>Hello</b>
</body>
</html>

As with deleting files, allowing arbitrary users to PUT files on your web server is a clear security risk. Generally, some sort of authentication is required and the server must be specially configured to support PUT. The details are likely to vary from server to server. Most web servers do not include full support for PUT out of the box. For instance, Apache requires you to install an additional module just to handle PUT requests.
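With HttpURLConnection, a PUT request is a matter of setting the request method, enabling output, and writing the document to the connection's OutputStream. The sketch below is a minimal illustration under assumptions: PutUpload and its methods are invented names, and the loopback server (the JDK's com.sun.net.httpserver.HttpServer) is a hypothetical stand-in that accepts any PUT and replies 201. A real server would also need to be configured for PUT and would likely require authentication.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PutUpload {

  // PUTs the bytes to the URL and returns the server's response code.
  static int put(URL url, byte[] body, String contentType) throws IOException {
    HttpURLConnection http = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    http.setRequestMethod("PUT");
    http.setDoOutput(true); // this request carries a message body
    http.setRequestProperty("Content-Type", contentType);
    try (OutputStream out = http.getOutputStream()) {
      out.write(body);
    }
    try {
      return http.getResponseCode();
    } finally {
      http.disconnect();
    }
  }

  // Assumption for the demo: a local stand-in server that accepts any PUT.
  static int putToLocalServer() {
    byte[] page = "<html><body><b>Hello</b></body></html>".getBytes(StandardCharsets.UTF_8);
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        int received = 0;
        try (InputStream in = exchange.getRequestBody()) {
          while (in.read() != -1) received++;
        }
        boolean ok = "PUT".equals(exchange.getRequestMethod()) && received == page.length;
        exchange.sendResponseHeaders(ok ? 201 : 400, -1); // no response body
        exchange.close();
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        URL target = URI.create("http://127.0.0.1:" + port + "/hello.html").toURL();
        return put(target, page, "text/html");
      } finally {
        server.stop(0);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println("Response code: " + putToLocalServer());
  }
}
```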

15.11.1.5 TRACE

The TRACE request method asks the server to send back the HTTP header that it received from the client. The main reason for requesting this information is to see what any proxy servers between the server and client might be changing. For example, suppose this TRACE request is sent:

TRACE /xml/ HTTP/1.1
Hello: Push me
User-Agent: Java/1.4.2_05
Host: www.ibiblio.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: close

The server should respond like this:

Date: Thu, 19 Aug 2004 17:50:02 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Connection: close
Transfer-Encoding: chunked
Content-Type: message/http
content-length: 169

TRACE /xml/ HTTP/1.1
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: close
Hello: Push me
Host: www.ibiblio.org
User-Agent: Java/1.4.2_05

The first six lines are the server's normal response HTTP header. The lines from TRACE /xml/ HTTP/1.1 on are the echo of the original client request. In this case, the echo is faithful, although out of order. However, if there were a proxy server between the client and server, it might not be.

15.11.2 Disconnecting from the Server

Recent versions of HTTP support what's known as Keep-Alive. Keep-Alive enhances the performance of some web connections by allowing multiple requests and responses to be sent in a series over a single TCP connection. A client indicates that it's willing to use HTTP Keep-Alive by including a Connection field in the HTTP request header with the value Keep-Alive:

Connection: Keep-Alive

However, when Keep-Alive is used, the server can no longer close the connection simply because it has sent the last byte of data to the client. The client may, after all, send another request. Consequently, it is up to the client to close the connection when it's done.

Java marginally supports HTTP Keep-Alive, mostly by piggybacking on top of browser support. It doesn't provide any convenient API for making multiple requests over the same connection. However, in anticipation of a day when Java will better support Keep-Alive, the HttpURLConnection class adds a disconnect( ) method that allows the client to break the connection:

public abstract void disconnect( )

In practice, you rarely if ever need to call this.

15.11.3 Handling Server Responses

The first line of an HTTP server's response includes a numeric code and a message indicating what sort of response is made. For instance, the most common response is 200 OK, indicating that the requested document was found. For example:

HTTP/1.1 200 OK
Date: Fri, 20 Aug 2004 15:33:40 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Last-Modified: Sun, 06 Jun 1999 16:30:33 GMT
ETag: "28d907-657-375aa229"
Accept-Ranges: bytes
Content-Length: 1623
Connection: close
Content-Type: text/html

<HTML>
<HEAD>
rest of document follows...

Another response that you're undoubtedly all too familiar with is 404 Not Found, indicating that the URL you requested no longer points to a document. For example:

HTTP/1.1 404 Not Found
Date: Fri, 20 Aug 2004 15:39:16 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Last-Modified: Mon, 20 Sep 1999 19:25:05 GMT
ETag: "5-14ab-37e68a11"
Accept-Ranges: bytes
Content-Length: 5291
Connection: close
Content-Type: text/html

<html>
<head>
<title>Lost ... and lost</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
<div align="left">
<h1>404 FILE NOT FOUND</h1>
Rest of error message follows...

There are many other, less common responses. For instance, code 301 indicates that the resource has permanently moved to a new location and the browser should redirect itself to the new location and update any bookmarks that point to the old location. For example:

HTTP/1.1 301 Moved Permanently
Date: Fri, 20 Aug 2004 15:36:44 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 mod_perl/1.17
Location: http://www.ibiblio.org/javafaq/books/beans/index.html
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.ibiblio.org/javafaq/books/beans/index.html">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.4 Server at www.ibiblio.org Port 80</ADDRESS>
</BODY></HTML>

The first line of this response is called the response message. It will not be returned by the various getHeaderField( ) methods in URLConnection. However, HttpURLConnection has a method to read and return just the response message. This is the aptly named getResponseMessage( ):

public String getResponseMessage( ) throws IOException

Often all you need from the response message is the numeric response code. HttpURLConnection also has a getResponseCode( ) method to return this as an int:

public int getResponseCode( ) throws IOException

HTTP 1.0 defines 16 response codes. HTTP 1.1 expands this to 40 different codes. While some numbers, notably 404, have become slang almost synonymous with their semantic meaning, most of them are less familiar. The HttpURLConnection class includes 36 named constants representing the most common response codes. These are summarized in Table 15-3.

Table 15-3. The HTTP 1.1 response codes

| Code | Meaning | HttpURLConnection constant |
|------|---------|----------------------------|
| 1XX | Informational | |
| 100 | The server is prepared to accept the request body and the client should send it; a new feature in HTTP 1.1 that allows clients to ask whether the server will accept a request before they send a large amount of data as part of the request. | N/A |
| 101 | The server accepts the client's request in the Upgrade header field to change the application protocol; e.g., from HTTP 1.0 to HTTP 1.1. | N/A |
| 2XX | Request succeeded. | |
| 200 | The most common response code. If the request method was GET or POST, the requested data is contained in the response along with the usual headers. If the request method was HEAD, only the header information is included. | HTTP_OK |
| 201 | The server has created a resource at the URL specified in the body of the response. The client should now attempt to load that URL. This code is sent only in response to POST requests. | HTTP_CREATED |
| 202 | This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete, so no response can be returned. However, the server should return an HTML page that explains the situation to the user and provides an estimate of when the request is likely to be completed and, ideally, a link to a status monitor of some kind. | HTTP_ACCEPTED |
| 203 | The resource representation was returned from a caching proxy or other local source and is not guaranteed to be up to date. | HTTP_NOT_AUTHORITATIVE |
| 204 | The server has successfully processed the request but has no information to send back to the client. This is normally the result of a poorly written form-processing program on the server that accepts data but does not return a response to the user. | HTTP_NO_CONTENT |
| 205 | The server has successfully processed the request but has no information to send back to the client. Furthermore, the client should clear the form to which the request is sent. | HTTP_RESET |
| 206 | The server has returned the part of the document the client requested using the byte range extension to HTTP, rather than the whole document. | HTTP_PARTIAL |
| 3XX | Relocation and redirection. | |
| 300 | The server is providing a list of different representations (e.g., PostScript and PDF) for the requested document. | HTTP_MULT_CHOICE |
| 301 | The resource has moved to a new URL. The client should automatically load the resource at this URL and update any bookmarks that point to the old URL. | HTTP_MOVED_PERM |
| 302 | The resource is at a new URL temporarily, but its location will change again in the foreseeable future; therefore, bookmarks should not be updated. | HTTP_MOVED_TEMP |
| 303 | Generally used in response to a POST form request, this code indicates that the user should retrieve a document other than the one requested (as opposed to a different location for the requested document). | HTTP_SEE_OTHER |
| 304 | The If-Modified-Since header indicates that the client wants the document only if it has been recently updated. This status code is returned if the document has not been updated. In this case, the client should load the document from its cache. | HTTP_NOT_MODIFIED |
| 305 | The Location header field contains the address of a proxy that will serve the response. | HTTP_USE_PROXY |
| 307 | Almost the same as code 302, a 307 response indicates that the resource has moved to a new URL, although it may move again to a different URL in the future. The client should automatically load the page at this URL. | N/A |
| 4XX | Client error. | |
| 400 | The client request to the server used improper syntax. This is rather unusual in normal web browsing but more common when debugging custom clients. | HTTP_BAD_REQUEST |
| 401 | Authorization, generally a username and password, is required to access this page. Either a username and password have not yet been presented or the username and password are invalid. | HTTP_UNAUTHORIZED |
| 402 | Not used today, but may be used in the future to indicate that some sort of digital cash transaction is required to access the resource. | HTTP_PAYMENT_REQUIRED |
| 403 | The server understood the request, but is deliberately refusing to process it. Authorization will not help. This might be used when access to a certain page is denied to a certain range of IP addresses. | HTTP_FORBIDDEN |
| 404 | This most common error response indicates that the server cannot find the requested resource. It may indicate a bad link, a document that has moved with no forwarding address, a mistyped URL, or something similar. | HTTP_NOT_FOUND |
| 405 | The request method is not allowed for the specified resource; for instance, you tried to PUT a file on a web server that doesn't support PUT or tried to POST to a URI that only allows GET. | HTTP_BAD_METHOD |
| 406 | The requested resource cannot be provided in a format the client is willing to accept, as indicated by the Accept field of the request HTTP header. | HTTP_NOT_ACCEPTABLE |
| 407 | An intermediate proxy server requires authentication from the client, probably in the form of a username and password, before it will retrieve the requested resource. | HTTP_PROXY_AUTH |
| 408 | The client took too long to send the request, perhaps because of network congestion. | HTTP_CLIENT_TIMEOUT |
| 409 | A temporary conflict prevents the request from being fulfilled; for instance, two clients are trying to PUT the same file at the same time. | HTTP_CONFLICT |
| 410 | Like a 404, but makes a stronger assertion about the existence of the resource. The resource has been deliberately deleted (not moved) and will not be restored. Links to it should be removed. | HTTP_GONE |
| 411 | The client was required to send a Content-length field in the request HTTP header but did not. | HTTP_LENGTH_REQUIRED |
| 412 | A condition for the request that the client specified in the request HTTP header is not satisfied. | HTTP_PRECON_FAILED |
| 413 | The body of the client request is larger than the server is able to process at this time. | HTTP_ENTITY_TOO_LARGE |
| 414 | The URI of the request is too long. This is important to prevent certain buffer overflow attacks. | HTTP_REQ_TOO_LONG |
| 415 | The server does not understand or accept the MIME content-type of the request body. | HTTP_UNSUPPORTED_TYPE |
| 416 | The server cannot send the byte range the client requested. | N/A |
| 417 | The server cannot meet the client's expectation given in an Expect request header field. | N/A |
| 5XX | Server error. | |
| 500 | An unexpected condition occurred that the server does not know how to handle. | HTTP_SERVER_ERROR, HTTP_INTERNAL_ERROR |
| 501 | The server does not have a feature that is needed to fulfill this request. A server that cannot handle POST requests might send this response to a client that tried to POST form data to it. | HTTP_NOT_IMPLEMENTED |
| 502 | This code is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request. | HTTP_BAD_GATEWAY |
| 503 | The server is temporarily unable to handle the request, perhaps due to overloading or maintenance. | HTTP_UNAVAILABLE |
| 504 | The proxy server did not receive a response from the upstream server within a reasonable amount of time, so it can't send the desired response to the client. | HTTP_GATEWAY_TIMEOUT |
| 505 | The server does not support the version of HTTP the client is using (e.g., the as-yet-nonexistent HTTP 2.0). | HTTP_VERSION |
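The named constants from Table 15-3 can take the place of bare numbers when a program branches on the value returned by getResponseCode( ). The following is a small sketch of that idea, assuming nothing beyond the constants themselves; the class ResponseCodes and its two helper methods are invented for illustration.

```java
import java.net.HttpURLConnection;

class ResponseCodes {

  // Maps a response code to the broad category used in Table 15-3.
  static String category(int code) {
    switch (code / 100) {
      case 1: return "Informational";
      case 2: return "Request succeeded";
      case 3: return "Relocation and redirection";
      case 4: return "Client error";
      case 5: return "Server error";
      default: return "Unknown";
    }
  }

  // True for the codes that point the client at a new URL.
  static boolean isRedirect(int code) {
    return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
        || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
        || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
        || code == 307; // no named constant in HttpURLConnection
  }

  public static void main(String[] args) {
    int[] samples = {HttpURLConnection.HTTP_OK, HttpURLConnection.HTTP_MOVED_PERM,
                     HttpURLConnection.HTTP_NOT_FOUND, HttpURLConnection.HTTP_UNAVAILABLE};
    for (int code : samples) {
      System.out.println(code + ": " + category(code) + ", redirect=" + isRedirect(code));
    }
  }
}
```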

Example 15-10 is a revised source viewer program that now includes the response message. The lines added since SourceViewer2 are in bold.

Example 15-10. A SourceViewer that includes the response code and message

import java.net.*;
import java.io.*;

public class SourceViewer3 {

  public static void main (String[] args) {

    for (int i = 0; i < args.length; i++) {
      try {
        // Open the URLConnection for reading
        URL u = new URL(args[i]);
        HttpURLConnection uc = (HttpURLConnection) u.openConnection( );
        int code = uc.getResponseCode( );
        String response = uc.getResponseMessage( );
        System.out.println("HTTP/1.x " + code + " " + response);
        // Header field 0 is the status line, so start at 1
        for (int j = 1; ; j++) {
          String header = uc.getHeaderField(j);
          String key = uc.getHeaderFieldKey(j);
          if (header == null || key == null) break;
          System.out.println(key + ": " + header);
        } // end for
        InputStream in = new BufferedInputStream(uc.getInputStream( ));
        // chain the InputStream to a Reader
        Reader r = new InputStreamReader(in);
        int c;
        while ((c = r.read( )) != -1) {
          System.out.print((char) c);
        }
      }
      catch (MalformedURLException ex) {
        System.err.println(args[i] + " is not a parseable URL");
      }
      catch (IOException ex) {
        System.err.println(ex);
      }
    } // end for

  } // end main

} // end SourceViewer3

The only thing this program doesn't read that the server sends is the version of HTTP the server is using. There's currently no method to return that. If you need it, you'll just have to use a raw socket instead. Consequently, in this example, we just fake it as "HTTP/1.x", like this:

% java SourceViewer3 http://www.oreilly.com
HTTP/1.x 200 OK
Server: WN/1.15.1
Date: Mon, 01 Nov 1999 23:39:19 GMT
Last-modified: Fri, 29 Oct 1999 23:40:06 GMT
Content-type: text/html
Title: www.oreilly.com -- Welcome to O'Reilly & Associates! -- computer books, software, online publishing
Link: <mailto:webmaster@ora.com>; rev="Made"
<HTML>
<HEAD>
...

15.11.3.1 Error conditions

On occasion, the server encounters an error but returns useful information in the message body nonetheless. For example, when a client requests a nonexistent page from the www.ibiblio.org web site, rather than simply returning a 404 error code, the server sends the search page shown in Figure 15-2 to help the user figure out where the missing page might have gone.

Figure 15-2. IBiblio's 404 page

The getErrorStream( ) method returns an InputStream containing this data or null if no error was encountered or no data returned:

public InputStream getErrorStream( ) // Java 1.2

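In practice, whether this is necessary depends on the implementation. Some implementations return this data from getInputStream( ) as well. Others, including the Sun protocol handler in current JDKs, throw an IOException from getInputStream( ) when the response code is 400 or higher, in which case getErrorStream( ) is the only way to read the body of the error response.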
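One defensive way to read the body regardless of outcome is to check the response code first and choose the stream accordingly. The sketch below illustrates that approach under stated assumptions: ErrorBody and its methods are made-up names, and the loopback server (the JDK's com.sun.net.httpserver.HttpServer) is a stand-in that answers every request with a 404 and a short invented message.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ErrorBody {

  // Returns the response body whether the request succeeded or failed.
  static String fetch(URL url) throws IOException {
    HttpURLConnection http = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    try {
      int code = http.getResponseCode();
      // For 4xx and 5xx responses read the error stream, which is null
      // when the server sent no body.
      InputStream in = (code >= 400) ? http.getErrorStream() : http.getInputStream();
      if (in == null) return "";
      try (InputStream body = in) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = body.read(chunk)) != -1; ) {
          buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
      }
    } finally {
      http.disconnect();
    }
  }

  // Assumption for the demo: a local stand-in server that always answers 404.
  static String fetchMissingPage() {
    byte[] page = "Try the search page".getBytes(StandardCharsets.UTF_8);
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        exchange.sendResponseHeaders(404, page.length);
        exchange.getResponseBody().write(page);
        exchange.close();
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        return fetch(URI.create("http://127.0.0.1:" + port + "/missing.html").toURL());
      } finally {
        server.stop(0);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(fetchMissingPage());
  }
}
```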

15.11.3.2 Redirects

The 300-level response codes all indicate some sort of redirect; that is, the requested resource is no longer available at the expected location but it may be found at some other location. When encountering such a response, most browsers automatically load the document from its new location. However, this can be a security risk, because it has the potential to move the user from a trusted site to an untrusted one, perhaps without the user even noticing.

By default, an HttpURLConnection follows redirects. However, the HttpURLConnection class has two static methods that let you decide whether to follow redirects:

public static boolean getFollowRedirects( )
public static void setFollowRedirects(boolean follow)

The getFollowRedirects( ) method returns true if redirects are being followed, false if they aren't. With an argument of true, the setFollowRedirects( ) method makes HttpURLConnection objects follow redirects. With an argument of false, it prevents them from following redirects. Since these are static methods, they change the behavior of all HttpURLConnection objects constructed after the method is invoked. The setFollowRedirects( ) method may throw a SecurityException if the security manager disallows the change. Applets especially are not allowed to change this value.

Java has two methods to configure redirection on an instance-by-instance basis. These are:

public boolean getInstanceFollowRedirects( ) // Java 1.3
public void setInstanceFollowRedirects(boolean followRedirects) // Java 1.3

If setInstanceFollowRedirects( ) is not invoked on a given HttpURLConnection, that HttpURLConnection simply follows the default behavior as set by the class method HttpURLConnection.setFollowRedirects( ).
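The per-instance setting lets one connection opt out of redirects without touching the class-wide default. Here is a brief sketch; the class RedirectPolicy, the helper openWithoutRedirects( ), and the placeholder URL are assumptions made for the example, and no request is actually sent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectPolicy {

  // Creates (but does not connect) a connection that will not follow
  // redirects, leaving the class-wide default untouched.
  static HttpURLConnection openWithoutRedirects(String spec) {
    try {
      HttpURLConnection http =
          (HttpURLConnection) URI.create(spec).toURL().openConnection();
      http.setInstanceFollowRedirects(false);
      return http;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    HttpURLConnection http = openWithoutRedirects("http://www.example.com/");
    System.out.println("class default: " + HttpURLConnection.getFollowRedirects());
    System.out.println("this instance: " + http.getInstanceFollowRedirects());
  }
}
```

A client configured this way sees the 3XX response code and the Location header itself and can decide whether the new location is trustworthy before requesting it.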

15.11.4 Proxies

Many users behind firewalls or using AOL or other high-volume ISPs access the web through proxy servers. The usingProxy( ) method tells you whether the particular HttpURLConnection is going through a proxy server:

public abstract boolean usingProxy( ) // Java 1.3

It returns true if a proxy is being used, false if not. In some contexts, the use of a proxy server may have security implications.

15.11.5 Streaming Mode

Every request sent to an HTTP server has an HTTP header. One field in this header is the Content-length; that is, the number of bytes in the body of the request. The header comes before the body. However, to write the header you need to know the length of the body, which you may not have yet. Normally the way Java solves this Catch-22 is by caching everything you write onto the OutputStream retrieved from the HttpURLConnection until the stream is closed. At that point, it knows how many bytes are in the body so it has enough information to write the Content-length header.

This scheme is fine for small requests sent in response to typical web forms. However, it's burdensome for responses to very long forms or some SOAP messages. It's very wasteful and slow for medium-to-large documents sent with HTTP PUT. It's much more efficient if Java doesn't have to wait for the last byte of data to be written before sending the first byte of data over the network. Java 1.5 offers two solutions to this problem. If you know the size of your data (for instance, you're uploading a file of known size using HTTP PUT), you can tell the HttpURLConnection object the size of that data. If you don't know the size of the data in advance, then you can use chunked transfer encoding instead. In chunked transfer encoding, the body of the request is sent in multiple pieces, each with its own separate content length. To turn on chunked transfer encoding, just pass the size of the chunks you want to the setChunkedStreamingMode( ) method before you connect the URL:

public void setChunkedStreamingMode(int chunkLength) // Java 1.5

Java will then use a slightly different form of HTTP than the examples in this book. However, to the Java programmer the difference is irrelevant. As long as you're using the URLConnection class instead of raw sockets and as long as the server supports chunked transfer encoding, it should all just work without any further changes to your code. However, not all servers support chunked encoding, though most of the late-model, major ones do. Even more importantly, chunked transfer encoding does get in the way of authentication and redirection. If you're trying to send chunked files to a redirected URL or one that requires password authentication, an HttpRetryException will be thrown. You'll then need to retry the request at the new URL or at the old URL with the appropriate credentials; and this all needs to be done manually without the full support of the HTTP protocol handler you normally have. Therefore, don't use chunked transfer encoding unless you really need it. As with most performance advice, this means you shouldn't implement this optimization until measurements prove the non-streaming default is a bottleneck.
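A chunked upload looks almost the same as an ordinary POST or PUT; the only addition is the call to setChunkedStreamingMode( ) before the output stream is opened. The sketch below is an illustration under assumptions rather than a recommended recipe: ChunkedUpload and its methods are invented names, the 4,096-byte chunk size is arbitrary, and the loopback server (the JDK's com.sun.net.httpserver.HttpServer) is a stand-in that replies 200 only if the request arrived chunked and complete.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

class ChunkedUpload {

  // Streams data of unknown length to the URL and returns the response code.
  static int upload(URL url, InputStream data) throws IOException {
    HttpURLConnection http = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    http.setRequestMethod("POST");
    http.setDoOutput(true);
    http.setChunkedStreamingMode(4096); // must be called before connecting
    try (OutputStream out = http.getOutputStream()) {
      byte[] buffer = new byte[4096];
      for (int n; (n = data.read(buffer)) != -1; ) {
        out.write(buffer, 0, n); // sent in chunks rather than cached whole
      }
    }
    try {
      return http.getResponseCode();
    } finally {
      http.disconnect();
    }
  }

  // Assumption for the demo: a local stand-in server that checks the upload.
  static int uploadToLocalServer(int size) {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
      server.createContext("/", exchange -> {
        boolean chunked = "chunked".equalsIgnoreCase(
            exchange.getRequestHeaders().getFirst("Transfer-Encoding"));
        int received = 0;
        try (InputStream in = exchange.getRequestBody()) {
          while (in.read() != -1) received++;
        }
        exchange.sendResponseHeaders(chunked && received == size ? 200 : 400, -1);
        exchange.close();
      });
      server.start();
      try {
        int port = server.getAddress().getPort();
        URL target = URI.create("http://127.0.0.1:" + port + "/upload").toURL();
        return upload(target, new ByteArrayInputStream(new byte[size]));
      } finally {
        server.stop(0);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println("Response code: " + uploadToLocalServer(10000));
  }
}
```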

If you do happen to know the size of the request data in advance, Java 1.5 lets you optimize the connection by providing this information to the HttpURLConnection object. If you do this Java can start streaming the data over the network immediately. Otherwise, it has to cache everything you write in order to determine the content length, and only send it over the network after you've closed the stream. If you know exactly how big your data is, pass that number to the setFixedLengthStreamingMode( ) method:

public void setFixedLengthStreamingMode(int contentLength)

Java will use this number in the Content-length HTTP header field. However, if you then try to write more or less than the number of bytes given here, Java will throw an IOException. Of course, that will happen later, when you're writing data, not when you first call this method. The setFixedLengthStreamingMode( ) method itself will throw an IllegalArgumentException if you pass in a negative number, or an IllegalStateException if the connection is connected or has already been set to chunked transfer encoding. (You can't use both chunked transfer encoding and fixed-length streaming mode on the same request.)
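The argument and state checks described above happen when the method is called, so they can be demonstrated without connecting to anything. This is a short sketch; the class StreamingModeRules, its helpers, and the placeholder URL are assumptions introduced for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class StreamingModeRules {

  // Creates an unconnected HttpURLConnection to a placeholder URL.
  static HttpURLConnection open() {
    try {
      return (HttpURLConnection)
          URI.create("http://www.example.com/upload").toURL().openConnection();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  // Returns the simple name of the exception thrown, or "none".
  static String tryFixedLength(HttpURLConnection http, int length) {
    try {
      http.setFixedLengthStreamingMode(length);
      return "none";
    } catch (IllegalArgumentException | IllegalStateException ex) {
      return ex.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // A valid length on a fresh connection is accepted.
    System.out.println(tryFixedLength(open(), 1024));
    // A negative length is rejected.
    System.out.println(tryFixedLength(open(), -1));
    // Fixed-length mode cannot follow chunked mode on the same connection.
    HttpURLConnection chunked = open();
    chunked.setChunkedStreamingMode(4096);
    System.out.println(tryFixedLength(chunked, 1024));
  }
}
```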

Fixed-length streaming mode is transparent on the server side. Servers neither know nor care how the Content-length was set as long as it's correct. However, like chunked transfer encoding, streaming mode does interfere with authentication and redirection. If either of these is required for a given URL, an HttpRetryException will be thrown; you have to manually retry. Therefore, don't use this mode unless you really need it.
