JavaScript by Example (2nd Edition)

2017-07-07 02:10:07

C.2.1 The HTTP Server

On the Internet, communication is also handled by a TCP/IP connection. The Web is based on this model. The server side responds to client (browser) requests and provides feedback by sending back a document, by executing a CGI program, or by issuing an error message. The network protocol that is used by the Web so that the server and client know how to talk to each other is the Hypertext Transfer Protocol, or HTTP. HTTP does not replace TCP/IP; it runs on top of it. HTTP objects are mapped onto the transport data units, a process that is beyond the scope of this discussion and that goes unnoticed by the typical Web user. (See www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html for a technical description of HTTP.) The HTTP protocol was built for the Web to handle hypermedia information; it is object-oriented and stateless. In object-oriented terminology, the documents and files are called objects, and the operations that are associated with the HTTP protocol are called methods. When a protocol is stateless, neither the client nor the server stores information about the other from one request to the next; each manages its own state information.

Once a TCP/IP connection is established between the Web server and client, the client will request some service from the server. Web servers are normally located at well-known TCP port 80. The client tells the server what type of data it can handle by sending Accept statements with its requests. For example, one client may accept only HTML text, whereas another client might accept sounds and images as well as text. The server will try to handle the request (requests and responses are in ASCII text) and send back whatever information it can to the client (browser).

Example C.1

(Client's (Browser) Request)
GET /pub HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.0 Gold
Host: severname.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
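A request like the one in Example C.1 is plain text: a request line, one header per line, and a blank line to end the header section. The following is a minimal sketch of how such text could be assembled. The function name buildRequest is hypothetical and not part of any browser or library; the header values are copied from the example.

```javascript
// Hypothetical helper (not a library function): assemble the text of an
// HTTP request from a method, a path, a version, and a set of headers.
function buildRequest(method, path, version, headers) {
  // Request line: "<method> <path> HTTP/<version>"
  const lines = [`${method} ${path} HTTP/${version}`];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`);
  }
  // Lines end with CRLF; an empty line marks the end of the headers.
  return lines.join("\r\n") + "\r\n\r\n";
}

const requestText = buildRequest("GET", "/pub", "1.0", {
  "Connection": "Keep-Alive",
  "User-Agent": "Mozilla/4.0 Gold",
  "Host": "severname.com",
  "Accept": "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*",
});

console.log(requestText);
```

The Accept header is where the client lists the types of data it can handle; the final */* entry means any type is acceptable.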

Example C.2

(Server's Response)
HTTP/1.1 200 OK
Server: Apache/1.2b8
Date: Mon, 22 Jan 2001 13:43:22 GMT
Last-modified: Mon, 01 Dec 2000 12:15:33
Content-length: 288
Accept-Ranges: bytes
Connection: close
Content-type: text/html

<HTML><HEAD><TITLE>Hello World!</TITLE>
---continue with body---
</HTML>
Connection closed by foreign host.

The response begins with a status line that confirms what HTTP version was used and gives the status code describing the results of the server's attempt (did it succeed or fail?). It is followed by a header and data. The header part of the message indicates what type of data is being returned (for example, the content type may be text/html) and how many bytes are being sent. The data part contains the actual text being sent.
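To make the three parts of a response concrete, here is a rough sketch that splits response text into its status line, headers, and data. The function parseResponse is a hypothetical name used only for illustration, and the sample text is a shortened version of Example C.2; a real client would also need to handle cases this sketch ignores, such as headers that span several lines.

```javascript
// Hypothetical helper: split raw response text into status, headers, body.
function parseResponse(text) {
  // Treat CRLF and LF line endings alike.
  const normalized = text.replace(/\r\n/g, "\n");
  // The first blank line separates the headers from the data.
  const split = normalized.indexOf("\n\n");
  const head = split === -1 ? normalized : normalized.slice(0, split);
  const body = split === -1 ? "" : normalized.slice(split + 2);

  const [statusLine, ...headerLines] = head.split("\n");
  // Status line: "HTTP/<version> <code> <reason phrase>"
  const match = /^HTTP\/(\d+\.\d+) (\d{3}) ?(.*)$/.exec(statusLine);
  if (!match) throw new Error("Malformed status line: " + statusLine);

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip anything that is not "Name: value"
    // Header names are case-insensitive, so store them in lowercase.
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return {
    version: match[1],
    status: Number(match[2]),
    reason: match[3],
    headers,
    body,
  };
}

const sampleResponse = [
  "HTTP/1.1 200 OK",
  "Server: Apache/1.2b8",
  "Date: Mon, 22 Jan 2001 13:43:22 GMT",
  "Content-length: 288",
  "Content-type: text/html",
  "",
  "<HTML><HEAD><TITLE>Hello World!</TITLE>",
].join("\r\n");

const response = parseResponse(sampleResponse);
console.log(response.version, response.status, response.reason);
console.log(response.headers["content-type"], response.headers["content-length"]);
```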

The user then sees a formatted page on the screen, which may contain highlighted hyperlinks to some other page. Regardless of whether the user clicks on a hyperlink, once the document is displayed, that transaction is completed and the TCP/IP connection will be closed. Once closed, a new connection will be started if there is another request. What happened in the last transaction is of no interest to either client or server; in other words, the protocol is stateless.

HTTP is also used to communicate between browsers, proxies, and gateways to other Internet systems supported by FTP, Gopher, WAIS, and NNTP protocols.

C.2.2 HTTP Status Codes and the Log Files

When the server responds to the client, it sends information that includes the way it handled the request. Most Web browsers handle these codes silently if they fall in the range between 100 and 300. The codes within the 100 range are informational, indicating that the client's request is being processed. The most common status code is 200, indicating success, which means the information requested was accepted and fulfilled.

Check your server's access log to see what status codes were sent by your server after a transaction was completed. ^[1] The following example consists of excerpts taken from the Apache server's access log. This log reports information about a request handled by the server and the status code generated as a result of the request. The error log contains any standard error messages that the program would ordinarily send to the screen, such as syntax or compiler errors.

^[1] For more detailed information on status codes, see www.w3.org/Protocols/HTTP/HTRESP.html

Table C.1. HTTP status codes

Status Code	Message
100	Continue
200	Success, OK
204	No Content
301	Document Moved
400	Bad Request
401	Unauthorized
403	Forbidden
404	Not Found
500	Internal Server Error
501	Not Implemented
503	Service Unavailable
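The first digit of a status code identifies its general class, which is why a program can often react to the range rather than to each individual code. The sketch below groups codes that way. The function name statusClass is hypothetical, and the class labels follow the usual HTTP grouping (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error) rather than anything specific to Table C.1.

```javascript
// Hypothetical helper: map a status code to its general class
// based on the hundreds digit.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) {
    return "unknown";
  }
  const classes = [
    "informational", // 100-199
    "success",       // 200-299
    "redirection",   // 300-399
    "client error",  // 400-499
    "server error",  // 500-599
  ];
  return classes[Math.floor(code / 100) - 1];
}

for (const code of [100, 200, 204, 301, 404, 500, 503]) {
  console.log(code, statusClass(code));
}
```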

Example C.3

(From Apache's Access log)
1 susan - - [06/Jul/1997:14:32:23 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 500 633
2 susan - - [16/Jun/1997:11:27:32 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 200 1325
3 susan - - [07/Jul/1997:09:03:20 -0700] "GET /htdocs/index.html HTTP/1.0" 404 170

EXPLANATION

1 The hostname is susan (the access log records the host that made the request), followed by two dashes indicating unknown values, such as the client's identity and the authenticated user ID. The bracketed field is the time the request was logged. The type of request is GET (see "The GET Method" on page 632), and the file accessed was hello.cgi. The protocol is HTTP/1.0. The status code sent by the server was 500, Internal Server Error, meaning that there was some internal error, such as a syntax error in the program, hello.cgi. The browser's request was not fulfilled. The number of bytes sent was 633.

2 Status code 200 indicates success! The request was fulfilled.

3 Status code 404, Not Found, means that the server found nothing matching the URL requested.
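Because each access log entry follows the same layout, the fields described above can be pulled apart with a regular expression. The following is a sketch only: the pattern and the function name parseLogLine are assumptions written to match the three entries in Example C.3, and a server configured with a different log format would need a different pattern.

```javascript
// Assumed layout of one entry: host, two placeholder fields,
// [timestamp], "request line", status code, byte count.
const LOG_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/;

// Hypothetical helper: turn one log line into an object, or null
// when the line does not follow the expected layout.
function parseLogLine(line) {
  const match = LOG_PATTERN.exec(line.trim());
  if (!match) return null;
  // The quoted request line holds the method, the path, and the protocol.
  const [method, path, protocol] = match[5].split(" ");
  return {
    host: match[1],
    time: match[4],
    method,
    path,
    protocol,
    status: Number(match[6]),
    bytes: match[7] === "-" ? 0 : Number(match[7]),
  };
}

const logLines = [
  'susan - - [06/Jul/1997:14:32:23 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 500 633',
  'susan - - [16/Jun/1997:11:27:32 -0700] "GET /cgi-bin/hello.cgi HTTP/1.0" 200 1325',
  'susan - - [07/Jul/1997:09:03:20 -0700] "GET /htdocs/index.html HTTP/1.0" 404 170',
];

for (const line of logLines) {
  const entry = parseLogLine(line);
  const outcome = entry.status >= 400 ? "failed" : "ok";
  console.log(`${entry.path}: ${entry.status} (${outcome}), ${entry.bytes} bytes`);
}
```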

C.2.3 The URL (Uniform Resource Locator)

URLs are what you use to get around on the Web. You click on a hotlink and you are transported to some new page, or you type a URL in the browser's Location box and a file opens up or a script runs. It is a virtual address that specifies the location of pages, objects, scripts, etc. It refers to an existing protocol such as HTTP, Gopher, FTP, mailto, file, Telnet, or news (see Table C.2). A typical URL for the popular Web HTTP protocol looks like this:

http://www.comp.com/dir/text.html

Table C.2. Web protocols.

Protocol	Function	Example
http:	Hypertext Transfer Protocol; open Web page or start CGI script	http://www.nnic.noaa.gov/cgi-bin/netcast.cgi
ftp:	File Transfer Protocol	ftp://jague.gsfc.nasa.gov/pub
mailto:	Mail protocol by e-mail address	mailto:debbiej@aol.com
file:	Open a local file	file://opt/apache/htdocs/file.html
telnet:	Open a Telnet session	telnet://nickym@netcom.com
news:	Opens a news session by news server name or address	news:alt.fan.john-lennon

The two basic pieces of information provided in the URL are the protocol, http, and the data needed by the protocol, www.comp.com/dir/text.html. The parts of the URL are further defined in Table C.3.

Table C.3. Parts of a URL.

Part	Description
protocol	Service such as HTTP, Gopher, FTP, Telnet, news, etc.
host/IP number	DNS host name or its IP number
port	TCP port number used by server, normally port 80
path	Path and filename reference for the object on a server
parameters	Specific parameters used by the object on a server
query	The query string for a CGI script
fragment	Reference to subset of the object

The default HTTP network port is 80; if an HTTP server resides on a different network port, say 12345 on www.comp.com , then the URL becomes

http://www.comp.com:12345/dir/text.html
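The parts of a URL can be inspected with the URL object available in current browsers and in Node.js. The sketch below is one way to do that; the function name urlParts is hypothetical, and the query string and fragment in the second call are made up for illustration. The URL object does not separate the "parameters" part listed in Table C.3, so it is left out here.

```javascript
// Hypothetical helper: break a URL into the parts named in Table C.3
// (except "parameters", which the URL object does not expose).
function urlParts(address) {
  const u = new URL(address);
  return {
    protocol: u.protocol, // includes the trailing colon, e.g. "http:"
    host: u.hostname,
    port: u.port,         // empty string when the default port is used
    path: u.pathname,
    query: u.search,      // includes the leading "?" when present
    fragment: u.hash,     // includes the leading "#" when present
  };
}

// A server on a nonstandard port: the port follows the host after a colon.
console.log(urlParts("http://www.comp.com:12345/dir/text.html"));

// Default port 80 plus an illustrative (made-up) query and fragment.
console.log(urlParts("http://www.comp.com/dir/text.html?string=hello#top"));
```

Note that the port comes back as an empty string whenever it matches the protocol's default, so a URL written with :80 and one written without a port describe the same server.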

Not all parts of a URL are necessary. If you are searching for a document in the Location box in the Netscape browser, the URL may not need the port number, parameters, query, or fragment parts. If the URL is part of a hotlink in the HTML document, it may contain a relative path to the next document, that is, a path relative to the location of the current document on the server. If the user has filled in a form, the URL line may contain information appended to a question mark in the URL line. The appearance of the URL really depends on what protocol you are using and what operation you are trying to accomplish.

Example C.4

1 http://www.cis.ohio-state.edu/htbin/rfc2068.html
2 http://127.0.0.1/Sample.html
3 ftp://oak.oakland.edu/pub/
4 file://opt/apache_1.2b8/htdocs/index.html
5 http://susan/cgi-bin/form.cgi?string=hello+there

EXPLANATION

1 The protocol is http.

The hostname www.cis.ohio-state.edu consists of the following parts: ^[a]

^[a] Most Web servers run on hostnames starting with www, but this is only a convention.

The hostname is translated to an IP address by the Domain Name System, DNS.

The domain name is ohio-state.edu.

The top-level domain name is edu.

The directory where the HTML file is stored is htbin.

The file to be retrieved is rfc2068.html, an HTML document.

2 The protocol is http.

The IP address is used instead of the hostname; this is the IP address for a local host.

The file is in the server's document root. The file consists of HTML text.

3 The protocol is ftp.

The host is oak.oakland.edu.

The top-level domain is edu.

The directory is pub.

4 The protocol is file. A local file will be opened.

The hostname is missing. It then refers to the local host.

The full path to the file index.html is listed.

5 The information after the question mark is the query part of the URL, which may have resulted from submitting input into a form. The query string is URL encoded. In this example, a plus sign has replaced the space between hello and there. The server stores this query in an environment variable called QUERY_STRING. It will be passed on to a CGI program called from the HTML document. (See "The GET Method" on page 632.)
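The encoding of the query string can be observed directly with the URL and URLSearchParams objects found in current browsers and in Node.js. This is a small sketch using the last URL from Example C.4; the variable names are illustrative only.

```javascript
// The form URL from Example C.4.
const formUrl = new URL("http://susan/cgi-bin/form.cgi?string=hello+there");

// The raw query part, still URL encoded (with its leading "?").
const rawQuery = formUrl.search;

// searchParams decodes the value: the plus sign becomes a space again.
const decodedValue = formUrl.searchParams.get("string");

// Going the other way: encoding a value that contains a space.
const encodedQuery = new URLSearchParams({ string: "hello there" }).toString();

console.log(rawQuery);     // ?string=hello+there
console.log(decodedValue); // hello there
console.log(encodedQuery); // string=hello+there
```

A CGI program reading QUERY_STRING receives the encoded form and has to perform the same decoding step itself.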

File URLs and the Server's Root Directory

If the protocol used in the URL is file, the browser assumes that the file is on the local machine. A full pathname followed by a filename is included in the URL. When the protocol is followed by a server name, all pathnames are relative to the document root of the server. The document root is the directory defined in the server's configuration file as the main directory for your Web server. The leading slash that precedes the path is not really part of the path as with the UNIX absolute path, which starts at the root directory. Rather, the leading slash is used to separate the path from the hostname. An example of a URL leading to documents in the server's root directory:

http://www.myserver/index.html

The full UNIX pathname for this might be

/usr/bin/myserver/htdocs/index.html

A shorthand method for linking to a document on the same server is called a partial or relative URL. For example, if a document at http://www.myserver/stories/webjoke.html contains a link to images/webjoke.gif, this is a relative URL. The browser will expand the relative URL to its absolute URL, http://www.myserver/stories/images/webjoke.gif, and make a request for that document if asked.
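The expansion the browser performs can be reproduced with the URL object, which accepts a relative URL and a base URL to resolve it against. The sketch below uses the addresses from the text; the variable names are illustrative only.

```javascript
// The document that contains the link acts as the base URL.
const baseDocument = "http://www.myserver/stories/webjoke.html";

// A relative URL is resolved against the directory of the base document.
const imageUrl = new URL("images/webjoke.gif", baseDocument).href;

// A link that starts with a slash is resolved against the server's
// document root instead, whatever directory the base document is in.
const rootUrl = new URL("/index.html", baseDocument).href;

console.log(imageUrl); // http://www.myserver/stories/images/webjoke.gif
console.log(rootUrl);  // http://www.myserver/index.html
```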