Scalable Internet Architectures
There is another method to find the closest server on the network that copes well with the nature of the Internet by leveraging the fundamentals of efficient network routing (that is, delivering packets to an IP address over the shortest path). By giving two different servers on the Internet the same IP address and asking the networks to which we are attached to announce routes to that IP address (as is normally done), we employ the technique now called Anycast.

The tricky part about using Anycast is that, at any moment in time, routes can change. This means that the next packet sent to that IP address might very well find its way to a different host. What does this mean in terms of IP? Given that all DNS traffic and web serving happen over IP, this is an important question.

If host image-2-1.example.com in Germany and image-3-1.example.com in Japan share the same IP address, the following scenario is possible. A client attempts to establish a TCP connection to images.example.com. The client first resolves the name to an IP address and sends a SYN packet (the first step in establishing a TCP connection) to that address; the packet finds its way to image-2-1.example.com (in Germany). image-2-1 replies with a SYN-ACK packet, and the client completes the handshake with an ACK and then sends the first data packet containing the HTTP request. All is well until now.

Then a route flaps somewhere on the Internet, and the closest path from that client to the destination IP address now delivers packets to image-3-1.example.com (in Japan). image-2-1 returns a data packet to the client, and it gets there because the shortest path from image-2-1 to the client does not lead the packet astray (there is only one machine on the Internet with that client's IP address). However, when the client responds to that packet, the response goes to Japan (the new shortest path back to the server IP). This is where things go sour.
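To make the sequence of events concrete, here is a minimal toy model of the scenario, not a real TCP stack or routing implementation. The `Server` and `AnycastRoute` classes, the `run_scenario` function, and the client address are all assumptions made for illustration. The one behavior borrowed from real TCP is that a host receiving a segment for a connection it has no record of answers with a reset.

```python
# Toy model of Anycast and a route flap. All names here are illustrative;
# this is not a real TCP stack or routing implementation.


class Server:
    """A host that keeps per-connection state, as a TCP endpoint does."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # clients whose SYN this host has seen

    def receive(self, client, flag):
        if flag == "SYN":
            self.connections.add(client)
            return "SYN-ACK"
        if client in self.connections:
            return "ACK"
        # Segment for a connection this host never saw: reset it.
        return "RST"


class AnycastRoute:
    """Maps one shared IP address to whichever server is currently closest."""

    def __init__(self, servers, closest):
        self.servers = servers
        self.closest = closest

    def flap(self, new_closest):
        # A route change somewhere on the network picks a new closest host.
        self.closest = new_closest

    def deliver(self, client, flag):
        server = self.servers[self.closest]
        return server.name, server.receive(client, flag)


def run_scenario():
    servers = {
        "de": Server("image-2-1.example.com"),
        "jp": Server("image-3-1.example.com"),
    }
    route = AnycastRoute(servers, closest="de")
    client = ("198.51.100.7", 40000)  # hypothetical client address and port

    log = [route.deliver(client, "SYN")]       # handshake starts in Germany
    log.append(route.deliver(client, "DATA"))  # HTTP request, still Germany
    route.flap("jp")                           # shortest path now leads to Japan
    log.append(route.deliver(client, "DATA"))  # next segment hits the other host
    return log


if __name__ == "__main__":
    for host, reply in run_scenario():
        print(f"{host}: {reply}")
```

In this sketch the first two packets are handled by image-2-1, while the packet sent after the flap lands on image-3-1, which has no state for the connection.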
When the packet arrives at image-3-1, it is part of a preexisting TCP session with image-2-1 of which image-3-1 knows nothing. The only reasonable response to this is to send a TCP RST packet, which aborts the TCP session, and that's no good at all.

So, what good is Anycast? Well, we've demonstrated the shortcomings with respect to TCP. But these shortcomings hinge on the connection-oriented nature of that transport protocol. UDP, on the other hand, is a connectionless protocol. Services such as DNS typically require only a single UDP request packet and a single UDP response packet to accomplish a task.

So, where does this leave us? We know that each node in each image cluster needs a unique IP address to avoid the problem described previously. If we place DNS servers next to each image cluster, all DNS servers share the same IP address (via Anycast), and each DNS server offers the IP addresses of the image cluster nodes nearest to it, we achieve our proximity objectives. Figure 6.10 shows our globally distributed system based on Anycast.

Figure 6.10. Geographically separate image serving clusters with ideal access patterns.

Anycast ensures that a client's DNS requests will be answered by the DNS server closest to that client on the network. Because the DNS server handed back the IP addresses associated with the image cluster to which it is adjacent, we know that (as the DNS request traversed the Internet) this image cluster was the closest to the client. That's pretty powerful mojo.
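
The arrangement can be sketched in a few lines, under stated assumptions: the site names, the `CLUSTERS` table, the `DnsServer` class, and the `query` helper are hypothetical, and the addresses come from ranges reserved for documentation. The point of the sketch is only that each DNS server answers with the nodes of the cluster next to it, and that each query is a self-contained exchange.

```python
# Toy model: Anycast DNS servers that each answer with their local cluster.
# Site names and addresses are hypothetical (documentation address ranges).

CLUSTERS = {
    "germany": ["192.0.2.11", "192.0.2.12"],
    "japan": ["198.51.100.11", "198.51.100.12"],
}


class DnsServer:
    """Stateless responder: one request in, one response out."""

    def __init__(self, site):
        self.site = site

    def resolve(self, name):
        if name == "images.example.com":
            # Hand back only the nodes of the cluster next to this server.
            return list(CLUSTERS[self.site])
        return []


# Every DNS server is reachable at the same Anycast address; the network
# decides which one a given client actually reaches.
DNS_SERVERS = {site: DnsServer(site) for site in CLUSTERS}


def query(closest_site, name):
    """Deliver one UDP-style query to whichever DNS server is closest."""
    return DNS_SERVERS[closest_site].resolve(name)


if __name__ == "__main__":
    # A client whose packets currently route to Germany.
    print(query("germany", "images.example.com"))
    # After a route flap the same client reaches Japan. The query still
    # succeeds because no state is shared between requests.
    print(query("japan", "images.example.com"))
```

Because the image cluster nodes keep unique addresses, the TCP connection the client opens afterward is unaffected by which DNS server happened to answer.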