Networking Concepts and Technology: A Designers Resource
In 1994, Netscape Communications proposed SSL V1 and shipped the first products with SSL V2. SSL V3 was introduced to address the limitations of SSL V2 in cryptographic security and functionality. Transport Layer Security (TLS) was created as an open standard to prevent any one company from controlling this important technology. However, even though Netscape was granted a patent for SSL, SSL is now the de facto standard for secured Web transactions. This section provides a brief overview of the SSL protocol and then describes strategies for deploying SSL processing in the design of data center network architectures.
SSL Protocol Overview
The basic operation of SSL includes the following phases:
A typical Web request can span many HTTP requests, each of which would otherwise have to establish its own SSL session. The resulting performance cost might not be justified by the marginal security benefit. Hence, a technique called SSL resumption can be used to save the session information for a client connection that has already been authenticated at least once, so that later connections can reuse it.
SSL is composed of two sublayers:
FIGURE 4-15 gives a condensed overview of the SSL protocol exchanges.
Figure 4-15. High-Level Condensed Protocol Overview
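The idea behind SSL resumption described earlier can be sketched with a toy model. This is not the SSL protocol itself: the `SessionCache` class, its method names, and the `fetch_page` helper are hypothetical, and the "full handshake" branch only stands in for the expensive public-key exchange. The sketch shows only that a client presenting a known session ID skips the costly step.

```python
# Toy model of SSL session resumption. It illustrates the caching idea only;
# all names here are hypothetical and no real cryptography is performed.
import secrets


class SessionCache:
    """Hypothetical server-side cache mapping session IDs to negotiated secrets."""

    def __init__(self):
        self._sessions = {}
        self.full_handshakes = 0
        self.resumed_handshakes = 0

    def handshake(self, session_id=None):
        """Return (session_id, master_secret), resuming when the ID is known."""
        if session_id in self._sessions:
            # Abbreviated handshake: reuse the stored secret.
            self.resumed_handshakes += 1
            return session_id, self._sessions[session_id]
        # Full handshake: stands in for the expensive public-key exchange.
        self.full_handshakes += 1
        new_id = secrets.token_hex(16)
        self._sessions[new_id] = secrets.token_bytes(48)
        return new_id, self._sessions[new_id]


def fetch_page(cache, n_requests):
    """Simulate one client issuing n_requests HTTP requests for a single page."""
    session_id = None
    for _ in range(n_requests):
        session_id, _secret = cache.handshake(session_id)
    return cache.full_handshakes, cache.resumed_handshakes
```

In this model, a page that needs ten HTTP requests costs one full handshake and nine resumed ones, instead of ten full handshakes.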
Once the first set of messages is successfully completed, an encrypted communication channel is established. This section does not discuss SSL in depth; its purpose is to describe the network architectural deployment scenarios you can apply to SSL processing. The following sections describe the differences between a pure software solution and an SSL accelerator appliance in terms of packet processing and throughput, and various approaches to scaling SSL processing capabilities from a network architecture perspective.
SSL Acceleration Deployment Considerations
One of the fundamental limitations of SSL is performance. When SSL is added to a Web server, performance drops dramatically because of the CPU load imposed by the cryptographic computations and by the number of sessions that must constantly be set up. There are three common SSL approaches: software SSL libraries, a crypto accelerator board, and an SSL accelerator appliance.
There are several deployment options for SSL acceleration. This section describes where it makes sense to deploy different SSL acceleration options. It is important to consider certain characteristics, including:
Software-SSL Libraries Packet Flow
FIGURE 4-16 shows the packet flow for a software-based approach to SSL processing. Although the path seems direct, SSL processing is bottlenecked by the millions of CPU cycles consumed by cryptographic algorithms such as RSA and 3DES.
Figure 4-16. Packet Flow for Software-based Approach to SSL Processing
The Crypto Accelerator Board Packet Flow
FIGURE 4-17 shows the packet flow using a PCI accelerator card for SSL acceleration. In this case, the incoming encrypted packet reaches the SSL libraries. The SSL libraries maintain various session information and security associations, but the mathematical computations are offloaded to the PCI accelerator card, which contains an ASIC that can compute the cryptographic algorithms in very few clock cycles. However, there is an overhead in transferring data to the card, because the PCI bus must first be arbitrated and traversed. Note that for small data transfers, the benefit of the cryptographic acceleration offered by the card might not outweigh the overhead of the PCI transfers. Further, it is important to make sure the PCI slot used is 64 bit and 66 MHz. Using a 32-bit slot could reduce performance.
Figure 4-17. PCI Accelerator Card Approach to SSL Processing Partial Offload
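The trade-off between PCI transfer overhead and faster computation can be sketched as a simple linear cost model. This is a minimal sketch under stated assumptions: the function names are hypothetical, the model ignores everything except a fixed per-transfer bus cost and a per-byte processing cost, and the numbers in the usage note are invented for illustration, not measured.

```python
# Linear cost model for offloading crypto to a PCI card.
# All names and parameters are hypothetical; times are in nanoseconds.

def software_time_ns(n_bytes, cpu_ns_per_byte):
    """Time to process a message entirely on the host CPU."""
    return n_bytes * cpu_ns_per_byte


def offload_time_ns(n_bytes, bus_overhead_ns, card_ns_per_byte):
    """Time to process a message on the card, including the fixed bus cost."""
    return bus_overhead_ns + n_bytes * card_ns_per_byte


def break_even_bytes(bus_overhead_ns, cpu_ns_per_byte, card_ns_per_byte):
    """Message size above which offloading is faster than the host CPU."""
    if card_ns_per_byte >= cpu_ns_per_byte:
        raise ValueError("the card must be faster per byte than the CPU")
    return bus_overhead_ns / (cpu_ns_per_byte - card_ns_per_byte)
```

With invented values of 40000 ns of bus overhead, 60 ns per byte on the CPU, and 10 ns per byte on the card, `break_even_bytes(40000, 60, 10)` gives 800 bytes: smaller messages are cheaper to process in software, larger ones benefit from the card.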
SSL Accelerator Appliance Packet Flow
FIGURE 4-18 illustrates how a typical SSL accelerator appliance can be used to reduce load on servers by offloading front-end client SSL processing. Commercial SSL accelerators at the time of this writing are all PC-based boxes with PCI accelerator cards. Their operating systems and network protocol stacks are optimized for SSL processing. The major benefit to the backend servers is that CPU cycles are freed up because the servers no longer process thousands of client SSL transactions. Depending on the customer's requirements, the accelerator can either offload all SSL processing and forward cleartext to the server, or terminate all client SSL connections and maintain only one SSL connection to the target server.
Figure 4-18. SSL Appliance Offloads Frontend Client SSL Processing
SSL Performance Tests
To gain a better understanding of the trade-offs of the three approaches to SSL acceleration, we ran various tests using the Sun Crypto Accelerator 1000 board, Netscaler 9000 SSL Accelerator appliance, and ArrayNetworks SSL accelerator appliance. Due to limited time and resources, tests were selected that enabled us to compare key attributes among approaches. In the first test, we compare raw SSL processing differences between SSL libraries and an appliance.
Test 1: SSL Software Libraries versus SSL Accelerator Appliance Netscaler 9000
In this test, we looked closely at CPU utilization and network traffic with a software solution. The SSL load completely pinned the CPU, and it took over two minutes to complete 100 SSL transactions. We then looked at CPU utilization and network traffic using an SSL appliance with the exact same server and client. With this setup, it took under one second to complete 100 SSL transactions. The main reason the SSL appliance is so much faster is that the appliance maintains a few long-lived SSL connections to the server. Hence the server is less burdened with setting up and tearing down SSL sessions and with repeating the associated cryptographic computations, all of which are CPU intensive. The appliance terminates the SSL session between the client and the appliance and then reuses the SSL connection at the backend with the servers.
Figure 4-19. SSL Test Setup with No Offload
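To put the two timings in perspective, the arithmetic below turns them into transaction rates. Because the text reports only "over two minutes" and "under one second", the results are bounds rather than measurements, and the helper names are hypothetical.

```python
# Bounds on transaction rate implied by the two reported timings.
# Helper names are hypothetical; inputs come from the timings quoted in the text.

def transactions_per_second(n_transactions, elapsed_seconds):
    """Average transaction rate over the run."""
    return n_transactions / elapsed_seconds


def minimum_speedup(slow_elapsed_seconds, fast_elapsed_seconds):
    """Speedup of the fast run over the slow run for the same amount of work."""
    return slow_elapsed_seconds / fast_elapsed_seconds


# Software only: more than 120 s for 100 transactions, so the rate is below this.
software_rate_upper_bound = transactions_per_second(100, 120)
# Appliance: less than 1 s for 100 transactions, so the rate is above this.
appliance_rate_lower_bound = transactions_per_second(100, 1)
# The appliance run was therefore more than this many times faster.
speedup_lower_bound = minimum_speedup(120, 1)
```

In other words, the software-only run achieved fewer than one transaction per second, the appliance run more than 100 per second, and the speedup was greater than 120 times.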
Test 1 (A) Software-SSL Libraries
We used an industry-standard benchmark load generator on the client to generate SSL traffic. Both tests requested the same 100-megabyte file from the server. We injected 100 requests into the SSL Web server, 10 at a time.
Test 1 (B) SSL Accelerator Appliance
We placed the Netscaler 9000 SSL Accelerator device in front of the server and generated SSL traffic from the client. Both tests requested the same 100-megabyte file from the server. The performance gains using the SSL offload device were significant. Some of the key reasons include:
We ran the benchmark load generator on the client (deepak2). The client points to the VIP on the Netscaler, which terminates one side of the SSL connection. The Netscaler then reuses the backend SSL connection. This is also more secure because the client is unaware of the backend servers and hence can do less damage. We used the following command:
#abc -n 100 -c 10 -v 4 http://129.146.138.52:443/100m.file1 >./netscaler100mfel1n100c10.softwareonly
612 packets were transferred to complete 100 SSL handshakes in less than one second!
Test 2: Sun Crypto Accelerator 1000 Board
In this test set, we leveraged the work done by the Performance Availability and Engineering group regarding performance tests of the Sun Crypto Accelerator 1000 board. The test setup consisted of a Sun Fire™ 6800, using eight 900-MHz UltraSPARC™ III processors, and a single Sun Crypto Accelerator 1000 board. FIGURE 4-20 shows that with the software approach, throughput increases linearly with the number of processors, whereas with the Sun Crypto Accelerator 1000 board, throughput stays nearly constant at 500 Mbit/sec. Tests show that the accelerator board delivers its full benefit only when messages are larger than 1000 bytes. If the messages are too small, the benefit of the card acceleration does not outweigh the overhead of diverting SSL computations from the CPU to the board and back.
Figure 4-20. Throughput Increases Linearly with More Processors
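The contrast between linear software scaling and a near-constant board can be sketched as follows. Only the 500 Mbit/sec figure comes from the text; the function names are hypothetical, and the per-processor software throughput is an input you would have to measure, so the value in the usage note is invented for illustration.

```python
# Sketch: how many processors the software approach needs to match the board.
# Only CARD_MBIT_PER_SEC comes from the text; everything else is hypothetical.
import math

CARD_MBIT_PER_SEC = 500  # near-constant board throughput reported in the text


def software_throughput(n_cpus, per_cpu_mbit):
    """Software-only throughput under an idealized linear scaling assumption."""
    return n_cpus * per_cpu_mbit


def cpus_to_match_card(per_cpu_mbit, card_mbit=CARD_MBIT_PER_SEC):
    """Smallest processor count whose software throughput reaches the board's."""
    return math.ceil(card_mbit / per_cpu_mbit)
```

For example, with an invented figure of 70 Mbit/sec per processor, `cpus_to_match_card(70)` returns 8: seven processors would deliver 490 Mbit/sec in this model, just short of the board.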
Test 3: SSL Software Libraries versus SSL Accelerator Appliance Array Networks
In this set of tests, we performed more detailed tests to better understand not only the value of the SSL appliance, but also the impact of threads and file size. FIGURE 4-21 shows the basic test setup for the SSL software test, where a Sun Enterprise™ 450 server used as the client was sufficient to saturate the Sun Blade™ server.
Figure 4-21. SSL Test Setup for SSL Software Libraries
FIGURE 4-22 shows the setup for the SSL appliance tests. Larger clients were required to saturate the servers, so we added two Sun Fire 3800 servers to the Enterprise 450 server. The reason for this was that the SSL appliance terminated the SSL connection, performed all SSL processing, and maintained very few socket connections to the backend servers, thereby reducing the load on the servers.
Figure 4-22. SSL Test Setup for an SSL Accelerator Appliance
FIGURE 4-23 suggests that there is a sweet spot for the number of threads used by the client load generator. After a certain point, performance drops. This suggests that SSL processing in the software-only approach benefits from additional threads only up to a certain point. These are initial tests and not comprehensive by any means. Our intent is to show that thread count is one potentially important configuration consideration, which might be beyond the scope of pure design.
Figure 4-23. Effect of Number of Threads on SSL Performance
FIGURE 4-24 shows the impact of file size on SSL performance. Note that these are SSL-encrypted bulk files. The SSL appliance dramatically increases SSL throughput for large files. However, the number of transactions decreases in proportion as the file size increases. The link was a 1-gigabit pipe, which can support 125 MByte/sec of throughput. The results show that the network pipe is not actually the limiting factor.
Figure 4-24. Effect of File Size on SSL Performance
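The 125 MByte/sec figure follows from dividing the 1-gigabit line rate by 8 bits per byte. The short sketch below reproduces that arithmetic and shows how the link alone would cap the transaction rate for a given file size. The function names are hypothetical, decimal units are assumed (1 gigabit = 10^9 bits, 1 megabyte = 10^6 bytes), and protocol overhead is ignored.

```python
# Link-capacity arithmetic for a 1-gigabit pipe.
# Names are hypothetical; decimal units assumed and protocol overhead ignored.

def link_capacity_bytes_per_sec(link_bits_per_sec):
    """Raw link capacity in bytes per second (8 bits per byte)."""
    return link_bits_per_sec / 8


def max_transactions_per_sec(link_bits_per_sec, file_size_bytes):
    """Upper bound on whole-file transfers per second imposed by the link alone."""
    return link_capacity_bytes_per_sec(link_bits_per_sec) / file_size_bytes


GIGABIT = 1_000_000_000
capacity = link_capacity_bytes_per_sec(GIGABIT)          # 125,000,000 bytes/sec
cap_1mb = max_transactions_per_sec(GIGABIT, 1_000_000)      # 1 MB files
cap_100mb = max_transactions_per_sec(GIGABIT, 100_000_000)  # 100 MB files
```

Under these assumptions the link could carry at most 125 one-megabyte files per second but only 1.25 hundred-megabyte files per second, which is why the transaction count falls as files grow even when throughput in bytes stays high.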
Conclusions Drawn from the Tests
The software solution is best used in situations that require relatively low SSL transaction throughput, which is typical for sign-on and credit card Web-based transactions, where only certain steps require SSL encryption.
The PCI accelerator card dramatically increases performance at relatively low cost. The PCI card also offers true end-to-end security and is often desirable for highly secure environments.
The accelerator appliance can be installed in an existing infrastructure and can offer very good performance. The servers do not need to be modified. Hence, only one device must be managed for SSL acceleration. Another benefit is that the appliance exploits the fact that not every server will be loaded with SSL at the same time. Hence, from a utilization standpoint, an appliance is more economically feasible.
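The utilization argument for the appliance can be illustrated with a small sizing sketch. It compares the SSL capacity needed when every server is sized for its own peak with the capacity needed when one shared device is sized for the peak of the combined load. The function names and the load figures in the usage note are hypothetical and invented for illustration only.

```python
# Sizing sketch: per-server SSL capacity versus one shared appliance.
# Names and sample loads are hypothetical; each trace lists SSL transactions
# per second for one server over the same sequence of time intervals.

def per_server_capacity(loads):
    """Total capacity if each server is sized for its own peak SSL load."""
    return sum(max(trace) for trace in loads)


def shared_capacity(loads):
    """Capacity if one appliance is sized for the peak of the combined load."""
    return max(sum(interval) for interval in zip(*loads))


# Invented traces: three servers whose SSL peaks fall in different intervals.
sample_loads = [
    [900, 100, 100, 200],
    [100, 800, 100, 200],
    [100, 100, 700, 200],
]
```

With these invented traces, `per_server_capacity(sample_loads)` is 2400 transactions per second while `shared_capacity(sample_loads)` is 1100, because the individual peaks do not coincide. If all servers peaked in the same interval, the two figures would be equal and the sharing advantage would disappear.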