TISC Insight, Volume 3, Issue 9

Welcome to Volume 3, Issue 9 of The Internet Security Conference Newsletter, Insight. Insight provides commentaries and educational columns, authored by some of the best minds in the security community.

TISC is about sharing clue. So is the newsletter. We promise to provide something useful each issue. If we don't, flame me.

Enjoy, and be safe,

Dave


Today's Column

If you spend any time in a conversation about VPNs, the word performance eventually becomes a central topic for discussion. Crypto costs. IPsec's slow. Any serious effort to scale an ecommerce site involves SSL acceleration. Ever ask "Where and how does crypto cost?" It's the software. It's the (IPsec) protocol. Is all the hoopla put to rest when you employ encryption chips? Maybe. Maybe not. In today's Issue, Ray Savarda and Matt Karash examine the manifold issues surrounding performance and cryptography as it relates to VPNs.

Happy Reading!


Explaining the Gap between Specification and
Actual Performance for IPsec VPN Systems

Ray Savarda & Matt Karash

Introduction

As is common with most network equipment, real-world performance for virtual private networking (VPN) systems often falls significantly short of specification or best-case performance claims. Encryption associated with VPN protocols like IPsec exacts a considerable performance toll when performed in software, and the trend is to move crypto processing into hardware for security gateways that must process at wire speeds in the range of hundreds of megabits per second. Today, a typical high-end IPsec VPN system may use a single off-the-shelf encryption chip that claims 200-300 Mbps encryption. These VPN systems generally claim performance of 100-300 Mbps, but in real-world operation they often provide less than 100 Mbps. Part of the "lost" 150-200 Mbps is explained by system design issues, such as the inability of IPsec software to fully use the potential of the encryption chips. Even with optimal use of encryption chips, unavoidable limitations inherent in the IPsec protocol and commonly employed bus technologies degrade real-world performance significantly below specification (best-case) performance.

Three elements account for most of the performance degradation in VPN systems today:

  1. Cryptographic algorithm overhead related to padding.
  2. IPsec packet formatting overhead related to the various IPsec modes.
  3. PCI bus limitations.

Cryptographic Algorithm Overhead: Padding

Cryptographic algorithm overhead is created by padding that must be added to packets before the encryption and authentication algorithms can process them. Each of the common encryption/decryption (DES, 3DES, AES) and hashing (SHA-1, MD5) algorithms used for IPsec is a block-based algorithm that operates on fixed-size blocks of data (Figure 1).

Figure 1: Block Sizes for Common IPsec Algorithms

Algorithm    Block Size (in bits)
DES, 3DES    64
AES          128
SHA-1        512
MD5          512

When the length of the data plus the minimum required padding is not a multiple of the block size, further padding must be added to reach a full block before algorithmic processing. For example, SHA-1 and MD5 operate on 512-bit (64-byte) blocks. Because a 64-bit length field is appended to the input, only 448 bits of the final block are available for data and padding. If a packet came in at 456 bits (57 bytes), 504 bits (63 bytes) of padding would be added to bring it to 960 bits, which together with the 64-bit length field fills two 512-bit blocks. For randomly sized packets, padding as a percentage of throughput increases as packet size decreases. In extreme situations where a steady stream of non-optimal size data is being encrypted, padding can nearly double packet size and therefore decrease performance by nearly 50 percent (Figure 2). (See also the detailed graphs of performance by packet size at the NetOctave web site.) Only in cases where all data are evenly divisible by the required block size is algorithmic overhead zero.
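The 456-bit example can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming the standard SHA-1/MD5 padding rule (a single '1' bit, then zeros, then a 64-bit length field). The function names are illustrative and do not come from any library.

```python
BLOCK_BITS = 512   # SHA-1 and MD5 block size
LENGTH_BITS = 64   # length field appended after the padding


def hash_padding_bits(message_bits: int) -> int:
    """Return the padding (the '1' bit plus zeros) added before the length field."""
    usable = BLOCK_BITS - LENGTH_BITS  # 448 bits available in the final block
    pad = (usable - message_bits) % BLOCK_BITS
    # At least one padding bit is always added, so a message that already ends
    # exactly on the 448-bit boundary receives a full 512 bits of padding.
    return pad if pad != 0 else BLOCK_BITS


def padded_total_bits(message_bits: int) -> int:
    """Return the total number of bits hashed: message + padding + length field."""
    return message_bits + hash_padding_bits(message_bits) + LENGTH_BITS


print(hash_padding_bits(456))   # 504
print(padded_total_bits(456))   # 1024, i.e. two 512-bit blocks
```

A 57-byte packet therefore costs two full hash blocks, while a 55-byte packet (440 bits) would still fit in one.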

Figure 2: Worst Case Padding Overhead by Packet Size

Algorithm   Block Size   Small Packets   Average Packets   Large Packets   Packet Sizes All
            (in bits)    (40 bytes)      (350 bytes)       (1500 bytes)    Divisible by Block Size
DES, 3DES   64           15%             2%                0.5%            0%
AES         128          23%             4%                1%              0%
SHA-1       512          49%             14%               4%              0%
MD5         512          49%             14%               4%              0%

* Calculations assume worst case packet sizes requiring maximum padding for all packets.
Percentages calculated based on smallest packet size that is greater than size in column header
and divisible by required block size.
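The worst case for a block cipher can be sketched as follows. This is an illustrative calculation only, assuming that padding simply rounds the data up to the next multiple of the block size and that the worst case needs one byte less than a full block of padding. It does not attempt to reproduce every entry in Figure 2, whose exact rounding conventions are not spelled out. The function name is hypothetical.

```python
def worst_case_padding_fraction(padded_bytes: int, block_bytes: int) -> float:
    """Fraction of a padded packet that is padding, in the worst case."""
    if padded_bytes % block_bytes != 0:
        raise ValueError("padded size must be a multiple of the block size")
    # Worst case: the data ends one byte past a block boundary, so the
    # final block carries (block_bytes - 1) bytes of padding.
    return (block_bytes - 1) / padded_bytes


# DES/3DES (8-byte blocks): a 41-byte packet is padded to 48 bytes.
print(f"{worst_case_padding_fraction(48, 8):.1%}")    # 14.6%
# AES (16-byte blocks): a 49-byte packet is padded to 64 bytes.
print(f"{worst_case_padding_fraction(64, 16):.1%}")   # 23.4%
```

These two values are in line with the roughly 15% and 23% listed for small packets in the DES/3DES and AES rows.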

Cryptographic Algorithm Overhead: HMAC

In IPsec data packet handling, SHA-1 and MD5 are used in a Hash-based Message Authentication Code (HMAC) algorithm to derive keyed one-way hashes of large byte strings to ensure that packets are not tampered with during transit. HMAC processing adds overhead beyond that already incurred by padding requirements, as follows. The HMAC effectively involves applying the SHA-1 or MD5 hash three times for any given packet, a result of two combined elements of the HMAC specification. The first element is the requirement that the byte stream not be processed directly: it must first be padded by appending a single '1' bit followed by enough zeros to bring the overall length to 64 bits less than a multiple of 512 bits (64 bytes), after which a 64-bit representation of the overall message length is appended. The second element is that, to add keying to the SHA-1/MD5 function, the algorithm adds two additional short hash operations to the computations required. The result is that almost three separate hashes must be applied to small packets when authentication is accomplished with HMAC. So for IPsec VPN hardware that performs SHA-1 or MD5 at a certain rate, use of the HMAC version of these algorithms drops the effective throughput of such a system to as little as one-third for small packets.
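The nested structure behind this overhead can be made concrete. The sketch below builds HMAC-SHA-1 by hand from Python's `hashlib`, following the construction in RFC 2104 (an inner hash over the padded key XOR ipad followed by the message, then an outer hash over the padded key XOR opad followed by the inner digest), and compares the result with the standard library's `hmac` module. The function name `hmac_sha1_manual` and the sample key and packet are illustrative only.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # SHA-1 block size in bytes


def hmac_sha1_manual(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA-1 from plain SHA-1 calls, per RFC 2104."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_BYTES:
        key = hashlib.sha1(key).digest()
    key = key.ljust(BLOCK_BYTES, b"\x00")

    ipad = bytes(b ^ 0x36 for b in key)  # inner pad
    opad = bytes(b ^ 0x5C for b in key)  # outer pad

    # Inner hash covers one extra key-derived block plus the packet itself.
    inner = hashlib.sha1(ipad + message).digest()
    # Outer hash is a second, short hash over another key block and the digest.
    return hashlib.sha1(opad + inner).digest()


key = b"example-key"
packet = b"\x00" * 40  # a small 40-byte packet
assert hmac_sha1_manual(key, packet) == hmac.new(key, packet, hashlib.sha1).digest()
```

For a small packet, the extra key-derived blocks and the second hash account for most of the work, which is why the relative cost is highest at small packet sizes.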

IPsec Mode Overhead

In addition to cryptographic algorithm overhead, IPsec incurs significant overhead caused by the addition of headers and message authentication bytes. The IPsec protocol requires that IPsec headers be added on top of the IP header. This overhead can be termed network protocol overhead because it is actually overhead related to packet "bloat" as IPsec packets leave the VPN system and reenter the network. The overhead varies across the four common IPsec "modes." These modes are tunnel mode Encapsulating Security Payload (ESP), transport mode ESP, tunnel mode Authentication Header (AH) and transport mode AH.

AH provides robust authentication of IP packets without confidentiality (encryption). ESP provides both confidentiality and authentication. Transport mode is designed for host-to-host communication; tunnel mode can operate a) host to host, b) gateway to gateway, or c) gateway to host. Tunnel mode adds a new 20-byte IP header in front of the transported IP packet. ESP adds an 8-byte ESP header, a 0 to 16-byte Initialization Vector (IV), and a 16-byte ESP trailer. AH adds a 24-byte AH header. The result is significant overhead related to the various modes (Figure 3).

Figure 3: Header/Trailer Overhead by IPsec Mode

   
Mode        IP Tunnel Header   AH (24 bytes)   ESP (24-40 bytes)
Tunnel      20 bytes           44 bytes        44-60 bytes
Transport   0 bytes            24 bytes        24-40 bytes
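The totals in Figure 3 follow directly from the per-header sizes given in the text. The sketch below adds them up in Python; the constant and function names are illustrative, and the byte counts are the ones stated above rather than values taken from any particular implementation.

```python
from typing import Tuple

TUNNEL_IP_HEADER = 20    # new outer IP header, tunnel mode only
AH_HEADER = 24           # AH header
ESP_HEADER = 8           # ESP header
ESP_IV_RANGE = (0, 16)   # Initialization Vector, depends on the cipher
ESP_TRAILER = 16         # ESP trailer


def ipsec_overhead(mode: str, protocol: str) -> Tuple[int, int]:
    """Return (min, max) bytes added to a packet for a mode/protocol pair."""
    if mode not in ("tunnel", "transport"):
        raise ValueError("mode must be 'tunnel' or 'transport'")
    base = TUNNEL_IP_HEADER if mode == "tunnel" else 0

    if protocol == "AH":
        return (base + AH_HEADER, base + AH_HEADER)
    if protocol == "ESP":
        fixed = ESP_HEADER + ESP_TRAILER
        return (base + fixed + ESP_IV_RANGE[0], base + fixed + ESP_IV_RANGE[1])
    raise ValueError("protocol must be 'AH' or 'ESP'")


for mode in ("tunnel", "transport"):
    for protocol in ("AH", "ESP"):
        print(mode, protocol, ipsec_overhead(mode, protocol))
```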

Especially for small packets, this overhead can cause a significant reduction in throughput over a fixed size communication link (Figure 4).

Figure 4: Worst Case Overhead by Packet Size for Various IPsec Modes

              Transport Mode                       Tunnel Mode
Packet Size   ESP          ESP         AH          ESP          ESP         AH
(bytes)       3DES/SHA-1   AES/SHA-1   SHA-1       3DES/SHA-1   AES/SHA-1   SHA-1
46            61%          70%         43%         104%         113%        87%
512           5%           6%          4%          9%           10%         8%
1500          2%           2%          1%          3%           3%          3%

PCI Bus Overhead

Another source of throughput bottlenecks in real-world systems is the bus architecture used to connect the system processor and crypto acceleration elements. The PCI bus is commonly used to connect security co-processors to other integrated circuits. At 64 bits and 66 MHz, it has a theoretical maximum throughput of about 4 Gbps. This potential is reduced in several ways in real-world systems. First, since the bus is a single shared resource, packets to be transformed by IPsec must travel over the PCI bus twice: once from the host processor to the crypto accelerator and once back again to the host processor. This immediately cuts bus throughput in half. An additional problem is the latency incurred in acquiring the bus to make the packet transfers. At 64 bits and 66 MHz, assuming the most efficient burst transfers, each packet in a series of small packets (say 120 bytes) takes 272 nanoseconds (ns) to transfer one way. However, acquiring the bus for each of those transfers can take several microseconds (us). So in this example, taking 3 us as an estimate of the time to acquire the PCI bus, the total one-way transfer time is actually 3 us + 272 ns = 3.272 us. Moving the packet back to the host processor from the crypto accelerator takes another 3.272 us. Thus the effective throughput is only 120 bytes every 6.544 us, or 147 Mbps. This is a far cry from the 4 Gbps potential.
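The arithmetic in this example is easy to rerun with other packet sizes or latency estimates. The following Python sketch uses the figures quoted in the text (120-byte packets, 272 ns per one-way transfer, 3 us to acquire the bus, two bus crossings per packet). The function and constant names are illustrative, and the acquisition time is the authors' estimate, not a measured value.

```python
BUS_WIDTH_BITS = 64
BUS_CLOCK_HZ = 66e6

PACKET_BYTES = 120
TRANSFER_NS = 272    # one-way burst transfer time, as quoted in the text
ACQUIRE_NS = 3000    # estimated time to acquire the bus, as quoted in the text


def theoretical_gbps(width_bits: int, clock_hz: float) -> float:
    """Peak bus throughput with one full-width transfer per clock cycle."""
    return width_bits * clock_hz / 1e9


def effective_mbps(packet_bytes: int, transfer_ns: float,
                   acquire_ns: float, crossings: int = 2) -> float:
    """Throughput when each packet crosses the bus `crossings` times."""
    total_ns = crossings * (acquire_ns + transfer_ns)
    # bits per nanosecond equals Gbps; multiply by 1000 for Mbps
    return packet_bytes * 8 / total_ns * 1000


print(round(theoretical_gbps(BUS_WIDTH_BITS, BUS_CLOCK_HZ), 3))          # 4.224
print(round(effective_mbps(PACKET_BYTES, TRANSFER_NS, ACQUIRE_NS)))      # 147
```

Because the acquisition time dominates for small packets, larger packets amortize it far better under the same assumptions.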

Summary

A number of factors limit the accuracy of performance specifications for IPsec VPN systems and components. In many cases, the performance impact will be negligible, but it does have the potential to be substantial. Specification performance numbers for IPsec VPN equipment and components should be used only as an initial guide and as a gauge of relative performance among systems. They should not be considered a final answer. Rather, customers must understand the characteristics of the data that they are securing and the flavor of IPsec that they will be using to secure the data. With this information, IPsec VPN customers can better ask intelligent questions of IPsec VPN system and component sellers to understand performance for their specific applications and make purchase decisions based on actual performance for their application.


© 2001 Core Competence & Mactivity, Inc.