jbp.io Archive
01 July 2019

TLS performance: rustls versus OpenSSL

There are quite a few dimensions to how performance can vary between TLS libraries.

Handshake performance covers how quickly new TLS sessions can be set up. There are broadly two kinds of TLS handshake: full and resumed. Full handshake performance will be dominated by the expense of public key crypto – certificate validation, authentication and key exchange. Resumed handshakes require few or no public key operations, so they are much quicker.

Bulk performance covers how quickly application data can be transferred over an already set-up session. Performance here will be dominated by symmetric crypto performance – the name of the game is for the TLS library to stay out of the way and minimise overhead in the main data path. The data rates involved are typically many times the speed of a typical network link.
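To make "bulk performance" concrete, here is a minimal sketch of how a throughput figure can be derived: push a fixed buffer through some processing step many times, then divide the bytes moved by the elapsed time. The names `time_bulk` and `throughput_mb_per_sec` are hypothetical and this is not the benchmark code used for these posts; the `process` closure merely stands in for a TLS library's encrypt-and-decrypt path.

```rust
use std::time::{Duration, Instant};

/// Throughput in MB/s (1 MB = 10^6 bytes) for `bytes` moved in `elapsed`.
fn throughput_mb_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    (bytes as f64 / 1_000_000.0) / elapsed.as_secs_f64()
}

/// Calls `process` on `buf` for `rounds` rounds.
/// Returns the total number of bytes processed and the elapsed wall-clock time.
/// `process` is a stand-in for the data path being measured (an assumption
/// for illustration, not a real TLS API).
fn time_bulk<F: FnMut(&[u8])>(buf: &[u8], rounds: u64, mut process: F) -> (u64, Duration) {
    let start = Instant::now();
    for _ in 0..rounds {
        process(buf);
    }
    (buf.len() as u64 * rounds, start.elapsed())
}
```

A caller would pass the result of `time_bulk` straight into `throughput_mb_per_sec`. Using many rounds matters because a single pass over a small buffer is too short to time reliably.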

A TLS library will represent separate sessions in memory while they are in use. How much memory these sessions use will dictate how many sessions can be concurrently terminated on a given server.
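The relationship between per-session memory and server capacity is simple division, sketched below. The function name and the figures in the comment are purely illustrative assumptions, not measurements from either library.

```rust
/// Upper bound on concurrently held sessions for a given memory budget.
/// Returns `None` if `bytes_per_session` is zero, since the bound is undefined.
/// Ignores all other memory the server needs, so it is an optimistic estimate.
fn max_concurrent_sessions(memory_budget_bytes: u64, bytes_per_session: u64) -> Option<u64> {
    if bytes_per_session == 0 {
        None
    } else {
        Some(memory_budget_bytes / bytes_per_session)
    }
}

// Hypothetical example: a 1 GiB budget with 16 KiB per session
// gives (1 << 30) / (16 * 1024) = 65536 sessions.
```

Halving the per-session footprint doubles this bound, which is why per-session memory is worth measuring alongside speed.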

This series of blog posts measures and compares the performance of rustls (a TLS library in Rust) and OpenSSL.

Reproducibility

We’ll measure current master for rustls (6a47cd5c) and OpenSSL (fdbb3a86).

OpenSSL was built from source with default options, using gcc 8.3.0. rustls was built from source using rustc 1.35.0.

All measurements below were obtained on the same i5-6500 Skylake at 3.2GHz, on Debian Linux. CPU scaling was turned off.

All benchmarking tools take an environment variable BENCH_MULTIPLIER which extends each test’s duration by some multiple. In results below, this is set to BENCH_MULTIPLIER=16, meaning each test takes significant time in an effort to make the effect of cold CPU and page caches irrelevant.
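As a rough sketch of how a tool might honour such a variable, the code below reads BENCH_MULTIPLIER and scales a base iteration count by it. This is not the actual code from either benchmark: the helper names, the integer parsing, and the fallback to 1 when the variable is unset or invalid are all assumptions made for illustration.

```rust
use std::env;

/// Parses a raw multiplier value.
/// Falls back to 1 when the value is missing, unparsable, or zero
/// (this fallback is an assumption, not documented behaviour).
fn parse_multiplier(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&m| m > 0)
        .unwrap_or(1)
}

/// Reads the multiplier from the BENCH_MULTIPLIER environment variable.
fn multiplier_from_env() -> u64 {
    parse_multiplier(env::var("BENCH_MULTIPLIER").ok().as_deref())
}

/// Scales a test's base round count; saturates rather than overflowing.
fn scaled_rounds(base_rounds: u64, multiplier: u64) -> u64 {
    base_rounds.saturating_mul(multiplier)
}
```

With BENCH_MULTIPLIER=16, a test with a hypothetical base of 100 rounds would run 1600 rounds, so warm-up costs become a much smaller share of the total time.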

rustls

The code used is kept alongside rustls. Build and run with:

$ cargo test --no-run --example bench --release
$ BENCH_MULTIPLIER=16 make -f admin/bench-measure.mk measure
$ make -f admin/bench-measure.mk memory

OpenSSL

All the code used is in ctz/openssl-bench as of the linked commit. It expects to find a built OpenSSL tree in ../openssl/. Then just run:

$ BENCH_MULTIPLIER=16 make measure
$ make memory

Results

Performance results are presented in other blog posts as follows:

See those posts for details and analysis. To summarise the results, though, we can say approximately: