Fuzz Testing

Introduction

What is fuzz testing? Fuzzing is a testing technique that injects random pieces of data into a software function to uncover crashes and vulnerabilities. It helps improve code security and reliability, since it can trigger edge cases that went unnoticed during unit testing.

How does it work? Fuzz testing relies on a fuzzing engine, a library that runs your code in a loop, injecting a different input at each iteration. The fuzzing engine instruments your code to measure coverage, and uses this information to drive the generation of new samples. Most samples will be malformed, testing your code's tolerance to ill-formed input.
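To make the "runs your code in a loop" idea concrete, here is a deliberately naive sketch that ignores coverage entirely: it just feeds random byte strings to a target that uses LibFuzzer's (data, size) calling convention. The names naive_fuzz_loop and fuzz_target are hypothetical, and a real engine such as LibFuzzer does much more (instrumentation, input selection, crash reporting).

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Signature of the code under test, mirroring the (data, size)
// convention used by LibFuzzer targets. Hypothetical alias.
using fuzz_target = int (*)(const std::uint8_t* data, std::size_t size);

// A deliberately naive fuzzing loop: it generates purely random inputs
// and never looks at coverage. Real engines instrument the code and use
// coverage to decide which inputs are worth mutating further.
// Returns the number of inputs executed.
std::size_t naive_fuzz_loop(fuzz_target target, std::size_t iterations,
                            std::size_t max_len, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> len_dist(0, max_len);
    std::uniform_int_distribution<int> byte_dist(0, 255);

    std::vector<std::uint8_t> input;
    for (std::size_t i = 0; i < iterations; ++i)
    {
        // Fresh random length and random contents on every iteration
        input.resize(len_dist(rng));
        for (auto& b : input)
            b = static_cast<std::uint8_t>(byte_dist(rng));

        // A crash inside target() is what a fuzzer is looking for;
        // this sketch has no crash handling of its own.
        target(input.data(), input.size());
    }
    return iterations;
}
```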

Which kinds of errors does fuzzing detect? The fuzzing engine monitors your code for crashes. Fuzzing is often combined with AddressSanitizer and UndefinedBehaviorSanitizer, which turn silent memory errors and undefined behavior into detectable failures. In short, fuzzing makes sure that your code doesn't crash, leak memory or incur undefined behavior, regardless of how malformed the input is. Many vulnerabilities in C++ code stem from exactly these kinds of errors, so fuzzing can make your code more secure.
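As an illustration of the class of bug involved, consider a hypothetical length-prefixed decoder. The wire format below (one length byte followed by the payload) is invented for this sketch. The bounds check it contains is the kind of check a fuzzer quickly proves missing, since inputs whose length byte lies about the remaining data are easy for it to generate.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical format, for illustration only: one length byte followed
// by that many payload bytes. Returns the payload, or std::nullopt if
// the input is malformed.
std::optional<std::string_view> decode_length_prefixed(std::string_view input)
{
    if (input.empty())
        return std::nullopt;  // no room for the length byte

    const auto declared = static_cast<std::size_t>(
        static_cast<unsigned char>(input[0]));

    // Without this check, an input whose length byte exceeds the
    // remaining data would make the string_view below extend past the
    // end of the buffer: an out-of-bounds read that AddressSanitizer
    // reports as soon as the payload is accessed.
    if (declared > input.size() - 1)
        return std::nullopt;

    return std::string_view(input.data() + 1, declared);
}
```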

Should I use it? Fuzz testing is especially relevant for libraries that process potentially untrusted, user-controlled input, like network data. Libraries that implement parsers, decoders or network protocols usually benefit from fuzz testing.

Which Boost libraries use it? Libraries like Boost.JSON (https://www.boost.org/libs/json), Boost.URL (https://www.boost.org/libs/url) and Boost.MySQL (https://www.boost.org/libs/mysql) use this technique. If you're about to implement it in your library, have a look at what these libraries do.

Should I still write unit tests? Yes. Absolutely. Fuzzing does not replace unit tests, but complements them. Unit tests verify that your code produces the intended results by providing known inputs and running assertions on the outputs. In fuzz testing, inputs are generated randomly by the fuzzing engine, so no assertions are usually run on the outputs - fuzzing will only monitor for crashes and memory errors.
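The contrast can be sketched with a hypothetical parse_uint function (not part of any Boost library): the unit test pins down expected results for chosen inputs, while the fuzz target calls the same function on arbitrary bytes and checks nothing, relying on crashes and sanitizer reports instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical function under test: parses a non-empty string of decimal
// digits that fits in an unsigned int, or returns std::nullopt.
std::optional<unsigned> parse_uint(std::string_view s)
{
    if (s.empty())
        return std::nullopt;
    unsigned value = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return std::nullopt;
        const auto digit = static_cast<unsigned>(c - '0');
        // Reject values that would overflow: value * 10 + digit > max
        if (value > (std::numeric_limits<unsigned>::max() - digit) / 10)
            return std::nullopt;
        value = value * 10 + digit;
    }
    return value;
}

// Unit test: known inputs, assertions on the outputs.
void test_parse_uint()
{
    assert(parse_uint("42") == 42u);
    assert(!parse_uint(""));
    assert(!parse_uint("4x2"));
}

// Fuzz target: arbitrary inputs, no assertions on the result.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
    parse_uint(std::string_view(reinterpret_cast<const char*>(data), size));
    return 0;
}
```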

How can I add fuzzing to my library? We recommend using LibFuzzer (https://llvm.org/docs/LibFuzzer.html), since it's the easiest fuzzing engine to use, and the one that other Boost libraries use. You can use other fuzzing engines if you prefer.

LibFuzzer Basics

Quoting its documentation, "LibFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine". LibFuzzer runs your code many times with different, random inputs. It instruments your code to measure coverage and attempts to generate inputs that maximize it, effectively trying to discover new paths through your code.
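The "coverage-guided, evolutionary" part can be pictured with a toy model. In this sketch, which is an assumption made for illustration, the target itself reports the set of branch IDs an input reached, whereas LibFuzzer obtains that information from compiler instrumentation. The loop keeps a mutated input only when it reaches a branch no earlier input reached, so the corpus grows toward new code paths. evolve_corpus and instrumented_target are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <set>
#include <utility>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Toy stand-in for instrumentation: the target returns the IDs of the
// "branches" an input reached.
using instrumented_target = std::function<std::set<int>(const bytes&)>;

// Minimal coverage-guided loop: mutate a corpus entry, run it, and keep
// the mutant only if it reached a branch that no earlier input reached.
std::vector<bytes> evolve_corpus(const instrumented_target& target,
                                 std::vector<bytes> corpus,
                                 std::size_t iterations, unsigned seed)
{
    std::mt19937 rng(seed);
    if (corpus.empty())
        corpus.push_back(bytes{});  // start from the empty input

    // Record the coverage reached by the initial corpus
    std::set<int> seen;
    for (const auto& input : corpus)
    {
        const auto branches = target(input);
        seen.insert(branches.begin(), branches.end());
    }

    for (std::size_t i = 0; i < iterations; ++i)
    {
        // Pick a random parent and apply one mutation:
        // overwrite a random byte, or append a random byte.
        bytes child = corpus[std::uniform_int_distribution<std::size_t>(
            0, corpus.size() - 1)(rng)];
        const auto value = static_cast<std::uint8_t>(
            std::uniform_int_distribution<int>(0, 255)(rng));
        if (!child.empty() && rng() % 2 == 0)
            child[std::uniform_int_distribution<std::size_t>(
                0, child.size() - 1)(rng)] = value;
        else
            child.push_back(value);

        // Keep the child only if it produced new coverage
        bool new_coverage = false;
        for (int branch : target(child))
            new_coverage |= seen.insert(branch).second;
        if (new_coverage)
            corpus.push_back(std::move(child));
    }
    return corpus;
}
```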

LibFuzzer is included in clang, so you don’t need to install anything to get started.

Let’s say we want to fuzz a function that parses JSON data, like parse_json(string_view input). We will create a source file with the following code:

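#include <cstddef>
#include <cstdint>
#include <string_view>
#include <your/parsing/function.hpp> // placeholder: declares parse_json

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
    // The range [data, data+size) contains the data generated by the fuzzer
    std::string_view input_data(reinterpret_cast<const char*>(data), size);

    // Crashes, sanitizer reports and uncaught exceptions raised here
    // are all reported by LibFuzzer as failures
    parse_json(input_data);
    return 0;
}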
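The snippet above needs your real parser to compile. For a self-contained variant you can build as-is, the sketch below swaps in a hypothetical toy parser, parse_brackets, that throws std::invalid_argument on unbalanced input. It also illustrates a point worth checking against your own API: if your function reports malformed input by throwing, catch that exception in the fuzz target, because an uncaught exception terminates the process and LibFuzzer reports it as a crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for parse_json: checks that brackets and braces
// are balanced, throwing std::invalid_argument otherwise.
void parse_brackets(std::string_view input)
{
    std::string open_stack;
    for (char c : input)
    {
        if (c == '[' || c == '{')
        {
            open_stack.push_back(c);
        }
        else if (c == ']' || c == '}')
        {
            const char expected = (c == ']') ? '[' : '{';
            if (open_stack.empty() || open_stack.back() != expected)
                throw std::invalid_argument("unbalanced input");
            open_stack.pop_back();
        }
    }
    if (!open_stack.empty())
        throw std::invalid_argument("unterminated input");
}

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
    std::string_view input(reinterpret_cast<const char*>(data), size);
    try
    {
        parse_brackets(input);
    }
    catch (const std::invalid_argument&)
    {
        // Rejecting malformed input is expected behavior, not a bug.
        // Any other exception escapes and is reported as a crash.
    }
    return 0;
}
```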

We can build a fuzzer executable by adding -fsanitize=fuzzer to clang's compile and link flags. This automatically links LibFuzzer into your code. It's advisable to also enable the address and undefined behavior sanitizers, which increase the range of errors the fuzzer detects. We recommend building in release mode with debug symbols enabled, so that fuzzing runs fast and crashes are symbolized correctly.

From the command line:

clang++ -g -O3 -fsanitize=fuzzer,address,undefined -fno-sanitize-recover=address,undefined -o fuzzer fuzzer.cpp

As a Jamfile target:

exe fuzzer : fuzzer.cpp :
    <debug-symbols>on
    <optimization>speed
    <address-sanitizer>norecover
    <undefined-sanitizer>norecover
    <cxxflags>-fsanitize=fuzzer
    <linkflags>-fsanitize=fuzzer
;

Or as a CMake target:

add_executable(fuzzer fuzzer.cpp)
target_compile_options(
    fuzzer
    PRIVATE
    -fsanitize=fuzzer,address,undefined
    -fno-sanitize-recover=address,undefined
    -g
    -O3
)
target_link_options(
    fuzzer
    PRIVATE
    -fsanitize=fuzzer,address,undefined
    -fno-sanitize-recover=address,undefined
)

Note that you must not define a main function - LibFuzzer will do it for you. The LLVMFuzzerTestOneInput function will be invoked repeatedly, with different input ranges.
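Because LibFuzzer supplies main, a build without -fsanitize=fuzzer (for example, with a compiler that doesn't ship LibFuzzer) needs its own driver if you want to replay a saved corpus as a regression test. The replay_corpus helper below is a hypothetical sketch of such a driver: it calls any function with the LLVMFuzzerTestOneInput signature once per file in a directory.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

// Same signature as LLVMFuzzerTestOneInput. Hypothetical alias.
using fuzz_target = int (*)(const std::uint8_t* data, std::size_t size);

// Hypothetical replay driver: reads every regular file in `dir` and
// passes its contents to the target, once per file.
// Returns the number of files replayed.
std::size_t replay_corpus(fuzz_target target, const std::filesystem::path& dir)
{
    std::size_t count = 0;
    for (const auto& entry : std::filesystem::directory_iterator(dir))
    {
        if (!entry.is_regular_file())
            continue;
        std::ifstream file(entry.path(), std::ios::binary);
        std::vector<std::uint8_t> contents(
            (std::istreambuf_iterator<char>(file)),
            std::istreambuf_iterator<char>());
        target(contents.data(), contents.size());
        ++count;
    }
    return count;
}
```

A small main calling replay_corpus(LLVMFuzzerTestOneInput, "seedcorpus") would then run every saved input once; keeping -fsanitize=address,undefined in that build preserves the memory and undefined behavior checks.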

You can run your fuzzer with no arguments, in which case it fuzzes until it finds a crash or you stop it with Ctrl+C. The executable prints a lot of messages to stderr; the Output section of the LibFuzzer documentation (https://llvm.org/docs/LibFuzzer.html#output) explains what they mean, if you're curious.

To run the fuzzer for a limited period of time (for example, 30 seconds), use:

./fuzzer -max_total_time=30

Corpus

A corpus is a collection of input samples used by the fuzzer. LibFuzzer applies random mutations to these samples to create new inputs. If a newly created input triggers extra coverage, it is added to the corpus.
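To give a feel for what "mutations" means, here is a simplified mutator. It is a sketch, not LibFuzzer's implementation: it applies one of three edits loosely modelled on mutation kinds LibFuzzer names in its output (ChangeByte, InsertByte, EraseBytes). The mutate function is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical mutator: returns a copy of `sample` with one random edit
// applied (change, insert or erase a single byte).
bytes mutate(bytes sample, std::mt19937& rng)
{
    auto random_byte = [&rng] {
        return static_cast<std::uint8_t>(
            std::uniform_int_distribution<int>(0, 255)(rng));
    };
    // Uniform index in [0, n); callers guarantee n > 0
    auto random_index = [&rng](std::size_t n) {
        return std::uniform_int_distribution<std::size_t>(0, n - 1)(rng);
    };

    // An empty sample can only grow, so always insert in that case
    switch (sample.empty() ? 1 : std::uniform_int_distribution<int>(0, 2)(rng))
    {
    case 0:  // change one byte
        sample[random_index(sample.size())] = random_byte();
        break;
    case 1:  // insert one byte anywhere, including at the end
    {
        const auto pos = static_cast<std::ptrdiff_t>(random_index(sample.size() + 1));
        sample.insert(sample.begin() + pos, random_byte());
        break;
    }
    default:  // erase one byte
    {
        const auto pos = static_cast<std::ptrdiff_t>(random_index(sample.size()));
        sample.erase(sample.begin() + pos);
        break;
    }
    }
    return sample;
}
```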

So far, we've been running our fuzzer without an initial corpus. The fuzzer tries random inputs without any guidance, and builds its corpus from scratch. This is not advisable, though, since it reduces the effectiveness of fuzzing: the fuzzer may fail to find some relevant inputs.

We always advise providing an initial corpus (often called a seed corpus) to give the fuzzer some guidance. The seed corpus should contain a variety of valid and invalid samples; you can reuse samples from your unit tests. In our JSON example, we could create a seedcorpus directory and copy into it all the JSON files we use for unit testing.
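If your unit-test samples live in code rather than in files, a small helper can write them into the seed corpus directory. write_seed_corpus below is a hypothetical sketch using std::filesystem (C++17); the file names it generates are arbitrary, since LibFuzzer treats every file in a corpus directory as one input.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: writes each sample (for example, JSON strings
// taken from your unit tests) to its own file in `dir`, creating the
// directory if needed. Returns the number of files written.
std::size_t write_seed_corpus(const std::filesystem::path& dir,
                              const std::vector<std::string_view>& samples)
{
    std::filesystem::create_directories(dir);
    std::size_t index = 0;
    for (std::string_view sample : samples)
    {
        std::ofstream file(dir / ("seed_" + std::to_string(index) + ".json"),
                           std::ios::binary);
        file.write(sample.data(), static_cast<std::streamsize>(sample.size()));
        ++index;
    }
    return index;
}
```

You can then pass the directory to the fuzzer as a positional argument, for example ./fuzzer -max_total_time=30 seedcorpus.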