Fuzzing for developers

Stockholm C++ 20190314

Paul Dreik

About me

What is fuzzing?

Wikipedia:
...providing invalid, unexpected, or random data...
typewriter monkey

Why bother to fuzz?

  • finds corner cases
  • finds stuff you couldn't think of
  • automatic
  • if you don't, someone else will

Google fuzz track record

ClusterFuzz has found more than 16,000 bugs in Chrome and more than 11,000 bugs in over 160 open source projects integrated with OSS-Fuzz.
source: Google Security Blog

Curl

6 out of 12 security issues were found by fuzzing (2018)
source: daniel.haxx.se blog entry

Today's talk

  • overview
  • demonstration
  • tips and tricks

Goal

You should be able to start fuzzing *your* code!

What do we want?

Find input that causes
  • crash
  • assert() to trigger
  • undefined behaviour
  • use after free
  • memory leaks
  • etc. etc.

The simplest fuzzer

basic fuzzing
  • Generational fuzzing
  • works!
  • inefficient
  • How do you choose N?
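A minimal sketch of this blind approach, assuming N is the number of random bytes fed to the program per attempt. `buggy_target` and `blind_fuzz` are hypothetical names; the target stands in for real code and "crashes" (returns true) on exactly one 2-byte input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical code under test: "crashes" (returns true) on exactly one
// 2-byte input. Stands in for a real parser.
bool buggy_target(const std::vector<std::uint8_t>& in) {
  return in.size() == 2 && in[0] == 0xde && in[1] == 0xad;
}

// Blind generation-based fuzzing: generate N random bytes, run the target,
// repeat. Returns the attempt number that triggered the bug, or 0 if none did.
std::size_t blind_fuzz(std::size_t n_bytes, std::size_t max_attempts,
                       unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::vector<std::uint8_t> input(n_bytes);
  for (std::size_t attempt = 1; attempt <= max_attempts; ++attempt) {
    for (auto& b : input) {
      b = static_cast<std::uint8_t>(byte_dist(rng));
    }
    if (buggy_target(input)) {
      return attempt;
    }
  }
  return 0;
}
```

With N = 3 the bug is never found, however long the loop runs; with N = 2 each attempt has a 1 in 65,536 chance. Real inputs are much longer than two bytes, which is why blind generation is so inefficient.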

Generation-based fuzzing, improved

  • use knowledge of the input format
  • example: Csmith (generates random C programs)
  • nice, but never better than the generator itself!
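A sketch of a generator that knows the input format, here for bencode (the format bdecode parses in the demo). It assumes the bencode grammar: integers `i42e`, byte strings `4:spam`, lists `l...e` and dictionaries `d...e` with byte-wise sorted keys. All function names are hypothetical. Every output is syntactically valid, so it gets past the parser's first checks instead of being rejected at the first byte; but it can only produce what the generator itself knows about.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <string>

// Random raw bytes, 0..8 long (used for string values and dictionary keys).
std::string gen_bytes(std::mt19937& rng) {
  std::uniform_int_distribution<int> len_dist(0, 8);
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::string s;
  for (int n = len_dist(rng); n > 0; --n) {
    s.push_back(static_cast<char>(byte_dist(rng)));
  }
  return s;
}

// Bencode byte string: "<length>:<bytes>".
std::string encode_string(const std::string& raw) {
  return std::to_string(raw.size()) + ":" + raw;
}

// Random, syntactically valid bencode value, nested at most `depth` levels.
std::string gen_value(std::mt19937& rng, int depth) {
  // Once the depth budget is spent, only leaves (int, string) are generated.
  std::uniform_int_distribution<int> kind_dist(0, depth > 0 ? 3 : 1);
  std::uniform_int_distribution<int> count_dist(0, 4);
  switch (kind_dist(rng)) {
    case 0: {
      std::uniform_int_distribution<long long> int_dist(-1000, 1000);
      return "i" + std::to_string(int_dist(rng)) + "e";
    }
    case 1:
      return encode_string(gen_bytes(rng));
    case 2: {
      std::string out = "l";
      for (int n = count_dist(rng); n > 0; --n) {
        out += gen_value(rng, depth - 1);
      }
      return out + "e";
    }
    default: {
      // std::map iterates keys in byte-wise sorted order, as bencode requires.
      std::map<std::string, std::string> entries;
      for (int n = count_dist(rng); n > 0; --n) {
        std::string key = gen_bytes(rng);
        std::string value = gen_value(rng, depth - 1);
        entries[key] = value;
      }
      std::string out = "d";
      for (const auto& kv : entries) {
        out += encode_string(kv.first) + kv.second;
      }
      return out + "e";
    }
  }
}
```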

Improvement: add corpus

basic fuzzing with corpus
  1. supply set of interesting files
  2. corrupt them
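The two steps above, sketched: pick a file from the corpus at random and apply one small random change to it. The four mutation operators (bit flip, byte overwrite, byte insertion, byte deletion) and the function names are illustrative assumptions; real mutational fuzzers such as radamsa use many more operators.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Returns a copy of `in` with one random mutation applied.
std::string mutate(const std::string& in, std::mt19937& rng) {
  std::string out = in;
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::uniform_int_distribution<int> op_dist(0, 3);
  // Only insertion is possible on an empty input.
  const int op = out.empty() ? 2 : op_dist(rng);
  if (op == 2) {  // insert a random byte anywhere, including at the end
    std::uniform_int_distribution<std::size_t> ins_dist(0, out.size());
    const std::size_t pos = ins_dist(rng);
    out.insert(pos, 1, static_cast<char>(byte_dist(rng)));
    return out;
  }
  std::uniform_int_distribution<std::size_t> pos_dist(0, out.size() - 1);
  const std::size_t pos = pos_dist(rng);
  if (op == 0) {  // flip one bit
    std::uniform_int_distribution<int> bit_dist(0, 7);
    out[pos] = static_cast<char>(out[pos] ^ (1 << bit_dist(rng)));
  } else if (op == 1) {  // overwrite with a random byte
    out[pos] = static_cast<char>(byte_dist(rng));
  } else {  // delete one byte
    out.erase(pos, 1);
  }
  return out;
}

// Steps 1 and 2: choose a corpus entry at random, then corrupt it.
std::string next_input(const std::vector<std::string>& corpus,
                       std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, corpus.size() - 1);
  return mutate(corpus[pick(rng)], rng);
}
```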

Mutational fuzzing

  • much better!
  • no source needed!
  • example: radamsa, which has found lots of bugs

Use the source, Luke!

Improve crash probability

  • sprinkle with assert()
  • C++20 contracts?
  • hardening flags
  • sanitizers
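A sketch of how an assert() turns a silent logic error into a crash the fuzzer can see. The strict bencode integer parser and the round-trip check are illustrative assumptions, not code from bdecode. Because canonical bencode forbids leading zeros and `-0`, every input the parser accepts must re-encode to exactly the same bytes; if it does not, the assert fires. Requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <string_view>

// Strict parser for a bencoded integer such as "i42e" or "i-7e".
// Rejects leading zeros, "-0", missing digits, overflow and stray bytes.
std::optional<std::int64_t> parse_bencode_int(std::string_view s) {
  if (s.size() < 3 || s.front() != 'i' || s.back() != 'e') {
    return std::nullopt;
  }
  std::string_view digits = s.substr(1, s.size() - 2);
  const bool negative = digits.front() == '-';
  if (negative) {
    digits.remove_prefix(1);
  }
  if (digits.empty() || (digits.size() > 1 && digits.front() == '0') ||
      (negative && digits == "0")) {
    return std::nullopt;
  }
  // Accumulate as a negative number so that INT64_MIN is representable.
  constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
  std::int64_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    const int d = c - '0';
    if (value < (kMin + d) / 10) {  // value * 10 - d would overflow
      return std::nullopt;
    }
    value = value * 10 - d;
  }
  if (!negative) {
    if (value == kMin) {
      return std::nullopt;  // +9223372036854775808 does not fit
    }
    value = -value;
  }
  return value;
}

// Fuzz target body: the assert is the oracle. Any accepted input must
// round-trip; otherwise the parser has a bug and the program aborts here.
void check_one_input(std::string_view input) {
  if (const auto v = parse_bencode_int(input)) {
    const std::string reencoded = "i" + std::to_string(*v) + "e";
    assert(reencoded == std::string(input));
  }
}
```

Built with assertions enabled, any disagreement between what the parser accepts and what the encoder produces becomes a crash, even when no sanitizer would notice anything wrong.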

Use a sensitive memory allocator

  • guard malloc (macOS)
  • electric fence
  • libdislocator (ships with AFL)

So far

  • Execution reaches far into the program
  • More likely to crash

FEEDBACK

The biggest improvement!

  • Crash/no crash is too blunt
  • Monitor execution path
  • Feed interesting inputs back into the corpus
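A toy sketch of the feedback loop, with the monitoring done by hand: the hypothetical `target` records which branches it took, and the loop keeps a mutated input in the corpus only if it reached a branch not seen before. Real fuzzers get this information from compiler instrumentation and use much richer mutations; `feedback_fuzz` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical target with hand-written "instrumentation": every branch
// records an id in `cov`. Returns true to signal a crash.
bool target(const std::string& in, std::set<int>& cov) {
  cov.insert(0);
  if (in.size() > 0 && in[0] == 'F') {
    cov.insert(1);
    if (in.size() > 1 && in[1] == 'U') {
      cov.insert(2);
      if (in.size() > 2 && in[2] == 'Z') {
        cov.insert(3);
        if (in.size() > 3 && in[3] == 'Z') {
          cov.insert(4);
          return true;  // the "bug"
        }
      }
    }
  }
  return false;
}

// Coverage-guided loop. Returns the crashing input, or "" if none was found.
std::string feedback_fuzz(std::size_t max_iterations, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::vector<std::string> corpus{""};  // initial corpus: one empty input
  std::set<int> seen;                   // coverage reached by the corpus so far
  for (std::size_t i = 0; i < max_iterations; ++i) {
    // Pick a corpus entry and mutate it: append a byte or overwrite one.
    std::uniform_int_distribution<std::size_t> pick(0, corpus.size() - 1);
    std::string input = corpus[pick(rng)];
    const char c = static_cast<char>(byte_dist(rng));
    if (input.empty() || rng() % 2 == 0) {
      input.push_back(c);
    } else {
      std::uniform_int_distribution<std::size_t> pos(0, input.size() - 1);
      input[pos(rng)] = c;
    }
    // Monitor: run the target and collect the branches it took.
    std::set<int> cov;
    if (target(input, cov)) {
      return input;
    }
    // Feedback: keep the input only if it reached something new.
    bool new_coverage = false;
    for (int id : cov) {
      if (seen.insert(id).second) {
        new_coverage = true;
      }
    }
    if (new_coverage) {
      corpus.push_back(input);
    }
  }
  return "";
}
```

Blind generation would have to guess all four bytes at once, about one chance in four billion per attempt even with N = 4. With feedback, `F`, `FU` and `FUZ` are kept as stepping stones, so the bug is typically found after thousands, not billions, of iterations.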

Feedback

feedback
  • Monitor
  • Feedback

Monitoring techniques

  • debugger
  • hardware support
  • virtual machine
  • compiler instrumentation
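To make compiler instrumentation concrete, a sketch using clang's SanitizerCoverage `trace-pc-guard` mode: code compiled with `-fsanitize-coverage=trace-pc-guard` calls the two callbacks below, and this version just marks each edge that was executed. It is an illustration only; AFL and libFuzzer ship their own runtimes. This file should itself be compiled without the coverage flag so the callbacks are not instrumented, and the fixed-size bitmap and `count_covered_edges` helper are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One flag per edge id. A fixed size is a simplification for this sketch;
// ids beyond it wrap around and share slots.
static std::uint8_t g_coverage[1 << 16];

// Called once per instrumented module with an array of guards, one per edge.
// Give each guard a unique, non-zero id.
extern "C" void __sanitizer_cov_trace_pc_guard_init(std::uint32_t* start,
                                                    std::uint32_t* stop) {
  static std::uint32_t next_id = 0;
  if (start == stop || *start != 0) {
    return;  // empty, or already initialized
  }
  for (std::uint32_t* guard = start; guard < stop; ++guard) {
    *guard = ++next_id;
  }
}

// Called on every edge the program executes.
extern "C" void __sanitizer_cov_trace_pc_guard(std::uint32_t* guard) {
  if (*guard == 0) {
    return;
  }
  g_coverage[*guard % sizeof(g_coverage)] = 1;
}

// How many distinct edges have been hit since startup.
std::size_t count_covered_edges() {
  std::size_t n = 0;
  for (std::uint8_t hit : g_coverage) {
    n += hit;
  }
  return n;
}
```

A feedback-driven fuzzer would clear the bitmap before each run and treat an input as interesting if it sets a slot that no earlier input set.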

State of the art fuzzers

AFL and LibFuzzer both use feedback and compiler instrumentation

Fuzzer comparison

              AFL        LibFuzzer
Status        stable     stable and improving
Performance   "slow"     fast
UI            great      ok

Disclaimer: this is *my opinion* :-)

DEMO

bdecode, by Arvid Norberg

parses a tree from a serialized format (bencode, as used by BitTorrent)

From a bdecode unit test:

char b[] = "i12453e";
bdecode_node e;
error_code ec;
int ret = bdecode(b, b + sizeof(b)-1, e, ec);

Install the tools

$ apt install afl afl-clang g++ clang-6.0 libboost-system-dev

entrypoint_afl_plain.cpp

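#include "bdecode.hpp"
#include <array>
#include <cassert>
#include <fstream>

int
main(int argc, char* argv[]) {
  if (argc < 2) {
    return 1; // afl-fuzz substitutes the input file path for @@
  }
  std::ifstream ifs(argv[1], std::ios::binary);
  std::array<char, 8192> buf;
  // inputs longer than the buffer are silently truncated
  ifs.read(buf.data(), buf.size());
  auto nbytes = ifs.gcount();
  bdecode_node e;
  error_code ec;
  int ret = bdecode(buf.data(), buf.data() + nbytes, e, ec);
  (void)ret; // the result is ignored; we only look for crashes
}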

Compile

afl-g++ entrypoint_afl_plain.cpp \
bdecode.cpp -lboost_system -o afl_plain

Initial corpus

mkdir -p corpus
echo "" >corpus/empty

Note: putting more effort into this can pay off well
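For example, a few small, valid bencode values are a better start for bdecode than one near-empty file. A sketch that writes such a seed corpus, assuming C++17 `<filesystem>`; the helper, the `seedN` file names and the samples are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical starting points for bdecode: small but valid bencode values
// covering every type (int, string, list, dict).
const std::vector<std::string> kBencodeSeeds = {
    "i42e",                    // integer
    "4:spam",                  // byte string
    "l4:spami42ee",            // list
    "d3:bar4:spam3:fooi42ee",  // dictionary, keys sorted
};

// Writes each sample to its own file (seed0, seed1, ...) in `dir`,
// creating the directory if needed. Returns the number of files written.
std::size_t write_seed_corpus(const std::filesystem::path& dir,
                              const std::vector<std::string>& samples) {
  std::filesystem::create_directories(dir);
  std::size_t written = 0;
  for (std::size_t i = 0; i < samples.size(); ++i) {
    std::ofstream out(dir / ("seed" + std::to_string(i)), std::ios::binary);
    out.write(samples[i].data(),
              static_cast<std::streamsize>(samples[i].size()));
    if (out) {
      ++written;
    }
  }
  return written;
}
```

Calling `write_seed_corpus("corpus", kBencodeSeeds)` before starting afl-fuzz with `-i corpus/` would give it one starting file per bencode type.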

Output directory

mkdir -p output-afl-plain

Start fuzzing!

afl-fuzz -i corpus/ -o output-afl-plain/ -- ./afl_plain @@

(switch to terminal)

Decoding the status screen

http://lcamtuf.coredump.cx/afl/status_screen.txt

LibFuzzer

  • part of llvm
  • in-process
  • very fast
  • works very well with sanitizers
  • no cool user interface

entrypoint_libfuzzer.cpp

#include "bdecode.hpp"
#include <cstddef>
#include <cstdint>

extern "C" int
LLVMFuzzerTestOneInput(const uint8_t* Data, size_t Size)
{
  bdecode_node e;
  error_code ec;
  auto b = reinterpret_cast<const char*>(Data);
  int ret = bdecode(b, b + Size, e, ec);
  (void)ret; // only crashes and sanitizer reports matter here

  return 0;
}

Compile

clang++-6.0 -std=c++14 -g -O3  -fsanitize=fuzzer \
entrypoint_libfuzzer.cpp bdecode.cpp -lboost_system -o libfuzzer_plain

Run

mkdir -p output-libfuzzer-plain

./libfuzzer_plain output-libfuzzer-plain/

(switch to terminal)

make run-libfuzzer-plain

Decoding the output

#1450971 REDUCE cov: 96 ft: 295 corp: 112/5838b \
exec/s: 483657 rss: 33Mb L: 293/1072 MS: 1 EraseBytes-
https://llvm.org/docs/LibFuzzer.html#output

reproducer

Minimal program to replay an input

Useful for running an input under a debugger or valgrind

There are caveats!
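A minimal reproducer sketch, assuming the fuzz entry point has the libFuzzer signature `int LLVMFuzzerTestOneInput(const uint8_t*, size_t)`: read a file and pass its bytes to the entry point once. `replay_file` is a hypothetical helper. A binary built with `-fsanitize=fuzzer` can also replay a file passed as an argument, but a plain reproducer can be built with any compiler and flags, for example without sanitizers so that it runs under valgrind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Signature of a libFuzzer-style entry point.
using FuzzOne = int (*)(const std::uint8_t* data, std::size_t size);

// Reads the whole file at `path` and runs `fuzz_one` on it exactly once.
// Returns false if the file could not be opened.
bool replay_file(const char* path, FuzzOne fuzz_one) {
  std::ifstream ifs(path, std::ios::binary);
  if (!ifs) {
    return false;
  }
  const std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(ifs)),
                                       std::istreambuf_iterator<char>());
  fuzz_one(data.data(), data.size());
  return true;
}

// In a real reproducer, link this with the fuzz target and add:
//
//   int main(int argc, char* argv[]) {
//     for (int i = 1; i < argc; ++i) {
//       replay_file(argv[i], LLVMFuzzerTestOneInput);
//     }
//   }
```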

Sanitizer+reproducer demo

speed

  • Expect 1000 executions a second.
  • Avoid exceptions if you can

Fast+Slow

  • Fast: libFuzzer, multiple cores, -DNDEBUG, no sanitizers
  • grows your corpus fast
  • Slow: sanitizers+hardening+asserts
  • finds the errors

Tips and tricks

  • use several fuzzers
  • keep each fuzz target focused; don't mix unrelated things

Resources

Foxglove (tutorial)
fuzzing-project.org by Hanno Böck
Erlend Oftedal, NDC TechTown (using AFL)
Craig Young, using AFL
Kostya Serebryany, 2015 and 2017
Reddit (research is often posted here)

Thank you!

https://www.pauldreik.se/