exception_handling.md 5.72 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
The goal of this document is to give an overview of the exception handling
options in breakpad.

# Basics

Exception handling is a mechanism designed to handle the occurrence of
exceptions, special conditions that change the normal flow of program execution.

`SetUnhandledExceptionFilter` replaces all unhandled exceptions when Breakpad is
enabled. TODO: More on first and second change and vectored v. try/catch.

There are two main types of exceptions across all platforms: in-process and
out-of-process.

# In-Process

In process exception handling is relatively simple since the crashing process
handles crash reporting. It is generally considered unsafe to write a minidump
from a crashed process. For example, key data structures could be corrupted or
the stack on which the exception handler runs could have been overwritten. For
this reason all platforms also support some level of out-of-process exception
handling.

## Windows

In-process exception handling Breakpad creates a 'handler head' that waits
infinitely on a semaphore at start up. When this thread is woken it writes the
minidump and signals to the excepting thread that it may continue. A filter will
tell the OS to kill the process if the minidump is written successfully.
Otherwise it continues.

# Out-of-Process

Out-of-process exception handling is more complicated than in-process exception
handling because of the need to set up a separate process that can read the
state of the crashing process.

## Windows

Breakpad uses two abstractions around the exception handler to make things work:
`CrashGenerationServer` and `CrashGenerationClient`. The constructor for these
takes a named pipe name.

During server start up a named pipe and registers callbacks for client
connections are created. The named pipe is used for registration and all IO on
the pipe is done asynchronously. `OnPipeConnected` is called when a client
attempts to connect (call `CreateFile` on the pipe). `OnPipeConnected` does the
state machine transition from `Initial` to `Connecting` and on through
`Reading`, `Reading_Done`, `Writing`, `Writing_Done`, `Reading_ACK`, and
`Disconnecting`.

When registering callbacks, the client passes in two pointers to pointers: 1. A
pointer to the `EXCEPTION_INFO` pointer 1. A pointer to the `MDRawAssertionInfo`
which handles various non-exception failures like assertions

The essence of registration is adding a "`ClientInfo`" object that contains
handles used for synchronization with the crashing process to an array
maintained by the server. This is how we can keep track of all the clients on
the system that have registered for minidumps. These handles are: *
`server_died(mutex)` * `dump_requested(Event)` * `dump_generated(Event)`

The server registers asynchronous waits on these events with the `ClientInfo`
object as the callback context. When the `dump_requested` event is set by the
client, the `OnDumpRequested()` callback is called. The server uses the handles
inside `ClientInfo` to communicate with the child process. Once the child sets
the event, it waits for two objects: 1. the `dump_generated` event 1. the
`server_died` mutex

In the end handles are "duped" into the client process, and the clients use
`SetEvent` to request events, wait on the other event, or the `server_died`
mutex.

## Linux

### Current Status

As of July 2011, Linux had a minidump generator that is not entirely
out-of-process. The minidump was generated from a separate process, but one that
shared an address space, file descriptors, signal handles and much else with the
crashing process. It worked by using the `clone()` system call to duplicate the
crashing process, and then uses `ptrace()` and the `/proc` file system to
retrieve the information required to write the minidump. Since then Breakpad has
updated Linux exception handling to provide more benefits of out-of-process
report generation.

### Proposed Design

#### Overview

Breakpad would use a per-user daemon to write out a minidump that does not have,
interact with or depend on the crashing process. We don't want to start a new
separate process every time a user launches a Breakpad-enabled process. Doing
one daemon per machine is unacceptable for security concerns around one user
being able to initiate a minidump generation for another user's process.

#### Client/Server Communication

On Breakpad initialization in a process, the initializer would check if the
daemon is running and, if not, start it. The race condition between the check
and the initialization is not a problem because multiple daemons can check if
the IPC endpoint already exists and if a server is listening. Even if multiple
copies of the daemon try to `bind()` the filesystem to name the socket, all but
one will fail and can terminate.

This point is relevant for error handling conditions. Linux does not clean the
file system representation of a UNIX domain socket even if both endpoints
terminate, so checking for existence is not strong enough. However checking the
process list or sending a ping on the socket can handle this.

Breakpad uses UNIX domain sockets since they support full duplex communication
(unlike Windows, named pipes on Linux are half) and the kernal automatically
creates a private channel between the client and server once the client calls
`connect()`.

#### Minidump Generation

Breakpad could use the current system with `ptrace()` and `/proc` within the
daemon executable.

Overall the operations look like: 1. Signal from OS indicating crash 1. Signal
Handler suspends all threads except itself 1. Signal Handler sends
`CRASH_DUMP_REQUEST` message to server and waits for response 1. Server inspects
1. Minidump is asynchronously written to disk by the server 1. Server responds
indicating inspection is done

## Mac OSX

Out-of-process exception handling is fully supported on Mac.