a tale of two stacks: optimizing gin’s panic recovery handler

A programming article by Efron Licht

more articles


gin’s panic handler

The popular Go web framework Gin has a middleware that allows you to recover from and log panics while serving HTTP.

Here’s an example program that will panic whenever we hit the route GET /panic:

package main

import (
   "fmt"
   "net/http"

   "github.com/gin-gonic/gin"
)

func main() {
   engine := gin.New()
   engine.Use(gin.Recovery())
   engine.GET("/panic", func(c *gin.Context) {fmt.Fprintf(c.Writer, "%s", f())})
   http.ListenAndServe(":8080", engine)
}
func f() string {
    panic("this function panics!")
}

We run the program:

go run main.go

and in another shell provoke the panic:

curl http://localhost:8080/panic

Getting this nicely formatted result:

2023/01/17 13:45:01 [Recovery] 2023/01/17 - 13:45:01 panic recovered:
GET /panic HTTP/1.1
Host: localhost:8080
Accept: */*
User-Agent: curl/7.81.0


this function panics!
/home/efron/go/src/gitlab.com/efronlicht/gin-ex/main.go:19 (0x7200a6)
        f: panic("this function panics!")
/home/efron/go/src/gitlab.com/efronlicht/gin-ex/main.go:14 (0x720094)
        main.func1: fmt.Fprintf(c.Writer, "%s", f())
/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173 (0x71a601)
        (*Context).Next: c.handlers[c.index](c)
/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101 (0x71a5ec)
        CustomRecoveryWithWriter.func1: c.Next()
/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173 (0x719470)
        (*Context).Next: c.handlers[c.index](c)
/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:616 (0x7190d8)
        (*Engine).handleHTTPRequest: c.Next()
/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572 (0x718d9c)
        (*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/go/src/net/http/server.go:2947 (0x620a6b)
        serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/net/http/server.go:1991 (0x61d366)
        (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/runtime/asm_amd64.s:1594 (0x4672e0)
        goexit: BYTE    $0x90   // NOP

In addition to the usual elements of a stack trace (file, line, program counter, function name), Gin’s recovery handler somehow gives you the actual line of source code where each frame of the stack trace happened.

For programmers more used to dynamic languages, this seems pretty natural: you need the source code to run the program. But Go is a statically compiled language that doesn’t carry around its source code.

I had two thoughts when I saw this: how does Gin do it, and can we do it better?

Let’s take a look into the implementation to find out how Gin does it, and see if we can do it better. When we’re done, let’s run some benchmarks to compare the solutions, and then talk about whether this source-code magic is a good idea in the first place.

Investigating Gin

Luckily, the stack trace itself gives us nearly all the clues we need:

/home/efron/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101 (0x71a5ec)
        CustomRecoveryWithWriter.func1: c.Next()

That file, recovery.go, contains a function stack(), which is simple and clearly documented. From now on, I’m going to call it slowstack(), and we’ll call our eventual optimized version faststack(). Starting at skip, we walk frame-by-frame up the stack.

For each frame, we use runtime.Caller to get its stack information, using the file name to look up a file with the same name on the host system.

We read the entire file into memory, caching the most-recent file, so multiple calls to the same function don’t have to re-read the file, and print out the annotated stack frame.

// stack returns a nicely formatted stack frame, skipping skip frames.
func stack(skip int) []byte { // from now on, called `slowstack()`
 buf := new(bytes.Buffer) // the returned data
 // As we loop, we open files and read them. These variables record the currently
 // loaded file.
 var lines [][]byte
 var lastFile string
 for i := skip; ; i++ { // Skip the expected number of frames
  pc, file, line, ok := runtime.Caller(i)
  if !ok {
   break
  }
  // Print this much at least.  If we can't find the source, it won't show.
  fmt.Fprintf(buf, "%s:%d (0x%x)\n", file, line, pc)
  if file != lastFile {
   data, err := os.ReadFile(file)
   if err != nil {
    continue
   }
   lines = bytes.Split(data, []byte{'\n'})
   lastFile = file
  }
  fmt.Fprintf(buf, "\t%s: %s\n", function(pc), source(lines, line))
 }
 return buf.Bytes()
}

func source(lines [][]byte, n int) []byte {
 n-- // in stack trace, lines are 1-indexed but our array is 0-indexed
 if n < 0 || n >= len(lines) {
  return dunno // dunno = []byte("???")
 }
 return bytes.TrimSpace(lines[n])
}
// function returns, if possible, the name of the function containing the PC.
func function(pc uintptr) []byte { // body omitted
}

What are the performance limitations of slowstack?

unnecessary work in the runtime package

runtime.Caller() is a specialized invocation of runtime.CallersFrames for a single function: it ascends the callstack to skip using runtime magic, builds a runtime.Frame struct, and gives you a couple of that struct’s fields.

Let’s look at the source:

type Frame struct {
 // PC is the program counter for the location in this frame.
 // For a frame that calls another frame, this will be the
 // program counter of a call instruction. Because of inlining,
 // multiple frames may have the same PC value, but different
 // symbolic information.
 PC uintptr

 // Func is the Func value of this call frame. This may be nil
 // for non-Go code or fully inlined functions.
 Func *Func

 // Function is the package path-qualified function name of
 // this call frame. If non-empty, this string uniquely
 // identifies a single function in the program.
 // This may be the empty string if not known.
 // If Func is not nil then Function == Func.Name().
 Function string

 // File and Line are the file name and line number of the
 // location in this frame. For non-leaf frames, this will be
 // the location of a call. These may be the empty string and
 // zero, respectively, if not known.
 File string
 Line int

 // Entry point program counter for the function; may be zero
 // if not known. If Func is not nil then Entry ==
 // Func.Entry().
 Entry uintptr
 // contains filtered or unexported fields
}
// package runtime
func Caller(skip int) (pc uintptr, file string, line int, ok bool) {
 rpc := make([]uintptr, 1)
 n := callers(skip+1, rpc[:])
 if n < 1 {
  return
 }
 frame, _ := CallersFrames(rpc).Next() // a runtime.Frame
 return frame.PC, frame.File, frame.Line, frame.PC != 0
}

slowstack then calls function(pc) on the returned program counter to look up the *runtime.Func, then get its Name.

We could simply grab the (runtime.Frame).Function field directly. Better yet, we can call runtime.Callers() and runtime.CallersFrames() exactly once and ascend the callstack by repeatedly calling frames.Next(), which avoids having to repeatedly jump up and down the stack.

// stack_01 returns a nicely formatted stack frame, skipping skip frames.
func stack_01(skip int) []byte {
    // grab the stack frame
    pc := make([]uintptr, 64)
    n := runtime.Callers(skip, pc)
    if n == 0 { // no callers: e.g, skip > len(callstack).
        return nil
    }
    pc = pc[:n] // pass only valid pcs to runtime.CallersFrames
    buf := new(bytes.Buffer)
    frames := runtime.CallersFrames(pc)
    var lines [][]byte
    var lastFile string
    for {
        frame, more := frames.Next()
        fmt.Fprintf(buf, "%s:%d (0x%x)\n", frame.File, frame.Line, frame.PC)
        if frame.File != lastFile {
            if data, err := os.ReadFile(frame.File); err == nil {
                lines = bytes.Split(data, []byte{'\n'})
                lastFile = frame.File
            }
        }
        if frame.File == lastFile {
            fmt.Fprintf(buf, "\t%s: %s\n", frame.Function, source(lines, frame.Line))
        }
        if !more {
            return buf.Bytes()
        }
    }
}

This approach has one limitation: we have to specify a max stack size to pass to runtime.Callers. I think this is a good thing, since it places bounds on the resources this function can use, but it is a limitation. You could mitigate this by making the max depth a parameter of the function and configuring it at runtime, e.g, via an environment variable, but 64 seems like a good number to me.

always reads a whole file instead of a single line

Even with an incredibly simple handler, like ours, slowstack reads 7 files into memory, totalling 276KiB.

The longest lines of a .go or .s file are roughly 200 bytes, so this is using roughly 1300x the bytes it needs to.

reading a line at a time

We can read one line at a time instead by using bufio.Scanner. We’ll need a new scanner per file, but we can share a buffer between them. This has two benefits: first, we allocate less memory. Secondly, we can stop reading a file once we’ve hit the appropriate line without having to read to the end.

func stack_02(skip int) []byte {
    // grab the stack frame
    pc := make([]uintptr, 64)
    n := runtime.Callers(skip, pc)
    if n == 0 { // no callers: e.g, skip > len(callstack).
        return nil
    }
    pc = pc[:n] // pass only valid pcs to runtime.CallersFrames
    buf := new(bytes.Buffer)
    frames := runtime.CallersFrames(pc)
    scanBuf := make([]byte, 0, 256) // shared between the per-file scanners
FRAME:
    for {
        frame, more := frames.Next()
        fmt.Fprintf(buf, "%s:%d (0x%x)\n", frame.File, frame.Line, frame.PC)
        f, err := os.Open(frame.File)
        if err != nil {
            if !more {
                break
            }
            continue FRAME
        }
        scanner := bufio.NewScanner(f)
        scanner.Buffer(scanBuf[:0], bufio.MaxScanTokenSize)
        for line := 1; scanner.Scan(); line++ {
            if line == frame.Line {
                // found the source line: print it and stop reading this file early.
                fmt.Fprintf(buf, "\t%s: %s\n", frame.Function, bytes.TrimSpace(scanner.Bytes()))
                f.Close()
                if !more {
                    break FRAME
                }
                continue FRAME
            }
        }
        // hit EOF before reaching frame.Line:
        fmt.Fprintf(buf, "\t%s: %s\n", frame.Function, "???")
        f.Close()
        if !more {
            break
        }
    }
    return buf.Bytes()
}

Two caveats. First, while stack_02 uses less memory per file than stack_01 or slowstack, we still have to open the same file each time it appears on the callstack. slowstack had an optimization for a common case: if the same file appeared twice in a row on the callstack, it re-used the memory it had previously read.

Second, stack_02 has a hidden if unlikely bug: a bufio.Scanner can panic if it hits too many empty tokens in a row. We don’t want to panic during a panic recovery handler, so in our final solution we’ll wrap the scan in a recover() to protect ourselves from weird source files.

reading each file exactly once

While our solution uses less memory per file, we still have to open the same file each time it appears on the callstack. Pretty bad if we hit a recursive function:


func FibStack(n int) int {
    if n < 0 {
        panic("expected n >= 0")
    }
    if n == 0 || n == 1 {
        fmt.Println(string(stack(0)))
        return 1
    }
    return FibStack(n-1) + FibStack(n-2)
}

But slowstack’s solution was no prize, either: callstacks that rapidly bounce between the same files with at least one different file in-between get no savings from that approach.

We can build our own pathological example (we’ll use this for benchmarking later):

// in ping.go
func PingStack(n int) []byte {
    if n < 0 {
        panic("expected n >= 0")
    }
    if n == 0 {
        return stack(0)
    }
    return PongStack(n - 1)
}

// in pong.go
func PongStack(n int) []byte { return PingStack(n) }

We’d like to open each file exactly once, which means grouping our stack frames by filename. There are a few approaches you could use, but I prefer this one: if we sort the frames by file-and-line, we can populate each frame with its source in exactly one pass through each file. We can then format the annotated frames.

This will require a new datastructure:

type debugInfo struct {
        *runtime.Frame // hold a pointer rather than allocate a new one
        Source string // the line of source code holding the frame
        Depth int // the frame's original depth in the stack
}

This requires a few structural changes: rather than formatting the frames as we ascend the callstack, we put them all in a slice, populate them all, re-sort them into their original order, and then format them into our output buffer.

If there are few or no repeated files, this will use more peak memory than the previous approach, but it’s a solid solution that saves us from the pathological cases we outlined above. And since our minimal test case shows that any Gin handler will have some repeated files, we always save some work.

func stack_03(skip int) (formatted []byte) {
    // grab the stack frame
    pc := make([]uintptr, 64)
    n := runtime.Callers(skip, pc)
    if n == 0 { // no callers: e.g, skip > len(callstack).
        return nil
    }
    pc = pc[:n] // pass only valid pcs to runtime.Caller
    frames := runtime.CallersFrames(pc)
    // allocate a 4KiB reusable buffer, exactly once. we will use this both to read the input and format the output.
    buf := make([]byte, 0, 4096)
    type debugInfo struct {
        *runtime.Frame
        Source string
        Depth int
    }
    // populate the debuginfo with the lines of code that appear in the stack  trace.
    di := make([]debugInfo, 0, n)
    for depth := 0; ; depth++ {
        frame, more := frames.Next()
        di = append(di, debugInfo{Frame: &frame, Depth: depth})
        if !more {
            break
        }
    }

    // group the debuginfo by file and line: we'll resort them by depth later.
    sort.Slice(di, func(i, j int) bool {
        return di[i].File < di[j].File || (di[i].File == di[j].File && di[i].Line < di[j].Line)
    })
    // populate debug info with source.
    func() {
        line := 0
        lastFile := di[0].File
        f, err := os.Open(lastFile)
        defer func() {
            // scanner.Scan() can panic if it hits too many newlines.
            // in that case, we immediately abandon trying to get more sourceInfo and just write as much as we can.
            if p := recover(); p != nil {
                log.Println("panic while formatting stack: too many empty lines in sourcefile?", p, debug.Stack())
                f.Close()
            }

        }()

        scanner := bufio.NewScanner(f)
        scanner.Buffer(buf[:0], bufio.MaxScanTokenSize)
        for i := range di {
            if di[i].File != lastFile {
                if err == nil {
                    f.Close()
                }
                // reset: we're at the beginning of a new file
                line = 0
                lastFile = di[i].File
                f, err = os.Open(lastFile)
                if err != nil {
                    continue
                }
                scanner = bufio.NewScanner(f)
                scanner.Buffer(buf[:0], bufio.MaxScanTokenSize)
            }
            if err != nil {
                continue
            }
            // it's possible that we have multiple calls to the same function in the stack, such as during recursion,
            // so we check that we haven't gone past BEFORE we advance the scanner.
            for ; line < di[i].Line; line++ {
                scanner.Scan()
            }
            di[i].Source = scanner.Text()
        }
        _ = f.Close()
    }()

    // put the debuginfo back in depth-first order
    sort.Slice(di, func(i, j int) bool { return di[i].Depth < di[j].Depth })
    // format it all into the buffer. we're safe to reuse buf, since we're done with all the scanners.
    out := bytes.NewBuffer(buf[:0])
    for i := range di {
        // trimFunction shortens the package path-qualified function name; body omitted.
        fmt.Fprintf(out, "%s:%d (0x%x)\t%s:%s\n", di[i].File, di[i].Line, di[i].PC, trimFunction(di[i].Function), strings.TrimSpace(di[i].Source))
    }

    return out.Bytes()
}

Pretty good! We’re nearly done. When there are files to read, we read them efficiently.

what if we can’t read the files?

There are plenty of situations where we can’t read the files at all. Off the top of my head: the executable may be running on a system that doesn’t have the source (a container or production host, say), or it may have been built with the -trimpath flag, so the recorded paths don’t exist locally.

In those cases, our implementation does a ton of work that’s completely unnecessary, juggling around a ton of data just before its inevitable failure.

If we had some kind of heuristic for “can read source files”, we’d know whether or not it was worth the attempt. We could then build a fast path for that case.

Here, we use a sync.Once to lazily check whether or not we can access the file where we defined faststack():

// local source is not always available: for example, the executable may be
// running on a system without the source, or the -trimpath buildflag could
// have been provided to the go tool. if we can't find the source for THIS
// file, we are unlikely to be able to find it for any file.
var localSourceOnce sync.Once
var localSourceNotFound bool

func localSourceUnavailable() bool {
    localSourceOnce.Do(func() {
        _, file, _, _ := runtime.Caller(3)
        if _, err := os.Lstat(file); err != nil {
            localSourceNotFound = true
        }
    })
    return localSourceNotFound
}
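The same heuristic, stripped of the sync.Once machinery for illustration: ask whether the source file containing the currently running function still exists on disk. (sourceAvailableHere is my name for this sketch, not part of the article’s final code.)

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// sourceAvailableHere reports whether the source file containing this very
// function can be found on the local filesystem.
func sourceAvailableHere() bool {
	_, file, _, ok := runtime.Caller(0) // 0 = this function itself
	if !ok {
		return false
	}
	_, err := os.Lstat(file)
	return err == nil
}

func main() {
	// true under `go run`; typically false for a binary built with -trimpath
	// and shipped to another machine
	fmt.Println(sourceAvailableHere())
}
```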

and then we can put it all together:

finished faststack

// faststack returns a formatted stack frame, skipping skip frames
func faststack(skip int) (formatted []byte) {
    // grab the stack frame
    pc := make([]uintptr, 64)
    n := runtime.Callers(skip, pc)
    if n == 0 { // no callers: e.g, skip > len(callstack).
        return nil
    }
    pc = pc[:n] // pass only valid pcs to runtime.Caller
    frames := runtime.CallersFrames(pc)
    // allocate a 4KiB reusable buffer, exactly once. we will use this both to read the input and format the output.
    buf := make([]byte, 0, 4096)
    if localSourceUnavailable() {
        // fast path: just format the frames in the order they occur without looking up the source.
        out := bytes.NewBuffer(buf)
        for {
            frame, more := frames.Next()
            fmt.Fprintf(out, "%s:%d (0x%x)\t%s:%s\n", frame.File, frame.Line, frame.PC, trimFunction(frame.Function), "")
            if !more {
                return out.Bytes()
            }
        }
    }
    // slow path: at least some local source is available, so we want to populate the debuginfo with the lines of code that appear in the stack  trace.
    di := make([]debugInfo, 0, n)
    for depth := 0; ; depth++ {
        frame, more := frames.Next()
        di = append(di, debugInfo{Frame: &frame, Depth: depth})
        if !more {
            break
        }
    }

    // group the debuginfo by file and line: we'll resort them by depth later.
    sort.Slice(di, func(i, j int) bool {
        return di[i].File < di[j].File || (di[i].File == di[j].File && di[i].Line < di[j].Line)
    })
    // populate debug info with source.
    func() {
        line := 0
        lastFile := di[0].File
        f, err := os.Open(lastFile)
        defer func() {
            // scanner.Scan() can panic if it hits too many newlines.
            // in that case, we immediately abandon trying to get more sourceInfo and just write as much as we can.
            if p := recover(); p != nil {
                log.Println("panic while formatting stack: too many empty lines in sourcefile?", p, debug.Stack())
                f.Close()
            }

        }()

        scanner := bufio.NewScanner(f)
        scanner.Buffer(buf[:0], bufio.MaxScanTokenSize)
        for i := range di {
            if di[i].File != lastFile {
                if err == nil {
                    f.Close()
                }
                // reset: we're at the beginning of a new file
                line = 0
                lastFile = di[i].File
                f, err = os.Open(lastFile)
                if err != nil {
                    continue
                }
                scanner = bufio.NewScanner(f)
                scanner.Buffer(buf[:0], bufio.MaxScanTokenSize)
            }
            if err != nil {
                continue
            }
            // it's possible that we have multiple calls to the same function in the stack, such as during recursion,
            // so we check that we haven't gone past BEFORE we advance the scanner.
            for ; line < di[i].Line; line++ {
                scanner.Scan()
            }
            di[i].Source = scanner.Text()
        }
        _ = f.Close()
    }()

    // put the debuginfo back in depth-first order
    sort.Slice(di, func(i, j int) bool { return di[i].Depth < di[j].Depth })
    // format it all into the buffer. we're safe to reuse buf, since we're done with all the scanners.
    out := bytes.NewBuffer(buf[:0])
    for i := range di {
        fmt.Fprintf(out, "%s:%d (0x%x)\t%s:%s\n", di[i].File, di[i].Line, di[i].PC, trimFunction(di[i].Function), strings.TrimSpace(di[i].Source))

    }

    return out.Bytes()
}

benchmarks

This behavior is hard to benchmark: performance depends wildly on the structure of the callstack and the actual files it’s reading. Ideally, we’d collect a wide sample set of real Go code, insert calls to faststack and slowstack, and run it on dozens or hundreds of combinations of hardware, operating system, etc, etc.

I’m way too lazy for that and this article’s already taken eight hours longer than I thought it would, so instead we’ll make a couple of simple synthetic benchmarks at a variety of stack depths and try to draw some conclusions.

We’ll run the benchmarks on the same computer under four different circumstances:

linux (wsl), source code on SSD
linux (wsl), source code on SSD, compiled with -trimpath
windows, source code on HDD
windows, source code on HDD, compiled with -trimpath

Please see the source code in bench_test.go, ping_test.go, and pong_test.go for additional details.
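The harness itself is simple. A sketch of the shape of those benchmarks (stackStub stands in for the real faststack/slowstack implementations, which live in the repo):

```go
package main

import (
	"fmt"
	"testing"
)

// stackStub stands in for faststack/slowstack; the real benchmarks call the
// implementations from the article.
func stackStub(depth int) []byte { return make([]byte, depth) }

// benchAtDepths times the stack formatter at several stack depths.
func benchAtDepths(stack func(int) []byte) map[int]testing.BenchmarkResult {
	results := make(map[int]testing.BenchmarkResult)
	for _, depth := range []int{16, 64} {
		depth := depth
		results[depth] = testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = stack(depth)
			}
		})
	}
	return results
}

func main() {
	for depth, r := range benchAtDepths(stackStub) {
		fmt.Printf("depth=%d: %v\n", depth, r)
	}
}
```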

bench: caveats

These are synthetic benchmarks and probably don’t reflect real-world performance.

Let’s examine the first case:

(These benchmarks are formatted using fmtbench, a tool I created for my simple byte hacking article.)

bench: linux(wsl): SSD

name                     runs      ns/op     %/max   bytes     %/max   allocs    %/max
FastStack/depth=16-24    7.76e+03  1.42e+05  0.0624  1.66e+04  0.0194  242       0.42
FastStack/depth=32-24    4.93e+03  2.18e+05  0.0959  3.3e+04   0.0385  403       0.7
FastStack/depth=64-24    4.21e+03  3.04e+05  0.133   4.59e+04  0.0536  651       1.13
FastStack/depth=128-24   3.85e+03  2.98e+05  0.131   4.59e+04  0.0536  651       1.13
FastStack/depth=256-24   4.3e+03   2.77e+05  0.122   4.59e+04  0.0536  651       1.13
FastStack/depth=512-24   4.31e+03  2.75e+05  0.121   4.59e+04  0.0536  651       1.13
FastStack/depth=1024-24  4.19e+03  3.06e+05  0.134   4.59e+04  0.0536  651       1.13
FastStack/depth=2048-24  3.45e+03  2.92e+05  0.128   4.59e+04  0.0536  651       1.13
FastStack/depth=4096-24  4.03e+03  2.98e+05  0.131   4.59e+04  0.0536  651       1.13
SlowStack/depth=0016-24  4.95e+03  2.42e+05  0.106   5.32e+05  0.62    313       0.543
SlowStack/depth=0032-24  2.74e+03  3.99e+05  0.175   8.66e+05  1.01    538       0.934
SlowStack/depth=0064-24  1.72e+03  7.07e+05  0.311   1.53e+06  1.79    988       1.71
SlowStack/depth=0128-24  814       1.44e+06  0.631   2.87e+06  3.35    1.89e+03  3.28
SlowStack/depth=0256-24  384       3.29e+06  1.44    5.54e+06  6.46    3.68e+03  6.4
SlowStack/depth=0512-24  152       7.44e+06  3.27    1.09e+07  12.7    7.28e+03  12.6
SlowStack/depth=1024-24  52        2.07e+07  9.1     2.16e+07  25.2    1.45e+04  25.1
SlowStack/depth=2048-24  18        6.6e+07   29      4.29e+07  50.1    2.88e+04  50.1
SlowStack/depth=4096-24  5         2.28e+08  100     8.57e+07  100     5.76e+04  100

We note the following: faststack’s cost plateaus at depth 64, since that’s where we cap the number of frames we collect, while slowstack’s time, memory, and allocations keep growing with depth. At depth 4096, slowstack takes roughly 750x as long as faststack and allocates nearly 2000x as many bytes.

bench: Windows (HDD)

name                     runs  ns/op     %/max  bytes     %/max   allocs    %/max
FastStack/depth=16-24    861   1.34e+06  0.29   3e+04     0.0346  250       0.434
FastStack/depth=32-24    532   2.25e+06  0.487  4.66e+04  0.0538  410       0.711
FastStack/depth=64-24    322   3.63e+06  0.784  8e+04     0.0922  653       1.13
FastStack/depth=128-24   324   3.61e+06  0.78   8e+04     0.0923  653       1.13
FastStack/depth=256-24   328   3.69e+06  0.798  8e+04     0.0923  653       1.13
FastStack/depth=512-24   326   3.58e+06  0.774  8e+04     0.0923  653       1.13
FastStack/depth=1024-24  338   3.61e+06  0.78   8e+04     0.0923  653       1.13
FastStack/depth=2048-24  322   3.64e+06  0.786  8e+04     0.0923  653       1.13
FastStack/depth=4096-24  327   3.67e+06  0.794  8e+04     0.0923  653       1.13
SlowStack/depth=0016-24  752   1.6e+06   0.346  5.38e+05  0.621   317       0.55
SlowStack/depth=0032-24  427   2.84e+06  0.613  8.76e+05  1.01    542       0.94
SlowStack/depth=0064-24  232   5.11e+06  1.11   1.55e+06  1.79    992       1.72
SlowStack/depth=0128-24  120   1.02e+07  2.22   2.9e+06   3.35    1.89e+03  3.28
SlowStack/depth=0256-24  57    2.02e+07  4.37   5.61e+06  6.47    3.69e+03  6.4
SlowStack/depth=0512-24  31    4e+07     8.66   1.1e+07   12.7    7.28e+03  12.6
SlowStack/depth=1024-24  13    8.56e+07  18.5   2.18e+07  25.2    1.45e+04  25.1
SlowStack/depth=2048-24  6     1.93e+08  41.8   4.35e+07  50.1    2.89e+04  50.1
SlowStack/depth=4096-24  3     4.62e+08  100    8.68e+07  100     5.76e+04  100

This is slower, but not by as much as we might expect. Probably the operating system’s in-memory read cache is doing the heavy lifting here.

Only the truly deep stacks approach half a second (though even that is really slow for an HTTP response!)

Is it worth optimizing a panic recovery handler?

Everything comes at a cost. In this case, faststack cuts down on memory usage and clock time significantly by using a more efficient algorithm and limiting the total stack depth, at the cost of doubling the length of the code and increasing its complexity. slowstack was trivial to understand, and faststack is definitely not: it requires a new data structure, global variables, lazy initialization, its own panic recovery handler, and two separate sorts.

Why do we have a panic recovery handler in the first place? To provide continuous service, even in the presence of software bugs. That is, a panic recovery handler is supposed to be a last-ditch protection against a bug that never should have made it into production. These panics are supposed to be rare, prevented by a suite of tests, CI, etc. In practice, well, all software is buggy, and Go programs can panic a lot. When I worked at an ISP, I had production bugs that triggered 20K recovery handlers per minute, per server (on ~5 servers). If those stacks average 20 frames each, and the files those frames come from average 20KiB, that’s 8GiB of allocations per minute per server, or 40GiB of allocations per minute across the fleet: enough to bring most containers to a crawl, and even a relatively beefy modern PC might struggle to clean up all that garbage. (The server hosting this article has only 256MiB of RAM, as of my last edit.) Admittedly, that’s a pretty niche scenario.
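For the record, the back-of-the-envelope arithmetic above checks out (all figures are the rough estimates from the paragraph, not measurements):

```go
package main

import "fmt"

func main() {
	const (
		panicsPerMinute = 20_000
		framesPerPanic  = 20
		bytesPerFile    = 20 << 10 // ~20KiB average source file
		servers         = 5
	)
	perServer := panicsPerMinute * framesPerPanic * bytesPerFile
	// ≈ 7.6 GiB/min per server and ≈ 38 GiB/min across the fleet --
	// the "8GiB" and "40GiB" round figures above
	fmt.Printf("%.1f GiB/min per server\n", float64(perServer)/(1<<30))
	fmt.Printf("%.1f GiB/min across %d servers\n", float64(perServer)*servers/(1<<30), servers)
}
```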

More worryingly, there are two performance problems that faststack can’t help with. First, most storage media can’t do concurrent IO: if something else on the server needs to touch the disks the panic handler is reading, they’ll block each other for as long as those reads take.

Secondly, most operating systems place a limit on the number of file handles a single process can open (1024 by default on linux, iirc). Concurrent calls to faststack() might quickly eat up this limit. While these conditions are rare, they’re likely to occur for at least some of Gin’s users, who may struggle to diagnose the problem.

There are technical solutions for those problems: for example, you could cut down on IO by keeping a concurrency-safe cache of common file:line pairs, and you could bound its memory usage by making it an LRU cache, but that’s adding even more complexity.

Which leads us to an important question: what benefit are we getting? Is having a single line of source code annotation really helpful?

closing thoughts

How much benefit are we really getting from this implementation? How likely is a single line of source code to help us to fix a bug, if we already have the file, line, and function name? Generally speaking, we’ll need more context than a single line to diagnose the bug: we’ll have to look at the source itself, not just a pinhole window into it. While it’s a cool trick, this source code annotation is just that: a neat parlor trick, not a particularly helpful debugging tool. And there’s one corner case where it could be really damaging: if the source code present on the host system differs from the version used to compile the executable, the source code annotations will be wrong.

At some point, it’s important to take a step back and ask yourself whether or not all this work is worth it in the first place. I don’t think slowstack or faststack solve a problem that needs to be solved. I think the Gin project would be better served by just using runtime.Stack() and forgetting about source code annotation. Sometimes simpler is better.

We’ll talk about panics, logging, and recovery more in a later article: stay tuned.

Like this article? Need help making great software, or just want to save a couple hundred thousand dollars on your cloud bill? Hire me, or bring me in to consult. Professional enquiries at efron.dev@gmail.com or linkedin