基本原语panic和recover源码剖析

前言

本节将分析 Go 语言中两个经常成对出现的两个关键字 — panic 和 recover。这两个关键字与上一节提到的 defer 有紧密的联系，它们都是 Go 语言中的内置函数，也提供了互补的功能。

panic 能够改变程序的控制流，调用 panic 后会立刻停止执行当前函数的剩余代码，并在当前 Goroutine 中递归执行调用方的 defer；
recover 可以中止 panic 造成的程序崩溃。它是一个只能在 defer 中发挥作用的函数，在其他作用域中调用不会发挥作用；

现象

我们先通过几个例子了解一下使用 panic 和 recover 关键字时遇到的现象，部分现象也与上一节分析的 defer 关键字有关：

panic 只会触发当前 Goroutine 的 defer；
recover 只有在 defer 中调用才会生效；
panic 允许在 defer 中嵌套多次调用；

跨协程失效

首先要介绍的现象是 panic 只会触发当前 Goroutine 的延迟函数调用，我们可以通过如下所示的代码了解该现象：

1
2
3
4
5
6
7
8
9


func main() {
	defer println("in main")
	go func() {
		defer println("in goroutine")
		panic("")
	}()

	time.Sleep(1 * time.Second)
}

1
2
3
4


$ go run main.go
in goroutine
panic:
...

当我们运行这段代码时会发现 main 函数中的 defer 语句并没有执行，执行的只有当前 Goroutine 中的 defer。

前面我们曾经介绍过 defer 关键字对应的 runtime.deferproc 会将延迟调用函数与调用方所在 Goroutine 进行关联。所以当程序发生崩溃时只会调用当前 Goroutine 的延迟调用函数也是非常合理的。

如上图所示，多个 Goroutine 之间没有太多的关联，一个 Goroutine 在 panic 时也不应该执行其他 Goroutine 的延迟函数。

失效的崩溃恢复

初学 Go 语言的读者可能会写出下面的代码，在主程序中调用 recover 试图中止程序的崩溃，但是从运行的结果中我们也能看出，下面的程序没有正常退出。

1
2
3
4
5
6
7
8


func main() {
	defer fmt.Println("in main")
	if err := recover(); err != nil {
		fmt.Println(err)
	}

	panic("unknown err")
}

1
2
3
4
5
6
7
8


$ go run main.go
in main
panic: unknown err

goroutine 1 [running]:
main.main()
	...
exit status 2

仔细分析一下这个过程就能理解这种现象背后的原因，recover 只有在发生 panic 之后调用才会生效。然而在上面的控制流中，recover 是在 panic 之前调用的，并不满足生效的条件，所以我们需要在 defer 中使用 recover 关键字。

嵌套崩溃

Go 语言中的 panic 是可以多次嵌套调用的。一些熟悉 Go 语言的读者很可能也不知道这个知识点，如下所示的代码就展示了如何在 defer 函数中多次调用 panic：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


func main() {
	defer fmt.Println("in main")
	defer func() {
		defer func() {
			panic("panic again and again")
		}()
		panic("panic again")
	}()

	panic("panic once")
}

1
2
3
4
5
6
7
8
9


$ go run main.go
in main
panic: panic once
	panic: panic again
	panic: panic again and again

goroutine 1 [running]:
...
exit status 2

从上述程序输出的结果，我们可以确定程序多次调用 panic 也不会影响 defer 函数的正常执行，所以使用 defer 进行收尾工作一般来说都是安全的。

汇编代码

最好的方式当然是了解编译器究竟做了什么事情：

1
2
3
4
5
6
7
8


package main

func main() {
	defer func() {
		recover()
	}()
	panic(nil)
}

其汇编的形式为：

1
2
3
4
5
6
7


TEXT main.main(SB) /Users/changkun/dev/go-under-the-hood/demo/7-lang/panic/main.go
  (...)
  main.go:7		0x104e05b		0f57c0			XORPS X0, X0
  main.go:7		0x104e05e		0f110424		MOVUPS X0, 0(SP)
  main.go:7		0x104e062		e8d935fdff		CALL runtime.gopanic(SB)
  main.go:7		0x104e067		0f0b			UD2
  (...)

可以看到 panic 这个关键词本质上只是一个 runtime.gopanic 调用。

而与之对应的 recover 则：

1
2
3
4
5
6


TEXT main.main.func1(SB) /Users/changkun/dev/go-under-the-hood/demo/7-lang/panic/main.go
  (...)
  main.go:5		0x104e09d		488d442428		LEAQ 0x28(SP), AX
  main.go:5		0x104e0a2		48890424		MOVQ AX, 0(SP)
  main.go:5		0x104e0a6		e8153bfdff		CALL runtime.gorecover(SB)
  (...)

其实也只是一个 runtime.gorecover 调用。

数据结构

panic 关键字在 Go 语言的源代码是由数据结构 runtime._panic 表示的。每当我们调用 panic 都会创建一个如下所示的数据结构存储相关信息：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35


// A _panic holds information about an active panic.
//
// A _panic value must only ever live on the stack.
//
// The argp and link fields are stack pointers, but don't need special
// handling during stack growth: because they are pointer-typed and
// _panic values only live on the stack, regular stack pointer
// adjustment takes care of them.
// _panic 保存了一个活跃的 panic
//
// 这个标记了 go:notinheap 因为 _panic 的值必须位于栈上
//
// argp 和 link 字段为栈指针，但在栈增长时不需要特殊处理：因为他们是指针类型且
// _panic 值只位于栈上，正常的栈指针调整会处理他们。
//
//go:notinheap
type _panic struct {
	// argp 是指向 defer 调用时参数的指针；
	// panic 期间 defer 调用参数的指针; 无法移动 - liblink 已知
	argp      unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
	// arg 是调用 panic 时传入的参数；
	arg       interface{}    // argument to panic
	// link 指向了更早调用的 runtime._panic 结构；
	// link 链接到更早的 panic
	link      *_panic        // link to earlier panic
	pc        uintptr        // where to return to in runtime if this panic is bypassed
	sp        unsafe.Pointer // where to return to in runtime if this panic is bypassed
	// recovered 表示当前 runtime._panic 是否被 recover 恢复
	// 表明 panic 是否结束
	recovered bool           // whether this panic is over
	// aborted 表示当前的 panic 是否被强行终止；
	// 表明 panic 是否忽略
	aborted   bool           // the panic was aborted
	goexit    bool
}

从数据结构中的 link 字段我们就可以推测出以下的结论：panic 函数可以被连续多次调用，它们之间通过 link 可以组成链表。

结构体中的 pc、sp 和 goexit 三个字段都是为了修复 runtime.Goexit 带来的问题引入的。runtime.Goexit 能够只结束调用该函数的 Goroutine 而不影响其他的 Goroutine，但是该函数会被 defer 中的 panic 和 recover 取消，引入这三个字段就是为了保证该函数的一定会生效。

panic

这里先介绍分析 panic 函数是终止程序的实现原理。在处理 panic 期间，会先判断当前 panic 的类型，确定 panic 是否可恢复。编译器会将关键字 panic 转换成 runtime.gopanic，该函数的执行过程包含以下几个步骤：

创建新的 runtime._panic 并添加到所在 Goroutine 的_panic 链表的最前面；
在循环中不断从当前 Goroutine 的 _defer 中链表获取 runtime._defer 并调用 runtime.reflectcall 运行延迟调用函数；
如果recovered=true,执行程序的恢复;
调用 runtime.fatalpanic 中止整个程序；

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214


// The implementation of the predeclared function panic.
func gopanic(e interface{}) {
	gp := getg()
	// 判断在系统栈上还是在用户栈上
	// 如果执行在系统或信号栈时，getg() 会返回当前 m 的 g0 或 gsignal
	// 因此可以通过 gp.m.curg == gp 来判断所在栈
	// 系统栈上的 panic 无法恢复
	if gp.m.curg != gp {
		print("panic: ") // 打印
		printany(e)      // 打印
		print("\n")      // 继续打印，下同
		throw("panic on system stack")
	}

	// 如果正在进行 malloc 时发生 panic 也无法恢复
	if gp.m.mallocing != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic during malloc")
	}

	// 在禁止抢占时发生 panic 也无法恢复
	if gp.m.preemptoff != "" {
		print("panic: ")
		printany(e)
		print("\n")
		print("preempt off reason: ")
		print(gp.m.preemptoff)
		print("\n")
		throw("panic during preemptoff")
	}

	// 在 g 锁在 m 上时发生 panic 也无法恢复
	if gp.m.locks != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic holding locks")
	}
	// 其他情况，panic 可以从运行时进行恢复，这时候会创建一个 _panic 实例。_panic 类型 定义了一个 _panic 链表
	var p _panic
	p.arg = e
	p.link = gp._panic
	gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

	atomic.Xadd(&runningPanicDefers, 1)

	// By calculating getcallerpc/getcallersp here, we avoid scanning the
	// gopanic frame (stack scanning is slow...)
	addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
	// 接下来开始逐一调用当前 goroutine 的 defer 方法， 检查用户态代码是否需要对 panic 进行恢复：
	// 首先，当 panic 发生时，如果错误是可恢复的错误，那么 会逐一遍历该 goroutine 对应 defer 链表中的 defer 函数链表，直到 defer 遍历完毕、 或者再次进入调度循环（recover 的 mcall 调用） 后才会停止。
	for {
		// 开始逐个取当前 goroutine 的 defer 调用
		d := gp._defer
		// 如果没有 defer 调用，则跳出循环
		if d == nil {
			break
		}

		// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
		// take defer off list. An earlier panic will not continue running, but we will make sure below that an
		// earlier Goexit does continue running.
		// 如果 defer 是由早期的 panic 或 Goexit 开始的（并且，因为我们回到这里，这引发了新的 panic），
		// 则将 defer 带离链表。更早的 panic 或 Goexit 将无法继续运行。
		if d.started {
			if d._panic != nil {
				d._panic.aborted = true
			}
			d._panic = nil
			if !d.openDefer {
				// For open-coded defers, we need to process the
				// defer again, in case there are any other defers
				// to call in the frame (not including the defer
				// call that caused the panic).
				d.fn = nil
				gp._defer = d.link
				freedefer(d)
				continue
			}
		}

		// Mark defer as started, but keep on list, so that traceback
		// can find and update the defer's argument frame if stack growth
		// or a garbage collection happens before reflectcall starts executing d.fn.
		// 如果栈增长或者垃圾回收在 reflectcall 开始执行 d.fn 前发生
		// 标记 defer 已经开始执行，但仍将其保存在列表中，从而 traceback 可以找到并更新这个 defer 的参数帧
		d.started = true

		// Record the panic that is running the defer.
		// If there is a new panic during the deferred call, that panic
		// will find d in the list and will mark d._panic (this panic) aborted.
		// 记录正在运行 defer 的 panic。如果在 defer 调用期间出现新的 panic，该 panic 将在列表中
		// 找到 d 并标记 d._panic（该 panic）中止。
		d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

		done := true
		if d.openDefer {
			done = runOpenDeferFrame(gp, d)
			if done && !d._panic.recovered {
				addOneOpenDeferFrame(gp, 0, nil)
			}
		} else {
			p.argp = unsafe.Pointer(getargp(0))
			reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
		}
		p.argp = nil

		// reflectcall did not panic. Remove d.
		// reflectcall 不会 panic. 移出 d.
		if gp._defer != d {
			throw("bad defer entry in panic")
		}
		d._panic = nil

		// trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
		//GC()

		pc := d.pc
		// 必须是指针，以便在栈复制期间进行调整
		sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
		if done {
			d.fn = nil
			gp._defer = d.link
			freedefer(d)
		}
		// 执行延迟调用函数，可能会设置 p.recovered = true
		if p.recovered {
			gp._panic = p.link
			if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
				// 从 runtime._defer 中取出了程序计数器 pc 和栈指针 sp 并调用 runtime.recovery 函数触发 Goroutine 的调度，调度之前会准备好 sp、pc 以及函数的返回值：
				// A normal recover would bypass/abort the Goexit.  Instead,
				// we return to the processing loop of the Goexit.
				gp.sigcode0 = uintptr(gp._panic.sp)
				gp.sigcode1 = uintptr(gp._panic.pc)
				mcall(recovery)
				throw("bypassed recovery failed") // mcall should not return
			}
			atomic.Xadd(&runningPanicDefers, -1)

			// Remove any remaining non-started, open-coded
			// defer entries after a recover, since the
			// corresponding defers will be executed normally
			// (inline). Any such entry will become stale once
			// we run the corresponding defers inline and exit
			// the associated stack frame.
			d := gp._defer
			var prev *_defer
			if !done {
				// Skip our current frame, if not done. It is
				// needed to complete any remaining defers in
				// deferreturn()
				prev = d
				d = d.link
			}
			for d != nil {
				if d.started {
					// This defer is started but we
					// are in the middle of a
					// defer-panic-recover inside of
					// it, so don't remove it or any
					// further defer entries
					break
				}
				if d.openDefer {
					if prev == nil {
						gp._defer = d.link
					} else {
						prev.link = d.link
					}
					newd := d.link
					freedefer(d)
					d = newd
				} else {
					prev = d
					d = d.link
				}
			}

			gp._panic = p.link
			// Aborted panics are marked but remain on the g.panic list.
			// Remove them from the list.
			// 忽略的 panic 会被标记，但仍然保留在 g.panic 列表中
			// 这里将它们移出列表
			for gp._panic != nil && gp._panic.aborted {
				gp._panic = gp._panic.link
			}
			if gp._panic == nil { // must be done with signal	// 必须由 signal 完成
				gp.sig = 0
			}
			// Pass information about recovering frame to recovery.
			// 传递关于恢复帧的信息
			gp.sigcode0 = uintptr(sp)
			gp.sigcode1 = pc
			// 调用 recover，并重新进入调度循环，不再返回
			mcall(recovery)
			// 如果无法重新进入调度循环，则无法恢复错误
			throw("recovery failed") // mcall should not return
		}
	}
	// 当然如果所有的 defer 都没有指明显式的 recover，那么这时候则直接在运行时抛出 panic 信息：
	// 消耗完所有的 defer 调用，保守地进行 panic
	// 因为在冻结之后调用任意用户代码是不安全的，所以我们调用 preprintpanics 来调用
	// 所有必要的 Error 和 String 方法来在 startpanic 之前准备 panic 字符串。
	// ran out of deferred calls - old-school panic now
	// Because it is unsafe to call arbitrary user code after freezing
	// the world, we call preprintpanics to invoke all necessary Error
	// and String methods to prepare the panic strings before startpanic.
	preprintpanics(gp._panic)

	fatalpanic(gp._panic) // should not return	// 不应该返回
	*(*int)(nil) = 0      // not reached	// 无法触及
}

reflectcall

defer 并非简单的遍历，每个在 panic 和 recover 之间的 defer 都会在这里通过 reflectcall 执行。

1
2
3
4
5
6
7
8


// reflectcall 使用 arg 指向的 n 个参数字节的副本调用 fn。
// fn 返回后，reflectcall 在返回之前将 n-retoffset 结果字节复制回 arg+retoffset。
// 如果重新复制结果字节，则调用者应将参数帧类型作为 argtype 传递，以便该调用可以在复制期间执行适当的写障碍。
// reflect 包传递帧类型。在 runtime 包中，只有一个调用将结果复制回来，即 cgocallbackg1，
// 并且它不传递帧类型，这意味着没有调用写障碍。参见该调用的页面了解相关理由。
//
// 包 reflect 通过 linkname 访问此符号
func reflectcall(argtype *_type, fn, arg unsafe.Pointer, argsize uint32, retoffset uint32)

preprintpanics

至于 preprintpanics 和 fatalpanic 无非是一些错误输出，不再赘述：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


// Call all Error and String methods before freezing the world.
// Used when crashing with panicking.
// 在停止前调用所有的 Error 和 String 方法
func preprintpanics(p *_panic) {
	defer func() {
		if recover() != nil {
			throw("panic while printing panic value")
		}
	}()
	for p != nil {
		switch v := p.arg.(type) {
		case error:
			p.arg = v.Error()
		case stringer:
			p.arg = v.String()
		}
		p = p.link
	}
}

至于 preprintpanics 和 fatalpanic 无非是一些错误输出，不再赘述：

fatalpanic

Go 语言在 1.14 通过 runtime: ensure that Goexit cannot be aborted by a recursive panic/recover 提交解决了递归 panic 和 recover 与 runtime.Goexit 的冲突。

runtime.fatalpanic 实现了无法被恢复的程序崩溃，它在中止程序之前会通过 runtime.printpanics 打印出全部的 panic 消息以及调用时传入的参数：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50


// fatalpanic 实现了不可恢复的 panic。类似于 fatalthrow，
// 要求如果 msgs != nil，则 fatalpanic 仍然能够打印 panic 的消息并在 main 在退出时候减少 runningPanicDefers。
// fatalpanic implements an unrecoverable panic. It is like fatalthrow, except
// that if msgs != nil, fatalpanic also prints panic messages and decrements
// runningPanicDefers once main is blocked from exiting.
//
//go:nosplit
func fatalpanic(msgs *_panic) {
	pc := getcallerpc()
	sp := getcallersp()
	gp := getg()
	var docrash bool
	// Switch to the system stack to avoid any stack growth, which
	// may make things worse if the runtime is in a bad state.
	// 切换到系统栈来避免栈增长，如果运行时状态较差则可能导致更糟糕的事情
	systemstack(func() {
		if startpanic_m() && msgs != nil {
			// There were panic messages and startpanic_m
			// says it's okay to try to print them.

			// startpanic_m set panicking, which will
			// block main from exiting, so now OK to
			// decrement runningPanicDefers.
			// 有 panic 消息和 startpanic_m 则可以尝试打印它们

			// startpanic_m 设置 panic 会从阻止 main 的退出，
			// 因此现在可以开始减少 runningPanicDefers 了
			atomic.Xadd(&runningPanicDefers, -1)

			printpanics(msgs)
		}

		docrash = dopanic_m(gp, pc, sp)
	})

	if docrash {
		// By crashing outside the above systemstack call, debuggers
		// will not be confused when generating a backtrace.
		// Function crash is marked nosplit to avoid stack growth.
		// 通过在上述 systemstack 调用之外崩溃，调试器在生成回溯时不会混淆。
		// 函数崩溃标记为 nosplit 以避免堆栈增长。
		crash()
	}
	// 从系统栈退出
	systemstack(func() {
		exit(2)
	})
	// 不可达
	*(*int)(nil) = 0 // not reached
}

打印崩溃消息后会调用 runtime.exit 退出当前程序并返回错误码 2，程序的正常退出也是通过 runtime.exit 实现的。

recovery

当 reflectcall 执行完毕后，这时如果一个 panic 是可恢复的，p.recovered 已经被标记为 true，从而会通过 mcall 的方式来执行 recovery 函数来重新进入调度循环：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


// 在发生 panic 后 defer 函数调用 recover 后展开栈。然后安排继续运行，
// 就像 defer 函数的调用方正常返回一样。
func recovery(gp *g) {
	// 传递到 G 结构的 defer 信息
	sp := gp.sigcode0
	pc := gp.sigcode1
	// 使 deferproc 为此 d 返回
	// 这时候返回 1。调用函数将跳转到标准的返回尾声
	gp.sched.sp = sp
	gp.sched.pc = pc
	gp.sched.lr = 0
	gp.sched.ret = 1
	gogo(&gp.sched)
}

当我们在调用 defer 关键字时，调用时的栈指针 sp 和程序计数器 pc 就已经存储到了 runtime._defer 结构体中，这里的 runtime.gogo 函数会跳回 defer 关键字调用的位置。

runtime.recovery 在调度过程中会将函数的返回值设置成 1。从 runtime.deferproc 的注释中我们会发现，当 runtime.deferproc 函数的返回值是 1 时，编译器生成的代码会直接跳转到调用方函数返回之前并执行 runtime.deferreturn：

1
2
3
4


func deferproc(siz int32, fn *funcval) {
	...
	return0()
}

跳转到 runtime.deferreturn 函数之后，程序就已经从 panic 中恢复了并执行正常的逻辑，而 runtime.gorecover 函数也能从 runtime._panic 结构中取出了调用 panic 时传入的 arg 参数并返回给调用方。

gorecover

到这里我们已经掌握了 panic 退出程序的过程，接下来将分析 defer 中的 recover 是如何中止程序崩溃的。编译器会将关键字 recover 转换成 runtime.gorecover：

如果reflectcall调起的某个defer函数包含了 recover 的调用（即 gorecover 调用）被执行，这时 _panic 实例 p.recovered 会被标记为 true：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


// The implementation of the predeclared function recover.
// Cannot split the stack because it needs to reliably
// find the stack segment of its caller.
//
// TODO(rsc): Once we commit to CopyStackAlways,
// this doesn't need to be nosplit.
//go:nosplit
func gorecover(argp uintptr) interface{} {
	// Must be in a function running as part of a deferred call during the panic.
	// Must be called from the topmost function of the call
	// (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call.
	// Compare against argp reported by caller.
	// If they match, the caller is the one who can recover.
	// 必须在 panic 期间作为 defer 调用的一部分在函数中运行。
	// 必须从调用的最顶层函数（ defer 语句中使用的函数）调用。
	// p.argp 是最顶层 defer 函数调用的参数指针。
	// 比较调用方报告的 argp，如果匹配，则调用者可以恢复。
	gp := getg()
	p := gp._panic
	if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}

同时 recover() 这个函数还会返回 panic 的保存相关信息 p.arg。恢复的原则取决于 gorecover 这个方法调用方报告的 argp 是否与 p.argp 相同，仅当相同才可恢复。

该函数的实现很简单，如果当前 Goroutine 没有调用 panic，那么该函数会直接返回 nil，这也是崩溃恢复在非 defer 中调用会失效的原因。在正常情况下，它会修改 runtime._panic 的 recovered 字段，runtime.gorecover 函数中并不包含恢复程序的逻辑，程序的恢复是由 runtime.gopanic 函数负责的

小结

分析程序的崩溃和恢复过程比较棘手，代码不是特别容易理解。我们在本节的最后还是简单总结一下程序崩溃和恢复的过程：

编译器会负责做转换关键字的工作；
1. 将 panic 和 recover 分别转换成 runtime.gopanic 和 runtime.gorecover；
2. 将 defer 转换成 runtime.deferproc 函数；
3. 在调用 defer 的函数末尾调用 runtime.deferreturn 函数；
在运行过程中遇到 runtime.gopanic 方法时，会从 Goroutine 的链表依次取出 runtime._defer 结构体并执行；
如果调用延迟执行函数时遇到了 runtime.gorecover 就会将_panic.recovered 标记成 true 并返回 panic 的参数；
1. 在这次调用结束之后，runtime.gopanic 会从 runtime._defer 结构体中取出程序计数器 pc 和栈指针 sp 并调用 runtime.recovery 函数进行恢复程序；
2. runtime.recovery 会根据传入的 pc 和 sp 跳转回 runtime.deferproc；
3. 编译器自动生成的代码会发现 runtime.deferproc 的返回值不为 0，这时会跳回 runtime.deferreturn 并恢复到正常的执行流程；
如果没有遇到 runtime.gorecover 就会依次遍历所有的 runtime._defer，并在最后调用 runtime.fatalpanic 中止程序、打印 panic 的参数并返回错误码 2；

从 panic 和 recover 这对关键字的实现上可以看出，可恢复的 panic 必须要 recover 的配合。而且，这个 recover 必须位于同一 goroutine 的直接调用链上（例如，如果 A 依次调用了 B 和 C，而 B 包含了 recover，而 C 发生了 panic，则这时 B 的 panic 无法恢复 C 的 panic；又例如 A 调用了 B 而 B 又调用了 C，那么 C 发生 panic 时，如果 A 要求了 recover 则仍然可以恢复），否则无法对 panic 进行恢复。

当一个 panic 被恢复后，调度并因此中断，会重新进入调度循环，进而继续执行 recover 后面的代码，包括比 recover 更早的 defer（因为已经执行过得 defer 已经被释放，而尚未执行的 defer 仍在 goroutine 的 defer 链表中），或者 recover 所在函数的调用方。

参考

5.4 panic 和 recover

9.3 恐慌与恢复内建函数

文章目录

前言

现象