Concurrency Primitives: A Source-Code Analysis of sync.Once

Once

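Many people assume that a concurrency primitive like Once is trivial to implement: keep a flag that records whether initialization has happened, and at most update that flag with an atomic operation. This approach has a serious flaw. If the argument f is slow, a goroutine that calls Do later sees done already set and returns immediately, yet it may find the resources it expects still empty, because f has not finished running.

A correct Once therefore needs a mutex. When goroutines call Do concurrently during initialization, they all enter the doSlow method. The mutex guarantees that only one goroutine runs the initialization at a time, and a double check (double-checked locking) tests o.done again under the lock. If it is still 0, this is the first execution: the goroutine runs f, sets o.done to 1 once f has finished, and then releases the lock.

Even if several goroutines enter doSlow at the same time, the double check means the later ones see o.done equal to 1 and do not run f again.

This guarantees both that concurrent goroutines wait for f to complete and that f never runs more than once.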
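
The flaw of the flag-only approach can be reproduced deterministically. The following is a minimal sketch, not code from the standard library: naiveOnce, slowInit, and secondCallerSawNil are hypothetical names, and channels are used only to force the timing in which a second caller arrives while f is still running.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// naiveOnce is a deliberately flawed, hypothetical Once that guards f with
// nothing but a CAS on a flag.
type naiveOnce struct {
	done uint32
}

// Do runs f only for the caller that wins the CAS. Every other caller
// returns immediately, whether or not f has finished.
func (o *naiveOnce) Do(f func()) {
	if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
		f()
	}
}

// secondCallerSawNil reports whether a second caller of Do returned while
// the resource was still uninitialized.
func secondCallerSawNil() bool {
	var (
		once     naiveOnce
		resource atomic.Value // holds a string once initialized
		started  = make(chan struct{})
		release  = make(chan struct{})
		finished = make(chan struct{})
	)
	slowInit := func() {
		close(started) // signal that f is now running
		<-release      // simulate a slow initialization
		resource.Store("ready")
	}

	go func() {
		once.Do(slowInit)
		close(finished)
	}()

	<-started         // the first caller is inside f
	once.Do(slowInit) // loses the CAS and returns at once
	sawNil := resource.Load() == nil

	close(release) // let the first caller finish
	<-finished
	return sawNil
}

func main() {
	fmt.Println("second caller saw an uninitialized resource:", secondCallerSawNil())
}
```

The second call to Do returns while slowInit is still blocked, so the caller observes a nil resource. With sync.Once, the same call would block until slowInit had returned.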


// Once is an object that will perform exactly one action.
type Once struct {
	// done indicates whether the action has been performed.
	// It is first in the struct because it is used in the hot path.
	// The hot path is inlined at every call site.
	// Placing done first allows more compact instructions on some architectures (amd64/x86),
	// and fewer instructions (to calculate offset) on other architectures.
	done uint32
	m    Mutex
}

// Do calls the function f if and only if Do is being called for the
// first time for this instance of Once. In other words, given
// 	var once Once
// if once.Do(f) is called multiple times, only the first call will invoke f,
// even if f has a different value in each invocation. A new instance of
// Once is required for each function to execute.
//
// Do is intended for initialization that must be run exactly once. Since f
// is niladic, it may be necessary to use a function literal to capture the
// arguments to a function to be invoked by Do:
// 	config.once.Do(func() { config.init(filename) })
//
// Because no call to Do returns until the one call to f returns, if f causes
// Do to be called, it will deadlock.
//
// If f panics, Do considers it to have returned; future calls of Do return
// without calling f.
//
func (o *Once) Do(f func()) {
	// Note: Here is an incorrect implementation of Do:
	//
	//	if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
	//		f()
	//	}
	//
	// Do guarantees that when it returns, f has finished.
	// This implementation would not implement that guarantee:
	// given two simultaneous calls, the winner of the cas would
	// call f, and the second would return immediately, without
	// waiting for the first's call to f to complete.
	// This is why the slow path falls back to a mutex, and why
	// the atomic.StoreUint32 must be delayed until after f returns.
	// Atomically load done. If it is 0, take the slow path; otherwise return immediately.
	if atomic.LoadUint32(&o.done) == 0 {
		// Outlined slow-path to allow inlining of the fast-path.
		o.doSlow(f)
	}
}

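func (o *Once) doSlow(f func()) {
	// Do read o.done with a single atomic load, which is the cheapest possible check: the fast path.
	// Several goroutines may have read 0 concurrently, so we must take the lock here.
	// Locking is comparatively expensive: this is the slow path.
	o.m.Lock()
	defer o.m.Unlock()
	// Double check.
	// If this goroutine still sees 0 while holding the lock, it runs f, and the
	// deferred atomic store sets o.done to 1 once f has returned.
	if o.done == 0 {
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
	// After the goroutine that saw o.done == 0 unlocks, the waiting goroutines acquire
	// the lock in turn, find that o.done is already 1, and skip f.
}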
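
The doc comment above notes that f takes no arguments, so a function literal is needed to capture them. The following is a small usage sketch of that pattern. The config type, its fields, and the load method are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// config is a hypothetical type whose initializer takes an argument.
type config struct {
	once     sync.Once
	filename string
	loads    int
}

// init records the file name; it stands in for real loading work.
func (c *config) init(filename string) {
	c.filename = filename
	c.loads++
}

// load may be called many times; only the first call initializes c.
// The function literal captures filename because Do accepts only func().
func (c *config) load(filename string) {
	c.once.Do(func() { c.init(filename) })
}

func main() {
	var c config
	c.load("a.conf")
	c.load("b.conf") // ignored: the Once has already fired
	fmt.Println(c.filename, c.loads)
}
```

Note that the second call is dropped even though it passes a different file name, which is exactly the "even if f has a different value in each invocation" rule from the doc comment.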

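Some readers wonder why a Mutex is needed at all.

Using a Mutex does not hurt the performance of this data structure, because the Mutex logic (the doSlow method) runs only for calls that arrive before initialization has completed. Once initialization is done, later goroutines that call Do never request the lock.

The Mutex is there mainly to handle concurrent initialization.

Suppose Do has not yet been called on a Once object, and goroutines g2 and g3 call it at the same time. Both may atomically load done and read 0, so both may enter doSlow (on the same CPU or on different CPUs).

This is where the Mutex comes in: it lets only one goroutine proceed at a time, turning parallel execution into serial execution. Suppose g2 gets lucky and goes first. It runs the initialization, sets o.done to 1 when it finishes, and then releases the lock.

After the lock is released, g3 proceeds and performs the double check, testing the done field once more. This step is required: without it, g3 would run f a second time. Because the double check is in place, g3 finds done already set to 1, so it skips initialization and returns.

If a goroutine finds done == 0 during the double check, no goroutine has run the initialization yet, and the job falls to it. Like g2, it runs the initialization function f.

So the Mutex is what protects concurrent initialization.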
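
The g2/g3 scenario can be exercised with many goroutines at once. This is a minimal sketch: runConcurrentInit and initFn are hypothetical names, and the start channel is only a way to make the goroutines race into Do together.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runConcurrentInit starts n goroutines that all call once.Do(initFn).
// It returns how many times initFn ran and how many goroutines saw an
// uninitialized resource after Do returned.
func runConcurrentInit(n int) (calls, misses int32) {
	var (
		once     sync.Once
		wg       sync.WaitGroup
		resource map[string]string
	)
	initFn := func() {
		atomic.AddInt32(&calls, 1)
		resource = map[string]string{"state": "ready"}
	}

	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // line the goroutines up so that they race into Do
			once.Do(initFn)
			// Do has returned, so initFn has finished and its write is visible.
			if resource == nil {
				atomic.AddInt32(&misses, 1)
			}
		}()
	}
	close(start)
	wg.Wait()
	return calls, misses
}

func main() {
	calls, misses := runConcurrentInit(100)
	fmt.Println(calls, misses)
}
```

However many goroutines reach doSlow, initFn runs once and no goroutine returns from Do before the resource is set.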

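Modern CPUs support out-of-order execution. What if the two statements defer atomic.StoreUint32(&o.done, 1) and f() were reordered? Wouldn't done then be set to 1 before initialization had finished?

Some people, including Russ Cox when he reviewed this code, have also asked whether defer atomic.StoreUint32(&o.done, 1) (the source code differs slightly from this) could be changed to o.done = 1.

The Go standard library is held to a very high standard and is carefully reviewed by experts, so this design is there for a reason.

First, because defer atomic.StoreUint32(&o.done, 1) is a deferred call, it runs only after f() has returned, so done is set to 1 only once f has finished.

Ian Lance Taylor once wrote in a forum discussion: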

In C++ memory model terms I believe that the sync/atomic Load operations are memory_order_acquire, and I believe that the sync/atomic Store operations are memory_order_release. It’s possible that if we ever document it we will go for stronger memory ordering, but I believe that these operations must at least carry those guarantees.

I’m somewhat less certain of the memory order guarantees of the Swap, CompareAndSwap, and Add functions. I guess that Swap and CompareAndSwap are probably at least memory_order_acq_rel, but Add may be memory_order_relaxed.

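Russ Cox has also answered this question. He characterizes Go's atomic operations as sequentially consistent, which is a stricter memory ordering. Reads and writes before an atomic Load/Store are not reordered after it, and reads and writes after it are not reordered before it, so the operation acts as a memory barrier.

rsc, July 16, 2019, 9:12:01 AM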

Although there’s been no official resolution to the issue, I think the actual path forward is what I posted a while back: “Go’s atomics guarantee sequential consistency among the atomic variables (behave like C/C++’s seqconst atomics), and that you shouldn’t mix atomic and non-atomic accesses for a given memory word.”

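For now, at least, we can go by their answers.

Under that reading, Go guarantees that the store performed by defer atomic.StoreUint32(&o.done, 1) happens only after f() has run, so done is never set to 1 before initialization is complete.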
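
One observable consequence of placing the store in a defer is that it still runs when f panics, which matches the doc comment "If f panics, Do considers it to have returned". The following is a small sketch of that behavior using sync.Once directly. The function name panicThenRetry is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// panicThenRetry calls Do with a function that panics, recovers from the
// panic, and then calls Do again with a different function.
// It reports whether the first call panicked and whether the second
// function was run.
func panicThenRetry() (firstPanicked, secondRan bool) {
	var once sync.Once

	func() {
		defer func() {
			if r := recover(); r != nil {
				firstPanicked = true
			}
		}()
		// The deferred store inside doSlow still marks the Once as done.
		once.Do(func() { panic("init failed") })
	}()

	// The Once is already marked done, so this function is never called.
	once.Do(func() { secondRan = true })
	return firstPanicked, secondRan
}

func main() {
	fmt.Println(panicThenRetry())
}
```

Code that needs to retry a failed initialization therefore cannot rely on the same Once value; it has to track the error itself.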

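Another question: why does the check if o.done == 0 inside doSlow not use an atomic load?

The answer lies in the happens-before relationship established by the Mutex. g2 sets o.done = 1 before it releases the lock, and g3 acquires the lock only after that. By the time g3 holds the lock, o.done is already 1, so g3's read is guaranteed to return 1. It cannot happen that g2 has set o.done = 1 while g3 fails to see that write.

The atomic.LoadUint32(&o.done) check in Do has no Mutex or similar protection, so it uses an atomic load to make sure the write of 1 to o.done becomes visible to the caller, which keeps callers from falling into doSlow every time.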