Exponential backoff

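When a connection to a backend fails, we usually do not want to retry immediately (to avoid flooding the network or the server with requests). Instead we apply some form of exponential backoff.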

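We have several parameters:

  1. INITIAL_BACKOFF (how long to wait after the first failure before retrying)
  2. MULTIPLIER (the factor by which the backoff is multiplied after a failed retry)
  3. JITTER (the factor by which backoffs are randomized)
  4. MAX_BACKOFF (the upper bound on the backoff)
  5. MIN_CONNECT_TIMEOUT (the minimum time we are willing to give a connection attempt to complete)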

Proposed backoff algorithm

Exponentially back off the start time of connection attempts, up to a limit of MAX_BACKOFF, with jitter.

ConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  current_deadline = now() + INITIAL_BACKOFF
  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) != SUCCESS)
    SleepUntil(current_deadline)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
    current_deadline = now() + current_backoff +
      UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)

Default parameter values:

  • MIN_CONNECT_TIMEOUT=20sec
  • INITIAL_BACKOFF=1sec
  • MULTIPLIER=1.6
  • MAX_BACKOFF=120sec
  • JITTER=0.2

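Implementations with pressing concerns (for example, minimizing the number of wakeups on a mobile phone) may wish to use a different algorithm, and in particular different jitter logic.

Alternate implementations must ensure that connection backoffs started at the same time disperse, and must not attempt connections more often than the algorithm above.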

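Resetting the backoff

The backoff should be reset to INITIAL_BACKOFF at some point, so that reconnection behavior is consistent whether the connection is a brand-new one or a previously disconnected one.

The backoff is reset when the SETTINGS frame is received, because at that point we know for certain that the server has accepted the connection.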

grpc/backoff

// See internal/backoff package for the backoff implementation. This file is
// kept for the exported types and API backward compatibility.

package grpc

import (
	"time"

	"google.golang.org/grpc/backoff"
)

// DefaultBackoffConfig uses values specified for backoff in
// https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
//
// Deprecated: use ConnectParams instead. Will be supported throughout 1.x.
var DefaultBackoffConfig = BackoffConfig{
	MaxDelay: 120 * time.Second,
}

// BackoffConfig defines the parameters for the default gRPC backoff strategy.
//
// Deprecated: use ConnectParams instead. Will be supported throughout 1.x.
type BackoffConfig struct {
	// MaxDelay is the upper bound of backoff delay.
	MaxDelay time.Duration
}

// ConnectParams defines the parameters for connecting and retrying. Users are
// encouraged to use this instead of the BackoffConfig type defined above. See
// here for more details:
// https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
//
// Experimental
//
// Notice: This type is EXPERIMENTAL and may be changed or removed in a
// later release.
type ConnectParams struct {
	// Backoff specifies the configuration options for connection backoff.
	Backoff backoff.Config
	// MinConnectTimeout is the minimum amount of time we are willing to give a
	// connection to complete.
	MinConnectTimeout time.Duration
}
// Package backoff provides configuration options for backoff.
//
// More details can be found at:
// https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
//
// All APIs in this package are experimental.
package backoff

import "time"

// Config defines the configuration options for backoff.
type Config struct {
	// BaseDelay is the amount of time to backoff after the first failure.
	BaseDelay time.Duration
	// Multiplier is the factor with which to multiply backoffs after a
	// failed retry. Should ideally be greater than 1.
	Multiplier float64
	// Jitter is the factor with which backoffs are randomized.
	Jitter float64
	// MaxDelay is the upper bound of backoff delay.
	MaxDelay time.Duration
}

// DefaultConfig is a backoff configuration with the default values specified
// at https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
//
// This should be useful for callers who want to configure backoff with
// non-default values only for a subset of the options.
var DefaultConfig = Config{
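	// Delay after the first failure.
	BaseDelay:  1.0 * time.Second,
	// Factor by which the delay is multiplied after each further failure.
	Multiplier: 1.6,
	// Randomization factor.
	Jitter:     0.2,
	// Upper bound on the delay.
	MaxDelay:   120 * time.Second,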
}
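package backoff

import (
	"time"

	grpcbackoff "google.golang.org/grpc/backoff"
	"google.golang.org/grpc/internal/grpcrand"
)

// Strategy defines the methodology for backing off after a grpc connection
// failure.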
type Strategy interface {
	// Backoff returns the amount of time to wait before the next retry given
	// the number of consecutive failures.
	Backoff(retries int) time.Duration
}

// DefaultExponential is an exponential backoff implementation using the
// default values for all the configurable knobs defined in
// https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
var DefaultExponential = Exponential{Config: grpcbackoff.DefaultConfig}

// Exponential implements exponential backoff algorithm as defined in
// https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md.
type Exponential struct {
	// Config contains all options to configure the backoff algorithm.
	Config grpcbackoff.Config
}

// Backoff returns the amount of time to wait before the next retry given the
// number of retries.
func (bc Exponential) Backoff(retries int) time.Duration {
	// With zero retries, return BaseDelay directly (1 second by default).
	if retries == 0 {
		return bc.Config.BaseDelay
	}
	backoff, max := float64(bc.Config.BaseDelay), float64(bc.Config.MaxDelay)
	for backoff < max && retries > 0 {
		// Keep multiplying by Multiplier while backoff is below max and retries remain.
		backoff *= bc.Config.Multiplier
		retries--
	}
	if backoff > max {
		backoff = max
	}
	// Randomize backoff delays so that if a cluster of requests start at
	// the same time, they won't operate in lockstep.
	// Scale the delay by a random factor in [1-Jitter, 1+Jitter).
	backoff *= 1 + bc.Config.Jitter*(grpcrand.Float64()*2-1)
	if backoff < 0 {
		return 0
	}
	return time.Duration(backoff)
}

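If the default backoff algorithm does not meet your needs, you can also supply a custom one by implementing the backoffStrategy interface (called Strategy in the internal/backoff listing above).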

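// withBackoff is unexported, so it can only be called from inside package grpc.
func withBackoff(bs backoffStrategy) DialOption {
    return func(o *dialOptions) {
        o.bs = bs
    }
}

// Inside package grpc (for example in its tests):
//   Dial(addr, withBackoff(mybackoff))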

References

grpc-go connection backoff protocol

gRPC series: connection failure handling