Background

I have recently been building a web dashboard. The frontend talks to the backend over HTTP(S) + JSON, and some of the visuals involve dense geographic points and time series, so the HTTP responses are fairly large and load slowly over the public internet.

Research

The obvious fix is to compress the HTTP responses, so I went with gzip, the conventional choice.

When compressing with gzip, a couple of things need to be considered:

  • Whether the content type is compression-friendly: a JPEG, for instance, is already compressed, so compressing it again yields little benefit and wastes effort;
  • Whether the content is large enough to be worth compressing: very small payloads compress poorly (they may even grow, given the dictionary and other overhead), and since small payloads transfer quickly anyway, passing them through untouched saves CPU and memory. A minimal check of both criteria is sketched right after this list.
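
A minimal, self-contained sketch of these two checks; the type whitelist and threshold here are illustrative, not the middleware's actual defaults:

package main

import (
	"fmt"
	"strings"
)

// shouldGzip combines the two criteria above: a compression-friendly
// Content-Type and a body large enough to be worth the CPU spent on it.
// The whitelist and threshold are illustrative only.
func shouldGzip(contentType string, contentLength, minLength int64) bool {
	// Small payloads transfer quickly anyway and may even grow when gzipped.
	if contentLength < minLength {
		return false
	}
	// Only compress types known to benefit from it; JPEG, for example,
	// is already compressed and is deliberately absent from this list.
	friendly := []string{"application/json", "text/html", "text/plain", "image/svg+xml"}
	for _, t := range friendly {
		if strings.HasPrefix(contentType, t) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldGzip("application/json", 4096, 1024)) // true
	fmt.Println(shouldGzip("image/jpeg", 4096, 1024))       // false: already compressed
	fmt.Println(shouldGzip("application/json", 128, 1024))  // false: too small
}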

Nginx

  • Nginx usually handles compression with ngx_http_gzip_module;
  • It decides whether the content type is compression-friendly from the Content-Type header; the types to compress are configured by the user;
  • It decides whether the content is large enough to be worth compressing from the response's Content-Length;

When the JSON response is larger than 2KB, Go sends it with chunked transfer encoding. In that case there is no Content-Length, and ngx_http_gzip_module does not compress the content.
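
For reference, net/http only falls back to chunked encoding when the handler neither sets Content-Length nor finishes within the internal pre-chunking buffer. One workaround, if you want nginx to keep seeing Content-Length, is to marshal the JSON yourself and declare the length up front. This is an illustrative sketch (the route and port are made up), not what the middleware described below does:

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strconv"
)

// writeJSON marshals the payload up front and declares Content-Length
// explicitly, so net/http does not need to switch to chunked transfer
// encoding even when the body exceeds its internal buffering threshold.
func writeJSON(w http.ResponseWriter, status int, payload interface{}) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.Header().Set("Content-Length", strconv.Itoa(len(body)))
	w.WriteHeader(status)
	_, err = w.Write(body)
	return err
}

func main() {
	http.HandleFunc("/data", func(w http.ResponseWriter, r *http.Request) {
		_ = writeJSON(w, http.StatusOK, map[string]interface{}{"code": 0, "msg": "ok"})
	})
	log.Println(http.ListenAndServe(":3002", nil))
}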

gin-contrib/gzip

  • Decides whether the content type is compression-friendly from the file extension;
  • Decides whether to enable compression based on the request Path;
  • Does not support checking whether the content is large enough to be worth compressing.

Filtering by Path does work, but it is rigid and couples business routes with the middleware.

nanmu42/gzip

  • Supports both Gin and the standard library net/http;
  • Decides whether to compress based on Content-Type, Content-Length, and file extension;
  • The threshold for enabling compression is user-configurable;
  • The compression level is user-configurable;
  • Does not compress responses that are already compressed, does not compress responses to HEAD requests, and does not interfere with HTTP Upgrade;
  • The middleware is simple to initialize and easy to integrate.

Going further

  • Remember the problem above: when the JSON response exceeds 2KB, Go switches to chunked transfer, there is no Content-Length, and we cannot tell whether the content is worth compressing?

  • I used a small trick: if Content-Length is absent, the middleware inspects len(data) on the first call to http.ResponseWriter.Write(data []byte); if len(data) is already above the compression threshold, it is safe to start compressing right away. A stripped-down sketch of this check follows.
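
A stripped-down sketch of that first-write check; the full writerWrapper in the source section below also buffers across multiple writes and runs the header filters, and the names here are illustrative:

package gzipsketch

import (
	"net/http"
	"strconv"

	"github.com/klauspost/compress/gzip"
)

// firstWriteSniffer illustrates only the trick: when Content-Length is
// missing, look at the size of the very first Write call to decide
// whether compression is worthwhile.
type firstWriteSniffer struct {
	http.ResponseWriter
	minLength int
	decided   bool
	gz        *gzip.Writer
}

func (s *firstWriteSniffer) Write(data []byte) (int, error) {
	if !s.decided {
		s.decided = true
		declared, _ := strconv.Atoi(s.Header().Get("Content-Length"))
		// Compress if either the declared length or the first chunk
		// alone already exceeds the threshold.
		if declared >= s.minLength || len(data) >= s.minLength {
			s.Header().Del("Content-Length")
			s.Header().Set("Content-Encoding", "gzip")
			s.gz = gzip.NewWriter(s.ResponseWriter)
		}
	}
	if s.gz != nil {
		return s.gz.Write(data)
	}
	return s.ResponseWriter.Write(data)
}

// Close flushes buffered gzip data; the real middleware does the
// equivalent when the request finishes.
func (s *firstWriteSniffer) Close() error {
	if s.gz != nil {
		return s.gz.Close()
	}
	return nil
}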

Performance

  • When the response body is small, the Handler intelligently skips compression; the cost of this path is negligible;
  • When the response body is large enough, the Handler gzips it at a reasonable cost:
$ go test -benchmem -bench .
goos: linux
goarch: amd64
pkg: github.com/nanmu42/gzip
BenchmarkSoleGin_SmallPayload-4                          4104684               276 ns/op              64 B/op          2 allocs/op
BenchmarkGinWithDefaultHandler_SmallPayload-4            1683307               707 ns/op              96 B/op          3 allocs/op
BenchmarkSoleGin_BigPayload-4                            4198786               274 ns/op              64 B/op          2 allocs/op
BenchmarkGinWithDefaultHandler_BigPayload-4                44780             27636 ns/op             190 B/op          5 allocs/op
PASS
ok      github.com/nanmu42/gzip 6.373s

Limitations

  • You should always provide a Content-Type on your responses. The Handler falls back to http.DetectContentType() to guess the type when Content-Type is missing, but the guess is not always good;

  • When the response's Content-Length is missing, the Handler may buffer the response body to decide whether it is big enough to be worth compressing. If MinContentLength is set too high, this buffering may add memory pressure. The Handler applies some optimizations for this case, such as checking len(data) on the first call to http.ResponseWriter.Write(data []byte) and reusing resources.

Aho-Corasick automaton

Originally I used strings.Contains() in a loop to check whether a file extension / MIME type was in the list of compressible ones, but it did not benchmark well. After some searching I found that Cloudflare had implemented an Aho-Corasick automaton for exactly this purpose. After talking with the maintainer, I ended up using a fork of it: https://github.com/signalsciences/ac
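
For context, the two approaches look roughly like this; the type list here is a small illustrative subset of the real whitelist:

package main

import (
	"fmt"
	"strings"

	"github.com/signalsciences/ac/acascii"
)

var compressibleTypes = []string{"application/json", "text/html", "text/plain"}

// containsLoop is the original approach: scan the whitelist with strings.Contains.
func containsLoop(contentType string) bool {
	for _, t := range compressibleTypes {
		if strings.Contains(contentType, t) {
			return true
		}
	}
	return false
}

// The Aho-Corasick matcher is compiled once and matches every pattern
// in a single pass over the input.
var matcher = acascii.MustCompileString(compressibleTypes)

func containsAC(contentType string) bool {
	return matcher.MatchString(contentType)
}

func main() {
	fmt.Println(containsLoop("application/json; charset=utf-8")) // true
	fmt.Println(containsAC("application/json; charset=utf-8"))   // true
	fmt.Println(containsAC("image/jpeg"))                        // false
}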

sync.Pool

sync.Pool provides object reuse, reducing allocation pressure and the load on Go's garbage collector. At first I only pooled gzip.Writer instances, but the middleware's memory footprint was still noticeable, so I added a second sync.Pool to reuse the writer wrapper; both memory usage and CPU time improved considerably.

After these two optimizations, CPU time dropped to 40% of what it was before tuning, and memory usage was cut in half.
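
The gzip.Writer pooling boils down to the usual sync.Pool pattern. A minimal standalone version, in the same spirit as the handler's gzipWriterPool shown later, might look like this:

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"sync"

	"github.com/klauspost/compress/gzip"
)

// gzipWriterPool reuses gzip.Writer instances across calls, the same
// pattern the handler uses, to cut allocations and GC pressure.
var gzipWriterPool = sync.Pool{
	New: func() interface{} {
		w, _ := gzip.NewWriterLevel(ioutil.Discard, gzip.DefaultCompression)
		return w
	},
}

func compress(dst *bytes.Buffer, data []byte) error {
	w := gzipWriterPool.Get().(*gzip.Writer)
	defer func() {
		// Point the writer at a harmless destination before returning it.
		w.Reset(ioutil.Discard)
		gzipWriterPool.Put(w)
	}()

	w.Reset(dst)
	if _, err := w.Write(data); err != nil {
		return err
	}
	return w.Close()
}

func main() {
	var buf bytes.Buffer
	if err := compress(&buf, bytes.Repeat([]byte("hello gzip "), 200)); err != nil {
		panic(err)
	}
	fmt.Printf("compressed to %d bytes\n", buf.Len())
}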

Usage examples

The default settings from DefaultHandler() cover most scenarios.

Gin

package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
	"github.com/nanmu42/gzip"
)

func main() {
	g := gin.Default()

	g.Use(gzip.DefaultHandler().Gin)

	g.GET("/", func(c *gin.Context) {
		c.JSON(http.StatusOK, map[string]interface{}{
			"code": 0,
			"msg":  "hello",
			"data": "GET /short and GET /long to have a try!",
		})
	})

	// short response will not be compressed
	g.GET("/short", func(c *gin.Context) {
		c.JSON(http.StatusOK, map[string]interface{}{
			"code": 0,
			"msg":  "This content is not long enough to be compressed.",
			"data": "short!",
		})
	})

	// long response that will be compressed by gzip
	g.GET("/long", func(c *gin.Context) {
		c.JSON(http.StatusOK, map[string]interface{}{
			"code": 0,
			"msg":  "This content is compressed",
			"data": fmt.Sprintf("l%sng!", strings.Repeat("o", 1000)),
		})
	})

	const port = 3000

	log.Printf("Service is litsenning on port %d...", port)
	log.Println(g.Run(fmt.Sprintf(":%d", port)))
}

net/http

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"

	"github.com/nanmu42/gzip"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		writeString(w, "GET /short and /long to have a try!")
	})
	mux.HandleFunc("/short", func(w http.ResponseWriter, r *http.Request) {
		writeString(w, "This content is not long enough to be compressed.")
	})
	mux.HandleFunc("/long", func(w http.ResponseWriter, r *http.Request) {
		writeString(w, fmt.Sprintf("This content is compressed: l%sng!", strings.Repeat("o", 1000)))
	})

	const port = 3001

	log.Printf("Service is litsenning on port %d...", port)
	log.Println(http.ListenAndServe(fmt.Sprintf(":%d", port), gzip.DefaultHandler().WrapHandler(mux)))
}

func writeString(w http.ResponseWriter, payload string) {
	w.Header().Set("Content-Type", "text/plain; charset=utf8")
	_, _ = io.WriteString(w, payload+"\n")
}

Customizing the Handler

Use NewHandler() to customize the parameters to your particular needs:

import "github.com/nanmu42/gzip"

handler := gzip.NewHandler(gzip.Config{
	// gzip compression level
	CompressionLevel: 6,
	// minimum body size, in bytes, required to trigger gzip
	MinContentLength: 1024,
	// request filters decide, from the request alone, whether to enable
	// gzip for this request's response; filters run in the order they
	// are listed, same below.
	RequestFilter: []gzip.RequestFilter{
		gzip.NewCommonRequestFilter(),
		gzip.DefaultExtensionFilter(),
	},
	// response header filters decide, from the response headers, whether
	// to enable gzip for this request's response.
	ResponseHeaderFilter: []gzip.ResponseHeaderFilter{
		gzip.NewSkipCompressedFilter(),
		gzip.DefaultContentTypeFilter(),
	},
})

RequestFilter and ResponseHeaderFilter are interfaces, so you can implement your own filters; for instance, a hypothetical path-prefix filter is sketched below.
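
For example, a hypothetical filter that only enables compression under a given path prefix; the type name, route, and port below are made up for illustration:

package main

import (
	"log"
	"net/http"
	"strings"

	"github.com/nanmu42/gzip"
)

// pathPrefixFilter is a hypothetical custom RequestFilter: it only
// enables compression for requests under a given path prefix.
type pathPrefixFilter struct {
	prefix string
}

// ShouldCompress implements gzip.RequestFilter.
func (p *pathPrefixFilter) ShouldCompress(req *http.Request) bool {
	return strings.HasPrefix(req.URL.Path, p.prefix)
}

func main() {
	handler := gzip.NewHandler(gzip.Config{
		CompressionLevel: 6,
		MinContentLength: 1024,
		RequestFilter: []gzip.RequestFilter{
			gzip.NewCommonRequestFilter(),
			&pathPrefixFilter{prefix: "/api/"},
		},
		ResponseHeaderFilter: []gzip.ResponseHeaderFilter{
			gzip.NewSkipCompressedFilter(),
			gzip.DefaultContentTypeFilter(),
		},
	})

	mux := http.NewServeMux()
	mux.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write([]byte(`{"msg":"` + strings.Repeat("o", 2048) + `"}`))
	})

	log.Println(http.ListenAndServe(":3003", handler.WrapHandler(mux)))
}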

Source code

handler

package gzip

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
	"github.com/klauspost/compress/gzip"
)

// These constants are copied from the gzip package
const (
	NoCompression      = gzip.NoCompression
	BestSpeed          = gzip.BestSpeed
	BestCompression    = gzip.BestCompression
	DefaultCompression = gzip.DefaultCompression
	HuffmanOnly        = gzip.HuffmanOnly
	// Stateless will do compression but without maintaining any state
	// between Write calls, so long running responses will not take memory.
	// There will be no memory kept between Write calls,
	// but compression and speed will be suboptimal.
	// Because of this, the size of actual Write calls will affect output size.
	Stateless = gzip.StatelessCompression
)

// Config is used in Handler initialization
type Config struct {
	// gzip compression level to use,
	// valid values: -3 to 9.
	//
	// see https://golang.org/pkg/compress/gzip/#NewWriterLevel
	CompressionLevel int
	// Minimum content length to trigger gzip,
	// the unit is in byte.
	//
	// When `Content-Length` is not available, handler may buffer your writes to
	// decide if it's big enough for a meaningful compression.
	// A high `MinContentLength` may bring memory overhead,
	// although the handler tries to be smart by reusing buffers
	// and testing if `len(data)` of the first
	// `http.ResponseWriter.Write(data []byte)` calling suffices or not.
	MinContentLength int64
	// Filters are applied in the sequence here
	RequestFilter []RequestFilter
	// Filters are applied in the sequence here
	ResponseHeaderFilter []ResponseHeaderFilter
}

// Handler implement gzip compression for gin and net/http
type Handler struct {
	compressionLevel     int
	minContentLength     int64
	requestFilter        []RequestFilter
	responseHeaderFilter []ResponseHeaderFilter
	gzipWriterPool       sync.Pool
	wrapperPool          sync.Pool
}

// NewHandler initializes a customized gzip handler to take care of response compression.
//
// config must not be modified after calling NewHandler()
func NewHandler(config Config) *Handler {
	if config.CompressionLevel < Stateless || config.CompressionLevel > BestCompression {
		panic(fmt.Sprintf("gzip: invalid CompressionLevel: %d", config.CompressionLevel))
	}
	if config.MinContentLength <= 0 {
		panic(fmt.Sprintf("gzip: invalid MinContentLength: %d", config.MinContentLength))
	}

	handler := Handler{
		compressionLevel:     config.CompressionLevel,
		minContentLength:     config.MinContentLength,
		requestFilter:        config.RequestFilter,
		responseHeaderFilter: config.ResponseHeaderFilter,
	}

	handler.gzipWriterPool.New = func() interface{} {
		writer, _ := gzip.NewWriterLevel(ioutil.Discard, handler.compressionLevel)
		return writer
	}
	handler.wrapperPool.New = func() interface{} {
		return newWriterWrapper(handler.responseHeaderFilter, handler.minContentLength, nil, handler.getGzipWriter, handler.putGzipWriter)
	}

	return &handler
}

var defaultConfig = Config{
	CompressionLevel: 6,
	MinContentLength: 1 * 1024,
	RequestFilter: []RequestFilter{
		NewCommonRequestFilter(),
		DefaultExtensionFilter(),
	},
	ResponseHeaderFilter: []ResponseHeaderFilter{
		NewSkipCompressedFilter(),
		DefaultContentTypeFilter(),
	},
}

// DefaultHandler creates a gzip handler to take care of response compression,
// with meaningful preset.
func DefaultHandler() *Handler {
	return NewHandler(defaultConfig)
}

func (h *Handler) getGzipWriter() *gzip.Writer {
	return h.gzipWriterPool.Get().(*gzip.Writer)
}

func (h *Handler) putGzipWriter(w *gzip.Writer) {
	if w == nil {
		return
	}

	_ = w.Close()
	w.Reset(ioutil.Discard)
	h.gzipWriterPool.Put(w)
}

func (h *Handler) getWriteWrapper() *writerWrapper {
	return h.wrapperPool.Get().(*writerWrapper)
}

func (h *Handler) putWriteWrapper(w *writerWrapper) {
	if w == nil {
		return
	}

	w.FinishWriting()
	w.OriginWriter = nil
	h.wrapperPool.Put(w)
}

type ginGzipWriter struct {
	wrapper      *writerWrapper
	originWriter gin.ResponseWriter
}

// interface guard
var _ gin.ResponseWriter = (*ginGzipWriter)(nil)

func (g *ginGzipWriter) WriteHeaderNow() {
	g.wrapper.WriteHeaderNow()
}

func (g *ginGzipWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
	return g.originWriter.Hijack()
}

func (g *ginGzipWriter) CloseNotify() <-chan bool {
	return g.originWriter.CloseNotify()
}

func (g *ginGzipWriter) Status() int {
	return g.wrapper.Status()
}

func (g *ginGzipWriter) Size() int {
	return g.wrapper.Size()
}

func (g *ginGzipWriter) Written() bool {
	return g.wrapper.Written()
}

func (g *ginGzipWriter) Pusher() http.Pusher {
	// TODO: not sure how to implement gzip for HTTP2
	return nil
}

// WriteString implements interface gin.ResponseWriter
func (g *ginGzipWriter) WriteString(s string) (int, error) {
	return g.wrapper.Write([]byte(s))
}

// Write implements interface gin.ResponseWriter
func (g *ginGzipWriter) Write(data []byte) (int, error) {
	return g.wrapper.Write(data)
}

// WriteHeader implements interface gin.ResponseWriter
func (g *ginGzipWriter) WriteHeader(code int) {
	g.wrapper.WriteHeader(code)
}

// Header implements interface gin.ResponseWriter
func (g *ginGzipWriter) Header() http.Header {
	return g.wrapper.Header()
}

// Flush implements http.Flusher
func (g *ginGzipWriter) Flush() {
	g.wrapper.Flush()
}

// Gin implement gin's middleware
func (h *Handler) Gin(c *gin.Context) {
	var shouldCompress = true

	for _, filter := range h.requestFilter {
		shouldCompress = filter.ShouldCompress(c.Request)
		if !shouldCompress {
			break
		}
	}

	if shouldCompress {
		wrapper := h.getWriteWrapper()
		wrapper.Reset(c.Writer)
		originWriter := c.Writer
		c.Writer = &ginGzipWriter{
			originWriter: c.Writer,
			wrapper:      wrapper,
		}
		defer func() {
			h.putWriteWrapper(wrapper)
			c.Writer = originWriter
		}()
	}

	c.Next()
}

// WrapHandler wraps a http.Handler, returning its gzip-enabled version
func (h *Handler) WrapHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var shouldCompress = true

		for _, filter := range h.requestFilter {
			shouldCompress = filter.ShouldCompress(r)
			if !shouldCompress {
				break
			}
		}

		if shouldCompress {
			wrapper := h.getWriteWrapper()
			wrapper.Reset(w)
			originWriter := w
			w = wrapper
			defer func() {
				h.putWriteWrapper(wrapper)
				w = originWriter
			}()
		}

		next.ServeHTTP(w, r)
	})
}

requestfilters

package gzip

import (
	"net/http"
	"path"
	"strings"

	"github.com/signalsciences/ac/acascii"
)

// RequestFilter decide whether or not to compress response judging by request
type RequestFilter interface {
	// ShouldCompress decide whether or not to compress response,
	// judging by request
	ShouldCompress(req *http.Request) bool
}

// interface guards
var (
	_ RequestFilter = (*CommonRequestFilter)(nil)
	_ RequestFilter = (*ExtensionFilter)(nil)
)

// CommonRequestFilter judge via common easy criteria like
// http method, accept-encoding header, etc.
type CommonRequestFilter struct{}

// NewCommonRequestFilter ...
func NewCommonRequestFilter() *CommonRequestFilter {
	return &CommonRequestFilter{}
}

// ShouldCompress implements RequestFilter interface
func (c *CommonRequestFilter) ShouldCompress(req *http.Request) bool {
	return req.Method != http.MethodHead &&
		req.Method != http.MethodOptions &&
		req.Header.Get("Upgrade") == "" &&
		strings.Contains(req.Header.Get("Accept-Encoding"), "gzip")
}

// ExtensionFilter judge via the extension in path
//
// Omit this filter if you want to compress all extensions.
type ExtensionFilter struct {
	Exts       *acascii.Matcher
	AllowEmpty bool
}

// NewExtensionFilter returns an ExtensionFilter or panics
func NewExtensionFilter(extensions []string) *ExtensionFilter {
	var (
		exts       = make([]string, 0, len(extensions))
		allowEmpty bool
	)

	for _, item := range extensions {
		if item == "" {
			allowEmpty = true
			continue
		}
		exts = append(exts, item)
	}

	return &ExtensionFilter{
		Exts:       acascii.MustCompileString(exts),
		AllowEmpty: allowEmpty,
	}
}

// ShouldCompress implements RequestFilter interface
func (e *ExtensionFilter) ShouldCompress(req *http.Request) bool {
	ext := path.Ext(req.URL.Path)
	if ext == "" {
		return e.AllowEmpty
	}
	return e.Exts.MatchString(ext)
}

// defaultExtensions is the list of default extensions for which to enable gzip.
// original source:
// https://github.com/caddyserver/caddy/blob/7fa90f08aee0861187236b2fbea16b4fa69c5a28/caddyhttp/gzip/requestfilter.go#L32
var defaultExtensions = []string{"", ".txt", ".htm", ".html", ".css", ".php", ".js", ".json",
	".md", ".mdown", ".xml", ".svg", ".go", ".cgi", ".py", ".pl", ".aspx", ".asp", ".m3u", ".m3u8", ".wasm"}

// DefaultExtensionFilter returns an ExtensionFilter with the default extension whitelist.
func DefaultExtensionFilter() *ExtensionFilter {
	return NewExtensionFilter(defaultExtensions)
}

responsefilters

package gzip

import (
	"net/http"

	"github.com/signalsciences/ac/acascii"
)

// ResponseHeaderFilter decide whether or not to compress response
// judging by response header
type ResponseHeaderFilter interface {
	// ShouldCompress decide whether or not to compress response,
	// judging by response header
	ShouldCompress(header http.Header) bool
}

// interface guards
var (
	_ ResponseHeaderFilter = (*SkipCompressedFilter)(nil)
	_ ResponseHeaderFilter = (*ContentTypeFilter)(nil)
)

// SkipCompressedFilter judges whether content has been
// already compressed
type SkipCompressedFilter struct{}

// NewSkipCompressedFilter ...
func NewSkipCompressedFilter() *SkipCompressedFilter {
	return &SkipCompressedFilter{}
}

// ShouldCompress implements ResponseHeaderFilter interface
//
// Content-Encoding: https://tools.ietf.org/html/rfc2616#section-3.5
func (s *SkipCompressedFilter) ShouldCompress(header http.Header) bool {
	return header.Get("Content-Encoding") == "" && header.Get("Transfer-Encoding") == ""
}

// ContentTypeFilter judge via the response content type
//
// Omit this filter if you want to compress all content types.
type ContentTypeFilter struct {
	Types      *acascii.Matcher
	AllowEmpty bool
}

// NewContentTypeFilter ...
func NewContentTypeFilter(types []string) *ContentTypeFilter {
	var (
		nonEmpty   = make([]string, 0, len(types))
		allowEmpty bool
	)

	for _, item := range types {
		if item == "" {
			allowEmpty = true
			continue
		}
		nonEmpty = append(nonEmpty, item)
	}

	return &ContentTypeFilter{
		Types:      acascii.MustCompileString(nonEmpty),
		AllowEmpty: allowEmpty,
	}
}

// ShouldCompress implements ResponseHeaderFilter interface
func (e *ContentTypeFilter) ShouldCompress(header http.Header) bool {
	contentType := header.Get("Content-Type")

	if contentType == "" {
		return e.AllowEmpty
	}

	return e.Types.MatchString(contentType)
}

// defaultContentType is the list of default content types for which to enable gzip.
// original source:
// https://support.cloudflare.com/hc/en-us/articles/200168396-What-will-Cloudflare-compress-
var defaultContentType = []string{"text/html", "text/richtext", "text/plain", "text/css", "text/x-script", "text/x-component", "text/x-java-source", "text/x-markdown", "application/javascript", "application/x-javascript", "text/javascript", "text/js", "image/x-icon", "application/x-perl", "application/x-httpd-cgi", "text/xml", "application/xml", "application/xml+rss", "application/json", "multipart/bag", "multipart/mixed", "application/xhtml+xml", "font/ttf", "font/otf", "font/x-woff", "image/svg+xml", "application/vnd.ms-fontobject", "application/ttf", "application/x-ttf", "application/otf", "application/x-otf", "application/truetype", "application/opentype", "application/x-opentype", "application/font-woff", "application/eot", "application/font", "application/font-sfnt", "application/wasm"}

// DefaultContentTypeFilter returns a ContentTypeFilter with the default content-type whitelist.
func DefaultContentTypeFilter() *ContentTypeFilter {
	return NewContentTypeFilter(defaultContentType)
}

writerwrapper

package gzip

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"

	"github.com/klauspost/compress/gzip"
)

// writerWrapper wraps the original http.ResponseWriter
// to decide whether to gzip, and gzips the body if applicable.
type writerWrapper struct {
	// header filters are applied in sequence
	Filters []ResponseHeaderFilter
	// min content length to enable compress
	MinContentLength int64
	OriginWriter     http.ResponseWriter
	// use initGzipWriter() to init gzipWriter when needed
	GetGzipWriter func() *gzip.Writer
	// must close gzip writer and put it back to pool
	PutGzipWriter func(*gzip.Writer)

	// internal below
	// *** WARNING ***
	// *writerWrapper.Reset() method must be updated
	// upon following field changing

	// compress or not
	// default to true
	shouldCompress bool
	// whether body is large enough
	bodyBigEnough bool
	// is header already flushed?
	headerFlushed         bool
	responseHeaderChecked bool
	statusCode            int
	// how many raw bytes has been written
	size       int
	gzipWriter *gzip.Writer
	bodyBuffer []byte
}

// interface guard
var _ http.ResponseWriter = (*writerWrapper)(nil)
var _ http.Flusher = (*writerWrapper)(nil)

func newWriterWrapper(filters []ResponseHeaderFilter, minContentLength int64, originWriter http.ResponseWriter, getGzipWriter func() *gzip.Writer, putGzipWriter func(*gzip.Writer)) *writerWrapper {
	return &writerWrapper{
		shouldCompress:   true,
		bodyBuffer:       make([]byte, 0, minContentLength),
		Filters:          filters,
		MinContentLength: minContentLength,
		OriginWriter:     originWriter,
		GetGzipWriter:    getGzipWriter,
		PutGzipWriter:    putGzipWriter,
	}
}

// Reset the wrapper into a fresh one,
// writing to originWriter
func (w *writerWrapper) Reset(originWriter http.ResponseWriter) {
	w.OriginWriter = originWriter

	// internal below

	// reset status with caution
	// all internal fields should be taken good care of
	w.shouldCompress = true
	w.headerFlushed = false
	w.responseHeaderChecked = false
	w.bodyBigEnough = false
	w.statusCode = 0
	w.size = 0

	if w.gzipWriter != nil {
		w.PutGzipWriter(w.gzipWriter)
		w.gzipWriter = nil
	}
	if w.bodyBuffer != nil {
		w.bodyBuffer = w.bodyBuffer[:0]
	}
}

func (w *writerWrapper) Status() int {
	return w.statusCode
}

func (w *writerWrapper) Size() int {
	return w.size
}

func (w *writerWrapper) Written() bool {
	return w.headerFlushed || len(w.bodyBuffer) > 0
}

func (w *writerWrapper) WriteHeaderCalled() bool {
	return w.statusCode != 0
}

func (w *writerWrapper) initGzipWriter() {
	w.gzipWriter = w.GetGzipWriter()
	w.gzipWriter.Reset(w.OriginWriter)
}

// Header implements http.ResponseWriter
func (w *writerWrapper) Header() http.Header {
	return w.OriginWriter.Header()
}

// Write implements http.ResponseWriter
func (w *writerWrapper) Write(data []byte) (int, error) {
	w.size += len(data)

	if !w.WriteHeaderCalled() {
		w.WriteHeader(http.StatusOK)
	}

	if !w.shouldCompress {
		return w.OriginWriter.Write(data)
	}
	if w.bodyBigEnough {
		return w.gzipWriter.Write(data)
	}

	// fast check
	if !w.responseHeaderChecked {
		w.responseHeaderChecked = true

		header := w.Header()
		for _, filter := range w.Filters {
			w.shouldCompress = filter.ShouldCompress(header)
			if !w.shouldCompress {
				w.WriteHeaderNow()
				return w.OriginWriter.Write(data)
			}
		}

		if w.enoughContentLength() {
			w.bodyBigEnough = true
			w.WriteHeaderNow()
			w.initGzipWriter()
			return w.gzipWriter.Write(data)
		}
	}

	if !w.writeBuffer(data) {
		w.bodyBigEnough = true

		// detect Content-Type if there's none
		if header := w.Header(); header.Get("Content-Type") == "" {
			header.Set("Content-Type", http.DetectContentType(w.bodyBuffer))
		}

		w.WriteHeaderNow()
		w.initGzipWriter()
		if len(w.bodyBuffer) > 0 {
			written, err := w.gzipWriter.Write(w.bodyBuffer)
			if err != nil {
				err = fmt.Errorf("w.gzipWriter.Write: %w", err)
				return written, err
			}
		}
		return w.gzipWriter.Write(data)
	}

	return len(data), nil
}

func (w *writerWrapper) writeBuffer(data []byte) (fit bool) {
	if int64(len(data)+len(w.bodyBuffer)) > w.MinContentLength {
		return false
	}

	w.bodyBuffer = append(w.bodyBuffer, data...)
	return true
}

func (w *writerWrapper) enoughContentLength() bool {
	contentLength, err := strconv.ParseInt(w.Header().Get("Content-Length"), 10, 64)
	if err != nil {
		return false
	}
	if contentLength != 0 && contentLength >= w.MinContentLength {
		return true
	}

	return false
}

// WriteHeader implements http.ResponseWriter
//
// WriteHeader does not actually call the original handler's WriteHeader;
// the real call is deferred to WriteHeaderNow().
//
// http.ResponseWriter does not clearly specify whether the status code
// may be updated on a second call to WriteHeader(), and net/http and gin
// disagree on this point.
// Here, gzip considers second (and further) calls to WriteHeader() valid.
// WriteHeader() is disabled after the header has been flushed.
// Do note that setting the status code to 204 or 304 marks the content
// uncompressible, and a later status code change does not revert this.
func (w *writerWrapper) WriteHeader(statusCode int) {
	if w.headerFlushed {
		return
	}

	w.statusCode = statusCode

	if !w.shouldCompress {
		return
	}

	if statusCode == http.StatusNoContent ||
		statusCode == http.StatusNotModified {
		w.shouldCompress = false
		return
	}
}

// WriteHeaderNow forces the http header (status code + headers) to be written.
//
// WriteHeaderNow must always be called, and only after WriteHeader()
// has been called and w.shouldCompress has been decided.
//
// This method is usually called by gin's AbortWithStatus()
func (w *writerWrapper) WriteHeaderNow() {
	if w.headerFlushed {
		return
	}

	// if neither WriteHeader() nor Write() has been called,
	// do nothing
	if !w.WriteHeaderCalled() {
		return
	}

	if w.shouldCompress {
		header := w.Header()
		header.Del("Content-Length")
		header.Set("Content-Encoding", "gzip")
		header.Add("Vary", "Accept-Encoding")
		originalEtag := w.Header().Get("ETag")
		if originalEtag != "" && !strings.HasPrefix(originalEtag, "W/") {
			w.Header().Set("ETag", "W/"+originalEtag)
		}
	}

	w.OriginWriter.WriteHeader(w.statusCode)

	w.headerFlushed = true
}

// FinishWriting flushes the header and closes the gzip writer
//
// Write() and WriteHeader() should not be called
// after FinishWriting()
func (w *writerWrapper) FinishWriting() {
	// still buffering
	if w.shouldCompress && !w.bodyBigEnough {
		w.shouldCompress = false
		w.WriteHeaderNow()
		if len(w.bodyBuffer) > 0 {
			_, _ = w.OriginWriter.Write(w.bodyBuffer)
		}
	}

	w.WriteHeaderNow()
	if w.gzipWriter != nil {
		w.PutGzipWriter(w.gzipWriter)
		w.gzipWriter = nil
	}
}

// Flush implements http.Flusher
func (w *writerWrapper) Flush() {
	w.FinishWriting()

	if flusher, ok := w.OriginWriter.(http.Flusher); ok {
		flusher.Flush()
	}
}
