Connection drops at heise online due to cookies: tracking down the cause
The web development team at heise online reports on an interesting bug hunt whose cause turned out to be very simple in the end.
(Image: Zakharchuk/Shutterstock.com)
In web development, we don't just write new software; we also receive bug reports. Most of the time, we can help quickly or at least schedule the fix for one of the upcoming sprints. But some errors are more persistent, even though they ultimately turn out to have a very simple cause. Today, we're talking about one such error.
For a while, we repeatedly received reports from users who could not connect to www.heise.de and instead saw the error message ERR_HTTP2_PROTOCOL_ERROR. It quickly became clear that the affected users had a few things in common: they all used Chrome as their browser and were regular visitors to our site. While this narrowed down the error somewhat, our biggest problem was that, for a long time, we couldn't reproduce it ourselves.
Many Cookies
Nevertheless, we kept thinking it through. What do users accumulate in large quantities (unfortunately, these days) when they visit a largely ad-financed site like heise online? Cookies. A test with affected users then at least yielded a workaround: deleting the cookies helped.
Initially, we suspected the cookie size and tested with particularly large cookies, but even then we couldn't reproduce the issue ourselves. Then, a colleague from the editorial team reported the same error, and he even encountered it regularly. We asked him for help, and he let us know as soon as the bug occurred again. Finally, we could observe the issue directly.
Using tcpdump, we captured the network traffic between us and the browser on the load balancer (BigIP), which terminates TLS and HTTP/2. It turned out that BigIP itself was terminating the HTTP/2 connection due to a “protocol error.” Requests to heise online don't travel directly from the user's browser to our web server but pass through various pieces of (network) infrastructure, so it was very helpful to identify where the connection breaks and which link in the chain triggers the disconnection.
A Look into the Bug Reports
With the insights gained, we scoured the Chrome bug reports. One report included an HTTP/2 protocol log, where we could see that Chrome was sending each cookie in a separate Cookie header in the HTTP/2 request. This gave us the idea to experiment not with cookie size but with sheer quantity, and lo and behold: the problem could be reproduced with a very large number of small cookies.
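To illustrate why quantity matters more than size here, the following minimal Python sketch compares two cookie jars of roughly similar total size. The function `build_cookie_fields` and the cookie names and values are hypothetical and only model the header layout; this is not the test setup we actually used.

```python
def build_cookie_fields(cookies, split):
    """Return the (name, value) header fields that carry the given cookies.

    split=False: a single 'cookie' field with all pairs joined by '; '.
    split=True:  one 'cookie' field per cookie pair, as Chrome did over HTTP/2.
    """
    pairs = [f"{name}={value}" for name, value in cookies.items()]
    if split:
        return [("cookie", pair) for pair in pairs]
    return [("cookie", "; ".join(pairs))]


# Two hypothetical cookie jars with a total payload of a few kilobytes each.
few_large = {f"big{i}": "x" * 1000 for i in range(3)}
many_small = {f"c{i}": "x" * 8 for i in range(300)}

for label, jar in (("few large", few_large), ("many small", many_small)):
    fields = build_cookie_fields(jar, split=True)
    value_bytes = sum(len(value) for _, value in fields)
    print(f"{label}: {len(fields)} cookie fields, {value_bytes} value bytes")
```

Both jars are comparable in bytes, but only the second one produces hundreds of header fields, which is the dimension that turned out to matter.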
From here, it became simple. With the help of our admins, we found a setting in BigIP that caps the allowed number of headers. We then increased this limit significantly, and the issue was solved. At least for now, because naturally, the new, higher limit can also be reached with even more cookie headers, and then the error would return.
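The effect of such a limit can be sketched in a few lines of Python. This is only a simplified model: `max_headers`, the concrete numbers, and the decision to count pseudo-header fields toward the limit are assumptions for illustration and do not reflect the actual BigIP setting or its values.

```python
class TooManyHeadersError(Exception):
    """Raised when a request carries more header fields than allowed."""


def check_header_count(fields, max_headers):
    """Reject a request whose number of header fields exceeds max_headers."""
    if len(fields) > max_headers:
        raise TooManyHeadersError(
            f"{len(fields)} header fields exceed the limit of {max_headers}"
        )
    return len(fields)


def request_fields(cookie_count):
    """Hypothetical request: three fixed fields plus one 'cookie' field per cookie."""
    base = [(":method", "GET"), (":path", "/"), ("user-agent", "example")]
    return base + [("cookie", f"c{i}=v") for i in range(cookie_count)]


def is_accepted(cookie_count, max_headers):
    """Return True if a request with cookie_count cookies passes the limit."""
    try:
        check_header_count(request_fields(cookie_count), max_headers)
        return True
    except TooManyHeadersError:
        return False


print(is_accepted(100, max_headers=64))   # False: 103 fields exceed the limit
print(is_accepted(100, max_headers=256))  # True: the raised limit is sufficient
print(is_accepted(400, max_headers=256))  # False: more cookies hit the new limit
```

The last call shows why raising the limit only postpones the problem: a sufficiently large number of cookies exceeds any fixed cap.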
However, a few things about the error are still interesting. In HTTP/1.x, a user agent must not send more than one Cookie header per request (see RFC 6265), but in HTTP/2, the user agent may send each cookie as an individual header (see RFC 7540), and that's exactly what Chrome did here. This behavior is evidently an optimization, because HPACK header compression in HTTP/2 (see RFC 7541) can transmit repeated headers very efficiently. However, this only works for headers that don't change constantly. A single large Cookie header containing all cookies would therefore have to be transmitted completely anew every time even one cookie changes.
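The following Python sketch makes this trade-off tangible under a deliberately simplified model: a header field that was sent unchanged in the previous request is assumed to cost nothing, while any new or changed field has to be sent in full. Real HPACK also involves a size-limited dynamic table and Huffman coding, which this sketch ignores; the function names and cookie values are hypothetical.

```python
def cookie_fields(cookies, split):
    """Return 'cookie' header fields: one per cookie if split, else one joined field."""
    pairs = [f"{name}={value}" for name, value in cookies.items()]
    if split:
        return [("cookie", pair) for pair in pairs]
    return [("cookie", "; ".join(pairs))]


def literal_bytes(previous_fields, current_fields):
    """Count value bytes of fields that were not sent identically before.

    Simplified stand-in for HPACK: identical (name, value) fields are treated
    as free references, everything else must be transmitted as a literal.
    """
    known = set(previous_fields)
    return sum(
        len(value) for name, value in current_fields if (name, value) not in known
    )


# Hypothetical cookie jar in which exactly one cookie changes between requests.
before = {f"c{i}": "unchanged" for i in range(50)}
after = dict(before, c7="changed")

resend_single = literal_bytes(cookie_fields(before, False), cookie_fields(after, False))
resend_split = literal_bytes(cookie_fields(before, True), cookie_fields(after, True))

print(f"single header: {resend_single} bytes to resend")
print(f"split headers: {resend_split} bytes to resend")
```

In this model, the split variant only resends the one changed cookie, whereas the single-header variant resends the entire cookie string. The price is a much higher number of header fields, which is what ran into the limit on the load balancer.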
Unfortunately, Chrome didn't show any of this in the developer tools. It always lists only one cookie header, which didn't necessarily make troubleshooting easier.
Whether this is a solution or just a band-aid, time will tell. However, the root cause analysis was definitely one of the more interesting investigations in everyday developer life.
(rme)