Web cache deception vulnerabilities with Cloudflare and Django
If you’re using Django (or any standards-compliant web application using cookie-based sessions) behind Cloudflare, you need to be aware of a massive limitation in their cache implementation that could lead to caching and leakage of private responses (intended for a specific user presenting their session cookie) to everyone.
Background: the Vary header
The Vary response header is used by an origin server to tell any downstream caches that the response was conditional on one or more request headers, whose names are listed in the Vary header.
Web applications list all headers they consulted when generating the response. In Django, for example, a middleware automatically appends Cookie to the response’s Vary header if you access request.session.
Standards-compliant downstream caches may cache this response (conditional on Cache-Control headers obviously), but they must use all the values of headers listed in Vary as part of the cache key.
A user presenting the same session cookie will have their Cookie header match the one from the cached response, so the cache key gets reconstructed properly and they get their private response from the cache.
An unauthenticated user (or one with a different session cookie) would generate a different cache key and would therefore be unable to access the first user’s private response.
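To make the cache key idea concrete, here is a minimal sketch of how a Vary-aware cache might derive its key. The function name and the hashing scheme are my own assumptions for illustration; real caches typically store the Vary list alongside the cached entry and do a second lookup, but the effect on who shares an entry is the same.

```python
import hashlib


def cache_key(method, url, request_headers, vary):
    """Build a cache key from the URL plus every request header named in Vary."""
    # Header names are case-insensitive, so normalize before looking them up.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [method.upper(), url]
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        # A missing header contributes an empty value, which is still distinct
        # from any real cookie value.
        parts.append(f"{name}={headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


# The origin answered /account with "Vary: Cookie".
alice = cache_key("GET", "/account", {"Cookie": "sessionid=alice"}, "Cookie")
alice_again = cache_key("GET", "/account", {"cookie": "sessionid=alice"}, "Cookie")
bob = cache_key("GET", "/account", {"Cookie": "sessionid=bob"}, "Cookie")
anonymous = cache_key("GET", "/account", {}, "Cookie")
```

Here `alice` and `alice_again` are identical, while `bob` and `anonymous` each land on a different key and so can never be served Alice's entry.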
The issue
The problem is that Cloudflare does not support the Vary header. Their documentation states:
Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.
This is very dangerous, as it ignores the origin’s directive that this response varies based on request headers, and it will lead to broken behavior. If there were a technical limitation preventing support for this header, the safe thing to do would be to skip caching the response entirely.
I suspect the lack of support is intentional, to enable easy drop-in caching for badly designed software or misconfigured servers that needlessly stamp a generic Vary header on resources that don’t actually need it and would otherwise break caching.
Now this alone wouldn’t be an issue, as you still need something else to cause the response to be cached in the first place. This could be a Cache-Control header, but Django doesn’t set those by default.
However, Cloudflare doesn’t help us there either, because it applies heuristics such as parsing the URL for certain file extensions to determine the cacheability of a response, even in the absence of an explicit cache rule or Cache-Control header.
This means that anyone could just request /something.csv on your Django website, and Cloudflare will cache it. If your 404 page already contains some private data (maybe the user’s name/email in the navigation bar?) it’s game over.
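As a rough illustration of why the extension heuristic is a problem, here is a sketch of a cacheability check that looks only at the URL path. The function name and the extension set are assumptions of mine (a small illustrative subset, not Cloudflare's actual list or implementation); the point is that nothing about the response or the session is consulted.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative subset only; the real list is longer and maintained by Cloudflare.
ASSUMED_CACHEABLE_EXTENSIONS = {"css", "js", "csv", "jpg", "png", "pdf"}


def looks_cacheable(url):
    """Guess cacheability from the file extension in the URL path alone."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in ASSUMED_CACHEABLE_EXTENSIONS


static_looking = looks_cacheable("https://example.com/something.csv")
dynamic_looking = looks_cacheable("https://example.com/account")
```

Under these assumptions `/something.csv` is treated as a static file even though the origin rendered a per-user page for it.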
It gets worse if your website also contains routes where the user can arbitrarily vary the suffix, such as an SEO-friendly URL where the “slug” is purely for display and not actually used for looking up the record. In that case, requesting /my/resource/123-seo-friendly-url.csv will cause the content of /my/resource/123 to be cached. This could expose a private resource meant to be visible only to the authenticated user.
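The slug case can be sketched with a plain regular expression. The pattern below is a hypothetical route of my own, not taken from any particular project; it only shows how a decorative slug lets a `.csv` suffix ride along without changing which record is served.

```python
import re

# Hypothetical route: the numeric id selects the record, the slug is decorative.
RESOURCE_ROUTE = re.compile(r"^/my/resource/(?P<pk>\d+)(?:-(?P<slug>[^/]*))?$")


def resolve_resource_id(path):
    """Return the record id for a matching path, ignoring whatever slug follows."""
    match = RESOURCE_ROUTE.match(path)
    return int(match.group("pk")) if match else None


plain = resolve_resource_id("/my/resource/123")
with_slug = resolve_resource_id("/my/resource/123-seo-friendly-url")
with_extension = resolve_resource_id("/my/resource/123-seo-friendly-url.csv")
```

All three paths resolve to record 123. A slug pattern restricted to letters, digits, hyphens and underscores would reject the dot, which narrows this particular route but does nothing for the 404 case above.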
Exploitation
Get your target to load a URL meeting the requirements above using their authenticated browser. An <img/> tag will suffice; we don’t intend to read the response or bypass the same-origin policy in any way. You just need them to issue a GET request to that URL using their cookies; the response doesn’t matter (we’ll fetch it shortly directly from Cloudflare).
It’s recommended that every target gets its own unique URL so that multiple concurrent attack attempts don’t trample each other. The URL should ideally also be long and random enough, since it effectively becomes an access token for the cached resource (you wouldn’t want other attackers eavesdropping on your own stolen data, right?).
Once you have confirmed the target loaded the attack URL (you can do so by sending them two <img/> tags, with the second pointing to your own server as a beacon, and waiting until the beacon gets a hit), you can iterate through Cloudflare datacenters with the help of VPNs and request that same URL. When you hit the same Cloudflare datacenter as your target, you will be served their response.
Mitigation
Knowing this limitation, you must configure your origin to set a Cache-Control: private header on every response that has a Vary: Cookie header. This explicitly tells Cloudflare not to cache the response regardless of its heuristics.
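If you want to audit an existing site for this, a small header check is enough. This is a sketch under my own assumptions: the function name is made up, and the set of directives treated as "safe" (private, no-store, no-cache) is my reading of what keeps a shared cache from storing a response, so verify it against your CDN's documentation.

```python
def is_cookie_dependent_and_cacheable(headers):
    """Flag responses that vary on Cookie but do not forbid shared caching."""
    normalized = {name.lower(): value for name, value in headers.items()}
    vary = {item.strip().lower() for item in normalized.get("vary", "").split(",")}
    # Keep only the directive names, dropping any "=value" part.
    directives = {
        item.strip().lower().split("=", 1)[0]
        for item in normalized.get("cache-control", "").split(",")
    }
    varies_on_cookie = "cookie" in vary or "*" in vary
    # Assumed set of directives that stop a shared cache from storing the response.
    forbids_shared_caching = bool(directives & {"private", "no-store", "no-cache"})
    return varies_on_cookie and not forbids_shared_caching


at_risk = is_cookie_dependent_and_cacheable({"Vary": "Accept-Encoding, Cookie"})
protected = is_cookie_dependent_and_cacheable(
    {"Vary": "Cookie", "Cache-Control": "private"}
)
```

Feed it the response headers from a logged-in request to each route, including a deliberately missing one such as /something.csv.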
In Django, you can do this with a middleware. Save this class somewhere:
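from typing import Callable

from django.http import HttpRequest, HttpResponse
from django.utils.cache import has_vary_header, patch_cache_control

class CloudflareSensitiveResponseCachingFixupMiddleware:
    def __init__(self, get_response: Callable[[HttpRequest], HttpResponse]):
        self.get_response = get_response

    def __call__(self, request: HttpRequest) -> HttpResponse:
        response = self.get_response(request)
        # Mark cookie-dependent responses as private so shared caches skip them.
        if has_vary_header(response, "Cookie"):
            patch_cache_control(response, private=True)
        return response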
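And then reference it in settings.MIDDLEWARE before SessionMiddleware. Responses pass through the middleware list in reverse order, so this placement ensures the class sees the response after SessionMiddleware has patched the Vary header.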
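For reference, the ordering could look like the excerpt below. The module path `myproject.middleware` is a placeholder of mine; use wherever you saved the class. The surrounding entries are the usual Django defaults and only there for context.

```python
# settings.py (excerpt); "myproject.middleware" is a hypothetical module path.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # Listed before SessionMiddleware so that, on the way out, it runs after
    # SessionMiddleware has already appended Cookie to the Vary header.
    "myproject.middleware.CloudflareSensitiveResponseCachingFixupMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware ...
]
```

Placing it near the top of the list also lets it see Vary values added by any other middleware below it.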
You could also do this at the web server level: detect any responses with Vary set and override Cache-Control accordingly. This is left as an exercise to the reader.
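Somewhere in between sits a framework-agnostic WSGI wrapper. To be clear, this is not a web server configuration, just a sketch of the same "detect Vary, override Cache-Control" idea one layer below Django; the function names are my own, and it only reacts to Cookie (not every Vary value).

```python
def protect_cookie_dependent_responses(app):
    """Wrap a WSGI app so responses with Cookie in Vary get Cache-Control: private."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Vary may appear more than once, so merge all of its values.
            vary = ",".join(v for k, v in headers if k.lower() == "vary")
            varies_on_cookie = "cookie" in {p.strip().lower() for p in vary.split(",")}
            if varies_on_cookie:
                # Replace any existing Cache-Control with a conservative value.
                headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
                headers.append(("Cache-Control", "private"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in application that varies on Cookie."""
    start_response("200 OK", [("Content-Type", "text/plain"), ("Vary", "Cookie")])
    return [b"hello"]


application = protect_cookie_dependent_responses(demo_app)
```

In a Django project you would wrap the object returned by `get_wsgi_application()` in wsgi.py instead of `demo_app`.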