[PATCH 00 of 15] Serve all requests from single tempfile

Roman Arutyunyan arut at nginx.com
Mon Feb 7 11:31:43 UTC 2022


Hello,

On Fri, Jan 28, 2022 at 05:31:52PM +0100, Jiří Setnička via nginx-devel wrote:
> Hello!
> 
> Over the last few months, we (a small team of developers, including me
> and Jan Prachař, both from CDN77) developed a missing feature for
> proxy caching in Nginx. We are happy to share this feature with the
> community in the following patch series.
> 
> We serve a large number of files to an immense number of clients, and
> often multiple clients want the same file at the very same time -
> especially when it comes to streaming (where a file is crafted on the
> upstream in real time and getting it can take seconds).
> 
> Previously there were two options in Nginx when using proxy caching:
> 
> * pass all incoming requests to the origin
> * use the proxy_cache_lock feature: pass only the first request (served
>   in real time) and make the other requests wait until the first request
>   completes
> 
> We didn't like either of these options (the first one effectively
> disables the CDN and the second one is unusable for streaming). We
> considered using Varnish, which solves this problem better, but we are
> very happy with the Nginx infrastructure we have. Thus we came up with a
> third option.
> 
> We developed the proxy_cache_tempfile mechanism, which acts similarly to
> proxy_cache_lock, but instead of making the other requests wait for the
> file's completion, we open the tempfile used by the primary request and
> periodically serve parts of it to the waiting requests; see the sketch
> below.
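> 
> In rough C terms (illustrative only, not the actual patch code), the
> serving side of a waiting request boils down to the following loop;
> send_to_client(), download_done(), and wait_interval() are hypothetical
> stand-ins for nginx internals:
> 
>     #include <unistd.h>
> 
>     void send_to_client(const char *buf, size_t n);  /* hypothetical */
>     int  download_done(void);                        /* hypothetical */
>     void wait_interval(void);  /* sleep proxy_cache_tempfile_loop */
> 
>     /* Serve whatever the primary request has appended so far, then
>      * wait a short interval and retry until the download finishes. */
>     void serve_from_tempfile(int tf_fd)
>     {
>         off_t    sent = 0;
>         char     buf[32768];
>         ssize_t  n;
> 
>         for ( ;; ) {
>             n = pread(tf_fd, buf, sizeof(buf), sent);
> 
>             if (n > 0) {
>                 send_to_client(buf, n);
>                 sent += n;
>                 continue;
>             }
> 
>             if (n == 0 && download_done()) {
>                 break;  /* tempfile is complete */
>             }
> 
>             wait_interval();
>         }
>     }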
> 
> Because there may be multiple tempfiles for the same file (for example
> when the file expires before it is fully downloaded), we use shared
> memory per cache with an `ngx_http_file_cache_tf_node_t` for each
> created tempfile to synchronize all workers. When a new request is
> passed to the origin, we record its tempfile number, and when another
> request is received, we try to open the tempfile with this number and
> serve from it. Once a secondary request starts using a tempfile, it
> sticks with that same tempfile until its completion.
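> 
> A simplified sketch of such a node (the field names here are
> illustrative; the struct in the actual patches carries more
> bookkeeping):
> 
>     typedef struct {
>         ngx_rbtree_node_t  node;       /* indexed in the tf_zone shm */
>         ngx_uint_t         tf_number;  /* number of the tempfile on disk */
>         off_t              length;     /* bytes downloaded so far */
>         unsigned           done:1;     /* upstream download finished */
>         unsigned           error:1;    /* download failed */
>     } ngx_http_file_cache_tf_node_t;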
> 
> To accomplish this we rely on POSIX filesystem behavior: you can open a
> file and retain a valid file descriptor for it even after the file is
> moved to a new location (on the same filesystem). I'm afraid this would
> be hard to accomplish on Windows, so this feature will be non-Windows
> only.
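> 
> A minimal standalone demonstration of that behavior (the paths are made
> up):
> 
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <unistd.h>
> 
>     int main(void)
>     {
>         char     buf[4096];
>         ssize_t  n;
> 
>         int fd = open("/tmp/cache/tempfile.0001", O_RDONLY);
>         if (fd == -1) {
>             return 1;
>         }
> 
>         /* e.g. the primary request moves the finished tempfile into
>          * its final place in the cache */
>         rename("/tmp/cache/tempfile.0001", "/tmp/cache/abc123");
> 
>         /* the descriptor still refers to the same inode, so this
>          * read (and reads of data appended later) keeps working */
>         n = read(fd, buf, sizeof(buf));
>         printf("read %zd bytes after rename\n", n);
> 
>         close(fd);
>         return 0;
>     }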
> 
> We tested this feature thoroughly over the last few months and we
> already use it in part of our infrastructure without noticing any
> negative impact. We noticed only a very small increase in memory usage
> and a minimal increase in CPU and disk I/O usage (which corresponds to
> the increased throughput of the server).
> 
> We also did some synthetic benchmarks comparing vanilla nginx with our
> patched version, with and without cache lock, and with cache tempfiles.
> Results of the benchmarks, the charts, and the scripts we used are
> available on my GitHub:
> 
>   https://github.com/setnicka/nginx-tempfiles-benchmark
> 
> It should also work for the fastcgi, uwsgi, and scgi caches (as they
> use the same mechanism internally), but we didn't test those.
> 
> New config directives (see the example below):
> 
> * proxy_cache_tempfile on;         -- activates the whole tempfile logic
> * proxy_cache_tempfile_timeout 5s; -- how long to wait for a tempfile before returning 504
> * proxy_cache_tempfile_loop 50ms;  -- polling interval for checking tempfiles
> (and the same for fastcgi_cache, uwsgi_cache, and scgi_cache)
> 
> New option for proxy_cache_path: tf_zone=name:size (defaults to the key
> zone name with a _tf suffix and a size of 10M). It creates a shared
> memory zone used to store the tempfile nodes.
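> 
> Putting it together, a configuration using the new directives might
> look like this (names and values are illustrative):
> 
>     proxy_cache_path /var/cache/nginx keys_zone=cache:10m
>                      tf_zone=cache_tf:10m;
> 
>     server {
>         location / {
>             proxy_pass http://upstream;
>             proxy_cache cache;
>             proxy_cache_tempfile on;
>             proxy_cache_tempfile_timeout 5s;
>             proxy_cache_tempfile_loop 50ms;
>         }
>     }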
> 
> We would be very grateful for any reviews and other testing.
> 
> Jiří Setnička
> CDN77

Thanks for sharing your work.  Indeed, nginx currently lacks a good solution
for serving a file that's being downloaded from upstream.  We tried to address
this issue a few years ago.  Our solution was similar to yours, but instead
of sharing the temp file between workers, we moved the temp file to its
destination right after writing the header.  A new bit was added to the header
signalling that this file is being updated.

The biggest issue with this kind of solution is how to wait for updates
to a file.  We believe that polling a file at a fixed time interval is
not a perfect approach, even though nginx does that for cache locks.
Some systems provide ways to avoid this.  For example, BSD systems have
kqueue, which allows waiting for file updates.  On Linux, inotify can do
similar things, but the number of watches is limited.  Another approach
would be to create an inter-worker messaging system for signalling file
updates.
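
For illustration, here is a blocking sketch of the kqueue approach (the
path is made up; a real implementation would register the descriptor in
the nginx event loop instead of blocking):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        struct kevent  ev, out;

        int fd = open("/tmp/cache/tempfile.0001", O_RDONLY);
        int kq = kqueue();
        if (fd == -1 || kq == -1) {
            return 1;
        }

        /* watch the open descriptor for appended data and for the
         * final rename into the cache */
        EV_SET(&ev, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
               NOTE_WRITE | NOTE_EXTEND | NOTE_RENAME, 0, NULL);

        if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1) {
            return 1;
        }

        for ( ;; ) {
            if (kevent(kq, NULL, 0, &out, 1, NULL) == -1) {
                break;
            }

            if (out.fflags & (NOTE_WRITE | NOTE_EXTEND)) {
                printf("file grew, serve the new bytes\n");
            }

            if (out.fflags & NOTE_RENAME) {
                printf("tempfile moved into place, finish up\n");
                break;
            }
        }

        close(kq);
        close(fd);
        return 0;
    }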

It's good to know the solution works for you.  Please keep us posted about
future improvements, especially ones which would avoid polling and decrease
complexity.

-- 
Roman Arutyunyan


