<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi</div><div><br><blockquote type="cite"><div>From the output you provided, it looks like all nginx workers are locked up, either doing something or waiting for some system resources. As you can see, all connections accepted by nginx (the 6 connections which have an nginx process listed in the PID column) are in CLOSE_WAIT state, and there are other connections to port 80 which are sitting in the listen queue. Am I right in assuming that nginx does not answer any requests?<br></div></blockquote><div><br></div><div>Yes, that's the issue. nginx becomes unresponsive at this point until I restart it.</div><br><blockquote type="cite"><div>Note well: you haven't posted the full config you use, so please check it yourself for possible loops. I've recently posted some patches which take care of several loops that currently aren't resolved automatically; see here for the patch and example loops:<br><br><a href="http://nginx.org/pipermail/nginx-devel/2010-January/000099.html">http://nginx.org/pipermail/nginx-devel/2010-January/000099.html</a><br><br>It should be trivial to tell whether that's the cause, though, as an nginx worker will eat 100% CPU once caught in such a loop.<br></div></blockquote><div><br></div><div>I have a monitoring script that detects these situations (wget fails to download from localhost within a 20-second timeout) and restarts nginx, but before restarting it captures the output of netstat -nap, ps and other system metrics. This is an example of what ps shows:</div><div><br></div><div><pre>
www-data 24610  0.0  0.1  7476  2452 ?  S  07:44  0:00 nginx: worker process
www-data 24611  0.0  0.1  7668  2412 ?  S  07:44  0:00 nginx: worker process
www-data 24612  0.0  0.1  7668  2416 ?  S  07:44  0:00 nginx: worker process
www-data 24613  0.0  0.1  7736  2624 ?  S  07:44  0:00 nginx: worker process
</pre><div><br></div><div>And vmstat:</div><pre>
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs  us  sy  id  wa
 2  0    440 157012 181076 1180340    0    0     2    32   27   46   2   0  95   0
 0  0    440 156904 181076 1180340    0    0     0     0   26   28   2   0  94   0
 0  0    440 156888 181076 1180348    0    0     0     0   13   24   0   0 100   0
 0  0    440 156888 181076 1180348    0    0     0     0   12   21   0   0 100   0
 0  0    440 156888 181080 1180348    0    0     0   128   22   34   0   0  99   1
</pre><div>So the nginx processes don't seem to be stuck in a loop: CPU usage is negligible and the workers are sleeping (state S).</div></div><br><blockquote type="cite"><div>Note well 2: I've already asked you to try compiling without third-party modules and patches and to check whether you are able to reproduce the problem. It doesn't really make sense to proceed any further without doing this.<br></div></blockquote><div><br></div><div>I have to admit I still haven't tried this, sorry. :) I'll try it.</div><br><blockquote type="cite"><div>You have to enable the debug log (see <a href="http://nginx.org/en/docs/debugging_log.html">http://nginx.org/en/docs/debugging_log.html</a>). Then it will be possible to map an fd number to the particular request (and its full logs). Under Linux it should be possible to find out the fd number of a particular connection via lsof -p &lt;pid-of-nginx-worker&gt;.<br></div></blockquote><div><br></div><div>I'll look into this too and add that information to the monitoring script. Can you think of any other system parameters that would be useful to monitor in these cases?</div><div><br></div><div>Thanks a lot, Maxim. You're being really helpful. :-)</div></div><div><br></div>Regards<div><br><div>
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>-- </div><div> Vicente Aguilar &lt;<a href="mailto:bisente@bisente.com">bisente@bisente.com</a>&gt; | <a href="http://www.bisente.com/">http://www.bisente.com</a></div></div>
</div>
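<div><br></div><div>A minimal sketch of a watchdog like the one described above, written in Python (3.9+) and assuming a Linux host with pgrep, netstat, ps, vmstat and lsof on the PATH. The check URL, the pgrep pattern and the command list are illustrative assumptions, not the actual monitoring script, and the restart step is left out because it depends on how nginx was installed. Following the debug-log advice above, it also runs lsof -p against every worker so that socket fd numbers can later be matched against the debug log.</div>

```python
import http.client
import subprocess
import urllib.error
import urllib.request

# Assumption: nginx listens on the default port on localhost.
CHECK_URL = "http://127.0.0.1/"
TIMEOUT_S = 20  # mirrors the 20-second wget timeout

# Snapshots to take before any restart (assumed command set).
BASE_COMMANDS = [["netstat", "-nap"], ["ps", "aux"], ["vmstat", "1", "5"]]


def is_responsive(url: str = CHECK_URL, timeout: float = TIMEOUT_S) -> bool:
    """Return True if the server sends any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # wait for the body to start arriving
        return True
    except urllib.error.HTTPError:
        # An error status (404, 501, ...) still means the server answered.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out or malformed reply: treat as hung.
        return False


def nginx_worker_pids() -> list[int]:
    """Hypothetical helper: find worker PIDs by their process title via pgrep."""
    try:
        out = subprocess.run(
            ["pgrep", "-f", "nginx: worker process"],
            capture_output=True, text=True, check=False,
        ).stdout
    except OSError:
        return []  # pgrep not installed
    return [int(pid) for pid in out.split()]


def capture_diagnostics(commands: list[list[str]]) -> dict[str, str]:
    """Run each command and keep its output (or the failure) keyed by command line."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[key] = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[key] = f"<failed: {exc}>"
    return results


def main() -> None:
    if is_responsive():
        return
    # Per-worker lsof output lets fd numbers be mapped to requests in the debug log.
    commands = BASE_COMMANDS + [["lsof", "-p", str(pid)] for pid in nginx_worker_pids()]
    for cmd, output in capture_diagnostics(commands).items():
        print(f"===== {cmd} =====")
        print(output)
    # Restart step intentionally omitted: it is site-specific.


if __name__ == "__main__":
    main()
```

<div>Such a script would normally run as root (from cron or a loop), since netstat -p and lsof only report details of other users' processes, here the www-data workers, with sufficient privileges.</div>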
<br></div></body></html>