I have 50 sidekiq threads crawling the web, and a few weeks ago the threads started hanging after about 20 minutes of running. When I do a backtrace dump, most of the threads are stuck on net/http initialize:
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `open'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:878:in `connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:858:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:974:in `response_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:298:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
/app/app/workers/crawl_page.rb:24:in `block in perform'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `block in catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:106:in `timeout'
I didn't think sidekiq would get stuck on net/http because I've wrapped the entire call in a timeout:

Timeout::timeout(APP_CONFIG['crawl_page_timeout']) { @page = agent.get(url) }
...but then I started reading some old posts about how ruby's Timeout is not thread safe: blog.headius/2008/02/rubys-threadraise-threadkill-timeoutrb.html
Is ruby's Timeout still not thread safe?
I know a lot of people write crawlers in Ruby. If Timeout isn't thread-safe, how are people writing crawlers handling the issue of net/http getting stuck?
Update:
I've switched to HTTPClient (which specifically says it's thread-safe) to replace mechanize. We still appear to be getting stuck in initialize. Again, this could be due to Ruby's Timeout not working properly, or it could be a sidekiq issue. Here's the stacktrace from the most recent hung sidekiq threads:
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `initialize'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `new'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `create_socket'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:752:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `call'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:127:in `timeout'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:751:in `connect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:609:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:164:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:1087:in `do_get_block'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:34:in `block in do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/cross_app_tracing.rb:43:in `tl_trace_http_request'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:33:in `do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:891:in `block in do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:985:in `protect_keep_alive_disconnected'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:890:in `do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:963:in `follow_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:776:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:677:in `get'
/app/app/ohm_models/queued_page.rb:20:in `run_crawl'

Answer:
Correct, it is still not safe to use Timeout in Ruby code, unless you know exactly what is happening within that block (which includes what any C code might be doing). I have personally observed catastrophic things happening in connection pools because of this.
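The failure mode is easy to reproduce in isolation. Timeout interrupts the block with an asynchronous exception raised from a watchdog thread, and that exception can land anywhere, including inside an ensure block that is restoring shared state. A minimal sketch, where the sleeps stand in for a fast request followed by slow cleanup (e.g. returning a connection to a pool):

```ruby
require 'timeout'

pool_restored = false
begin
  Timeout.timeout(0.2) do
    begin
      sleep 0.1            # the "real work" finishes within the deadline
    ensure
      sleep 0.5            # slow cleanup; the timeout fires mid-cleanup,
      pool_restored = true # so this line is never reached
    end
  end
rescue Timeout::Error
end
pool_restored  # false: cleanup was abandoned partway through
```

In a real connection pool, an abandoned cleanup like this is one way a connection gets leaked or left half-checked-out, which can then wedge every thread that later waits on the pool.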
You may be able to get away with rescuing errors and retrying, but if you're unlucky your process might get wedged and require a restart.
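A hedged sketch of that rescue-and-retry approach, using a hypothetical fetch_with_retry helper that bounds both the per-attempt time and the number of attempts (it papers over transient hangs, but per the caveat above it cannot guarantee the process never wedges):

```ruby
require 'timeout'

# Retry a block up to max_attempts times, giving each attempt its own
# Timeout deadline. Re-raises if every attempt times out.
def fetch_with_retry(url, per_try_timeout, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    Timeout.timeout(per_try_timeout) { yield url }
  rescue Timeout::Error
    retry if attempts < max_attempts
    raise
  end
end
```

Usage would look like `page = fetch_with_retry(url, 10) { |u| agent.get(u) }`, where agent is your Mechanize (or other HTTP client) instance.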
If you fork to create new processes, you can kill those safely if they run too long (or use timeout(1)), because they don't have any way to corrupt your parent process.
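A minimal sketch of that process-per-job approach, assuming MRI on a Unix-like system (it relies on fork) and a hypothetical crawl_page standing in for the real crawl call. The parent polls for the child and hard-kills it past a deadline, which is safe because a SIGKILL'd child cannot leave the parent's connection pools in a bad state:

```ruby
# Run the crawl in a child process and enforce a wall-clock deadline
# from the parent. Returns true if the child finished successfully,
# false if it had to be killed.
def crawl_with_deadline(url, deadline_secs)
  pid = fork { crawl_page(url) }  # crawl_page is hypothetical
  deadline = Time.now + deadline_secs
  loop do
    # Non-blocking check: waitpid returns the pid once the child exits.
    return $?.success? if Process.waitpid(pid, Process::WNOHANG)
    if Time.now > deadline
      Process.kill('KILL', pid)   # SIGKILL cannot be blocked or rescued
      Process.waitpid(pid)        # reap the zombie
      return false
    end
    sleep 0.1
  end
end
```

The polling loop trades a little latency (up to 0.1 s) for avoiding any use of Timeout in the parent at all.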
I know a lot of people write crawlers in Ruby. If Timeout isn't thread-safe, how are people writing crawlers handling the issue of net/http getting stuck?
Do you have a specific example of one that works?