This issue is simply driving me nuts. I have a single-machine Storm instance running on my local LAN. I am currently running the v0.9.1-incubating release (from the Apache Incubator site). The issue is simply that my Storm supervisor process refuses to start after EVERY SINGLE reboot. The hack fix is quite simple: remove the supervisor and workers folders from the Storm local directory and rerun the process; things then run hunky-dory until the next reboot.
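The hack fix described above can be sketched as a small script. This is only an illustration: the function name `clear_supervisor_state` is hypothetical, the directory passed in is assumed to be whatever `storm.local.dir` points to, and the Storm supervisor process is assumed to be stopped before it runs.

```python
import shutil
from pathlib import Path


def clear_supervisor_state(storm_local_dir, subdirs=("supervisor", "workers")):
    """Delete the named subdirectories of storm_local_dir.

    Returns the names of the subdirectories that were actually removed.
    Hypothetical helper: stop the Storm supervisor before calling it.
    """
    root = Path(storm_local_dir)
    removed = []
    for name in subdirs:
        target = root / name
        if target.is_dir():
            # Remove the folder and everything under it.
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Anything else under the local directory is left alone, so only the two folders named in the question are touched.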
I'm providing every bit of information I think might be relevant to debug this issue. Please ask for more if needed, but just help me get some resolution.
PS: It doesn't matter if I have topologies running or not.
Supervisor config

    [program:zookeeper]
    command=/path/to/zookeeper/bin/zkServer.sh "start-foreground"
    process_name=zookeeper
    directory=/path/to/zookeeper/bin
    stdout_logfile=/var/log/zookeeper.log ; stdout log path, NONE$
    stderr_logfile=/var/log/err.zookeeper.log ; stderr log path, $
    priority=2
    user=root

    [program:storm-nimbus]
    command=/path/to/storm/bin/storm nimbus
    user=root
    autostart=true
    autorestart=true
    startsecs=10
    startretries=2
    log_stdout=true
    log_stderr=true
    stderr_logfile=/var/log/storm/nimbus.err.log
    stdout_logfile=/var/log/storm/nimbus.out.log
    logfile_maxbytes=20MB
    logfile_backups=2
    priority=10

    [program:storm-ui]
    command=/path/to/storm/bin/storm ui
    user=root
    autostart=true
    autorestart=true
    startsecs=10
    startretries=2
    log_stdout=true
    log_stderr=true
    stderr_logfile=/var/log/storm/ui.err.log
    stdout_logfile=/var/log/storm/ui.out.log
    logfile_maxbytes=20MB
    logfile_backups=2
    priority=500

    [program:storm-supervisor]
    command=/path/to/storm/bin/storm supervisor
    user=root
    autostart=true
    autorestart=true
    startsecs=10
    startretries=2
    log_stdout=true
    log_stderr=true
    stderr_logfile=/var/log/storm/supervisor.err.log
    stdout_logfile=/var/log/storm/supervisor.log.log
    logfile_maxbytes=20MB
    logfile_backups=2
    priority=600

    [program:storm-logviewer]
    command=/path/to/storm/bin/storm logviewer
    user=root
    autostart=true
    autorestart=true
    startsecs=10
    startretries=2
    log_stdout=true
    log_stderr=true
    stderr_logfile=/var/log/storm/log.err.log
    stdout_logfile=/var/log/storm/log.out.log
    logfile_maxbytes=20MB
    logfile_backups=2
    priority=900

Storm config
    #Zookeeper
    storm.zookeeper.servers:
      - "192.168.1.11"

    # Nimbus
    nimbus.host: "192.168.1.11"
    nimbus.childopts: '-Xmx1024m -Djava.preferIPv4Stack=true -Dprocess=storm'

    # UI
    ui.port: 9090
    ui.childopts: "-Xmx768m -Djava.preferIPv4Stack=true -Dprocess=storm"

    # Supervisor
    supervisor.childopts: '-Djava.preferIPv4Stack=true -Dprocess=storm'

    # Worker
    worker.childopts: '-Xmx768m -Djava.preferIPv4Stack=true -Dprocess=storm'

    storm.local.dir: "/path/to/storm"

    storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    storm.messaging.netty.server_worker_threads: 1
    storm.messaging.netty.client_worker_threads: 1
    storm.messaging.netty.buffer_size: 5242880
    storm.messaging.netty.max_retries: 100
    storm.messaging.netty.max_wait_ms: 1000
    storm.messaging.netty.min_wait_ms: 100
Error message

Pastebin for log error message. I'm cross-posting the relevant bits here.
    java.lang.RuntimeException: java.io.EOFException
        at backtype.storm.utils.Utils.deserialize(Utils.java:86) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at backtype.storm.utils.LocalState.get(LocalState.java:56) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.4.0.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
        at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
        at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) ~[clojure-1.4.0.jar:na]
        at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
        at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) ~[na:na]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
        at java.lang.Thread.run(Thread.java:679) [na:1.6.0_27]
    Caused by: java.io.EOFException: null
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2322) ~[na:1.6.0_27]
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2791) ~[na:1.6.0_27]
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:798) ~[na:1.6.0_27]
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:298) ~[na:1.6.0_27]
        at backtype.storm.utils.Utils.deserialize(Utils.java:81) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
        ... 11 common frames omitted
    2014-03-11 12:27:25 b.s.util [INFO] Halting process: ("Error when processing an event")

Recommended answer
We had that exact same problem (supervisor crashing on start and same log error message) when we had a power outage on 2 of our development servers. I guess just stopping the server without previously stopping the supervisor would have the same effect.
The only working solution we found was to remove the "storm-local/supervisor" folder (I guess something in there got corrupted).
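One way to check the corruption guess before deleting anything: the EOFException in the question is thrown from `ObjectInputStream.readStreamHeader`, which is consistent with a state file that ends before the 4-byte Java serialization header, for example a zero-length file left behind by an unclean shutdown. The sketch below flags such files. The function name `find_suspect_state_files` is hypothetical, and it assumes every file under the directory you point it at is a Java-serialized snapshot, so other files (topology jars, for instance) would be flagged as well.

```python
from pathlib import Path

# Java serialization streams start with STREAM_MAGIC (0xACED) and STREAM_VERSION (5).
JAVA_STREAM_HEADER = b"\xac\xed\x00\x05"


def find_suspect_state_files(directory):
    """Return files under directory that lack a complete Java serialization header.

    Hypothetical helper: assumes the files are meant to be Java-serialized objects.
    """
    suspects = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                # Empty or truncated files return fewer than 4 bytes here.
                if handle.read(4) != JAVA_STREAM_HEADER:
                    suspects.append(path)
    return suspects
```

If this turns up an empty or truncated file after a power outage, that would support the guess that the supervisor's saved state is what got corrupted.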