2010-01-19

Why HBase fails to start after a restart, and how to fix it

Posted in FreeBSD/Unix Servers, Cloud Computing at 15:33. Author: 仲远

Symptoms: HBase fails to start after a restart
While running HBase 0.20.2, we ran into two strange problems.

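We built a cluster from several machines and installed and configured Hadoop and HBase by following their "Getting Started" guides. Both started normally afterwards, and we could create tables and insert data.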

However, when we opened the master page at http://10.37.17.252:60010/master.jsp, we noticed the first problem: the Region Servers section listed two region servers at 127.0.0.1, even though we had not listed the master as a region server in conf/regionservers:

Region Servers
Address Start Code Load
127.0.0.1:60030 1263383321075 requests=0, regions=0, usedHeap=0, maxHeap=0
127.0.0.1:60030 1263383321096 requests=0, regions=0, usedHeap=0, maxHeap=0
WAMDM1.ruc.edu.cn:60030 1263383350174 requests=0, regions=0, usedHeap=24, maxHeap=991
WAMDM2.ruc.edu.cn:60030 1263383320980 requests=0, regions=1, usedHeap=32, maxHeap=991
WAMDM3.ruc.edu.cn:60030 1263383320985 requests=0, regions=1, usedHeap=33, maxHeap=991
WAMDM4.ruc.edu.cn:60030 1263383322246 requests=0, regions=0, usedHeap=33, maxHeap=991
……

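Despite this oddity, HBase still seemed to work normally. The second problem appeared when we tried to restart HBase: we ran bin/stop-hbase.sh and then the start script, bin/start-hbase.sh. This time, opening the master page at http://10.37.17.252:60010/master.jsp returned the following error: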

HTTP ERROR: 500
Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1

RequestURI=/master.jsp

Caused by:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1

at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:433)
 at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
 at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:324)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
 at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

Powered by Jetty://

At the same time, the master log showed the following errors:
2010-01-13 18:34:04,424 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:04,425 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:04,425 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:06,425 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:06,429 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:06,430 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:06,430 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:08,430 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:08,434 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:08,435 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:08,435 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 3 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:936)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:631)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:407)
  at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
  at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:988)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:433)
  at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
  at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:324)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
  at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
2010-01-13 18:34:08,451 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:08,451 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:08,451 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:10,451 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:10,456 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:10,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:10,465 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:12,465 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:12,469 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:12,470 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:14,474 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:14,475 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:14,475 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:16,475 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:16,480 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:16,480 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:16,480 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:18,480 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:18,485 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:18,486 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:18,486 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 3 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:936)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:631)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:407)
  at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
  at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:988)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:433)
  at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
  at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:324)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
  at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
2010-01-13 18:34:18,500 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:18,501 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:18,501 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:20,501 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:20,506 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:20,506 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:20,506 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:22,507 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:22,511 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:22,511 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:24,516 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:24,516 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:24,516 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:26,517 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:26,521 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:26,522 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:26,522 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:28,522 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:28,526 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:28,527 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:28,527 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 3 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:936)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:581)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:631)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:407)
  at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
  at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:62)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:988)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
  at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:433)
  at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
  at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
  at org.mortbay.jetty.Server.handle(Server.java:324)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
  at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
2010-01-13 18:34:28,547 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:28,547 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:28,547 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:30,547 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:30,552 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:30,552 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:30,552 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Root region location changed. Sleeping.
2010-01-13 18:34:31,575 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 0.16666666666666666
2010-01-13 18:34:32,552 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Wake. Retry finding root region.
2010-01-13 18:34:32,557 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.37.17.248:60020
2010-01-13 18:34:32,558 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:32,740 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.37.17.248:60020, regionname: -ROOT-,,0, startKey: <>}
2010-01-13 18:34:32,741 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
2010-01-13 18:34:32,741 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:34:32,741 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:156)
  at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
  at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
  at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
2010-01-13 18:35:31,575 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 0.16666666666666666
2010-01-13 18:35:32,740 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.37.17.248:60020, regionname: -ROOT-,,0, startKey: <>}
2010-01-13 18:35:32,741 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:35:32,741 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:156)
  at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
  at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
  at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
2010-01-13 18:35:32,742 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
2010-01-13 18:36:31,575 INFO org.apache.hadoop.hbase.master.ServerManager: 5 region servers, 0 dead, average load 0.16666666666666666
2010-01-13 18:36:32,740 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.37.17.248:60020, regionname: -ROOT-,,0, startKey: <>}
2010-01-13 18:36:32,741 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.
2010-01-13 18:36:32,741 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:156)
  at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
  at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
  at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:68)

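At this point we could still enter the HBase shell, but no operation would run. When we tried to shut HBase down again, the master would not stop: the dots after "stop master" kept accumulating and the master node never exited. We had to kill the master process by force. HBase was simply dead.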

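Root cause analysis: why HBase fails to start after a restart
After investigating from several angles, I finally noticed the following when I checked port usage with netstat -an:
On node WAMDM1, the regionserver was listening on port 60020 at 127.0.0.1:60020.
On node WAMDM2, the regionserver was listening on port 60020 at 10.37.17.249:60020.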
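
The check above can be automated. The following is a minimal sketch, not part of the original setup, that extracts the listening address for a given port from `netstat -an` output. It assumes the Linux column layout (protocol, two queue counters, local address, foreign address, state); the function name `listen_addresses` and the sample text are hypothetical, and other platforms format the address differently.

```python
def listen_addresses(netstat_output, port):
    """Return the local addresses in LISTEN state on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or fields[-1] != "LISTEN":
            continue
        # Split on the last colon so IPv6-style addresses keep their colons.
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append(host)
    return found


# Hypothetical sample shaped like Linux `netstat -an` output.
sample = """\
tcp        0      0 127.0.0.1:60020         0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
"""
print(listen_addresses(sample, 60020))
```

A regionserver that reports only a loopback address here cannot be reached from the other nodes, which is the situation found on WAMDM1.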
This looked suspicious, so I checked /etc/hosts and found that the file on WAMDM1 differed from the one on WAMDM2. The hosts file on WAMDM1 contained:

127.0.0.1 WAMDM1 localhost.localdomain localhost
10.37.17.248 WAMDM1.ruc.edu.cn WAMDM1
10.37.17.249 WAMDM2.ruc.edu.cn WAMDM2
10.37.17.250 WAMDM3.ruc.edu.cn WAMDM3
10.37.17.251 WAMDM4.ruc.edu.cn WAMDM4
10.37.17.252 WAMDM5.ruc.edu.cn WAMDM5

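Note the first line. When configuring Hadoop/HBase we usually use hostnames in place of IP addresses, but on the WAMDM1 machine the name WAMDM1 was mapped to 127.0.0.1. The regionserver there therefore bound to the loopback address, and communication between the master and that regionserver failed. That is why messages like the following kept appearing in the logs and error pages: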

Server at /10.37.17.248:60020 could not be reached after 1 tries, giving up.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /10.37.17.248:60020 after attempts=1
  at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:424)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:865)
  at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:881)
  at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:156)
  at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
  at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
  at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
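
The faulty mapping described above can be detected mechanically. The sketch below is an illustration rather than the method used in this post: it parses hosts-file text and reports any name, other than the usual localhost aliases, that is mapped to a loopback address. The function names `parse_hosts` and `names_on_loopback` are hypothetical, and the rule for which aliases count as harmless (names starting with `localhost.` or `ip6-`) is an assumption.

```python
import ipaddress


def parse_hosts(text):
    """Parse hosts-file text into a list of (address, [names]) entries."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        entries.append((address, names))
    return entries


def names_on_loopback(text):
    """Return names mapped to a loopback address, except localhost aliases."""
    suspicious = []
    for address, names in parse_hosts(text):
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if not is_loopback:
            continue
        for name in names:
            lowered = name.lower()
            # Assumption: these aliases are expected on loopback addresses.
            if lowered == "localhost" or lowered.startswith(("localhost.", "ip6-")):
                continue
            suspicious.append(name)
    return suspicious


# The first two lines of the WAMDM1 hosts file quoted above.
wamdm1_hosts = """\
127.0.0.1 WAMDM1 localhost.localdomain localhost
10.37.17.248 WAMDM1.ruc.edu.cn WAMDM1
"""
print(names_on_loopback(wamdm1_hosts))  # -> ['WAMDM1']
```

An empty result means no cluster hostname resolves to loopback through that file.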

Solution: fixing HBase failing to start after a restart
I then changed /etc/hosts on every machine to the following:

127.0.0.1 localhost
10.37.17.248 WAMDM1.ruc.edu.cn WAMDM1
10.37.17.249 WAMDM2.ruc.edu.cn WAMDM2
10.37.17.250 WAMDM3.ruc.edu.cn WAMDM3
10.37.17.251 WAMDM4.ruc.edu.cn WAMDM4
10.37.17.252 WAMDM5.ruc.edu.cn WAMDM5
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

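To be safe, I also added a command to stop-hbase.sh that stops the regionservers. (I could not find anything online saying that stopping the regionservers explicitly is necessary, or any evidence that the script has a bug, but in my testing this change caused no problems.)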

"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" --hosts "${HBASE_REGIONSERVERS}" stop regionserver

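This command must be placed before the step that stops the master. Whether the restart also works without this extra command is something I will test further in the future.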

With these changes, both the two 127.0.0.1 region servers and the failure of HBase to come back after a restart were completely resolved!

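Reflections after solving the HBase restart problem
The lesson I took from this problem:

When configuring a distributed system, make sure the configuration is consistent across all machines, including hostnames (the hosts file), user names, and the various Hadoop/HBase configuration files. Any inconsistency should be checked very carefully and then unified. I have been bitten by this more than once, so please keep it in mind!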
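
A consistency check of this kind is easy to script. The sketch below is a hypothetical helper, not something used in the original troubleshooting: given the hosts-file text collected from each node, it reports every name whose set of addresses differs between nodes. The names `hosts_mapping` and `inconsistent_names` are assumptions, and the WAMDM2 file content in the example is assumed (the post does not quote it); how the files are collected from the nodes is left out.

```python
def hosts_mapping(text):
    """Map each name in hosts-file text to the set of addresses it is given."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, set()).add(address)
    return mapping


def inconsistent_names(hosts_by_node):
    """Return {name: {node: [addresses]}} for names that differ between nodes."""
    per_node = {node: hosts_mapping(text) for node, text in hosts_by_node.items()}
    all_names = set().union(*per_node.values())
    report = {}
    for name in sorted(all_names):
        seen = {node: frozenset(m.get(name, ())) for node, m in per_node.items()}
        if len(set(seen.values())) > 1:
            report[name] = {node: sorted(addrs) for node, addrs in seen.items()}
    return report


# WAMDM1 content is taken from the post; WAMDM2 content is an assumed example.
hosts_by_node = {
    "WAMDM1": "127.0.0.1 WAMDM1 localhost.localdomain localhost\n"
              "10.37.17.248 WAMDM1.ruc.edu.cn WAMDM1\n",
    "WAMDM2": "127.0.0.1 localhost\n"
              "10.37.17.248 WAMDM1.ruc.edu.cn WAMDM1\n",
}
report = inconsistent_names(hosts_by_node)
for name, by_node in report.items():
    print(name, by_node)
```

Comparing address sets rather than raw file text keeps the check insensitive to comments, ordering, and whitespace, so only real differences in name resolution are reported.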

This article may be freely reposted; when reposting, please keep the full text and credit the source:
Reposted from 仲子说 [ http://www.wangzhongyuan.com/ ]