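A Strange VIP Failure Case
OS: HP-UX 11.31  DB: Oracle 10.0.2.5 RAC (2 nodes)
Problem description: one to three minutes after the services start on node 2, a VIP failure triggers a service failover, and node 2's VIP is then brought up on node 1. The main error in the log is: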
Invalid parameters, or failed to bring up VIP (host=essrzc2)
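On inspection, the system gateway configuration was correct, the gateway responded at normal speed, and the system log showed nothing abnormal. Enabling debug for the service produced the following log: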
2013-12-16 14:57:08.192: [ RACG][1] [21721][1][ora.essrzc2.vip]: Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] Checking interface existance
Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] Calling getifbyip
Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] getifbyip: started for 132.42.37.144
2013-12-16 14:57:08.192: [ RACG][1] [21721][1][ora.essrzc2.vip]: Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] Completed getifbyip lan900:801
Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] Completed with initial interface test
Mon Dec 16 14:57:03 EAT 2013 [ 21730 ] Broadcast = 132.42.37.255
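The log shows that the trouble starts at "checkIf: RX packets checked if=lan900 failed", which then triggers the resource failover. The check is run by the "racgvip" command, which fails the VIP over if the check has still not passed after two "Completed second gateway test" rounds. The fix is therefore to increase the number of test attempts by changing a parameter in racgvip.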
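The effect of the attempt limit can be illustrated with a generic retry loop. This is only a sketch of the idea, not the actual racgvip code: the function name `check_with_retries` and the way the probe command is passed in are assumptions made for illustration, and only the variable name `CHECK_TIMES` comes from racgvip.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real racgvip logic.
# Run a probe command up to CHECK_TIMES times. Succeed as soon as one
# attempt passes; report failure only if every attempt fails.
CHECK_TIMES=${CHECK_TIMES:-2}

check_with_retries() {
    attempt=1
    while [ "$attempt" -le "$CHECK_TIMES" ]; do
        # "$@" is the probe command supplied by the caller (an assumption).
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

With a probe that only passes on its third try, a limit of 2 reports failure while a limit of 10 succeeds, which is why a larger limit can ride out a short-lived check failure.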
Open $CRS_HOME/bin/racgvip and change CHECK_TIMES=2 to CHECK_TIMES=10. After this change the failure did not recur.
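For anyone who prefers to script the change rather than edit by hand, here is a minimal sketch. The helper name `set_check_times` is hypothetical, and it assumes the setting appears in the script as a plain `CHECK_TIMES=<number>` assignment on its own line. It keeps a `.bak` copy and does not rely on `sed -i`, which may not be available in the stock HP-UX sed.

```shell
#!/bin/sh
# Hypothetical helper: set CHECK_TIMES in a racgvip-style shell script.
# Usage: set_check_times FILE NEW_VALUE
set_check_times() {
    file=$1
    new=$2
    # Keep a backup of the original next to it.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the assignment into a temporary file (no in-place sed).
    sed "s/^\([[:space:]]*\)CHECK_TIMES=[0-9][0-9]*/\1CHECK_TIMES=$new/" \
        "$file.bak" > "$file.tmp" || return 1
    # Overwrite the contents so the original file's mode and owner are kept.
    cat "$file.tmp" > "$file" && rm -f "$file.tmp"
}

# Example call (run as the owner of the CRS home, on each node):
# set_check_times "$CRS_HOME/bin/racgvip" 10
```

Comparing the result against the backup with `diff "$CRS_HOME/bin/racgvip" "$CRS_HOME/bin/racgvip.bak"` should show only the CHECK_TIMES line changed.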