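Our P55A had been running fine, then suddenly threw a large number of errors and went down. We replaced the system board and both CPUs, and it booted normally. After running idle for a day, the host started reporting errors again, with the message Service Processor Firmware Predictive Error. The symptom: on the HMC the host and the LPARs show as running normally, but the LPAR console window cannot be opened. Only a reboot restores normal operation, and before long it gets into the same state again. After we upgraded the firmware to 240_417 the behavior changed: the LPAR would not start, and in ASMI I found that all 12 GB of memory attached to one of the CPUs had been deconfigured (two CPUs, 4 GB on one and 12 GB on the other, with 1 GB and 2 GB DIMMs mixed). So I swapped the positions of the two CPUs and rebooted, and the fault magically disappeared with no further alerts. I watched it for 5 days, until this morning, when the problem came back: the HMC shows the host and the LPARs running normally, the LPAR window cannot be opened, and only a reboot fixes it. The fault codes are as follows: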
Platform Event Log - 503B67D9 |
Created at : | 04/16/2014 07:08:53 |
Driver Name : | fips240/b0118b_1214.240 |
Subsystem : | Service Processor Firmware |
Event Severity : | Predictive Error |
Action Flags : | Report to Operating System |
| Service Action Required |
| HMC Call Home |
Action Status : | Processed |
Primary System Reference Code |
Reference Code : | B181F142 |
Hex Words 2 - 5 : | 030020F0 28D90810 C13920FF 400000FF |
Hex Words 6 - 9 : | 00000000 00000011 00810421 88786F01 |
Log Hex Dump (omitted) |
|
Platform Event Log - 503B67BE |
Created at : | 04/16/2014 07:08:27 |
Driver Name : | fips240/b0118b_1214.240 |
Subsystem : | Service Processor Firmware |
Event Severity : | Predictive Error |
Action Flags : | Report to Operating System |
| Service Action Required |
| HMC Call Home |
Action Status : | Processed |
Primary System Reference Code |
Reference Code : | B181F22C |
Hex Words 2 - 5 : | 030020F0 28D90510 C13920FF 400000FF |
Hex Words 6 - 9 : | 00000000 000000D1 00800000 00000000 |
Maintenance Procedure Required |
Priority : | Mandatory, replace all with this type as a unit |
Procedure Number : | FSPSP04 |
Log Hex Dump (omitted) |
|
Error/Event Logs
Platform Event Log - 503B6753 |
Created at : | 04/16/2014 07:06:46 |
Driver Name : | fips240/b0118b_1214.240 |
Subsystem : | Service Processor Firmware |
Event Severity : | Predictive Error |
Action Flags : | Report to Operating System |
| Service Action Required |
| HMC Call Home |
Action Status : | Processed |
Primary System Reference Code |
Reference Code : | B181F141 |
Hex Words 2 - 5 : | 030020F0 28D90810 C13920FF 400000FF |
Hex Words 6 - 9 : | 00000000 0000000F 00810421 88A86F01 |
Log Hex Dump (omitted) |
|
Platform Event Log - 503B6788 |
Created at : | 04/16/2014 07:07:37 |
Driver Name : | fips240/b0118b_1214.240 |
Subsystem : | Service Processor Firmware |
Event Severity : | Predictive Error |
Action Flags : | Report to Operating System |
| Service Action Required |
| HMC Call Home |
Action Status : | Processed |
Primary System Reference Code |
Reference Code : | B181F142 |
Hex Words 2 - 5 : | 030020F0 28D90810 C13920FF 400000FF |
Hex Words 6 - 9 : | 00000000 00000011 00810421 88A86F01 |
Log Hex Dump (omitted) |
|
Platform Event Log - 503B66CB |
Created at : | 04/16/2014 07:04:37 |
Driver Name : | fips240/b0118b_1214.240 |
Subsystem : | Service Processor Firmware |
Event Severity : | Predictive Error, Correctable |
Action Flags : | Report to Operating System |
| Service Action Required |
| HMC Call Home |
Action Status : | Processed |
Primary System Reference Code |
Reference Code : | B1819509 |
Hex Words 2 - 5 : | 030020F0 28D90C10 C13920FF 400000FF |
Hex Words 6 - 9 : | FFFFFFD8 D25A0000 00000014 0000000B |
Maintenance Procedure Required |
Priority : | Mandatory, replace all with this type as a unit |
Procedure Number : | FSPSP04 |
Maintenance Procedure Required |
Priority : | Medium Priority |
Procedure Number : | FSPSP06 |
Log Hex Dump (omitted) |
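Note that the entries above are not pasted in time order, which makes the sequence of events harder to follow. As a rough aid, here is a minimal Python sketch that pulls the log ID, timestamp, and reference code out of text in this layout and sorts the entries chronologically. The function name `parse_events` is hypothetical, the regular expressions only assume the field labels seen in this post, and the timestamp is assumed to be MM/DD/YYYY; it is not an IBM tool and it does not interpret the codes.

```python
import re
from datetime import datetime

# Patterns assume the field labels exactly as they appear in the pasted logs.
LOG_ID = re.compile(r"Platform Event Log - (\w+)")
CREATED = re.compile(r"Created at\s*:\s*\|?\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})")
SRC = re.compile(r"Reference Code\s*:\s*\|?\s*(\w+)")


def parse_events(text):
    """Return one dict per 'Platform Event Log' entry, oldest first."""
    events = []
    current = None
    for line in text.splitlines():
        m = LOG_ID.search(line)
        if m:
            # A new entry starts at every 'Platform Event Log - <id>' line.
            current = {"id": m.group(1), "created": None, "src": None}
            events.append(current)
            continue
        if current is None:
            continue
        m = CREATED.search(line)
        if m:
            # Assumption: the date is month/day/year.
            current["created"] = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
        m = SRC.search(line)
        if m:
            current["src"] = m.group(1)
    # Entries without a timestamp sort first instead of raising an error.
    return sorted(events, key=lambda e: e["created"] or datetime.min)


if __name__ == "__main__":
    sample = """
    Platform Event Log - 503B67D9 |
    Created at : | 04/16/2014 07:08:53 |
    Primary System Reference Code |
    Reference Code : | B181F142 |
    Platform Event Log - 503B66CB |
    Created at : | 04/16/2014 07:04:37 |
    Primary System Reference Code |
    Reference Code : | B1819509 |
    """
    for event in parse_events(sample):
        print(event["created"].time(), event["id"], event["src"])
```

Run against the full paste, this lists the five entries in the order they were created, which may make it easier to see which reference code was logged first.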
The hardware vendor says this is caused by a VRM failure. I'd like to ask everyone here: what is actually causing this? Many thanks.