
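After an unexpected AIX reboot, errpt keeps logging errors. Could someone check whether a memory DIMM has failed, and how do I work out which one?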


The server rebooted unexpectedly in the early hours of this morning. The conslog output:

alog -f /var/adm/ras/conslog -o 

         0 Sat May 23 01:50:37 GMT+08:00 2020 *
         0 Sat May 23 01:50:37 GMT+08:00 2020 Starting Desktop Login on display :0...
         0 Sat May 23 01:50:37 GMT+08:00 2020 * Wait for the Desktop Login screen before logging in.

             0 Sat May 23 01:50:37 GMT+08:00 2020
             0 Sat May 23 01:50:48 GMT+08:00 2020 
             0 Sat May 23 01:50:48 GMT+08:00 2020 Saving Base Customize Data to boot disk
             0 Sat May 23 01:50:49 GMT+08:00 2020 Starting the sync daemon
             0 Sat May 23 01:50:49 GMT+08:00 2020 Mounting the platform dump file system, /var/adm/ras/platform
             0 Sat May 23 01:50:49 GMT+08:00 2020 Starting the error daemon
             0 Sat May 23 01:50:53 GMT+08:00 2020 
             0 Sat May 23 01:50:53 GMT+08:00 2020 System initialization completed.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Sat May 23 01:50:53 GMT+08:00 2020
             0 Sat May 23 01:50:53 GMT+08:00 2020 Automatic Error Log Analysis for sysplanar0 has detected a problem.
    The Service Request Number is 
      B123E504: Memory subsystem including external cache Predictive Error,
                general. Refer to the system service documentation for more
                information.
               Additional Words: 2-030000F0 3-2BFC0110 4-C13920FF 5-400000FF
                                 6-81032E40 7-00000303 8-0FFF0024 9-A9008270.
             0 Sat May 23 01:50:53 GMT+08:00 2020 
             0 Sat May 23 01:50:53 GMT+08:00 2020 in sinpolhndlr OFF 
             0 Sat May 23 01:50:53 GMT+08:00 2020 TE=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 CHKEXEC=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 CHKSHLIB=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 CHKSCRIPT=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 CHKKERNEXT=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 STOP_UNTRUSTD=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 STOP_ON_CHKFAIL=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 LOCK_KERN_POLICIES=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 TSD_FILES_LOCK=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 TSD_LOCK=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 TEP=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 TLP=OFF
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel Authorization Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel Role Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel Command Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel Device Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel Object Domain Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 Successfully updated the Kernel  Domains Table.
             0 Sat May 23 01:50:53 GMT+08:00 2020 OPERATIONAL MODE Security Flags
             0 Sat May 23 01:50:53 GMT+08:00 2020 ROOT                      :   DISABLED
             0 Sat May 23 01:50:53 GMT+08:00 2020 TRACEAUTH                 :   DISABLED
             0 Sat May 23 01:50:53 GMT+08:00 2020 System runtime mode is now OPERATIONAL MODE.
             0 Sat May 23 01:50:54 GMT+08:00 2020 Setting tunable parameters...         0 Sat May 23 01:50:55 GMT+08:00 2020 complete
             0 Sat May 23 01:50:55 GMT+08:00 2020 Starting Multi-user Initialization
             0 Sat May 23 01:50:55 GMT+08:00 2020  Performing auto-varyon of Volume Groups 
             0 Sat May 23 01:50:56 GMT+08:00 2020  Activating all paging spaces 
             0 Sat May 23 01:50:57 GMT+08:00 2020 0517-075 swapon: Paging device /dev/hd6 is already active.
             0 Sat May 23 01:50:59 GMT+08:00 2020 
             0 Sat May 23 01:50:59 GMT+08:00 2020 The current volume is: /dev/hd1
             0 Sat May 23 01:50:59 GMT+08:00 2020 Primary superblock is valid.
             0 Sat May 23 01:50:59 GMT+08:00 2020 
             0 Sat May 23 01:50:59 GMT+08:00 2020 The current volume is: /dev/hd10opt
             0 Sat May 23 01:50:59 GMT+08:00 2020 Primary superblock is valid.
             0 Sat May 23 01:50:59 GMT+08:00 2020  Performing all automatic mounts 
             0 Sat May 23 01:50:59 GMT+08:00 2020 Replaying log for /dev/oracle_lv.
             0 Sat May 23 01:52:05 GMT+08:00 2020 Multi-user initialization completed
             0 Sat May 23 01:52:05 GMT+08:00 2020 Checking for srcmstr active...         0 Sat May 23 01:52:06 GMT+08:00 2020 success
             0 Sat May 23 01:52:06 GMT+08:00 2020 complete
             0 Sat May 23 01:52:06 GMT+08:00 2020 Starting tcpip daemons:
             0 Sat May 23 01:52:07 GMT+08:00 2020 success
             0 Sat May 23 01:52:07 GMT+08:00 2020 success
             0 Sat May 23 01:52:11 GMT+08:00 2020 0513-059 The syslogd Subsystem has been started. Subsystem PID is 4980904.
             0 Sat May 23 01:52:11 GMT+08:00 2020 0513-059 The sendmail Subsystem has been started. Subsystem PID is 5964030.
             0 Sat May 23 01:52:11 GMT+08:00 2020 0513-059 The portmap Subsystem has been started. Subsystem PID is 6750392.
             0 Sat May 23 01:52:11 GMT+08:00 2020 0513-059 The inetd Subsystem has been started. Subsystem PID is 6881532.
             0 Sat May 23 01:52:11 GMT+08:00 2020 0513-029 The snmpd Subsystem is already active.
    Multiple instances are not supported.
             0 Sat May 23 01:52:12 GMT+08:00 2020 0513-059 The aixmibd Subsystem has been started. Subsystem PID is 5767168.
             0 Sat May 23 01:52:12 GMT+08:00 2020 0513-059 The snmpmibd Subsystem has been started. Subsystem PID is 6029326.
             0 Sat May 23 01:52:12 GMT+08:00 2020 0513-059 The hostmibd Subsystem has been started. Subsystem PID is 7274730.
             0 Sat May 23 01:52:12 GMT+08:00 2020 Finished starting tcpip daemons.
             0 Sat May 23 01:52:12 GMT+08:00 2020 nsmb0 Available
             0 Sat May 23 01:52:12 GMT+08:00 2020 Starting NFS services:
             0 Sat May 23 01:52:12 GMT+08:00 2020 0513-059 The biod Subsystem has been started. Subsystem PID is 6291654.
             0 Sat May 23 01:52:14 GMT+08:00 2020 0513-059 The rpc.statd Subsystem has been started. Subsystem PID is 8126468.
             0 Sat May 23 01:52:14 GMT+08:00 2020 0513-059 The rpc.lockd Subsystem has been started. Subsystem PID is 7471342.
             0 Sat May 23 01:52:14 GMT+08:00 2020 Completed NFS services.
             0 Sat May 23 01:52:19 GMT+08:00 2020 success
             0 Sat May 23 01:52:20 GMT+08:00 2020 success
             0 Sat May 23 01:52:33 GMT+08:00 2020 0513-059 The ctrmc Subsystem has been started. Subsystem PID is 8388648.
             0 Sat May 23 01:52:50 GMT+08:00 2020 
             0 Sat May 23 01:52:50 GMT+08:00 2020 Sat May 23 01:52:50 GMT+08:00 2020
             0 Sat May 23 01:52:50 GMT+08:00 2020 Automatic Error Log Analysis for sysplanar0 has detected a problem.
    The Service Request Number is 
      B123E504: Memory subsystem including external cache Predictive Error,
                general. Refer to the system service documentation for more
                information.
               Additional Words: 2-030000F0 3-2BFC0110 4-C13920FF 5-400000FF
                                 6-81032E40 7-00000303 8-0FFF0024 9-A9008270.
             0 Sat May 23 01:52:50 GMT+08:00 2020 
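
To see at a glance which Service Request Numbers the console log reported and how often, the repeated analysis blocks above can be summarized with a small filter. This is only a sketch: the function name `conslog_srns` is made up for illustration, and it assumes the SRN always appears as the first word on the line after "The Service Request Number is", as it does in this log.

```shell
#!/bin/sh
# conslog_srns (hypothetical helper): read console-log text on stdin and
# print each Service Request Number with the number of times it was reported.
# Assumption: the SRN is the first word on the line that follows
# "The Service Request Number is", terminated by a colon.
conslog_srns() {
    awk '/The Service Request Number is/ { want = 1; next }
         want { code = $1; sub(/:.*/, "", code); n[code]++; want = 0 }
         END  { for (c in n) print c, n[c] }' | sort
}

# Usage on the affected host:
#   alog -f /var/adm/ras/conslog -o | conslog_srns
```

For the log above this would report B123E504 twice, once for each analysis block.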

errpt shows new error entries being logged every minute:

errpt

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
51E537B5   0523134820 P H sysplanar0     platform_dump saved to file
291D64C3   0523134820 I H sysplanar0     Platform dump data
BFE4C025   0523134820 P H sysplanar0     UNDETERMINED ERROR
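
The TIMESTAMP column in the errpt summary is in MMDDhhmmYY form, so 0523134820 is 13:48 on 23 May 2020. When comparing many entries it can help to convert that into a sortable date. The helper below is a minimal sketch: `errpt_ts` is a made-up name, and it assumes every year falls in 20xx.

```shell
#!/bin/sh
# errpt_ts (hypothetical helper): convert an errpt summary timestamp
# (MMDDhhmmYY) into "YYYY-MM-DD hh:mm".
# Assumption: the two-digit year belongs to the 2000s.
errpt_ts() {
    echo "$1" | sed 's/^\(..\)\(..\)\(..\)\(..\)\(..\)$/20\5-\1-\2 \3:\4/'
}

# Usage:
#   errpt_ts 0523134820
```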

Details of error 51E537B5:

errpt -aj 51E537B5      

LABEL:          PLAT_DUMP_COMPLETE
IDENTIFIER:     51E537B5
Date/Time:       Sat May 23 13:48:34 GMT+08:00 2020
Sequence Number: 20925
Machine Id:      00F6945B4C00
Node Id:         szzd_db1
Class:           H
Type:            PERM
WPAR:            Global
Resource Name:   sysplanar0      
Resource Class:  
Resource Type:   
Location:        

Description
platform_dump saved to file

Detail Data
platform_dump indicator event
...... 

Diagnostic Analysis
Diagnostic Log sequence number: 13401
Resource tested:        sysplanar0
Menu Number:            651303
Description:

The following informational event was reported by Platform Firmware.

Platform Firmware Dump Notification.

Details of error BFE4C025:

errpt -aj BFE4C025

LABEL:          SCAN_ERROR_CHRP
IDENTIFIER:     BFE4C025

Date/Time:       Sat May 23 13:48:33 GMT+08:00 2020
Sequence Number: 20923
Machine Id:      00F6945B4C00
Node Id:         szzd_db1
Class:           H
Type:            PERM
WPAR:            Global
Resource Name:   sysplanar0      
Resource Class:  
Resource Type:   
Location:        

Description
UNDETERMINED ERROR

Failure Causes
UNDETERMINED

        Recommended Actions
        RUN SYSTEM DIAGNOSTICS.

Detail Data
PROBLEM DATA
......

Diagnostic Analysis
Diagnostic Log sequence number: 13399
Resource tested:        sysplanar0
Resource Description:   System Planar
Location:               
SRC:                    B123E504
Description:            Memory subsystem including external cache Predictive
                        Error, general. Refer to the system service
                        documentation for more information.
Additional Words:       2-030000F0 3-2BFC0110 4-C13920FF 5-400000FF
                        6-81032E40 7-00000303 8-09A00029 9-A9008070
Possible FRUs:
    Priority: H FRU: 77P8784  S/N: n/a          CCIN: 31C5 
    Location: U78AA.001.WZSGD13-P1-C17-C7
    Priority: H FRU: 77P8784  S/N: n/a          CCIN: 31C5 
    Location: U78AA.001.WZSGD13-P1-C17-C9
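
The "Possible FRUs" locations are what point at a specific DIMM, so it is handy to pull them out of the detailed report and split each code into its parts. The sketch below uses two made-up helper names, `fru_locations` and `split_location`. Reading P1 as the system planar, C17 as the memory card and the last C-number as the DIMM slot on that card is an assumption that should be checked against the service guide for this machine type.

```shell
#!/bin/sh
# fru_locations (hypothetical helper): print each distinct FRU location code
# found in `errpt -a` style text on stdin. Empty "Location:" fields are skipped.
fru_locations() {
    awk '$1 == "Location:" && $2 ~ /^U/ { print $2 }' | sort -u
}

# split_location (hypothetical helper): split a four-part location code on "-".
# Assumption: the parts are unit, planar, memory card, DIMM slot.
split_location() {
    echo "$1" | awk -F- '{ printf "unit=%s planar=%s card=%s dimm=%s\n", $1, $2, $3, $4 }'
}

# Usage on the affected host:
#   errpt -aj BFE4C025 | fru_locations
#   split_location U78AA.001.WZSGD13-P1-C17-C7
```

Under that reading, both suspect FRUs in the report sit on the same card (C17), in slots C7 and C9.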


Answer from lipeng9239 (System Operations Engineer, 北京智控美信):

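It is normal for the reported hardware location of the memory error to change while you are doing swap tests. Depending on whether it has one or two processors, an 8202 machine can be configured with one to four memory cards, each with 8 DIMM slots. The number of memory cards and the size and quantity of the DIMMs determine which slots must be populated, and there are specific plugging rules for each combination. Look up the memory plugging rules on the IBM website for your actual configuration. Once you understand them, the behavior you are seeing will make sense.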
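
When DIMMs are being swapped between slots, one way to see whether the fault follows a DIMM or stays with a slot is to tally how often each location is named across the whole error log, before and after each swap. This is a rough sketch, not an official procedure: `fru_location_counts` is a made-up name, and it assumes FRU locations appear on lines of the form "Location: U...", as in the report above.

```shell
#!/bin/sh
# fru_location_counts (hypothetical helper): count how many times each FRU
# location code appears in `errpt -a` style text on stdin, most frequent first.
fru_location_counts() {
    awk '$1 == "Location:" && $2 ~ /^U/ { n[$2]++ }
         END { for (loc in n) print n[loc], loc }' | sort -k1,1nr -k2,2
}

# Usage on the affected host:
#   errpt -a | fru_location_counts
```

Comparing the tallies from before and after a swap shows whether the errors moved with the DIMM or stayed at the original slot.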

Internet services · 2020-06-30