Solaris system


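Problem description:

    There are two Sun machines using shared storage with three LUNs: quorum (1G), vol1 (700G) and vol2 (900G). They were originally configured as a cluster but later ended up running as standalone machines. At present the 1G and 900G LUNs are mounted on machine A and the 700G LUN is mounted on machine B. Machine A can no longer boot normally; below is the boot output from machine A, captured over the serial port.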


There is also this:

ERROR: Power On Self Test Failed. Cause:

CPU0 Bank 1 Dimm 0, J3100 side 1

ERROR: POST failed.

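If the memory is faulty, does that prevent the machine from booting into the OS normally?


In addition:


    I would like to copy the data off machine A's storage. Can machine A's storage be mounted on machine B, and will it be recognized properly? Yesterday I shut down machine A and tried to mount A's storage on machine B, but the mount failed with a message that the disk was in use.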



Hardware Power On


Probing core system FRUs.. Done

Executing POST w/%o0 = 0000.0800.0101.4041

0:0>

0:0>@(#) Sun Fire[TM] V880/V890 POST4.18.11 2006/05/03 07:51

      /export/delivery/delivery/4.18/4.18.11/post4.18.0/Camelot/daktari/integrated  (root)

0:0>Copyright 2006 Sun Microsystems, Inc. All rights reserved

  SUN PROPRIETARY/CONFIDENTIAL.

  Use is subject to license terms.

0:0>Jump from OBP->POST.

0:0>Diag level set to MAX.

0:0>Verbosity level set to NORMAL.

0:0>

0:0>Start selftest...

0:0>CPUs present in system: 0:0 1:0 2:0 3:0

0:0>Test CPU(s)....Done

0:0>Init Scan/I2C....Done

0:0>Basic Memory Test....Done

0:0>Full CPU Test....Done

0:0>Memory Block....Done

0:0>IO-Bridge Tests... Done

0:0>Full Memory Test..../

0:0>WARNING: TEST = Block Memory

0:0>H/W under test = CPU0, All CPU0 Memory

0:0>MSG = Data or Instruction Access Error,

       Trap Type      00000000.00000063

       Trap PC        ffffffff.f0124e30

       Trap Level     00000000.00000001

       AFSR           00100002.00000016

       AFAR           00000001.1def0030

0:0>END_WARNING


|

0:0>ERROR: TEST = Block Memory

0:0>H/W under test = CPU0 Bank 1 Dimm 0, J3100 side 1

0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.

0:0>MSG = DIMM failure Bank 1 DIMM 0 Pin 148

0:0>END_ERROR


Done

0:0>Enable Errors....Done

0:0>ERROR:

0:0>   POST toplevel status has the following failures:

0:0>            CPU0 Memory Bank 1

0:0>   POST failed the following devices on CPU 0:

0:0>            Mem Bank1 DIMM0

0:0>END_ERROR


0:0>POST:       Return to OBP.

POST Reset


Enabling system bus....... Done

Probing Memory............ Done

Initializing CPUs......... Done

Initializing boot memory.. Done

Initializing OpenBoot

Probing system devices

ChassisSerialNumber 0821AM0106

Probing I/O buses

screen not found.

keyboard not found.

Keyboard not present.  Using ttya for input and output.

Probing system devices

ChassisSerialNumber 0821AM0106

Probing I/O buses



Sun Fire V890, No Keyboard

Copyright 2005 Sun Microsystems, Inc.  All rights reserved.

OpenBoot 4.18.11, 14336 MB memory installed, Serial #79165586.

Ethernet address 0:14:4f:b7:f8:92, Host ID: 84b7f892.


Running diagnostic script obdiag/normal


Testing /pci@8,600000/network@1

Testing /pci@8,600000/SUNW,qlc@2

Testing /pci@9,700000/ebus@1/i2c@1,2e

Testing /pci@9,700000/ebus@1/i2c@1,30

Testing /pci@9,700000/ebus@1/i2c@1,50002e

Testing /pci@9,700000/ebus@1/i2c@1,500030

Testing /pci@9,700000/ebus@1/bbc@1,0

Testing /pci@9,700000/ebus@1/bbc@1,500000

Testing /pci@8,700000/ide@1

Testing /pci@9,700000/network@1,1

Testing /pci@9,700000/usb@1,3

Testing /pci@9,700000/ebus@1/gpio@1,300600

Testing /pci@9,700000/ebus@1/pmc@1,300700

Testing /pci@9,700000/ebus@1/rtc@1,300070

                                                                     

ERROR: Power On Self Test Failed. Cause:

CPU0 Bank 1 Dimm 0, J3100 side 1

ERROR: POST failed.


Resetting ...


Enabling system bus....... Done

Initializing CPUs......... Done

Initializing boot memory.. Done

Initializing OpenBoot

Probing system devices

ChassisSerialNumber 0821AM0106

Probing I/O buses

screen not found.

keyboard not found.

Keyboard not present.  Using ttya for input and output.

Probing system devices

ChassisSerialNumber 0821AM0106

Probing I/O buses



Sun Fire V890, No Keyboard

Copyright 2005 Sun Microsystems, Inc.  All rights reserved.

OpenBoot 4.18.11, 14336 MB memory installed, Serial #79165586.

Ethernet address 0:14:4f:b7:f8:92, Host ID: 84b7f892.

                                                                 

ERROR: Power On Self Test Failed. Cause:

CPU0 Bank 1 Dimm 0, J3100 side 1

ERROR: POST failed.

Rebooting with command: boot

Boot device: /pci@8,600000/SUNW,qlc@2/fp@0,0/disk@w2100001d385b7029,0:a  File and args:

SunOS Release 5.10 Version Generic_139555-08 64-bit

Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.

Use is subject to license terms.

Failed to configure IPv4 interface(s): e1000g1

Hostname: BB-MASTER

WARNING: /scsi_vhci/ssd@g600a0b800026dc88000005dd4654f576 (ssd17):

       reservation conflict


WARNING: /scsi_vhci/ssd@g600a0b800026dd0a000004404654f4e9 (ssd18):

       reservation conflict


WARNING: /scsi_vhci/ssd@g600a0b800026dc88000005d34654f354 (ssd19):

       reservation conflict


Booting as part of a cluster

NOTICE: CMM: Node BB-MASTER (nodeid = 1) with votecount = 1 added.

NOTICE: CMM: Node BB-STANDBY (nodeid = 2) with votecount = 1 added.

NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d8s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.

NOTICE: clcomm: Adapter e1000g2 constructed

NOTICE: clcomm: Adapter e1000g0 constructed

NOTICE: CMM: Node BB-MASTER: attempting to join cluster.

NOTICE: clcomm: Path BB-MASTER:e1000g2 - BB-STANDBY:e1000g2 errors during initiation

NOTICE: clcomm: Path BB-MASTER:e1000g0 - BB-STANDBY:e1000g0 errors during initiation

WARNING: Path BB-MASTER:e1000g2 - BB-STANDBY:e1000g2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.

WARNING: Path BB-MASTER:e1000g0 - BB-STANDBY:e1000g0 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.

WARNING: CMM: Reading reservation keys from quorum device /dev/did/rdsk/d8s2 failed.

NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
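
A capture this long is easier to read once the distinct problems are pulled out. The following is a minimal sketch, assuming the console output has been saved as plain text; the function name `triage` and the three result keys are invented for this illustration, and the patterns only cover the message shapes that appear in the log above.

```python
import re


def triage(console_text: str) -> dict:
    """Collect the distinct problems reported in a captured boot log."""
    # Device named on the first non-blank line after the POST failure banner.
    post = re.findall(r"Self Test Failed\. Cause:\s*(\S.*)", console_text)
    # ssd instances whose warning is followed by "reservation conflict".
    conflicts = re.findall(
        r"WARNING:\s*\S+ \((ssd\d+)\):\s*reservation conflict", console_text
    )
    # Interconnect paths that failed to initialise, as (local, remote) pairs.
    paths = re.findall(
        r"clcomm: Path (\S+)\s*-\s*(\S+) errors during initiation", console_text
    )
    return {
        "post_failures": sorted({p.strip() for p in post}),
        "reservation_conflicts": sorted(set(conflicts)),
        "path_errors": sorted({f"{a} - {b}" for a, b in paths}),
    }
```

Run against the capture above, this should report one failed DIMM location, three ssd devices with reservation conflicts, and two interconnect paths that failed to come up.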

3 peer answers

cwnlinux, Systems Engineer, CCSU
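The memory error does not stop the host from running; the log shows the machine going on to boot SunOS after the POST failure.

NOTICE: clcomm: Path BB-MASTER:e1000g2 - BB-STANDBY:e1000g2 errors during initiation
NOTICE: clcomm: Path BB-MASTER:e1000g0 - BB-STANDBY:e1000g0 errors during initiation

Check the cluster heartbeat cables, on both machine A and machine B. In Sun Cluster the heartbeat links usually connect the two machines directly.

WARNING: CMM: Reading reservation keys from quorum device /dev/did/rdsk/d8s2 failed.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

The quorum disk also has a problem and cannot be accessed. Three votes are configured: 1 for machine A, 1 for machine B and 1 for the quorum device. After booting, machine A holds only its own vote, and without the quorum disk it cannot reach a majority, so the cluster cannot start.
You can boot with boot -x into non-cluster mode and check whether the heartbeat NICs and the quorum disk are working properly.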
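
The vote counting behind the "waiting for quorum" message can be sketched as follows. This only illustrates the majority rule, using the vote counts printed in the boot log (one vote per node plus one for the quorum device); the function name `has_quorum` is made up for this example and is not part of any Sun Cluster tool.

```python
def has_quorum(votes_present: int, votes_configured: int) -> bool:
    """Majority rule: strictly more than half of all configured votes."""
    return 2 * votes_present > votes_configured


# Vote counts taken from the CMM lines in the boot log.
NODE_A, NODE_B, QUORUM_DEVICE = 1, 1, 1
configured = NODE_A + NODE_B + QUORUM_DEVICE

# Machine A alone, with the quorum device unreadable and B unreachable.
a_alone = has_quorum(NODE_A, configured)

# Machine A plus the quorum device, if the device could be read.
a_with_device = has_quorum(NODE_A + QUORUM_DEVICE, configured)
```

Here `a_alone` is False and `a_with_device` is True, which is why the node keeps waiting until it can reach either the quorum disk or the other node.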

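With machine A shut down, machine B should be able to mount the storage, because in the original two-node setup the storage LUNs should have been mapped to both machines.
The mount most likely fails now because the volume group has not been switched over. You can manually switch the storage that belongs to A over to B, then mount it and restore service.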
Internet services · 2015-01-12
blue_diamond, System Architect, Manufacturing
The memory is reporting errors as well.
Machinery and equipment · 2015-01-09
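热心冰块, Project Manager, Inspur (浪潮)
The NICs are reporting errors and the disks are reporting errors. Run fsck and take a look, and check whether the network cables have been moved.
Systems integration · 2015-01-09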

Asker

gzhsjz
Systems Engineer, 北京正群欣世信息技术有限公司
Areas of expertise: servers, cloud computing, storage

Question status

  • Published: 2015-01-09
  • Latest answer: 2015-01-12