[Experience Sharing] About SCSI Locks

Revised 2009-08-04: DS4000 on AIX uses RDAC, and the default lock is SCSI-2.

Revised 2009-09-18: on DS4000, hlmTestLunShow cannot be used to view SCSI-3 locks.

[url=#_Toc197850095]1.     What is a SCSI lock?[/url]

[url=#_Toc197850096]2.     Types of SCSI lock[/url]

[url=#_Toc197850097]3.     When is a device locked?[/url]

[url=#_Toc197850098]4.     Handling SCSI-3[/url]

[url=#_Toc197850099]1)     Check the sense data of SC_DISK_ERR* or FSCSI_ERR*[/url]

[url=#_Toc197850100]2)     Handling persistent reservations on ESS/DS8000/DS6000[/url]

[url=#_Toc197850101]3)     Handling persistent reservations on DS4000[/url]

[url=#_Toc197850102]5.     Handling SCSI-2[/url]

[url=#_Toc197850103]1)     DS4000: check with hlmTestLunShow SSID[/url]

[url=#_Toc197850104]2)     DS6800: check with catreef "fb/volstatus lss"[/url]

[url=#_Toc197850105]3)     DS8000: check with cat "/dev/cpss0/fb/volstatus lss"[/url]

[url=#_Toc197850106]6.     reserve_policy[/url]

[url=#_Toc197850107]7.     Lock-related commands on AIX[/url]

[url=#_Toc197850108]1)     varyonvg/varyoffvg[/url]

[url=#_Toc197850109]2)     HACMP-related commands[/url]

[url=#_Toc197850110]3)     lquerypr/pcmquerypr/pcmgenprkey[/url]

[url=#_Toc197850111]8.     A situation to watch out for[/url]

[url=#_Toc197850112]Appendix:[/url]

[url=#_Toc197850113]1.     Description of the varyonvg -b flag[/url]

[url=#_Toc197850114]2.     Using the SC_FORCED_OPEN Option[/url]

[url=#_Toc197850115]3.     Responsibilities of the SCSI Device Driver[/url]

[url=#_Toc197850116]4.     Responsibilities of the Device Driver[/url]

[url=#_Toc197850117]5.     References[/url]

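1.     What is a SCSI lock?

In a shared-storage environment, several hosts may access the same storage device at once. If multiple hosts write to the same LUN at the same moment, the LUN has no way of knowing which write should land first and which second. SCSI locks (reservations) were introduced to prevent the data corruption this can cause. In the diagram below, HostA places a SCSI lock on the LUN while reading and writing it, so HostB cannot access the LUN during that time.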

HostA   HostB
    \   /
     \ /
     Lun

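2.      Types of SCSI lock

Broadly speaking, there are currently two types of SCSI lock: SCSI-2 Reservation and SCSI-3 Reservation. SCSI-3 Reservation is also called Persistent Reservation. The two types cannot coexist on the same LUN.

A SCSI-2 Reservation allows the device to be accessed only by the initiator that issued the reservation, where the initiator is normally an HBA. For example, if fcs0 on HostA places a SCSI-2 lock on a LUN, even fcs1 on the same HostA cannot access that LUN. For this reason SCSI-2 Reservation is sometimes called single-path reservation.

A SCSI-3 Reservation (Persistent Reservation) locks the device using a PR key. A host normally has one unique PR key, and different hosts have different PR keys. SCSI-3 Reservation is therefore generally used in multipath shared environments.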
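The difference between the two lock types can be sketched as a toy model. This is only an illustration of the rules described above, not how any storage array implements them; the class name Lun, its methods, and the initiator labels such as "HostA:fcs0" are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Lun:
    """Toy LUN that holds at most one kind of reservation at a time."""
    scsi2_holder: Optional[str] = None   # initiator (HBA) holding a SCSI-2 reserve
    pr_key: Optional[str] = None         # PR key holding a SCSI-3 reserve
    registrations: Dict[str, str] = field(default_factory=dict)  # initiator -> PR key

    def reserve_scsi2(self, initiator: str) -> None:
        # SCSI-2 and SCSI-3 reservations cannot coexist on one LUN.
        if self.pr_key is not None:
            raise RuntimeError("reservation conflict: SCSI-3 reservation present")
        if self.scsi2_holder not in (None, initiator):
            raise RuntimeError("reservation conflict: held by another initiator")
        self.scsi2_holder = initiator

    def register(self, initiator: str, key: str) -> None:
        # Every path of a host registers with that host's PR key.
        self.registrations[initiator] = key

    def reserve_scsi3(self, initiator: str) -> None:
        key = self.registrations.get(initiator)
        if key is None:
            raise RuntimeError("initiator has not registered a PR key")
        if self.scsi2_holder is not None:
            raise RuntimeError("reservation conflict: SCSI-2 reservation present")
        if self.pr_key not in (None, key):
            raise RuntimeError("reservation conflict: held under another PR key")
        self.pr_key = key

    def can_access(self, initiator: str) -> bool:
        if self.scsi2_holder is not None:
            # SCSI-2: only the exact initiator that reserved may access.
            return initiator == self.scsi2_holder
        if self.pr_key is not None:
            # SCSI-3 (exclusive): any initiator registered with the same key.
            return self.registrations.get(initiator) == self.pr_key
        return True


if __name__ == "__main__":
    lun = Lun()
    lun.reserve_scsi2("HostA:fcs0")
    print(lun.can_access("HostA:fcs1"))  # second HBA of the same host
```

With a SCSI-2 reserve held by "HostA:fcs0", the second HBA of the same host is refused, which is the single-path behaviour described above. With a SCSI-3 reserve, every path registered under the same key gets through.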

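3.      When is a device locked?

Generally a device is locked when it is opened, for example by varyonvg or dd. Note that with a command like dd, the device is locked while the command runs and unlocked automatically when it finishes.

Note: varyonvg -c does not lock the device.

Also, once a VG is varied on, only varyoffvg or varyonvg -b unlocks the devices belonging to that VG. Running shutdown directly does not perform a varyoffvg, so the lock is not released.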

4.      Handling SCSI-3

1)      Check the sense data of SC_DISK_ERR* or FSCSI_ERR* entries.

01 – this indicates the SCSI status field is valid

18 - SCSI device is reserved by another Host

For example:

SENSE DATA

0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000

This is usually seen in an SC_DISK_ERR* or FSCSI_ERR* error (errpt -a output).
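Checking a sense dump for this pattern by hand is error-prone, so here is a minimal sketch that does it. The byte offsets 24 and 25 are an assumption read off the single example above (the "0118" word); they are not taken from any errpt format documentation, so verify them against your own error entries. The function names are invented for the example.

```python
# Assumption: offsets taken from where "01 18" sits in the example dump above.
STATUS_VALID_OFFSET = 24
SCSI_STATUS_OFFSET = 25
RESERVATION_CONFLICT = 0x18  # SCSI status code for a reservation conflict


def parse_sense_words(dump: str) -> bytes:
    """Turn an errpt-style hex dump ("0600 0000 ...") into raw bytes."""
    return bytes.fromhex("".join(dump.split()))


def is_reservation_conflict(dump: str) -> bool:
    """True if the dump says the SCSI status is valid and equals 0x18."""
    data = parse_sense_words(dump)
    if len(data) <= SCSI_STATUS_OFFSET:
        return False
    return (data[STATUS_VALID_OFFSET] == 0x01
            and data[SCSI_STATUS_OFFSET] == RESERVATION_CONFLICT)


if __name__ == "__main__":
    sample = ("0600 0000 0000 0000 0000 0000 0000 0000 "
              "0000 0000 0000 0000 0118 0000 0000 0000")
    print(is_reservation_conflict(sample))
```

Whitespace is stripped before decoding, so a dump where two words have run together still parses the same way.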

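2)     Handling persistent reservations on ESS/DS8000/DS6000

lquerypr -vh /dev/vpathX shows the persistent reservation. If a PR exists, the value returned is the PR key, which can be matched against the output of uname -a.

lquerypr -ch /dev/vpathX clears the persistent reservation. Note: use this command with extreme caution!

In an SDDPCM environment, use pcmquerypr instead.

3)     Handling persistent reservations on DS4000

In SM, use Advanced->Maintenance->Persistent Reservations to view and clear the SCSI-3 reservation on a logical drive.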

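5.      Handling SCSI-2

Sometimes lquerypr shows no lock on the vpath, or the persistent reservation output in SM shows no lock, yet the hdisk/vpath still cannot be accessed. In that case, check for a SCSI-2 reservation.

Note: the commands below apply to SCSI-2 locks only.

1)      DS4000: check with hlmTestLunShow SSID.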

               hlmTestLunShow 8

LunNumber:0x8 LunInfo :0x6642c5c State:0x0

QuiescenceFlag:0x0 Owner:0x1 IsReady :0x1

reserveId:0xe resv3rdId:0xffff

value = 128 = 0x80

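In the output, reserveId indicates the SCSI-2 reservation. A value of 0xffff means there is no lock; 0xe means the LUN is held by the host whose host ID is 0xe.

Ways to release the lock:

✓  hlmTestRelease reserveId,SSID

✓  Move the LUN from one controller to the other

✓  On AIX, use the HACMP utility /usr/sbin/cluster/events/utils/cl_flutereset  /dev/hdiskXX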
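If you capture the hlmTestLunShow output to a file, the reserveId check can be scripted. A small sketch, assuming the output looks like the sample above; the function name is invented for the example.

```python
import re
from typing import Optional

NO_RESERVATION = 0xFFFF  # per the text above, 0xffff means "no lock"


def scsi2_reserve_holder(output: str) -> Optional[int]:
    """Return the host ID holding the SCSI-2 reserve, or None if unlocked."""
    match = re.search(r"reserveId\s*:\s*(0x[0-9a-fA-F]+)", output)
    if match is None:
        raise ValueError("no reserveId field found in output")
    value = int(match.group(1), 16)
    return None if value == NO_RESERVATION else value


if __name__ == "__main__":
    sample = """LunNumber:0x8 LunInfo :0x6642c5c State:0x0
QuiescenceFlag:0x0 Owner:0x1 IsReady :0x1
reserveId:0xe resv3rdId:0xffff
value = 128 = 0x80"""
    holder = scsi2_reserve_holder(sample)
    print("no lock" if holder is None else "held by host id %#x" % holder)
```

The pattern matches only the reserveId field, so the neighbouring resv3rdId value is ignored.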

2)      DS6800: check with catreef "fb/volstatus lss"

catreef "fb/volstatus 0x12"

Vol   Rsv   DA State  FB Status  Known Format Status
----  ----  --------  ---------  -------------------
1200  PPRC  GOOD      Ready      formatted
1201  TRAD  GOOD      Ready      formatted
1202  PR    GOOD      Ready      formatted
1203  None  GOOD      Ready      formatted

PPRC   Means PPRC (suggests it is probably a PPRC target)

TRAD   Traditional SCSI-2 reserve

PR     SCSI-3 persistent reserve

Ways to release the lock:

✓  cmt -a RESET_LUN_RESERVATION -t volume

For example: cmt -a RESET_LUN_RESERVATION -t 0x1202
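When an LSS has many volumes, it helps to filter the volstatus listing down to the ones that actually carry a host reservation. A rough sketch, assuming the output has the column layout shown above (volume ID first, Rsv second); the function name is invented, and the sketch only reads text and never issues any command against the storage.

```python
from typing import Dict

HEX_DIGITS = set("0123456789abcdefABCDEF")
HOST_RESERVES = {"TRAD", "PR"}  # SCSI-2 and SCSI-3 reserves per the legend above


def reserved_volumes(output: str) -> Dict[str, str]:
    """Map volume ID -> reservation type for volumes holding TRAD or PR."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        vol, rsv = fields[0], fields[1]
        # Data rows start with a 4-digit hex volume ID such as 1200 or 981D;
        # this skips the header and the dashed separator line.
        if len(vol) != 4 or not set(vol) <= HEX_DIGITS:
            continue
        if rsv in HOST_RESERVES:
            result[vol] = rsv
    return result


if __name__ == "__main__":
    sample = """Vol   Rsv   DA State  FB Status  Known Format Status
----  ----  --------  ---------  -------------------
1200  PPRC  GOOD      Ready      formatted
1201  TRAD  GOOD      Ready      formatted
1202  PR    GOOD      Ready      formatted
1203  None  GOOD      Ready      formatted"""
    for vol, rsv in reserved_volumes(sample).items():
        print(vol, rsv)
```

PPRC entries are left out on purpose, since the legend describes them as probable PPRC targets rather than host locks.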

3)      DS8000: check with cat "/dev/cpss0/fb/volstatus lss"

cat "/dev/cpss0/fb/volstatus 0x98"

Vol   Rsv  DA State  FB Status  Known Format Status
----  ---  --------  ---------  -------------------
981D  PR   GOOD      Ready      formatted

Ways to release the lock:

✓  cmt -a RESET_LUN_RESERVATION -t volume

For example: cmt -a RESET_LUN_RESERVATION -t 0x981D

6.      reserve_policy

Every vendor's devices and drivers have their own attributes, but most are similar. Here we take the MPIO reserve_policy as an example:

No Reserve reservation policy

If you set MPIO devices with this reserve policy, there is no reserve being made on MPIO devices. A device without reservation can be accessed by any initiators at any time. Input/output can be sent from all the paths of the MPIO device. This is the default reserve policy of SDDPCM. (Please take special note of this.)

Exclusive Host Access single-path reservation policy

This is the scsi-2 reservation policy. If you set this reserve policy for MPIO devices, only the fail_over path selection algorithm can be selected for the devices. With this reservation policy, an MPIO device only has one path being opened, and a scsi-2 reservation is made by this path on the device. Input/output can only be sent through this path. When this path is broken, another path will be opened and scsi-2 reservation will be made by the new path. All input and output will be routed to this path.

Persistent Reserve Exclusive Host Access reservation policy

If you set an MPIO device with this persistent reserve policy, a persistent reservation is made on this device with a persistent reserve (PR) key. Any initiators who register with the same PR key can access this device. Normally, you should pick a unique PR key for a server. Different servers should have different unique PR key. Input and output is routed to all paths of the MPIO device, because all paths of an MPIO device are registered with the same PR key. In a nonconcurrent clustering environment, such as HACMP, this is the reserve policy that you should select.

Current HACMP clustering software supports no_reserve policy with Enhanced Concurrent Mode volume group. HACMP support for persistent reserve policies for supported storage MPIO devices is not available.

Persistent Reserve Shared Host Access reservation policy

If you set an MPIO device with this persistent reserve policy, a persistent reservation is made on this device with a persistent reserve (PR) key. However, any initiators that implemented persistent registration can access this MPIO device, even if the initiators are registered with different PR keys. In a concurrent clustering environment, such as HACMP, this is the reserve policy that you should select for sharing resources among multiple servers.

Current HACMP clustering software supports no_reserve policy with Enhanced Concurrent Mode volume group. HACMP support for persistent reserve policies for supported storage MPIO devices is not available.
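The four policies above can be summarised as a small lookup table. This is only a restatement of the quoted text in code form. The attribute value names no_reserve, single_path, PR_exclusive and PR_shared are the names these policies usually carry on AIX MPIO devices; apart from no_reserve they do not appear in the excerpt, so treat them as an assumption, along with the class and function names and the round_robin algorithm used in the demo.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReservePolicy:
    reservation: str          # "none", "SCSI-2" or "SCSI-3"
    io_on_all_paths: bool     # True if I/O is routed to all paths
    open_to_other_keys: bool  # True if other hosts / other PR keys may access


POLICIES = {
    "no_reserve":   ReservePolicy("none",   True,  True),
    "single_path":  ReservePolicy("SCSI-2", False, False),
    "PR_exclusive": ReservePolicy("SCSI-3", True,  False),
    "PR_shared":    ReservePolicy("SCSI-3", True,  True),
}


def allowed_algorithms(policy: str, available: List[str]) -> List[str]:
    """The single-path policy only permits the fail_over algorithm."""
    if POLICIES[policy].io_on_all_paths:
        return list(available)
    return [name for name in available if name == "fail_over"]


if __name__ == "__main__":
    # round_robin is used here only as an example of a second algorithm.
    for name in POLICIES:
        print(name, allowed_algorithms(name, ["fail_over", "round_robin"]))
```

Laying the policies side by side shows why the SDDPCM default matters: with no_reserve nothing stops a second host from writing to the disk, so protection has to come from somewhere else.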

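7.      Lock-related commands on AIX

1)     varyonvg/varyoffvg

varyonvg locks the related hdisk/vpath devices. Normally, a DS4000 hdisk on AIX uses RDAC and gets a SCSI-2 lock, while a DS6000/DS8000/ESS vpath device gets a SCSI-3 lock.

For DS6000/DS8000/ESS without vpath, a SCSI-2 lock is used.

The lock type can of course also be specified by changing attributes of dpo, hdisk, vpath, and so on.

varyoffvg performs the normal unlock of the devices belonging to the VG.

varyonvg -b also unlocks the devices belonging to the VG. It is normally run on the host where the VG is in use, together with the "-u" flag, and can be used for LVM operations in an HA environment.

Note: the "-b" flag calls SC_FORCED_OPEN to break the lock on the hdisks in the VG, but for SCSI and FC devices it also breaks the locks on every LUN at the target address where the hdisk resides. For example, if hdisk0 and hdisk1 are both under fcs0, with hdisk0 in datavg and hdisk1 in testvg, then running varyonvg -b datavg unlocks both hdisk0 and hdisk1.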
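The side effect of "-b" is easier to see with a quick calculation of which disks a forced open would touch. A minimal sketch, assuming you already know which target address each hdisk sits behind; the labels "T0" and "T1", the extra hdisk2, and all names in the code are made up for illustration.

```python
from typing import Dict, NamedTuple, Set


class Disk(NamedTuple):
    vg: str      # volume group the hdisk belongs to
    target: str  # target address the LUN resides on (hypothetical labels)


def disks_unlocked_by_forced_open(disks: Dict[str, Disk], vg: str) -> Set[str]:
    """Disks whose locks are broken when varyonvg -b is run against vg."""
    # Every target address that holds at least one disk of the VG ...
    targets = {disk.target for disk in disks.values() if disk.vg == vg}
    # ... has all of its LUNs forced open, whatever VG they belong to.
    return {name for name, disk in disks.items() if disk.target in targets}


if __name__ == "__main__":
    layout = {
        "hdisk0": Disk(vg="datavg", target="T0"),
        "hdisk1": Disk(vg="testvg", target="T0"),
        "hdisk2": Disk(vg="testvg", target="T1"),
    }
    print(sorted(disks_unlocked_by_forced_open(layout, "datavg")))
```

In this layout hdisk1 loses its lock even though it belongs to testvg, because it shares a target address with hdisk0. hdisk2 sits on a different target and is untouched.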

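In addition, in certain specific environments varyonvg -b can be used to break the lock from another AIX host that shares the VG but does not have it varied on. For example, HostA and HostB share hdisk0, hdisk0 makes up datavg, and datavg is currently varied on at HostA. In certain environments, running "varyonvg -b datavg" on HostB can remove the lock on hdisk0. Take special note: varyonvg was not designed for this use and unpredictable behavior may result, so use it with caution! One case where this method may be worth trying is when HostA and HostB share a non-IBM storage device that has no unlock tool of its own and is not supported by HACMP. The datavg on hdisk0 is varied on at HostA, and HostA then crashes. HostB cannot take over normally, because the lock on hdisk0 has not been released. At that point you can try varyonvg -b datavg on HostB to break the lock, though it may not succeed (this depends on what the storage vendor supports).

In short, when a device is no longer in use, run a normal varyoffvg before shutting down. Use varyonvg -b for unlocking only with caution.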

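2)     HACMP-related commands

Normally, during an HACMP failover, the script /usr/es/sbin/cluster/events/utils/cl_disk_available is called to determine the device type, whether a lock is present, and so on. It then calls the appropriate command to break the lock.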

/usr/es/sbin/cluster/events/utils:

cl_flutereset (for DS4000)

cl_fscsilunreset (for SCSI-3)

cl_iscsilunreset (for iSCSI)

cl_pscsilunreset (for SCSI-2)

cl_scdiskreset (for IBM 7135)

cl_vpathreset (for sdd)

Note: run on their own, these commands do not necessarily work for every storage device, and standalone use is not officially supported by IBM.

3)     lquerypr/pcmquerypr/pcmgenprkey

These commands are described in detail in the Multipath Subsystem Device Driver User's Guide and are not covered one by one here.

8.      A situation to watch out for (excerpted from the Multipath Subsystem Device Driver User's Guide)

Understanding the persistent reserve issue when migrating from SDD to non-SDD volume groups after a system reboot

There is an issue with migrating from SDD to non-SDD volume groups after a system reboot. This issue only occurs if the SDD volume group was varied on prior to the system reboot and auto varyon was not set when the volume group was created. After the system reboot, the volume group will not be varied on.

The command to migrate from SDD to non-SDD volume group (vp2hd) will succeed, but a subsequent command to vary on the volume group will fail. This is because during the reboot, the persistent reserve on the physical volume of the volume group was not released, so when you vary on the volume group, the command will do a SCSI-2 reserve and fail with a reservation conflict.

There are two ways to avoid this issue.

1. Unmount the filesystems and vary off the volume groups before rebooting the system.

2. Execute lquerypr -Vh /dev/vpathX on the physical LUN before varying on volume groups after the system reboot. If the LUN is reserved by the current host, release the reserve by executing lquerypr -Vrh /dev/vpathX command. After successful execution, you will be able to vary on the volume group successfully.

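Put simply, this problem has two causes:

1.    When AIX varies on a VG made of hdisks, the hdisks get a SCSI-2 reservation; when it varies on a VG made of vpaths, the vpaths get a SCSI-3 reservation.

2.    Shutting down AIX directly while the VG is varied on does not release the lock.

This problem is very typical, so please apply the same reasoning to similar cases.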
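The sequence of events in this problem can be replayed with a tiny simulation. It is only a sketch of the reasoning, assuming the storage keeps the reservation across a host reboot as the excerpt describes; the class, the method names, the key "keyA" and the initiator "fcs0" are all invented for the example.

```python
class ReservationConflict(Exception):
    """Raised when a reserve is refused because another reservation exists."""


class Lun:
    """Toy LUN: the storage keeps the reservation, not the host."""

    def __init__(self):
        self.pr_key = None        # key holding a SCSI-3 persistent reserve
        self.scsi2_holder = None  # initiator holding a SCSI-2 reserve

    def varyon_vpath(self, pr_key):
        # Varying on a vpath-based VG places a SCSI-3 reservation.
        self.pr_key = pr_key

    def release_pr(self, pr_key):
        # Stand-in for releasing the reserve held by the current host.
        if self.pr_key == pr_key:
            self.pr_key = None

    def varyon_hdisk(self, initiator):
        # Varying on an hdisk-based VG attempts a SCSI-2 reserve.
        if self.pr_key is not None:
            raise ReservationConflict("persistent reserve still present")
        self.scsi2_holder = initiator


def migrate_after_reboot(release_first: bool) -> bool:
    """Replay the scenario; return True if the final varyon succeeds."""
    lun = Lun()
    lun.varyon_vpath("keyA")  # VG was varied on before the reboot
    # Reboot without varyoffvg: nothing on the LUN changes.
    if release_first:
        lun.release_pr("keyA")
    try:
        lun.varyon_hdisk("fcs0")
    except ReservationConflict:
        return False
    return True


if __name__ == "__main__":
    print(migrate_after_reboot(release_first=False))
    print(migrate_after_reboot(release_first=True))
```

The only difference between the failing and the working run is whether the leftover persistent reserve is released before the hdisk-based VG is varied on, which is exactly the second workaround in the excerpt.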

Appendix:

1.      Description of the varyonvg -b flag:

Breaks disk reservations on disks locked as a result of a normal varyonvg command. Use this flag on a volume group that is already varied on.

Notes:

•  This flag unlocks all disks in a given volume group.

•  The -b flag opens the disks in the volume group using SC_FORCED_OPEN flag. For SCSI and FC disks this forces open all luns on the target address that this disk resides on. Volume Groups should therefore not share target addresses when using this varyon option.

•  The -b flag can cause a system hang if used on a volume group that contains an active paging space.

2.      Using the SC_FORCED_OPEN Option

The SC_FORCED_OPEN option causes the SCSI device driver to call the SCSI adapter device driver's Bus Device Reset ioctl (SCIORESET) operation on the first open. This forces the device to release another initiator's reservation. After the SCIORESET command is completed, other SCSI commands are sent as in a normal open. If any of the SCSI commands fail due to a reservation conflict, the open registers the failure as an EBUSY status. This is also the result if a reservation conflict occurs during a normal open. The SCSI device driver should require the caller to have appropriate authority to request the SC_FORCED_OPEN option because this request can force a device to drop a SCSI reservation. If the caller attempts to initiate this system call without the proper authority, the SCSI device driver should return a value of -1, with the errno global variable set to a value of EPERM.

3.      Responsibilities of the SCSI Device Driver

SCSI device drivers are responsible for the following actions:

•  Interfacing with block I/O and logical-volume device-driver code in the operating system.

•  Translating I/O requests from the operating system into SCSI commands suitable for the particular SCSI device. These commands are then given to the SCSI adapter device driver for execution.

•  Issuing any and all SCSI commands to the attached device. The SCSI adapter device driver sends no SCSI commands except those it is directed to send by the calling SCSI device driver.

•  Managing SCSI device reservations and releases. In the operating system, it is assumed that other SCSI initiators might be active on the SCSI bus. Usually, the SCSI device driver reserves the SCSI device at open time and releases it at close time (except when told to do otherwise through parameters in the SCSI device driver interface). Once the device is reserved, the SCSI device driver must be prepared to reserve the SCSI device again whenever a Unit Attention condition is reported through the SCSI request-sense data.

4.      Responsibilities of the Device Driver

FCP, iSCSI, and Virtual SCSI Client device drivers are responsible for the following actions:

•  Interfacing with block I/O and logical-volume device-driver code in the operating system.

•  Translating I/O requests from the operating system into commands suitable for the particular device. These commands are then given to the adapter device driver for execution.

•  Issuing any and all commands to the attached device. The adapter device driver sends no commands except those it is directed to send by the calling device driver.

•  Managing device reservations and releases. In the operating system, it is assumed that other initiators might be active on the transport layer. Usually, the device driver reserves the device at open time and releases it at close time (except when told to do otherwise through parameters in the device driver interface). Once the device is reserved, the device driver must be prepared to reserve the device again whenever a Unit Attention condition is reported through the request-sense data.

5.      References:

Multipathing on AIX, by James Lee

Multipath Subsystem Device Driver User's Guide

Kernel Extensions and Device Support Programming Concepts

AIX Commands Reference

Posted by: cuizengshun (systems operations engineer, 民生银行)


  • Published: 2017-02-17