Author: 孔再华, 2017-09-12 16:14
Database operations engineer, China Minsheng Bank

Test Report: Changing the NSD Servers in an Active-Active Environment


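1. Changing the NSD servers in an active-active environment (temporary)

1.1 Analysis of the current problem

When the NSD servers were originally configured, the CF nodes were listed first, so network replication prefers the CF nodes. High CPU utilization on a CF node then degrades network access through that NSD server. The CF nodes therefore need to lose their NSD server role.

1.2 Temporary online solution

The temporary solution is a rolling procedure over the CF nodes: stop the CF service, shut down GPFS on the CF node (mmshutdown), remove the NSD disk devices, start GPFS again, then start the CF service. After both nodes are done, verify that the NSD servers have changed.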
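The per-node sequence above can be sketched as a small dry-run helper. This is only an illustration, not the script used in the test: `print_cf_plan` is a hypothetical name, and the CF number and hdisk names are the ones that appear in the steps later in this report. It prints the commands instead of running them, so the plan can be reviewed first.

```shell
# Print (not run) the per-node command sequence described above.
# $1 = CF identifier, remaining arguments = hdisk names to remove.
# print_cf_plan is a hypothetical helper, not part of DB2 or GPFS.
print_cf_plan() {
    cf_id="$1"
    shift
    echo "db2stop $cf_id"        # as the instance owner: stop this CF
    echo "mmshutdown"            # as root: stop GPFS on this node
    for disk in "$@"; do
        echo "rmdev -l $disk"    # as root: put the hdisk into Defined state
    done
    echo "mmstartup"             # as root: start GPFS again
    echo "db2start $cf_id"       # as the instance owner: restart the CF
}

print_cf_plan 128 hdisk8 hdisk16
```

Printing the plan first makes it easy to confirm the node, CF number and disk names before anything is actually stopped.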

1.2.1 Check the current mmfs I/O performance

Collect mmdiag information:

while true
do
mmdiag --all >>iohist.cfnsd.all
sleep 5
done

Accesses to the disks on the remote CF node show many latency spikes:

14:17:10.621474  WlogData    1:140423722        1    1.389  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.622953  W        data    1:38843216         8    7.797  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.631103  WlogData    1:140423722        1   12.734  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.643937  W        data    1:38843224        32    6.844  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.652004  WlogData    1:140423722        2    7.702  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.659794  W        data    1:38843256        40    5.479  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.666542  WlogData    1:140423723        1   22.412  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.689041  W        data    1:38843296         8   33.878  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.724138  WlogData    1:140423723        1    8.347  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.732581  W        data    1:38843304        40    7.568  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.741464  WlogData    1:140423723        1   22.095  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.763665  W        data    1:38843344         8   19.850  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.784732  WlogData    1:140423723        1   14.185  cli   C503991D:56551AC8    197.3.153.29 
14:17:10.799021  W        data    1:38843352         8    7.211  cli   C503991D:56551AC8    197.3.153.29
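To quantify the spikes rather than judge them by eye, a small awk filter can count the entries above a threshold. This is a minimal sketch: `count_slow_ios` is a hypothetical helper, and it assumes the I/O history layout shown above, where the latency in milliseconds is the field immediately before the `cli` marker.

```shell
# Count I/O history entries slower than a threshold (in ms), reading from stdin.
# Assumption: the latency is the field just before the "cli" marker, as in the
# capture above. Lines without a "cli" field are ignored.
count_slow_ios() {
    threshold_ms="$1"
    awk -v limit="$threshold_ms" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "cli" && ($(i - 1) + 0) > (limit + 0)) { slow++ }
            }
        }
        END { print slow + 0 }
    '
}
```

For example, `count_slow_ios 10 < iohist.cfnsd.all` would report how many client I/Os took longer than 10 ms. Running the same command on a capture taken after the change gives a like-for-like comparison.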

1.2.2 Stop CF 128

Stop the primary CF service:

AGDPCCF1:/home/db2gdpc$date;db2stop 128
Thu Mar  9 14:28:52 BEIST 2017
SQL1064N  DB2STOP processing was successful.

[Image: QQ图片20170911163413.png]
[Image: QQ图片20170911163439.png]
[Image: QQ图片20170911163500.png]

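TPS dipped briefly at that moment, with no transaction failures or rollbacks. The overall average transaction latency reached 1.35 seconds at worst, and the impact lasted about 10 seconds. The billing table shows that the longest transactions at that time took about 10 seconds, a few dozen in total.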

     1 10101015509142855936308096050453 2017-03-09-14.28.55.072000              10.394000
     1 10101015509142855499308012572818 2017-03-09-14.28.55.072000              10.508000
     1 10101015509142855459308009129650 2017-03-09-14.28.55.073000              10.523000
     1 10101015509142855782308079430060 2017-03-09-14.28.55.073000              10.405000
     1 10101015509142855133308046351536 2017-03-09-14.28.55.073000              10.393000
     1 10101015509142855851308095118236 2017-03-09-14.28.55.073000              10.393000
     1 10101015509142855589308068119281 2017-03-09-14.28.55.079000              10.501000
     1 10101015509142855757308090765055 2017-03-09-14.28.55.086000              10.512000
     1 10101015509142855496308083238624 2017-03-09-14.28.55.086000              10.494000
     1 10101015509142855542308025928627 2017-03-09-14.28.55.087000              10.512000
     1 10101015509142855471308048771297 2017-03-09-14.28.55.087000              10.379000
     1 10101015509142855627308037490400 2017-03-09-14.28.55.087000              10.491000
     1 10101015509142855671308080128957 2017-03-09-14.28.55.097000              10.486000
     1 10101015509142855281308044368335 2017-03-09-14.28.55.109000              10.470000
     3 10101015509142855702335241916396 2017-03-09-14.28.55.165000              10.268000
     3 10101015509142855557335284688742 2017-03-09-14.28.55.169000              10.319000
     3 10101015509142855670335253547049 2017-03-09-14.28.55.169000              10.321000
     3 10101015509142855502335228107488 2017-03-09-14.28.55.170000              10.318000
     3 10101015509142855654335234218712 2017-03-09-14.28.55.170000              10.316000
     3 10101015509142855399335282667885 2017-03-09-14.28.55.171000              10.262000
     3 10101015509142855169335210989428 2017-03-09-14.28.55.171000              10.262000
     3 10101015509142855812335266463084 2017-03-09-14.28.55.171000              10.262000
     3 10101015509142855044335226787200 2017-03-09-14.28.55.171000              10.262000
     3 10101015509142855800335285053503 2017-03-09-14.28.55.171000              10.262000
     3 10101015509142855411335265116754 2017-03-09-14.28.55.172000              10.261000
     1 10101015509142855185308087429892 2017-03-09-14.28.55.175000              10.415000
     3 10101015509142855442335274188365 2017-03-09-14.28.55.176000              10.312000
     1 10101015509142855480308035133600 2017-03-09-14.28.55.177000              10.414000
     1 10101015509142855832308088509360 2017-03-09-14.28.55.180000              10.422000
     3 10101015509142855740335294075636 2017-03-09-14.28.55.181000              10.309000
     3 10101015509142855949335280253007 2017-03-09-14.28.55.181000              10.303000
     3 10101015509142855061335280469010 2017-03-09-14.28.55.182000              10.307000
     3 10101015509142855634335207282559 2017-03-09-14.28.55.183000              10.309000
     3 10101015509142855395335296422759 2017-03-09-14.28.55.183000              10.306000
     3 10101015509142855064335235673430 2017-03-09-14.28.55.187000              10.296000
     3 10101015509142855748335280858660 2017-03-09-14.28.55.187000              10.302000
     3 10101015509142855118335220997486 2017-03-09-14.28.55.187000              10.302000
     3 10101015509142855189335252885424 2017-03-09-14.28.55.188000              10.298000
     3 10101015509142855345335258486914 2017-03-09-14.28.55.188000              10.301000
     3 10101015509142855088335250905487 2017-03-09-14.28.55.188000              10.298000
     3 10101015509142855761335227894831 2017-03-09-14.28.55.189000              10.299000
     3 10101015509142855104335223718087 2017-03-09-14.28.55.190000              10.295000
     3 10101015509142855844335237573358 2017-03-09-14.28.55.190000              10.299000
     3 10101015509142855963335289414630 2017-03-09-14.28.55.190000              10.299000
     3 10101015509142855939335297827038 2017-03-09-14.28.55.190000              10.293000
     3 10101015509142855380335272910813 2017-03-09-14.28.55.190000              10.296000
     3 10101015509142855866335272715061 2017-03-09-14.28.55.190000              10.299000
     3 10101015509142855124335223949487 2017-03-09-14.28.55.191000              10.298000
     3 10101015509142855713335209504251 2017-03-09-14.28.55.197000              10.289000
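Rows like these can be summarized with a short awk helper instead of being scanned by hand. A sketch only: `summarize_elapsed` is a hypothetical name, and it assumes, as in the listing above, that the last column is the elapsed time in seconds.

```shell
# Read billing-table rows from stdin and print "<row count> <max elapsed>".
# Assumption: the elapsed time in seconds is the last column of each row.
summarize_elapsed() {
    awk '
        NF >= 4 {
            rows++
            if ($NF + 0 > max) { max = $NF + 0 }
        }
        END { printf "%d %.3f\n", rows, max }
    '
}
```

Feeding it the rows saved from the billing-table query gives the number of slow transactions and the longest one in a single line.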

1.2.3 Stop GPFS and remove the disks

AGDPCCF1:/cmbc_admin/kzh#date;mmshutdown
Thu Mar  9 14:36:27 BEIST 2017
Thu Mar  9 14:36:27 BEIST 2017: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
forcedunmount of /db2home_shared
forcedunmount of /chgm_db2log
forcedunmount of /chgm_db2data
forcedunmount of /chgm_db2arc
Thu Mar  9 14:36:32 BEIST 2017: 6027-1344 mmshutdown: Shutting down GPFS daemons
Shutting down!
'shutdown' command about to kill process 7471104
Thu Mar  9 14:36:44 BEIST 2017: 6027-1345 mmshutdown: Finished
AGDPCCF1:/cmbc_admin/kzh#
AGDPCCF1:/cmbc_admin/kzh#
AGDPCCF1:/cmbc_admin/kzh#
AGDPCCF1:/cmbc_admin/kzh#date;rmdev -l hdisk8;rmdev -l hdisk16
Thu Mar  9 14:36:55 BEIST 2017
hdisk8 Defined
hdisk16 Defined

Check that the NSD servers have changed:
[Image: QQ截图20170911163906.png]

Confirm that TPS is not affected during the GPFS operations:
[Image: QQ图片20170911163953.png]
[Image: QQ图片20170911164025.png]

1.2.4 Start mmfs and the CF service

Start GPFS:

AGDPCCF1:/cmbc_admin/kzh#date;mmstartup
Thu Mar  9 14:38:57 BEIST 2017
Thu Mar  9 14:39:00 BEIST 2017: 6027-1642 mmstartup: Starting GPFS ...
6027-2114 The GPFS subsystem is already active.

Start the CF service:

AGDPCCF1:/home/db2gdpc$date;db2start 128
Thu Mar  9 14:43:01 BEIST 2017
SQL1063N  DB2START processing was successful.

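After startup, CF catchup (CF_CATCHUP_TRGT=1) had a slight effect on TPS: the average transaction time peaked at 0.5 seconds and recovered completely after about 4 minutes, with no transaction failures.
[Image: QQ图片20170911164644.png]
[Image: QQ图片20170911164713.png]

The billing table shows that some transactions took slightly longer:

     1 10101014209144542708308094540057 2017-03-09-14.45.42.683000               1.162000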
     1 10101014209144542204308036616771 2017-03-09-14.45.42.685000               1.158000
     1 10101014209144542479308030425140 2017-03-09-14.45.42.687000               1.158000
     1 10101014209144542651308069568194 2017-03-09-14.45.42.687000               1.157000
     3 10101014609144546912335295734821 2017-03-09-14.45.46.573000               3.498000
     3 10101015109144551106335208160765 2017-03-09-14.45.51.660000               1.025000
     3 10101015409144554300335258598932 2017-03-09-14.45.54.714000               1.687000
     1 10101015509144555657308005211055 2017-03-09-14.45.55.504000               1.250000
     1 10101015509144555089308034315481 2017-03-09-14.45.55.509000               1.241000
     1 10101015809144558640308055146910 2017-03-09-14.45.58.143000               1.002000
     3 10101015809144558605335272176019 2017-03-09-14.45.58.730000               1.186000
     3 10101015809144558668335244807229 2017-03-09-14.45.58.746000               1.392000
     3 10101015809144558745335282482239 2017-03-09-14.45.58.763000               1.154000
     3 10101010509144605638335256888549 2017-03-09-14.46.05.077000               7.649000
     3 10101010509144605345335271991492 2017-03-09-14.46.05.881000               6.856000
     3 10101012009144620418335241353954 2017-03-09-14.46.20.903000               3.598000
     3 10101012009144620903335220734053 2017-03-09-14.46.20.962000               3.532000
     3 10101012109144621518335245495823 2017-03-09-14.46.21.015000               3.478000
     3 10101013409144634816335248459126 2017-03-09-14.46.34.667000               3.616000
     1 10101013609144636269308037248010 2017-03-09-14.46.36.669000               1.625000
     1 10101013709144637667308046670440 2017-03-09-14.46.37.093000               1.203000
     1 10101014709144647978308010485258 2017-03-09-14.46.47.148000               1.067000
     3 10101015209144652444335277285210 2017-03-09-14.46.52.610000               1.113000
     1 10101010509144705161308083456798 2017-03-09-14.47.05.953000               1.135000

1.2.5 Stop CF 129

Confirm that the CF is in PEER state:
[Image: QQ截图20170911165942.png]
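The same check can be scripted rather than read from a screenshot. The sketch below assumes the usual column order of `db2instance -list` output (ID, TYPE, STATE, ...); `cf_state` is a hypothetical helper name.

```shell
# Print the STATE column for the CF with the given ID, reading
# `db2instance -list` style output from stdin.
# Assumption: columns are ID, TYPE, STATE, followed by the host columns.
cf_state() {
    awk -v id="$1" '$1 == id && $2 == "CF" { print $3 }'
}
```

For example, `db2instance -list | cf_state 128` would be expected to print PEER for the restarted CF before CF 129 is stopped.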

Stop the CF 129 service:

BGDPCCF1:/home/db2gdpc$date;db2stop 129
Thu Mar  9 15:02:42 BEIST 2017
SQL1064N  DB2STOP processing was successful.

[Image: QQ图片20170911170152.png]
[Image: QQ图片20170911170205.png]
[Image: QQ图片20170911170215.png]

1.2.6 Stop GPFS and remove the disks

BGDPCCF1:/chgm_db2arc/fio-2.0.14/examples#date;mmshutdown
Thu Mar  9 15:07:02 BEIST 2017
Thu Mar  9 15:07:02 BEIST 2017: 6027-1341 mmshutdown: Starting force unmount of GPFS file systems
forcedunmount of /db2home_shared
forcedunmount of /chgm_db2log
forcedunmount of /chgm_db2data
forcedunmount of /chgm_db2arc
Thu Mar  9 15:07:07 BEIST 2017: 6027-1344 mmshutdown: Shutting down GPFS daemons
mmremote: Invalid current working directory detected: /chgm_db2arc/fio-2.0.14/examples
  The command may fail in an unexpected way.  Processing continues ...
mmfsadm: Invalid current working directory detected: /chgm_db2arc/fio-2.0.14/examples
  The command may fail in an unexpected way.  Processing continues ...
Shutting down!
'shutdown' command about to kill process 9699478
Thu Mar  9 15:07:17 BEIST 2017: 6027-1345 mmshutdown: Finished
BGDPCCF1:/chgm_db2arc/fio-2.0.14/examples#date;rmdev -l hdisk8;rmdev -l hdisk16
Thu Mar  9 15:07:19 BEIST 2017
hdisk8 Defined
hdisk16 Defined

Check that the NSD servers have changed:
[Image: QQ截图20170911170737.png]

TPS was not affected during the GPFS operations.

1.2.7 Start mmfs and the CF service

Start GPFS:

BGDPCCF1:/#mmstartup
Thu Mar  9 15:09:34 BEIST 2017: 6027-1642 mmstartup: Starting GPFS ...
6027-2114 The GPFS subsystem is already active.

Start the CF service:

BGDPCCF1:/home/db2gdpc$date;db2start 129
Thu Mar  9 15:09:59 BEIST 2017
SQL1063N  DB2START processing was successful.

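After startup, the effect of CF catchup on TPS was the same as the first time: transaction time peaked at 0.5 seconds and recovered completely after about 4 minutes, with no transaction failures.
[Image: QQ图片20170911171037.png]
[Image: QQ图片20170911171048.png]

No transaction failed or was rolled back during the entire procedure.
[Image: QQ图片20170911171442.png]
[Image: QQ图片20170911171650.png]

1.2.8 Check the current mmfs I/O performance

Collect mmdiag information: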

while true
do
mmdiag --all >>iohist.memnsd.all
sleep 5
done

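Accesses to the disks on the remote member nodes show far fewer latency spikes:
[Image: QQ截图20170911171940.png]

Data collected by mmdiag:
[Image: QQ截图20170911172455.png]

1.3 Temporary solution for a CF node reboot

If a CF node machine has to be rebooted for maintenance, it must be kept from becoming an NSD server again. As a temporary measure, run the following commands after the machine starts: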

#date;mmshutdown
#date;rmdev -l hdisk8;rmdev -l hdisk16
#date;mmstartup
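It can be worth confirming that the disks really ended up in the Defined state before GPFS is started again. A minimal sketch, assuming `lsdev -Cc disk` style output with the device name in the first column and its state in the second; `disks_all_defined` is a hypothetical helper.

```shell
# Succeeds only if every named disk appears with state "Defined" in
# lsdev-style input read from stdin (name in column 1, state in column 2).
# A disk that is missing from the input counts as a failure.
disks_all_defined() {
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        { state[$1] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                if (state[names[i]] != "Defined") { exit 1 }
            }
        }
    '
}
```

One possible use is `lsdev -Cc disk | disks_all_defined hdisk8 hdisk16 && mmstartup`, so that GPFS is only started once both disks are confirmed as Defined.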

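2. Changing the NSD servers in an active-active environment (permanent)

2.1 Change the NSD servers with the cluster stopped

Stop the database cluster, put the CM into maintenance mode, unmount all GPFS file systems, and modify the NSD attributes so that the change takes effect.

2.1.1 Stop the cluster

[Image: QQ截图20170911172749.png]

2.1.2 Enter maintenance mode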

AGDPCMB1:/chgm_db2arc/fio-2.0.14/examples#db2cluster  -CM  -ENTER -MAINTENANCE  -ALL
Domain 'db2domain_20160519090623' has entered maintenance mode.

2.1.3 Unmount the file systems

AGDPCMB1:/#mmumount all -f -a
Thu Mar  9 15:58:02 BEIST 2017: 6027-1674 mmumount: Unmounting file systems ...
BGDPCMB1:  forced unmount of /chgm_db2arc

2.1.4 Modify the disk attributes

View the current attributes:
[Image: QQ截图20170911172951.png]

Modify the attributes:
[Image: QQ截图20170911173040.png]
[Image: QQ截图20170911173056.png]

View the result:
[Image: QQ截图20170912160510.png]
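The exact commands for this step exist only in the screenshots. As an assumption, one common way to change the server list of an existing NSD is GPFS's `mmchnsd` command, which accepts descriptors of the form `DiskName:ServerList` separated by semicolons. The sketch below only builds that descriptor string; `nsd_descriptors` is a hypothetical helper, and the NSD and node names in the usage note are placeholders, not the ones from this environment.

```shell
# Build an mmchnsd descriptor string "nsd1:servers;nsd2:servers;...".
# $1 = comma-separated NSD server list, remaining arguments = NSD names.
# nsd_descriptors is a hypothetical helper; it does not call GPFS itself.
nsd_descriptors() {
    servers="$1"
    shift
    desc=""
    for nsd in "$@"; do
        # Append with a ";" separator once the string is non-empty.
        desc="${desc:+$desc;}${nsd}:${servers}"
    done
    echo "$desc"
}
```

With placeholder names, `mmchnsd "$(nsd_descriptors memberA,memberB nsd1 nsd2)"` would set the same server list on both NSDs, and `mmlsnsd` can then be used to view the result.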

2.1.5 Mount the file systems

Mount the file systems again, then check their nsdserver attribute:
[Image: QQ截图20170912161305.png]

2.1.6 Exit maintenance mode

AGDPCMB1:/#db2cluster  -CM  -EXIT -MAINTENANCE  -ALL 
Domain 'db2domain_20160519090623' has successfully exited maintenance mode.

2.1.7 Start the cluster

db2start instance on BGDPCCF1
db2start instance on AGDPCCF1
db2start instance on BGDPCMB2
db2start instance on BGDPCMB1
db2start instance on AGDPCMB2
db2start instance on AGDPCMB1
db2start
