AIX disk I/O pacing vs PowerHA

字数 3827阅读 5368评论 1赞 0

Recently i’ve played with disk I/O pacing (technically minpout and maxpout parameters) on AIX 6.1. The root cause of this investigation is based on some high I/O waits in production. The default AIX 6.1 and 7.1 systems are coming with maxpout set to 8193 and minpout respecitvletly to 4096. However old deployments and HACMP <= 5.4.1 installations procedures are still recommending setting value of 33/20, just take a look on some old High Availability Cluster Multi-Processing Troubleshooting guide:
“Although the most efficient high- and low-water marks vary from system to system, an initial high-water mark of 33 and a low-water mark of 24 provides a good starting point. These settings only slightly reduce write times and consistently generate correct fallover behavior from the HACMP software.”

Additionally IBM described the whole disk I/O pacing mechanics in this link

100% write results with maxpout=8193 minpout=4096 (defaults)
orion_20110706_0525_summary.txt:Maximum Large MBPS=213.55 @ Small=0 and Large=6
orion_20110706_0525_summary.txt:Maximum Small IOPS=32323 @ Small=2 and Large=0

100% write results with maxpout=33 minpout=24 (old HACMP)
orion_20110706_0552_summary.txt:Maximum Large MBPS=109.64 @ Small=0 and Large=10
orion_20110706_0552_summary.txt:Maximum Small IOPS=31203 @ Small=5 and Large=0

As you can see there is drastic change when it comes to large sequential writes, but don’t change that parameters yet! You really need to read the following sentence from the IBM site “The maxpout parameter specifies the number of pages that can be scheduled in the I/O state to a file before the threads are suspended. “. It is obvious that those parameters are set for the reason of avoiding I/O resource hungry processes dominating the CPU. If you take that the smallest page size is 4kB with 64kB automatically used sometimes by Oracle you might get 33*64kB = 2112kB value in best case, after which thread doing the I/O is suspended ! Ouch! So what is the process of taking the CPU from active process? It is called involuntary context switch…

Why it is all important with HACMP/PowerHA software? Because it is implemented as normal shell scripts and some RSCT processes which are mostly not running in kernel mode at all (at least everything below PowerHA 7.1)! The only way PowerHA/HACMP can give more CPU horse power to the processes needing is to give them higher priority on the run queue, just take a look on the PRI column..

root@X: /home/root :# ps -efo "sched,nice,pid,pri,comm"|sort -n +3|head 
SCH NI     PID PRI COMMAND 
  - --  520286  31 hatsd 
  - --  729314  38 hats_diskhb_nim 
  - --  938220  38 hats_nim 
  - --  622658  39 hagsd 
  - 20  319502  39 clstrmgr 
  - 20  348330  39 rmcd 
  - 20  458874  39 gsclvmd 
  - 20  508082  39 gsclvmd 
  - 20  528554  39 gsclvmd 
root@X: /home/root :#

.. but it appears that even that could not be not enough in some situations. As you can see there is no easy answer to that question and you would have to drive your exact system configuration to 99/100% of CPU utilization on all POWER cores to see if you are not getting random failovers, because e.g. some HACMP process would not get heartbeat too late due to the CPU scheduling. So perhaps the way forward for 33/20 environments here is to leave them somewhere between 33/20 and 8193/4096 (like the mentioned 513/256 in the IBM manuals). Another thing is that you should never have a system driven to > 80% CPU usage on all physical cores, but you should now that already:)

systems default

著作权归作者所有

如果觉得我的文章对您有用，请点赞。您的支持将鼓励我继续创作！

添加新评论1 条评论

wushuangri工程师fivefu
2013-01-04 09:57

good

Ctrl+Enter 发表

匿名评论

AIX disk I/O pacing vs PowerHA

添加新评论1 条评论

作者其他文章