Author · 2010-11-11 10:41

Using “Hung thread detection” to debug application performance problems


The key is to have the server generate a javacore when the problem occurs, so that the thread stacks can be traced.

Reference:

Hung thread detection in WebSphere Application Server

We've probably all seen a hung JVM at one time or another, and if you're dealing with WebSphere Application Server, chances are you figured it out in one of two ways: 1. the users are complaining that the browser just “spins” and never returns a web page, or 2. you've noticed output in the WebSphere logs (SystemOut.log) that indicates potentially hung threads. For the purposes of this discussion, we'll focus on the latter method.

WebSphere Application Server provides a feature that detects hung threads, that is to say, threads that have been active past a certain time threshold and are suspected of being hung. Let's stop here for a moment to clarify some terms and concepts.

A hung thread is a thread that is being blocked by a blocking call or is waiting on a monitor (a synchronization-locked object) to be released so that it can use it.
WebSphere Application Server will output messages in the SystemOut log file with a message ID of WSVR0605W. This message simply indicates that a thread MAY be hung, but there is no way for WebSphere to be certain of this, since it does not know the expected transaction length for the operations the thread is performing at the time. Its goal is simply to tell you about it so you can investigate.
The hung thread detection code will also notify you (via output in the SystemOut log file) that a previously reported hung thread actually completed its work. This message ID is WSVR0606W.
Hang detection works only with WebSphere managed threads (e.g. thread pools) and does NOT monitor user created threads.
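To make "waiting on a monitor" concrete, here is a minimal sketch using only the standard JDK. It is not WebSphere's detection logic; the class and method names are invented for illustration. One thread holds a lock while a second thread tries to enter the same synchronized block, and the second thread's state is then observed.

```java
final class BlockedThreadDemo {
    /**
     * Starts a worker that needs a monitor the caller is holding and
     * returns the thread state the worker reaches while it waits.
     */
    static Thread.State observeWaiterState() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            // The worker can only enter here after the caller releases the lock.
            synchronized (lock) {
                // nothing to do; acquiring the monitor is the point
            }
        }, "demo-waiter");

        Thread.State observed;
        try {
            synchronized (lock) {
                waiter.start();
                // Poll (for at most 5 seconds) until the worker is parked on the monitor.
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (waiter.getState() != Thread.State.BLOCKED
                        && System.nanoTime() < deadline) {
                    Thread.sleep(10);
                }
                observed = waiter.getState();
            }
            // The lock is released now, so the worker can finish.
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing worker", e);
        }
        return observed;
    }
}
```

If the lock holder never released the monitor, the worker would stay in this state indefinitely, which is the kind of situation a thread dump helps to diagnose.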

So let's look at an example:

WSVR0605W: Thread "WebContainer : 1" has been active for 612,000 milliseconds and may be hung. There are 3 threads in total in the server that may be hung.

So the message above tells us that the thread named “WebContainer : 1” has been doing something for 612 seconds, or about 10 minutes, and that there are 3 threads in total in the JVM that may be hung (that is, active for longer than the threshold time).
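If you want to pull these values out of SystemOut.log programmatically, a sketch along the following lines may help. The regular expression is modeled only on the sample message shown above; real log lines may carry extra fields or differ between versions, so treat the pattern as an assumption to adjust. The class name HungThreadWarning is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HungThreadWarning {
    // Pattern based on the sample WSVR0605W line in this article.
    // ASSUMPTION: accepting "is"/"thread" for the singular case is a guess.
    private static final Pattern WSVR0605W = Pattern.compile(
        "WSVR0605W: Thread \"(.+?)\" has been active for ([\\d,]+) milliseconds"
        + " and may be hung\\.\\s+There (?:is|are) ([\\d,]+) threads?"
        + " in total in the server that may be hung\\.");

    final String threadName;
    final long activeMillis;
    final int suspectedHungThreads;

    private HungThreadWarning(String threadName, long activeMillis, int suspectedHungThreads) {
        this.threadName = threadName;
        this.activeMillis = activeMillis;
        this.suspectedHungThreads = suspectedHungThreads;
    }

    /** Returns the parsed warning, or null if the line is not a WSVR0605W message. */
    static HungThreadWarning parse(String line) {
        Matcher m = WSVR0605W.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Numbers in the message use thousands separators, e.g. "612,000".
        long millis = Long.parseLong(m.group(2).replace(",", ""));
        int total = Integer.parseInt(m.group(3).replace(",", ""));
        return new HungThreadWarning(m.group(1), millis, total);
    }
}
```

Applied to the example line, this yields the thread name, an active time of 612000 milliseconds (612 seconds), and a total of 3 suspected threads.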

An obvious question you may ask at this point is: “How long does a thread have to be active before the hang detection feature identifies the thread and tells me about it?” The answer is 10 minutes, by default. The good news is that the hang detection feature can be tuned a bit to better suit your needs. But before we go there, let's talk for a minute about what happens when the hang detection feature fires off a warning in the log.

First, as we've already seen, a log entry is output. At the same time, a JMX event of the type TYPE_THREAD_MONITOR_THREAD_HUNG is emitted from the server. Using this event type, you could code a JMX listener that takes some sort of action whenever it receives notification of a hung thread. Likewise, if you wanted to take some action when a previously reported thread clears up (if it ever does), you could listen for the JMX event type TYPE_THREAD_MONITOR_THREAD_CLEAR.
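A listener of this kind could be sketched with the standard javax.management API as shown below. This is an illustration under stated assumptions, not WebSphere-specific code: the two type strings are placeholders standing in for the values of TYPE_THREAD_MONITOR_THREAD_HUNG and TYPE_THREAD_MONITOR_THREAD_CLEAR, and a local NotificationBroadcasterSupport stands in for the server MBean that would emit the events. How the listener is registered against a real server is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationFilterSupport;
import javax.management.NotificationListener;

final class HungThreadListener implements NotificationListener {
    // ASSUMPTION: placeholder strings; in a real server use the values behind
    // TYPE_THREAD_MONITOR_THREAD_HUNG and TYPE_THREAD_MONITOR_THREAD_CLEAR.
    static final String TYPE_HUNG = "example.thread.hung";
    static final String TYPE_CLEAR = "example.thread.clear";

    // Not thread-safe; sufficient for this synchronous demo only.
    final List<String> events = new ArrayList<>();

    @Override
    public void handleNotification(Notification n, Object handback) {
        if (TYPE_HUNG.equals(n.getType())) {
            // A real listener might trigger a thread dump or alert an operator here.
            events.add("HUNG: " + n.getMessage());
        } else if (TYPE_CLEAR.equals(n.getType())) {
            events.add("CLEAR: " + n.getMessage());
        }
    }

    /** Filter that lets through only the two notification types of interest. */
    static NotificationFilterSupport filter() {
        NotificationFilterSupport f = new NotificationFilterSupport();
        f.enableType(TYPE_HUNG);
        f.enableType(TYPE_CLEAR);
        return f;
    }

    /** Sends three notifications through a local stand-in broadcaster. */
    static List<String> demo() {
        NotificationBroadcasterSupport source = new NotificationBroadcasterSupport();
        HungThreadListener listener = new HungThreadListener();
        source.addNotificationListener(listener, filter(), null);
        source.sendNotification(
            new Notification(TYPE_HUNG, "demo-source", 1L, "WebContainer : 1 may be hung"));
        source.sendNotification(
            new Notification("example.other", "demo-source", 2L, "filtered out"));
        source.sendNotification(
            new Notification(TYPE_CLEAR, "demo-source", 3L, "WebContainer : 1 completed"));
        return listener.events;
    }
}
```

The filter keeps unrelated notifications away from the listener, so the handler only has to distinguish the hung and clear cases.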

The hang detection feature also attempts to self-tune based on the number of hang warnings and subsequent clearing messages that it emits. It will attempt to adjust the trigger threshold (10 mins by default) to a higher or lower value in order to minimize the false positives seen in the logs. A message will be displayed in the log when the self-tuning occurs.

WSVR0607W: Too many thread hangs have been falsely reported. The hang threshold is now being set to thresholdtime .

OK, so now that we know what to look for and how it gets there, let's look at how to tune the hang detection feature to match our needs. To set these values, navigate in the Administration Console to the application server instance you wish to configure, click Administration, and then click Custom Properties.

Here is a list of the properties you can configure (taken directly from the WAS Information Center):

Name: com.ibm.websphere.threadmonitor.interval
Value: The frequency (in seconds) at which managed threads in the selected application server will be interrogated.
Default: 180 seconds (three minutes).

Name: com.ibm.websphere.threadmonitor.threshold
Value: The length of time (in seconds) in which a thread can be active before it is considered hung. Any thread that is detected as active for longer than this length of time is reported as hung.
Default: 600 seconds (ten minutes).

Name: com.ibm.websphere.threadmonitor.false.alarm.threshold
Value: The number of times (T) that false alarms can occur before automatically increasing the threshold. It is possible that a thread that is reported as hung eventually completes its work, resulting in a false alarm. A large number of these events indicates that the threshold value is too small. The hang detection facility can automatically respond to this situation: for every T false alarms, the hang threshold is increased by a factor of 1.5. Set the value to zero (or less) to disable the automatic adjustment.
Default: 100

Name: com.ibm.websphere.threadmonitor.dump.java
Value: Set to true to cause a javacore to be created when a hung thread is detected and a WSVR0605W message is printed. The threads section of the javacore can be analyzed to determine what the reported thread and other related threads are doing.
Default: false
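The automatic adjustment described for the false alarm threshold can be worked through with a little arithmetic. The sketch below simply applies the stated rule (multiply the hang threshold by 1.5 once per T false alarms, with zero or less disabling the adjustment). How the server rounds the result, or exactly when it applies each increase, is not stated in the text above, so this is an approximation and the class name is hypothetical.

```java
final class ThresholdAutoTune {
    /**
     * Estimates the hang threshold (in seconds) after a number of false alarms.
     *
     * @param initialThresholdSeconds starting threshold, e.g. the default 600
     * @param falseAlarmThreshold     T, e.g. the default 100; zero or less disables tuning
     * @param falseAlarms             false alarms seen so far
     */
    static double thresholdAfter(double initialThresholdSeconds,
                                 int falseAlarmThreshold,
                                 int falseAlarms) {
        if (falseAlarmThreshold <= 0) {
            return initialThresholdSeconds; // automatic adjustment disabled
        }
        // One increase per complete batch of T false alarms.
        int adjustments = falseAlarms / falseAlarmThreshold;
        return initialThresholdSeconds * Math.pow(1.5, adjustments);
    }
}
```

With the defaults, 100 false alarms would raise the threshold from 600 to 900 seconds, and 200 false alarms would raise it to 1350 seconds.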

Now, should you need to, you can configure the hang detection feature of WebSphere Application Server to meet your exacting specifications for detecting potentially hung threads in your JVM.


Copyright belongs to the author.
