When your java application is getting unresponsive or requests are taking time, taking thread dumps will help find the cause.

If you take one thread dump, you might not be getting much information. You need to take at least 10 to 15 thread dumps with 3 to 5 seconds apart, and then you go through each thread dump to find what is causing threads to hang.

Getting thread dumps

Below is how you can take thread dumps with

  • Find the JVM process id. ps -ef | grep java can be used to find the java pid.
  • Take thread dump using jstack for every 3 seconds
    for i in $(seq 1 10); do jstack $pid > td_${i}.log; sleep 3; done
    

    Will generate ten thread dumps, taken 3 seconds apart.

    -rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_1.log
    -rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_2.log
    -rw-rw-r-- 1 user app 287254 Oct 29 19:02 td_3.log
    -rw-rw-r-- 1 user app 288211 Oct 29 19:02 td_4.log
    -rw-rw-r-- 1 user app 287726 Oct 29 19:02 td_5.log
    -rw-rw-r-- 1 user app 288594 Oct 29 19:02 td_6.log
    -rw-rw-r-- 1 user app 288581 Oct 29 19:03 td_7.log
    -rw-rw-r-- 1 user app 288581 Oct 29 19:03 td_8.log
    -rw-rw-r-- 1 user app 287996 Oct 29 19:03 td_9.log
    -rw-rw-r-- 1 user app 287983 Oct 29 19:03 td_10.log
    

Analyzing thread dumps

Now we have ten thread dumps first, open the first thread, and check for the request which is not responding. For example, you know that the /atg/bcc request is taking time, then search /atg/bcc or respective session id in the thread.

For example in below thread dump (td_1.log), where thread ajp-0.0.0.0-20109-24 with request uri /atg/bcc is at socket read waiting for oracle database result.

"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.74.8 userid=iuser
36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 method=GET SAT_TRAN
S=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:128)
        at oracle.net.ns.Packet.receive(Packet.java:300)
        at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
        at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
        at oracle.jdbc.driver.T4CPreparedStatement.fetch(T4CPreparedStatement.java:1119)
        at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:373)
        - locked <0x000000058b076d18> (a oracle.jdbc.driver.T4CConnection)

Now we can check what is the state of this thread ajp-0.0.0.0-20109-24 in remaining threads.

grep -A 2 "ajp-0.0.0.0-20109-24" `ls -ltr td* | awk '{print $9}'` | less -i

Above will search for thread ajp-0.0.0.0-20109-24 and -A 2 will get the next two lines. ls -ltr td* | awk '{print $9}' will make sure that threads are listed in order. Optionally use less -i to open in less to view and search for any additional patterns. Below is how the sample output would be

td_1.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.7
4.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 met
hod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_1.log-   java.lang.Thread.State: RUNNABLE
td_1.log-        at java.net.SocketInputStream.socketRead0(Native Method)
--
td_2.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.7
4.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 met
hod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_2.log-   java.lang.Thread.State: RUNNABLE
td_2.log-        at java.net.SocketInputStream.socketRead0(Native Method)
--
...
--
td_10.log:"ajp-0.0.0.0-20109-24 requestURI=/atg/bcc jsessionid=7E6D7F3AE9176254E572EE0EAA935F5B remoteAddr=10.65.
74.8 userid=iuser36000215 userAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 me
thod=GET SAT_TRANS=null" daemon prio=10 tid=0x00007fd198022000 nid=0x471f runnable [0x00007fcf35c53000]
td_10.log-   java.lang.Thread.State: RUNNABLE
td_10.log-       at java.net.SocketInputStream.socketRead0(Native Method)

As you can see, the above thread is waiting for the oracle result in the last thread dump, too (after 30 secs). We can check the main stack trace and see if the query is taking time due to a database issue. Similarly, you can also search for any lock acquired by threads.

You can also get snapshot of all thread dumps which are waiting on oracle result.

grep -B 19 'oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next' \ 
     `ls -ltr td* | awk '{print $9}'` \ 
     | grep ajp | awk '{print $1}'

Output

td_1.log-"ajp-0.0.0.0-20109-24
td_2.log-"ajp-0.0.0.0-20109-24
td_4.log-"ajp-0.0.0.0-20109-27
td_5.log-"ajp-0.0.0.0-20109-27
td_6.log-"ajp-0.0.0.0-20109-27
td_7.log-"ajp-0.0.0.0-20109-27
td_7.log-"ajp-0.0.0.0-20109-23
td_8.log-"ajp-0.0.0.0-20109-27
td_8.log-"ajp-0.0.0.0-20109-23
td_9.log-"ajp-0.0.0.0-20109-27
td_9.log-"ajp-0.0.0.0-20109-23
td_10.log-"ajp-0.0.0.0-20109-27
td_10.log-"ajp-0.0.0.0-20109-23

First you search oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next (this can be any pattern you are searching for) and use -B 19 (print 19 lines before this pattern) and on these results, search for ajp threads.

Conclusion

By following the above steps, you can analyze thread dumps to find issues with jstack, grep, ls, ‘ awk’. Hope this helps – RC