The term "hang" refers to a process failing to make progress. Processes may hang for a variety of reasons that have nothing to do with GT.M. However, hanging GT.M processes may indicate that a database has become inaccessible. When you suspect a hang, first determine the extent of the problem.
Your tools include:
Knowledge of the application and how it is used
Communication with users
The ps command and other UNIX system utilities
WHEN MANY PROCESSES ON A SYSTEM ARE HANGING, determine if the hangs are confined to a particular application. If all applications are affected or if processes not using GT.M databases are affected, the problem is not a database-specific problem, but something more general, such as a UNIX problem. Refer to section H6.
WHEN ONLY ONE PROCESS IS HANGING, find out whether that process is the only one using a particular GT.M application. If it is the only process, start a second process that uses the same application and database, and determine whether the second process is also affected.
IF A PROCESS HANGS WHILE OTHER PROCESSES ACCESSING THE SAME DATABASE CONTINUE TO PROCESS, the problem is not a database problem. Refer to section H2 and then to section H8.
WHEN ONLY GT.M PROCESSES RUNNING A PARTICULAR APPLICATION HANG, the problem may be a database problem. Refer to section H2.
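The triage in the four paragraphs above can be summarized as a small decision function. This is a minimal sketch: the function name classify_hang, its parameters, and the HangScope labels are hypothetical, the answers come from you (user reports, ps output, knowledge of the application), and the order of the checks is one reasonable reading of the text rather than a prescribed algorithm.

```python
from enum import Enum


class HangScope(Enum):
    """Hypothetical labels for the outcomes described in the text."""
    GENERAL = "not database-specific; likely a UNIX problem (see section H6)"
    SINGLE_PROCESS = "not a database problem (see sections H2 and H8)"
    APPLICATION = "may be a database problem (see section H2)"
    NEED_SECOND_PROCESS = "start a second process and see whether it also hangs"


def classify_hang(all_apps_affected: bool,
                  non_gtm_processes_affected: bool,
                  only_user_of_application: bool,
                  others_on_same_db_progressing: bool) -> HangScope:
    """Map answers to the triage questions onto a likely scope of the hang."""
    # Hangs span all applications, or reach processes that do not use GT.M.
    if all_apps_affected or non_gtm_processes_affected:
        return HangScope.GENERAL
    # A lone hung process with no peers gives no basis for comparison yet.
    if only_user_of_application:
        return HangScope.NEED_SECOND_PROCESS
    # Other processes on the same database still make progress.
    if others_on_same_db_progressing:
        return HangScope.SINGLE_PROCESS
    # Only GT.M processes running one application are hung.
    return HangScope.APPLICATION
```

For example, classify_hang(False, False, False, True) returns HangScope.SINGLE_PROCESS, pointing you to sections H2 and H8.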
Is the system "hung"? If so, consider the following additional questions:
Does LKE work? If not, then a database has problems (see below).
Are there locks owned by a nonexistent process? Can they be cleared? What were the circumstances of a process leaving locks?
Are there locks which are not changing? What is the state of the owning process(es)? If not all processes are hung, can the stalled process(es) be MUPIP STOPped?
Does some region have a "persistent" owner of the critical section (crit)? Which one(s)?
If there is a crit owner, what is its state? If it is a nonexistent process, can it be removed with CRIT -REMOVE?
Does a DSE CRIT -INIT -RESET free the critical section, or does it just change which process owns it?
If CRIT -INIT -RESET does not free the critical section, the cache is damaged.
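Two of the questions above, whether locks are owned by a nonexistent process and whether some locks are not changing, can be checked mechanically once you have the owning PIDs. This is a minimal sketch, assuming you have already turned two LKE outputs taken some time apart into Python dicts that map lock name to owning PID; that parsing step and the function names pid_exists and suspect_locks are hypothetical and not part of GT.M. The same pid_exists check applies to the PID of a crit owner.

```python
import os
from typing import Dict, List, Tuple


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (UNIX only)."""
    if pid <= 0:
        # PID 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def suspect_locks(before: Dict[str, int],
                  after: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare two hypothetical {lock name: owner PID} snapshots.

    Returns (orphaned, unchanged): locks in the later snapshot whose owner no
    longer exists, and locks held by the same live owner in both snapshots.
    """
    orphaned = sorted(name for name, pid in after.items()
                      if not pid_exists(pid))
    unchanged = sorted(name for name, pid in after.items()
                       if before.get(name) == pid and pid_exists(pid))
    return orphaned, unchanged
```

A PID can be reused by an unrelated process, so confirm any "live" owner with ps before concluding that it is the stalled GT.M process, and examine its state before deciding whether to MUPIP STOP it.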
Another way to test the cache: if crit is clear and DSE BUFFER hangs, the cache is not working. Use MUPIP STOP and/or CRIT -INIT -RESET to get all processes out of the shared memory segment, then use DSE WCINIT. After a WCINIT, make sure that you can exit DSE successfully. Then use MUPIP INTEG (optionally with -FAST) to check for damage that WCINIT can induce.
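The cache test above relies on noticing that DSE BUFFER hangs. If you script that check, a timeout is one way to notice. This is a minimal sketch, assuming Python 3 on the same host; the function name run_with_hang_check and the timeout values are hypothetical. It deliberately leaves a timed-out command running instead of killing it: subprocess.run(timeout=...) would send SIGKILL, and GT.M documentation recommends MUPIP STOP rather than kill -9 for stopping GT.M processes.

```python
import subprocess
from typing import Optional, Sequence


def run_with_hang_check(cmd: Sequence[str], stdin_text: str = "",
                        timeout_s: float = 30.0) -> Optional[subprocess.Popen]:
    """Start cmd, feed it stdin_text, and wait up to timeout_s seconds.

    Returns None if the command finished in time. Otherwise returns the
    still-running Popen object (its PID is proc.pid) without killing it,
    so the operator can stop it with MUPIP STOP.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, text=True)
    try:
        proc.communicate(input=stdin_text, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # communicate() does not kill the child on timeout; it is left running.
        return proc
    return None
```

A hypothetical use, assuming dse is on your PATH (it normally lives in $gtm_dist) and reads commands from standard input: run_with_hang_check(["dse"], "buffer\nexit\n", timeout_s=60.0). A non-None result suggests the cache is not working.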