Appendix B: Diagnostics
- root ssh keys not set up – If you are prompted for a password when you ssh to the service node, check whether the `/root/.ssh` directory on the MN has an `authorized_keys` file. If the directory does not exist or contains no keys, run `xdsh service -K` to exchange the ssh keys for root. You will be prompted for the root password, which should be the password you set for `key=system` in the passwd table.
- xCAT rpms not on SN – On the SN, run `rpm -qa | grep xCAT` and make sure the appropriate xCAT rpms are installed on the service node. See the list of xCAT rpms in Diskful (Stateful) Installation. If rpms are missing, check your install setup as outlined in Diskless (Stateless) Installation for diskless installs or Diskful (Stateful) Installation for diskful installs.
- otherpkgs (including xCAT rpms) installation failed on the SN – The OS repository has not been created on the SN, so when the “yum” command resolves dependencies, the rpm packages required by xCATsn (including expect, nmap, and httpd) cannot be found. In this case, check whether the `/install/postscripts/repos/<osver>/<arch>/` directory exists on the MN. If it does not, rerun the `copycds` command, which creates files under the `/install/postscripts/repos/<osver>/<arch>` directory on the MN. Then reinstall the SN.
- Error finding the database/starting xcatd – If you run `tabdump site` on the service node and get “Connection failure: IO::Socket::SSL: connect: Connection refused at /opt/xcat/lib/perl/xCAT/Client.pm”, restart the xcatd daemon by running `service xcatd restart` and see whether the error clears. If it fails with the same error, check whether the `/etc/xcat/cfgloc` file exists. It should exist and be the same as `/etc/xcat/cfgloc` on the MN. If it is not there, copy it from the MN to the SN, then run `service xcatd restart`. A missing file indicates that the servicenode postscripts did not complete successfully. Run `lsdef <service node> -i postscripts -c` and verify that the `servicenode` postscript appears in the list.
- Error accessing database/starting xcatd credential failure – If you run `tabdump site` on the service node and get “Connection failure: IO::Socket::SSL: SSL connect attempt failed because of handshake problemserror:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown at /opt/xcat/lib/perl/xCAT/Client.pm”, check `/etc/xcat/cert`. The directory should contain the files `ca.pem` and `server-cred.pem`. These were supposed to be transferred from the `/etc/xcat/cert` directory on the MN during the install. Also check the `/etc/xcat/ca` directory, which should contain most of the files from the `/etc/xcat/ca` directory on the MN. You can manually copy both directories from the MN to the SN, recursively. Missing files indicate that the servicenode postscripts did not complete successfully. Run `lsdef <service node> -i postscripts -c` and verify that the `servicenode` postscript appears in the list. Then run `service xcatd restart` and try `tabdump site` again.
- Missing ssh hostkeys – Check whether `/etc/xcat/hostkeys` on the SN has the same files as `/etc/xcat/hostkeys` on the MN. These are the ssh keys that will be installed on the compute nodes so that root can ssh between compute nodes without being prompted for a password. If they are not there, copy them from the MN to the SN. Again, these should have been set up by the servicenode postscripts.
- Errors running hierarchical commands such as xdsh – xCAT has a number of commands that run hierarchically. That is, the commands are sent from xcatd on the management node to the correct service node xcatd, which in turn processes the command and sends the results back to xcatd on the management node. If a hierarchical command such as `xdsh` fails with something like “Error: Permission denied for request”, check `/var/log/messages` on the management node for errors. One error might be “Request matched no policy rule”. This may mean you need to add policy table entries for your xCAT management node and service node.
- /install is not mounted on service node from management node – If the service node does not have the `/install` directory mounted from the management node, run `lsdef -t site clustersite -i installloc` and verify `installloc="/install"`.
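
Several of the checks above come down to two questions: do the expected files exist on the service node, and is `servicenode` in its postscripts list? The following is a minimal shell sketch of those two checks, not part of xCAT. The function names `check_sn_files` and `has_servicenode_postscript` and the optional root-prefix argument are hypothetical, and the parser assumes that `lsdef ... -c` prints a line of the form `<node>: postscripts=a,b,c`.

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative and are not shipped with xCAT.

# Report which of the files named in this appendix are missing.
# $1 is an optional root prefix (hypothetical, for trying the check
# against a scratch directory tree); leave it empty on a real SN.
check_sn_files() {
    root="${1:-}"
    missing=0
    for path in \
        /root/.ssh/authorized_keys \
        /etc/xcat/cfgloc \
        /etc/xcat/cert/ca.pem \
        /etc/xcat/cert/server-cred.pem
    do
        if [ ! -f "${root}${path}" ]; then
            echo "MISSING: ${path}"
            missing=1
        fi
    done
    # hostkeys is a directory rather than a single file
    if [ ! -d "${root}/etc/xcat/hostkeys" ]; then
        echo "MISSING: /etc/xcat/hostkeys"
        missing=1
    fi
    return "$missing"
}

# Read `lsdef <service node> -i postscripts -c` output on stdin and
# succeed only if "servicenode" is one of the comma-separated entries.
# Assumes the "<node>: postscripts=a,b,c" line format.
has_servicenode_postscript() {
    sed -n 's/^.*postscripts=//p' | tr ',' '\n' | grep -qx 'servicenode'
}
```

On a service node, `check_sn_files` with no argument checks the real paths and returns non-zero if anything is missing. The second helper is meant to be fed from a pipe, for example `lsdef <service node> -i postscripts -c | has_servicenode_postscript`. Neither helper compares file contents against the MN, so a file that exists but differs from the MN copy still needs a manual check.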