Last week we upgraded our Master and Media servers to Netbackup 7.6.0.4 from 7.5.0.7 completing on the 17th April. Everything worked fine until 19th April when the we noticed Oracle and SQL backups were failing with an error 6 and an Exchange backup which normally takes 2.5 hours was still running after 2 days. Client servers are running a variety of versions from 7.1 to 7.5.0.7. We have two media servers and we only have this problem on one media server and the problem only affects some of the servers.
Error codes we are getting are 6, 41 and 196.
Investigating the problem the we found that the problem is the backups to the MSDP storage times out or is very slow in the case of the Exchange backup. Looking in the logs I have seen the following errors:
bptm
00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDVFS: pdvfs_lib_log: mtstrm_is_server_started: Cr_ErrnoException: Get Process Status command failed: Cr_ErrnoException: Timed out after waiting 1200s to send command Get Process Status to mtstrmd
00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDVFS: PdvfsMtstrmStartDaemon: The multi-threaded streaming daemon is not ready to receive requests even after 4 attempts. Check if mtstrmd is running and look for errors in mtstrmd.log.
00:00:56.003 [7068.8068] <16> 86851:bptm:7068:hpwp-med01-bkp.hcuk.pri: [ERROR] PDSTS: start_mtstrm_session: PdvfsMtstrmStartDaemon failed (18:Improper link)
00:30:28.676 [8116.13800] <16> 84077:bptm:8116:hpwp-med01-bkp.hcuk.pri: libsts openp() 15/04/21 00:30:28: stsm_open_module C:\Program Files\Veritas\NetBackup\bin\\ost-plugins\libstspizapi.dll failed err 2060053 platerr 193: LoadLibrary
dbclient
00:45:29.702 [11258] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612
01:00:30.755 [11258] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612
01:00:30.756 [11258] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/11258.0.1429572612>
01:00:30.756 [11258] <16> CreateNewImage: ERR - serverResponse() failed
01:00:30.756 [11258] <16> VxBSACreateObject: ERR - Could not create new image with file /arch_ultq4t6kk_s17085_p1_t877566612.
bpdm
00:00:17.860 [900.10140] <16> 87153:bptm:900:hpwp-med01-bkp.hcuk.pri: libsts openp() 15/04/22 00:00:17: stsm_open_module C:\Program Files\Veritas\NetBackup\bin\\ost-plugins\libstspizapi.dll failed err 2060053 platerr 193: LoadLibrary failed
These errors seem to show the backups are unable to write to the storage as the backups seem to go to the media server ok and then fail because they cannot write to the storage. Apart from the upgrade the Netbackup configuration has not changed.
My gut feeling is that the failures are not caused by the upgrade and possibly not by Netbackup as these errors seem to be a consquence rather than a cause and we had at least a days worth of successful backups beform we started to have problems but as always for a Netbackup Administrator the problem is proving it.
Is anyone able give me any ideas? I've worked on this for 2 days and I am out of them.